From noreply@sourceforge.net Fri Mar 1 07:21:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 28 Feb 2002 23:21:00 -0800 Subject: [Patches] [ python-Patches-520694 ] arraymodule.c improvements Message-ID: Patches item #520694, was opened at 2002-02-20 22:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 Category: None Group: None Status: Open Resolution: Accepted Priority: 3 Submitted By: Jason Orendorff (jorend) Assigned to: Martin v. Löwis (loewis) Summary: arraymodule.c improvements Initial Comment: This patch brings the array module a little more up-to-date. There are two changes: 1. Modernize the array type, memory management, and so forth. As a result, the array() builtin is no longer a function but a type. array.array is array.ArrayType. Also, it can now be subclassed in Python. 2. Add a new typecode 'u', for Unicode characters. The patch includes changes to test/test_array.py to test the new features. I would like to make a further change: add an arrayobject.h include file, and provide some array operations there, giving them names like PyArray_Check(), PyArray_GetItem(), and PyArray_GET_DATA(). Is such a change likely to find favor? ---------------------------------------------------------------------- >Comment By: Jason Orendorff (jorend) Date: 2002-03-01 07:21 Message: Logged In: YES user_id=18139 Guido: In hindsight, yes it would have been much easier. ...This version adds __iadd__ and __imul__. There's also a separate documentation patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 22:46 Message: Logged In: YES user_id=6380 Cool. I wonder if it wouldn't have been easier to first submit and commit the easy changes, and then the unicode addition separately? Anyway, I presume that Martin will commit this when it's ready. 
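[Editor's note: the first change described above stuck, and array.array has been a real type ever since. The sketch below, written for today's Python, shows the subclassing this patch enabled; the NamedArray class and its label attribute are purely illustrative inventions, not anything from the patch itself.]

```python
from array import array

# Because array.array is a genuine type, it can be subclassed in
# Python, as the patch description promises.
class NamedArray(array):
    """Hypothetical subclass carrying a label alongside its items."""
    def __new__(cls, typecode, initializer=(), label=""):
        # array allocates its storage in __new__, so forward the
        # typecode and initializer there.
        return super().__new__(cls, typecode, initializer)

    def __init__(self, typecode, initializer=(), label=""):
        self.label = label  # subclass instances get a __dict__

a = NamedArray('i', [1, 2, 3], label="counts")
assert isinstance(a, array)
assert a.tolist() == [1, 2, 3]
assert a.label == "counts"
```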
---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-27 03:15 Message: Logged In: YES user_id=18139 Getting there. This version has tounicode() and fromunicode(), and a better repr() for type 'u' arrays. Also, array.typecode and array.itemsize are now listed under tp_getset; they're attribute descriptors and they show up in help(array). (Neat!) Next, documentation; then __iadd__ and __imul__. But not tonight. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-25 12:24 Message: Logged In: YES user_id=21627 Removal of __members__ is fine, then - but you do need to fill out an appropriate tp_members instead, listing "typecode" and "itemsize". Adding __iadd__ and __imul__ is fine; the equivalent feature for lists has not caused complaints, either, and anybody using *= on an array probably would consider it a bug that it isn't in-place. Please add documentation changes as well; I currently have Doc/lib/libarray.tex \lineiii{'d'}{double}{8} +\lineiii{'u'}{Py_UNICODE}{2} \end{tableiii} Misc/NEWS - array.array is now a type object. A new format character 'u' indicates Py_UNICODE arrays. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-25 00:29 Message: Logged In: YES user_id=18139 Martin writes: "There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string." Boy, are you right. There should be array.tounicode() and array.fromunicode() methods that only work on type 'u' arrays. ...I also want to fix repr for type 'u' arrays. Instead of "array.array('u', [u'x', u'y', u'z'])" it should say "array.array('u', u'xyz')". ...I would also implement __iadd__ and __imul__ (as list implements them), but this would be a semantic change! Thoughts? Count on a new patch tomorrow. 
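[Editor's note: the tounicode()/fromunicode() methods proposed here did land and still exist in today's array module, where the 'u' typecode survives in deprecated form. A sketch of the round trip, assuming a Python version that still ships 'u':]

```python
from array import array

# The round trip Martin asked for: a 'u' array can give back its
# Unicode string directly, instead of via u"".join(arr.tolist()).
a = array('u', 'xyz')
assert a.tounicode() == 'xyz'

# The old workaround from the thread still works too:
assert ''.join(a.tolist()) == 'xyz'

# fromunicode() appends characters, and repr() uses the compact
# string form rather than a list of one-character strings.
a.fromunicode('w')
assert a.tounicode() == 'xyzw'
assert repr(a) == "array('u', 'xyzw')"
```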
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-24 21:38 Message: Logged In: YES user_id=31435 Without looking at any details, __members__ and __methods__ are deprecated starting with 2.2; the type/class unification PEPs aim at moving the universe toward supporting and using the class-like introspection API instead. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-24 15:56 Message: Logged In: YES user_id=21627 There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string. You have to use u"".join(arr.tolist()) This is slightly annoying, since it is the only case where it is not possible to get back the original constructor arguments. Also, what is the rationale for removing __members__? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-22 13:39 Message: Logged In: YES user_id=38388 How about simplifying the whole setup altogether and adding arrays as standard Python types (i.e. put the code in Objects/ and add the new include file to Includes/). About the inter-module C API export: I'll write up a PEP about this which will hopefully result in a new standard support mechanism for this in Python. (BTW, the approach I used in _ssl/_socket does use PyCObjects) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-22 13:25 Message: Logged In: YES user_id=21627 With the rationale given, I'm now in favour of all parts of the patch. As for exposing the API, you need to address MAL's concerns: PyArray_* won't be available to other extension modules, instead, you need to expose them through a C object. However, I recommend *not* to follow the approach taken in socket/ssl; I agree with Tim's concerns here. 
Instead, the approach taken by cStringIO (via cStringIO.cStringIO_API) is much better (i.e. put the burden of using the API onto any importer, and out of Python proper). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-21 08:40 Message: Logged In: YES user_id=38388 About the Unicode bit: if "u" maps to Py_UNICODE I for one don't have any objections. The internal encoding is available in lots of places, so that argument doesn't count and I'm sure it can be put to some good use for fast manipulation of large Unicode strings. I very much like the new exposure of the type at C level; however I don't understand how you would use it without adding the complete module to the libpythonx.x.a (unless you add some sort of inter-module C API import mechanism like the one I added to _socket and _ssl) ?! ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 02:03 Message: Logged In: YES user_id=18139 > What is the rationale for expanding PyObject_VAR_HEAD? > It doesn't seem to achieve anything. It didn't make sense for array to be a VAR_HEAD type. VAR_HEAD types are variable-size: the last member defined in the struct for such a type is an array of length 1, and type->item_size is nonzero. See e.g. PyType_GenericAlloc(), and how it decides whether to call PyObject_INIT or PyObject_VAR_INIT: It checks type->item_size. The new arraymodule.c calls PyType_GenericAlloc; the old one didn't. So a change seemed warranted. Since Arraytype has item_size == 0, it seemed most consistent to make it a non-VAR type and initialize the ob_size field myself. I'm pretty sure I got the right interpretation of this; but if not, someone wiser in the ways of Python will speak up. :) (While I was looking at this, I noticed this: http://sourceforge.net/tracker/index.php? 
func=detail&aid=520768&group_id=5470&atid=305470) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 01:15 Message: Logged In: YES user_id=18139 > I don't like the Unicode part of it at all. Well, I'm not attached to it. It's very easy to subtract it from the patch. > What can you do with this feature? The same sort of thing you might do with an array of type 'c'. For example, change individual characters of a (Unicode) string and then run a (Unicode) re.match on it. > It seems to unfairly prefer a specific Unicode encoding, > without explaining what that encoding is, and without a > clear use case why this encoding is desirable. Well, why should array('h', '\x00\xff\xaa\xbb') be allowed? Why is that encoding preferable to any other particular encoding of short ints? Easy: it's the encoding of the C compiler where Python was built. For 'u' arrays, the encoding used is just the encoding that Python uses internally. However, it's not intended to be used in any situation where encode()/decode() would be appropriate. I never even thought about that possibility when I wrote it. The behavior of a 'u' array is intended to be more like this: Suppose A = array('u', ustr). Then: len(A) == len(ustr) A[0] == ustr[0] A[1] == ustr[1] ... That is, a 'u' array is an array of Unicode characters. Encoding is not an issue, any more than with the built-in unicode type. (If ustr is a non-Unicode string, then the behavior is different -- more in line with what 'b', 'h', 'i', and the others do.) If your concern is that Python currently "hides" its internal encoding, and the 'u' array exposes this unnecessarily, then consider these two examples that don't involve arrays: >>> x = u'\U00012345' # One Unicode codepoint... >>> len(x) 2 # hmm. >>> x[0] u'\ud808' # aha. UTF-16. >>> x[1] u'\udf45' >>> str(buffer(u'abc')) # Example two. 
'a\x00b\x00c\x00' > It also seems to overlap with the Unicode object's > .encode method, which is much more general. Wow. Well, that wasn't my intent. It is intended, rather, to offer parity with 'c'. Java has byte[], short[], int[], long[], float[], double[], and char[]... Python doesn't currently have char[]. Shouldn't it? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-20 23:02 Message: Logged In: YES user_id=21627 What is the rationale for expanding PyObject_VAR_HEAD? It doesn't seem to achieve anything. I don't like the Unicode part of it at all. What can you do with this feature? It seems to unfairly prefer a specific Unicode encoding, without explaining what that encoding is, and without a clear use case why this encoding is desirable. It also seems to overlap with the Unicode object's .encode method, which is much more general. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 From noreply@sourceforge.net Fri Mar 1 07:25:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 28 Feb 2002 23:25:04 -0800 Subject: [Patches] [ python-Patches-520694 ] arraymodule.c improvements Message-ID: Patches item #520694, was opened at 2002-02-20 22:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 Category: None Group: None Status: Open Resolution: Accepted Priority: 3 Submitted By: Jason Orendorff (jorend) Assigned to: Martin v. Löwis (loewis) Summary: arraymodule.c improvements Initial Comment: This patch brings the array module a little more up-to-date. There are two changes: 1. Modernize the array type, memory management, and so forth. As a result, the array() builtin is no longer a function but a type. array.array is array.ArrayType. Also, it can now be subclassed in Python. 2. 
Add a new typecode 'u', for Unicode characters. The patch includes changes to test/test_array.py to test the new features. I would like to make a further change: add an arrayobject.h include file, and provide some array operations there, giving them names like PyArray_Check(), PyArray_GetItem(), and PyArray_GET_DATA(). Is such a change likely to find favor? ---------------------------------------------------------------------- >Comment By: Jason Orendorff (jorend) Date: 2002-03-01 07:25 Message: Logged In: YES user_id=18139 Documentation patch. Please check my TEX; I'm not used to it yet, and I can't get the Python docs to build on my Windows box, probably because one of the tools isn't installed properly, or something. So there's no way for me to check that it's correct, yet. (...If you let this sit for a moment I'll eventually check this for myself on the Linux box, but it'll be a little while. Thanks.) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 07:21 Message: Logged In: YES user_id=18139 Guido: In hindsight, yes it would have been much easier. ...This version adds __iadd__ and __imul__. There's also a separate documentation patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 22:46 Message: Logged In: YES user_id=6380 Cool. I wonder if it wouldn't have been easier to first submit and commit the easy changes, and then the unicode addition separately? Anyway, I presume that Martin will commit this when it's ready. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-27 03:15 Message: Logged In: YES user_id=18139 Getting there. This version has tounicode() and fromunicode(), and a better repr() for type 'u' arrays. 
Also, array.typecode and array.itemsize are now listed under tp_getset; they're attribute descriptors and they show up in help(array). (Neat!) Next, documentation; then __iadd__ and __imul__. But not tonight. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-25 12:24 Message: Logged In: YES user_id=21627 Removal of __members__ is fine, then - but you do need to fill out an appropriate tp_members instead, listing "typecode" and "itemsize". Adding __iadd__ and __imul__ is fine; the equivalent feature for lists has not caused complaints, either, and anybody using *= on an array probably would consider it a bug that it isn't in-place. Please add documentation changes as well; I currently have Doc/lib/libarray.tex \lineiii{'d'}{double}{8} +\lineiii{'u'}{Py_UNICODE}{2} \end{tableiii} Misc/NEWS - array.array is now a type object. A new format character 'u' indicates Py_UNICODE arrays. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-25 00:29 Message: Logged In: YES user_id=18139 Martin writes: "There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string." Boy, are you right. There should be array.tounicode() and array.fromunicode() methods that only work on type 'u' arrays. ...I also want to fix repr for type 'u' arrays. Instead of "array.array('u', [u'x', u'y', u'z'])" it should say "array.array('u', u'xyz')". ...I would also implement __iadd__ and __imul__ (as list implements them), but this would be a semantic change! Thoughts? Count on a new patch tomorrow. 
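[Editor's note: Martin's expectation that *= (and +=) on an array be in-place matches how the methods behave in today's array module; as with lists, augmented assignment mutates the object rather than rebinding the name. A small sketch:]

```python
from array import array

a = array('i', [1, 2])
alias = a                      # second reference to the same object

a += array('i', [3])           # __iadd__: extends in place
a *= 2                         # __imul__: repeats in place

assert alias is a              # still the same object, as with lists
assert a.tolist() == [1, 2, 3, 1, 2, 3]
```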
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-24 21:38 Message: Logged In: YES user_id=31435 Without looking at any details, __members__ and __methods__ are deprecated starting with 2.2; the type/class unification PEPs aim at moving the universe toward supporting and using the class-like introspection API instead. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-24 15:56 Message: Logged In: YES user_id=21627 There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string. You have to use u"".join(arr.tolist()) This is slightly annoying, since it is the only case where it is not possible to get back the original constructor arguments. Also, what is the rationale for removing __members__? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-22 13:39 Message: Logged In: YES user_id=38388 How about simplifying the whole setup altogether and adding arrays as standard Python types (i.e. put the code in Objects/ and add the new include file to Includes/). About the inter-module C API export: I'll write up a PEP about this which will hopefully result in a new standard support mechanism for this in Python. (BTW, the approach I used in _ssl/_socket does use PyCObjects) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-22 13:25 Message: Logged In: YES user_id=21627 With the rationale given, I'm now in favour of all parts of the patch. As for exposing the API, you need to address MAL's concerns: PyArray_* won't be available to other extension modules, instead, you need to expose them through a C object. However, I recommend *not* to follow the approach taken in socket/ssl; I agree with Tim's concerns here. 
Instead, the approach taken by cStringIO (via cStringIO.cStringIO_API) is much better (i.e. put the burden of using the API onto any importer, and out of Python proper). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-21 08:40 Message: Logged In: YES user_id=38388 About the Unicode bit: if "u" maps to Py_UNICODE I for one don't have any objections. The internal encoding is available in lots of places, so that argument doesn't count and I'm sure it can be put to some good use for fast manipulation of large Unicode strings. I very much like the new exposure of the type at C level; however I don't understand how you would use it without adding the complete module to the libpythonx.x.a (unless you add some sort of inter-module C API import mechanism like the one I added to _socket and _ssl) ?! ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 02:03 Message: Logged In: YES user_id=18139 > What is the rationale for expanding PyObject_VAR_HEAD? > It doesn't seem to achieve anything. It didn't make sense for array to be a VAR_HEAD type. VAR_HEAD types are variable-size: the last member defined in the struct for such a type is an array of length 1, and type->item_size is nonzero. See e.g. PyType_GenericAlloc(), and how it decides whether to call PyObject_INIT or PyObject_VAR_INIT: It checks type->item_size. The new arraymodule.c calls PyType_GenericAlloc; the old one didn't. So a change seemed warranted. Since Arraytype has item_size == 0, it seemed most consistent to make it a non-VAR type and initialize the ob_size field myself. I'm pretty sure I got the right interpretation of this; but if not, someone wiser in the ways of Python will speak up. :) (While I was looking at this, I noticed this: http://sourceforge.net/tracker/index.php? 
func=detail&aid=520768&group_id=5470&atid=305470) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 01:15 Message: Logged In: YES user_id=18139 > I don't like the Unicode part of it at all. Well, I'm not attached to it. It's very easy to subtract it from the patch. > What can you do with this feature? The same sort of thing you might do with an array of type 'c'. For example, change individual characters of a (Unicode) string and then run a (Unicode) re.match on it. > It seems to unfairly prefer a specific Unicode encoding, > without explaining what that encoding is, and without a > clear use case why this encoding is desirable. Well, why should array('h', '\x00\xff\xaa\xbb') be allowed? Why is that encoding preferable to any other particular encoding of short ints? Easy: it's the encoding of the C compiler where Python was built. For 'u' arrays, the encoding used is just the encoding that Python uses internally. However, it's not intended to be used in any situation where encode()/decode() would be appropriate. I never even thought about that possibility when I wrote it. The behavior of a 'u' array is intended to be more like this: Suppose A = array('u', ustr). Then: len(A) == len(ustr) A[0] == ustr[0] A[1] == ustr[1] ... That is, a 'u' array is an array of Unicode characters. Encoding is not an issue, any more than with the built-in unicode type. (If ustr is a non-Unicode string, then the behavior is different -- more in line with what 'b', 'h', 'i', and the others do.) If your concern is that Python currently "hides" its internal encoding, and the 'u' array exposes this unnecessarily, then consider these two examples that don't involve arrays: >>> x = u'\U00012345' # One Unicode codepoint... >>> len(x) 2 # hmm. >>> x[0] u'\ud808' # aha. UTF-16. >>> x[1] u'\udf45' >>> str(buffer(u'abc')) # Example two. 
'a\x00b\x00c\x00' > It also seems to overlap with the Unicode object's > .encode method, which is much more general. Wow. Well, that wasn't my intent. It is intended, rather, to offer parity with 'c'. Java has byte[], short[], int[], long[], float[], double[], and char[]... Python doesn't currently have char[]. Shouldn't it? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-20 23:02 Message: Logged In: YES user_id=21627 What is the rationale for expanding PyObject_VAR_HEAD? It doesn't seem to achieve anything. I don't like the Unicode part of it at all. What can you do with this feature? It seems to unfairly prefer a specific Unicode encoding, without explaining what that encoding is, and without a clear use case why this encoding is desirable. It also seems to overlap with the Unicode object's .encode method, which is much more general. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 From noreply@sourceforge.net Fri Mar 1 07:59:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 28 Feb 2002 23:59:03 -0800 Subject: [Patches] [ python-Patches-523268 ] pwd.getpw* returns enhanced tuple. Message-ID: Patches item #523268, was opened at 2002-02-27 05:10 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523268&group_id=5470 Category: Modules Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Sean Reifschneider (jafo) Assigned to: Nobody/Anonymous (nobody) Summary: pwd.getpw* returns enhanced tuple. Initial Comment: This patch against the current CVS implements the enhanced tuple return types for pwd.getpw*(). This makes the return similar to time.localtime() and os.stat(). Includes changes to the documents as well. 
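[Editor's note: the enhanced-tuple return described here is what the modern pwd module ships: a struct sequence whose fields can be read by index or by name, just like os.stat() and time.localtime(). A sketch, assuming a Unix system:]

```python
import os
import pwd

entry = pwd.getpwuid(os.getuid())

# The result is still usable as a plain tuple...
assert entry[0] == entry.pw_name
assert entry[2] == entry.pw_uid
assert len(entry) == 7          # pw_name ... pw_shell

# ...but the named fields make call sites self-documenting.
assert entry.pw_uid == os.getuid()
```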
---------------------------------------------------------------------- Comment By: Quinn Dunkan (quinn_dunkan) Date: 2002-03-01 07:59 Message: Logged In: YES user_id=429749 Looks good to me. I'll go zap mine now. ---------------------------------------------------------------------- Comment By: Sean Reifschneider (jafo) Date: 2002-02-28 09:20 Message: Logged In: YES user_id=81797 I've taken a look at Quinn's patch, and have created a new version which I believe is the combination of the two. It also includes doc strings for the structs themselves, documentation of the grp module, and removes a case where a failure can cause a memory leak. I'll ask Quinn to review this patch. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-27 22:21 Message: Logged In: YES user_id=21627 Please coordinate with Quinn Dunkan (patch #522027). It seems his patch fills out some character strings where you use NULL. Ideally, you'd both come up with a revised version of the patch, and withdraw the other one. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523268&group_id=5470 From noreply@sourceforge.net Fri Mar 1 08:01:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 00:01:44 -0800 Subject: [Patches] [ python-Patches-522027 ] pwdmodule and grpmodule use structs Message-ID: Patches item #522027, was opened at 2002-02-24 11:25 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=522027&group_id=5470 Category: Modules Group: None >Status: Deleted >Resolution: Duplicate Priority: 5 Submitted By: Quinn Dunkan (quinn_dunkan) Assigned to: Nobody/Anonymous (nobody) Summary: pwdmodule and grpmodule use structs Initial Comment: Here are a few patches to make pwd and grp use structs, like time.struct_time ---------------------------------------------------------------------- Comment By: Sean Reifschneider (jafo) Date: 2002-02-28 09:22 Message: Logged In: YES user_id=81797 I have created a new patch which is the union of our two patches, plus a bit. Please review it and if you have any comments let me know. Thanks, Sean ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-27 22:20 Message: Logged In: YES user_id=21627 For the pwd part, please coordinate with Sean Reifschneider and patch #523268. I like the documentation changes in that patch. 
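[Editor's note: the grp half of Quinn's patch got the same struct treatment as pwd. On a Unix system the struct-style access looks like this (a sketch against today's grp module):]

```python
import os
import grp

g = grp.getgrgid(os.getgid())

# grp entries gained the same index-or-name duality as pwd entries:
assert g[0] == g.gr_name
assert g[2] == g.gr_gid == os.getgid()
assert isinstance(g.gr_mem, list)   # member names; may be empty
```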
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=522027&group_id=5470 From noreply@sourceforge.net Fri Mar 1 08:21:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 00:21:39 -0800 Subject: [Patches] [ python-Patches-520483 ] Make IDLE OutputWindow handle Unicode Message-ID: Patches item #520483, was opened at 2002-02-20 15:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520483&group_id=5470 Category: IDLE Group: Python 2.2.x Status: Closed Resolution: Accepted Priority: 7 Submitted By: Jason Orendorff (jorend) Assigned to: Guido van Rossum (gvanrossum) Summary: Make IDLE OutputWindow handle Unicode Initial Comment: This one-line patch makes OutputWindow handle Unicode correctly. For example, >>> print u'\xbfQu\xe9 pas\xf3?' In 2.2 this throws a UnicodeError, not because of any problem with Unicode handling in either Python or Tk, but because IDLE does str(s) on the Unicode string. I just took out the call to str(). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-01 09:21 Message: Logged In: YES user_id=21627 Thanks, your comments are indeed helpful. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:59 Message: Logged In: YES user_id=6380 I think when I first wrote that code, Tkinter didn't yet support Unicode. I think I felt that write() shouldn't be called with anything besides a string, but I didn't want to put in an explicit type check, and yet I didn't want to pass non-strings to Tcl because it treats certain types special. For example, in the patched IDLE, try sys.stdout.write((1,2,(3,4))), or try sys.stdout.write(None). But I think it's no big deal, and I approve of the change. 
Consequently, I'm closing this bug report again. I've merged this into 2.2.1. Should I do anything else? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-24 01:43 Message: Logged In: YES user_id=31435 We're never too busy for people we love, Jason. Reopened, changed category to IDLE, changed group to Python 2.2.x, boosted priority, and assigned to Guido. The str() call has been there since Guido first checked this in, and its purpose isn't apparent to me either. Maybe Guido remembers. Guido? I'm not worried at all that someone might be calling it with a non-stringish argument -- it's supplying a "file- like object" interface, and .write() requires a stringish argument. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-24 01:19 Message: Logged In: YES user_id=18139 Submitted to idlefork. I'm too shy to bother "one of the major IDLE authors." It would be nice to have in 2.2.1, but I know the folks at PythonLabs are busy... ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-23 23:42 Message: Logged In: YES user_id=21627 Ok, committed as OutputWindow 1.6. I strongly recommend submitting this to idlefork as well. If you want this patch to appear in Python 2.2.1, you should get a comment from one of the major IDLE authors or contributors. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 00:27 Message: Logged In: YES user_id=18139 > Isn't this too simplistic? I guess there was a reason for > the str call: could it ever happen that somebody passes > something else (beyond byte and Unicode strings)? I searched for write() in the idle directory and got 48 hits in 7 files. Then I checked them all. 
In every case, either write() is called with a string, or the argument is passed unchanged from another function that contains the word "write()". As for code outside IDLE, I'd be extra surprised if anyone calls obj.write(x) with x being something other than a string. Ordinary file objects don't accept it. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-20 23:56 Message: Logged In: YES user_id=21627 Isn't this too simplistic? I guess there was a reason for the str call: could it ever happen that somebody passes something else (beyond byte and Unicode strings)? Also, I wonder whether IDLE patches need to go to idlefork (sf.net/projects/idlefork) first. Apart from these comments, I think your patch is quite right. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520483&group_id=5470 From noreply@sourceforge.net Fri Mar 1 08:32:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 00:32:35 -0800 Subject: [Patches] [ python-Patches-520062 ] Support IPv6 with VC.NET Message-ID: Patches item #520062, was opened at 2002-02-19 18:57 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520062&group_id=5470 Category: Windows Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: Martin v. Löwis (loewis) Summary: Support IPv6 with VC.NET Initial Comment: This patch enables IPv6 support based on Winsock2 on Microsoft C 13 and later. Due to the implementation strategy used in the SDK headers, the resulting _socket.pyd will not require additional shared libraries, but it will instead locate the symbols dynamically, and fall back to a default implementation if none are found. ---------------------------------------------------------------------- >Comment By: Martin v. 
Löwis (loewis) Date: 2002-03-01 09:32 Message: Logged In: YES user_id=21627 Committed as socketmodule.c 1.209; socketmodule.h 1.5; PC/pyconfig.h .7 (after changing the comment). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-01 00:39 Message: Logged In: YES user_id=31435 Back to Martin. No problems compiling or running on my Win98SE + VC6 box, incl. test_socketserver.py. The only thing I object to is the "//" comment. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-01 00:20 Message: Logged In: YES user_id=31435 The "//" style comment in pyconfig.h should change to /**/ style (I don't care that MSVC accepts either -- not everyone looking at this file uses MSVC). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-01 00:17 Message: Logged In: YES user_id=31435 Since Martin submitted the patch, I think we can assume he already agrees with the basic premise. Reassigned to me. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 00:08 Message: Logged In: YES user_id=6380 We'll have a hard time testing this, since I don't think anyone I know with a Windows build environment is set up for IPv6 yet. I'm assigning to Martin since he's the IPv6 master, to see if he agrees with the basic premises (and that it doesn't break anything on Unix -- it's a pretty small patch so that seems unlikely). Then Martin should probably assign it to Tim, so Tim can see if at least it doesn't break anything on various flavors of Windows we have lying around. Then it can be alpha and beta tested to see if it doesn't break anything else, and the original author can test if the installer we distribute actually does the right thing for him. 
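[Editor's note: whether a given interpreter build actually ended up with IPv6 support can be checked at runtime. A hedged sketch using today's socket module, with the IPv6 loopback address as a network-free example:]

```python
import socket

# Build-time IPv6 support is exposed as a simple flag.
assert isinstance(socket.has_ipv6, bool)

if socket.has_ipv6:
    # AF_INET6 addresses resolve through the same getaddrinfo() API;
    # "::1" is numeric, so no DNS lookup is involved.
    infos = socket.getaddrinfo("::1", 80, socket.AF_INET6,
                               socket.SOCK_STREAM)
    family, _, _, _, sockaddr = infos[0]
    assert family == socket.AF_INET6
    assert sockaddr[0] == "::1"
```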
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520062&group_id=5470 From noreply@sourceforge.net Fri Mar 1 10:30:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 02:30:13 -0800 Subject: [Patches] [ python-Patches-520694 ] arraymodule.c improvements Message-ID: Patches item #520694, was opened at 2002-02-20 23:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 Category: None Group: None Status: Open Resolution: Accepted Priority: 3 Submitted By: Jason Orendorff (jorend) Assigned to: Martin v. Löwis (loewis) Summary: arraymodule.c improvements Initial Comment: This patch brings the array module a little more up-to-date. There are two changes: 1. Modernize the array type, memory management, and so forth. As a result, the array() builtin is no longer a function but a type. array.array is array.ArrayType. Also, it can now be subclassed in Python. 2. Add a new typecode 'u', for Unicode characters. The patch includes changes to test/test_array.py to test the new features. I would like to make a further change: add an arrayobject.h include file, and provide some array operations there, giving them names like PyArray_Check(), PyArray_GetItem(), and PyArray_GET_DATA(). Is such a change likely to find favor? ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-01 11:30 Message: Logged In: YES user_id=21627 Thanks again for the patches; committed as libarray.tex 1.32 test_array.py 1.14 NEWS 1.358 arraymodule.c 2.67 I added Py_USING_UNICODE before checking this in. There is one open issue: printing Unicode arrays on the interpreter prompt will still repr arrays as lists of Unicode objects; this is because arrays implement tp_print? Is that necessary?
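For reference, the 'u'-array features committed here behave roughly as sketched below, written in modern Python syntax (where bare string literals are Unicode; note the 'u' typecode is deprecated in recent releases). tounicode() and fromunicode() are the accessors this patch added:

```python
from array import array

# A 'u' array is an array of Unicode characters (Py_UNICODE in the
# 2.x sources), mutable in place like a 'c' or 'b' array.
a = array('u', 'xyz')
assert len(a) == 3 and a[0] == 'x'
a[0] = 'X'              # in-place character mutation

s = a.tounicode()       # get the Unicode string back
print(s)                # Xyz

b = array('u')
b.fromunicode('abc')    # the inverse operation
print(repr(b))          # the improved repr: array('u', 'abc')
```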
My proposal: just remove the tp_print implementation. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 08:25 Message: Logged In: YES user_id=18139 Documentation patch. Please check my TEX; I'm not used to it yet, and I can't get the Python docs to build on my Windows box, probably because one of the tools isn't installed properly, or something. So there's no way for me to check that it's correct, yet. (...If you let this sit for a moment I'll eventually check this for myself on the Linux box, but it'll be a little while. Thanks.) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 08:21 Message: Logged In: YES user_id=18139 Guido: In hindsight, yes it would have been much easier. ...This version adds __iadd__ and __imul__. There's also a separate documentation patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:46 Message: Logged In: YES user_id=6380 Cool. I wonder if it wouldn't have been easier to first submit and commit the easy changes, and then the unicode addition separately? Anyway, I presume that Martin will commit this when it's ready. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-27 04:15 Message: Logged In: YES user_id=18139 Getting there. This version has tounicode() and fromunicode(), and a better repr() for type 'u' arrays. Also, array.typecode and array.itemsize are now listed under tp_getset; they're attribute descriptors and they show up in help(array). (Neat!) Next, documentation; then __iadd__ and __imul__. But not tonight. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-02-25 13:24 Message: Logged In: YES user_id=21627 Removal of __members__ is fine, then - but you do need to fill out an appropriate tp_members instead, listing "typecode" and "itemsize". Adding __iadd__ and __imul__ is fine; the equivalent feature for lists has not caused complaints, either, and anybody using *= on an array probably would consider it a bug that it isn't in-place. Please add documentation changes as well; I currently have Doc/lib/libarray.tex \lineiii{'d'}{double}{8} +\lineiii{'u'}{Py_UNICODE}{2} \end{tableiii} Misc/NEWS - array.array is now a type object. A new format character 'u' indicates Py_UNICODE arrays. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-25 01:29 Message: Logged In: YES user_id=18139 Martin writes: "There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string." Boy, are you right. There should be array.tounicode() and array.fromunicode() methods that only work on type 'u' arrays. ...I also want to fix repr for type 'u' arrays. Instead of "array.array('u', [u'x', u'y', u'z'])" it should say "array.array('u', u'xyz')". ...I would also implement __iadd__ and __imul__ (as list implements them), but this would be a semantic change! Thoughts? Count on a new patch tomorrow. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-24 22:38 Message: Logged In: YES user_id=31435 Without looking at any details, __members__ and __methods__ are deprecated starting with 2.2; the type/class unification PEPs aim at moving the universe toward supporting and using the class-like introspection API instead. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-02-24 16:56 Message: Logged In: YES user_id=21627 There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string. You have to use u"".join(arr.tolist()) This is slightly annoying, since it is the only case where it is not possible to get back the original constructor arguments. Also, what is the rationale for removing __members__? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-22 14:39 Message: Logged In: YES user_id=38388 How about simplifying the whole setup altogether and add arrays as standard Python types (ie. put the code in Objects/ and add the new include file to Includes/). About the inter-module C API export: I'll write up a PEP about this which will hopefully result in a new standard support mechanism for this in Python. (BTW, the approach I used in _ssl/_socket does use PyCObjects) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-22 14:25 Message: Logged In: YES user_id=21627 With the rationale given, I'm now in favour of all parts of the patch. As for exposing the API, you need to address MAL's concerns: PyArray_* won't be available to other extension modules, instead, you need to expose them through a C object. However, I recommend *not* to follow the approach taken in socket/ssl; I agree with Tim's concerns here. Instead, the approach taken by cStringIO (via cStringIO.cStringIO_API) is much better (i.e. put the burden of using the API onto any importer, and out of Python proper). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-21 09:40 Message: Logged In: YES user_id=38388 About the Unicode bit: if "u" maps to Py_UNICODE I for one don't have any objections.
The internal encoding is available in lots of places, so that argument doesn't count and I'm sure it can be put to some good use for fast manipulation of large Unicode strings. I very much like the new exposure of the type at C level; however I don't understand how you would use it without adding the complete module to the libpythonx.x.a (unless you add some sort of inter-module C API import mechanism like the one I added to _socket and _ssl) ?! ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 03:03 Message: Logged In: YES user_id=18139

> What is the rationale for expanding PyObject_VAR_HEAD?
> It doesn't seem to achieve anything.

It didn't make sense for array to be a VAR_HEAD type. VAR_HEAD types are variable-size: the last member defined in the struct for such a type is an array of length 1, and type->item_size is nonzero. See e.g. PyType_GenericAlloc(), and how it decides whether to call PyObject_INIT or PyObject_VAR_INIT: It checks type->item_size. The new arraymodule.c calls PyType_GenericAlloc; the old one didn't. So a change seemed warranted. Since Arraytype has item_size == 0, it seemed most consistent to make it a non-VAR type and initialize the ob_size field myself. I'm pretty sure I got the right interpretation of this; but if not, someone wiser in the ways of Python will speak up. :) (While I was looking at this, I noticed this: http://sourceforge.net/tracker/index.php?func=detail&aid=520768&group_id=5470&atid=305470) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 02:15 Message: Logged In: YES user_id=18139

> I don't like the Unicode part of it at all.

Well, I'm not attached to it. It's very easy to subtract it from the patch.

> What can you do with this feature?

The same sort of thing you might do with an array of type 'c'.
For example, change individual characters of a (Unicode) string and then run a (Unicode) re.match on it.

> It seems to unfairly prefer a specific Unicode encoding,
> without explaining what that encoding is, and without a
> clear use case why this encoding is desirable.

Well, why should array('h', '\x00\xff\xaa\xbb') be allowed? Why is that encoding preferable to any other particular encoding of short ints? Easy: it's the encoding of the C compiler where Python was built. For 'u' arrays, the encoding used is just the encoding that Python uses internally. However, it's not intended to be used in any situation where encode()/decode() would be appropriate. I never even thought about that possibility when I wrote it. The behavior of a 'u' array is intended to be more like this: Suppose A = array('u', ustr). Then:

    len(A) == len(ustr)
    A[0] == ustr[0]
    A[1] == ustr[1]
    ...

That is, a 'u' array is an array of Unicode characters. Encoding is not an issue, any more than with the built-in unicode type. (If ustr is a non-Unicode string, then the behavior is different -- more in line with what 'b', 'h', 'i', and the others do.) If your concern is that Python currently "hides" its internal encoding, and the 'u' array exposes this unnecessarily, then consider these two examples that don't involve arrays:

    >>> x = u'\U00012345'    # One Unicode codepoint...
    >>> len(x)
    2                        # hmm.
    >>> x[0]
    u'\ud808'                # aha. UTF-16.
    >>> x[1]
    u'\udf45'
    >>> str(buffer(u'abc'))  # Example two.
    'a\x00b\x00c\x00'

> It also seems to overlap with the Unicode object's
> .encode method, which is much more general.

Wow. Well, that wasn't my intent. It is intended, rather, to offer parity with 'c'. Java has byte[], short[], int[], long[], float[], double[], and char[]... Python doesn't currently have char[]. Shouldn't it? ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-02-21 00:02 Message: Logged In: YES user_id=21627 What is the rationale for expanding PyObject_VAR_HEAD? It doesn't seem to achieve anything. I don't like the Unicode part of it at all. What can you do with this feature? It seems to unfairly prefer a specific Unicode encoding, without explaining what that encoding is, and without a clear use case why this encoding is desirable. It also seems to overlap with the Unicode object's .encode method, which is much more general. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 From noreply@sourceforge.net Fri Mar 1 10:48:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 02:48:12 -0800 Subject: [Patches] [ python-Patches-523268 ] pwd.getpw* returns enhanced tuple. Message-ID: Patches item #523268, was opened at 2002-02-27 06:10 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523268&group_id=5470 Category: Modules Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Sean Reifschneider (jafo) Assigned to: Nobody/Anonymous (nobody) Summary: pwd.getpw* returns enhanced tuple. Initial Comment: This patch against the current CVS implements the enhanced tuple return types for pwd.getpw*(). This makes the return similar to time.localtime() and os.stat(). Includes changes to the documents as well. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-01 11:48 Message: Logged In: YES user_id=21627 Thanks, committed as libgrp.tex 1.16, grpmodule.c 2.17, pwdmodule.c 1.27, libpwd.tex 1.14, NEWS 1.359. ---------------------------------------------------------------------- Comment By: Quinn Dunkan (quinn_dunkan) Date: 2002-03-01 08:59 Message: Logged In: YES user_id=429749 Looks good to me.
I'll go zap mine now. ---------------------------------------------------------------------- Comment By: Sean Reifschneider (jafo) Date: 2002-02-28 10:20 Message: Logged In: YES user_id=81797 I've taken a look at Quinn's patch, and have created a new version which I believe is the combination of the two. It also includes doc strings for the structs themselves, documentation of the grp module, and removes a case where a failure can cause a memory leak. I'll ask Quinn to review this patch. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-27 23:21 Message: Logged In: YES user_id=21627 Please coordinate with Quinn Dunkan (patch #522027). It seems his patch fills out some character strings where you use NULL. Ideally, you'd both come up with a revised version of the patch, and withdraw the other one. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523268&group_id=5470 From noreply@sourceforge.net Fri Mar 1 11:34:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 03:34:23 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 15:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) Assigned to: Guido van Rossum (gvanrossum) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. Also, 'From' should match at the beginning of the line.
---------------------------------------------------------------------- >Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 12:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of week is fixed-format: "May 14", but "May  7" (note the extra space) as opposed to "May 7". ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year. """ While I don't consider Pine to be the ultimate mailreader, its heritage may warrant that the 'From ' lines it creates are considered 'standard'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected.
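To make the two points concrete — tolerating the trailing numeric timezone, and the implicit anchoring of re.match() — here is a sketch; the pattern below is illustrative, not the one mailbox.py actually uses:

```python
import re

# A "From " matcher that tolerates the POSIX numeric timezone Pine
# appends after the year.  re.match() only matches at the start of
# the string, so no leading ^ is needed -- Guido's point above.
fromline = re.compile(
    r"From \S+ +\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}"
    r"( [+-]\d{4})?$")      # optional timezone suffix

line = "From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200"
print(bool(fromline.match(line)))           # timezone tolerated
print(bool(fromline.match(" " + line)))     # not at start: no match
```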
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Fri Mar 1 13:18:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 05:18:23 -0800 Subject: [Patches] [ python-Patches-524008 ] pysvr portability bug on new POSIX hosts Message-ID: Patches item #524008, was opened at 2002-02-28 20:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524008&group_id=5470 Category: Demos and tools Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Paul Eggert (eggert) Assigned to: Nobody/Anonymous (nobody) Summary: pysvr portability bug on new POSIX hosts Initial Comment: The new POSIX standard is now official (IEEE Std 1003.1-2001), and it has removed support for the obsolescent syntax "tail +2l". You are now supposed to use "tail -n +2" instead. As a result of this change, the pysvr demo fails on my Solaris 8 host if I am using GNU textutils 2.0.21 and have defined _POSIX2_VERSION=200112 and POSIXLY_CORRECT=true in my environment. Here is a patch, relative to Python 2.2. 2002-02-28 Paul Eggert * Demo/pysvr/pysvr.c (ps): Don't use "tail +2l", as POSIX 1003.1-2001 no longer allows this. Use "sed 1d" instead, as it's more portable. =================================================================== RCS file: Demo/pysvr/pysvr.c,v retrieving revision 2.2 retrieving revision 2.2.0.1 diff -pu -r2.2 -r2.2.0.1 --- Demo/pysvr/pysvr.c 2001/11/28 20:27:42 2.2 +++ Demo/pysvr/pysvr.c 2002/02/28 19:02:54 2.2.0.1 @@ -365,6 +365,6 @@ ps(void) { char buffer[100]; PyOS_snprintf(buffer, sizeof(buffer), - "ps -l -p %d Comment By: Martin v. Löwis (loewis) Date: 2002-03-01 14:18 Message: Logged In: YES user_id=21627 Thanks for the patch. Fixed in pysvr.c 1.11.
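As an aside, the header-stripping that pysvr delegates to tail/sed is trivial at the Python level, which sidesteps the portability question entirely; drop_header below is just an illustrative helper, not code from the demo:

```python
def drop_header(text):
    # The Python equivalent of "sed 1d" / "tail -n +2": delete the
    # first line of the command output and keep the rest.
    return "\n".join(text.splitlines()[1:])

ps_output = "  PID TTY  TIME CMD\n  123 pts/0  0:00 python\n"
print(drop_header(ps_output))   # ->   123 pts/0  0:00 python
```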
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524008&group_id=5470 From noreply@sourceforge.net Fri Mar 1 13:46:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 05:46:02 -0800 Subject: [Patches] [ python-Patches-524327 ] imaplib.py and SSL Message-ID: Patches item #524327, was opened at 2002-03-01 14:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Tino Lange (tinolange) Assigned to: Nobody/Anonymous (nobody) Summary: imaplib.py and SSL Initial Comment: Hallo! Our company has decided to allow only SSL connections to the e-mailbox from outside. So I needed a SSL capable "imaplib.py" to run my mailwatcher-scripts from home. Thanks to the socket.ssl() in recent Pythons it was nearly no problem to derive an IMAP4_SSL-class from the existing IMAP4-class in Python's standard library. Maybe you want to look over the very small additions that were necessary to implement the IMAP-over-SSL- functionality and add it as a part of the next official "imaplib.py"? Here's the context diff from the most recent CVS version (1.43). It works fine for me this way and it's only a few straight-forward lines of code. Maybe I could contribute a bit to the Python project with this patch? 
Best regards Tino Lange ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 From noreply@sourceforge.net Fri Mar 1 14:05:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 06:05:30 -0800 Subject: [Patches] [ python-Patches-524327 ] imaplib.py and SSL Message-ID: Patches item #524327, was opened at 2002-03-01 13:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Tino Lange (tinolange) >Assigned to: Piers Lauder (pierslauder) Summary: imaplib.py and SSL Initial Comment: Hallo! Our company has decided to allow only SSL connections to the e-mailbox from outside. So I needed a SSL capable "imaplib.py" to run my mailwatcher-scripts from home. Thanks to the socket.ssl() in recent Pythons it was nearly no problem to derive an IMAP4_SSL-class from the existing IMAP4-class in Python's standard library. Maybe you want to look over the very small additions that were necessary to implement the IMAP-over-SSL- functionality and add it as a part of the next official "imaplib.py"? Here's the context diff from the most recent CVS version (1.43). It works fine for me this way and it's only a few straight-forward lines of code. Maybe I could contribute a bit to the Python project with this patch? 
Best regards Tino Lange ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 From noreply@sourceforge.net Fri Mar 1 14:34:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 06:34:28 -0800 Subject: [Patches] [ python-Patches-517256 ] poor performance in xmlrpc response Message-ID: Patches item #517256, was opened at 2002-02-14 00:48 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 Category: Library (Lib) >Group: Python 2.1.2 Status: Open >Resolution: Accepted Priority: 5 Submitted By: James Rucker (jamesrucker) Assigned to: Fredrik Lundh (effbot) Summary: poor performance in xmlrpc response Initial Comment: xmlrpclib.Transport.parse_response() (called from xmlrpclib.Transport.request()) is exhibiting poor performance - approx. 10x slower than expected. I investigated based on using a simple app that sent a msg to a server, where all the server did was return the message back to the caller. From profiling, it became clear that the return trip took 10x the time consumed by the client->server trip, and that the time was spent getting things across the wire. parse_response() reads from a file object created via socket.makefile(), and as a result exhibits performance that is about an order of magnitude worse than what it would be if socket.recv() were used on the socket. The patch provided uses socket.recv() when possible, to improve performance. The patch provided is against revision 1.15. Its use provides performance for the return trip that is more or less equivalent to that of the forward trip. ---------------------------------------------------------------------- >Comment By: Fredrik Lundh (effbot) Date: 2002-03-01 15:34 Message: Logged In: YES user_id=38376 looks fine to me.
I'll merge it with SLAB changes, and will check it into the 2.3 codebase asap. (we probably should try to figure out why makefile causes a 10x slowdown too -- xmlrpclib isn't exactly the only client library reading from a buffered socket) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 00:23 Message: Logged In: YES user_id=6380 Fredrik, does this look OK to you? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 From noreply@sourceforge.net Fri Mar 1 14:40:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 06:40:36 -0800 Subject: [Patches] [ python-Patches-514641 ] Negative ob_size of LongObjects Message-ID: Patches item #514641, was opened at 2002-02-07 22:26 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514641&group_id=5470 Category: Core (C code) >Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Naofumi Honda (naofumi-h) Assigned to: Nobody/Anonymous (nobody) Summary: Negative ob_size of LongObjects Initial Comment: I found the following bugs due to the negative ob_size of LongObjects representing the negative values. 1) The access of attribute "__dict__" causes panic. class A(long): pass x = A(-1) x.__dict__ ==> core dump! 2) pickle neglects the sign of LongObjects import pickle class A(long): pass x = A(-1) pickle.dumps(x) ==> a string containing 1L (not -1L) !!! The patch will resolve the above problems. Naofumi Honda ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-02-08 08:21 Message: Logged In: YES user_id=33168 There is a bug report for this item: #506679. 
https://sourceforge.net/tracker/index.php?func=detail&aid=506679&group_id=5470&atid=105470 ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514641&group_id=5470 From noreply@sourceforge.net Fri Mar 1 16:14:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 08:14:21 -0800 Subject: [Patches] [ python-Patches-517256 ] poor performance in xmlrpc response Message-ID: Patches item #517256, was opened at 2002-02-13 18:48 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 Category: Library (Lib) Group: Python 2.1.2 Status: Open Resolution: Accepted Priority: 5 Submitted By: James Rucker (jamesrucker) Assigned to: Fredrik Lundh (effbot) Summary: poor performance in xmlrpc response Initial Comment: xmlrpclib.Transport.parse_response() (called from xmlrpclib.Transport.request()) is exhibiting poor performance - approx. 10x slower than expected. I investigated based on using a simple app that sent a msg to a server, where all the server did was return the message back to the caller. From profiling, it became clear that the return trip took 10x the time consumed by the client->server trip, and that the time was spent getting things across the wire. parse_response() reads from a file object created via socket.makefile(), and as a result exhibits performance that is about an order of magnitude worse than what it would be if socket.recv() were used on the socket. The patch provided uses socket.recv() when possible, to improve performance. The patch provided is against revision 1.15. Its use provides performance for the return trip that is more or less equivalent to that of the forward trip.
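The difference at issue is reading through a socket.makefile() wrapper versus draining the socket with recv() directly. A minimal sketch of the recv() side (recv_all is a hypothetical helper, not the name used in the patch; a local socket pair stands in for the HTTP connection xmlrpclib reads its response from):

```python
import socket

def recv_all(sock, bufsize=8192):
    # Drain the socket with recv() until the peer signals EOF,
    # avoiding the per-call overhead of a makefile() file object.
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)

a, b = socket.socketpair()
a.sendall(b"<methodResponse>...</methodResponse>")
a.shutdown(socket.SHUT_WR)          # sender signals end-of-data
print(recv_all(b))
a.close()
b.close()
```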
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 11:14 Message: Logged In: YES user_id=6380 My guess makefile() isn't buffering properly. This has been a long-standing problem on Windows; I'm not sure if it's an issue on Unix. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-01 09:34 Message: Logged In: YES user_id=38376 looks fine to me. I'll merge it with SLAB changes, and will check it into the 2.3 codebase asap. (we probably should try to figure out why makefile causes a 10x slowdown too -- xmlrpclib isn't exactly the only client library reading from a buffered socket) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 18:23 Message: Logged In: YES user_id=6380 Fredrik, does this look OK to you? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 From noreply@sourceforge.net Fri Mar 1 21:31:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 13:31:06 -0800 Subject: [Patches] [ python-Patches-517245 ] fix for mpzmodule.c Message-ID: Patches item #517245, was opened at 2002-02-13 18:18 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517245&group_id=5470 Category: Modules Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Marc Recht (marc) >Assigned to: Guido van Rossum (gvanrossum) Summary: fix for mpzmodule.c Initial Comment: This a one line to get mpzmodule compiled with GMP version >= 2. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 16:31 Message: Logged In: YES user_id=6380 Thanks, Fixed. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517245&group_id=5470 From noreply@sourceforge.net Fri Mar 1 21:35:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 13:35:25 -0800 Subject: [Patches] [ python-Patches-523241 ] MimeWriter must use CRLF instead of LF Message-ID: Patches item #523241, was opened at 2002-02-26 21:33 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523241&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Clarence Gardner (cgardner) Assigned to: Nobody/Anonymous (nobody) Summary: MimeWriter must use CRLF instead of LF Initial Comment: In all of the output that MimeWriter does (headers and boundaries), a CRLF must be written rather than just LF. (CRLF at the end of the header, and at the beginning and end of the boundaries.) Here's hoping I'm doing this right :) ---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 16:35 Message: Logged In: YES user_id=12800 Guido is correct, and while I personally consider MIMEWriter obsolete , I have taken the same approach with the email package. IMO, both modules should read and write native line endings. It is the responsibility of smtplib (in the case of sending the msg over the wire) or the MTA's program/file filter (in the case of receiving the msg from the wire) to translate from RFC 2822 line endings to native line endings, and vice versa. I recommend this patch be rejected. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 18:03 Message: Logged In: YES user_id=6380 Thanks for bearing with us. SF may be the worst possible tool, but I don't know anything better. 
:-( Having seen the patch, I disagree with your intent. This issue has come up before. While the MIME standard stipulates that newlines are represented as CRLF on the wire, we're not writing files on the wire. We're using the local line ending convention consistently whenever we read or write email, and some other entity is responsible for translating these to the proper CRLF. Maybe you can come up with a fix to the documentation that explains this policy instead? ---------------------------------------------------------------------- Comment By: Clarence Gardner (cgardner) Date: 2002-02-28 17:41 Message: Logged In: YES user_id=409146 Actually, it's a SourceForge bug :( I did check the box and attach the file, but it gave me a "Bad Filename" error. I did it again, removing the quotes that my browser put around the filename, and it said "You already submitted this! Don't doubleclick!" So maybe not *everybody's* submissions that lack a file came from an idiot :) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 17:32 Message: Logged In: YES user_id=6380 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. 
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523241&group_id=5470 From noreply@sourceforge.net Fri Mar 1 21:42:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 13:42:16 -0800 Subject: [Patches] [ python-Patches-523241 ] MimeWriter must use CRLF instead of LF Message-ID: Patches item #523241, was opened at 2002-02-26 21:33 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523241&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Clarence Gardner (cgardner) >Assigned to: Barry Warsaw (bwarsaw) Summary: MimeWriter must use CRLF instead of LF Initial Comment: In all of the output that MimeWriter does (headers and boundaries), a CRLF must be written rather than just LF. (CRLF at the end of the header, and at the beginning and end of the boundaries.) Here's hoping I'm doing this right :) ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 16:35 Message: Logged In: YES user_id=12800 Guido is correct, and while I personally consider MIMEWriter obsolete , I have taken the same approach with the email package. IMO, both modules should read and write native line endings. It is the responsibility of smtplib (in the case of sending the msg over the wire) or the MTA's program/file filter (in the case of receiving the msg from the wire) to translate from RFC 2822 line endings to native line endings, and vice versa. I recommend this patch be rejected. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 18:03 Message: Logged In: YES user_id=6380 Thanks for bearing with us. SF may be the worst possible tool, but I don't know anything better. 
:-( Having seen the patch, I disagree with your intent. This issue has come up before. While the MIME standard stipulates that newlines are represented as CRLF on the wire, we're not writing files on the wire. We're using the local line ending convention consistently whenever we read or write email, and some other entity is responsible for translating these to the proper CRLF. Maybe you can come up with a fix to the documentation that explains this policy instead? ---------------------------------------------------------------------- Comment By: Clarence Gardner (cgardner) Date: 2002-02-28 17:41 Message: Logged In: YES user_id=409146 Actually, it's a SourceForge bug :( I did check the box and attach the file, but it gave me a "Bad Filename" error. I did it again, removing the quotes that my browser put around the filename, and it said "You already submitted this! Don't doubleclick!" So maybe not *everybody's* submissions that lack a file came from an idiot :) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 17:32 Message: Logged In: YES user_id=6380 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. 
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523241&group_id=5470 From noreply@sourceforge.net Fri Mar 1 21:42:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 13:42:23 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 09:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) Assigned to: Guido van Rossum (gvanrossum) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. Also, 'From' should match at the beginning of the line. ---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 16:42 Message: Logged In: YES user_id=12800 IMO, Jamie Zawinski (author of the original mail/news reader in Netscape among other accomplishments), wrote the definitive answer on From_ http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html As far as Python's support for this in the mailbox module, for backwards compatibility, the UnixMailbox class has a strict-ish interpretation of the From_ delimiter, which I think should not change. It also has a class called PortableUnixMailbox which recognizes delimiters as specified in JWZ's document. 
Personally, if I were trolling over a real world mbox file I'd only use PortableUnixMailbox (as long as non-delimiter From_ lines were properly escaped -- I have some code in Mailman which tries to intelligently "fix" non-escaped mbox files). I agree with the Rejected resolution. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 06:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of the month is fixed-format: "May 14", but "May  7" (note the extra space) as opposed to "May 7". ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year. """ While I don't consider Pine to be the ultimate mailreader, its heritage may warrant that the 'From ' lines it creates are considered 'standard'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 17:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected.
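The timezone question above comes down to one regular expression. A minimal sketch of a From_-line matcher that also accepts the optional numeric timezone Pine appends after the year; the pattern here is illustrative only, not the one mailbox.py actually uses:

```python
# Hypothetical From_-line pattern: ctime()-style date, optionally
# followed by a POSIX numeric timezone such as "+0200".
import re

fromline = re.compile(
    r"From \S+ +\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}"
    r"( [+-]\d{4})?\s*$"
)

# The line from the report, with and without the trailing timezone:
assert fromline.match("From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200\n")
assert fromline.match("From camield@sentia.nl Mon Apr 23 18:22:28 2001\n")
```

In the spirit of Guido's suggestion, a pattern like this could be supplied by overriding _isrealfromline in a UnixMailbox subclass rather than by changing the shipped regex.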
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:25:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:25:23 -0800 Subject: [Patches] [ python-Patches-514641 ] Negative ob_size of LongObjects Message-ID: Patches item #514641, was opened at 2002-02-07 22:26 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514641&group_id=5470 Category: Core (C code) Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Naofumi Honda (naofumi-h) >Assigned to: Guido van Rossum (gvanrossum) Summary: Negative ob_size of LongObjects Initial Comment: I found the following bugs due to the negative ob_size of LongObjects representing negative values. 1) Access of the attribute "__dict__" causes a panic:

class A(long): pass
x = A(-1)
x.__dict__ ==> core dump!

2) pickle neglects the sign of LongObjects:

import pickle
class A(long): pass
x = A(-1)
pickle.dumps(x) ==> a string containing 1L (not -1L) !!!

The patch will resolve the above problems. Naofumi Honda ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:25 Message: Logged In: YES user_id=6380 Thanks, good catch! I've applied roughly your patch, and added a test. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-02-08 08:21 Message: Logged In: YES user_id=33168 There is a bug report for this item: #506679.
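The two failures above boil down to a round-trip check. A minimal sketch, transposed to a modern Python where int plays the role of 2.2's long; this is an analogy to the fixed behavior, not the original reproducer:

```python
# Subclass the built-in integer type with a negative value, as in the
# report, and verify both symptoms are absent: __dict__ access works
# and pickling preserves the sign and the subclass.
import pickle

class A(int):
    pass

x = A(-1)
assert x.__dict__ == {}             # 1) accessing __dict__ must not crash
restored = pickle.loads(pickle.dumps(x))
assert restored == -1               # 2) the sign survives the round trip
assert isinstance(restored, A)      #    ...and so does the subclass
```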
https://sourceforge.net/tracker/index.php?func=detail&aid=506679&group_id=5470&atid=105470 ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514641&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:36:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:36:51 -0800 Subject: [Patches] [ python-Patches-515015 ] inspect.py raise exception if code not found Message-ID: Patches item #515015, was opened at 2002-02-08 17:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515015&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: inspect.py raise exception if code not found Initial Comment: There is a comment which says the suffixes should be sorted by length, but there is no comparison function. This patch adds a comparison (lambda). Also, there are two functions which are documented to raise IOError if there are problems, but if the function reaches the end, there were no raises. This patch adds raise IOErrors. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:36 Message: Logged In: YES user_id=6380 Neal, can you check this in and mark as bugfix? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-02-09 09:16 Message: Logged In: YES user_id=33168 Sorry, I saw the map/lambda above, but misread the code. Attached is a new file (just contains the 2 raises). I really need to add a test for this as well.
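The sorting question at issue here is a decorate-and-sort idiom: storing -len(suffix) as the first element of each tuple makes a plain ascending sort visit the longest suffix first, so no comparison function (or lambda) is needed. A minimal sketch with an invented suffix list:

```python
# Decorate each suffix with its negated length; sorting the tuples
# ascending then puts the longest suffixes first, guaranteeing the
# longest match is tried first in case of overlap.
suffixes = [".py", ".pyw", ".pyc", ".so"]
decorated = sorted((-len(suf), suf) for suf in suffixes)
longest_first = [suf for _, suf in decorated]
```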
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-08 18:10 Message: Logged In: YES user_id=31435 Please remove the lambda trick from the patch. The comment is explaining why the negation of the length is the first element of the tuples being sorted (that's what guarantees the longest suffix is checked first in case of overlap). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515015&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:40:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:40:02 -0800 Subject: [Patches] [ python-Patches-515003 ] Added HTTP{,S}ProxyConnection Message-ID: Patches item #515003, was opened at 2002-02-08 16:39 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Nobody/Anonymous (nobody) Summary: Added HTTP{,S}ProxyConnection Initial Comment: This patch adds HTTP*Connection classes for proxy connections. Authenticated proxies are also supported. One can argue urllib2 already implements this. It does not do HTTPS tunneling through proxies, and this is intended to be lower-level than urllib2. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:40 Message: Logged In: YES user_id=6380 This patch fails to seduce me. There's no explanation why this would be useful, or how it should be used, and no documentation, and a hint that urllib2 already does this. Maybe you can get someone who's known on python-dev to champion it, if you think it's useful? 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:42:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:42:02 -0800 Subject: [Patches] [ python-Patches-514997 ] remove extra SET_LINENOs Message-ID: Patches item #514997, was opened at 2002-02-08 16:22 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 Category: Parser/Compiler Group: None Status: Open Resolution: None >Priority: 3 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: remove extra SET_LINENOs Initial Comment: This patch removes consecutive SET_LINENOs. The patch fixes test_hotspot, but does not fix a failure in inspect. I wasn't sure what the problem was or why SET_LINENO would matter for inspect. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:42 Message: Logged In: YES user_id=6380 Can you find someone interested in answering the inspect question? Otherwise this patch is stalled...
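The peephole idea behind the patch can be sketched in a language-neutral way: when two line-number markers appear back to back, only the later one can matter, so the earlier one may be dropped. The instruction tuples below are invented for illustration; the real patch operates on CPython's SET_LINENO bytecodes in the compiler:

```python
# Collapse runs of consecutive line-number markers, keeping only the
# last marker of each run (the one that actually takes effect).
def dedup_linenos(instrs):
    out = []
    for op, arg in instrs:
        if op == "SET_LINENO" and out and out[-1][0] == "SET_LINENO":
            out[-1] = (op, arg)   # the later marker supersedes the earlier one
        else:
            out.append((op, arg))
    return out
```

Dropping markers this way changes the line-number events a tracer or introspection tool observes, which may be related to the inspect failure mentioned above.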
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:43:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:43:47 -0800 Subject: [Patches] [ python-Patches-514662 ] On the update_slot() behavior Message-ID: Patches item #514662, was opened at 2002-02-07 23:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Naofumi Honda (naofumi-h) >Assigned to: Guido van Rossum (gvanrossum) Summary: On the update_slot() behavior Initial Comment: Inherited method __getitem__ of list type in the new subclass is unexpectedly slow. For example,

x = list([1,2,3])
r = xrange(1, 1000000)
for i in r: x[1] = 2
==> execution time: real 0m2.390s

class nlist(list): pass
x = nlist([1,2,3])
r = xrange(1, 1000000)
for i in r: x[1] = 2
==> execution time: real 0m7.040s

about 3 times slower!!! The reason is: for the __getitem__ attribute, there are two slotdefs in typeobject.c (one for the mapping type, and the other for the sequence type). In the creation of new_type of list type, fixup_slot_dispatchers() and update_slot() functions in typeobject.c allocate the functions to both sq_item and mp_subscript slots (the mp_subscript slot had originally no function, because the list type is a sequence type), and it's an unexpected allocation for the mapping slot since the descriptor type of __getitem__ is now WrapperType for the sequence operations. If you trace x[1] using gdb, you will find that in PyObject_GetItem() m->mp_subscript = slot_mp_subscript is called instead of a sequence operation because the mp_subscript slot was allocated by fixup_slot_dispatchers(). In slot_mp_subscript(), call_method(self, "__getitem__", ...)
is invoked, and turns out to call a wrapper descriptor for sq_item. As a result, the method of the list type is finally called, but it needs many unexpected function calls. I will fix the behavior of fixup_slot_dispatchers() and update_slot() as follows. Only in the case where:
*) two or more slotdefs have the same attribute name, where at most one corresponding slot has a non-null pointer
*) the descriptor type of the attribute is WrapperType
will these functions allocate the single function to the appropriate slot. In the other cases, the behavior is not changed, to keep compatibility! (in particular, considering the case where user-overridden methods exist!) The following patch also includes speed-up routines to find the slotdef duplications, but it's not essential! ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:45:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:45:41 -0800 Subject: [Patches] [ python-Patches-514628 ] bug in pydoc on python 2.2 release Message-ID: Patches item #514628, was opened at 2002-02-07 21:09 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514628&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Raj Kunjithapadam (mmaster25) >Assigned to: Tim Peters (tim_one) Summary: bug in pydoc on python 2.2 release Initial Comment: pydoc has a bug when trying to generate HTML docs; more importantly, it has a bug in the method writedoc(). Attached is my fix.
Here is the diff between my fix and the regular dist:

1338c1338
< def writedoc(thing, forceload=0):
---
> def writedoc(key, forceload=0):
1340,1346c1340,1343
<     object = thing
<     if type(thing) is type(''):
<         try:
<             object = locate(thing, forceload)
<         except ErrorDuringImport, value:
<             print value
<             return
---
>     try:
>         object = locate(key, forceload)
>     except ErrorDuringImport, value:
>         print value
1351c1348
<         file = open(thing.__name__ + '.html', 'w')
---
>         file = open(key + '.html', 'w')
1354c1351
<         print 'wrote', thing.__name__ + '.html'
---
>         print 'wrote', key + '.html'
1356c1353
<         print 'no Python documentation found for %s' % repr(thing)
---
>         print 'no Python documentation found for %s' % repr(key)

---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:45 Message: Logged In: YES user_id=6380 assigned to Tim; this may be Ping's terrain but Ping is typically not responsive. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514628&group_id=5470 From noreply@sourceforge.net Fri Mar 1 22:58:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 14:58:11 -0800 Subject: [Patches] [ python-Patches-515003 ] Added HTTP{,S}ProxyConnection Message-ID: Patches item #515003, was opened at 2002-02-08 16:39 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Nobody/Anonymous (nobody) Summary: Added HTTP{,S}ProxyConnection Initial Comment: This patch adds HTTP*Connection classes for proxy connections. Authenticated proxies are also supported. One can argue urllib2 already implements this.
It does not do HTTPS tunneling through proxies, and this is intended to be lower-level than urllib2. ---------------------------------------------------------------------- >Comment By: Mihai Ibanescu (misa) Date: 2002-03-01 17:58 Message: Logged In: YES user_id=205865 I will add documentation and show the intended usage. urllib* doesn't deal with proxying over SSL (using CONNECT instead of GET/POST). urllib* also use the compatibility classes, HTTP/HTTPS, instead of HTTPConnection (this is not an argument by itself). Thanks for the suggestion. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:40 Message: Logged In: YES user_id=6380 This patch fails to seduce me. There's no explanation why this would be useful, or how it should be used, and no documentation, and a hint that urllib2 already does this. Maybe you can get someone who's known on python-dev to champion it, if you think it's useful? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 From noreply@sourceforge.net Fri Mar 1 23:00:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 15:00:39 -0800 Subject: [Patches] [ python-Patches-500002 ] Fix for #221791 (bad \x escape) Message-ID: Patches item #500002, was opened at 2002-01-05 19:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500002&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for #221791 (bad \x escape) Initial Comment: This patch adds file and line output if a bad \x escape was found in the source.
It does so with the following modifications:
- PyErr_Display now recognizes syntax errors not by their class, but by an attribute print_file_and_line
- this attribute is set for all SyntaxError instances
- PyErr_SyntaxLocation is enhanced to set all attributes expected for a syntax error, even if the current exception has a different class
- compile.c now invokes PyErr_SyntaxLocation for all non-syntax exceptions also, mostly through com_error
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 18:00 Message: Logged In: YES user_id=6380 If the pydebug problem can be fixed, I'd be all for implementing it, and adding to 2.2.1. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-01-30 10:01 Message: Logged In: YES user_id=6656 This doesn't compile --with-pydebug (he suddenly notices). There's an assert(val == NULL) in compile.c, but no variable val. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500002&group_id=5470 From noreply@sourceforge.net Fri Mar 1 23:12:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 01 Mar 2002 15:12:30 -0800 Subject: [Patches] [ python-Patches-515003 ] Added HTTP{,S}ProxyConnection Message-ID: Patches item #515003, was opened at 2002-02-08 16:39 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Mihai Ibanescu (misa) Assigned to: Nobody/Anonymous (nobody) Summary: Added HTTP{,S}ProxyConnection Initial Comment: This patch adds HTTP*Connection classes for proxy connections. Authenticated proxies are also supported. One can argue urllib2 already implements this.
It does not do HTTPS tunneling through proxies, and this is intended to be lower-level than urllib2. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 18:12 Message: Logged In: YES user_id=6380 OK, thanks; I'll wait! ---------------------------------------------------------------------- Comment By: Mihai Ibanescu (misa) Date: 2002-03-01 17:58 Message: Logged In: YES user_id=205865 I will add documentation and show the intended usage. urllib* doesn't deal with proxying over SSL (using CONNECT instead of GET/POST). urllib* also use the compatibility classes, HTTP/HTTPS, instead of HTTPConnection (this is not an argument by itself). Thanks for the suggestion. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:40 Message: Logged In: YES user_id=6380 This patch fails to seduce me. There's no explanation why this would be useful, or how it should be used, and no documentation, and a hint that urllib2 already does this. Maybe you can get someone who's known on python-dev to champion it, if you think it's useful? 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515003&group_id=5470 From noreply@sourceforge.net Sat Mar 2 14:34:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 06:34:26 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 15:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) Assigned to: Guido van Rossum (gvanrossum) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. Also, 'From' should match at the beginning of the line. ---------------------------------------------------------------------- >Comment By: Camiel Dobbelaar (camield) Date: 2002-03-02 15:34 Message: Logged In: YES user_id=466784 PortableUnixMailbox is not that useful, because it only matches '^From '. From-quoting is an even bigger mess than From-headerlines, so that does not really help. I submit a new diff that matches '\n\nFrom ' or 'From ', which makes PortableUnixMailbox useful for my purposes. It is not as intrusive as the comment in mailbox.py suggests.
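The boundary rule proposed in that new diff can be sketched directly: treat "From " as a message delimiter only at the very start of the mailbox or right after a blank line. This is illustrative only, not the submitted diff:

```python
# Match "From " at the start of the data or after a blank line, so
# ">From "-style quoted lines inside a message body are not treated
# as message boundaries.  The mbox content below is invented.
import re

delim = re.compile(r"(?:\A|\n\n)From ")

mbox = (
    "From a@example.org Mon Apr 23 18:22:28 2001\n"
    "Body line\n"
    "\n"
    "From b@example.org Tue Apr 24 09:00:00 2001\n"
    ">From a quoted line inside the body\n"
)
starts = [m.start() for m in delim.finditer(mbox)]
assert len(starts) == 2   # two real messages, the quoted line is skipped
```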
---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 22:42 Message: Logged In: YES user_id=12800 IMO, Jamie Zawinski (author of the original mail/news reader in Netscape among other accomplishments) wrote the definitive answer on From_ http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html As far as Python's support for this in the mailbox module, for backwards compatibility, the UnixMailbox class has a strict-ish interpretation of the From_ delimiter, which I think should not change. It also has a class called PortableUnixMailbox which recognizes delimiters as specified in JWZ's document. Personally, if I were trolling over a real world mbox file I'd only use PortableUnixMailbox (as long as non-delimiter From_ lines were properly escaped -- I have some code in Mailman which tries to intelligently "fix" non-escaped mbox files). I agree with the Rejected resolution. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 12:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of the month is fixed-format: "May 14", but "May  7" (note the extra space) as opposed to "May 7". ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year.
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Sat Mar 2 14:38:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 06:38:02 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 15:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) Assigned to: Guido van Rossum (gvanrossum) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. Also, 'From' should match at the beginning of the line. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-02 15:34 Message: Logged In: YES user_id=466784 PortableUnixMailbox is not that useful, because it only matches '^From '. From-quoting is an even bigger mess then From-headerlines, so that does not really help. I submit a new diff that matches '\n\nFrom ' or 'From ', which makes PortableUnixMailbox useful for my purposes. 
It is not that intrusive as the comment in the mailbox.py suggests. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 22:42 Message: Logged In: YES user_id=12800 IMO, Jamie Zawinski (author of the original mail/news reader in Netscape among other accomplishments), wrote the definitive answer on From_ http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html As far as Python's support for this in the mailbox module, for backwards compatibility, the UnixMailbox class has a strict-ish interpretation of the From_ delimiter, which I think should not change. It also has a class called PortableUnixMailbox which recognizes delimiters as specified in JWZ's document. Personally, if I was trolling over a real world mbox file I'd only use PortableUnixMailbox (as long as non-delimiter From_ lines were properly escaped -- I have some code in Mailman which tries to intelligently "fix" non-escaped mbox files). I agree with the Rejected resolution. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 12:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of week is fixed-format: "May 14", but "May 7" (note the extra space) as opposed to "May 7". ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year. 
""" While I don't consider Pine to be the ultimate mailreader, its heritage may warrant that the 'From ' lines it creates are considered 'standard'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Sat Mar 2 14:38:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 06:38:42 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 15:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) Assigned to: Guido van Rossum (gvanrossum) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. Also, 'From' should match at the beginning of the line. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-02 15:34 Message: Logged In: YES user_id=466784 PortableUnixMailbox is not that useful, because it only matches '^From '. 
From-quoting is an even bigger mess than From-headerlines, so that does not really help. I submit a new diff that matches '\n\nFrom ' or 'From ', which makes PortableUnixMailbox useful for my purposes. It is not as intrusive as the comment in mailbox.py suggests. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 22:42 Message: Logged In: YES user_id=12800 IMO, Jamie Zawinski (author of the original mail/news reader in Netscape among other accomplishments), wrote the definitive answer on From_ http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html As far as Python's support for this in the mailbox module, for backwards compatibility, the UnixMailbox class has a strict-ish interpretation of the From_ delimiter, which I think should not change. It also has a class called PortableUnixMailbox which recognizes delimiters as specified in JWZ's document. Personally, if I was trolling over a real world mbox file I'd only use PortableUnixMailbox (as long as non-delimiter From_ lines were properly escaped -- I have some code in Mailman which tries to intelligently "fix" non-escaped mbox files). I agree with the Rejected resolution. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 12:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of week is fixed-format: "May 14", but "May  7" (note the extra space) as opposed to "May 7". 
ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year. """ While I don't consider Pine to be the ultimate mailreader, its heritage may warrant that the 'From ' lines it creates are considered 'standard'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Sat Mar 2 16:47:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 08:47:33 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 09:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Open Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) >Assigned to: Barry Warsaw (bwarsaw) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. Also, 'From' should match at the beginning of the line. ---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-02 11:47 Message: Logged In: YES user_id=12800 Re-opening and assigning to myself. 
I'll take a look at your patches asap. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-02 09:34 Message: Logged In: YES user_id=466784 PortableUnixMailbox is not that useful, because it only matches '^From '. From-quoting is an even bigger mess than From-headerlines, so that does not really help. I submit a new diff that matches '\n\nFrom ' or 'From ', which makes PortableUnixMailbox useful for my purposes. It is not as intrusive as the comment in mailbox.py suggests. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 16:42 Message: Logged In: YES user_id=12800 IMO, Jamie Zawinski (author of the original mail/news reader in Netscape among other accomplishments), wrote the definitive answer on From_ http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html As far as Python's support for this in the mailbox module, for backwards compatibility, the UnixMailbox class has a strict-ish interpretation of the From_ delimiter, which I think should not change. It also has a class called PortableUnixMailbox which recognizes delimiters as specified in JWZ's document. Personally, if I was trolling over a real world mbox file I'd only use PortableUnixMailbox (as long as non-delimiter From_ lines were properly escaped -- I have some code in Mailman which tries to intelligently "fix" non-escaped mbox files). I agree with the Rejected resolution. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 06:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. 
In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of week is fixed-format: "May 14", but "May  7" (note the extra space) as opposed to "May 7". ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year. """ While I don't consider Pine to be the ultimate mailreader, its heritage may warrant that the 'From ' lines it creates are considered 'standard'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 17:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Sat Mar 2 20:24:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 12:24:57 -0800 Subject: [Patches] [ python-Patches-520694 ] arraymodule.c improvements Message-ID: Patches item #520694, was opened at 2002-02-20 22:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 Category: None Group: None Status: Open Resolution: Accepted Priority: 3 Submitted By: Jason Orendorff (jorend) Assigned to: Martin v. 
Löwis (loewis) Summary: arraymodule.c improvements Initial Comment: This patch brings the array module a little more up-to-date. There are two changes: 1. Modernize the array type, memory management, and so forth. As a result, the array() builtin is no longer a function but a type. array.array is array.ArrayType. Also, it can now be subclassed in Python. 2. Add a new typecode 'u', for Unicode characters. The patch includes changes to test/test_array.py to test the new features. I would like to make a further change: add an arrayobject.h include file, and provide some array operations there, giving them names like PyArray_Check(), PyArray_GetItem(), and PyArray_GET_DATA(). Is such a change likely to find favor? ---------------------------------------------------------------------- >Comment By: Jason Orendorff (jorend) Date: 2002-03-02 20:24 Message: Logged In: YES user_id=18139 Removing array's tp_print sounds good to me. (I did not notice this behavior because on Windows, type(sys.stdout) is not file so array_print wasn't being invoked.) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-01 10:30 Message: Logged In: YES user_id=21627 Thanks again for the patches; committed as libarray.tex 1.32 test_array.py 1.14 NEWS 1.358 arraymodule.c 2.67 I added Py_USING_UNICODE before checking this in. There is one open issue: printing Unicode arrays on the interpreter prompt will still repr arrays as lists of Unicode objects; this is because arrays implement tp_print? Is that necessary? My proposal: just remove the tp_print implementation. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 07:25 Message: Logged In: YES user_id=18139 Documentation patch. 
Please check my TeX; I'm not used to it yet, and I can't get the Python docs to build on my Windows box, probably because one of the tools isn't installed properly, or something. So there's no way for me to check that it's correct, yet. (...If you let this sit for a moment I'll eventually check this for myself on the Linux box, but it'll be a little while. Thanks.) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 07:21 Message: Logged In: YES user_id=18139 Guido: In hindsight, yes it would have been much easier. ...This version adds __iadd__ and __imul__. There's also a separate documentation patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 22:46 Message: Logged In: YES user_id=6380 Cool. I wonder if it wouldn't have been easier to first submit and commit the easy changes, and then the unicode addition separately? Anyway, I presume that Martin will commit this when it's ready. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-27 03:15 Message: Logged In: YES user_id=18139 Getting there. This version has tounicode() and fromunicode(), and a better repr() for type 'u' arrays. Also, array.typecode and array.itemsize are now listed under tp_getset; they're attribute descriptors and they show up in help(array). (Neat!) Next, documentation; then __iadd__ and __imul__. But not tonight. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-25 12:24 Message: Logged In: YES user_id=21627 Removal of __members__ is fine, then - but you do need to fill out an appropriate tp_members instead, listing "typecode" and "itemsize". 
Adding __iadd__ and __imul__ is fine; the equivalent feature for lists has not caused complaints, either, and anybody using *= on an array probably would consider it a bug that it isn't in-place. Please add documentation changes as well; I currently have Doc/lib/libarray.tex \lineiii{'d'}{double}{8} +\lineiii{'u'}{Py_UNICODE}{2} \end{tableiii} Misc/NEWS - array.array is now a type object. A new format character 'u' indicates Py_UNICODE arrays. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-25 00:29 Message: Logged In: YES user_id=18139 Martin writes: "There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string." Boy, are you right. There should be array.tounicode() and array.fromunicode() methods that only work on type 'u' arrays. ...I also want to fix repr for type 'u' arrays. Instead of "array.array('u', [u'x', u'y', u'z'])" it should say "array.array('u', u'xyz')". ...I would also implement __iadd__ and __imul__ (as list implements them), but this would be a semantic change! Thoughts? Count on a new patch tomorrow. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-24 21:38 Message: Logged In: YES user_id=31435 Without looking at any details, __members__ and __methods__ are deprecated starting with 2.2; the type/class unification PEPs aim at moving the universe toward supporting and using the class-like introspection API instead. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-24 15:56 Message: Logged In: YES user_id=21627 There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string. You have to use u"".join(arr.tolist()) This is slightly annoying, since it is the only case where it is not possible to get back the original constructor arguments. 
Also, what is the rationale for removing __members__? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-22 13:39 Message: Logged In: YES user_id=38388 How about simplifying the whole setup altogether and adding arrays as standard Python types (i.e. put the code in Objects/ and add the new include file to Includes/). About the inter-module C API export: I'll write up a PEP about this which will hopefully result in a new standard support mechanism for this in Python. (BTW, the approach I used in _ssl/_socket does use PyCObjects) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-22 13:25 Message: Logged In: YES user_id=21627 With the rationale given, I'm now in favour of all parts of the patch. As for exposing the API, you need to address MAL's concerns: PyArray_* won't be available to other extension modules; instead, you need to expose them through a C object. However, I recommend *not* to follow the approach taken in socket/ssl; I agree with Tim's concerns here. Instead, the approach taken by cStringIO (via cStringIO.cStringIO_API) is much better (i.e. put the burden of using the API onto any importer, and out of Python proper). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-21 08:40 Message: Logged In: YES user_id=38388 About the Unicode bit: if "u" maps to Py_UNICODE I for one don't have any objections. The internal encoding is available in lots of places, so that argument doesn't count and I'm sure it can be put to some good use for fast manipulation of large Unicode strings. I very much like the new exposure of the type at C level; however I don't understand how you would use it without adding the complete module to the libpythonx.x.a (unless you add some sort of inter-module C API import mechanism like the one I added to _socket and _ssl) ?! 
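[Editorial note: the in-place semantics Martin asks for above, that += and *= on an array mutate the object rather than create a new one, are what was eventually adopted. A quick sketch in modern Python, illustrative rather than the patch itself:]

```python
from array import array

a = array('i', [1, 2])
alias = a                  # a second reference to the same array object
a += array('i', [3])       # __iadd__: extends in place
a *= 2                     # __imul__: repeats in place

# No new object was created, so the alias observes both changes.
print(a is alias)          # True
print(alias.tolist())      # [1, 2, 3, 1, 2, 3]
```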
---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 02:03 Message: Logged In: YES user_id=18139 > What is the rationale for expanding PyObject_VAR_HEAD? > It doesn't seem to achieve anything. It didn't make sense for array to be a VAR_HEAD type. VAR_HEAD types are variable-size: the last member defined in the struct for such a type is an array of length 1, and type->item_size is nonzero. See e.g. PyType_GenericAlloc(), and how it decides whether to call PyObject_INIT or PyObject_VAR_INIT: It checks type->item_size. The new arraymodule.c calls PyType_GenericAlloc; the old one didn't. So a change seemed warranted. Since Arraytype has item_size == 0, it seemed most consistent to make it a non-VAR type and initialize the ob_size field myself. I'm pretty sure I got the right interpretation of this; but if not, someone wiser in the ways of Python will speak up. :) (While I was looking at this, I noticed this: http://sourceforge.net/tracker/index.php?func=detail&aid=520768&group_id=5470&atid=305470) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 01:15 Message: Logged In: YES user_id=18139 > I don't like the Unicode part of it at all. Well, I'm not attached to it. It's very easy to subtract it from the patch. > What can you do with this feature? The same sort of thing you might do with an array of type 'c'. For example, change individual characters of a (Unicode) string and then run a (Unicode) re.match on it. > It seems to unfairly prefer a specific Unicode encoding, > without explaining what that encoding is, and without a > clear use case why this encoding is desirable. Well, why should array('h', '\x00\xff\xaa\xbb') be allowed? Why is that encoding preferable to any other particular encoding of short ints? Easy: it's the encoding of the C compiler where Python was built. 
For 'u' arrays, the encoding used is just the encoding that Python uses internally. However, it's not intended to be used in any situation where encode()/decode() would be appropriate. I never even thought about that possibility when I wrote it. The behavior of a 'u' array is intended to be more like this: Suppose A = array('u', ustr). Then: len(A) == len(ustr) A[0] == ustr[0] A[1] == ustr[1] ... That is, a 'u' array is an array of Unicode characters. Encoding is not an issue, any more than with the built-in unicode type. (If ustr is a non-Unicode string, then the behavior is different -- more in line with what 'b', 'h', 'i', and the others do.) If your concern is that Python currently "hides" its internal encoding, and the 'u' array exposes this unnecessarily, then consider these two examples that don't involve arrays: >>> x = u'\U00012345' # One Unicode codepoint... >>> len(x) 2 # hmm. >>> x[0] u'\ud808' # aha. UTF-16. >>> x[1] u'\udf45' >>> str(buffer(u'abc')) # Example two. 'a\x00b\x00c\x00' > It also seems to overlap with the Unicode object's > .encode method, which is much more general. Wow. Well, that wasn't my intent. It is intended, rather, to offer parity with 'c'. Java has byte[], short[], int[], long[], float[], double[], and char[]... Python doesn't currently have char[]. Shouldn't it? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-20 23:02 Message: Logged In: YES user_id=21627 What is the rationale for expanding PyObject_VAR_HEAD? It doesn't seem to achieve anything. I don't like the Unicode part of it at all. What can you do with this feature? It seems to unfairly prefer a specific Unicode encoding, without explaining what that encoding is, and without a clear use case why this encoding is desirable. It also seems to overlap with the Unicode object's .encode method, which is much more general. 
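[Editorial note: Jason's description of a 'u' array as a mutable sequence of Unicode characters, with no encoding step involved, is how the feature ended up working. A sketch in modern Python; note that the 'u' typecode is deprecated in current releases, which add a wide 'w' typecode instead:]

```python
from array import array

a = array('u', 'xyz')      # one Unicode character per item
print(len(a))              # 3, same as len('xyz')
print(a[0])                # 'x', same as 'xyz'[0]
a[0] = 'X'                 # mutable, unlike a plain string
print(a.tounicode())       # 'Xyz'
```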
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 From noreply@sourceforge.net Sun Mar 3 03:19:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 19:19:45 -0800 Subject: [Patches] [ python-Patches-450267 ] OS/2+EMX port - changes to Python core Message-ID: Patches item #450267, was opened at 2001-08-12 21:34 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=450267&group_id=5470 Category: Core (C code) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Andrew I MacIntyre (aimacintyre) >Assigned to: Andrew I MacIntyre (aimacintyre) Summary: OS/2+EMX port - changes to Python core Initial Comment: The attached patch incorporates the changes to the source tree between Python 2.1.1 and the 010812 release of the OS/2+EMX port. It includes changes to files in Include/, Modules/, Objects/ and Python/. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 14:19 Message: Logged In: YES user_id=250749 All parts now committed. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-02-17 16:16 Message: Logged In: YES user_id=250749 Following discussion on python-dev, I have created patches for Objects/stringobject.c and Objects/unicodeobject.c that aim to rationalise the %#x/%#X format conversion mess. These two patches remove approaches specific to the various bugs and standard violations encountered with these format conversions, and take the approach of relying on the behaviour of the %x/%X format conversions and directly supplying Python's preferred prefix (0x/0X respectively). The patches presented are against CVS of 15Feb02 1430 AEST, and have been tested on both OS/2 and FreeBSD. 
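[Editorial note: the %#x/%#X rationalisation Andrew describes above, emitting Python's preferred prefix directly instead of trusting the platform printf, is visible in the formatting behaviour that survives today:]

```python
# CPython supplies the '0x'/'0X' prefix itself, so '#' (alternate form)
# formatting is identical on every platform regardless of the C runtime.
print('%#x' % 255)        # 0xff
print('%#X' % 255)        # 0XFF
print(format(255, '#x'))  # 0xff, the modern spelling
```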
If acceptable, I would prefer to apply my pre-existing patches for these files (the Objects patch below) before these new patches, as my earlier patches with their EMX specifics in OS/2 specific #ifdefs are "failsafe" as far as other platforms are concerned. Then if the new approach causes other platforms to fail, these patches can be backed out without breaking the OS/2 port. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-01-27 17:19 Message: Logged In: YES user_id=250749 I have split the original approach into patches for each of the Include, Modules, Objects and Python directories. Of particular note: - the patches to import.c are general to both VACPP and EMX ports, and have been trialled by Michael Muller with satisfactory results. - Modules/unicodedata.c has a name clash between its internally defined _getname() and an EMX routine of the same name defined in . Is the solution in the patch acceptable? - both Objects/stringobject.c and Objects/unicodeobject.c have changes to deal with EMX's runtime not producing a desired "0X" prefix in response to a "%X" format specifier (it produces "0x" instead). The patched source tree has been built and regression tested on both EMX and FreeBSD 4.4, with no unexpected results. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-10-02 08:59 Message: Logged In: YES user_id=21627 Please review this patch carefully again, asking in each case whether this chunk *really* belongs to this patch. Do so by asking "is it specific to the port of Python to os2emx?" There are some changes that are desirable, but are unrelated (like the whitespace changes in PyThread_down_sema). Please submit those in a separate patch. There are also changes that don't belong here at all, like the inclusion of a Modules/Setup. 
If you are revising this patch, you may also split it into a part that is absolutely necessary, and a part that is nice-to-have. E.g. the termios changes are probably system-specific, but I guess the port would work well without them. Without going in small steps, it seems that we won't move at all. You may consider making use of your checkin permissions for uncritical operations. If you need help in CVS operations, please let me know. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2001-08-13 23:21 Message: Logged In: YES user_id=250749 Thanks for the feedback. At this stage of the game, I'd prefer to work with a "supervisor" rather than take on CVS commit privs, though I realise that "supervisors" are a scarce resource. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-08-13 00:53 Message: Logged In: YES user_id=6380 Hi Andrew, Thanks for the patches. There's a lot of code (here and in the two previous patches). I'm going to see if we can give you CVS commit permission so you can apply the changes yourself. Note that commit permission (if you get it) doesn't mean that the patch is automatically approved -- I've seen some changes in your diffs that look questionable. You probably know which ones. :-) In general, the guidelines are that you can make changes freely (a) in code you own because it's in a file or directory that's specific to your port; (b) in code specific to your port that's inside #ifdefs for your port (this includes adding); (c) to fix an *obvious* small typo or buglet that bothers your compiler (example: if your compiler warns about an unused variable, feel free to delete it, as long as the unusedness isn't dependent on an #ifdef). For other changes we all appreciate it if you discuss them on python-dev or on the SF patch manager first. 
Oh, and if you ever check something in that breaks the build on another platform or causes the test suite to fail, people will demand very quick remedies or the checkin will be reversed. :-) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=450267&group_id=5470 From noreply@sourceforge.net Sun Mar 3 03:21:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 19:21:54 -0800 Subject: [Patches] [ python-Patches-514490 ] Better pager selection for OS/2 Message-ID: Patches item #514490, was opened at 2002-02-08 07:10 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514490&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Stefan Schwarzer (sschwarzer) Assigned to: Andrew I MacIntyre (aimacintyre) Summary: Better pager selection for OS/2 Initial Comment: With the current implementation (rev. 1.56) of pydoc.py the first call of the help command gives (when the pager environment variable is not set): Python 2.2 (#0, Dec 24 2001, 18:42:48) [EMX GCC 2.8.1] on os2emx Type "help", "copyright", "credits" or "license" for more information. >>> help(help) SYS0003: The system cannot find the path specified. Help on instance of _Helper: Type help() for interactive help, or help(object) for help about object. >>> After the error message one has to press Ctrl-C. Further invocations of help work, though. The attached patch selects 'more <' as the default pager when no PAGER env. variable is set, like on Windows. I use sys.platform.startswith to deal with a possible future port with sys.platform == 'os2vac'. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 14:21 Message: Logged In: YES user_id=250749 Committed. Thanks for the patch! 
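[Editorial note: the selection logic the patch describes can be sketched as follows. `default_pager_cmd` is a hypothetical helper for illustration, not pydoc's actual function, whose real getpager() is considerably more involved:]

```python
import os
import sys

def default_pager_cmd():
    """Hypothetical sketch of the pager choice described above."""
    pager = os.environ.get('PAGER')
    if pager:
        return pager                  # an explicit PAGER setting always wins
    if sys.platform.startswith(('os2', 'win')):
        return 'more <'               # shell-redirection form for OS/2/Windows
    return 'less'                     # common Unix fallback
```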
---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-02-13 23:40 Message: Logged In: YES user_id=250749 The patch looks Ok to me. I plan to apply it after I have all the EMX port patches into CVS. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514490&group_id=5470 From noreply@sourceforge.net Sun Mar 3 03:32:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 19:32:30 -0800 Subject: [Patches] [ python-Patches-523415 ] Explicit proxies for urllib.urlopen() Message-ID: Patches item #523415, was opened at 2002-02-28 01:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Andy Gimblett (gimbo) Assigned to: Nobody/Anonymous (nobody) Summary: Explicit proxies for urllib.urlopen() Initial Comment: This patch extends urllib.urlopen() so that proxies may be specified explicitly. This is achieved by adding an optional "proxies" parameter. If this parameter is omitted, urlopen() acts exactly as before, i.e. gets proxy settings from the environment. This is useful if you want to tell urlopen() not to use the proxy: just pass an empty dictionary. Also included is a patch to the urllib documentation explaining the new parameter. Apologies if patch format is not exactly as required: this is my first submission. All feedback appreciated. 
:-) ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 14:32 Message: Logged In: YES user_id=250749 Having just looked at this myself, I can understand where you're coming from, however my reading between the lines of the docs is that if you care about the proxies then you are supposed to use urllib.FancyURLopener (or urllib.URLopener) directly. If this is the intent, the docs could be a little clearer about this. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 From noreply@sourceforge.net Sun Mar 3 03:34:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 02 Mar 2002 19:34:49 -0800 Subject: [Patches] [ python-Patches-523415 ] Explicit proxies for urllib.urlopen() Message-ID: Patches item #523415, was opened at 2002-02-28 01:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Andy Gimblett (gimbo) Assigned to: Nobody/Anonymous (nobody) Summary: Explicit proxies for urllib.urlopen() Initial Comment: This patch extends urllib.urlopen() so that proxies may be specified explicitly. This is achieved by adding an optional "proxies" parameter. If this parameter is omitted, urlopen() acts exactly as before, i.e. gets proxy settings from the environment. This is useful if you want to tell urlopen() not to use the proxy: just pass an empty dictionary. Also included is a patch to the urllib documentation explaining the new parameter. Apologies if patch format is not exactly as required: this is my first submission. All feedback appreciated. 
:-) ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 14:34 Message: Logged In: YES user_id=250749 BTW, the patch guidelines indicate a strong preference for context diffs with unified diffs a poor second. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 14:32 Message: Logged In: YES user_id=250749 Having just looked at this myself, I can understand where you're coming from, however my reading between the lines of the docs is that if you care about the proxies then you are supposed to use urllib.FancyURLopener (or urllib.URLopener) directly. If this is the intent, the docs could be a little clearer about this. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 From noreply@sourceforge.net Sun Mar 3 11:58:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 03:58:48 -0800 Subject: [Patches] [ python-Patches-525109 ] Extension to Calltips / Show attributes Message-ID: Patches item #525109, was opened at 2002-03-03 11:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 Category: IDLE Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Martin Liebmann (mliebmann) Assigned to: Nobody/Anonymous (nobody) Summary: Extension to Calltips / Show attributes Initial Comment: The attached files (unified diff files) implement a (quick and dirty but useful) extension to IDLE 0.8 (Python 2.2) - Tested on WINDOWS 95/98/NT/2000 - Similar to "CallTips" this extension shows (context-sensitive) all available member functions and attributes of the current object after hitting the 'dot'-key. The toplevel help widget now supports scrolling. 
(Key-Up and Key-Down events) ...that is why I changed, among other things, the first argument of 'showtip' from a text string to a list of text strings ... The 'space'-key is used to insert the topmost item of the help widget into an IDLE text window. ...the event handling seems to be a critical part of the current IDLE implementation. That is why I added the new functionality as a patch of CallTips.py and CallTipWindow.py. Maybe you have a better implementation ... Greetings Martin Liebmann ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 From noreply@sourceforge.net Sun Mar 3 18:29:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 10:29:41 -0800 Subject: [Patches] [ python-Patches-500002 ] Fix for #221791 (bad \x escape) Message-ID: Patches item #500002, was opened at 2002-01-06 00:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500002&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) >Assigned to: Martin v. Löwis (loewis) Summary: Fix for #221791 (bad \x escape) Initial Comment: This patch adds file and line output if a bad \x escape was found in the source. It does so with the following modifications: - PyErr_Display now recognizes syntax errors not by their class, but by an attribute print_file_and_line - this attribute is set for all SyntaxError instances - PyErr_SyntaxLocation is enhanced to set all attributes expected for a syntax error, even if the current exception has a different class. - compile.c now invokes PyErr_SyntaxLocation for all non-syntax exceptions also, mostly through com_error.
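The user-visible effect of the change described above can be demonstrated from Python itself: a bad \x escape compiled from a string surfaces as a SyntaxError that carries the file and line. A small sketch; the filename '<demo>' is an illustrative choice, not from the patch:

```python
# A bad \x escape (\x needs two hex digits) raises a SyntaxError
# that carries the filename and line number passed to compile() --
# the behavior this patch arranges for such errors.
bad_source = "s = '\\x6'"   # source text containing the literal '\x6'

error = None
try:
    compile(bad_source, '<demo>', 'exec')
except SyntaxError as e:
    error = e
```

After the try block, error.filename and error.lineno identify where the bad escape was found.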
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 23:00 Message: Logged In: YES user_id=6380 If the pydebug problem can be fixed, I'd be all for implementing it, and adding to 2.2.1. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-01-30 15:01 Message: Logged In: YES user_id=6656 This doesn't compile --with-pydebug (he suddenly notices). There's an assert(val == NULL) in compile.c, but no variable val. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500002&group_id=5470 From noreply@sourceforge.net Sun Mar 3 19:50:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 11:50:37 -0800 Subject: [Patches] [ python-Patches-525211 ] Utils.py imported module not used Message-ID: Patches item #525211, was opened at 2002-03-03 12:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Evelyn Mitchell (efm) Assigned to: Nobody/Anonymous (nobody) Summary: Utils.py imported module not used Initial Comment: pychecker complains of Utils.py:1: Imported module (re) not used in email/Utils.py ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 From noreply@sourceforge.net Sun Mar 3 20:47:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 12:47:10 -0800 Subject: [Patches] [ python-Patches-525225 ] email Generator.py unused import Message-ID: Patches item #525225, was opened at 2002-03-03 13:47 You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525225&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Evelyn Mitchell (efm) Assigned to: Nobody/Anonymous (nobody) Summary: email Generator.py unused import Initial Comment: pychecker complains: Generator.py:15: Imported module (Message) not used Generator.py:16: Imported module (Errors) not used ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525225&group_id=5470 From noreply@sourceforge.net Sun Mar 3 21:33:59 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 13:33:59 -0800 Subject: [Patches] [ python-Patches-500002 ] Fix for #221791 (bad \x escape) Message-ID: Patches item #500002, was opened at 2002-01-06 01:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500002&group_id=5470 Category: Core (C code) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: Martin v. Löwis (loewis) Summary: Fix for #221791 (bad \x escape) Initial Comment: This patch adds file and line output if a bad \x escape was found in the source. It does so with the following modifications: - PyErr_Display now recognizes syntax errors not by their class, but by an attribute print_file_and_line - this attribute is set for all SyntaxError instances - PyErr_SyntaxLocation is enhanced to set all attributes expected for a syntax error, even if the current exception has a different class. - compile.c now invokes PyErr_SyntaxLocation for all non-syntax exceptions also, mostly through com_error. ---------------------------------------------------------------------- >Comment By: Martin v.
Löwis (loewis) Date: 2002-03-03 22:33 Message: Logged In: YES user_id=21627 Both asserts in this place were nonsensical leftovers from an earlier version, and are now removed. Committed as NEWS 1.360 1.337.2.4.2.2 compile.c 2.239 2.234.4.3 errors.c 2.67 2.66.10.1 exceptions.c 1.29 1.28.6.1 pythonrun.c 2.155 2.153.6.2 ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-02 00:00 Message: Logged In: YES user_id=6380 If the pydebug problem can be fixed, I'd be all for implementing it, and adding to 2.2.1. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-01-30 16:01 Message: Logged In: YES user_id=6656 This doesn't compile --with-pydebug (he suddenly notices). There's an assert(val == NULL) in compile.c, but no variable val. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500002&group_id=5470 From noreply@sourceforge.net Sun Mar 3 21:36:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 13:36:57 -0800 Subject: [Patches] [ python-Patches-525211 ] Utils.py imported module not used Message-ID: Patches item #525211, was opened at 2002-03-03 20:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Evelyn Mitchell (efm) Assigned to: Nobody/Anonymous (nobody) Summary: Utils.py imported module not used Initial Comment: pychecker complains of Utils.py:1: Imported module (re) not used in email/Utils.py ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-03 22:36 Message: Logged In: YES user_id=21627 It is used, in the line ecre = re.compile(r''' [...]
''', re.VERBOSE | re.IGNORECASE) pychecker bug? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 From noreply@sourceforge.net Sun Mar 3 21:39:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 13:39:27 -0800 Subject: [Patches] [ python-Patches-525225 ] email Generator.py unused import Message-ID: Patches item #525225, was opened at 2002-03-03 21:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525225&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Evelyn Mitchell (efm) >Assigned to: Barry Warsaw (bwarsaw) Summary: email Generator.py unused import Initial Comment: pychecker complains: Generator.py:15: Imported module (Message) not used Generator.py:16: Imported module (Errors) not used ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-03 22:39 Message: Logged In: YES user_id=21627 Barry, those are indeed unused. Ok to remove them?
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525225&group_id=5470 From noreply@sourceforge.net Sun Mar 3 22:02:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 14:02:09 -0800 Subject: [Patches] [ python-Patches-525109 ] Extension to Calltips / Show attributes Message-ID: Patches item #525109, was opened at 2002-03-03 11:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 Category: IDLE Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Martin Liebmann (mliebmann) Assigned to: Nobody/Anonymous (nobody) Summary: Extension to Calltips / Show attributes Initial Comment: The attached files (unified diff files) implement a (quick and dirty but useful) extension to IDLE 0.8 (Python 2.2) - Tested on WINDOWS 95/98/NT/2000 - Similar to "CallTips" this extension shows (context sensitive) all available member functions and attributes of the current object after hitting the 'dot'-key. The toplevel help widget now supports scrolling. (Key-Up and Key-Down events) ...that is why I changed, among other things, the first argument of 'showtip' from a text string to a list of text strings ... The 'space'-key is used to insert the topmost item of the help widget into an IDLE text window. ...the event handling seems to be a critical part of the current IDLE implementation. That is why I added the new functionality as a patch of CallTips.py and CallTipWindow.py. Maybe you have a better implementation ... Greetings Martin Liebmann ---------------------------------------------------------------------- >Comment By: Martin Liebmann (mliebmann) Date: 2002-03-03 22:02 Message: Logged In: YES user_id=475133 '' must be substituted by '.' within CallTip.py !
( Linux does not support an event named ) Running IDLE on Linux, I got a warning that 'import *' is not allowed within function '_dir_main' of CallTip.py ??? Nevertheless CallTips works fine on Linux ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 From noreply@sourceforge.net Sun Mar 3 22:04:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 14:04:56 -0800 Subject: [Patches] [ python-Patches-525211 ] Utils.py imported module not used Message-ID: Patches item #525211, was opened at 2002-03-03 12:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Evelyn Mitchell (efm) Assigned to: Nobody/Anonymous (nobody) Summary: Utils.py imported module not used Initial Comment: pychecker complains of Utils.py:1: Imported module (re) not used in email/Utils.py ---------------------------------------------------------------------- >Comment By: Evelyn Mitchell (efm) Date: 2002-03-03 15:04 Message: Logged In: YES user_id=13263 Yeah, it's probably a pychecker bug. I'll submit it there. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-03 14:36 Message: Logged In: YES user_id=21627 It is used, in the line ecre = re.compile(r''' [...] ''', re.VERBOSE | re.IGNORECASE) pychecker bug?
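A module-level pattern in the style of the ecre cited above shows why pychecker's complaint is a false positive: re *is* used, just only inside the compile call. The pattern body below is a simplified stand-in, not the actual pattern from email/Utils.py:

```python
import re

# Simplified stand-in for email/Utils.py's ecre: a verbose,
# case-insensitive pattern compiled once at module level.  The only
# reference to "re" in such a module is inside this call, which is
# apparently what confuses pychecker's unused-import check.
ecre = re.compile(r'''
    =\?                  # literal =?
    (?P<charset>[^?]*?)  # charset name, e.g. iso-8859-1
    \?                   # literal ?
''', re.VERBOSE | re.IGNORECASE)

match = ecre.search('=?ISO-8859-1?')
```

re.VERBOSE lets the pattern carry whitespace and comments; re.IGNORECASE makes the charset match case-insensitively.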
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 From noreply@sourceforge.net Sun Mar 3 22:46:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 14:46:11 -0800 Subject: [Patches] [ python-Patches-525225 ] email Generator.py unused import Message-ID: Patches item #525225, was opened at 2002-03-03 15:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525225&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Evelyn Mitchell (efm) Assigned to: Barry Warsaw (bwarsaw) Summary: email Generator.py unused import Initial Comment: pychecker complains: Generator.py:15: Imported module (Message) not used Generator.py:16: Imported module (Errors) not used ---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-03 17:46 Message: Logged In: YES user_id=12800 Accepted. I actually fixed this in email v1.1 (standalone), which has not yet been integrated into the Python trunk. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-03 16:39 Message: Logged In: YES user_id=21627 Barry, those are indeed unused. Ok to remove them?
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525225&group_id=5470 From noreply@sourceforge.net Mon Mar 4 05:47:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 03 Mar 2002 21:47:55 -0800 Subject: [Patches] [ python-Patches-524327 ] imaplib.py and SSL Message-ID: Patches item #524327, was opened at 2002-03-02 00:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Tino Lange (tinolange) Assigned to: Piers Lauder (pierslauder) Summary: imaplib.py and SSL Initial Comment: Hallo! Our company has decided to allow only SSL connections to the e-mailbox from outside. So I needed an SSL capable "imaplib.py" to run my mailwatcher-scripts from home. Thanks to socket.ssl() in recent Pythons it was nearly no problem to derive an IMAP4_SSL class from the existing IMAP4 class in Python's standard library. Maybe you want to look over the very small additions that were necessary to implement the IMAP-over-SSL functionality and add it as a part of the next official "imaplib.py"? Here's the context diff from the most recent CVS version (1.43). It works fine for me this way and it's only a few straightforward lines of code. Maybe I could contribute a bit to the Python project with this patch? Best regards Tino Lange ---------------------------------------------------------------------- >Comment By: Piers Lauder (pierslauder) Date: 2002-03-04 16:47 Message: Logged In: YES user_id=196212 This seems fine to me, but I can't test it as I don't have access to an SSL-enabled imapd. My only caveat is - do socket.ssl objects have a "sendall" method? - in which case that is what should be used in the send method.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 From noreply@sourceforge.net Mon Mar 4 09:01:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 01:01:54 -0800 Subject: [Patches] [ python-Patches-525211 ] Utils.py imported module not used Message-ID: Patches item #525211, was opened at 2002-03-03 20:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Evelyn Mitchell (efm) Assigned to: Nobody/Anonymous (nobody) Summary: Utils.py imported module not used Initial Comment: pychecker complains of Utils.py:1: Imported module (re) not used in email/Utils.py ---------------------------------------------------------------------- Comment By: Evelyn Mitchell (efm) Date: 2002-03-03 23:04 Message: Logged In: YES user_id=13263 Yeah, it's probably a pychecker bug. I'll submit it there. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-03 22:36 Message: Logged In: YES user_id=21627 It is used, in the line ecre = re.compile(r''' [...] ''', re.VERBOSE | re.IGNORECASE) pychecker bug?
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525211&group_id=5470 From noreply@sourceforge.net Mon Mar 4 09:33:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 01:33:17 -0800 Subject: [Patches] [ python-Patches-523415 ] Explicit proxies for urllib.urlopen() Message-ID: Patches item #523415, was opened at 2002-02-27 14:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Andy Gimblett (gimbo) Assigned to: Nobody/Anonymous (nobody) Summary: Explicit proxies for urllib.urlopen() Initial Comment: This patch extends urllib.urlopen() so that proxies may be specified explicitly. This is achieved by adding an optional "proxies" parameter. If this parameter is omitted, urlopen() acts exactly as before, i.e. it gets proxy settings from the environment. This is useful if you want to tell urlopen() not to use the proxy: just pass an empty dictionary. Also included is a patch to the urllib documentation explaining the new parameter. Apologies if the patch format is not exactly as required: this is my first submission. All feedback appreciated. :-) ---------------------------------------------------------------------- >Comment By: Andy Gimblett (gimbo) Date: 2002-03-04 09:33 Message: Logged In: YES user_id=262849 Thanks for feedback re: diffs. Have now found out about context diffs and attached new version - hope this is better. Regarding the patch itself, this arose out of a newbie question on c.l.py and I was reminded that this was an issue I'd come across in my early days too. Personally I'd never picked up the hint that you should use FancyURLopener directly. If preferred, I could have a go at patching the docs to make that clearer?
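For readers on current Pythons: the behavior the patch proposes (an explicit proxies mapping, with an empty dict meaning "no proxies") lives on in urllib.request, where ProxyHandler accepts the same kind of mapping. A sketch of the modern equivalent, not the patch's own code:

```python
import urllib.request

# An empty dict tells ProxyHandler to use no proxies at all,
# overriding any http_proxy/https_proxy environment variables --
# the same effect as the patch's urlopen(url, proxies={}).
no_proxy_opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({})
)
# no_proxy_opener.open(url) would then connect directly.
```

Omitting the ProxyHandler (or passing no dict) restores the old behavior of reading proxy settings from the environment.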
---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 03:34 Message: Logged In: YES user_id=250749 BTW, the patch guidelines indicate a strong preference for context diffs, with unified diffs a poor second. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 03:32 Message: Logged In: YES user_id=250749 Having just looked at this myself, I can understand where you're coming from; however, my reading between the lines of the docs is that if you care about the proxies then you are supposed to use urllib.FancyURLopener (or urllib.URLopener) directly. If this is the intent, the docs could be a little clearer about this. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 From noreply@sourceforge.net Mon Mar 4 09:41:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 01:41:31 -0800 Subject: [Patches] [ python-Patches-520694 ] arraymodule.c improvements Message-ID: Patches item #520694, was opened at 2002-02-20 23:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 Category: None Group: None >Status: Closed Resolution: Accepted Priority: 3 Submitted By: Jason Orendorff (jorend) Assigned to: Martin v. Löwis (loewis) Summary: arraymodule.c improvements Initial Comment: This patch brings the array module a little more up-to-date. There are two changes: 1. Modernize the array type, memory management, and so forth. As a result, the array() builtin is no longer a function but a type. array.array is array.ArrayType. Also, it can now be subclassed in Python. 2. Add a new typecode 'u', for Unicode characters. The patch includes changes to test/test_array.py to test the new features.
I would like to make a further change: add an arrayobject.h include file, and provide some array operations there, giving them names like PyArray_Check(), PyArray_GetItem(), and PyArray_GET_DATA(). Is such a change likely to find favor? ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-04 10:41 Message: Logged In: YES user_id=21627 Deleted tp_print, closing this patch. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-02 21:24 Message: Logged In: YES user_id=18139 Removing array's tp_print sounds good to me. (I did not notice this behavior because on Windows, type(sys.stdout) is not file, so array_print wasn't being invoked.) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-01 11:30 Message: Logged In: YES user_id=21627 Thanks again for the patches; committed as libarray.tex 1.32 test_array.py 1.14 NEWS 1.358 arraymodule.c 2.67 I added Py_USING_UNICODE before checking this in. There is one open issue: printing Unicode arrays at the interpreter prompt will still repr arrays as lists of Unicode objects; this is because arrays implement tp_print. Is that necessary? My proposal: just remove the tp_print implementation. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 08:25 Message: Logged In: YES user_id=18139 Documentation patch. Please check my TeX; I'm not used to it yet, and I can't get the Python docs to build on my Windows box, probably because one of the tools isn't installed properly, or something. So there's no way for me to check that it's correct, yet. (...If you let this sit for a moment I'll eventually check this for myself on the Linux box, but it'll be a little while. Thanks.)
---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-03-01 08:21 Message: Logged In: YES user_id=18139 Guido: In hindsight, yes it would have been much easier. ...This version adds __iadd__ and __imul__. There's also a separate documentation patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 23:46 Message: Logged In: YES user_id=6380 Cool. I wonder if it wouldn't have been easier to first submit and commit the easy changes, and then the unicode addition separately? Anyway, I presume that Martin will commit this when it's ready. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-27 04:15 Message: Logged In: YES user_id=18139 Getting there. This version has tounicode() and fromunicode(), and a better repr() for type 'u' arrays. Also, array.typecode and array.itemsize are now listed under tp_getset; they're attribute descriptors and they show up in help(array). (Neat!) Next, documentation; then __iadd__ and __imul__. But not tonight. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-25 13:24 Message: Logged In: YES user_id=21627 Removal of __members__ is fine, then - but you do need to fill out an appropriate tp_members instead, listing "typecode" and "itemsize". Adding __iadd__ and __imul__ is fine; the equivalent feature for lists has not caused complaints, either, and anybody using *= on an array probably would consider it a bug that it isn't in-place. Please add documentation changes as well; I currently have Doc/lib/libarray.tex \lineiii{'d'}{double}{8} +\lineiii{'u'}{Py_UNICODE}{2} \end{tableiii} Misc/NEWS - array.array is now a type object. A new format character 'u' indicates Py_UNICODE arrays.
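The in-place semantics discussed above are easy to check once __iadd__ and __imul__ exist; on a modern Python, where the patched behavior is the shipped behavior, it looks like this:

```python
from array import array

# With __iadd__/__imul__ implemented (as this patch adds), += and *=
# mutate the array in place, matching list semantics: the name still
# refers to the same object afterwards.
a = array('i', [1, 2])
alias = a            # second reference, to observe in-place mutation
a += array('i', [3]) # extends in place -> array('i', [1, 2, 3])
a *= 2               # repeats in place -> array('i', [1, 2, 3, 1, 2, 3])
```

Without the in-place slots, += and *= would rebind `a` to a fresh array and `alias` would keep the old contents, which is the semantic change Jason flags below.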
---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-25 01:29 Message: Logged In: YES user_id=18139 Martin writes: "There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string." Boy, are you right. There should be array.tounicode() and array.fromunicode() methods that only work on type 'u' arrays. ...I also want to fix repr for type 'u' arrays. Instead of "array.array('u', [u'x', u'y', u'z'])" it should say "array.array('u', u'xyz')". ...I would also implement __iadd__ and __imul__ (as list implements them), but this would be a semantic change! Thoughts? Count on a new patch tomorrow. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-24 22:38 Message: Logged In: YES user_id=31435 Without looking at any details, __members__ and __methods__ are deprecated starting with 2.2; the type/class unification PEPs aim at moving the universe toward supporting and using the class-like introspection API instead. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-24 16:56 Message: Logged In: YES user_id=21627 There is a flaw in the extension of arrays to Unicode: There is no easy way to get back the Unicode string. You have to use u"".join(arr.tolist()) This is slightly annoying, since it is the only case where it is not possible to get back the original constructor arguments. Also, what is the rationale for removing __members__? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-22 14:39 Message: Logged In: YES user_id=38388 How about simplifying the whole setup altogether and adding arrays as standard Python types (i.e. put the code in Objects/ and add the new include file to Includes/).
About the inter-module C API export: I'll write up a PEP about this which will hopefully result in a new standard support mechanism for this in Python. (BTW, the approach I used in _ssl/_socket does use PyCObjects) ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-22 14:25 Message: Logged In: YES user_id=21627 With the rationale given, I'm now in favour of all parts of the patch. As for exposing the API, you need to address MAL's concerns: PyArray_* won't be available to other extension modules; instead, you need to expose them through a C object. However, I recommend *not* to follow the approach taken in socket/ssl; I agree with Tim's concerns here. Instead, the approach taken by cStringIO (via cStringIO.cStringIO_API) is much better (i.e. put the burden of using the API onto any importer, and out of Python proper). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-02-21 09:40 Message: Logged In: YES user_id=38388 About the Unicode bit: if "u" maps to Py_UNICODE I for one don't have any objections. The internal encoding is available in lots of places, so that argument doesn't count and I'm sure it can be put to some good use for fast manipulation of large Unicode strings. I very much like the new exposure of the type at C level; however I don't understand how you would use it without adding the complete module to the libpythonx.x.a (unless you add some sort of inter-module C API import mechanism like the one I added to _socket and _ssl) ?! ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 03:03 Message: Logged In: YES user_id=18139 > What is the rationale for expanding PyObject_VAR_HEAD? > It doesn't seem to achieve anything. It didn't make sense for array to be a VAR_HEAD type.
VAR_HEAD types are variable-size: the last member defined in the struct for such a type is an array of length 1, and type->item_size is nonzero. See e.g. PyType_GenericAlloc(), and how it decides whether to call PyObject_INIT or PyObject_VAR_INIT: It checks type->item_size. The new arraymodule.c calls PyType_GenericAlloc; the old one didn't. So a change seemed warranted. Since Arraytype has item_size == 0, it seemed most consistent to make it a non-VAR type and initialize the ob_size field myself. I'm pretty sure I got the right interpretation of this; but if not, someone wiser in the ways of Python will speak up. :) (While I was looking at this, I noticed this: http://sourceforge.net/tracker/index.php?func=detail&aid=520768&group_id=5470&atid=305470) ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-21 02:15 Message: Logged In: YES user_id=18139 > I don't like the Unicode part of it at all. Well, I'm not attached to it. It's very easy to subtract it from the patch. > What can you do with this feature? The same sort of thing you might do with an array of type 'c'. For example, change individual characters of a (Unicode) string and then run a (Unicode) re.match on it. > It seems to unfairly prefer a specific Unicode encoding, > without explaining what that encoding is, and without a > clear use case why this encoding is desirable. Well, why should array('h', '\x00\xff\xaa\xbb') be allowed? Why is that encoding preferable to any other particular encoding of short ints? Easy: it's the encoding of the C compiler where Python was built. For 'u' arrays, the encoding used is just the encoding that Python uses internally. However, it's not intended to be used in any situation where encode()/decode() would be appropriate. I never even thought about that possibility when I wrote it. The behavior of a 'u' array is intended to be more like this: Suppose A = array('u', ustr).
Then: len(A) == len(ustr) A[0] == ustr[0] A[1] == ustr[1] ... That is, a 'u' array is an array of Unicode characters. Encoding is not an issue, any more than with the built-in unicode type. (If ustr is a non-Unicode string, then the behavior is different -- more in line with what 'b', 'h', 'i', and the others do.) If your concern is that Python currently "hides" its internal encoding, and the 'u' array exposes this unnecessarily, then consider these two examples that don't involve arrays: >>> x = u'\U00012345' # One Unicode codepoint... >>> len(x) 2 # hmm. >>> x[0] u'\ud808' # aha. UTF-16. >>> x[1] u'\udf45' >>> str(buffer(u'abc')) # Example two. 'a\x00b\x00c\x00' > It also seems to overlap with the Unicode object's > .encode method, which is much more general. Wow. Well, that wasn't my intent. It is intended, rather, to offer parity with 'c'. Java has byte[], short[], int[], long[], float[], double[], and char[]... Python doesn't currently have char[]. Shouldn't it? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-21 00:02 Message: Logged In: YES user_id=21627 What is the rationale for expanding PyObject_VAR_HEAD? It doesn't seem to achieve anything. I don't like the Unicode part of it at all. What can you do with this feature? It seems to unfairly prefer a specific Unicode encoding, without explaining what that encoding is, and without a clear use case why this encoding is desirable. It also seems to overlap with the Unicode object's .encode method, which is much more general.
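The per-character behavior sketched above, plus the tounicode() round-trip added earlier in this patch, can be demonstrated directly. Note that on modern Python 3 the 'u' typecode still exists but is deprecated:

```python
from array import array

# A 'u' array is an array of Unicode characters: indexing yields
# single characters, items can be replaced in place, and tounicode()
# (the method this patch adds) recovers the string.
a = array('u', 'xyz')
assert a[1] == 'y'     # per-character indexing, as in the A[i] examples
a[0] = 'X'             # mutate one character in place
s = a.tounicode()      # round-trip back to a string: 'Xyz'
```

This is the "mutable string of characters" use case Jason describes: edit characters individually, then recover the string without the u"".join(arr.tolist()) workaround.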
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470 From noreply@sourceforge.net Mon Mar 4 10:55:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 02:55:18 -0800 Subject: [Patches] [ python-Patches-524327 ] imaplib.py and SSL Message-ID: Patches item #524327, was opened at 2002-03-01 14:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Tino Lange (tinolange) Assigned to: Piers Lauder (pierslauder) Summary: imaplib.py and SSL Initial Comment: Hallo! Our company has decided to allow only SSL connections to the e-mailbox from outside. So I needed a SSL capable "imaplib.py" to run my mailwatcher-scripts from home. Thanks to the socket.ssl() in recent Pythons it was nearly no problem to derive an IMAP4_SSL-class from the existing IMAP4-class in Python's standard library. Maybe you want to look over the very small additions that were necessary to implement the IMAP-over-SSL- functionality and add it as a part of the next official "imaplib.py"? Here's the context diff from the most recent CVS version (1.43). It works fine for me this way and it's only a few straight-forward lines of code. Maybe I could contribute a bit to the Python project with this patch? Best regards Tino Lange ---------------------------------------------------------------------- >Comment By: Tino Lange (tinolange) Date: 2002-03-04 11:55 Message: Logged In: YES user_id=212920 Hallo! socket.ssl() -Objects only have _two_ methods read() write() I don't know how they handle write() internally - whether they use a send() or a sendall() equivalent for the underlying socket call. I didn't look in the C sources for that. 
That's also why I had to code the readline() by hand in the while-loop, because socket.ssl() - Objects only have read(), no readline(). But the implementation works quite fine (by the way also under Windows after replacing the _socket.pyd with an SSL enabled one). Best regards Tino ---------------------------------------------------------------------- Comment By: Piers Lauder (pierslauder) Date: 2002-03-04 06:47 Message: Logged In: YES user_id=196212 This seems fine to me, but i can't test it as i don't have access to an ssl-enabled imapd. My only caveat is - do socket.ssl objects have a "sendall" method? - in which case that is what should be used in the send method. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 From noreply@sourceforge.net Mon Mar 4 14:50:34 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 06:50:34 -0800 Subject: [Patches] [ python-Patches-525532 ] Add support for POSIX semaphores Message-ID: Patches item #525532, was opened at 2002-03-04 14:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Add support for POSIX semaphores Initial Comment: thread_pthread.h can be modified to use POSIX semaphores if available. This is more efficient than emulating them with mutexes and condition variables, and at least one platform that supports POSIX semaphores has a race condition in its condition variable support. The new file would still be supporting POSIX threads, although from both and , so perhaps ought to be renamed if this patch is accepted. 
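The lock semantics that thread_pthread.h emulates with a mutex plus condition variable map directly onto a counting semaphore initialized to 1, which is what the patch exploits via sem_wait()/sem_post(). The same semantics can be sketched at the Python level with threading.Semaphore (illustrative only — the patch itself is C code inside thread_pthread.h):

```python
import threading

# A binary semaphore (initial count 1) gives exactly the behavior
# PyThread_acquire_lock/PyThread_release_lock need: a blocking
# acquire succeeds, a second non-blocking attempt fails while the
# lock is held, and a release makes it available again.
lock = threading.Semaphore(1)

assert lock.acquire() is True                  # like sem_wait(): 1 -> 0
assert lock.acquire(blocking=False) is False   # like sem_trywait(): fails at 0
lock.release()                                 # like sem_post(): 0 -> 1
assert lock.acquire(blocking=False) is True
lock.release()
```

This is why the semaphore version is more efficient: each acquire/release is a single semaphore operation instead of a mutex lock, flag update, and condition signal.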
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 From outros@kyky.zzn.com Mon Mar 4 16:22:51 2002 From: outros@kyky.zzn.com (Bordeaux Buffet) Date: Mon, 4 Mar 2002 13:22:51 -0300 Subject: [Patches] Não Compre... Alugue! Message-ID: Don't Buy... Rent!

:: Bordeaux Buffet ::
Party equipment rentals.

Rent all the equipment for your event!

Chairs - Tables - Tablecloths - Glasses - Cutlery - Plates - Serving dishes - Chafing dishes - Samovars - Food warmers and much more!!!

More than 1,000 items to choose from. We deliver throughout the country!

We supply ice in cubes, blocks, and crushed.

www.bordeaux.com.br

To remove your name from our e-mail list, reply to this e-mail with the Subject = Remover
This message is sent in accordance with the new legislation on electronic mail, Section 301, Paragraph (a) (2) (c) Decree S. 1618, Title Three, approved by the "105th Congress Base of International Norms on SPAM". This e-mail cannot be considered SPAM when it includes a way to be removed.
From noreply@sourceforge.net Mon Mar 4 22:55:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 14:55:28 -0800 Subject: [Patches] [ python-Patches-524327 ] imaplib.py and SSL Message-ID: Patches item #524327, was opened at 2002-03-02 00:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Tino Lange (tinolange) Assigned to: Piers Lauder (pierslauder) Summary: imaplib.py and SSL Initial Comment: Hallo! Our company has decided to allow only SSL connections to the e-mailbox from outside. So I needed a SSL capable "imaplib.py" to run my mailwatcher-scripts from home. Thanks to the socket.ssl() in recent Pythons it was nearly no problem to derive an IMAP4_SSL-class from the existing IMAP4-class in Python's standard library. Maybe you want to look over the very small additions that were necessary to implement the IMAP-over-SSL- functionality and add it as a part of the next official "imaplib.py"? Here's the context diff from the most recent CVS version (1.43). It works fine for me this way and it's only a few straight-forward lines of code. Maybe I could contribute a bit to the Python project with this patch? Best regards Tino Lange ---------------------------------------------------------------------- >Comment By: Piers Lauder (pierslauder) Date: 2002-03-05 09:55 Message: Logged In: YES user_id=196212 Ok, (the boring bit :-) please provide a matching patch for the documentation (in dist/src/Doc/lib/libimaplib.tex), and I'll install both patches. Thanks Tino! ---------------------------------------------------------------------- Comment By: Tino Lange (tinolange) Date: 2002-03-04 21:55 Message: Logged In: YES user_id=212920 Hallo! 
socket.ssl() -Objects only have _two_ methods read() write() I don't know how they handle write() internally - whether they use a send() or a sendall() equivalent for the underlying socket call. I didn't look in the C sources for that. That's also why I had to code the readline() by hand in the while-loop, because socket.ssl() - Objects only have read(), no readline(). But the implementation works quite fine (by the way also under Windows after replacing the _socket.pyd with an SSL enabled one). Best regards Tino ---------------------------------------------------------------------- Comment By: Piers Lauder (pierslauder) Date: 2002-03-04 16:47 Message: Logged In: YES user_id=196212 This seems fine to me, but i can't test it as i don't have access to an ssl-enabled imapd. My only caveat is - do socket.ssl objects have a "sendall" method? - in which case that is what should be used in the send method. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 From noreply@sourceforge.net Tue Mar 5 02:59:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 04 Mar 2002 18:59:17 -0800 Subject: [Patches] [ python-Patches-525763 ] minor fix for regen on IRIX Message-ID: Patches item #525763, was opened at 2002-03-04 18:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525763&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Michael Pruett (mpruett) Assigned to: Nobody/Anonymous (nobody) Summary: minor fix for regen on IRIX Initial Comment: The Lib/plat-irix6/regen script does not catch IRIX 6 (only IRIX 4 and 5), and it doesn't handle systems which report themselves as running 'IRIX64' rather than just 'IRIX'. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525763&group_id=5470 From noreply@sourceforge.net Tue Mar 5 08:58:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 00:58:12 -0800 Subject: [Patches] [ python-Patches-525870 ] urllib2: duplicate call, stat attrs Message-ID: Patches item #525870, was opened at 2002-03-05 09:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525870&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: urllib2: duplicate call, stat attrs Initial Comment: This patch removes a duplicate call to os.stat in urllib2.FileHandler.open_local_file(). In addition to that, it uses the new stat attributes, so importing stat is no longer necessary. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525870&group_id=5470 From noreply@sourceforge.net Tue Mar 5 13:45:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 05:45:16 -0800 Subject: [Patches] [ python-Patches-462296 ] Add attributes to os.stat results Message-ID: Patches item #462296, was opened at 2001-09-17 17:57 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Nick Mathewson (nickm) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Add attributes to os.stat results Initial Comment: See bug #111481, and PEP 0042. Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples. 
With this patch, the os module will modify the aforementioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode". The patch also modifies test_os.py to test the new behavior. In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. (Thanks for PEP-025[23], Guido!) Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-05 13:45 Message: Logged In: YES user_id=6656 I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of fields of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in ones that don't exist locally with -1 or something? It's hard to pickle os.stat_results portably the way things are at the moment... ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-11-29 20:49 Message: Logged In: YES user_id=3066 This has been checked in, edited, and checked in again. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-18 22:53 Message: Logged In: YES user_id=499 Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 20:35 Message: Logged In: YES user_id=6380 Thanks, Nick! Good job. 
Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs. Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. :-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 19:24 Message: Logged In: YES user_id=6380 I'm looking at this now. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-03 13:55 Message: Logged In: YES user_id=6380 Patience, please. I'm behind reviewing this, probably won't have time today either. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2001-10-03 13:51 Message: Logged In: YES user_id=6656 If this goes in, I'd like to see it used for termios.tc {get,set}attr too. I could probably implement this (but not *right* now...). ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-02 01:56 Message: Logged In: YES user_id=499 The fifth all-C (!) version, with changes as suggested by Guido's comments via email. Big changes: This version no longer subclasses tuple. Instead, it creates a general-purpose mechanism for making struct/sequence hybrids in C. It now includes a patch for timemodule.c as well. Shortcomings: (1) As before, macmodule and riscosmodule aren't tested. (2) These new classes don't participate in GC and aren't subclassable. (Famous last words: "I don't think this will matter." :) ) (3) This isn't a brand-new metaclass; it's just a quick bit of C. As such, you can't use this mechanism to create new struct/tuple hybrids from Python. (I claim this isn't a drawback, since it's way easier to reimplement this in python than it is to make it accessible from python.) So, how's *this* one? 
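The tuple/attribute duality this patch introduced is still how os.stat() behaves: the result extends tuple, keeps the old 10-element sequence protocol, and exposes the same values as st_* attributes. A quick check (the temporary file is purely illustrative setup):

```python
import os
import tempfile

# Create a small file to stat.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    name = f.name

st = os.stat(name)
assert st[0] == st.st_mode        # index 0 and st_mode are synonymous
assert st[6] == st.st_size == 5   # likewise index 6 and st_size
assert isinstance(st, tuple)      # the result type still extends tuple
assert len(st) == 10              # the sequence part keeps the old 10 fields
os.unlink(name)
```

Extra platform-specific fields (st_blksize and friends, discussed below) are exposed only as attributes, which is how the 10-tuple unpacking idiom keeps working.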
---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-01 15:37 Message: Logged In: YES user_id=499 I've sent my email address to 'guido at python.org'. For reference, it's 'nickm at alum.mit.edu'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-01 14:09 Message: Logged In: YES user_id=6380 Nick, what's your real email? I have a bunch of feedback related to your use of the new type stuff -- this is uncharted territory for me too, and this SF box is too small to type comfortably. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-01 02:51 Message: Logged In: YES user_id=499 I think this might be the one... or at least, the next-to-last-one. This version of the patch: (1) moves the shared C code into a new module, "_stat", for internal use. (2) updates macmodule and riscosmodule to use the new code. (3) fixes a significant reference leak in previous versions. (4) is immune to the __new__ and __init__ bugs in previous versions. Things to note: (A) I've tried to make sure that my Mac/RISCOS code was correct, but I don't have any way to compile or test it. (B) I'm not sure my use of PyImport_ImportModule is legit. (C) I've allowed users to construct instances of stat_result with < or > 13 arguments. When this happens, attempts to get nonexistent attributes now raise AttributeError. (D) When dealing with Mac.xstat and RISCOS.stat, I chose to keep backward compatibility rather than enforcing the 10-tuple rule in the docs. Because there are new files, I can't make 'cvs diff' get everything. I'm uploading a zip file that contains _statmodule.c, _statmodule.h, and a unified diff. Please let me know if you'd prefer a different format. 
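The struct/sequence hybrid under discussion — something that "doubles as a tuple" while exposing named fields — can be approximated in pure Python by subclassing tuple, which is roughly what the earlier versions of the patch did in C. A hypothetical sketch of the idea (not Nick's actual implementation):

```python
class StatResult(tuple):
    """Hypothetical sketch: a tuple that also exposes read-only
    named attributes, the hybrid discussed in this thread."""
    _fields = ('st_mode', 'st_ino', 'st_dev', 'st_nlink', 'st_uid',
               'st_gid', 'st_size', 'st_atime', 'st_mtime', 'st_ctime')

    def __getattr__(self, name):
        # Only called for names not found normally; map field
        # names to tuple positions, anything else is an error.
        try:
            return self[self._fields.index(name)]
        except ValueError:
            raise AttributeError(name)

st = StatResult((0o100644, 1, 2, 1, 1000, 1000, 512, 0, 0, 0))
assert st[0] == st.st_mode == 0o100644  # tuple protocol and attributes agree
assert st.st_size == 512
assert isinstance(st, tuple)            # old unpacking code keeps working
```

The final patch abandoned the tuple subclass for a general C "struct sequence" mechanism, but the observable behavior is the same as this sketch.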
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 14:23 Message: Logged In: YES user_id=6380 Another comment: we should move this to its own file so that other os.stat() implementations (esp. MacOS, maybe RiscOS) that aren't in posixmodule.c can also use it, rather than having to maintain three separate versions of the code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 14:18 Message: Logged In: YES user_id=6380 One comment on the patch: beautiful use of the new type stuff, but there's something funky with the constructors going on. It seems that the built-in __new__ (inherited from the tuple class) requires exactly one argument -- a sequence to be tuplified -- but your __init__ requires 13 arguments. So construction by using posix.stat_result(...) always fails. It makes more sense to fix the init routine to require a 13-tuple as argument. I would also recommend overriding the tp_new slot to require a 13-tuple: right now, I can cause an easy core dump as follows: >>> import os >>> a = os.stat_result.__new__(os.stat_result, ()) >>> a.st_ctime Segmentation fault (core dumped) $ ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-28 04:20 Message: Logged In: YES user_id=499 I've fixed it with the suggestions you made, and also 1) Added docstrings 2) Fixed a nasty segfault bug that would be triggered by os.stat("/foo").__class__((10,)).st_size and added tests to keep it from reappearing. I'm not sure I know how to cover Mac and RISCOS properly: riscos.stat returns a 13-element tuple, and is hence already incompatible with posix.stat; whereas mac.{stat|xstat} return differing types. If somebody with experience with these modules could let give me guidance as to the Right Thing, I'll be happy to give it a shot... 
but my shot isn't likely to be half as good as somebody who knew the modules better. (For example, I don't have the facilities to compile macmodule or riscmodule at all, much less test them.) I'd also be glad to make any changes that would help maintainers of those modules. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-24 08:44 Message: Logged In: YES user_id=21627 The patch looks good to me. Are you willing to revise it one more time to cover all the stat implementations? A few comments on the implementation: - Why do you try to have your type participate in GC? They will never be part of a cycle. If that ever becomes an issue, you probably need to implement a traversal function as well. - I'd avoid declaring PosixStatResult, since the field declarations are misleading. Instead, you should just add the right number of additional fields in the type declaration. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 20:07 Message: Logged In: YES user_id=499 And here's an even better all-C version. (This one doesn't use a dictionary to store optional attributes.) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 18:01 Message: Logged In: YES user_id=499 Well, here's a posixmodule-only, all-C version. If this seems like a good approach, I'll add some better docstrings, move it into whichever module you like, and make riscosmodule.c and macmodule.c use it too. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-20 04:35 Message: Logged In: YES user_id=6380 Or you could put it in modsupport.c, which is already a grab-bag of handy stuff. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2001-09-19 18:36 Message: Logged In: YES user_id=21627 There aren't actually so many copies of the module, since posixmodule implements "posix","nt", and "os2". I found alternative implementations in riscosmodule and macmodule. Still, putting the support type into a shared C file is appropriate. I can think of two candidate places: tupleobject.c and fileobject.c. It may be actually worthwhile attempting to share the stat() implementations as well, but that could be an add-on. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-19 18:10 Message: Logged In: YES user_id=499 I'm becoming more and more convinced that doing it in C is the right thing, but I have an issue with doing it in the posix module. The stat function is provided on (nearly?) all platforms, and doing it in C will require minor changes to all of these modules. We can probably live with this, but I don't think we should duplicate code between all of the os modules. Is there some other appropriate place to put it in C? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 06:52 Message: Logged In: YES user_id=21627 Using posix.stat is common, see http://groups.yahoo.com/group/python-list/message/4349 http://www.washington.edu/computing/training/125/mkdoc.html http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu for examples. None of these would break with your change, though, since they don't rely on the length of the tuple. If you are going to implement the type in C, I'd put it in the posix module. If you are going to implement it in Python (and only use it from the Posix module), making it general-purpose may be desirable. However, a number of things would need to be considered, so a PEP might be appropriate. 
If that is done, I'd propose an interface like tuple_with_attrs((value-tuple), (tuple-of-field-names), exposed-length-of-tuple) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 21:11 Message: Logged In: YES user_id=499 Ah! Now I see. I hadn't realized that anybody used the posix module directly. (People really do this?) I'll try to write up a patch in C tonight or tomorrow morning. A couple of questions on which I could use advice: (1) Where is the proper place to put this kind of tuple-with-fields hybrid? Modules? Objects? In a new file or an existing one? (2) Should I try to make it general enough for non-stat use? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-18 07:54 Message: Logged In: YES user_id=21627 The problem with your second and third patch is that it includes an incompatibility for users of posix.stat (and friends), since it changes the size of the tuple. If you want to continue to return a tuple (as the top-level data structure), you'll break compatibility for applications using the C module directly. An example of code that would be broken is mode, ino, dev, nlink, uid, gid, size, a, c, m = posix.stat(filename) To pass the additional fields, you already need your class _StatResult available in C. You may find a way to define it in Python and use it in C, but that has proven to be very fragile in the past. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-18 01:54 Message: Logged In: YES user_id=6380 Haven't had time to review the patch yet, but the idea of providing a structure with fields that doubles as a tuple is a good one. It's been tried before and can be done in pure Python as well. 
Regarding the field names: I think the field names should keep their st_ prefix -- IMO this makes the code more recognizable and hence readable. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 00:32 Message: Logged In: YES user_id=499 Here's the revised (*example only*) patch that takes the more portable approach I mention below. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 23:10 Message: Logged In: YES user_id=499 On further consideration, the approach taken in the second (*example only*) patch is indeed too fragile. The C code should not lengthen the tuple arbitrarily and depend on the Python code to decode it; instead, it should return a dictionary of extra fields. I think that this approach uses a minimum of C, is easily maintainable, and very extensible. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 22:53 Message: Logged In: YES user_id=499 Martin: I'm not entirely sure what you mean here; while my patch for extra fields requires a minor chunk of C (to access the struct fields), the rest still works in pure python. I'm attaching this second version for reference. I'm not sure it makes much sense to do this with pure C; it would certainly take a lot more code, with little benefit I can discern. But you're more experienced than I; what am I missing? I agree that the field naming is suboptimal; I was taking my lead from the stat and statvfs modules. If people prefer, we can name the fields whatever we like. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-17 22:24 Message: Logged In: YES user_id=21627 I second the request for supporting additional fields where available. At the same time, it appears unimplementable using pure Python. 
Consequently, I'd like to see this patch redone in C. The implementation strategy could probably remain the same, i.e. inherit from tuple for best compatibility; add the remaining fields as slots. It may be reasonable to implement attribute access using a custom getattr function, though. I also have my doubts about the naming of the fields. The st_ prefix originates from the time where struct fields were living in the global namespace (i.e. across different structures), so prefixing them for uniqueness was essential. I'm not sure whether we should inherit this into Python... ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 20:58 Message: Logged In: YES user_id=499 BTW, if this gets in, I have another patch that adds support for st_blksize, st_blocks, and st_rdev on platforms that support them. It doesn't expose these new fields in the tuple, as that would break all the old code that tries to unpack all the fields of the tuple. Instead, these fields are only accessible as attributes. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 From noreply@sourceforge.net Tue Mar 5 13:59:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 05:59:29 -0800 Subject: [Patches] [ python-Patches-525945 ] urllib: Deferring open call for file urls Message-ID: Patches item #525945, was opened at 2002-03-05 14:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525945&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: urllib: Deferring open call for file urls Initial Comment: This patch changes the handling of local files in urllib.urlopen() and urllib2.urlopen(). 
Opening the file is deferred until the first time read(), readline(), readlines() or fileno() is called. This makes it possible to retrieve the header information for all URLs via urlopen in a uniform way, without actually having to open the file. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525945&group_id=5470 From noreply@sourceforge.net Tue Mar 5 14:09:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 06:09:41 -0800 Subject: [Patches] [ python-Patches-525945 ] urllib: Deferring open call for file urls Message-ID: Patches item #525945, was opened at 2002-03-05 08:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525945&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: urllib: Deferring open call for file urls Initial Comment: This patch changes the handling of local files in urllib.urlopen() and urllib2.urlopen(). Opening the file is deferred until the first time read(), readline(), readlines() or fileno() is called. This makes it possible to retrieve the header information for all URLs via urlopen in a uniform way, without actually having to open the file. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-05 09:09 Message: Logged In: YES user_id=6380 I don't understand. Can you explain why you care about this? 
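The deferral described in the patch can be sketched as a file-like wrapper that postpones the real open() until the first read. A hypothetical sketch of the idea (LazyFile is an illustrative name, not the patch's actual code):

```python
import os
import tempfile

class LazyFile:
    """Sketch of the deferred-open idea: the underlying file is
    only opened on the first read(), not at construction time."""
    def __init__(self, path):
        self.path = path
        self._fp = None          # no open() yet

    def _ensure_open(self):
        if self._fp is None:
            self._fp = open(self.path, 'rb')
        return self._fp

    def read(self, *args):
        return self._ensure_open().read(*args)

    def readline(self, *args):
        return self._ensure_open().readline(*args)

# Illustrative use: constructing the wrapper touches no file descriptor.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"data\n")
    name = f.name

lf = LazyFile(name)
assert lf._fp is None           # headers could be served here without an open file
assert lf.read() == b"data\n"   # the first read triggers the actual open()
os.unlink(name)
```

This is what lets urlopen() hand back header information (size, modification date) for thousands of file: URLs without holding thousands of open file descriptors, which is exactly the make-tool use case described below.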
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525945&group_id=5470 From noreply@sourceforge.net Tue Mar 5 14:21:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 06:21:28 -0800 Subject: [Patches] [ python-Patches-525945 ] urllib: Deferring open call for file urls Message-ID: Patches item #525945, was opened at 2002-03-05 14:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525945&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: urllib: Deferring open call for file urls Initial Comment: This patch changes the handling of local files in urllib.urlopen() and urllib2.urlopen(). Opening the file is deferred until the first time read(), readline(), readlines() or fileno() is called. This makes it possible to retrieve the header information for all URLs via urlopen in a uniform way, without actually having to open the file. ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-05 15:21 Message: Logged In: YES user_id=89016 I'm currently writing a make in Python. This make should be able to handle not only local files, but remote files (http, ftp, etc.). One project might have several thousand targets, and some of them are remote. I want to be able to handle both types in a uniform way, i.e. via urllib/urllib2. This means that I call urllib2.urlopen() to get the header information about the last modification date, but I don't want to open the file right away. Only when the data is required (because the source resource is newer than the target) should the file be read. 
And this might open the door to making streams that are returned from urlopen() writable (simply by using open(..., "wb") instead of open(..., "rb") when the first write is called). Another possibility might be using urllib.urlretrieve(), but the API is horrible (one global cleanup function) and not supported by urllib2. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-05 15:09 Message: Logged In: YES user_id=6380 I don't understand. Can you explain why you care about this? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525945&group_id=5470 From noreply@sourceforge.net Tue Mar 5 15:22:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 07:22:51 -0800 Subject: [Patches] [ python-Patches-462296 ] Add attributes to os.stat results Message-ID: Patches item #462296, was opened at 2001-09-17 19:57 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Nick Mathewson (nickm) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Add attributes to os.stat results Initial Comment: See bug #111481, and PEP 0042. Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples. With this patch, the os module will modify the aforementioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode". The patch also modifies test_os.py to test the new behavior. In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. 
(Thanks for PEP-025[23], Guido!) Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 16:22 Message: Logged In: YES user_id=21627 Adding all fields is both difficult and undesirable. It is difficult because you may not know in advance what fields will be added in future versions, and it is undesirable because applications may think that there is a value even though there is none. What problem does that cause for pickling, and why would a complete list of all attributes solve this problem? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-05 14:45 Message: Logged In: YES user_id=6656 I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of fields of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in ones that don't exist locally with -1 or something? It's hard to pickle os.stat_results portably the way things are at the moment... ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-11-29 21:49 Message: Logged In: YES user_id=3066 This has been checked in, edited, and checked in again. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-19 00:53 Message: Logged In: YES user_id=499 Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch.
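The backward-compatible behaviour described in the initial comment can be checked directly on any Python that ships this patch (2.2 and later); the snippet below simply exercises the documented tuple/attribute equivalence:

```python
import os

st = os.stat(os.getcwd())

# Index access and the named attributes refer to the same fields:
assert st[0] == st.st_mode
assert st[6] == st.st_size

# The old 10-element tuple-unpacking idiom keeps working:
mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime = tuple(st)
assert mode == st.st_mode and size == st.st_size
```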
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 22:35 Message: Logged In: YES user_id=6380 Thanks, Nick! Good job. Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs. Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. :-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 21:24 Message: Logged In: YES user_id=6380 I'm looking at this now. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-03 15:55 Message: Logged In: YES user_id=6380 Patience, please. I'm behind on reviewing this, and probably won't have time today either. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2001-10-03 15:51 Message: Logged In: YES user_id=6656 If this goes in, I'd like to see it used for termios.tc{get,set}attr too. I could probably implement this (but not *right* now...). ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-02 03:56 Message: Logged In: YES user_id=499 The fifth all-C (!) version, with changes as suggested by Guido's comments via email. Big changes: This version no longer subclasses tuple. Instead, it creates a general-purpose mechanism for making struct/sequence hybrids in C. It now includes a patch for timemodule.c as well. Shortcomings: (1) As before, macmodule and riscosmodule aren't tested. (2) These new classes don't participate in GC and aren't subclassable. (Famous last words: "I don't think this will matter." :) ) (3) This isn't a brand-new metaclass; it's just a quick bit of C. As such, you can't use this mechanism to create new struct/tuple hybrids from Python.
(I claim this isn't a drawback, since it's way easier to reimplement this in Python than it is to make it accessible from Python.) So, how's *this* one? ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-01 17:37 Message: Logged In: YES user_id=499 I've sent my email address to 'guido at python.org'. For reference, it's 'nickm at alum.mit.edu'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-01 16:09 Message: Logged In: YES user_id=6380 Nick, what's your real email? I have a bunch of feedback related to your use of the new type stuff -- this is uncharted territory for me too, and this SF box is too small to type comfortably. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-01 04:51 Message: Logged In: YES user_id=499 I think this might be the one... or at least the next-to-last one. This version of the patch: (1) moves the shared C code into a new module, "_stat", for internal use. (2) updates macmodule and riscosmodule to use the new code. (3) fixes a significant reference leak in previous versions. (4) is immune to the __new__ and __init__ bugs in previous versions. Things to note: (A) I've tried to make sure that my Mac/RISCOS code was correct, but I don't have any way to compile or test it. (B) I'm not sure my use of PyImport_ImportModule is legit. (C) I've allowed users to construct instances of stat_result with < or > 13 arguments. When this happens, attempts to get nonexistent attributes now raise AttributeError. (D) When dealing with Mac.xstat and RISCOS.stat, I chose to keep backward compatibility rather than enforcing the 10-tuple rule in the docs. Because there are new files, I can't make 'cvs diff' get everything. I'm uploading a zip file that contains _statmodule.c, _statmodule.h, and a unified diff.
Please let me know if you'd prefer a different format. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 16:23 Message: Logged In: YES user_id=6380 Another comment: we should move this to its own file so that other os.stat() implementations (esp. MacOS, maybe RiscOS) that aren't in posixmodule.c can also use it, rather than having to maintain three separate versions of the code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 16:18 Message: Logged In: YES user_id=6380 One comment on the patch: beautiful use of the new type stuff, but there's something funky going on with the constructors. It seems that the built-in __new__ (inherited from the tuple class) requires exactly one argument -- a sequence to be tuplified -- but your __init__ requires 13 arguments. So construction by using posix.stat_result(...) always fails. It makes more sense to fix the init routine to require a 13-tuple as argument. I would also recommend overriding the tp_new slot to require a 13-tuple: right now, I can cause an easy core dump as follows:

>>> import os
>>> a = os.stat_result.__new__(os.stat_result, ())
>>> a.st_ctime
Segmentation fault (core dumped)
$

---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-28 06:20 Message: Logged In: YES user_id=499 I've fixed it with the suggestions you made, and also 1) Added docstrings 2) Fixed a nasty segfault bug that would be triggered by os.stat("/foo").__class__((10,)).st_size and added tests to keep it from reappearing. I'm not sure I know how to cover Mac and RISCOS properly: riscos.stat returns a 13-element tuple, and is hence already incompatible with posix.stat; whereas mac.{stat|xstat} return differing types.
If somebody with experience with these modules could give me guidance as to the Right Thing, I'll be happy to give it a shot... but my shot isn't likely to be half as good as one from somebody who knows the modules better. (For example, I don't have the facilities to compile macmodule or riscosmodule at all, much less test them.) I'd also be glad to make any changes that would help maintainers of those modules. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-24 10:44 Message: Logged In: YES user_id=21627 The patch looks good to me. Are you willing to revise it one more time to cover all the stat implementations? A few comments on the implementation: - Why do you try to have your type participate in GC? Instances will never be part of a cycle. If that ever becomes an issue, you would also need to implement a traversal function. - I'd avoid declaring PosixStatResult, since the field declarations are misleading. Instead, you should just add the right number of additional fields in the type declaration. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 22:07 Message: Logged In: YES user_id=499 And here's an even better all-C version. (This one doesn't use a dictionary to store optional attributes.) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 20:01 Message: Logged In: YES user_id=499 Well, here's a posixmodule-only, all-C version. If this seems like a good approach, I'll add some better docstrings, move it into whichever module you like, and make riscosmodule.c and macmodule.c use it too. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-20 06:35 Message: Logged In: YES user_id=6380 Or you could put it in modsupport.c, which is already a grab-bag of handy stuff.
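The constructor trap Guido describes earlier in this thread (a tuple's contents are laid down in __new__, so a check in __init__ alone comes too late) can be illustrated with a pure-Python tuple subclass; `StatResult` below is a made-up example for illustration, not the patched os.stat_result:

```python
class StatResult(tuple):
    """Illustrative tuple subclass that validates its input in __new__."""

    N_FIELDS = 13

    def __new__(cls, values):
        values = tuple(values)
        # Reject malformed input here: once tuple.__new__ has run, the
        # instance already exists, and attribute access on missing slots
        # would be the Python-level analogue of the C-level core dump.
        if len(values) != cls.N_FIELDS:
            raise TypeError("expected %d values, got %d"
                            % (cls.N_FIELDS, len(values)))
        return tuple.__new__(cls, values)

    @property
    def st_mode(self):
        return self[0]
```

With this, StatResult(range(13)) works as expected, while StatResult(()) raises TypeError instead of producing a broken instance.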
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 20:36 Message: Logged In: YES user_id=21627 There aren't actually so many copies of the module, since posixmodule implements "posix", "nt", and "os2". I found alternative implementations in riscosmodule and macmodule. Still, putting the support type into a shared C file is appropriate. I can think of two candidate places: tupleobject.c and fileobject.c. It may actually be worthwhile attempting to share the stat() implementations as well, but that could be an add-on. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-19 20:10 Message: Logged In: YES user_id=499 I'm becoming more and more convinced that doing it in C is the right thing, but I have an issue with doing it in the posix module. The stat function is provided on (nearly?) all platforms, and doing it in C will require minor changes to all of these modules. We can probably live with this, but I don't think we should duplicate code between all of the os modules. Is there some other appropriate place to put it in C? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 08:52 Message: Logged In: YES user_id=21627 Using posix.stat is common, see http://groups.yahoo.com/group/python-list/message/4349 http://www.washington.edu/computing/training/125/mkdoc.html http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu for examples. None of these would break with your change, though, since they don't rely on the length of the tuple. If you are going to implement the type in C, I'd put it in the posix module. If you are going to implement it in Python (and only use it from the posix module), making it general-purpose may be desirable. However, a number of things would need to be considered, so a PEP might be appropriate.
If that is done, I'd propose an interface like tuple_with_attrs((value-tuple), (tuple-of-field-names), exposed-length-of-tuple) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 23:11 Message: Logged In: YES user_id=499 Ah! Now I see. I hadn't realized that anybody used the posix module directly. (People really do this?) I'll try to write up a patch in C tonight or tomorrow morning. A couple of questions on which I could use advice: (1) Where is the proper place to put this kind of tuple-with-fields hybrid? Modules? Objects? In a new file or an existing one? (2) Should I try to make it general enough for non-stat use? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-18 09:54 Message: Logged In: YES user_id=21627 The problem with your second and third patch is that it introduces an incompatibility for users of posix.stat (and friends), since it changes the size of the tuple. If you want to continue to return a tuple (as the top-level data structure), you'll break compatibility for applications using the C module directly. An example of code that would be broken is

mode, ino, dev, nlink, uid, gid, size, a, c, m = posix.stat(filename)

To pass the additional fields, you already need your class _StatResult available in C. You may find a way to define it in Python and use it in C, but that has proven to be very fragile in the past. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-18 03:54 Message: Logged In: YES user_id=6380 Haven't had time to review the patch yet, but the idea of providing a structure with fields that doubles as a tuple is a good one. It's been tried before and can be done in pure Python as well.
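Martin's proposed interface can be approximated in pure Python. This is only a sketch of the idea (the accepted implementation ended up as C struct/sequence code), but it shows the role of the exposed-length argument: extra fields stay reachable by name without widening the tuple that old code unpacks:

```python
def tuple_with_attrs(values, field_names, exposed_length):
    """Sketch of the proposed tuple_with_attrs(value-tuple,
    tuple-of-field-names, exposed-length-of-tuple) interface;
    illustrative only, not the committed code."""

    class _Hybrid(tuple):
        def __new__(cls, items):
            # Only the first `exposed_length` items take part in the
            # tuple protocol, so old unpacking code keeps working.
            return tuple.__new__(cls, tuple(items)[:exposed_length])

        def __init__(self, items):
            # Every field, exposed or not, is reachable by name.
            for name, value in zip(field_names, items):
                setattr(self, name, value)

    return _Hybrid(values)
```

For example, a result built with exposed length 2 still unpacks as a 2-tuple while carrying a third named field as an attribute.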
Regarding the field names: I think the field names should keep their st_ prefix -- IMO this makes the code more recognizable and hence readable. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 02:32 Message: Logged In: YES user_id=499 Here's the revised (*example only*) patch that takes the more portable approach I mention below. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 01:10 Message: Logged In: YES user_id=499 On further consideration, the approach taken in the second (*example only*) patch is indeed too fragile. The C code should not lengthen the tuple arbitrarily and depend on the Python code to decode it; instead, it should return a dictionary of extra fields. I think that this approach uses a minimum of C, is easily maintainable, and very extensible. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 00:53 Message: Logged In: YES user_id=499 Martin: I'm not entirely sure what you mean here; while my patch for extra fields requires a minor chunk of C (to access the struct fields), the rest still works in pure Python. I'm attaching this second version for reference. I'm not sure it makes much sense to do this in pure C; it would certainly take a lot more code, with little benefit I can discern. But you're more experienced than I; what am I missing? I agree that the field naming is suboptimal; I was taking my lead from the stat and statvfs modules. If people prefer, we can name the fields whatever we like. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-18 00:24 Message: Logged In: YES user_id=21627 I second the request for supporting additional fields where available. At the same time, it appears unimplementable using pure Python.
Consequently, I'd like to see this patch redone in C. The implementation strategy could probably remain the same, i.e. inherit from tuple for best compatibility; add the remaining fields as slots. It may be reasonable to implement attribute access using a custom getattr function, though. I also have my doubts about the naming of the fields. The st_ prefix originates from the time when struct fields lived in a global namespace (i.e. shared across different structures), so prefixing them for uniqueness was essential. I'm not sure whether we should carry this over into Python... ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 22:58 Message: Logged In: YES user_id=499 BTW, if this gets in, I have another patch that adds support for st_blksize, st_blocks, and st_rdev on platforms that support them. It doesn't expose these new fields in the tuple, as that would break all the old code that tries to unpack all the fields of the tuple. Instead, these fields are only accessible as attributes. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 From noreply@sourceforge.net Tue Mar 5 15:50:59 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 07:50:59 -0800 Subject: [Patches] [ python-Patches-462296 ] Add attributes to os.stat results Message-ID: Patches item #462296, was opened at 2001-09-17 17:57 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Nick Mathewson (nickm) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Add attributes to os.stat results Initial Comment: See bug #111481, and PEP 0042.
Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples. With this patch, the os module will modify the aformentioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode. The patch also modifies test_os.py to test the new behavior. In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. (Thanks for PEP-025[23], Guido!) Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-05 15:50 Message: Logged In: YES user_id=6656 I'm not worried about cross version problems. The problem with pickling is that stat_results (as of today) get pickled as "os.stat_result" and a tuple of arguments. The number of arguments os.stat_result takes varies by platform (it seems to be 10 on this NT box, but it's 13 on the starship, f'ex). So if a stat_result gets pickled on the starship and shoved down a socket to an NT machine, it can't be unpickled. I don't know if this sort of thing ever happens, but I could see it being surprising & annoying if I ran into it. If os.stat_result took 13 arguments everywhere, this problem obviously wouldn't arise. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-05 15:22 Message: Logged In: YES user_id=21627 Adding all fields is both difficult and undesirable. 
It is difficult because you may not know in advance what fields will be added in future versions, and it is undesirable because applications may think that there is a value even though the is none. What problem does that cause for pickling, and why would a complete list of all attributes solve this problem? ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-05 13:45 Message: Logged In: YES user_id=6656 I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of field of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in one's that don't exists locally with -1 or something? It's hard to pickle os.stat_results portably the way things are at the moment... ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-11-29 20:49 Message: Logged In: YES user_id=3066 This has been checked in, edited, and checked in again. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-18 22:53 Message: Logged In: YES user_id=499 Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 20:35 Message: Logged In: YES user_id=6380 Thanks, Nick! Good job. Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs. Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. 
:-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 19:24 Message: Logged In: YES user_id=6380 I'm looking at this now. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-03 13:55 Message: Logged In: YES user_id=6380 Patience, please. I'm behind reviewing this, probably won't have time today either. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2001-10-03 13:51 Message: Logged In: YES user_id=6656 If this goes in, I'd like to see it used for termios.tc {get,set}attr too. I could probably implement this (but not *right* now...). ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-02 01:56 Message: Logged In: YES user_id=499 The fifth all-C (!) version, with changes as suggested by Guido's comments via email. Big changes: This version no longer subclasses tuple. Instead, it creates a general-purpose mechanism for making struct/sequence hybrids in C. It now includes a patch for timemodule.c as well. Shortcomings: (1) As before, macmodule and riscosmodule aren't tested. (2) These new classes don't participate in GC and aren't subclassable. (Famous last words: "I don't think this will matter." :) ) (3) This isn't a brand-new metaclass; it's just a quick bit of C. As such, you can't use this mechanism to create new struct/tuple hybrids from Python. (I claim this isn't a drawback, since it's way easier to reimplement this in python than it is to make it accessible from python.) So, how's *this* one? ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-01 15:37 Message: Logged In: YES user_id=499 I've sent my email address to 'guido at python.org'. For reference, it's 'nickm at alum.mit.edu'. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-01 14:09 Message: Logged In: YES user_id=6380 Nick, what's your real email? I have a bunch of feedback related to your use of the new type stuff -- this is uncharted territory for me too, and this SF box is too small to type comfortably. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-01 02:51 Message: Logged In: YES user_id=499 I think this might be the one... or at least, the next-to-last-one. This version of the patch: (1) moves the shared C code into a new module, "_stat", for internal use. (2) updates macmodule and riscosmodule to use the new code. (3) fixes a significant reference leak in previous versions. (4) is immune to the __new__ and __init__ bugs in previous versions. Things to note: (A) I've tried to make sure that my Mac/RISCOS code was correct, but I don't have any way to compile or test it. (B) I'm not sure my use of PyImport_ImportModule is legit. (C) I've allowed users to construct instances of stat_result with < or > 13 arguments. When this happens, attempts to get nonexistant attributes now raise AttributeError. (D) When dealing with Mac.xstat and RISCOS.stat, I chose to keep backward compatibility rather than enforcing the 10-tuple rule in the docs. Because there are new files, I can't make 'cvs diff' get everything. I'm uploading a zip file that contains _statmodule.c, _statmodule.h, and a unified diff. Please let me know if you'd prefer a different format. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 14:23 Message: Logged In: YES user_id=6380 Another comment: we should move this to its own file so that other os.stat() implementations (esp. 
MacOS, maybe RiscOS) that aren't in posixmodule.c can also use it, rather than having to maintain three separate versions of the code. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 14:18 Message: Logged In: YES user_id=6380 One comment on the patch: beautiful use of the new type stuff, but there's something funky with the constructors going on. It seems that the built-in __new__ (inherited from the tuple class) requires exactly one argument -- a sequence to be tuplified -- but your __init__ requires 13 arguments. So construction by using posix.stat_result(...) always fails. It makes more sense to fix the init routine to require a 13-tuple as argument. I would also recommend overriding the tp_new slot to require a 13-tuple: right now, I can cause an easy core dump as follows: >>> import os >>> a = os.stat_result.__new__(os.stat_result, ()) >>> a.st_ctime Segmentation fault (core dumped) $ ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-28 04:20 Message: Logged In: YES user_id=499 I've fixed it with the suggestions you made, and also 1) Added docstrings 2) Fixed a nasty segfault bug that would be triggered by os.stat("/foo").__class__((10,)).st_size and added tests to keep it from reappearing. I'm not sure I know how to cover Mac and RISCOS properly: riscos.stat returns a 13-element tuple, and is hence already incompatible with posix.stat; whereas mac.{stat|xstat} return differing types. If somebody with experience with these modules could let give me guidance as to the Right Thing, I'll be happy to give it a shot... but my shot isn't likely to be half as good as somebody who knew the modules better. (For example, I don't have the facilities to compile macmodule or riscmodule at all, much less test them.) I'd also be glad to make any changes that would help maintainers of those modules. 
---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2001-09-24 08:44 Message: Logged In: YES user_id=21627 The patch looks good to me. Are you willing to revise it one more time to cover all the stat implementations? A few comments on the implementation: - Why do you try to have your type participate in GC? they will never be part of a cycle. If that ever becomes an issue, you probably need to implement a traversal function as well. - I'd avoid declaring PosixStatResult, since the field declarations are misleading. Instead, you should just add the right number of additional in the type declaration. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 20:07 Message: Logged In: YES user_id=499 And here's an even better all-C version. (This one doesn't use a dictionary to store optional attributes.) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 18:01 Message: Logged In: YES user_id=499 Well, here's a posixmodule-only, all-C version. If this seems like a good approach, I'll add some better docstrings, move it into whichever module you like, and make riscosmodule.c and macmodule.c use it too. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-20 04:35 Message: Logged In: YES user_id=6380 Or you could put it in modsupport.c, which is already a grab-bag of handy stuff. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2001-09-19 18:36 Message: Logged In: YES user_id=21627 There aren't actually so many copies of the module, since posixmodule implements "posix","nt", and "os2". I found alternative implementations in riscosmodule and macmodule. Still, putting the support type into a shared C file is appropriate. 
I can think of two candidate places: tupleobject.c and fileobject.c. It may be actually worthwhile attempting to share the stat() implementations as well, but that could be an add-on. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-19 18:10 Message: Logged In: YES user_id=499 I'm becoming more and more convinced that doing it in C is the right thing, but I have issue with doing it in the posix module. The stat function is provided on (nearly?) all platforms, and doing it in C will require minor changes to all of these modules. We can probably live with this, but I don't think we should duplicate code between all of the os modules. Is there some other appropriate place to put it in C? ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2001-09-19 06:52 Message: Logged In: YES user_id=21627 Using posix.stat is common, see http://groups.yahoo.com/group/python-list/message/4349 http://www.washington.edu/computing/training/125/mkdoc.html http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu for examples. None of these would break with your change, though, since they don't rely on the lenght of the tuple. If you are going to implement the type in C, I'd put it in the posix module. If you are going to implement it in Python (and only use it from the Posix module), making it general-purpose may be desirable. However, a number of things would need to be considered, so a PEP might be appropriate. If that is done, I'd propose an interface like tuple_with_attrs((value-tuple), (tuple-of-field-names), exposed-length-of-tuple)) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 21:11 Message: Logged In: YES user_id=499 Ah! Now I see. I hadn't realized that anybody used the posix module directly. (People really do this?) 
I'll try to write up a patch in C tonight or tomorrow morning. A couple of questions on which I could use advice: (1) Where is the proper place to put this kind of tuple-with-fields hybrid? Modules? Objects? In a new file or an existing one? (2) Should I try to make it general enough for non-stat use? ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2001-09-18 07:54 Message: Logged In: YES user_id=21627 The problem with your second and third patch is that it includes an incompatibility for users of posix.stat (and friends), since it changes the siye of the tuple. If you want to continue to return a tuple (as the top-level data structure), you'll break compatibility for applications using the C module directly. An example of code that would be broken is mode, ino, dev, nlink, uid, gid, size, a, c, m = posix.stat(filename) To pass the additional fields, you already need your class _StatResult available in C. You may find a way to define it in Python and use it in C, but that has proven to be very fragile in the past. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-18 01:54 Message: Logged In: YES user_id=6380 Haven't had time to review the patch yet, but the idea of providing a structure with fields that doubles as a tuple is a good one. It's been tried before and can be done in pure Python as well. Regarding the field names: I think the field names should keep their st_ prefix -- IMO this makes the code more recognizable and hence readable. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 00:32 Message: Logged In: YES user_id=499 Here's the revised (*example only*) patch that takes the more portable approach I mention below. 
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-17 23:10 Message: Logged In: YES user_id=499
On further consideration, the approach taken in the second (*example only*) patch is indeed too fragile. The C code should not lengthen the tuple arbitrarily and depend on the Python code to decode it; instead, it should return a dictionary of extra fields. I think that this approach uses a minimum of C, is easily maintainable, and very extensible.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-17 22:53 Message: Logged In: YES user_id=499
Martin: I'm not entirely sure what you mean here; while my patch for extra fields requires a minor chunk of C (to access the struct fields), the rest still works in pure Python. I'm attaching this second version for reference. I'm not sure it makes much sense to do this with pure C; it would certainly take a lot more code, with little benefit I can discern. But you're more experienced than I; what am I missing? I agree that the field naming is suboptimal; I was taking my lead from the stat and statvfs modules. If people prefer, we can name the fields whatever we like.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-09-17 22:24 Message: Logged In: YES user_id=21627
I second the request for supporting additional fields where available. At the same time, it appears unimplementable using pure Python. Consequently, I'd like to see this patch redone in C. The implementation strategy could probably remain the same, i.e. inherit from tuple for best compatibility; add the remaining fields as slots. It may be reasonable to implement attribute access using a custom getattr function, though. I also have my doubts about the naming of the fields. The st_ prefix originates from a time when struct fields lived in the global namespace (i.e. across different structures), so prefixing them for uniqueness was essential. I'm not sure whether we should inherit this into Python...
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-17 20:58 Message: Logged In: YES user_id=499
BTW, if this gets in, I have another patch that adds support for st_blksize, st_blocks, and st_rdev on platforms that support them. It doesn't expose these new fields in the tuple, as that would break all the old code that tries to unpack all the fields of the tuple. Instead, these fields are only accessible as attributes.
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470

From noreply@sourceforge.net Tue Mar 5 16:15:16 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 05 Mar 2002 08:15:16 -0800
Subject: [Patches] [ python-Patches-462296 ] Add attributes to os.stat results
Message-ID:

Patches item #462296, was opened at 2001-09-17 19:57
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470
Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5
Submitted By: Nick Mathewson (nickm) Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Add attributes to os.stat results
Initial Comment: See bug #111481, and PEP 0042. Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples. With this patch, the os module will modify the aforementioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode".
The patch also modifies test_os.py to test the new behavior. In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. (Thanks for PEP-025[23], Guido!) Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple.
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 17:15 Message: Logged In: YES user_id=21627
To support pickling, I think structseq objects should implement a __reduce__ method, returning the type and a dictionary. The type's tp_new should accept dictionaries, and reconstruct the instance from the dictionary. Alternatively, copy_reg could grow support for stat_result, which seems desirable anyway, since os.stat returns an 'nt.stat_result' instance on Windows. Furthermore, fixing the number of arguments does not help at all in pickling; __reduce__ will return an argument tuple which includes the original object; in turn, pickle will recurse until the stack overflows.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh) Date: 2002-03-05 16:50 Message: Logged In: YES user_id=6656
I'm not worried about cross-version problems. The problem with pickling is that stat_results (as of today) get pickled as "os.stat_result" and a tuple of arguments. The number of arguments os.stat_result takes varies by platform (it seems to be 10 on this NT box, but it's 13 on the starship, f'ex). So if a stat_result gets pickled on the starship and shoved down a socket to an NT machine, it can't be unpickled. I don't know if this sort of thing ever happens, but I could see it being surprising & annoying if I ran into it. If os.stat_result took 13 arguments everywhere, this problem obviously wouldn't arise.
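For what it's worth, the __reduce__ route discussed above is the one structseq types eventually took; in a current CPython a stat_result round-trips through pickle. This is present-day interpreter behaviour, not a claim about the 2002 code under discussion:

```python
# Round-tripping an os.stat_result through pickle in a modern CPython;
# structseq types pickle their sequence part plus a dict of the
# remaining named fields.
import os
import pickle

st = os.stat(os.getcwd())
st2 = pickle.loads(pickle.dumps(st))

assert st2 == st                  # compares equal as a sequence
assert st2.st_mode == st.st_mode  # named fields survive the round trip
```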
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 16:22 Message: Logged In: YES user_id=21627
Adding all fields is both difficult and undesirable. It is difficult because you may not know in advance what fields will be added in future versions, and it is undesirable because applications may think that there is a value even though there is none. What problem does that cause for pickling, and why would a complete list of all attributes solve this problem?
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh) Date: 2002-03-05 14:45 Message: Logged In: YES user_id=6656
I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of fields of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in ones that don't exist locally with -1 or something? It's hard to pickle os.stat_results portably the way things are at the moment...
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-11-29 21:49 Message: Logged In: YES user_id=3066
This has been checked in, edited, and checked in again.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-19 00:53 Message: Logged In: YES user_id=499
Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 22:35 Message: Logged In: YES user_id=6380
Thanks, Nick! Good job. Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs.
Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. :-)
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 21:24 Message: Logged In: YES user_id=6380
I'm looking at this now.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-03 15:55 Message: Logged In: YES user_id=6380
Patience, please. I'm behind reviewing this, probably won't have time today either.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh) Date: 2001-10-03 15:51 Message: Logged In: YES user_id=6656
If this goes in, I'd like to see it used for termios.tc{get,set}attr too. I could probably implement this (but not *right* now...).
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-02 03:56 Message: Logged In: YES user_id=499
The fifth all-C (!) version, with changes as suggested by Guido's comments via email. Big changes: This version no longer subclasses tuple. Instead, it creates a general-purpose mechanism for making struct/sequence hybrids in C. It now includes a patch for timemodule.c as well. Shortcomings: (1) As before, macmodule and riscosmodule aren't tested. (2) These new classes don't participate in GC and aren't subclassable. (Famous last words: "I don't think this will matter." :) ) (3) This isn't a brand-new metaclass; it's just a quick bit of C. As such, you can't use this mechanism to create new struct/tuple hybrids from Python. (I claim this isn't a drawback, since it's way easier to reimplement this in Python than it is to make it accessible from Python.) So, how's *this* one?
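The general-purpose struct/sequence mechanism described above survives in CPython as the "structseq" types; in a modern interpreter both the posixmodule and timemodule results show the dual behaviour. This is a present-day observation, not part of the patch itself:

```python
# Struct/sequence hybrids in a modern CPython: os.stat() and
# time.localtime() both return tuple subclasses with named fields.
import os
import time

st = os.stat(os.getcwd())
assert isinstance(st, tuple)   # still obeys the tuple protocol
assert st[0] == st.st_mode     # index 0 doubles as st_mode

t = time.localtime()
assert isinstance(t, tuple)    # timemodule.c got the same treatment
assert t[0] == t.tm_year       # index 0 doubles as tm_year
```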
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-01 17:37 Message: Logged In: YES user_id=499
I've sent my email address to 'guido at python.org'. For reference, it's 'nickm at alum.mit.edu'.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-01 16:09 Message: Logged In: YES user_id=6380
Nick, what's your real email? I have a bunch of feedback related to your use of the new type stuff -- this is uncharted territory for me too, and this SF box is too small to type comfortably.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-01 04:51 Message: Logged In: YES user_id=499
I think this might be the one... or at least, the next-to-last one. This version of the patch: (1) moves the shared C code into a new module, "_stat", for internal use. (2) updates macmodule and riscosmodule to use the new code. (3) fixes a significant reference leak in previous versions. (4) is immune to the __new__ and __init__ bugs in previous versions. Things to note: (A) I've tried to make sure that my Mac/RISCOS code was correct, but I don't have any way to compile or test it. (B) I'm not sure my use of PyImport_ImportModule is legit. (C) I've allowed users to construct instances of stat_result with < or > 13 arguments. When this happens, attempts to get nonexistent attributes now raise AttributeError. (D) When dealing with Mac.xstat and RISCOS.stat, I chose to keep backward compatibility rather than enforcing the 10-tuple rule in the docs. Because there are new files, I can't make 'cvs diff' get everything. I'm uploading a zip file that contains _statmodule.c, _statmodule.h, and a unified diff. Please let me know if you'd prefer a different format.
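Point (C) can still be observed in a current CPython, though the details have shifted: the constructor now enforces a minimum sequence length rather than accepting arbitrarily short ones. Again, this is present-day behaviour, not the 2001 patch:

```python
# Present-day behaviour of the os.stat_result constructor (modern
# CPython): it accepts a plain sequence of the ten classic fields,
# and a too-short sequence is rejected with TypeError rather than
# producing missing attributes.
import os

st = os.stat_result(range(10))  # st_mode, st_ino, ..., st_ctime in order
assert st.st_mode == 0          # field 0
assert st.st_size == 6          # field 6

try:
    os.stat_result(())          # far fewer than the required fields
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError for a too-short sequence")
```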
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 16:23 Message: Logged In: YES user_id=6380
Another comment: we should move this to its own file so that other os.stat() implementations (esp. MacOS, maybe RiscOS) that aren't in posixmodule.c can also use it, rather than having to maintain three separate versions of the code.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 16:18 Message: Logged In: YES user_id=6380
One comment on the patch: beautiful use of the new type stuff, but there's something funky with the constructors going on. It seems that the built-in __new__ (inherited from the tuple class) requires exactly one argument -- a sequence to be tuplified -- but your __init__ requires 13 arguments. So construction by using posix.stat_result(...) always fails. It makes more sense to fix the init routine to require a 13-tuple as argument. I would also recommend overriding the tp_new slot to require a 13-tuple: right now, I can cause an easy core dump as follows:
>>> import os
>>> a = os.stat_result.__new__(os.stat_result, ())
>>> a.st_ctime
Segmentation fault (core dumped)
$
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-28 06:20 Message: Logged In: YES user_id=499
I've fixed it with the suggestions you made, and also 1) Added docstrings 2) Fixed a nasty segfault bug that would be triggered by os.stat("/foo").__class__((10,)).st_size and added tests to keep it from reappearing. I'm not sure I know how to cover Mac and RISCOS properly: riscos.stat returns a 13-element tuple, and is hence already incompatible with posix.stat; whereas mac.{stat|xstat} return differing types. If somebody with experience with these modules could give me guidance as to the Right Thing, I'll be happy to give it a shot... but my shot isn't likely to be half as good as somebody who knows the modules better. (For example, I don't have the facilities to compile macmodule or riscosmodule at all, much less test them.) I'd also be glad to make any changes that would help maintainers of those modules.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-09-24 10:44 Message: Logged In: YES user_id=21627
The patch looks good to me. Are you willing to revise it one more time to cover all the stat implementations? A few comments on the implementation: - Why do you try to have your type participate in GC? They will never be part of a cycle. If that ever becomes an issue, you probably need to implement a traversal function as well. - I'd avoid declaring PosixStatResult, since the field declarations are misleading. Instead, you should just add the right number of additional fields in the type declaration.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-21 22:07 Message: Logged In: YES user_id=499
And here's an even better all-C version. (This one doesn't use a dictionary to store optional attributes.)
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-21 20:01 Message: Logged In: YES user_id=499
Well, here's a posixmodule-only, all-C version. If this seems like a good approach, I'll add some better docstrings, move it into whichever module you like, and make riscosmodule.c and macmodule.c use it too.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-20 06:35 Message: Logged In: YES user_id=6380
Or you could put it in modsupport.c, which is already a grab-bag of handy stuff.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 20:36 Message: Logged In: YES user_id=21627
There aren't actually so many copies of the module, since posixmodule implements "posix", "nt", and "os2". I found alternative implementations in riscosmodule and macmodule. Still, putting the support type into a shared C file is appropriate. I can think of two candidate places: tupleobject.c and fileobject.c. It may actually be worthwhile attempting to share the stat() implementations as well, but that could be an add-on.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-19 20:10 Message: Logged In: YES user_id=499
I'm becoming more and more convinced that doing it in C is the right thing, but I have an issue with doing it in the posix module. The stat function is provided on (nearly?) all platforms, and doing it in C will require minor changes to all of these modules. We can probably live with this, but I don't think we should duplicate code between all of the os modules. Is there some other appropriate place to put it in C?
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 08:52 Message: Logged In: YES user_id=21627
Using posix.stat is common; see http://groups.yahoo.com/group/python-list/message/4349 http://www.washington.edu/computing/training/125/mkdoc.html http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu for examples. None of these would break with your change, though, since they don't rely on the length of the tuple. If you are going to implement the type in C, I'd put it in the posix module. If you are going to implement it in Python (and only use it from the Posix module), making it general-purpose may be desirable. However, a number of things would need to be considered, so a PEP might be appropriate.
If that is done, I'd propose an interface like tuple_with_attrs((value-tuple), (tuple-of-field-names), exposed-length-of-tuple)
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-18 23:11 Message: Logged In: YES user_id=499
Ah! Now I see. I hadn't realized that anybody used the posix module directly. (People really do this?) I'll try to write up a patch in C tonight or tomorrow morning. A couple of questions on which I could use advice: (1) Where is the proper place to put this kind of tuple-with-fields hybrid? Modules? Objects? In a new file or an existing one? (2) Should I try to make it general enough for non-stat use?
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-09-18 09:54 Message: Logged In: YES user_id=21627
The problem with your second and third patches is that they introduce an incompatibility for users of posix.stat (and friends), since they change the size of the tuple. If you want to continue to return a tuple (as the top-level data structure), you'll break compatibility for applications using the C module directly. An example of code that would be broken is:
mode, ino, dev, nlink, uid, gid, size, a, c, m = posix.stat(filename)
To pass the additional fields, you already need your class _StatResult available in C. You may find a way to define it in Python and use it in C, but that has proven to be very fragile in the past.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-18 03:54 Message: Logged In: YES user_id=6380
Haven't had time to review the patch yet, but the idea of providing a structure with fields that doubles as a tuple is a good one. It's been tried before and can be done in pure Python as well. Regarding the field names: I think the field names should keep their st_ prefix -- IMO this makes the code more recognizable and hence readable.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-18 02:32 Message: Logged In: YES user_id=499
Here's the revised (*example only*) patch that takes the more portable approach I mention below.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-18 01:10 Message: Logged In: YES user_id=499
On further consideration, the approach taken in the second (*example only*) patch is indeed too fragile. The C code should not lengthen the tuple arbitrarily and depend on the Python code to decode it; instead, it should return a dictionary of extra fields. I think that this approach uses a minimum of C, is easily maintainable, and very extensible.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-18 00:53 Message: Logged In: YES user_id=499
Martin: I'm not entirely sure what you mean here; while my patch for extra fields requires a minor chunk of C (to access the struct fields), the rest still works in pure Python. I'm attaching this second version for reference. I'm not sure it makes much sense to do this with pure C; it would certainly take a lot more code, with little benefit I can discern. But you're more experienced than I; what am I missing? I agree that the field naming is suboptimal; I was taking my lead from the stat and statvfs modules. If people prefer, we can name the fields whatever we like.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-09-18 00:24 Message: Logged In: YES user_id=21627
I second the request for supporting additional fields where available. At the same time, it appears unimplementable using pure Python.
Consequently, I'd like to see this patch redone in C. The implementation strategy could probably remain the same, i.e. inherit from tuple for best compatibility; add the remaining fields as slots. It may be reasonable to implement attribute access using a custom getattr function, though. I also have my doubts about the naming of the fields. The st_ prefix originates from a time when struct fields lived in the global namespace (i.e. across different structures), so prefixing them for uniqueness was essential. I'm not sure whether we should inherit this into Python...
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-09-17 22:58 Message: Logged In: YES user_id=499
BTW, if this gets in, I have another patch that adds support for st_blksize, st_blocks, and st_rdev on platforms that support them. It doesn't expose these new fields in the tuple, as that would break all the old code that tries to unpack all the fields of the tuple. Instead, these fields are only accessible as attributes.
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470

From noreply@sourceforge.net Tue Mar 5 16:32:53 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 05 Mar 2002 08:32:53 -0800
Subject: [Patches] [ python-Patches-462296 ] Add attributes to os.stat results
Message-ID:

Patches item #462296, was opened at 2001-09-17 17:57
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470
Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5
Submitted By: Nick Mathewson (nickm) Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Add attributes to os.stat results
Initial Comment: See bug #111481, and PEP 0042.
Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples. With this patch, the os module will modify the aforementioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode". The patch also modifies test_os.py to test the new behavior. In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. (Thanks for PEP-025[23], Guido!) Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple.
----------------------------------------------------------------------
>Comment By: Michael Hudson (mwh) Date: 2002-03-05 16:32 Message: Logged In: YES user_id=6656
Martin, I may not have been 100% clear in my last note, but please run "cvs up Objects/structseq.c". structseq objects *do* now implement a __reduce__ method, but it returns a tuple. Using a dictionary would be more complicated, and not solve the issue completely: what happens when you go from a platform with fewer fields to one with more? What value does the not-prepared-for field have? Hmm, the point about nt.stat_result is a good one. Getting support into copy_reg.py leads to interesting bootstrapping problems when using uninstalled builds, unfortunately (site.py imports distutils imports re imports copy_reg; try to import, say, time, and you can't, because the whole reason to import distutils was to set up the path to find dynamically linked libraries...).
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 16:15 Message: Logged In: YES user_id=21627
To support pickling, I think structseq objects should implement a __reduce__ method, returning the type and a dictionary. The type's tp_new should accept dictionaries, and reconstruct the instance from the dictionary. Alternatively, copy_reg could grow support for stat_result, which seems desirable anyway, since os.stat returns an 'nt.stat_result' instance on Windows. Furthermore, fixing the number of arguments does not help at all in pickling; __reduce__ will return an argument tuple which includes the original object; in turn, pickle will recurse until the stack overflows.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh) Date: 2002-03-05 15:50 Message: Logged In: YES user_id=6656
I'm not worried about cross-version problems. The problem with pickling is that stat_results (as of today) get pickled as "os.stat_result" and a tuple of arguments. The number of arguments os.stat_result takes varies by platform (it seems to be 10 on this NT box, but it's 13 on the starship, f'ex). So if a stat_result gets pickled on the starship and shoved down a socket to an NT machine, it can't be unpickled. I don't know if this sort of thing ever happens, but I could see it being surprising & annoying if I ran into it. If os.stat_result took 13 arguments everywhere, this problem obviously wouldn't arise.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 15:22 Message: Logged In: YES user_id=21627
Adding all fields is both difficult and undesirable. It is difficult because you may not know in advance what fields will be added in future versions, and it is undesirable because applications may think that there is a value even though there is none. What problem does that cause for pickling, and why would a complete list of all attributes solve this problem?
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh) Date: 2002-03-05 13:45 Message: Logged In: YES user_id=6656
I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of fields of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in ones that don't exist locally with -1 or something? It's hard to pickle os.stat_results portably the way things are at the moment...
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-11-29 20:49 Message: Logged In: YES user_id=3066
This has been checked in, edited, and checked in again.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-18 22:53 Message: Logged In: YES user_id=499
Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 20:35 Message: Logged In: YES user_id=6380
Thanks, Nick! Good job. Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs. Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. :-)
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 19:24 Message: Logged In: YES user_id=6380
I'm looking at this now.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-03 13:55 Message: Logged In: YES user_id=6380
Patience, please. I'm behind reviewing this, probably won't have time today either.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh) Date: 2001-10-03 13:51 Message: Logged In: YES user_id=6656
If this goes in, I'd like to see it used for termios.tc{get,set}attr too. I could probably implement this (but not *right* now...).
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-02 01:56 Message: Logged In: YES user_id=499
The fifth all-C (!) version, with changes as suggested by Guido's comments via email. Big changes: This version no longer subclasses tuple. Instead, it creates a general-purpose mechanism for making struct/sequence hybrids in C. It now includes a patch for timemodule.c as well. Shortcomings: (1) As before, macmodule and riscosmodule aren't tested. (2) These new classes don't participate in GC and aren't subclassable. (Famous last words: "I don't think this will matter." :) ) (3) This isn't a brand-new metaclass; it's just a quick bit of C. As such, you can't use this mechanism to create new struct/tuple hybrids from Python. (I claim this isn't a drawback, since it's way easier to reimplement this in Python than it is to make it accessible from Python.) So, how's *this* one?
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-01 15:37 Message: Logged In: YES user_id=499
I've sent my email address to 'guido at python.org'. For reference, it's 'nickm at alum.mit.edu'.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-01 14:09 Message: Logged In: YES user_id=6380
Nick, what's your real email? I have a bunch of feedback related to your use of the new type stuff -- this is uncharted territory for me too, and this SF box is too small to type comfortably.
----------------------------------------------------------------------
Comment By: Nick Mathewson (nickm) Date: 2001-10-01 02:51 Message: Logged In: YES user_id=499
I think this might be the one... or at least, the next-to-last one. This version of the patch: (1) moves the shared C code into a new module, "_stat", for internal use. (2) updates macmodule and riscosmodule to use the new code. (3) fixes a significant reference leak in previous versions. (4) is immune to the __new__ and __init__ bugs in previous versions. Things to note: (A) I've tried to make sure that my Mac/RISCOS code was correct, but I don't have any way to compile or test it. (B) I'm not sure my use of PyImport_ImportModule is legit. (C) I've allowed users to construct instances of stat_result with < or > 13 arguments. When this happens, attempts to get nonexistent attributes now raise AttributeError. (D) When dealing with Mac.xstat and RISCOS.stat, I chose to keep backward compatibility rather than enforcing the 10-tuple rule in the docs. Because there are new files, I can't make 'cvs diff' get everything. I'm uploading a zip file that contains _statmodule.c, _statmodule.h, and a unified diff. Please let me know if you'd prefer a different format.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 14:23 Message: Logged In: YES user_id=6380
Another comment: we should move this to its own file so that other os.stat() implementations (esp. MacOS, maybe RiscOS) that aren't in posixmodule.c can also use it, rather than having to maintain three separate versions of the code.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-28 14:18 Message: Logged In: YES user_id=6380
One comment on the patch: beautiful use of the new type stuff, but there's something funky with the constructors going on.
It seems that the built-in __new__ (inherited from the tuple class) requires exactly one argument -- a sequence to be tuplified -- but your __init__ requires 13 arguments. So construction by using posix.stat_result(...) always fails. It makes more sense to fix the init routine to require a 13-tuple as argument. I would also recommend overriding the tp_new slot to require a 13-tuple: right now, I can cause an easy core dump as follows:

>>> import os
>>> a = os.stat_result.__new__(os.stat_result, ())
>>> a.st_ctime
Segmentation fault (core dumped)
$

---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-28 04:20 Message: Logged In: YES user_id=499 I've fixed it with the suggestions you made, and also 1) Added docstrings 2) Fixed a nasty segfault bug that would be triggered by os.stat("/foo").__class__((10,)).st_size and added tests to keep it from reappearing. I'm not sure I know how to cover Mac and RISCOS properly: riscos.stat returns a 13-element tuple, and is hence already incompatible with posix.stat; whereas mac.{stat|xstat} return differing types. If somebody with experience with these modules could give me guidance as to the Right Thing, I'll be happy to give it a shot... but my shot isn't likely to be half as good as somebody who knew the modules better. (For example, I don't have the facilities to compile macmodule or riscosmodule at all, much less test them.) I'd also be glad to make any changes that would help maintainers of those modules. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-24 08:44 Message: Logged In: YES user_id=21627 The patch looks good to me. Are you willing to revise it one more time to cover all the stat implementations? A few comments on the implementation: - Why do you try to have your type participate in GC? They will never be part of a cycle.
If that ever becomes an issue, you probably need to implement a traversal function as well. - I'd avoid declaring PosixStatResult, since the field declarations are misleading. Instead, you should just add the right number of additional fields in the type declaration. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 20:07 Message: Logged In: YES user_id=499 And here's an even better all-C version. (This one doesn't use a dictionary to store optional attributes.) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-21 18:01 Message: Logged In: YES user_id=499 Well, here's a posixmodule-only, all-C version. If this seems like a good approach, I'll add some better docstrings, move it into whichever module you like, and make riscosmodule.c and macmodule.c use it too. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-20 04:35 Message: Logged In: YES user_id=6380 Or you could put it in modsupport.c, which is already a grab-bag of handy stuff. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 18:36 Message: Logged In: YES user_id=21627 There aren't actually so many copies of the module, since posixmodule implements "posix", "nt", and "os2". I found alternative implementations in riscosmodule and macmodule. Still, putting the support type into a shared C file is appropriate. I can think of two candidate places: tupleobject.c and fileobject.c. It may actually be worthwhile attempting to share the stat() implementations as well, but that could be an add-on.
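For reference, the tuple-with-fields hybrid under discussion can be sketched in pure Python (class name and abbreviated field list are hypothetical; the actual patch implements this in C, and a real stat result has 10+ fields):

```python
class StatResult(tuple):
    """A tuple that also exposes its items as named read-only attributes."""
    _fields = ("st_mode", "st_ino", "st_dev")  # abbreviated for illustration

    def __new__(cls, seq):
        # Validating the length here avoids the core dump Guido
        # demonstrated with os.stat_result.__new__(os.stat_result, ()).
        seq = tuple(seq)
        if len(seq) != len(cls._fields):
            raise TypeError("expected %d items, got %d"
                            % (len(cls._fields), len(seq)))
        return tuple.__new__(cls, seq)

# Generate one read-only property per field, mapping name -> tuple index.
for _i, _name in enumerate(StatResult._fields):
    setattr(StatResult, _name, property(lambda self, i=_i: self[i]))

st = StatResult((0o100644, 12345, 2049))
assert st[0] == st.st_mode == 0o100644  # tuple and attribute access agree
```

The `__new__` check addresses the constructor problem discussed above: malformed input fails with TypeError at construction time instead of crashing on attribute access.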
---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-19 18:10 Message: Logged In: YES user_id=499 I'm becoming more and more convinced that doing it in C is the right thing, but I take issue with doing it in the posix module. The stat function is provided on (nearly?) all platforms, and doing it in C will require minor changes to all of these modules. We can probably live with this, but I don't think we should duplicate code between all of the os modules. Is there some other appropriate place to put it in C? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 06:52 Message: Logged In: YES user_id=21627 Using posix.stat is common, see http://groups.yahoo.com/group/python-list/message/4349 http://www.washington.edu/computing/training/125/mkdoc.html http://groups.google.com/groups?th=7d7d118fed161e0&seekm=5qdjch%24dci%40nntp6.u.washington.edu for examples. None of these would break with your change, though, since they don't rely on the length of the tuple. If you are going to implement the type in C, I'd put it in the posix module. If you are going to implement it in Python (and only use it from the Posix module), making it general-purpose may be desirable. However, a number of things would need to be considered, so a PEP might be appropriate. If that is done, I'd propose an interface like tuple_with_attrs((value-tuple), (tuple-of-field-names), exposed-length-of-tuple) ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 21:11 Message: Logged In: YES user_id=499 Ah! Now I see. I hadn't realized that anybody used the posix module directly. (People really do this?) I'll try to write up a patch in C tonight or tomorrow morning. A couple of questions on which I could use advice: (1) Where is the proper place to put this kind of tuple-with-fields hybrid? Modules?
Objects? In a new file or an existing one? (2) Should I try to make it general enough for non-stat use? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-18 07:54 Message: Logged In: YES user_id=21627 The problem with your second and third patch is that it includes an incompatibility for users of posix.stat (and friends), since it changes the size of the tuple. If you want to continue to return a tuple (as the top-level data structure), you'll break compatibility for applications using the C module directly. An example of code that would be broken is

mode, ino, dev, nlink, uid, gid, size, a, c, m = posix.stat(filename)

To pass the additional fields, you already need your class _StatResult available in C. You may find a way to define it in Python and use it in C, but that has proven to be very fragile in the past. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-18 01:54 Message: Logged In: YES user_id=6380 Haven't had time to review the patch yet, but the idea of providing a structure with fields that doubles as a tuple is a good one. It's been tried before and can be done in pure Python as well. Regarding the field names: I think the field names should keep their st_ prefix -- IMO this makes the code more recognizable and hence readable. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-18 00:32 Message: Logged In: YES user_id=499 Here's the revised (*example only*) patch that takes the more portable approach I mention below. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 23:10 Message: Logged In: YES user_id=499 On further consideration, the approach taken in the second (*example only*) patch is indeed too fragile.
The C code should not lengthen the tuple arbitrarily and depend on the Python code to decode it; instead, it should return a dictionary of extra fields. I think that this approach uses a minimum of C, is easily maintainable, and very extensible. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 22:53 Message: Logged In: YES user_id=499 Martin: I'm not entirely sure what you mean here; while my patch for extra fields requires a minor chunk of C (to access the struct fields), the rest still works in pure python. I'm attaching this second version for reference. I'm not sure it makes much sense to do this with pure C; it would certainly take a lot more code, with little benefit I can discern. But you're more experienced than I; what am I missing? I agree that the field naming is suboptimal; I was taking my lead from the stat and statvfs modules. If people prefer, we can name the fields whatever we like. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-17 22:24 Message: Logged In: YES user_id=21627 I second the request for supporting additional fields where available. At the same time, it appears unimplementable using pure Python. Consequently, I'd like to see this patch redone in C. The implementation strategy could probably remain the same, i.e. inherit from tuple for best compatibility; add the remaining fields as slots. It may be reasonable to implement attribute access using a custom getattr function, though. I also have my doubts about the naming of the fields. The st_ prefix originates from the time when struct fields were living in the global namespace (i.e. across different structures), so prefixing them for uniqueness was essential. I'm not sure whether we should inherit this into Python...
---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-09-17 20:58 Message: Logged In: YES user_id=499 BTW, if this gets in, I have another patch that adds support for st_blksize, st_blocks, and st_rdev on platforms that support them. It doesn't expose these new fields in the tuple, as that would break all the old code that tries to unpack all the fields of the tuple. Instead, these fields are only accessible as attributes. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 From noreply@sourceforge.net Tue Mar 5 16:46:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 08:46:33 -0800 Subject: [Patches] [ python-Patches-462296 ] Add attributes to os.stat results Message-ID: Patches item #462296, was opened at 2001-09-17 19:57 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Nick Mathewson (nickm) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Add attributes to os.stat results Initial Comment: See bug #111481, and PEP 0042. Both suggest that the return values for os.{stat,lstat,statvfs,fstatvfs} ought to be struct-like objects rather than simple tuples. With this patch, the os module will modify the aforementioned functions so that their results still obey the previous tuple protocol, but now have read-only attributes as well. In other words, "os.stat('filename')[0]" is now synonymous with "os.stat('filename').st_mode". The patch also modifies test_os.py to test the new behavior. In order to prevent old code from breaking, these new return types extend tuple. They also use the new attribute descriptor interface. (Thanks for PEP-025[23], Guido!)
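The dual tuple/attribute protocol described in the initial comment still holds in modern Python, where os.stat() returns a structseq object:

```python
import os

st = os.stat(".")
# The old tuple protocol still works...
mode = st[0]
# ...and the attribute added by this patch agrees with it.
assert st.st_mode == mode
# Structseq types subclass tuple, so old unpacking code keeps working.
assert isinstance(st, tuple)
```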
Backward compatibility: Code will only break if it assumes that type(os.stat(...)) == TupleType, or if it assumes that os.stat(...) has no attributes beyond those defined in tuple. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 17:46 Message: Logged In: YES user_id=21627 I'd not put the copyreg support into copy_reg, but into os.py. Pickling would save a reference to os._load_stat_result (or some such). When pickle tries to restore the value, it would first restore os._load_stat_result. For that, it would import os, which would register the copy_reg support. As for constructing structseq objects from dictionaries: it would be a ValueError if fields within [:n_sequence_fields] are not filled out; leaving out other fields is fine. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-05 17:32 Message: Logged In: YES user_id=6656 Martin, I may not have been 100% clear in my last note, but please run cvs up Objects/structseq.c structseq objects *do* now implement a __reduce__ method, but it returns a tuple. Using a dictionary would be more complicated, and not solve the issue completely: what happens when you go from a platform with fewer fields to one with more? What value does the not-prepared-for field have? Hmm, the point about nt.stat_result is a good one. Getting support into copy_reg.py leads to interesting bootstrapping problems when using uninstalled builds, unfortunately (site.py imports distutils imports re imports copy_reg; try to import, say, time, and you can't, because the whole reason to import distutils was to set up the path to find dynamically linked libraries...). ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-03-05 17:15 Message: Logged In: YES user_id=21627 To support pickling, I think structseq objects should implement a __reduce__ method, returning the type and a dictionary. The type's tp_new should accept dictionaries, and reconstruct the instance from the dictionary. Alternatively, copy_reg could grow support for stat_result, which seems desirable anyway, since os.stat returns a 'nt.stat_result' instance on Windows. Furthermore, fixing the number of arguments does not help at all in pickling; __reduce__ will return an argument tuple which includes the original object; in turn, pickle will recurse until the stack overflows. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-05 16:50 Message: Logged In: YES user_id=6656 I'm not worried about cross version problems. The problem with pickling is that stat_results (as of today) get pickled as "os.stat_result" and a tuple of arguments. The number of arguments os.stat_result takes varies by platform (it seems to be 10 on this NT box, but it's 13 on the starship, f'ex). So if a stat_result gets pickled on the starship and shoved down a socket to an NT machine, it can't be unpickled. I don't know if this sort of thing ever happens, but I could see it being surprising & annoying if I ran into it. If os.stat_result took 13 arguments everywhere, this problem obviously wouldn't arise. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-05 16:22 Message: Logged In: YES user_id=21627 Adding all fields is both difficult and undesirable. It is difficult because you may not know in advance what fields will be added in future versions, and it is undesirable because applications may think that there is a value even though there is none. What problem does that cause for pickling, and why would a complete list of all attributes solve this problem?
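The copy_reg-style approach Löwis describes, pickling through a module-level reconstructor so the pickle does not depend on the platform's constructor signature, can be sketched for a generic tuple-like type (hypothetical names; not the code that was committed):

```python
import copyreg  # named copy_reg in Python 2
import pickle

class StatResult(tuple):
    """Stand-in for a tuple-like stat result (hypothetical)."""

def _load_stat_result(values):
    # Reconstructor: rebuilds the object from a plain tuple, so the
    # pickle never encodes a platform-specific argument count.
    return StatResult(values)

def _pickle_stat_result(sr):
    # Reduction function registered with copyreg: returns the
    # reconstructor and its arguments.
    return _load_stat_result, (tuple(sr),)

copyreg.pickle(StatResult, _pickle_stat_result)

sr = StatResult((1, 2, 3))
sr2 = pickle.loads(pickle.dumps(sr))
assert sr2 == sr and type(sr2) is StatResult
```

This mirrors the design point in the thread: the receiving side only needs the reconstructor importable, not a constructor that takes exactly 10 or 13 arguments.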
---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-05 14:45 Message: Logged In: YES user_id=6656 I know this patch is closed, but it seems a vaguely sane place to ask the question: why do we vary the number of fields of os.stat_result across platforms? Wouldn't it be better to let it always have the same values & fill in ones that don't exist locally with -1 or something? It's hard to pickle os.stat_results portably the way things are at the moment... ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-11-29 21:49 Message: Logged In: YES user_id=3066 This has been checked in, edited, and checked in again. ---------------------------------------------------------------------- Comment By: Nick Mathewson (nickm) Date: 2001-10-19 00:53 Message: Logged In: YES user_id=499 Here's a documentation patch for libos.tex. I don't know the TeX macros well enough to write an analogous one for libtime.tex; fortunately, it should be fairly easy to extrapolate from the included patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 22:35 Message: Logged In: YES user_id=6380 Thanks, Nick! Good job. Checked in, just in time for 2.2b1. I'm passing this tracker entry on to Fred for documentation. (Fred, feel free to pester Nick for docs. Nick, feel free to upload approximate patches to Doc/libos.tex and Doc/libtime.tex. :-) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-18 21:24 Message: Logged In: YES user_id=6380 I'm looking at this now. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-03 15:55 Message: Logged In: YES user_id=6380 Patience, please. I'm behind reviewing this, probably won't have time today either.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462296&group_id=5470 From noreply@sourceforge.net Tue Mar 5 16:49:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 08:49:15 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 13:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function, that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way.
u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 16:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 10:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 10:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on September 10. Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 03:55 Message: Logged In: YES user_id=89016 Changing the decoding API is done now. There are new functions codecs.register_unicodedecodeerrorhandler and codecs.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered. There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.: >>> "\U1111111".decode("unicode_escape") Traceback (most recent call last): File "<stdin>", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape >>> "\U11111111".decode("unicode_escape") Traceback (most recent call last): File "<stdin>", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character For symmetry I added this to the encoding API too: >>> u"\xff".encode("ascii") Traceback (most recent call last): File "<stdin>", line 1, in ? UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128) The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere: >>> unicode("a\xffb\xffc", "ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' >>> "a\xffb\xffc".decode("ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise we would have the problem that the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway. I changed all the old functions to call the new ones so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx. There are still a few spots that call the old API: E.g.
PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere even if strict encoding/decoding is used? The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the coming weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 17:03 Message: Logged In: YES user_id=89016 New version of the patch with the error handling callback registry. > > OK, done, now there's a > > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > > codecs.escapereplace_unicodeencode_errors > > that uses \u (or \U if x>0xffff (with a wide build > > of Python)). > > Great! Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate. > > [...] > > But for special one-shot error handlers, it might still be > > useful to pass the error handler directly, so maybe we > > should leave error as PyObject *, but implement the > > registry anyway? > > Good idea ! > > One minor nit: codecs.registerError() should be named > codecs.register_errorhandler() to be more inline with > the Python coding style guide. OK, but these functions are specific to unicode encoding, so now the functions are called: codecs.register_unicodeencodeerrorhandler codecs.lookup_unicodeencodeerrorhandler Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in the codecs.c/_PyCodecRegistry_Init so using them is really simple: u"gürk".encode("ascii", "xmlcharrefreplace") ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2001-07-13 11:26 Message: Logged In: YES user_id=38388 > > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > > with \uxxxx replacement callback. > > > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > > I'd rather leave the special encoder in place, > > > > since it is being used a lot in Python and > > > > probably some applications too. > > > > > > It would be a slowdown. But callbacks open many > > > possiblities. > > > > True, but in this case I believe that we should stick with > > the native implementation for "unicode-escape". Having > > a standard callback error handler which does the \uXXXX > > replacement would be nice to have though, since this would > > also be usable with lots of other codecs (e.g. all the > > code page ones). > > OK, done, now there's a > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > codecs.escapereplace_unicodeencode_errors > that uses \u (or \U if x>0xffff (with a wide build > of Python)). Great ! > > [...] > > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK! I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? 
> > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API. ("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. > I implemented this and changed the encoders to only > lookup the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoder where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done, when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed? No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data). > Here's is the current todo list: > 1. implement a new TranslateCharmap and fix the old. > 2. New encoding API for string objects too. > 3. Decoding > 4. Documentation > 5. Test cases > > I'm thinking about a different strategy for implementing > callbacks > (see http://mail.python.org/pipermail/i18n-sig/2001- > July/001262.html) > > We coould have a error handler registry, which maps names > to error handlers, then it would be possible to keep the > errors argument as "const char *" instead of "PyObject *". 
> Currently PyCodec_UnicodeEncodeHandlerForObject is a > backwards compatibility hack that will never go away, > because > it's always more convenient to type > u"...".encode("...", "strict") > instead of > import codecs > u"...".encode("...", codecs.raise_encode_errors) > > But with an error handler registry this function would > become the official lookup method for error handlers. > (PyCodec_LookupUnicodeEncodeErrorHandler?) > Python code would look like this: > --- > def xmlreplace(encoding, unicode, pos, state): > return (u"&#%d;" % ord(unicode[pos]), pos+1) > > import codecs > > codecs.registerError("xmlreplace",xmlreplace) > --- > and then the following call can be made: > u"äöü".encode("ascii", "xmlreplace") > As soon as the first error is encountered, the encoder uses > its builtin error handling method if it recognizes the name > ("strict", "replace" or "ignore") or looks up the error > handling function in the registry if it doesn't. In this way > the speed for the backwards compatible features is the same > as before and "const char *error" can be kept as the > parameter to all encoding functions. For speed common error > handling names could even be implemented in the encoder > itself. > > But for special one-shot error handlers, it might still be > useful to pass the error handler directly, so maybe we > should leave error as PyObject *, but implement the > registry anyway? Good idea ! One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-12 11:03 Message: Logged In: YES user_id=89016 > > [...] > > so I guess we could change the replace handler > > to always return u'?'. This would make the > > implementation a little bit simpler, but the > > explanation of the callback feature *a lot* > > simpler. > > Go for it. OK, done! > [...]
> > > Could you add these docs to the Misc/unicode.txt > > > file ? I will eventually take that file and turn > > > it into a PEP which will then serve as general > > > documentation for these things. > > > > I could, but first we should work out how the > > decoding callback API will work. > > Ok. BTW, Barry Warsaw already did the work of converting > the unicode.txt to PEP 100, so the docs should eventually > go there. OK. I guess it would be best to do this when everything is finished. > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > with \uxxxx replacement callback. > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > I'd rather leave the special encoder in place, > > > since it is being used a lot in Python and > > > probably some applications too. > > > > It would be a slowdown. But callbacks open many > > possiblities. > > True, but in this case I believe that we should stick with > the native implementation for "unicode-escape". Having > a standard callback error handler which does the \uXXXX > replacement would be nice to have though, since this would > also be usable with lots of other codecs (e.g. all the > code page ones). OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/ codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)). > > For example: > > > > Why can't I print u"gürk"? > > > > is probably one of the most frequently asked > > questions in comp.lang.python. For printing > > Unicode stuff, print could be extended the use an > > error handling callback for Unicode strings (or > > objects where __str__ or tp_str returns a Unicode > > object) instead of using str() which always > > returns an 8bit string and uses strict encoding.
> > There might even be a > > sys.setprintencodehandler()/sys.getprintencodehandler () > > There already is a print callback in Python (forgot the > name of the hook though), so this should be possible by > providing the encoding logic in the hook. True: sys.displayhook > [...] > > Should the old TranslateCharmap map to the new > > TranslateCharmapEx and inherit the > > "multicharacter replacement" feature, > > or should I leave it as it is? > > If possible, please also add the multichar replacement > to the old API. I think it is very useful and since the > old APIs work on raw buffers it would be a benefit to have > the functionality in the old implementation too. OK! I will try to find the time to implement that in the next days. > [Decoding error callbacks] > > About the return value: > > I'd suggest to always use the same tuple interface, e.g. > > callback(encoding, input_data, input_position, state) -> > (output_to_be_appended, new_input_position) > > (I think it's better to use absolute values for the > position rather than offsets.) > > Perhaps the encoding callbacks should use the same > interface... what do you think ? This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API. ("almost" because, for the encoder output_to_be_appended will be reencoded, for the decoder it will simply be appended.), so I'm for it. I implemented this and changed the encoders to only lookup the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoder where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done, when an unencodable error occurs, so now the UCS1 encoder should be as fast as it was without error callbacks. Do we want to enforce new_input_position>input_position, or should jumping back be allowed? 
> > > > One additional note: It is vital that errors > > > > is an assignable attribute of the StreamWriter. > > > > > > It is already ! > > > > I know, but IMHO it should be documented that an > > assignable errors attribute must be supported > > as part of the official codec API. > > > > Misc/unicode.txt is not clear on that: > > """ > > It is not required by the Unicode implementation > > to use these base classes, only the interfaces must > > match; this allows writing Codecs as extension types. > > """ > > Good point. I'll add that to the PEP 100. OK. Here is the current todo list: 1. implement a new TranslateCharmap and fix the old. 2. New encoding API for string objects too. 3. Decoding 4. Documentation 5. Test cases I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001- July/001262.html) We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type u"...".encode("...", "strict") instead of import codecs u"...".encode("...", codecs.raise_encode_errors) But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?) Python code would look like this: --- def xmlreplace(encoding, unicode, pos, state): return (u"&#%d;" % ord(unicode[pos]), pos+1) import codecs codecs.registerError("xmlreplace",xmlreplace) --- and then the following call can be made: u"äöü".encode("ascii", "xmlreplace") As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't.
In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed common error handling names could even be implemented in the encoder itself. But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-10 12:29 Message: Logged In: YES user_id=38388 Ok, here we go... > > > raise an exception). U+FFFD characters in the > replacement > > > string will be replaced with a character that the > encoder > > > chooses ('?' in all cases). > > > > Nice. > > But the special casing of U+FFFD makes the interface > somewhat > less clean than it could be. It was only done to be 100% > backwards compatible. With the original "replace" > error > handling the codec chose the replacement character. But as > far as I can tell none of the codecs uses anything other > than '?', True. > so I guess we could change the replace handler > to always return u'?'. This would make the implementation a > little bit simpler, but the explanation of the callback > feature *a lot* simpler. Go for it. > And if you still want to handle > an unencodable U+FFFD, you can write a special callback for > that, e.g. > > def FFFDreplace(enc, uni, pos): > if uni[pos] == "\ufffd": > return u"?" > else: > raise UnicodeError(...) > > > ...docs... > > > > Could you add these docs to the Misc/unicode.txt file ? I > > will eventually take that file and turn it into a PEP > which > > will then serve as general documentation for these things. > > I could, but first we should work out how the decoding > callback API will work. Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there. 
> > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > > replacement callback. > > > > Hmm, wouldn't that result in a slowdown ? If so, I'd > rather > > leave the special encoder in place, since it is being > used a > > lot in Python and probably some applications too. > > It would be a slowdown. But callbacks open many > possiblities. True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones). > For example: > > Why can't I print u"gürk"? > > is probably one of the most frequently asked questions in > comp.lang.python. For printing Unicode stuff, print could be > extended the use an error handling callback for Unicode > strings (or objects where __str__ or tp_str returns a > Unicode object) instead of using str() which always returns > an 8bit string and uses strict encoding. There might even > be a > sys.setprintencodehandler()/sys.getprintencodehandler() There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook. > > > I have not touched PyUnicode_TranslateCharmap yet, > > > should this function also support error callbacks? Why > > > would one want the insert None into the mapping to > call > > > the callback? > > > > 1. Yes. > > 2. The user may want to e.g. restrict usage of certain > > character ranges. In this case the codec would be used to > > verify the input and an exception would indeed be useful > > (e.g. say you want to restrict input to Hangul + ASCII). > > OK, do we want TranslateCharmap to work exactly like > encoding, > i.e.
in case of an error should the returned replacement > string again be mapped through the translation mapping or > should it be copied to the output directly? The former would > be more in line with encoding, but IMHO the latter would > be much more useful. It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat. > BTW, when I implement it I can implement patch #403100 > ("Multicharacter replacements in > PyUnicode_TranslateCharmap") > along the way. I've seen it; will comment on it later. > Should the old TranslateCharmap map to the new > TranslateCharmapEx > and inherit the "multicharacter replacement" feature, > or > should I leave it as it is? If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too. [Decoding error callbacks] > > > A remaining problem is how to implement decoding error > > > callbacks. In Python 2.1 encoding and decoding errors > are > > > handled in the same way with a string value. But with > > > callbacks it doesn't make sense to use the same > callback > > > for encoding and decoding (like > codecs.StreamReaderWriter > > > and codecs.StreamRecoder do). Decoding callbacks have > a > > > different API. Which arguments should be passed to the > > > decoding callback, and what is the decoding callback > > > supposed to do? > > > > I'd suggest adding another set of PyCodec_UnicodeDecode... > () > > APIs for this. We'd then have to augment the base classes > of > > the StreamCodecs to provide two attributes for .errors > with > > a fallback solution for the string case (i.s. "strict" > can > > still be used for both directions). > > Sounds good. Now what is the decoding callback supposed to > do? 
> I guess it will be called in the same way as the encoding > callback, i.e. with encoding name, original string and > position of the error. It might returns a Unicode string > (i.e. an object of the decoding target type), that will be > emitted from the codec instead of the one offending byte. Or > it might return a tuple with replacement Unicode object and > a resynchronisation offset, i.e. returning (u"?", 1) > means > emit a '?' and skip the offending character. But to make > the offset really useful the callback has to know something > about the encoding, perhaps the codec should be allowed to > pass an additional state object to the callback? > > Maybe the same should be added to the encoding callbacks to? > Maybe the encoding callback should be able to tell the > encoder if the replacement returned should be reencoded > (in which case it's a Unicode object), or directly emitted > (in which case it's an 8bit string)? I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allow the callback to apply additional tricks. The object should be documented to be modifyable in place (simplifies the interface). About the return value: I'd suggest to always use the same tuple interface, e.g. callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position) (I think it's better to use absolute values for the position rather than offsets.) Perhaps the encoding callbacks should use the same interface... what do you think ? > > > One additional note: It is vital that errors is an > > > assignable attribute of the StreamWriter. > > > > It is already ! > > I know, but IMHO it should be documented that an assignable > errors attribute must be supported as part of the official > codec API. 
> > Misc/unicode.txt is not clear on that: > """ > It is not required by the Unicode implementation to use > these base classes, only the interfaces must match; this > allows writing Codecs as extension types. > """ Good point. I'll add that to the PEP 100. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-22 20:51 Message: Logged In: YES user_id=38388 Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 17:00 Message: Logged In: YES user_id=38388 On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 15:49 Message: Logged In: YES user_id=89016 Guido van Rossum wrote in python-dev: > True, the "codec" pattern can be used for other > encodings than Unicode. But it seems to me that the > entire codecs architecture is rather strongly geared > towards en/decoding Unicode, and it's not clear > how well other codecs fit in this pattern (e.g. I > noticed that all the non-Unicode codecs ignore the > error handling parameter or assert that > it is set to 'strict'). I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately?
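The registry debated throughout this thread is, in hindsight, what shipped as PEP 293: handlers are registered by name via codecs.register_error() and receive the exception object instead of separate (encoding, unicode, pos) arguments. A minimal sketch of Walter's xmlreplace in that final form (modern Python, for comparison only):

```python
import codecs

def xmlreplace(exc):
    # Replace each unencodable character with an XML character reference.
    if isinstance(exc, UnicodeEncodeError):
        s = "".join("&#%d;" % ord(c) for c in exc.object[exc.start:exc.end])
        return (s, exc.end)  # (replacement, position to resume encoding at)
    raise exc

codecs.register_error("xmlreplace", xmlreplace)
print("äöü".encode("ascii", "xmlreplace"))  # b'&#228;&#246;&#252;'
```

The "xmlcharrefreplace" handler discussed above also landed as a built-in, so `"äöü".encode("ascii", "xmlcharrefreplace")` gives the same result without registering anything.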
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 13:57 Message: Logged In: YES user_id=89016 > > [...] > > raise an exception). U+FFFD characters in the replacement > > string will be replaced with a character that the encoder > > chooses ('?' in all cases). > > Nice. But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g. def FFFDreplace(enc, uni, pos): if uni[pos] == "\ufffd": return u"?" else: raise UnicodeError(...) > > The implementation of the loop through the string is done > > in the following way. A stack with two strings is kept > > and the loop always encodes a character from the string > > at the stacktop. If an error is encountered and the stack > > has only one entry (during encoding of the original string) > > the callback is called and the unicode object returned is > > pushed on the stack, so the encoding continues with the > > replacement string. If the stack has two entries when an > > error is encountered, the replacement string itself has > > an unencodable character and a normal exception is raised. > > When the encoder has reached the end of its current string > > there are two possibilities: when the stack contains two > > entries, this was the replacement string, so the replacement > > string will be popped from the stack and encoding continues > > with the next character from the original string.
If the > > stack had only one entry, encoding is finished. > > Very elegant solution ! I'll put it as a comment in the source. > > (I hope that's enough explanation of the API and > implementation) > > Could you add these docs to the Misc/unicode.txt file ? I > will eventually take that file and turn it into a PEP which > will then serve as general documentation for these things. I could, but first we should work out how the decoding callback API will work. > > I have renamed the static ...121 function to all lowercase > > names. > > Ok. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > replacement callback. > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather > leave the special encoder in place, since it is being used a > lot in Python and probably some applications too. It would be a slowdown. But callbacks open many possibilities. For example: Why can't I print u"gürk"? is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler() > [...] > I think it would be worthwhile to rename the callbacks to > include "Unicode" somewhere, e.g. > PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but > then it points out the application field of the callback > rather well. Same for the callbacks exposed through the > _codecsmodule. OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;)) > > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why > > would one want the insert None into the mapping to call > > the callback? > > 1. Yes. > 2. The user may want to e.g.
restrict usage of certain > character ranges. In this case the codec would be used to > verify the input and an exception would indeed be useful > (e.g. say you want to restrict input to Hangul + ASCII). OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful. BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is? > > A remaining problem is how to implement decoding error > > callbacks. In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback > > for encoding and decoding (like codecs.StreamReaderWriter > > and codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? > > I'd suggest adding another set of PyCodec_UnicodeDecode... () > APIs for this. We'd then have to augment the base classes of > the StreamCodecs to provide two attributes for .errors with > a fallback solution for the string case (i.s. "strict" can > still be used for both directions). Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might returns a Unicode string (i.e. an object of the decoding target type), that will be emitted from the codec instead of the one offending byte. 
Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding, perhaps the codec should be allowed to pass an additional state object to the callback? Maybe the same should be added to the encoding callbacks too? Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)? > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. > > It is already ! I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API. Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """ ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 08:05 Message: Logged In: YES user_id=38388 > How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object. > PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object.
When an > unencodable character is encountered the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice. > The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string. If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception is raised. When > the encoder has reached the end of its current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be popped from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution ! > (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown ?
If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too. > PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen, if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as Python access wrapper for the internal codecs and nothing more. One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. > I have not touched PyUnicode_TranslateCharmap yet, > should this function also support error callbacks? Why would > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > A remaining problem is how to implement decoding error > callbacks. In Python 2.1 encoding and decoding errors are > handled in the same way with a string value.
But with > callbacks it doesn't make sense to use the same callback for > encoding and decoding (like codecs.StreamReaderWriter and > codecs.StreamRecoder do). Decoding callbacks have a > different API. Which arguments should be passed to the > decoding callback, and what is the decoding callback > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > One additional note: It is vital that errors is an > assignable attribute of the StreamWriter. It is already ! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used. When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > an HTML textarea! ;) I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 19:18 Message: Logged In: YES user_id=89016 One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used.
When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment") BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than an HTML textarea! ;) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:59 Message: Logged In: YES user_id=89016 How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object. When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop.
If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation) I have renamed the static ...121 function to all lowercase names. BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen, if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value.
But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 18:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names. If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the times). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 16:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject *version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data.
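The two-entry stack encoding loop described in the comments above can be sketched in pure Python. This is a hypothetical re-creation for illustration, not the actual C implementation in the patch; `encode_char` and `callback` are stand-in names:

```python
def encode_with_callback(text, encode_char, callback, encoding="ascii"):
    """Encode text character by character; on error, ask the callback
    for a replacement string and encode that before resuming.

    encode_char(ch) returns the encoded bytes or raises UnicodeError;
    callback(encoding, text, pos) returns a replacement string.
    The stack holds at most two entries: the original string at the
    bottom and, while one is being processed, the replacement on top.
    """
    out = []
    stack = [[text, 0]]              # bottom entry: the original string
    while stack:
        top = stack[-1]
        s, pos = top
        if pos >= len(s):            # finished current string: pop it
            stack.pop()              # (replacement done, or all done)
            continue
        try:
            out.append(encode_char(s[pos]))
            top[1] += 1
        except UnicodeError:
            if len(stack) == 2:      # error inside the replacement
                raise                # string itself: normal exception
            repl = callback(encoding, s, pos)
            top[1] += 1              # skip the offending character
            stack.append([repl, 0])  # encode the replacement next
    return b"".join(out)

print(encode_with_callback("Hellö",
                           lambda c: c.encode("ascii"),
                           lambda enc, s, pos: "?"))  # -> b'Hell?'
```

Note how the `len(stack) == 2` test reproduces the behaviour described above: an unencodable character in the replacement string raises a normal exception instead of recursing into the callback again.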
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 16:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? encode one-to-one, it implements both ASCII and latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones? I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h, of those PyCodec_EncodeHandlerForObject is vital, because it is used to map old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used and having two (PyObject *)s (the original and the replacement string), instead of UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode.
PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 14:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive! I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ? * module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks.
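On the assignable .errors attribute debated throughout this thread: the codecs API that Python eventually shipped supports exactly the XML scenario described above. A sketch using today's stdlib spelling (the handler names differ from the codecs.xmlreplace_encode_errors proposal in the thread):

```python
import codecs
import io

buf = io.BytesIO()
writer = codecs.getwriter("ascii")(buf, errors="strict")

# Text node: unencodable characters become character references.
writer.errors = "xmlcharrefreplace"
writer.write("caf\u00e9")

# Inside a comment, charrefs are illegal, so switch back to strict;
# the write now fails instead of emitting an invalid reference.
writer.errors = "strict"
try:
    writer.write("<!-- caf\u00e9 -->")
except UnicodeEncodeError:
    pass  # "you can't have unencodable characters inside a comment"

print(buf.getvalue())  # -> b'caf&#233;'
```

The key point from the discussion survives: errors is read on each write() call, so one StreamWriter can change its error handling mid-stream.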
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Tue Mar 5 17:49:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 09:49:47 -0800 Subject: [Patches] [ python-Patches-415226 ] new base class for binary packaging Message-ID: Patches item #415226, was opened at 2001-04-10 19:51 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=415226&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) >Assigned to: M.-A. Lemburg (lemburg) Summary: new base class for binary packaging Initial Comment: bdist_packager.py provides an abstract base class for bdist commands. It provides easy access to all the PEP 241 metadata fields, plus "revision" for the package revision and installation scripts for preinstall, postinstall, preremove, and postremove. That covers the base characteristics of all the package managers that I'm familiar with. If anyone can think of any others, let me know, otherwise additional extensions would be implemented in the specific packager's commands. I would, however, discourage _requiring_ any additional fields. It would be nice if by simply supplying the PEP241 metadata under the [bdist_packager] section all subclassed packagers worked with no further effort. It also has rudimentary relocation support by including a --no-autorelocate option. The bdist_packager is also where I see creating separate binary packages for sub-packages supported. My need for that is much less than my desire for it right now, so I didn't give it much thought as I wrote it. I'd be delighted to hear any comments and suggestions on how to approach sub-packaging, though.
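The kind of setup.cfg section the comment envisions might look like the following. This is a hypothetical fragment: the section name comes from the patch, the metadata field names from PEP 241, and the script paths are invented for illustration:

```ini
[bdist_packager]
; PEP 241 metadata fields
name = example-pkg
version = 1.0
author = Jane Doe
; extensions proposed by bdist_packager
revision = 2
preinstall = scripts/preinstall.sh
postinstall = scripts/postinstall.sh
preremove = scripts/preremove.sh
postremove = scripts/postremove.sh
```

The idea in the comment is that any bdist subclass (RPM, pkgtool, sdux, ...) would read these same keys, so one configuration could drive every packager.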
---------------------------------------------------------------------- Comment By: Mark Alexander (mwa) Date: 2001-10-02 21:10 Message: Logged In: YES user_id=12810 Regarding script code: The preinstall, postinstall, etc. scripts are hooked into the package manager specific subclasses. It's the responsibility of the specific class to "do the right thing". For *NIX package managers, this is usually script code, although changing the help text to be more informative isn't a problem. More specifically, using python scripts under pkgtool and sdux would fail. Install scripts are not executed, they're sourced (in some weird fashion I've yet to identify). Theoretically, using a shell script to find the python interpreter by querying the package manager and calling it with either -i or a runtime created script should work fine. This is intended as a class for instantiating new bdist commands with full support for pep 241. Current bdist commands do their own thing, and they do it very differently. I'd rather see this put in as a migration path than shut down bdist commands that function just fine on their own. Eventual adoption of a standard abstract base would mean that module authors could provide all metadata in a standard format, and distutils would be able to create binary packages for systems the author doesn't have access to. This works for Solaris pkgtool and HP-UX SDUX. All three patches can be included with ZERO side effects on any other aspect of Distutils. I'm really kind of curious why they're not integrated yet so others can try them out. ---------------------------------------------------------------------- Comment By: david arnold (dja) Date: 2001-09-20 09:08 Message: Logged In: YES user_id=78574 I recently struck a case where I wanted the ability to run a post-install script on Windows (from a bdist_wininst-produced package).
While I agree with what seems to be the basic intention of this patch, wouldn't it be more useful to have the various scripts run by the Python interpreter, rather than Bourne shell (which is extremely seldom available on Windows, MacOS, etc) ? I went looking for the source of the .exe file embedded in the wininst command, but couldn't find it. Does anyone know where it lives? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-06-07 05:33 Message: Logged In: YES user_id=21627 Shouldn't the patch also modify the existing bdist commands to use this as a base class? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=415226&group_id=5470 From noreply@sourceforge.net Tue Mar 5 19:00:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 05 Mar 2002 11:00:47 -0800 Subject: [Patches] [ python-Patches-526072 ] pickling os.stat results round II Message-ID: Patches item #526072, was opened at 2002-03-05 19:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Martin v. Löwis (loewis) Summary: pickling os.stat results round II Initial Comment: Following discussion in patch #462296, I've tried to implement what Martin suggested, i.e. 1) structseq's constructors now take an additional, optional second argument which should be a dictionary. If any of the "invisible" fields are not specified by the sequence first argument, their values are looked for in this dict (if not found, None is used). Extra keys are ignored. 2) structseq's __reduce__ methods return invisible fields in a dict. 3) I also fix the bug I just submitted, namely #526039. Martin, can you look the code over?
I'm not sure it's maximally-sensibly written. WRT the finding-the-type-object issue: how about making os.stat_result.__name__ == "os.stat_result" rather than "posix.stat_result". ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 From noreply@sourceforge.net Wed Mar 6 11:04:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 03:04:19 -0800 Subject: [Patches] [ python-Patches-526072 ] pickling os.stat results round II Message-ID: Patches item #526072, was opened at 2002-03-05 20:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Martin v. Löwis (loewis) Summary: pickling os.stat results round II Initial Comment: Following discussion in patch #462296, I've tried to implement what Martin suggested, i.e. 1) structseq's constructors now take an additional, optional second argument which should be a dictionary. If any of the "invisible" fields are not specified by the sequence first argument, their values are looked for in this dict (if not found, None is used). Extra keys are ignored. 2) structseq's __reduce__ methods return invisible fields in a dict. 3) I also fix the bug I just submitted, namely #526039. Martin, can you look the code over? I'm not sure it's maximally-sensibly written. WRT the finding-the-type-object issue: how about making os.stat_result.__name__ == "os.stat_result" rather than "posix.stat_result". ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-06 12:04 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again.
(This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 From noreply@sourceforge.net Wed Mar 6 11:13:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 03:13:35 -0800 Subject: [Patches] [ python-Patches-526072 ] pickling os.stat results round II Message-ID: Patches item #526072, was opened at 2002-03-05 19:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Martin v. Löwis (loewis) Summary: pickling os.stat results round II Initial Comment: Following discussion in patch #462296, I've tried to implement what Martin suggested, i.e. 1) structseq's constructors now take an additional, optional second argument which should be a dictionary. If any of the "invisible" fields are not specified by the sequence first argument, their values are looked for in this dict (if not found, None is used). Extra keys are ignored. 2) structseq's __reduce__ methods return invisible fields in a dict. 3) I also fix the bug I just submitted, namely #526039. Martin, can you look the code over? I'm not sure it's maximally-sensibly written. WRT the finding-the-type-object issue: how about making os.stat_result.__name__ == "os.stat_result" rather than "posix.stat_result". ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-06 11:13 Message: Logged In: YES user_id=6656 Oops, how embarrassing. I don't think I can blame sf for this one -- I think I just forgot. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-03-06 11:04 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 From noreply@sourceforge.net Wed Mar 6 11:17:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 03:17:55 -0800 Subject: [Patches] [ python-Patches-526072 ] pickling os.stat results round II Message-ID: Patches item #526072, was opened at 2002-03-05 19:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Martin v. Löwis (loewis) Summary: pickling os.stat results round II Initial Comment: Following discussion in patch #462296, I've tried to implement what Martin suggested, i.e. 1) structseq's constructors now take an additional, optional second argument which should be a dictionary. If any of the "invisible" fields are not specified by the sequence first argument, their values are looked for in this dict (if not found, None is used). Extra keys are ignored. 2) structseq's __reduce__ methods return invisible fields in a dict. 3) I also fix the bug I just submitted, namely #526039. Martin, can you look the code over? I'm not sure it's maximally-sensibly written. WRT the finding-the-type-object issue: how about making os.stat_result.__name__ == "os.stat_result" rather than "posix.stat_result". ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-06 11:17 Message: Logged In: YES user_id=6656 I forgot a Py_DECREF.
Look at the -pickle3.diff file. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-06 11:13 Message: Logged In: YES user_id=6656 Oops, how embarrassing. I don't think I can blame sf for this one -- I think I just forgot. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-06 11:04 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 From noreply@sourceforge.net Wed Mar 6 12:16:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 04:16:33 -0800 Subject: [Patches] [ python-Patches-526072 ] pickling os.stat results round II Message-ID: Patches item #526072, was opened at 2002-03-05 20:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 Category: Core (C code) Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Michael Hudson (mwh) >Assigned to: Michael Hudson (mwh) Summary: pickling os.stat results round II Initial Comment: Following discussion in patch #462296, I've tried to implement what Martin suggested, i.e. 1) structseq's constructors now take an additional, optional second argument which should be a dictionary. If any of the "invisible" fields are not specified by the sequence first argument, their values are looked for in this dict (if not found, None is used). Extra keys are ignored. 2) structseq's __reduce__ methods return invisible fields in a dict. 3) I also fix the bug I just submitted, namely #526039. Martin, can you look the code over?
I'm not sure it's maximally-sensibly written. WRT the finding-the-type-object issue: how about making os.stat_result.__name__ == "os.stat_result" rather than "posix.stat_result". ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-06 13:16 Message: Logged In: YES user_id=21627 The patch looks ok to me. Renaming the type to os.stat_result is one option; the other option is to add a function os._make_stat_result, and have __reduce__ return this (much like object.__reduce__ returns copy_reg._reduce). Choose whichever you like more. [the missing-upload text is a canned response; I didn't actually type all that :-] ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-06 12:17 Message: Logged In: YES user_id=6656 I forgot a Py_DECREF. Look at the -pickle3.diff file. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-06 12:13 Message: Logged In: YES user_id=6656 Oops, how embarrassing. I don't think I can blame sf for this one -- I think I just forgot. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-06 12:04 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about.
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 From noreply@sourceforge.net Wed Mar 6 17:03:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 09:03:13 -0800 Subject: [Patches] [ python-Patches-523944 ] imputil.py can't import "\r\n" .py files Message-ID: Patches item #523944, was opened at 2002-02-28 10:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Mitch Chapman (mitchchapman) >Assigned to: M.-A. Lemburg (lemburg) >Summary: imputil.py can't import "\r\n" .py files Initial Comment: __builtin__.compile() requires that codestring line endings consist of "\n". imputil._compile() does not enforce this. One result is that imputil may be unable to import modules created on Win32. The attached patch to the latest (CVS revision 1.23) imputil.py replaces both "\r\n" and "\r" with "\n" before passing a code string to __builtin__.compile(). This is consistent with the behavior of e.g. Lib/py_compile.py. ---------------------------------------------------------------------- >Comment By: Mitch Chapman (mitchchapman) Date: 2002-03-06 10:03 Message: Logged In: YES user_id=348188 Please pardon if it's inappropriate to assign patches to project developers. I'm doing so on the advice of a post by Skip Montanaro. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 From noreply@sourceforge.net Wed Mar 6 17:14:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 09:14:11 -0800 Subject: [Patches] [ python-Patches-526072 ] pickling os.stat results round II Message-ID: Patches item #526072, was opened at 2002-03-05 19:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 Category: Core (C code) Group: None >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Michael Hudson (mwh) Assigned to: Michael Hudson (mwh) Summary: pickling os.stat results round II Initial Comment: Following discussion in patch #462296, I've tried to implement what Martin suggested, i.e. 1) structseq's constructors now take an additional, optional second argument which should be a dictionary. If any of the "invisible" fields are not specified by the sequence first argument, their values are looked for in this dict (if not found, None is used). Extra keys are ignored. 2) structseq's __reduce__ methods return invisible fields in a dict. 3) I also fix the bug I just submitted, namely #526039. Martin, can you look the code over? I'm not sure it's maximally-sensibly written. WRT the finding-the-type-object issue: how about making os.stat_result.__name__ == "os.stat_result" rather than "posix.stat_result". ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-06 17:14 Message: Logged In: YES user_id=6656 Checked in this patch as Objects/structseq.c revision 1.5. Custom pickle method for stat_results (and statvfs_results) in Lib/os.py revision 1.52 (used an approach roughly like your suggestion -- which isn't what object.__reduce__ does, I think).
Tests in Lib/test/pickletester.py revision 1.14 I know about the canned response; I've been caught out by it on occasion but this time it was just me being dense. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-06 12:16 Message: Logged In: YES user_id=21627 The patch looks ok to me. Renaming the type to os.stat_result is one option; the other option is to add a function os._make_stat_result, and have __reduce__ return this (much like object.__reduce__ returns copy_reg._reduce). Choose whichever you like more. [the missing-upload text is a canned response; I didn't actually type all that :-] ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-06 11:17 Message: Logged In: YES user_id=6656 I forgot a Py_DECREF. Look at the -pickle3.diff file. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-06 11:13 Message: Logged In: YES user_id=6656 Oops, how embarrassing. I don't think I can blame sf for this one -- I think I just forgot. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-06 11:04 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about.
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526072&group_id=5470 From noreply@sourceforge.net Wed Mar 6 17:14:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 09:14:22 -0800 Subject: [Patches] [ python-Patches-523944 ] imputil.py can't import "\r\n" .py files Message-ID: Patches item #523944, was opened at 2002-02-28 17:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Mitch Chapman (mitchchapman) >Assigned to: Greg Stein (gstein) >Summary: imputil.py can't import "\r\n" .py files Initial Comment: __builtin__.compile() requires that codestring line endings consist of "\n". imputil._compile() does not enforce this. One result is that imputil may be unable to import modules created on Win32. The attached patch to the latest (CVS revision 1.23) imputil.py replaces both "\r\n" and "\r" with "\n" before passing a code string to __builtin__.compile(). This is consistent with the behavior of e.g. Lib/py_compile.py. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-06 17:14 Message: Logged In: YES user_id=38388 Assigning to Greg Stein -- imputil.py is his baby. ---------------------------------------------------------------------- Comment By: Mitch Chapman (mitchchapman) Date: 2002-03-06 17:03 Message: Logged In: YES user_id=348188 Please pardon if it's inappropriate to assign patches to project developers. I'm doing so on the advice of a post by Skip Montanaro. 
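The os._make_stat_result approach Martin suggested for the stat_result pickling item above is the standard reduce-to-a-factory pattern; a minimal sketch (the class and function names here are illustrative stand-ins, not the actual Lib/os.py code):

```python
import pickle

class StatLikeResult:
    """Toy stand-in for a structseq such as os.stat_result."""
    def __init__(self, sequence, extra=None):
        self.sequence = tuple(sequence)
        # "Invisible" fields travel in a dict, as the patch describes.
        self.extra = dict(extra or {})

    def __reduce__(self):
        # Pickle a module-level factory plus its arguments instead of
        # the type itself, so unpickling never has to locate the type
        # under its C-module name (the posix vs. os naming problem).
        return (_make_stat_like_result, (self.sequence, self.extra))

def _make_stat_like_result(sequence, extra):
    return StatLikeResult(sequence, extra)

r = StatLikeResult((33188, 42), {"st_atime": 0.0})
r2 = pickle.loads(pickle.dumps(r))
```

The factory is what gets pickled by name, which is why it must live at module level in a stable, importable location.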
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 From noreply@sourceforge.net Thu Mar 7 01:29:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 06 Mar 2002 17:29:47 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 15:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 02:29 Message: Logged In: YES user_id=89016 I started from scratch, and the current state is this: Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler).
For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached it might even be faster than the original. The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 17:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 12:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 12:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on the 10.09. Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 05:55 Message: Logged In: YES user_id=89016 Changing the decoding API is done now. There are new functions codec.register_unicodedecodeerrorhandler and codec.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered.
There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.: >>> "\U1111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape >>> "\U11111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character For symmetry I added this to the encoding API too: >>> u"\xff".encode("ascii") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128) The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere: >>> unicode("a\xffb\xffc", "ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' >>> "a\xffb\xffc".decode("ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise we would have the problem that the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway. I changed all the old functions to call the new ones so bugfixes don't have to be done in two places.
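The lambda-based decode calls above map directly onto the registry API that eventually shipped in Python as codecs.register_error, where the handler receives one exception object instead of separate arguments (the handler name "skipbad" below is mine):

```python
import codecs

def skip_undecodable(exc):
    # Drop the undecodable bytes and resume after them -- the modern
    # spelling of the (u"", pos+1) lambda shown above.
    if isinstance(exc, UnicodeDecodeError):
        return ("", exc.end)
    raise exc

codecs.register_error("skipbad", skip_undecodable)
print(b"a\xffb\xffc".decode("ascii", "skipbad"))  # abc
```

Note that the shipped API resolved the tuple-interface question discussed in this thread exactly as proposed: the handler returns (replacement, absolute resume position).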
There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx. There are still a few spots that call the old API: E.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere even if strict encoding/decoding is used? The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 19:03 Message: Logged In: YES user_id=89016 New version of the patch with the error handling callback registry. > > OK, done, now there's a > > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > > codecs.escapereplace_unicodeencode_errors > > that uses \u (or \U if x>0xffff (with a wide build > > of Python)). > > Great! Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate. > > [...] > > But for special one-shot error handlers, it might still be > > useful to pass the error handler directly, so maybe we > > should leave error as PyObject *, but implement the > > registry anyway? > > Good idea ! > > One minor nit: codecs.registerError() should be named > codecs.register_errorhandler() to be more in line with > the Python coding style guide.
OK, but these functions are specific to unicode encoding, so now the functions are called: codecs.register_unicodeencodeerrorhandler codecs.lookup_unicodeencodeerrorhandler Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in the codecs.c/_PyCodecRegistry_Init so using them is really simple: u"gürk".encode("ascii", "xmlcharrefreplace") ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-13 13:26 Message: Logged In: YES user_id=38388 > > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > > with \uxxxx replacement callback. > > > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > > I'd rather leave the special encoder in place, > > > > since it is being used a lot in Python and > > > > probably some applications too. > > > > > > It would be a slowdown. But callbacks open many > > > possibilities. > > > > True, but in this case I believe that we should stick with > > the native implementation for "unicode-escape". Having > > a standard callback error handler which does the \uXXXX > > replacement would be nice to have though, since this would > > also be usable with lots of other codecs (e.g. all the > > code page ones). > > OK, done, now there's a > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > codecs.escapereplace_unicodeencode_errors > that uses \u (or \U if x>0xffff (with a wide build > of Python)). Great ! > > [...] > > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK!
I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? > > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API. ("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. > I implemented this and changed the encoders to only > look up the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoders where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed? No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data). > Here's the current todo list: > 1. implement a new TranslateCharmap and fix the old. > 2. New encoding API for string objects too. > 3. Decoding > 4. Documentation > 5.
Test cases > > I'm thinking about a different strategy for implementing > callbacks > (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html) > > We could have an error handler registry, which maps names > to error handlers, then it would be possible to keep the > errors argument as "const char *" instead of "PyObject *". > Currently PyCodec_UnicodeEncodeHandlerForObject is a > backwards compatibility hack that will never go away, > because > it's always more convenient to type > u"...".encode("...", "strict") > instead of > import codecs > u"...".encode("...", codecs.raise_encode_errors) > > But with an error handler registry this function would > become the official lookup method for error handlers. > (PyCodec_LookupUnicodeEncodeErrorHandler?) > Python code would look like this: > --- > def xmlreplace(encoding, unicode, pos, state): > return (u"&#%d;" % ord(uni[pos]), pos+1) > > import codec > > codec.registerError("xmlreplace",xmlreplace) > --- > and then the following call can be made: > u"äöü".encode("ascii", "xmlreplace") > As soon as the first error is encountered, the encoder uses > its builtin error handling method if it recognizes the name > ("strict", "replace" or "ignore") or looks up the error > handling function in the registry if it doesn't. In this way > the speed for the backwards compatible features is the same > as before and "const char *error" can be kept as the > parameter to all encoding functions. For speed common error > handling names could even be implemented in the encoder > itself. > > But for special one-shot error handlers, it might still be > useful to pass the error handler directly, so maybe we > should leave error as PyObject *, but implement the > registry anyway? Good idea ! One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide.
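For comparison, the registry design sketched in this exchange is essentially what later landed in Python as codecs.register_error; the same xmlreplace handler looks like this in the final API, where the handler receives one exception argument and returns a replacement plus an absolute resume position:

```python
import codecs

def xmlreplace(exc):
    # Emit an XML character reference for each unencodable character,
    # then resume encoding after the offending run.
    if isinstance(exc, UnicodeEncodeError):
        refs = "".join("&#%d;" % ord(ch) for ch in exc.object[exc.start:exc.end])
        return (refs, exc.end)
    raise exc

codecs.register_error("xmlreplace", xmlreplace)
print("äöü".encode("ascii", "xmlreplace"))  # b'&#228;&#246;&#252;'
```

As discussed above, the builtin names ("strict", "replace", "ignore", and later "xmlcharrefreplace") are handled inside the encoders for speed, and the registry is only consulted for unknown names.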
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-12 13:03 Message: Logged In: YES user_id=89016 > > [...] > > so I guess we could change the replace handler > > to always return u'?'. This would make the > > implementation a little bit simpler, but the > > explanation of the callback feature *a lot* > > simpler. > > Go for it. OK, done! > [...] > > > Could you add these docs to the Misc/unicode.txt > > > file ? I will eventually take that file and turn > > > it into a PEP which will then serve as general > > > documentation for these things. > > > > I could, but first we should work out how the > > decoding callback API will work. > > Ok. BTW, Barry Warsaw already did the work of converting > the unicode.txt to PEP 100, so the docs should eventually > go there. OK. I guess it would be best to do this when everything is finished. > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > with \uxxxx replacement callback. > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > I'd rather leave the special encoder in place, > > > since it is being used a lot in Python and > > > probably some applications too. > > > > It would be a slowdown. But callbacks open many > > possibilities. > > True, but in this case I believe that we should stick with > the native implementation for "unicode-escape". Having > a standard callback error handler which does the \uXXXX > replacement would be nice to have though, since this would > also be usable with lots of other codecs (e.g. all the > code page ones). OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/ codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)). > > For example: > > > > Why can't I print u"gürk"? > > > > is probably one of the most frequently asked > > questions in comp.lang.python.
For printing > > Unicode stuff, print could be extended to use an > > error handling callback for Unicode strings (or > > objects where __str__ or tp_str returns a Unicode > > object) instead of using str() which always > > returns an 8bit string and uses strict encoding. > > There might even be a > > sys.setprintencodehandler()/sys.getprintencodehandler() > > There already is a print callback in Python (forgot the > name of the hook though), so this should be possible by > providing the encoding logic in the hook. True: sys.displayhook > [...] > > Should the old TranslateCharmap map to the new > > TranslateCharmapEx and inherit the > > "multicharacter replacement" feature, > > or should I leave it as it is? > > If possible, please also add the multichar replacement > to the old API. I think it is very useful and since the > old APIs work on raw buffers it would be a benefit to have > the functionality in the old implementation too. OK! I will try to find the time to implement that in the next days. > [Decoding error callbacks] > > About the return value: > > I'd suggest to always use the same tuple interface, e.g. > > callback(encoding, input_data, input_position, state) -> > (output_to_be_appended, new_input_position) > > (I think it's better to use absolute values for the > position rather than offsets.) > > Perhaps the encoding callbacks should use the same > interface... what do you think ? This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API. ("almost" because, for the encoder output_to_be_appended will be reencoded, for the decoder it will simply be appended.), so I'm for it. I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy.
(This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done when an unencodable error occurs, so now the UCS1 encoder should be as fast as it was without error callbacks. Do we want to enforce new_input_position>input_position, or should jumping back be allowed? > > > > One additional note: It is vital that errors > > > > is an assignable attribute of the StreamWriter. > > > > > > It is already ! > > > > I know, but IMHO it should be documented that an > > assignable errors attribute must be supported > > as part of the official codec API. > > > > Misc/unicode.txt is not clear on that: > > """ > > It is not required by the Unicode implementation > > to use these base classes, only the interfaces must > > match; this allows writing Codecs as extension types. > > """ > > Good point. I'll add that to the PEP 100. OK. Here's the current todo list: 1. implement a new TranslateCharmap and fix the old. 2. New encoding API for string objects too. 3. Decoding 4. Documentation 5. Test cases I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html) We could have an error handler registry, which maps names to error handlers, then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type u"...".encode("...", "strict") instead of import codecs u"...".encode("...", codecs.raise_encode_errors) But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?)
Python code would look like this: --- def xmlreplace(encoding, unicode, pos, state): return (u"&#%d;" % ord(uni[pos]), pos+1) import codec codec.registerError("xmlreplace",xmlreplace) --- and then the following call can be made: u"äöü".encode("ascii", "xmlreplace") As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed common error handling names could even be implemented in the encoder itself. But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-10 14:29 Message: Logged In: YES user_id=38388 Ok, here we go... > > > raise an exception). U+FFFD characters in the > replacement > > > string will be replaced with a character that the > encoder > > > chooses ('?' in all cases). > > > > Nice. > > But the special casing of U+FFFD makes the interface > somewhat > less clean than it could be. It was only done to be 100% > backwards compatible. With the original "replace" > error > handling the codec chose the replacement character. But as > far as I can tell none of the codecs uses anything other > than '?', True. > so I guess we could change the replace handler > to always return u'?'. This would make the implementation a > little bit simpler, but the explanation of the callback > feature *a lot* simpler. Go for it. > And if you still want to handle > an unencodable U+FFFD, you can write a special callback for > that, e.g.
> def FFFDreplace(enc, uni, pos): > if uni[pos] == "\ufffd": > return u"?" > else: > raise UnicodeError(...) > > > ...docs... > > > > Could you add these docs to the Misc/unicode.txt file ? I > > will eventually take that file and turn it into a PEP > which > > will then serve as general documentation for these things. > > I could, but first we should work out how the decoding > callback API will work. Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > > replacement callback. > > > > Hmm, wouldn't that result in a slowdown ? If so, I'd > rather > > leave the special encoder in place, since it is being > used a > > lot in Python and probably some applications too. > > It would be a slowdown. But callbacks open many > possibilities. True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones). > For example: > > Why can't I print u"gürk"? > > is probably one of the most frequently asked questions in > comp.lang.python. For printing Unicode stuff, print could be > extended to use an error handling callback for Unicode > strings (or objects where __str__ or tp_str returns a > Unicode object) instead of using str() which always returns > an 8bit string and uses strict encoding. There might even > be a > sys.setprintencodehandler()/sys.getprintencodehandler() There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook. > > > I have not touched PyUnicode_TranslateCharmap yet, > > > should this function also support error callbacks?
Why > > > would one want the insert None into the mapping to > call > > > the callback? > > > > 1. Yes. > > 2. The user may want to e.g. restrict usage of certain > > character ranges. In this case the codec would be used to > > verify the input and an exception would indeed be useful > > (e.g. say you want to restrict input to Hangul + ASCII). > > OK, do we want TranslateCharmap to work exactly like > encoding, > i.e. in case of an error should the returned replacement > string again be mapped through the translation mapping or > should it be copied to the output directly? The former would > be more in line with encoding, but IMHO the latter would > be much more useful. It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat. > BTW, when I implement it I can implement patch #403100 > ("Multicharacter replacements in > PyUnicode_TranslateCharmap") > along the way. I've seen it; will comment on it later. > Should the old TranslateCharmap map to the new > TranslateCharmapEx > and inherit the "multicharacter replacement" feature, > or > should I leave it as it is? If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too. [Decoding error callbacks] > > > A remaining problem is how to implement decoding error > > > callbacks. In Python 2.1 encoding and decoding errors > are > > > handled in the same way with a string value. But with > > > callbacks it doesn't make sense to use the same > callback > > > for encoding and decoding (like > codecs.StreamReaderWriter > > > and codecs.StreamRecoder do). Decoding callbacks have > a > > > different API. Which arguments should be passed to the > > > decoding callback, and what is the decoding callback > > > supposed to do? 
> > I'd suggest adding another set of PyCodec_UnicodeDecode... () > > APIs for this. We'd then have to augment the base classes > of > > the StreamCodecs to provide two attributes for .errors > with > > a fallback solution for the string case (i.e. "strict" > can > > still be used for both directions). > > Sounds good. Now what is the decoding callback supposed to > do? > I guess it will be called in the same way as the encoding > callback, i.e. with encoding name, original string and > position of the error. It might return a Unicode string > (i.e. an object of the decoding target type), that will be > emitted from the codec instead of the one offending byte. Or > it might return a tuple with replacement Unicode object and > a resynchronisation offset, i.e. returning (u"?", 1) > means > emit a '?' and skip the offending character. But to make > the offset really useful the callback has to know something > about the encoding, perhaps the codec should be allowed to > pass an additional state object to the callback? > > Maybe the same should be added to the encoding callbacks too? > Maybe the encoding callback should be able to tell the > encoder if the replacement returned should be reencoded > (in which case it's a Unicode object), or directly emitted > (in which case it's an 8bit string)? I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (simplifies the interface). About the return value: I'd suggest to always use the same tuple interface, e.g. callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position) (I think it's better to use absolute values for the position rather than offsets.) Perhaps the encoding callbacks should use the same interface... what do you think ?
> > > One additional note: It is vital that errors is an > > > assignable attribute of the StreamWriter. > > > > It is already ! > > I know, but IMHO it should be documented that an assignable > errors attribute must be supported as part of the official > codec API. > > Misc/unicode.txt is not clear on that: > """ > It is not required by the Unicode implementation to use > these base classes, only the interfaces must match; this > allows writing Codecs as extension types. > """ Good point. I'll add that to the PEP 100. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-22 22:51 Message: Logged In: YES user_id=38388 Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 19:00 Message: Logged In: YES user_id=38388 On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 17:49 Message: Logged In: YES user_id=89016 Guido van Rossum wrote in python-dev: > True, the "codec" pattern can be used for other > encodings than Unicode. But it seems to me that the > entire codecs architecture is rather strongly geared > towards en/decoding Unicode, and it's not clear > how well other codecs fit in this pattern (e.g. I > noticed that all the non-Unicode codecs ignore the > error handling parameter or assert that > it is set to 'strict'). I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error.
But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 15:57 Message: Logged In: YES user_id=89016 > > [...] > > raise an exception). U+FFFD characters in the replacement > > string will be replaced with a character that the encoder > > chooses ('?' in all cases). > > Nice. But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g. def FFFDreplace(enc, uni, pos): if uni[pos] == u"\ufffd": return u"?" else: raise UnicodeError(...) > > The implementation of the loop through the string is done > > in the following way. A stack with two strings is kept > > and the loop always encodes a character from the string > > at the stacktop. If an error is encountered and the stack > > has only one entry (during encoding of the original string) > > the callback is called and the unicode object returned is > > pushed on the stack, so the encoding continues with the > > replacement string. If the stack has two entries when an > > error is encountered, the replacement string itself has > > an unencodable character and a normal exception is raised.
> > When the encoder has reached the end of its current string > > there are two possibilities: when the stack contains two > > entries, this was the replacement string, so the replacement > > string will be popped from the stack and encoding continues > > with the next character from the original string. If the > > stack had only one entry, encoding is finished. > > Very elegant solution ! I'll put it as a comment in the source. > > (I hope that's enough explanation of the API and > implementation) > > Could you add these docs to the Misc/unicode.txt file ? I > will eventually take that file and turn it into a PEP which > will then serve as general documentation for these things. I could, but first we should work out how the decoding callback API will work. > > I have renamed the static ...121 function to all lowercase > > names. > > Ok. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > replacement callback. > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather > leave the special encoder in place, since it is being used a > lot in Python and probably some applications too. It would be a slowdown. But callbacks open many possibilities. For example: Why can't I print u"gürk"? is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler() > [...] > I think it would be worthwhile to rename the callbacks to > include "Unicode" somewhere, e.g. > PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but > then it points out the application field of the callback > rather well. Same for the callbacks exposed through the > _codecsmodule.
OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;)) > > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why > > would one want to insert None into the mapping to call > > the callback? > > 1. Yes. > 2. The user may want to e.g. restrict usage of certain > character ranges. In this case the codec would be used to > verify the input and an exception would indeed be useful > (e.g. say you want to restrict input to Hangul + ASCII). OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful. BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is? > > A remaining problem is how to implement decoding error > > callbacks. In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback > > for encoding and decoding (like codecs.StreamReaderWriter > > and codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? > > I'd suggest adding another set of PyCodec_UnicodeDecode...() > APIs for this. We'd then have to augment the base classes of > the StreamCodecs to provide two attributes for .errors with > a fallback solution for the string case (i.e. "strict" can > still be used for both directions). Sounds good. Now what is the decoding callback supposed to do?
I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type) that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with a replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding; perhaps the codec should be allowed to pass an additional state object to the callback? Maybe the same should be added to the encoding callbacks too? Maybe the encoding callback should be able to tell the encoder whether the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)? > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. > > It is already ! I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API. Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """ ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 10:05 Message: Logged In: YES user_id=38388 > How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object.
> PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object. When an > unencodable character is encountered the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice. > The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string. If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception is raised. When > the encoder has reached the end of its current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be popped from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution !
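The two-entry stack described above can be sketched in a few lines of Python (a hypothetical illustration, not the actual C implementation): the top of the stack is the string currently being encoded; an error in the original pushes the callback's replacement, while an error inside the replacement itself raises.

```python
def encode_with_callback(text, encodable, callback):
    """Sketch of the two-string stack loop: `encodable(ch)` says whether
    a character can be encoded, `callback(ch)` supplies a replacement
    string for an unencodable character of the original input."""
    out = []
    stack = [[text, 0]]  # at most two entries: [string, position]
    while stack:
        top = stack[-1]
        s, pos = top
        if pos >= len(s):
            stack.pop()  # finished the replacement (or the original)
        elif encodable(s[pos]):
            out.append(s[pos])
            top[1] += 1
        elif len(stack) == 1:
            top[1] += 1  # skip the offending character in the original...
            stack.append([callback(s[pos]), 0])  # ...and encode its replacement
        else:
            # the replacement itself is unencodable: a normal exception
            raise UnicodeError("unencodable replacement: %r" % s[pos])
    return u"".join(out)
```

For example, with an ASCII-only `encodable` test, a callback returning u"?" reproduces "replace" behaviour, and a callback returning a charref string reproduces the XML-style replacement discussed in this patch.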
> (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too. > PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen, if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as Python access wrapper for the internal codecs and nothing more. One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. 
> > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why would > > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > A remaining problem is how to implement decoding error > callbacks. In Python 2.1 encoding and decoding errors are > handled in the same way with a string value. But with > callbacks it doesn't make sense to use the same callback for > encoding and decoding (like codecs.StreamReaderWriter and > codecs.StreamRecoder do). Decoding callbacks have a > different API. Which arguments should be passed to the > decoding callback, and what is the decoding callback > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > One additional note: It is vital that errors is an > assignable attribute of the StreamWriter. It is already ! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used. When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > a HTML textarea!
;) I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 21:18 Message: Logged In: YES user_id=89016 One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used. When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment") BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 20:59 Message: Logged In: YES user_id=89016 How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object.
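As a modern aside (not part of this patch): the interface this discussion converged on eventually shipped as PEP 293, where a registered handler receives the UnicodeEncodeError and returns a (replacement, resume position) pair. A sketch of registering a custom handler with today's codecs module; the handler name "hexreplace" is invented for illustration:

```python
import codecs

def hexreplace(exc):
    # The handler receives the exception and returns (replacement, resume
    # position); exc.object[exc.start:exc.end] is the unencodable run.
    if isinstance(exc, UnicodeEncodeError):
        run = exc.object[exc.start:exc.end]
        return (u"".join(u"\\x%02x" % ord(ch) for ch in run), exc.end)
    raise exc

codecs.register_error("hexreplace", hexreplace)
```

With this registered, "Grüße".encode("ascii", "hexreplace") yields b"Gr\\xfc\\xdfe"; the builtin handlers "strict", "ignore", "replace" and "xmlcharrefreplace" correspond to the callbacks being designed in this patch.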
When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation) I have renamed the static ...121 function to all lowercase names. BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.
PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 20:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names.
If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the times). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject *version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? Encode one-to-one: it implements both ASCII and latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones?
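To illustrate the "one-to-one" naming mentioned above (a hypothetical Python sketch, not the C code in the patch): ASCII and Latin-1 can share one encoder implementation that differs only in the upper limit for directly mappable code points.

```python
def encode_one_to_one(text, limit):
    """Hypothetical sketch of the '121' (one-to-one) encoder: code
    points below `limit` map directly to bytes -- limit=128 gives
    ASCII, limit=256 gives Latin-1 (i.e. UCS1)."""
    out = bytearray()
    for pos, ch in enumerate(text):
        cp = ord(ch)
        if cp >= limit:
            # This is where the error callback would be consulted.
            raise UnicodeError(
                "character %r at position %d not in range(%d)" % (ch, pos, limit))
        out.append(cp)
    return bytes(out)
```

The shared loop explains why a single error-callback mechanism covers both codecs.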
I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h; of those PyCodec_EncodeHandlerForObject is vital, because it is used to map old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string), instead of UNICODE*/int and PyObject *, made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode. PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 16:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive ! I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ?
* module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Thu Mar 7 09:11:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 01:11:45 -0800 Subject: [Patches] [ python-Patches-526840 ] PEP 263 Implementation Message-ID: Patches item #526840, was opened at 2002-03-07 09:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 Category: Parser/Compiler Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: PEP 263 Implementation Initial Comment: The attached patch implements PEP 263. The following differences to the PEP (rev. 1.8) are known: - The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ",', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP. - The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii". ---------------------------------------------------------------------- >Comment By: Martin v.
Löwis (loewis) Date: 2002-03-07 10:11 Message: Logged In: YES user_id=21627 A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree. As such, it is the only non-terminal which has a STR value. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 From noreply@sourceforge.net Thu Mar 7 11:06:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 03:06:43 -0800 Subject: [Patches] [ python-Patches-526840 ] PEP 263 Implementation Message-ID: Patches item #526840, was opened at 2002-03-07 08:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 Category: Parser/Compiler Group: None Status: Open Resolution: None >Priority: 7 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: PEP 263 Implementation Initial Comment: The attached patch implements PEP 263. The following differences to the PEP (rev. 1.8) are known: - The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ",', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP. - The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii". ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 11:06 Message: Logged In: YES user_id=38388 Thank you !
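For context, the declaration PEP 263 defines is the coding comment in one of the first two source lines, recognized by a regular expression the PEP specifies. A simplified detector sketch (the real tokenizer additionally handles BOMs and only inspects comment text):

```python
import re

# Recognition pattern from PEP 263: the first match in the first two
# lines declares the source encoding.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

def source_encoding(lines):
    """Return the declared encoding, or None if the first two lines
    carry no declaration (simplified sketch)."""
    for line in lines[:2]:
        m = CODING_RE.search(line)
        if m:
            return m.group(1)
    return None
```

Because the pattern only requires "coding" followed by ':' or '=', both the Emacs form "# -*- coding: utf-8 -*-" and the vim form "# vim: set fileencoding=latin-1 :" are recognized.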
I'll add a note to the PEP about the way the first two lines are processed (removing the ASCII mention...). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 09:11 Message: Logged In: YES user_id=21627 A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree. As such, it is the only non-terminal which has a STR value. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 From noreply@sourceforge.net Thu Mar 7 14:06:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 06:06:10 -0800 Subject: [Patches] [ python-Patches-526840 ] PEP 263 Implementation Message-ID: Patches item #526840, was opened at 2002-03-07 03:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 Category: Parser/Compiler >Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: PEP 263 Implementation Initial Comment: The attached patch implements PEP 263. The following differences to the PEP (rev. 1.8) are known: - The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ",', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP. - The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii".
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 09:06 Message: Logged In: YES user_id=6380 I've set the group to Python 2.3 so the priority has some context (I'd rather you move the priority down to 5 but I understand this is your personal priority). I haven't accepted the PEP yet (although I expect I will), so please don't check this in yet (if you feel it needs to be saved in CVS, use a branch). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 06:06 Message: Logged In: YES user_id=38388 Thank you ! I'll add a note to the PEP about the way the first two lines are processed (removing the ASCII mention...). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 04:11 Message: Logged In: YES user_id=21627 A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree. As such, it is the only non-terminal which has a STR value.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 From noreply@sourceforge.net Thu Mar 7 16:45:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 08:45:13 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building the shared library disabled by default, while these architectures had it enabled. - it rectifies a small problem on solaris2.8 that causes double inclusion of thread.o (this produces an error from 'ld' for the shared library).
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Thu Mar 7 18:01:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 10:01:05 -0800 Subject: [Patches] [ python-Patches-526840 ] PEP 263 Implementation Message-ID: Patches item #526840, was opened at 2002-03-07 08:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: PEP 263 Implementation Initial Comment: The attached patch implements PEP 263. The following differences to the PEP (rev. 1.8) are known: - The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ",', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP. - The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii". ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 18:01 Message: Logged In: YES user_id=38388 Ok, I've had a look at the patch. It looks good except for the overly complicated implementation of the unicode-escape codec. Even though there's a bit of code duplication, I'd prefer to have two separate functions here: one for the standard char* pointer type and another one for Py_UNICODE*, i.e. PyUnicode_DecodeUnicodeEscape(char*...) and PyUnicode_DecodeUnicodeEscapeFromUnicode(Py_UNICODE*...)
This is easier to support and gives better performance since the compiler can optimize the two functions making different assumptions. You'll also need to include a name mangling at the top of the header for the new API. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 14:06 Message: Logged In: YES user_id=6380 I've set the group to Python 2.3 so the priority has some context (I'd rather you move the priority down to 5 but I understand this is your personal priority). I haven't accepted the PEP yet (although I expect I will), so please don't check this in yet (if you feel it needs to be saved in CVS, use a branch). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 11:06 Message: Logged In: YES user_id=38388 Thank you ! I'll add a note to the PEP about the way the first two lines are processed (removing the ASCII mention...). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 09:11 Message: Logged In: YES user_id=21627 A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree. As such, it is the only non-terminal which has a STR value.
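For readers unfamiliar with PEP 263: the coding declaration is a magic comment in the first two lines of a source file that tells the parser how to decode the bytes. In today's Python, where the patch discussed here has long since landed, the effect can be demonstrated with compile(), which honors the cookie when given a bytes source; the file name and variable below are purely illustrative:

```python
# PEP 263 in action: a coding cookie in the first two lines of a bytes
# source controls how string literals are decoded.
src = (
    b"# -*- coding: iso-8859-1 -*-\n"
    b"s = '\xe4'\n"  # the byte 0xE4 is a-umlaut in Latin-1
)
ns = {}
exec(compile(src, "<pep263-demo>", "exec"), ns)
print(ns["s"])  # the single character U+00E4
```

With a different (or missing) cookie the same 0xE4 byte would decode differently or be rejected, which is exactly the behavior the patch wires into the tokenizer.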
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 From noreply@sourceforge.net Thu Mar 7 18:24:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 10:24:43 -0800 Subject: [Patches] [ python-Patches-526840 ] PEP 263 Implementation Message-ID: Patches item #526840, was opened at 2002-03-07 09:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: PEP 263 Implementation Initial Comment: The attached patch implements PEP 263. The following differences to the PEP (rev. 1.8) are known: - The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ",', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP. - The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii". ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 19:24 Message: Logged In: YES user_id=21627 Changing the decoding functions will not result in one additional function, but in two of them: you'll also get PyUnicode_DecodeRawUnicodeEscapeFromUnicode. That seems quite unmaintainable to me: any change now needs to propagate into four functions. OTOH, I don't think that the code that allows parsing variable-sized strings is overly complicated. ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2002-03-07 19:01 Message: Logged In: YES user_id=38388 Ok, I've had a look at the patch. It looks good except for the overly complicated implementation of the unicode-escape codec. Even though there's a bit of code duplication, I'd prefer to have two separate functions here: one for the standard char* pointer type and another one for Py_UNICODE*, ie. PyUnicode_DecodeUnicodeEscape(char*...) and PyUnicode_DecodeUnicodeEscapeFromUnicode(Py_UNICODE*...) This is easier to support and gives better performance since the compiler can optimize the two functions making different assumptions. You'll also need to include a name mangling at the top of the header for the new API. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 15:06 Message: Logged In: YES user_id=6380 I've set the group to Python 2.3 so the priority has some context (I'd rather you move the priority down to 5 but I understand this is your personal priority). I haven't accepted the PEP yet (although I expect I will), so please don't check this in yet (if you feel it needs to be saved in CVS, use a branch). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 12:06 Message: Logged In: YES user_id=38388 Thank you ! I'll add a note to the PEP about the way the first two lines are processed (removing the ASCII mention...). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 10:11 Message: Logged In: YES user_id=21627 A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree.
As such, it is the only non-terminal which has a STR value. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 From noreply@sourceforge.net Thu Mar 7 18:41:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 10:41:21 -0800 Subject: [Patches] [ python-Patches-401022 ] Removal of SET_LINENO (experimental) Message-ID: Patches item #401022, was opened at 2000-07-30 23:08 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=401022&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Out of Date Priority: 5 Submitted By: Vladimir Marangozov (marangoz) Assigned to: Nobody/Anonymous (nobody) Summary: Removal of SET_LINENO (experimental) Initial Comment: ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-07 18:41 Message: Logged In: YES user_id=35752 I worked a bit on porting this patch to 2.2+ CVS. I ran into a snag with generators. Generators save the instruction pointer (i.e. the bytecode offset) on yield. That makes the on-the-fly bytecode translation approach more complicated. Since Guido is going to redesign the whole VM it's probably not worth spending any more effort on this. :-) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-11-27 21:54 Message: Logged In: YES user_id=31435 Unassigned again -- I'm not gonna get to this in this lifetime. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-09-10 18:51 Message: Logged In: YES user_id=6380 Tim wants to revisit this. It could be the quickest way to a 7% speedup in pystone that we can think of...
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2000-11-13 19:42 Message: Rejected. It's in the archives for reference, but for now, I don't think it's worth spending cycles worrying about this kind of stuff. I'll eventually redesign the entire VM. ---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-10-27 11:08 Message: Oops, the last patch update does not contain the f.f_lineno computation in frame_getattr. This is necessary, cf. the following messages: http://www.python.org/pipermail/python-dev/2000-July/014395.html http://www.python.org/pipermail/python-dev/2000-July/014401.html Patch assigned to Guido, for review or further assignment. ---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-10-26 00:42 Message: noreply@sourceforge.net wrote: > > Date: 2000-Oct-25 13:56 > By: gvanrossum > > Comment: > Vladimir, are you there? So-so :) I'm a moving target, checking my mail occasionally these days. Luckily, today is one of these days. > > The patch doesn't apply cleanly to the current CVS tree any more... Ah, this one's easy. Here's an update relative to 2.0 final, not CVS. I got some r/w access error trying to update my CVS copy from SF that I have no time to investigate right now... The Web interface still works though :) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2000-10-25 20:56 Message: Vladimir, are you there? The patch doesn't apply cleanly to the current CVS tree any more... ---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-08-03 19:22 Message: Fix missing DECREF on error condition in start_tracing() + some renaming. 
---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-07-31 17:50 Message: A last tiny fix of the SET_LINENO opcode for better b/w compatibility. Stopping here and entering standby mode for reactions & feedback. PS: the last idea about not duplicating co_code and tweaking the original with CALL_TRACE is a bad one. I remember Guido being against it because co_code could be used elsewhere (copied, written to disk, whatever) and he's right! Better operate on an internal copy created in ceval. ---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-07-31 14:57 Message: Another rewrite, making this whole strategy b/w compatible according to the 1st incompatibility point a) described in: http://www.python.org/pipermail/python-dev/2000-July/014364.html Changes: 1. f.f_lineno is computed and updated on f_lineno attribute requests for f, given f.f_lasti. Correctness is ensured because f.f_lasti is updated on *all* attribute accesses (in LOAD_ATTR in the main loop). 2. The standard setup does not generate SET_LINENO, but uses co_lnotab for computing the source line number (e.g. tracebacks). This is equivalent to the current "python -O". 3. With "python -d", we fall back to the current version of the interpreter (with SET_LINENO) thus making it easy to test whether this patch fully replaces SET_LINENO's behavior. (modulo f->f_lineno accesses from legacy C code, but this is insane). IMO, this version is already worth the effort of being truly tested and improved. One improvement is to define a nicer public C API for breakpoints: - PyCode_SetBreakPointAtLine(line) - PyCode_SetBreakPointAtAddr(addr) or similar, which would install a CALL_TRACE opcode in the appropriate location of the copy of co_code. Another idea is to avoid duplicating the entire co_code just for storing the CALL_TRACE opcodes.
We can store them in the original and keep a table of breakpoints. Setting the breakpoints would occur whenever the sys.settrace hook is set. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2000-07-31 13:40 Message: Status set to postponed to indicate that this is still experimental. ---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-07-31 01:16 Message: A nit: inline the argfetch in CALL_TRACE and goto the switch, instead of jumping to get_oparg which splits the sequence [fetch opcode, fetch oparg] -- this can slow things down. ---------------------------------------------------------------------- Comment By: Vladimir Marangozov (marangoz) Date: 2000-07-30 23:12 Message: For testing, as discussed on python-dev. For a gentle summary, see: http://www.python.org/pipermail/python-dev/2000-July/014364.html ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=401022&group_id=5470 From noreply@sourceforge.net Thu Mar 7 21:41:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 13:41:08 -0800 Subject: [Patches] [ python-Patches-525109 ] Extension to Calltips / Show attributes Message-ID: Patches item #525109, was opened at 2002-03-03 11:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 Category: IDLE Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Martin Liebmann (mliebmann) Assigned to: Nobody/Anonymous (nobody) Summary: Extension to Calltips / Show attributes Initial Comment: The attached files (unified diff files) implement a (quick and dirty but useful) extension to IDLE 0.8 (Python 2.2) - Tested on WINDOWS 95/98/NT/2000 - Similar to "CallTips" this extension shows (context sensitive) all
available member functions and attributes of the current object after hitting the 'dot'-key. The toplevel help widget now supports scrolling. (Key-Up and Key-Down events) ...that is why I changed, among other things, the first argument of 'showtip' from 'text string' to a 'list of text strings' ... The 'space'-key is used to insert the topmost item of the help widget into an IDLE text window. ...the event handling seems to be a critical part of the current IDLE implementation. That is why I added the new functionality as a patch of CallTips.py and CallTipWindow.py. Maybe you still have a better implementation ... Greetings Martin Liebmann ---------------------------------------------------------------------- >Comment By: Martin Liebmann (mliebmann) Date: 2002-03-07 21:41 Message: Logged In: YES user_id=475133 Patched and more robust version of the extended files CallTips.py and CallTipWindows.py. (Now more compatible with earlier versions of python) ---------------------------------------------------------------------- Comment By: Martin Liebmann (mliebmann) Date: 2002-03-03 22:02 Message: Logged In: YES user_id=475133 '' must be substituted by '.' within CallTip.py ! ( Linux does not support an event named ) Running idle on Linux, I found the warning that 'import *' is not allowed within function '_dir_main' of CallTip.py ???
Nevertheless CallTips works fine on Linux ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 From noreply@sourceforge.net Thu Mar 7 22:28:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 14:28:08 -0800 Subject: [Patches] [ python-Patches-524327 ] imaplib.py and SSL Message-ID: Patches item #524327, was opened at 2002-03-01 14:46 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Tino Lange (tinolange) Assigned to: Piers Lauder (pierslauder) Summary: imaplib.py and SSL Initial Comment: Hello! Our company has decided to allow only SSL connections to the e-mailbox from outside. So I needed an SSL-capable "imaplib.py" to run my mailwatcher-scripts from home. Thanks to the socket.ssl() in recent Pythons it was nearly no problem to derive an IMAP4_SSL-class from the existing IMAP4-class in Python's standard library. Maybe you want to look over the very small additions that were necessary to implement the IMAP-over-SSL functionality and add it as a part of the next official "imaplib.py"? Here's the context diff from the most recent CVS version (1.43). It works fine for me this way and it's only a few straightforward lines of code. Maybe I could contribute a bit to the Python project with this patch? Best regards Tino Lange ---------------------------------------------------------------------- >Comment By: Tino Lange (tinolange) Date: 2002-03-07 23:28 Message: Logged In: YES user_id=212920 Hi Piers! Here we are ... diffs attached.
Best regards Tino ---------------------------------------------------------------------- Comment By: Piers Lauder (pierslauder) Date: 2002-03-04 23:55 Message: Logged In: YES user_id=196212 Ok, (the boring bit :-) please provide a matching patch for the documentation (in dist/src/Doc/lib/libimaplib.tex), and I'll install both patches. Thanks Tino! ---------------------------------------------------------------------- Comment By: Tino Lange (tinolange) Date: 2002-03-04 11:55 Message: Logged In: YES user_id=212920 Hello! socket.ssl() objects only have _two_ methods: read() and write(). I don't know how they handle write() internally - whether they use a send() or a sendall() equivalent for the underlying socket call. I didn't look in the C sources for that. That's also why I had to code the readline() by hand in the while-loop, because socket.ssl() objects only have read(), no readline(). But the implementation works quite fine (by the way also under Windows after replacing the _socket.pyd with an SSL-enabled one). Best regards Tino ---------------------------------------------------------------------- Comment By: Piers Lauder (pierslauder) Date: 2002-03-04 06:47 Message: Logged In: YES user_id=196212 This seems fine to me, but i can't test it as i don't have access to an ssl-enabled imapd. My only caveat is - do socket.ssl objects have a "sendall" method? - in which case that is what should be used in the send method.
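Tino's readline-by-hand loop over an object that offers only read() can be sketched as below. This is a modern Python 3 rendering with bytes, and the class names (SSLFakeFile, FakeSSL) are invented for illustration rather than taken from the patch:

```python
import io

class SSLFakeFile:
    """Sketch of the readline-by-hand approach: wrap an object that
    only offers read(n) and rebuild readline() on top of it."""
    def __init__(self, sslobj):
        self.sslobj = sslobj

    def readline(self):
        chunks = []
        while True:
            c = self.sslobj.read(1)   # one byte at a time; no readline() available
            chunks.append(c)
            if not c or c == b"\n":   # stop at EOF or end of line
                break
        return b"".join(chunks)

class FakeSSL:
    """Stand-in for an SSL connection, so the sketch is self-contained."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)
    def read(self, n=1):
        return self._buf.read(n)

f = SSLFakeFile(FakeSSL(b"* OK IMAP4rev1 ready\r\nA001 LOGIN ...\r\n"))
print(f.readline())  # first protocol line, CRLF terminator included
```

Reading one byte per call is slow but correct; a buffered variant would read larger chunks and split on b"\n" itself.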
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=524327&group_id=5470 From noreply@sourceforge.net Thu Mar 7 23:09:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 15:09:58 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 15:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 00:09 Message: Logged In: YES user_id=89016 I'm thinking about extending the API a little bit: Consider the following example: >>> "\u1".decode("unicode-escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape The error message is a lie: the '1' in position 2 is not the problem, but the complete truncated sequence '\u1'.
For this the decoder should pass a start and an end position to the handler. For encoding this would be useful too: Suppose I want to have an encoder that colors the unencodable character via ANSI escape sequences. Then I could do the following: >>> import codecs >>> def color(enc, uni, pos, why, sta): ... return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1) ... >>> codecs.register_unicodeencodeerrorhandler("color", color) >>> u"aäüöo".encode("ascii", "color") 'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo' But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position). This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in case of custom encoding names. (And it makes the implementation very interesting ;)) What do you think? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 02:29 Message: Logged In: YES user_id=89016 I started from scratch, and the current state is this: Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler). For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached it might even be faster than the original. The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2002-03-05 17:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 12:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 12:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on the 10.09. Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 05:55 Message: Logged In: YES user_id=89016 Changing the decoding API is done now. There are new functions codec.register_unicodedecodeerrorhandler and codec.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered. There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.: >>> "\U1111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape >>> "\U11111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character For symmetry I added this to the encoding API too: >>> u"\xff".encode("ascii") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128) The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere: >>> unicode("a\xffb\xffc", "ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' >>> "a\xffb\xffc".decode("ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise we would have the problem that the decoding API would must pass buffer object around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway. I changed all the old function to call the new ones so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots) This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx. There are still a few spots that call the old API: E.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere even if strict encoding/decoding is used? The size of this patch begins to scare me. 
I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 19:03 Message: Logged In: YES user_id=89016 New version of the patch with the error handling callback registry. > > OK, done, now there's a > > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > > codecs.escapereplace_unicodeencode_errors > > that uses \u (or \U if x>0xffff (with a wide build > > of Python)). > > Great! Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate. > > [...] > > But for special one-shot error handlers, it might still be > > useful to pass the error handler directly, so maybe we > > should leave error as PyObject *, but implement the > > registry anyway? > > Good idea ! > > One minor nit: codecs.registerError() should be named > codecs.register_errorhandler() to be more inline with > the Python coding style guide. OK, but these functions are specific to unicode encoding, so now the functions are called: codecs.register_unicodeencodeerrorhandler codecs.lookup_unicodeencodeerrorhandler Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in the codecs.c/_PyCodecRegistry_Init so using them is really simple: u"gürk".encode("ascii", "xmlcharrefreplace") ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-13 13:26 Message: Logged In: YES user_id=38388 > > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > > with \uxxxx replacement callback. > > > > > > > > Hmm, wouldn't that result in a slowdown ?
If so, > > > > I'd rather leave the special encoder in place, > > > > since it is being used a lot in Python and > > > > probably some applications too. > > > > > > It would be a slowdown. But callbacks open many > > > possibilities. > > > > True, but in this case I believe that we should stick with > > the native implementation for "unicode-escape". Having > > a standard callback error handler which does the \uXXXX > > replacement would be nice to have though, since this would > > also be usable with lots of other codecs (e.g. all the > > code page ones). > > OK, done, now there's a > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > codecs.escapereplace_unicodeencode_errors > that uses \u (or \U if x>0xffff (with a wide build > of Python)). Great ! > > [...] > > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK! I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? > > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API.
("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. > I implemented this and changed the encoders to only > look up the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoders where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed? No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data). > Here's the current todo list: > 1. implement a new TranslateCharmap and fix the old. > 2. New encoding API for string objects too. > 3. Decoding > 4. Documentation > 5. Test cases > > I'm thinking about a different strategy for implementing > callbacks > (see http://mail.python.org/pipermail/i18n-sig/2001- > July/001262.html) > > We could have an error handler registry, which maps names > to error handlers, then it would be possible to keep the > errors argument as "const char *" instead of "PyObject *". > Currently PyCodec_UnicodeEncodeHandlerForObject is a > backwards compatibility hack that will never go away, > because > it's always more convenient to type > u"...".encode("...", "strict") > instead of > import codecs > u"...".encode("...", codecs.raise_encode_errors) > > But with an error handler registry this function would > become the official lookup method for error handlers. > (PyCodec_LookupUnicodeEncodeErrorHandler?)
> Python code would look like this: > --- > def xmlreplace(encoding, unicode, pos, state): > return (u"&#%d;" % ord(uni[pos]), pos+1) > > import codec > > codec.registerError("xmlreplace",xmlreplace) > --- > and then the following call can be made: > u"äöü".encode("ascii", "xmlreplace") > As soon as the first error is encountered, the encoder uses > its builtin error handling method if it recognizes the name > ("strict", "replace" or "ignore") or looks up the error > handling function in the registry if it doesn't. In this way > the speed for the backwards compatible features is the same > as before and "const char *error" can be kept as the > parameter to all encoding functions. For speed common error > handling names could even be implemented in the encoder > itself. > > But for special one-shot error handlers, it might still be > useful to pass the error handler directly, so maybe we > should leave error as PyObject *, but implement the > registry anyway? Good idea ! One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-12 13:03 Message: Logged In: YES user_id=89016 > > [...] > > so I guess we could change the replace handler > > to always return u'?'. This would make the > > implementation a little bit simpler, but the > > explanation of the callback feature *a lot* > > simpler. > > Go for it. OK, done! > [...] > > > Could you add these docs to the Misc/unicode.txt > > > file ? I will eventually take that file and turn > > > it into a PEP which will then serve as general > > > documentation for these things. > > > > I could, but first we should work out how the > > decoding callback API will work. > > Ok. BTW, Barry Warsaw already did the work of converting > the unicode.txt to PEP 100, so the docs should eventually > go there. OK.
I guess it would be best to do this when everything is finished.

> > > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> > > > could be reimplemented as PyUnicode_EncodeASCII
> > > > with a \uxxxx replacement callback.
> > >
> > > Hmm, wouldn't that result in a slowdown ? If so,
> > > I'd rather leave the special encoder in place,
> > > since it is being used a lot in Python and
> > > probably some applications too.
> >
> > It would be a slowdown. But callbacks open many
> > possibilities.
>
> True, but in this case I believe that we should stick with
> the native implementation for "unicode-escape". Having
> a standard callback error handler which does the \uXXXX
> replacement would be nice to have though, since this would
> also be usable with lots of other codecs (e.g. all the
> code page ones).

OK, done; now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff, with a wide build of Python).

> > For example:
> >
> > Why can't I print u"gürk"?
> >
> > is probably one of the most frequently asked
> > questions in comp.lang.python. For printing
> > Unicode stuff, print could be extended to use an
> > error handling callback for Unicode strings (or
> > objects where __str__ or tp_str returns a Unicode
> > object) instead of using str(), which always
> > returns an 8bit string and uses strict encoding.
> > There might even be a
> > sys.setprintencodehandler()/sys.getprintencodehandler()
>
> There already is a print callback in Python (forgot the
> name of the hook though), so this should be possible by
> providing the encoding logic in the hook.

True: sys.displayhook

> [...]
> > Should the old TranslateCharmap map to the new
> > TranslateCharmapEx and inherit the
> > "multicharacter replacement" feature,
> > or should I leave it as it is?
>
> If possible, please also add the multichar replacement
> to the old API.
I think it is very useful, and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.

OK! I will try to find the time to implement that in the next few days.

> [Decoding error callbacks]
>
> About the return value:
>
> I'd suggest to always use the same tuple interface, e.g.
>
>   callback(encoding, input_data, input_position, state) ->
>       (output_to_be_appended, new_input_position)
>
> (I think it's better to use absolute values for the
> position rather than offsets.)
>
> Perhaps the encoding callbacks should use the same
> interface... what do you think ?

This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API ("almost" because for the encoder output_to_be_appended will be reencoded, while for the decoder it will simply be appended), so I'm for it.

I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So memory-overflow tests are only done when an unencodable character occurs, and the UCS1 encoder should be as fast as it was without error callbacks.

Do we want to enforce new_input_position > input_position, or should jumping back be allowed?

> > > > One additional note: It is vital that errors
> > > > is an assignable attribute of the StreamWriter.
> > >
> > > It is already !
> >
> > I know, but IMHO it should be documented that an
> > assignable errors attribute must be supported
> > as part of the official codec API.
> >
> > Misc/unicode.txt is not clear on that:
> > """
> > It is not required by the Unicode implementation
> > to use these base classes, only the interfaces must
> > match; this allows writing Codecs as extension types.
> > """
>
> Good point.
> I'll add that to the PEP 100.

OK. Here is the current todo list:
1. Implement a new TranslateCharmap and fix the old.
2. New encoding API for string objects too.
3. Decoding
4. Documentation
5. Test cases

I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html).

We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type

    u"...".encode("...", "strict")

instead of

    import codecs
    u"...".encode("...", codecs.raise_encode_errors)

But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?)

Python code would look like this:
---
def xmlreplace(encoding, unicode, pos, state):
    return (u"&#%d;" % ord(unicode[pos]), pos+1)

import codecs

codecs.registerError("xmlreplace", xmlreplace)
---
and then the following call can be made:
u"äöü".encode("ascii", "xmlreplace")

As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *errors" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself.

But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave errors as PyObject *, but implement the registry anyway?

----------------------------------------------------------------------

Comment By: M.-A.
Lemburg (lemburg)
Date: 2001-07-10 14:29
Message:
Logged In: YES
user_id=38388

Ok, here we go...

> > > raise an exception). U+FFFD characters in the replacement
> > > string will be replaced with a character that the encoder
> > > chooses ('?' in all cases).
> >
> > Nice.
>
> But the special casing of U+FFFD makes the interface somewhat
> less clean than it could be. It was only done to be 100%
> backwards compatible. With the original "replace" error
> handling the codec chose the replacement character. But as
> far as I can tell none of the codecs uses anything other
> than '?',

True.

> so I guess we could change the replace handler
> to always return u'?'. This would make the implementation a
> little bit simpler, but the explanation of the callback
> feature *a lot* simpler.

Go for it.

> And if you still want to handle
> an unencodable U+FFFD, you can write a special callback for
> that, e.g.
>
> def FFFDreplace(enc, uni, pos):
>     if uni[pos] == u"\ufffd":
>         return u"?"
>     else:
>         raise UnicodeError(...)
>
> > > ...docs...
> >
> > Could you add these docs to the Misc/unicode.txt file ? I
> > will eventually take that file and turn it into a PEP which
> > will then serve as general documentation for these things.
>
> I could, but first we should work out how the decoding
> callback API will work.

Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there.

> > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> > > replacement callback.
> >
> > Hmm, wouldn't that result in a slowdown ? If so, I'd rather
> > leave the special encoder in place, since it is being used a
> > lot in Python and probably some applications too.
>
> It would be a slowdown. But callbacks open many
> possibilities.

True, but in this case I believe that we should stick with the native implementation for "unicode-escape".
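The FFFDreplace sketch above translates directly to the error-handler API that eventually shipped (codecs.register_error), where the handler receives the exception object; a minimal sketch, with the registered name chosen here only for illustration:

```python
import codecs

def fffdreplace(exc):
    # Replace only U+FFFD with '?'; any other unencodable
    # character still raises the original exception.
    if isinstance(exc, UnicodeEncodeError) and exc.object[exc.start] == u"\ufffd":
        return (u"?", exc.start + 1)
    raise exc

codecs.register_error("fffdreplace", fffdreplace)

u"a\ufffdb".encode("ascii", "fffdreplace")  # b'a?b'
```

Any character other than U+FFFD, e.g. u"ä", still raises UnicodeEncodeError under this handler, which is precisely the point of the special-case.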
Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones).

> For example:
>
> Why can't I print u"gürk"?
>
> is probably one of the most frequently asked questions in
> comp.lang.python. For printing Unicode stuff, print could be
> extended to use an error handling callback for Unicode
> strings (or objects where __str__ or tp_str returns a
> Unicode object) instead of using str(), which always returns
> an 8bit string and uses strict encoding. There might even
> be a
> sys.setprintencodehandler()/sys.getprintencodehandler()

There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook.

> > > I have not touched PyUnicode_TranslateCharmap yet;
> > > should this function also support error callbacks? Why
> > > would one want to insert None into the mapping to call
> > > the callback?
> >
> > 1. Yes.
> > 2. The user may want to e.g. restrict usage of certain
> > character ranges. In this case the codec would be used to
> > verify the input and an exception would indeed be useful
> > (e.g. say you want to restrict input to Hangul + ASCII).
>
> OK, do we want TranslateCharmap to work exactly like
> encoding, i.e. in case of an error should the returned
> replacement string again be mapped through the translation
> mapping or should it be copied to the output directly? The
> former would be more in line with encoding, but IMHO the
> latter would be much more useful.

It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat.

> BTW, when I implement it I can implement patch #403100
> ("Multicharacter replacements in
> PyUnicode_TranslateCharmap") along the way.
I've seen it; will comment on it later.

> Should the old TranslateCharmap map to the new
> TranslateCharmapEx and inherit the "multicharacter
> replacement" feature, or should I leave it as it is?

If possible, please also add the multichar replacement to the old API. I think it is very useful, and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.

[Decoding error callbacks]

> > > A remaining problem is how to implement decoding error
> > > callbacks. In Python 2.1 encoding and decoding errors are
> > > handled in the same way with a string value. But with
> > > callbacks it doesn't make sense to use the same callback
> > > for encoding and decoding (like codecs.StreamReaderWriter
> > > and codecs.StreamRecoder do). Decoding callbacks have a
> > > different API. Which arguments should be passed to the
> > > decoding callback, and what is the decoding callback
> > > supposed to do?
> >
> > I'd suggest adding another set of PyCodec_UnicodeDecode...()
> > APIs for this. We'd then have to augment the base classes of
> > the StreamCodecs to provide two attributes for .errors with
> > a fallback solution for the string case (i.e. "strict" can
> > still be used for both directions).
>
> Sounds good. Now what is the decoding callback supposed to
> do? I guess it will be called in the same way as the encoding
> callback, i.e. with encoding name, original string and
> position of the error. It might return a Unicode string
> (i.e. an object of the decoding target type) that will be
> emitted from the codec instead of the one offending byte. Or
> it might return a tuple with a replacement Unicode object and
> a resynchronisation offset, i.e. returning (u"?", 1) means
> emit a '?' and skip the offending character.
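The decode-side interface being weighed here — a replacement string plus a resynchronisation position — is the shape that eventually shipped, except that the handler receives a UnicodeDecodeError and returns an absolute resume position rather than an offset. A small sketch with codecs.register_error (the handler name "qmark" is made up for illustration):

```python
import codecs

def qmark_decode(exc):
    # Emit '?' for the offending byte and resume right after it,
    # using an absolute position as argued for in this thread.
    if isinstance(exc, UnicodeDecodeError):
        return (u"?", exc.start + 1)
    raise exc

codecs.register_error("qmark", qmark_decode)

b"ab\xffcd".decode("ascii", "qmark")  # 'ab?cd'
```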
> But to make the offset really useful the callback has to
> know something about the encoding; perhaps the codec should
> be allowed to pass an additional state object to the
> callback?
>
> Maybe the same should be added to the encoding callbacks
> too? Maybe the encoding callback should be able to tell the
> encoder whether the replacement returned should be reencoded
> (in which case it's a Unicode object) or directly emitted
> (in which case it's an 8bit string)?

I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (simplifies the interface).

About the return value:

I'd suggest to always use the same tuple interface, e.g.

  callback(encoding, input_data, input_position, state) ->
      (output_to_be_appended, new_input_position)

(I think it's better to use absolute values for the position rather than offsets.)

Perhaps the encoding callbacks should use the same interface... what do you think ?

> > > One additional note: It is vital that errors is an
> > > assignable attribute of the StreamWriter.
> >
> > It is already !
>
> I know, but IMHO it should be documented that an assignable
> errors attribute must be supported as part of the official
> codec API.
>
> Misc/unicode.txt is not clear on that:
> """
> It is not required by the Unicode implementation to use
> these base classes, only the interfaces must match; this
> allows writing Codecs as extension types.
> """

Good point. I'll add that to the PEP 100.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-22 22:51
Message:
Logged In: YES
user_id=38388

Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy...

----------------------------------------------------------------------

Comment By: M.-A.
Lemburg (lemburg)
Date: 2001-06-13 19:00
Message:
Logged In: YES
user_id=38388

On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 17:49
Message:
Logged In: YES
user_id=89016

Guido van Rossum wrote in python-dev:

> True, the "codec" pattern can be used for other
> encodings than Unicode. But it seems to me that the
> entire codecs architecture is rather strongly geared
> towards en/decoding Unicode, and it's not clear
> how well other codecs fit in this pattern (e.g. I
> noticed that all the non-Unicode codecs ignore the
> error handling parameter or assert that
> it is set to 'strict').

I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 15:57
Message:
Logged In: YES
user_id=89016

> > [...]
> > raise an exception). U+FFFD characters in the replacement
> > string will be replaced with a character that the encoder
> > chooses ('?' in all cases).
>
> Nice.

But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character.
But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.

    def FFFDreplace(enc, uni, pos):
        if uni[pos] == u"\ufffd":
            return u"?"
        else:
            raise UnicodeError(...)

> > The implementation of the loop through the string is done
> > in the following way. A stack with two strings is kept
> > and the loop always encodes a character from the string
> > at the stacktop. If an error is encountered and the stack
> > has only one entry (during encoding of the original string)
> > the callback is called and the unicode object returned is
> > pushed on the stack, so the encoding continues with the
> > replacement string. If the stack has two entries when an
> > error is encountered, the replacement string itself has
> > an unencodable character and a normal exception is raised.
> > When the encoder has reached the end of its current string
> > there are two possibilities: when the stack contains two
> > entries, this was the replacement string, so the replacement
> > string will be popped from the stack and encoding continues
> > with the next character from the original string. If the
> > stack had only one entry, encoding is finished.
>
> Very elegant solution !

I'll put it as a comment in the source.

> > (I hope that's enough explanation of the API and
> > implementation)
>
> Could you add these docs to the Misc/unicode.txt file ? I
> will eventually take that file and turn it into a PEP which
> will then serve as general documentation for these things.

I could, but first we should work out how the decoding callback API will work.

> > I have renamed the static ...121 function to all lowercase
> > names.
>
> Ok.
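The two-string stack strategy quoted above can be sketched in pure Python (the charmap and callback here are made up for illustration; the real implementation is C code in unicodeobject.c):

```python
def encode_with_callback(text, charmap, callback):
    # The stack holds at most two (string, position) pairs: the
    # original string and, while one is being consumed, a
    # replacement string returned by the callback.
    out = []
    stack = [(text, 0)]
    while stack:
        s, pos = stack.pop()
        while pos < len(s):
            ch = s[pos]
            if ch in charmap:
                out.append(charmap[ch])
                pos += 1
            elif not stack:
                # Error in the original string: remember where to
                # resume, then encode the replacement string first.
                stack.append((s, pos + 1))
                s, pos = callback(ch), 0
            else:
                # Error while encoding the replacement itself.
                raise UnicodeError("unencodable character %r in replacement" % ch)
    return "".join(out)

ascii_map = {chr(i): chr(i) for i in range(128)}
encode_with_callback(u"a\u20acb", ascii_map, lambda ch: u"?")  # 'a?b'
```

The two branches mirror the description exactly: a one-entry stack means we are in the original string and may call the callback; a two-entry stack means the replacement itself failed, so a normal exception is raised.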
> > BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> > reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> > replacement callback.
>
> Hmm, wouldn't that result in a slowdown ? If so, I'd rather
> leave the special encoder in place, since it is being used a
> lot in Python and probably some applications too.

It would be a slowdown. But callbacks open many possibilities. For example:

    Why can't I print u"gürk"?

is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str(), which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler()

> [...]
> I think it would be worthwhile to rename the callbacks to
> include "Unicode" somewhere, e.g.
> PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but
> then it points out the application field of the callback
> rather well. Same for the callbacks exposed through the
> _codecsmodule.

OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;))

> > I have not touched PyUnicode_TranslateCharmap yet;
> > should this function also support error callbacks? Why
> > would one want to insert None into the mapping to call
> > the callback?
>
> 1. Yes.
> 2. The user may want to e.g. restrict usage of certain
> character ranges. In this case the codec would be used to
> verify the input and an exception would indeed be useful
> (e.g. say you want to restrict input to Hangul + ASCII).

OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful.
BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?

> > A remaining problem is how to implement decoding error
> > callbacks. In Python 2.1 encoding and decoding errors are
> > handled in the same way with a string value. But with
> > callbacks it doesn't make sense to use the same callback
> > for encoding and decoding (like codecs.StreamReaderWriter
> > and codecs.StreamRecoder do). Decoding callbacks have a
> > different API. Which arguments should be passed to the
> > decoding callback, and what is the decoding callback
> > supposed to do?
>
> I'd suggest adding another set of PyCodec_UnicodeDecode...()
> APIs for this. We'd then have to augment the base classes of
> the StreamCodecs to provide two attributes for .errors with
> a fallback solution for the string case (i.e. "strict" can
> still be used for both directions).

Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type) that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with a replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding; perhaps the codec should be allowed to pass an additional state object to the callback?

Maybe the same should be added to the encoding callbacks too?
Maybe the encoding callback should be able to tell the encoder whether the replacement returned should be reencoded (in which case it's a Unicode object) or directly emitted (in which case it's an 8bit string)?

> > One additional note: It is vital that errors is an
> > assignable attribute of the StreamWriter.
>
> It is already !

I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.

Misc/unicode.txt is not clear on that:
"""
It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types.
"""

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 10:05
Message:
Logged In: YES
user_id=38388

> How the callbacks work:
>
> A PyObject * named errors is passed in. This may be NULL,
> Py_None, 'strict', u'strict', 'ignore', u'ignore',
> 'replace', u'replace' or a callable object.
> PyCodec_EncodeHandlerForObject maps all of these objects to
> one of the three builtin error callbacks:
> PyCodec_RaiseEncodeErrors (raises an exception),
> PyCodec_IgnoreEncodeErrors (returns an empty replacement
> string, in effect ignoring the error), or
> PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode
> replacement character, to signify to the encoder that it
> should choose a suitable replacement character) -- or it
> directly returns errors if it is a callable object. When an
> unencodable character is encountered, the error handling
> callback will be called with the encoding name, the original
> unicode object and the error position, and must return a
> unicode object that will be encoded instead of the offending
> character (or the callback may of course raise an
> exception). U+FFFD characters in the replacement string will
> be replaced with a character that the encoder chooses ('?'
> in all cases).

Nice.
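The three builtin behaviours described above correspond directly to the errors argument as it exists in released Python; a quick demonstration of the raise/ignore/replace trio:

```python
s = u"a\u00e4b"  # 'ä' is not encodable in ASCII

print(s.encode("ascii", "ignore"))   # b'ab'  -- error dropped
print(s.encode("ascii", "replace"))  # b'a?b' -- encoder picks '?'
try:
    s.encode("ascii", "strict")      # raises
except UnicodeEncodeError as exc:
    print(exc.encoding, exc.start)   # ascii 1
```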
> The implementation of the loop through the string is done in
> the following way. A stack with two strings is kept and the
> loop always encodes a character from the string at the
> stacktop. If an error is encountered and the stack has only
> one entry (during encoding of the original string) the
> callback is called and the unicode object returned is pushed
> on the stack, so the encoding continues with the replacement
> string. If the stack has two entries when an error is
> encountered, the replacement string itself has an
> unencodable character and a normal exception is raised. When
> the encoder has reached the end of its current string there
> are two possibilities: when the stack contains two entries,
> this was the replacement string, so the replacement string
> will be popped from the stack and encoding continues with
> the next character from the original string. If the stack
> had only one entry, encoding is finished.

Very elegant solution !

> (I hope that's enough explanation of the API and implementation)

Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.

> I have renamed the static ...121 function to all lowercase
> names.

Ok.

> BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> replacement callback.

Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
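A standard \uXXXX-style replacement handler of the kind discussed in this thread did eventually ship as the builtin 'backslashreplace' error handler, usable with any encoder rather than only unicode-escape:

```python
# 'backslashreplace' escapes unencodable characters as
# \xXX, \uXXXX or \UXXXXXXXX instead of raising.
print(u"g\u00fcrk".encode("ascii", "backslashreplace"))   # b'g\\xfcrk'
print(u"\u20ac".encode("latin-1", "backslashreplace"))    # b'\\u20ac'
```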
> PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors and
> PyCodec_ReplaceEncodeErrors are globally visible because
> they have to be available in _codecsmodule.c to wrap them as
> Python function objects, but they can't be implemented in
> _codecsmodule, because they need to be available to the
> encoders in unicodeobject.c (through
> PyCodec_EncodeHandlerForObject), but importing the codecs
> module might result in an endless recursion, because
> importing a module requires unpickling of the bytecode,
> which might require decoding utf8, which ... (but this will
> only happen if we implement the same mechanism for the
> decoding API)

I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as a Python access wrapper for the internal codecs and nothing more.

One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule.

> I have not touched PyUnicode_TranslateCharmap yet;
> should this function also support error callbacks? Why would
> one want to insert None into the mapping to call the callback?

1. Yes.
2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII).

> A remaining problem is how to implement decoding error
> callbacks. In Python 2.1 encoding and decoding errors are
> handled in the same way with a string value.
> But with callbacks it doesn't make sense to use the same
> callback for encoding and decoding (like
> codecs.StreamReaderWriter and codecs.StreamRecoder do).
> Decoding callbacks have a different API. Which arguments
> should be passed to the decoding callback, and what is the
> decoding callback supposed to do?

I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions).

> One additional note: It is vital that errors is an
> assignable attribute of the StreamWriter.

It is already !

> Consider the XML example: For writing an XML DOM tree one
> StreamWriter object is used. When a text node is written,
> the error handling has to be set to
> codecs.xmlreplace_encode_errors, but inside a comment or
> processing instruction replacing unencodable characters with
> charrefs is not possible, so here codecs.raise_encode_errors
> should be used (or better a custom error handler that raises
> an error that says "sorry, you can't have unencodable
> characters inside a comment").

Sure.

> BTW, should we continue the discussion in the i18n SIG
> mailing list? An email program is much more comfortable than
> a HTML textarea! ;)

I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 21:18
Message:
Logged In: YES
user_id=89016

One additional note: It is vital that errors is an assignable attribute of the StreamWriter.

Consider the XML example: For writing an XML DOM tree one StreamWriter object is used.
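The XML scenario quoted above — switching the handler per node type by assigning to the writer's errors attribute — works in released Python with the handler names that eventually shipped ('xmlcharrefreplace' for text nodes, 'strict' inside comments); a sketch:

```python
import codecs
import io

buf = io.BytesIO()
writer = codecs.getwriter("ascii")(buf, errors="xmlcharrefreplace")

writer.write(u"caf\u00e9")     # text node: charref replacement is fine
writer.errors = "strict"       # entering a comment: must raise instead
try:
    writer.write(u"caf\u00e9")
except UnicodeEncodeError:
    pass                       # nothing was appended

print(buf.getvalue())          # b'caf&#233;'
```

The StreamWriter consults its errors attribute on every write, which is exactly why the attribute being assignable matters for this use case.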
When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment").

BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 20:59
Message:
Logged In: YES
user_id=89016

How the callbacks work:

A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks: PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), or PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character, to signify to the encoder that it should choose a suitable replacement character) -- or it directly returns errors if it is a callable object. When an unencodable character is encountered, the error handling callback will be called with the encoding name, the original unicode object and the error position, and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases).

The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop.
If an error is encountered and the stack has only one entry (during encoding of the original string), the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished.

(I hope that's enough explanation of the API and implementation.)

I have renamed the static ...121 function to all lowercase names.

BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.

PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors and PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API)

I have not touched PyUnicode_TranslateCharmap yet; should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback?

A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value.
But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 20:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names. If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the times). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject * version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data.
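The decoding-callback question raised here was eventually answered along the same lines as encoding, via the error-handler registry that later shipped as codecs.register_error: the decoder calls the handler with a UnicodeDecodeError carrying the input bytes plus a start/end slice, and the handler returns a replacement string and the position at which to resume. A minimal sketch using today's API (the handler name "show-byte" is invented for illustration):

```python
import codecs

def show_byte(exc):
    # Only handle decode errors; re-raise anything else.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]          # the offending bytes
    repl = "".join("<%02x>" % b for b in bad)    # e.g. b"\xff" -> "<ff>"
    return repl, exc.end                         # resume after the bad run

codecs.register_error("show-byte", show_byte)
print(b"a\xffb".decode("ascii", "show-byte"))    # -> a<ff>b
```

Note the handler never sees raw Py_UNICODE*/int pairs, matching the PyObject*-only design discussed above.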
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? Encode one-to-one; it implements both ASCII and latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones? I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h. Of those, PyCodec_EncodeHandlerForObject is vital, because it is used to map the old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string) instead of UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode.
PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 16:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive! I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ? * module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Fri Mar 8 02:37:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 18:37:51 -0800 Subject: [Patches] [ python-Patches-502080 ] BaseHTTPServer send_error bug fix Message-ID: Patches item #502080, was opened at 2002-01-10 16:33 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502080&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Jonathan Gardner (jgardn) Assigned to: Skip Montanaro (montanaro) Summary: BaseHTTPServer send_error bug fix Initial Comment: BaseHTTPServer's send_error function didn't send "Content-Type: text/html". While this was okay for Mozilla 0.9.7, Konqueror 2.2.2 rendered it as plain text. I added one line to send the Content-Type and everything works great. A BETTER solution would be to figure out what kind of document the error message is, but that is left as an exercise for a beefier HTTP server, which is not what BaseHTTPServer is intended to be. ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-07 20:37 Message: Logged In: YES user_id=44345 I could hardly let the experiment fail! Checked in as BaseHTTPServer.py v 1.18 ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 19:37 Message: Logged In: YES user_id=6380 Looks good to me. As an experiment, assigning to Skip, who can check it in or pass it on. 
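The fix described in this item is a single header line. A hypothetical sketch (not the checked-in BaseHTTPServer.py code) of the kind of response send_error builds, with the added line marked:

```python
# Hypothetical minimal reconstruction of the send_error behavior described
# above; the real method lives in BaseHTTPServer.BaseHTTPRequestHandler
# (http.server in Python 3). Without the explicit Content-Type header,
# strict clients such as Konqueror rendered the HTML error body as text.
def send_error_response(code, message, explain):
    body = ("<head><title>Error response</title></head>"
            "<body><h1>Error response</h1>"
            "<p>Error code %d.<p>Message: %s."
            "<p>Explanation: %s.</body>" % (code, message, explain))
    headers = [
        "HTTP/1.0 %d %s" % (code, message),
        "Content-Type: text/html",   # the one-line fix
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(headers) + body

response = send_error_response(404, "Not Found", "Nothing matches the given URI")
```

A fuller server would pick the Content-Type from the actual error document, which is exactly the "beefier HTTP server" exercise the submitter leaves open.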
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502080&group_id=5470 From noreply@sourceforge.net Fri Mar 8 04:44:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 07 Mar 2002 20:44:39 -0800 Subject: [Patches] [ python-Patches-512799 ] webchecker protocol bug Message-ID: Patches item #512799, was opened at 2002-02-04 10:40 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=512799&group_id=5470 Category: Demos and tools Group: None Status: Open Resolution: None Priority: 5 Submitted By: seb bacon (sebbacon) >Assigned to: A.M. Kuchling (akuchling) Summary: webchecker protocol bug Initial Comment: Tools/webchecker.py checks protocol of URLs and ignores redundant ones like mailto. However, urllib.splittype returns a tuple where the code expects a string, so it doesn't work. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=512799&group_id=5470 From noreply@sourceforge.net Fri Mar 8 10:22:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 02:22:06 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 16:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. 
- enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared, used different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building shared library disabled by default, while these architectures had it enabled. - it rectifies a small problem on solaris2.8, that makes double inclusion of thread.o (this produces error on 'ld' for shared library). ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 10:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 18:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter.
So who created that code originally? The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 17:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Fri Mar 8 11:09:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 03:09:16 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library.
- enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared, used different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building shared library disabled by default, while these architectures had it enabled. - it rectifies a small problem on solaris2.8, that makes double inclusion of thread.o (this produces error on 'ld' for shared library). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 12:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments; I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 11:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 11:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used!
The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 19:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 18:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch.
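What eventually landed in CPython is the standard --enable-shared spelling that Matthias asks for. Whether a given interpreter was built that way can be inspected from Python itself; a sketch (the exact values are platform-dependent, and the variables may be absent on Windows):

```python
import sysconfig

# Py_ENABLE_SHARED is 1 when the interpreter links against libpython as a
# shared library, 0 for a static build; None if the build doesn't record it.
shared = sysconfig.get_config_var("Py_ENABLE_SHARED")

# LDLIBRARY names the library the build produced, e.g. a versioned
# "libpython3.x.so.1.0" for a shared build -- the kind of suffix the
# "why 0.0" question above is about.
ldlib = sysconfig.get_config_var("LDLIBRARY")
print(shared, ldlib)
```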
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Fri Mar 8 13:14:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 05:14:04 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 04:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match: >>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, '3', '34', '123') >>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, None, '34', '123') I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now-invalid entries at the end of the state->mark array. In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py.
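The fixed behavior can be reproduced with any modern re module, which includes this mark-clearing fix: the inner group (\d) no longer reports a match when its enclosing optional group failed to participate. (The stray space before $ in the quoted pattern looks like an archive line-wrap artifact and is dropped here.)

```python
import re

# Group 1 is the optional "(\d):" prefix; group 2 is nested inside it.
# Since group 1 does not participate in the match, group 2 must be None too.
m = re.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
assert m.groups() == (None, None, '34', '123')  # matches the old pre result
```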
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Fri Mar 8 13:20:51 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 05:20:51 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 04:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match: >>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, '3', '34', '123') >>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, None, '34', '123') I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now-invalid entries at the end of the state->mark array. In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. ---------------------------------------------------------------------- >Comment By: Greg Chapman (glchapman) Date: 2002-03-08 04:20 Message: Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Fri Mar 8 13:29:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 05:29:11 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 08:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match: >>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, '3', '34', '123') >>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d) $", "34.123") >>> m.groups() (None, None, '34', '123') I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now-invalid entries at the end of the state->mark array. In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-08 08:29 Message: Logged In: YES user_id=33168 Confirmed that the test w/o fix fails and the test passes with the fix to _sre.c.
But I'm not sure if the memset can go too far: memset(state->mark + lastmark + 1, 0, (state->lastmark - lastmark) * sizeof(void*)); I can try under purify, but that doesn't guarantee anything. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 08:20 Message: Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Fri Mar 8 14:44:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 06:44:11 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 11:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared, used different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building shared library disabled by default, while these architectures had it enabled. 
- it rectifies a small problem on solaris2.8, that makes double inclusion of thread.o (this produces error on 'ld' for shared library). ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 09:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 06:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments; I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 05:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 05:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool.
It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 13:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 12:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Fri Mar 8 15:15:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 07:15:56 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 13:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed >Priority: 6 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function, that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 15:15 Message: Logged In: YES user_id=38388 Sounds like a good idea. Please keep the encoder and decoder APIs symmetric, though, ie. add the slice information to both APIs. The slice should use the same format as Python's standard slices, that is left inclusive, right exclusive. I like the highlighting feature !
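The lambda-based API in the initial comment evolved into the error-handler registry that eventually shipped (codecs.register_error, plus a built-in "xmlcharrefreplace" handler that emits decimal charrefs). A sketch of the motivating example in those terms; the handler name "hex-charref" is invented here:

```python
import codecs

def hex_charref(exc):
    # Encode-only handler; the exception carries the original string and
    # the start/end slice of the unencodable run.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    repl = "".join("&#x%x;" % ord(ch) for ch in exc.object[exc.start:exc.end])
    return repl, exc.end

codecs.register_error("hex-charref", hex_charref)
print("a\xe4o\xf6u\xfc\xdf".encode("ascii", "hex-charref"))
# -> b'a&#xe4;o&#xf6;u&#xfc;&#xdf;'
```

Note the (replacement, resume_position) return value: that is the slice symmetry Lemburg asks for above, with the left-inclusive/right-exclusive convention of standard Python slices.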
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 23:09 Message: Logged In: YES user_id=89016 I'm thinking about extending the API a little bit: Consider the following example: >>> "\u1".decode("unicode-escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape The error message is a lie: it is not the '1' in position 2 that is the problem, but the complete truncated sequence '\u1'. For this the decoder should pass a start and an end position to the handler. For encoding this would be useful too: Suppose I want to have an encoder that colors the unencodable character via ANSI escape sequences. Then I could do the following: >>> import codecs >>> def color(enc, uni, pos, why, sta): ... return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1) ... >>> codecs.register_unicodeencodeerrorhandler("color", color) >>> u"aäüöo".encode("ascii", "color") 'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo' But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position). This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in case of custom encoding names. (And it makes the implementation very interesting ;)) What do you think?
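The start/end extension proposed here is visible in the API that later shipped: the handler receives one exception object whose start/end slice may span several adjacent unencodable characters, so a single escape-sequence pair can wrap the whole run. A sketch of the coloring idea in those terms (the handler name is invented, and whether a given codec actually coalesces adjacent errors into one call is an implementation detail the sketch does not rely on):

```python
import codecs

BOLD, RESET = "\033[1m", "\033[0m"

def color_run(exc):
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    run = exc.object[exc.start:exc.end]
    # One escape pair around the whole reported run, rather than one
    # pair per character as in the per-position API quoted above.
    repl = BOLD + "".join("<%d>" % ord(ch) for ch in run) + RESET
    return repl, exc.end

codecs.register_error("color-run", color_run)
out = "a\xe4\xfc\xf6o".encode("ascii", "color-run")
```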
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 01:29 Message: Logged In: YES user_id=89016 I started from scratch, and the current state is this: Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler). For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached it might even be faster than the original. The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 16:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 10:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 10:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on the 10.09.
Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first.

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-27 03:55

Message:
Logged In: YES
user_id=89016

Changing the decoding API is done now. There are new functions codecs.register_unicodedecodeerrorhandler and codecs.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered.

There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.:

>>> "\U1111111".decode("unicode_escape")
Traceback (most recent call last):
  File "", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape
>>> "\U11111111".decode("unicode_escape")
Traceback (most recent call last):
  File "", line 1, in ?
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character

For symmetry I added this to the encoding API too:

>>> u"\xff".encode("ascii")
Traceback (most recent call last):
  File "", line 1, in ?
UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128)

The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere:

>>> unicode("a\xffb\xffc", "ascii",
...     lambda enc, uni, pos, rea, sta: (u"", pos+1))
u'abc'
>>> "a\xffb\xffc".decode("ascii",
...     lambda enc, uni, pos, rea, sta: (u"", pos+1))
u'abc'

I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O!
with &PyString_Type, because otherwise we would have the problem that the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway.

I changed all the old functions to call the new ones, so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString, because they are documented as deprecated anyway (although they are still called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx.

There are still a few spots that call the old API: e.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere, even if strict encoding/decoding is used?

The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.)

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-23 17:03

Message:
Logged In: YES
user_id=89016

New version of the patch with the error handling callback registry.

> > OK, done, now there's a
> > PyCodec_EscapeReplaceUnicodeEncodeErrors/
> > codecs.escapereplace_unicodeencode_errors
> > that uses \u (or \U if x>0xffff (with a wide build
> > of Python)).
>
> Great!

Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate.

> > [...]
> > But for special one-shot error handlers, it might still be
> > useful to pass the error handler directly, so maybe we
> > should leave error as PyObject *, but implement the
> > registry anyway?
>
> Good idea !
>
> One minor nit: codecs.registerError() should be named
> codecs.register_errorhandler() to be more in line with
> the Python coding style guide.

OK, but these functions are specific to unicode encoding, so now the functions are called:

codecs.register_unicodeencodeerrorhandler
codecs.lookup_unicodeencodeerrorhandler

Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in codecs.c/_PyCodecRegistry_Init, so using them is really simple:

u"gürk".encode("ascii", "xmlcharrefreplace")

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-13 11:26

Message:
Logged In: YES
user_id=38388

> > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> > > > > could be reimplemented as PyUnicode_EncodeASCII
> > > > > with \uxxxx replacement callback.
> > > >
> > > > Hmm, wouldn't that result in a slowdown ? If so,
> > > > I'd rather leave the special encoder in place,
> > > > since it is being used a lot in Python and
> > > > probably some applications too.
> > >
> > > It would be a slowdown. But callbacks open many
> > > possibilities.
> >
> > True, but in this case I believe that we should stick with
> > the native implementation for "unicode-escape". Having
> > a standard callback error handler which does the \uXXXX
> > replacement would be nice to have though, since this would
> > also be usable with lots of other codecs (e.g. all the
> > code page ones).
>
> OK, done, now there's a
> PyCodec_EscapeReplaceUnicodeEncodeErrors/
> codecs.escapereplace_unicodeencode_errors
> that uses \u (or \U if x>0xffff (with a wide build
> of Python)).

Great !

> > [...]
> > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK! I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? > > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API. ("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. > I implemented this and changed the encoders to only > lookup the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoder where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done, when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed? 
No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data).

> Here's the current todo list:
> 1. implement a new TranslateCharmap and fix the old.
> 2. New encoding API for string objects too.
> 3. Decoding
> 4. Documentation
> 5. Test cases
>
> I'm thinking about a different strategy for implementing
> callbacks
> (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)
>
> We could have an error handler registry, which maps names
> to error handlers; then it would be possible to keep the
> errors argument as "const char *" instead of "PyObject *".
> Currently PyCodec_UnicodeEncodeHandlerForObject is a
> backwards compatibility hack that will never go away,
> because it's always more convenient to type
> u"...".encode("...", "strict")
> instead of
> import codecs
> u"...".encode("...", codecs.raise_encode_errors)
>
> But with an error handler registry this function would
> become the official lookup method for error handlers.
> (PyCodec_LookupUnicodeEncodeErrorHandler?)
> Python code would look like this:
> ---
> def xmlreplace(encoding, uni, pos, state):
>     return (u"&#%d;" % ord(uni[pos]), pos+1)
>
> import codecs
>
> codecs.registerError("xmlreplace", xmlreplace)
> ---
> and then the following call can be made:
> u"äöü".encode("ascii", "xmlreplace")
> As soon as the first error is encountered, the encoder uses
> its builtin error handling method if it recognizes the name
> ("strict", "replace" or "ignore") or looks up the error
> handling function in the registry if it doesn't. In this way
> the speed for the backwards compatible features is the same
> as before and "const char *error" can be kept as the
> parameter to all encoding functions. For speed, common error
> handling names could even be implemented in the encoder
> itself.
> But for special one-shot error handlers, it might still be
> useful to pass the error handler directly, so maybe we
> should leave error as PyObject *, but implement the
> registry anyway?

Good idea !

One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide.

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-12 11:03

Message:
Logged In: YES
user_id=89016

> > [...]
> > so I guess we could change the replace handler
> > to always return u'?'. This would make the
> > implementation a little bit simpler, but the
> > explanation of the callback feature *a lot*
> > simpler.
>
> Go for it.

OK, done!

> [...]
> > > Could you add these docs to the Misc/unicode.txt
> > > file ? I will eventually take that file and turn
> > > it into a PEP which will then serve as general
> > > documentation for these things.
> >
> > I could, but first we should work out how the
> > decoding callback API will work.
>
> Ok. BTW, Barry Warsaw already did the work of converting
> the unicode.txt to PEP 100, so the docs should eventually
> go there.

OK. I guess it would be best to do this when everything is finished.

> > > > BTW, I guess PyUnicode_EncodeUnicodeEscape
> > > > could be reimplemented as PyUnicode_EncodeASCII
> > > > with \uxxxx replacement callback.
> > >
> > > Hmm, wouldn't that result in a slowdown ? If so,
> > > I'd rather leave the special encoder in place,
> > > since it is being used a lot in Python and
> > > probably some applications too.
> >
> > It would be a slowdown. But callbacks open many
> > possibilities.
>
> True, but in this case I believe that we should stick with
> the native implementation for "unicode-escape". Having
> a standard callback error handler which does the \uXXXX
> replacement would be nice to have though, since this would
> also be usable with lots of other codecs (e.g. all the
> code page ones).
OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/ codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)). > > For example: > > > > Why can't I print u"gьrk"? > > > > is probably one of the most frequently asked > > questions in comp.lang.python. For printing > > Unicode stuff, print could be extended the use an > > error handling callback for Unicode strings (or > > objects where __str__ or tp_str returns a Unicode > > object) instead of using str() which always > > returns an 8bit string and uses strict encoding. > > There might even be a > > sys.setprintencodehandler()/sys.getprintencodehandler () > > There already is a print callback in Python (forgot the > name of the hook though), so this should be possible by > providing the encoding logic in the hook. True: sys.displayhook > [...] > > Should the old TranslateCharmap map to the new > > TranslateCharmapEx and inherit the > > "multicharacter replacement" feature, > > or should I leave it as it is? > > If possible, please also add the multichar replacement > to the old API. I think it is very useful and since the > old APIs work on raw buffers it would be a benefit to have > the functionality in the old implementation too. OK! I will try to find the time to implement that in the next days. > [Decoding error callbacks] > > About the return value: > > I'd suggest to always use the same tuple interface, e.g. > > callback(encoding, input_data, input_position, state) -> > (output_to_be_appended, new_input_position) > > (I think it's better to use absolute values for the > position rather than offsets.) > > Perhaps the encoding callbacks should use the same > interface... what do you think ? This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API. 
("almost" because, for the encoder, output_to_be_appended will be reencoded, while for the decoder it will simply be appended), so I'm for it.

I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) Memory overflow tests are now only done when an unencodable character occurs, so the UCS1 encoder should be as fast as it was without error callbacks.

Do we want to enforce new_input_position > input_position, or should jumping back be allowed?

> > > > One additional note: It is vital that errors
> > > > is an assignable attribute of the StreamWriter.
> > >
> > > It is already !
> >
> > I know, but IMHO it should be documented that an
> > assignable errors attribute must be supported
> > as part of the official codec API.
> >
> > Misc/unicode.txt is not clear on that:
> > """
> > It is not required by the Unicode implementation
> > to use these base classes, only the interfaces must
> > match; this allows writing Codecs as extension types.
> > """
>
> Good point. I'll add that to the PEP 100.

OK.

Here's the current todo list:
1. implement a new TranslateCharmap and fix the old.
2. New encoding API for string objects too.
3. Decoding
4. Documentation
5. Test cases

I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html)

We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *".
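As a historical footnote, this registry idea is essentially what later shipped as `codecs.register_error()` / `codecs.lookup_error()`: names map to handlers and the `errors` argument stays a plain string. A hedged rewrite of the xmlreplace sketch against that later interface (a single exception object replaces the separate encoding/unicode/pos/state arguments):

```python
import codecs

def xmlreplace(exc):
    # Emit an XML character reference for every character in the
    # unencodable range, then resume encoding right after it.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    refs = "".join("&#%d;" % ord(c) for c in exc.object[exc.start:exc.end])
    return (refs, exc.end)

codecs.register_error("xmlreplace", xmlreplace)

# The errors argument is still a plain name; the registry supplies the
# callable, so "const char *errors" style C APIs keep working unchanged.
print("\u00e4\u00f6\u00fc".encode("ascii", "xmlreplace"))
```

Built-in names like "strict", "replace" and "ignore" are still handled inside the encoders for speed, exactly as proposed above; the registry is only consulted for unknown names.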
Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type

u"...".encode("...", "strict")

instead of

import codecs
u"...".encode("...", codecs.raise_encode_errors)

But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?) Python code would look like this:

---
def xmlreplace(encoding, uni, pos, state):
    return (u"&#%d;" % ord(uni[pos]), pos+1)

import codecs

codecs.registerError("xmlreplace", xmlreplace)
---

and then the following call can be made:

u"äöü".encode("ascii", "xmlreplace")

As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore"), or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before, and "const char *error" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself.

But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-10 12:29

Message:
Logged In: YES
user_id=38388

Ok, here we go...

> > > raise an exception). U+FFFD characters in the replacement
> > > string will be replaced with a character that the encoder
> > > chooses ('?' in all cases).
> >
> > Nice.
>
> But the special casing of U+FFFD makes the interface somewhat
> less clean than it could be. It was only done to be 100%
> backwards compatible. With the original "replace" error
> handling the codec chose the replacement character.
But as > far as I can tell none of the codecs uses anything other > than '?', True. > so I guess we could change the replace handler > to always return u'?'. This would make the implementation a > little bit simpler, but the explanation of the callback > feature *a lot* simpler. Go for it. > And if you still want to handle > an unencodable U+FFFD, you can write a special callback for > that, e.g. > > def FFFDreplace(enc, uni, pos): > if uni[pos] == "\ufffd": > return u"?" > else: > raise UnicodeError(...) > > > ...docs... > > > > Could you add these docs to the Misc/unicode.txt file ? I > > will eventually take that file and turn it into a PEP > which > > will then serve as general documentation for these things. > > I could, but first we should work out how the decoding > callback API will work. Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > > replacement callback. > > > > Hmm, wouldn't that result in a slowdown ? If so, I'd > rather > > leave the special encoder in place, since it is being > used a > > lot in Python and probably some applications too. > > It would be a slowdown. But callbacks open many > possiblities. True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones). > For example: > > Why can't I print u"gьrk"? > > is probably one of the most frequently asked questions in > comp.lang.python. 
For printing Unicode stuff, print could be > extended the use an error handling callback for Unicode > strings (or objects where __str__ or tp_str returns a > Unicode object) instead of using str() which always returns > an 8bit string and uses strict encoding. There might even > be a > sys.setprintencodehandler()/sys.getprintencodehandler() There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook. > > > I have not touched PyUnicode_TranslateCharmap yet, > > > should this function also support error callbacks? Why > > > would one want the insert None into the mapping to > call > > > the callback? > > > > 1. Yes. > > 2. The user may want to e.g. restrict usage of certain > > character ranges. In this case the codec would be used to > > verify the input and an exception would indeed be useful > > (e.g. say you want to restrict input to Hangul + ASCII). > > OK, do we want TranslateCharmap to work exactly like > encoding, > i.e. in case of an error should the returned replacement > string again be mapped through the translation mapping or > should it be copied to the output directly? The former would > be more in line with encoding, but IMHO the latter would > be much more useful. It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat. > BTW, when I implement it I can implement patch #403100 > ("Multicharacter replacements in > PyUnicode_TranslateCharmap") > along the way. I've seen it; will comment on it later. > Should the old TranslateCharmap map to the new > TranslateCharmapEx > and inherit the "multicharacter replacement" feature, > or > should I leave it as it is? If possible, please also add the multichar replacement to the old API. 
I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too. [Decoding error callbacks] > > > A remaining problem is how to implement decoding error > > > callbacks. In Python 2.1 encoding and decoding errors > are > > > handled in the same way with a string value. But with > > > callbacks it doesn't make sense to use the same > callback > > > for encoding and decoding (like > codecs.StreamReaderWriter > > > and codecs.StreamRecoder do). Decoding callbacks have > a > > > different API. Which arguments should be passed to the > > > decoding callback, and what is the decoding callback > > > supposed to do? > > > > I'd suggest adding another set of PyCodec_UnicodeDecode... > () > > APIs for this. We'd then have to augment the base classes > of > > the StreamCodecs to provide two attributes for .errors > with > > a fallback solution for the string case (i.s. "strict" > can > > still be used for both directions). > > Sounds good. Now what is the decoding callback supposed to > do? > I guess it will be called in the same way as the encoding > callback, i.e. with encoding name, original string and > position of the error. It might returns a Unicode string > (i.e. an object of the decoding target type), that will be > emitted from the codec instead of the one offending byte. Or > it might return a tuple with replacement Unicode object and > a resynchronisation offset, i.e. returning (u"?", 1) > means > emit a '?' and skip the offending character. But to make > the offset really useful the callback has to know something > about the encoding, perhaps the codec should be allowed to > pass an additional state object to the callback? > > Maybe the same should be added to the encoding callbacks to? 
> Maybe the encoding callback should be able to tell the
> encoder if the replacement returned should be reencoded
> (in which case it's a Unicode object), or directly emitted
> (in which case it's an 8bit string)?

I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (this simplifies the interface).

About the return value:

I'd suggest to always use the same tuple interface, e.g.

callback(encoding, input_data, input_position, state) ->
    (output_to_be_appended, new_input_position)

(I think it's better to use absolute values for the position rather than offsets.)

Perhaps the encoding callbacks should use the same interface... what do you think ?

> > > One additional note: It is vital that errors is an
> > > assignable attribute of the StreamWriter.
> >
> > It is already !
>
> I know, but IMHO it should be documented that an assignable
> errors attribute must be supported as part of the official
> codec API.
>
> Misc/unicode.txt is not clear on that:
> """
> It is not required by the Unicode implementation to use
> these base classes, only the interfaces must match; this
> allows writing Codecs as extension types.
> """

Good point. I'll add that to the PEP 100.

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-22 20:51

Message:
Logged In: YES
user_id=38388

Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy...

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 17:00

Message:
Logged In: YES
user_id=38388

On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow.
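The uniform tuple interface proposed in the comment above, callback -> (output_to_be_appended, new_input_position) with an absolute position, is also how the error-handler API that later shipped behaves for decoding: the handler returns a replacement plus the absolute position at which to resume, and the exception's `encoding`, `reason`, `start` and `end` attributes carry the context that the proposal threads through separate arguments. A small sketch (the handler name "skip-and-mark" is made up):

```python
import codecs

def skip_and_mark(exc):
    # Append one U+FFFD for the undecodable range and resume decoding
    # at the absolute position exc.end, per the tuple convention.
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    return ("\ufffd", exc.end)

codecs.register_error("skip-and-mark", skip_and_mark)

print(b"a\xffb\xffc".decode("ascii", "skip-and-mark"))
```

Because the returned position is absolute, a handler may also resynchronize by jumping further ahead (or, as discussed above, even backwards) rather than stepping one unit at a time.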
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 15:49

Message:
Logged In: YES
user_id=89016

Guido van Rossum wrote in python-dev:

> True, the "codec" pattern can be used for other
> encodings than Unicode. But it seems to me that the
> entire codecs architecture is rather strongly geared
> towards en/decoding Unicode, and it's not clear
> how well other codecs fit in this pattern (e.g. I
> noticed that all the non-Unicode codecs ignore the
> error handling parameter or assert that
> it is set to 'strict').

I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately?

----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 13:57

Message:
Logged In: YES
user_id=89016

> > [...]
> > raise an exception). U+FFFD characters in the replacement
> > string will be replaced with a character that the encoder
> > chooses ('?' in all cases).
>
> Nice.

But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler.

And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.
def FFFDreplace(enc, uni, pos):
    if uni[pos] == "\ufffd":
        return u"?"
    else:
        raise UnicodeError(...)

> > The implementation of the loop through the string is done
> > in the following way. A stack with two strings is kept
> > and the loop always encodes a character from the string
> > at the stacktop. If an error is encountered and the stack
> > has only one entry (during encoding of the original string)
> > the callback is called and the unicode object returned is
> > pushed on the stack, so the encoding continues with the
> > replacement string. If the stack has two entries when an
> > error is encountered, the replacement string itself has
> > an unencodable character and a normal exception is raised.
> > When the encoder has reached the end of its current string
> > there are two possibilities: when the stack contains two
> > entries, this was the replacement string, so the replacement
> > string will be popped from the stack and encoding continues
> > with the next character from the original string. If the
> > stack had only one entry, encoding is finished.
>
> Very elegant solution !

I'll put it as a comment in the source.

> > (I hope that's enough explanation of the API and implementation)
>
> Could you add these docs to the Misc/unicode.txt file ? I
> will eventually take that file and turn it into a PEP which
> will then serve as general documentation for these things.

I could, but first we should work out how the decoding callback API will work.

> > I have renamed the static ...121 function to all lowercase
> > names.
>
> Ok.

> > BTW, I guess PyUnicode_EncodeUnicodeEscape could be
> > reimplemented as PyUnicode_EncodeASCII with a \uxxxx
> > replacement callback.
>
> Hmm, wouldn't that result in a slowdown ? If so, I'd rather
> leave the special encoder in place, since it is being used a
> lot in Python and probably some applications too.

It would be a slowdown. But callbacks open many possibilities.

For example:

Why can't I print u"gürk"?
is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str(), which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler().

> [...]
> I think it would be worthwhile to rename the callbacks to
> include "Unicode" somewhere, e.g.
> PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but
> then it points out the application field of the callback
> rather well. Same for the callbacks exposed through the
> _codecsmodule.

OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;))

> > I have not touched PyUnicode_TranslateCharmap yet,
> > should this function also support error callbacks? Why
> > would one want to insert None into the mapping to call
> > the callback?
>
> 1. Yes.
> 2. The user may want to e.g. restrict usage of certain
> character ranges. In this case the codec would be used to
> verify the input and an exception would indeed be useful
> (e.g. say you want to restrict input to Hangul + ASCII).

OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping, or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful.

BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way.

Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?

> > A remaining problem is how to implement decoding error
> > callbacks.
> > In Python 2.1 encoding and decoding errors are
> > handled in the same way with a string value. But with
> > callbacks it doesn't make sense to use the same callback
> > for encoding and decoding (like codecs.StreamReaderWriter
> > and codecs.StreamRecoder do). Decoding callbacks have a
> > different API. Which arguments should be passed to the
> > decoding callback, and what is the decoding callback
> > supposed to do?
>
> I'd suggest adding another set of PyCodec_UnicodeDecode...()
> APIs for this. We'd then have to augment the base classes of
> the StreamCodecs to provide two attributes for .errors with
> a fallback solution for the string case (i.e. "strict" can
> still be used for both directions).

Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type) that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with a replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding; perhaps the codec should be allowed to pass an additional state object to the callback?

Maybe the same should be added to the encoding callbacks too? Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)?

> > One additional note: It is vital that errors is an
> > assignable attribute of the StreamWriter.
>
> It is already !

I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.
Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """ ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 08:05 Message: Logged In: YES user_id=38388 > How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object. > PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object. When an > unencodable character is encountered the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice. > The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string.
If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception is raised. When > the encoder has reached the end of its current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be popped from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution ! > (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too. > PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen, if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as Python access wrapper for the internal codecs and nothing more.
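[Editorial note: the two-string stack loop described above can be sketched as a toy Python model. This is an illustration of the algorithm, not the C implementation; the ASCII target encoding and the helper names are assumptions made for the example.]

```python
def encode_with_stack(text, callback):
    # Toy model of the two-string stack: the bottom entry is the original
    # string; a replacement returned by the error callback is pushed on top.
    out = []
    stack = [[text, 0]]                  # entries are [string, position]
    while stack:
        top = stack[-1]
        s, pos = top
        if pos >= len(s):                # end of the current string
            stack.pop()                  # pop the replacement, or finish
            continue
        if ord(s[pos]) < 128:            # encodable: emit and advance
            out.append(bytes([ord(s[pos])]))
            top[1] += 1
        elif len(stack) == 1:            # error in the original: ask callback
            top[1] += 1                  # resume after the offending char
            stack.append([callback("ascii", text, pos), 0])
        else:                            # error inside the replacement itself
            raise UnicodeError("unencodable character in replacement")
    return b"".join(out)

# The XML character reference callback from this thread:
xmlreplace = lambda enc, uni, pos: "&#%d;" % ord(uni[pos])
print(encode_with_stack("a\xe4b", xmlreplace))  # b'a&#228;b'
```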
One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. > I have not touched PyUnicode_TranslateCharmap yet, > should this function also support error callbacks? Why would > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > A remaining problem is how to implement decoding error > callbacks. In Python 2.1 encoding and decoding errors are > handled in the same way with a string value. But with > callbacks it doesn't make sense to use the same callback for > encoding and decoding (like codecs.StreamReaderWriter and > codecs.StreamRecoder do). Decoding callbacks have a > different API. Which arguments should be passed to the > decoding callback, and what is the decoding callback > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > One additional note: It is vital that errors is an > assignable attribute of the StreamWriter. It is already ! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used.
When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > a HTML textarea! ;) I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 19:18 Message: Logged In: YES user_id=89016 One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used. When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment") BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:59 Message: Logged In: YES user_id=89016 How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object.
PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object. When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation) I have renamed the static ...121 function to all lowercase names.
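[Editorial note: the callable error handler described above is essentially what later shipped as PEP 293, except that handlers are registered under a name with codecs.register_error and receive a single exception object rather than separate (encoding, unicode, position) arguments. A minimal sketch against today's API; the handler name "question-mark" is invented for this example.]

```python
import codecs

def question_mark(exc):
    # PEP 293 handlers return (replacement, position to resume at).
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    return ("?", exc.end)

codecs.register_error("question-mark", question_mark)
print("a\xe4b".encode("ascii", "question-mark"))  # b'a?b'
```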
BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen, if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 18:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names.
If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the times). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 16:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject *version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 16:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? encode one-to-one, it implements both ASCII and latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones?
I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h, of those PyCodec_EncodeHandlerForObject is vital, because it is used to map old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used and having two (PyObject *)s (the original and the replacement string), instead of UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode. PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 14:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive! I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ?
* module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Fri Mar 8 15:23:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 07:23:33 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 04:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match: >>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123") >>> m.groups() (None, '3', '34', '123') >>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123") >>> m.groups() (None, None, '34', '123') I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now-invalid entries at the end of the state->mark array.
In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. ---------------------------------------------------------------------- >Comment By: Greg Chapman (glchapman) Date: 2002-03-08 06:23 Message: Logged In: YES user_id=86307 I'm pretty sure the memset is correct; state->lastmark is the index of last mark written to (not the index of the next potential write). Also, it occurred to me that there is another related error here: >>> m = sre.search(r'^((\d)\:)?\d\d\.\d\d\d$', '34.123') >>> m.groups() (None, None) >>> m.lastindex 2 In other words, lastindex claims that group 2 was the last that matched, even though it didn't really match. Since lastindex is undocumented, this probably doesn't matter too much. Still, it probably should be reset if it is pointing to a group which gets "unmatched" when state->lastmark is reduced. Perhaps a function like the following should be added for use in the three places where state->lastmark is reset to a previous value:

    void lastmark_restore(SRE_STATE *state, int lastmark)
    {
        assert(lastmark >= 0);
        if (state->lastmark > lastmark) {
            int lastvalidindex = (lastmark == 0) ? -1 : (lastmark-1)/2+1;
            if (state->lastindex > lastvalidindex)
                state->lastindex = lastvalidindex;
            memset(state->mark + lastmark + 1, 0,
                   (state->lastmark - lastmark) * sizeof(void*));
        }
        state->lastmark = lastmark;
    }

---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-08 04:29 Message: Logged In: YES user_id=33168 Confirmed that the test w/o fix fails and the test passes with the fix to _sre.c.
But I'm not sure if the memset can go too far: memset(state->mark + lastmark + 1, 0, (state->lastmark - lastmark) * sizeof(void*)); I can try under purify, but that doesn't guarantee anything. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 04:20 Message: Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Fri Mar 8 15:39:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 07:39:48 -0800 Subject: [Patches] [ python-Patches-527427 ] minidom fails to use NodeList sometimes Message-ID: Patches item #527427, was opened at 2002-03-08 12:39 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527427&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Cesar Eduardo Barros (cesarb) Assigned to: Nobody/Anonymous (nobody) Summary: minidom fails to use NodeList sometimes Initial Comment: (why is the summary box so small?) xml.dom.minidom doesn't use a NodeList as the return type of getElementsByTagName{,NS} as it should. The patch (against 2.2 or HEAD) fixes it.
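[Editorial note: the point of returning a NodeList is that DOM code can rely on accessors such as length. With the fix applied (and in current xml.dom.minidom), for example:]

```python
from xml.dom import minidom

doc = minidom.parseString("<root><item/><item/></root>")
nodes = doc.getElementsByTagName("item")
# A NodeList supports the DOM length accessor in addition to len():
print(nodes.length)  # 2
```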
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527427&group_id=5470 From noreply@sourceforge.net Fri Mar 8 17:31:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 09:31:11 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 15:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 6 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 18:31 Message: Logged In: YES user_id=89016 What should replace do: Return u"?" or (end-start)*u"?" ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 16:15 Message: Logged In: YES user_id=38388 Sounds like a good idea. Please keep the encoder and decoder APIs symmetric, though, i.e. add the slice information to both APIs.
The slice should use the same format as Python's standard slices, that is left inclusive, right exclusive. I like the highlighting feature ! ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 00:09 Message: Logged In: YES user_id=89016 I'm thinking about extending the API a little bit: Consider the following example: >>> "\u1".decode("unicode-escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape The error message is a lie: the problem is not the '1' in position 2, but the complete truncated sequence '\u1'. For this the decoder should pass a start and an end position to the handler. For encoding this would be useful too: Suppose I want to have an encoder that colors the unencodable character via an ANSI escape sequence. Then I could do the following: >>> import codecs >>> def color(enc, uni, pos, why, sta): ... return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1) ... >>> codecs.register_unicodeencodeerrorhandler("color", color) >>> u"aäüöo".encode("ascii", "color") 'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo' But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position). This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in case of custom error handler names. (And it makes the implementation very interesting ;)) What do you think?
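[Editorial note: this start/end extension is exactly how the API later behaved under PEP 293: the handler receives a UnicodeEncodeError whose start and end attributes span the whole run of consecutive unencodable characters, so the callback is invoked once per run. A sketch against today's codecs module; the handler name "show-run" is invented for illustration.]

```python
import codecs

def show_run(exc):
    # exc.start:exc.end covers the entire unencodable run, not one character.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    run = exc.object[exc.start:exc.end]
    return ("<%s>" % ",".join(str(ord(c)) for c in run), exc.end)

codecs.register_error("show-run", show_run)
print("a\xe4\xfc\xf6o".encode("ascii", "show-run"))  # b'a<228,252,246>o'
```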
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 02:29 Message: Logged In: YES user_id=89016 I started from scratch, and the current state is this: Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler). For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached it might even be faster than the original. The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 17:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 12:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 12:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on the 10.09.
Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 05:55 Message: Logged In: YES user_id=89016 Changing the decoding API is done now. There are new functions codecs.register_unicodedecodeerrorhandler and codecs.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered. There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.: >>> "\U1111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape >>> "\U11111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character For symmetry I added this to the encoding API too: >>> u"\xff".encode("ascii") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128) The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere: >>> unicode("a\xffb\xffc", "ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' >>> "a\xffb\xffc".decode("ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O!
with &PyString_Type, because otherwise we would have the problem that the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway. I changed all the old functions to call the new ones so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx. There are still a few spots that call the old API: E.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere even if strict encoding/decoding is used? The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 19:03 Message: Logged In: YES user_id=89016 New version of the patch with the error handling callback registry. > > OK, done, now there's a > > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > > codecs.escapereplace_unicodeencode_errors > > that uses \u (or \U if x>0xffff (with a wide build > > of Python)). > > Great! Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate. > > [...]
> > But for special one-shot error handlers, it might still be > > useful to pass the error handler directly, so maybe we > > should leave error as PyObject *, but implement the > > registry anyway? > > Good idea ! > > One minor nit: codecs.registerError() should be named > codecs.register_errorhandler() to be more in line with > the Python coding style guide. OK, but these functions are specific to unicode encoding, so now the functions are called: codecs.register_unicodeencodeerrorhandler codecs.lookup_unicodeencodeerrorhandler Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in the codecs.c/_PyCodecRegistry_Init so using them is really simple: u"gürk".encode("ascii", "xmlcharrefreplace") ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-13 13:26 Message: Logged In: YES user_id=38388 > > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > > with \uxxxx replacement callback. > > > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > > I'd rather leave the special encoder in place, > > > > since it is being used a lot in Python and > > > > probably some applications too. > > > > > > It would be a slowdown. But callbacks open many > > > possibilities. > > > > True, but in this case I believe that we should stick with > > the native implementation for "unicode-escape". Having > > a standard callback error handler which does the \uXXXX > > replacement would be nice to have though, since this would > > also be usable with lots of other codecs (e.g. all the > > code page ones). > > OK, done, now there's a > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > codecs.escapereplace_unicodeencode_errors > that uses \u (or \U if x>0xffff (with a wide build > of Python)). Great ! > > [...]
> > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK! I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? > > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API. ("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. > I implemented this and changed the encoders to only > look up the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoders where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed?
No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data).

> Here is the current todo list:
>
> 1. implement a new TranslateCharmap and fix the old.
> 2. New encoding API for string objects too.
> 3. Decoding
> 4. Documentation
> 5. Test cases
>
> I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html).
>
> We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type
>
>     u"...".encode("...", "strict")
>
> instead of
>
>     import codecs
>     u"...".encode("...", codecs.raise_encode_errors)
>
> But with an error handler registry this function would become the official lookup method for error handlers (PyCodec_LookupUnicodeEncodeErrorHandler?). Python code would look like this:
>
>     def xmlreplace(encoding, unicode, pos, state):
>         return (u"&#%d;" % ord(unicode[pos]), pos+1)
>
>     import codecs
>
>     codecs.registerError("xmlreplace", xmlreplace)
>
> and then the following call can be made:
>
>     u"äöü".encode("ascii", "xmlreplace")
>
> As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself.
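The registry proposal sketched above is essentially what later shipped as codecs.register_error()/codecs.lookup_error(). In that final API the handler receives the exception object instead of the (encoding, unicode, pos, state) tuple proposed here; a sketch under Python 3:

```python
import codecs

def xmlreplace(exc):
    # Handler in the registry API that eventually shipped: it receives
    # the UnicodeEncodeError and returns (replacement, resume_position).
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    refs = u"".join(u"&#%d;" % ord(c) for c in exc.object[exc.start:exc.end])
    return (refs, exc.end)

codecs.register_error("xmlreplace", xmlreplace)

print(u"äöü".encode("ascii", "xmlreplace"))  # b'&#228;&#246;&#252;'
```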
> > But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?

Good idea !

One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-07-12 13:03

Message:
Logged In: YES user_id=89016

> > [...]
> > so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler.
>
> Go for it.

OK, done!

> [...]
> > > Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.
> >
> > I could, but first we should work out how the decoding callback API will work.
>
> Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there.

OK. I guess it would be best to do this when everything is finished.

> > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.
> > >
> > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
> >
> > It would be a slowdown. But callbacks open many possibilities.
>
> True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones).
OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff, with a wide build of Python).

> > For example:
> >
> >     Why can't I print u"gürk"?
> >
> > is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler()
>
> There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook.

True: sys.displayhook

> [...]
> > Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?
>
> If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.

OK! I will try to find the time to implement that in the next days.

> [Decoding error callbacks]
>
> About the return value:
>
> I'd suggest to always use the same tuple interface, e.g.
>
>     callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position)
>
> (I think it's better to use absolute values for the position rather than offsets.)
>
> Perhaps the encoding callbacks should use the same interface... what do you think ?

This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API.
("almost" because for the encoder output_to_be_appended will be reencoded, for the decoder it will simply be appended), so I'm for it.

I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So memory overflow tests are only done when an unencodable character occurs, and the UCS1 encoder should be as fast as it was without error callbacks.

Do we want to enforce new_input_position > input_position, or should jumping back be allowed?

> > > > One additional note: It is vital that errors is an assignable attribute of the StreamWriter.
> > >
> > > It is already !
> >
> > I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.
> >
> > Misc/unicode.txt is not clear on that:
> >
> >     """
> >     It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types.
> >     """
>
> Good point. I'll add that to the PEP 100.

OK.

Here is the current todo list:

1. implement a new TranslateCharmap and fix the old.
2. New encoding API for string objects too.
3. Decoding
4. Documentation
5. Test cases

I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001-July/001262.html).

We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *".
Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type

    u"...".encode("...", "strict")

instead of

    import codecs
    u"...".encode("...", codecs.raise_encode_errors)

But with an error handler registry this function would become the official lookup method for error handlers (PyCodec_LookupUnicodeEncodeErrorHandler?). Python code would look like this:

    def xmlreplace(encoding, unicode, pos, state):
        return (u"&#%d;" % ord(unicode[pos]), pos+1)

    import codecs

    codecs.registerError("xmlreplace", xmlreplace)

and then the following call can be made:

    u"äöü".encode("ascii", "xmlreplace")

As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed, common error handling names could even be implemented in the encoder itself.

But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway?

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-07-10 14:29

Message:
Logged In: YES user_id=38388

Ok, here we go...

> > > raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases).
> >
> > Nice.
>
> But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character.
> But as far as I can tell none of the codecs uses anything other than '?',

True.

> so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler.

Go for it.

> And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.
>
>     def FFFDreplace(enc, uni, pos):
>         if uni[pos] == u"\ufffd":
>             return u"?"
>         else:
>             raise UnicodeError(...)
>
> > > ...docs...
> >
> > Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.
>
> I could, but first we should work out how the decoding callback API will work.

Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there.

> > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.
> >
> > Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.
>
> It would be a slowdown. But callbacks open many possibilities.

True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones).

> For example:
>
>     Why can't I print u"gürk"?
>
> is probably one of the most frequently asked questions in comp.lang.python.
> For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler()

There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook.

> > > I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback?
> >
> > 1. Yes.
> > 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII).
>
> OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful.

It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat.

> BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way.

I've seen it; will comment on it later.

> Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?

If possible, please also add the multichar replacement to the old API.
I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too.

[Decoding error callbacks]

> > > A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do?
> >
> > I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions).
>
> Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type), that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding, perhaps the codec should be allowed to pass an additional state object to the callback?
>
> Maybe the same should be added to the encoding callbacks too?
> Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)?

I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (simplifies the interface).

About the return value:

I'd suggest to always use the same tuple interface, e.g.

    callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position)

(I think it's better to use absolute values for the position rather than offsets.)

Perhaps the encoding callbacks should use the same interface... what do you think ?

> > > One additional note: It is vital that errors is an assignable attribute of the StreamWriter.
> >
> > It is already !
>
> I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.
>
> Misc/unicode.txt is not clear on that:
>
>     """
>     It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types.
>     """

Good point. I'll add that to the PEP 100.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-22 22:51

Message:
Logged In: YES user_id=38388

Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy...

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 19:00

Message:
Logged In: YES user_id=38388

On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow.
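The tuple interface proposed in this exchange can be illustrated with a toy pure-Python ASCII encoder; this is a sketch of the control flow only (the hypothetical names toy_ascii_encode and xmlreplace are mine, not the patch's), not the C implementation:

```python
def toy_ascii_encode(text, callback, state=None):
    # Drive the callback with absolute positions, per the proposal:
    # callback(encoding, input_data, input_position, state)
    #     -> (output_to_be_appended, new_input_position)
    out = []
    pos = 0
    while pos < len(text):
        if ord(text[pos]) < 128:
            out.append(text[pos])
            pos += 1
        else:
            appended, pos = callback("ascii", text, pos, state)
            out.append(appended)  # the real encoder would re-encode this
    return "".join(out).encode("ascii")

def xmlreplace(encoding, data, pos, state):
    # Replace one unencodable character with an XML character reference.
    return (u"&#%d;" % ord(data[pos]), pos + 1)

print(toy_ascii_encode(u"gürk", xmlreplace))  # b'g&#252;rk'
```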
----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 17:49

Message:
Logged In: YES user_id=89016

Guido van Rossum wrote in python-dev:

> True, the "codec" pattern can be used for other encodings than Unicode. But it seems to me that the entire codecs architecture is rather strongly geared towards en/decoding Unicode, and it's not clear how well other codecs fit in this pattern (e.g. I noticed that all the non-Unicode codecs ignore the error handling parameter or assert that it is set to 'strict').

I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-13 15:57

Message:
Logged In: YES user_id=89016

> > [...]
> > raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases).
>
> Nice.

But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.
    def FFFDreplace(enc, uni, pos):
        if uni[pos] == u"\ufffd":
            return u"?"
        else:
            raise UnicodeError(...)

> > The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished.
>
> Very elegant solution !

I'll put it as a comment in the source.

> > (I hope that's enough explanation of the API and implementation)
>
> Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.

I could, but first we should work out how the decoding callback API will work.

> > I have renamed the static ...121 function to all lowercase names.
>
> Ok.
>
> > BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.
>
> Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.

It would be a slowdown. But callbacks open many possibilities. For example:

    Why can't I print u"gürk"?
is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler()

> [...]
> I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule.

OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;))

> > I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback?
>
> 1. Yes.
> 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII).

OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful.

BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way.

Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is?

> > A remaining problem is how to implement decoding error callbacks.
> > In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do?
>
> I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions).

Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type), that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding, perhaps the codec should be allowed to pass an additional state object to the callback?

Maybe the same should be added to the encoding callbacks too? Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)?

> > One additional note: It is vital that errors is an assignable attribute of the StreamWriter.
>
> It is already !

I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API.
Misc/unicode.txt is not clear on that:

    """
    It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types.
    """

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-13 10:05

Message:
Logged In: YES user_id=38388

> How the callbacks work:
>
> A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character, to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object. When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases).

Nice.

> The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string.
> If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished.

Very elegant solution !

> (I hope that's enough explanation of the API and implementation)

Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things.

> I have renamed the static ...121 function to all lowercase names.

Ok.

> BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback.

Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too.

> PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API)

I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as a Python access wrapper for the internal codecs and nothing more.
One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule.

> I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback?

1. Yes.
2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII).

> A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do?

I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions).

> One additional note: It is vital that errors is an assignable attribute of the StreamWriter.

It is already !

> Consider the XML example: For writing an XML DOM tree one StreamWriter object is used.
> When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment").

Sure.

> BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;)

I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 21:18

Message:
Logged In: YES user_id=89016

One additional note: It is vital that errors is an assignable attribute of the StreamWriter.

Consider the XML example: For writing an XML DOM tree one StreamWriter object is used. When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment").

BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-06-12 20:59

Message:
Logged In: YES user_id=89016

How the callbacks work:

A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object.
PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks: PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character, to signify to the encoder that it should choose a suitable replacement character), or directly returns errors if it is a callable object. When an unencodable character is encountered, the error handling callback will be called with the encoding name, the original unicode object and the error position, and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stack top. If an error is encountered and the stack has only one entry (during encoding of the original string), the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation.) I have renamed the static ...121 function to all lowercase names.
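A toy model of the two-string stack loop described above, in plain Python (a hypothetical helper, not the actual C implementation; `encodable` and `callback` stand in for the encoder's per-character test and the error callback):

```python
def encode_with_callback(text, encodable, callback):
    # A stack of [string, position] pairs; the loop always encodes a
    # character from the string at the stack top.
    out = []
    stack = [[text, 0]]
    while stack:
        top = stack[-1]
        if top[1] >= len(top[0]):
            # End of the current string: pop the replacement string,
            # or finish if this was the original string.
            stack.pop()
            continue
        ch = top[0][top[1]]
        top[1] += 1
        if encodable(ch):
            out.append(ch)
        elif len(stack) == 1:
            # Error in the original string: ask the callback for a
            # replacement and continue encoding from that string.
            stack.append([callback(ch), 0])
        else:
            # Error inside the replacement string itself.
            raise UnicodeError("unencodable character in replacement")
    return "".join(out)

print(encode_with_callback("ab\u20acc", lambda c: ord(c) < 128,
                           lambda c: "?"))  # ab?c
```

If the callback itself returns an unencodable character, the model raises, mirroring the two-entries-on-the-stack case in the description.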
BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 20:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names.
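The reimplementation idea floated at the top of this exchange (an ASCII encoder plus a \uxxxx replacement callback) is essentially what the now-standard 'backslashreplace' error handler provides in current Python:

```python
s = u"caf\u00e9 \u20ac"
# ASCII encoding; unencodable characters come back as backslash escapes,
# \xXX for small code points and \uXXXX for larger ones.
print(s.encode("ascii", "backslashreplace"))  # b'caf\\xe9 \\u20ac'
```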
If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the time). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject * version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? It encodes one-to-one; it implements both ASCII and latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones?
I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h; of those, PyCodec_EncodeHandlerForObject is vital, because it is used to map old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.c so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string) instead of UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode. PyCodec_RaiseEncodeErrors uses this to have a \uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 16:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive! I'll give it a try later this week. Some first cosmetic tidbits:

* please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """

* Comments should start with a capital letter and be prepended to the section they apply to

* There should be spaces between arguments in compares (a == b) not (a==b)

* Where does the name "...Encode121" originate?
* module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Fri Mar 8 18:28:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 08 Mar 2002 10:28:06 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 08:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) >Assigned to: Fredrik Lundh (effbot) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match:

>>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
>>> m.groups()
(None, '3', '34', '123')
>>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
>>> m.groups()
(None, None, '34', '123')

I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now-invalid entries at the end of the state->mark array.
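For reference, the pre behavior shown above is what the patch restores; it can be checked against the current re module, which is built on sre and includes this fix:

```python
import re

# After the fix, the inner group (\d) no longer reports a match when
# its enclosing group ((\d):)? did not participate in the match.
m = re.search(r"^((\d):)?(\d\d)\.(\d\d\d)$", "34.123")
print(m.groups())  # (None, None, '34', '123')
```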
In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-08 13:28 Message: Logged In: YES user_id=31435 Assigned to /F -- he's the expert here. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 10:23 Message: Logged In: YES user_id=86307 I'm pretty sure the memset is correct; state->lastmark is the index of the last mark written to (not the index of the next potential write). Also, it occurred to me that there is another related error here:

>>> m = sre.search(r'^((\d)\:)?\d\d\.\d\d\d$', '34.123')
>>> m.groups()
(None, None)
>>> m.lastindex
2

In other words, lastindex claims that group 2 was the last that matched, even though it didn't really match. Since lastindex is undocumented, this probably doesn't matter too much. Still, it probably should be reset if it is pointing to a group which gets "unmatched" when state->lastmark is reduced. Perhaps a function like the following should be added for use in the three places where state->lastmark is reset to a previous value:

void lastmark_restore(SRE_STATE *state, int lastmark)
{
    assert(lastmark >= 0);
    if (state->lastmark > lastmark) {
        int lastvalidindex = (lastmark == 0) ?
            -1 : (lastmark-1)/2+1;
        if (state->lastindex > lastvalidindex)
            state->lastindex = lastvalidindex;
        memset(state->mark + lastmark + 1, 0,
               (state->lastmark - lastmark) * sizeof(void*));
    }
    state->lastmark = lastmark;
}

---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-08 08:29 Message: Logged In: YES user_id=33168 Confirmed that the test w/o fix fails and the test passes with the fix to _sre.c. But I'm not sure if the memset can go too far: memset(state->mark + lastmark + 1, 0, (state->lastmark - lastmark) * sizeof(void*)); I can try under purify, but that doesn't guarantee anything. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 08:20 Message: Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Sat Mar 9 10:08:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 02:08:15 -0800 Subject: [Patches] [ python-Patches-500136 ] Update ext build documentation Message-ID: Patches item #500136, was opened at 2002-01-06 14:27 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500136&group_id=5470 Category: None Group: None >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: Martin v. Löwis (loewis) Summary: Update ext build documentation Initial Comment: The attached file documents how extensions are built using distutils. It is intended to replace at least unix.tex, and possibly also windows.tex. Fred, if this is ok, I would like to check it in as ext/building.tex, and remove ext/unix.tex.
I would then add a comment on top of windows.tex that the build procedure using distutils should work out of the box on Windows as well. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 11:08 Message: Logged In: YES user_id=21627 Committed as building.tex 1.1, ext.tex 1.105, windows.tex 1.4, deleting unix.tex. I left windows.tex, since the technology described in this section continues to work; I only added a note that developers should consider distutils instead. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-03-08 04:36 Message: Logged In: YES user_id=3066 Please check this and close the bugs this fixes. Thanks, and sorry for the delay in looking at this. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-01-23 09:34 Message: Logged In: YES user_id=44345 This looks good. I had written a short replacement file called build.tex and was going to submit it this morning, but yours looks better. Presuming Distutils gets rid of the need for Windows-specific build solutions, I agree both unix.tex and windows.tex should be replaced. One phrase didn't make sense to me. Near the top it says (known as related to Makefile.pre.in, and Setup files) I don't know what that means. I would just zap any reference to the old build method. Skip ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-01-23 08:59 Message: Logged In: YES user_id=21627 This fixes bugs #497695, #500115, #506545, ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500136&group_id=5470 From noreply@sourceforge.net Sat Mar 9 10:47:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 02:47:35 -0800 Subject: [Patches] [ python-Patches-403972 ] threaded profiler. Message-ID: Patches item #403972, was opened at 2001-02-23 16:21 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403972&group_id=5470 Category: Demos and tools Group: None Status: Open Resolution: None Priority: 5 Submitted By: Amila Fernando (amila) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: threaded profiler. Initial Comment: Basically a profiler that can handle threaded programs and generate profiling snapshots. It does however have some situations it cannot handle well (see included README for details). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 11:47 Message: Logged In: YES user_id=21627 I recommend rejecting this patch. Since it is pure-Python, it is probably more suited as a stand-alone package. For inclusion into Python, trying to hook into thread creation is a hack, IMO; there are certainly ways to cheat that technique. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-07-04 06:27 Message: Logged In: YES user_id=3066 Assigned to me since I've been digging into the profiling support lately.
---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2001-05-09 18:11 Message: Logged In: YES user_id=31392 Perhaps you could share this on comp.lang.python and see if people can help you fix the situations it doesn't handle well. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403972&group_id=5470 From noreply@sourceforge.net Sat Mar 9 10:56:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 02:56:55 -0800 Subject: [Patches] [ python-Patches-437733 ] Additional Argument to Python/C Function Message-ID: Patches item #437733, was opened at 2001-07-01 18:24 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=437733&group_id=5470 Category: None Group: None >Status: Closed Resolution: Rejected Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Additional Argument to Python/C Function Initial Comment: This patch makes it possible for a Python/C function to get an additional void* argument. This makes it easier to use Python with C++. PS: I'm a bad descriptor, so please look at the diff file. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 11:56 Message: Logged In: YES user_id=21627 Was that re-opened by mistake? Closing it again. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 23:03 Message: Logged In: YES user_id=21627 It appears that the C++ fragment is broken and does not work as intended. Apparently, PyCppFunction is a member pointer, which is intended to be passed through to the invocation of pycfunction. However, AFAICT, addmethod converts the pointer-to-member-function into a void* before passing it into the methoddefs.
This C++ code has undefined behaviour: there is no guarantee that a pointer-to-member can fit into a void*. In fact, on g++, a pointer-to-member is larger than a void* (8 bytes on a 32-bit machine). It may be possible to fix this. However, I think there are many more issues in integrating C++ classes into Python; such a class structure would add little if any value. Therefore, I'm in favour of rejecting this patch. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-07-25 23:54 Message: Logged In: YES user_id=3066 Just for the record, I'm not ignoring your emailed plea for re-consideration; I just haven't had time to dig back into this matter. Here's the sample class from the email, so it will be easier to keep track of and for others to comment on the approach:

class PyClass {
public:
    PyClass();
    typedef PyObject* (PyClass::*PyCppFunction)(PyObject*);
    void addmethod(const char* name, PyCppFunction func);
    ~PyClass();
    operator PyObject*() { return (PyObject*)obj; }
private:
    struct PyClassObject {
        PyObject_HEAD
        PyClass *self;
    };
    std::vector<PyMethodDef> methods;
    static PyObject *pycfunc(PyClassObject *self, PyObject *arg, void *p);
    static PyObject *getattr(PyClassObject *self, char *name);
    static void dealloc(PyClassObject *) {}
    PyTypeObject typeobject;
    PyClassObject *obj;
};

void PyClass::addmethod(const char *name, PyCppFunction func)
{
    PyMethodDef meth = {
        strdup(name),
        (PyCFunction)pycfunc,
        METH_VARARGS|METH_USERARG,
        NULL,
        *(void**)&func
    };
    methods.insert(methods.begin(), meth);
}

PyClass::~PyClass()
{
    for (std::vector<PyMethodDef>::iterator i = methods.begin();
         i != methods.end(); i++)
        free(i->ml_name);
    methods.resize(0);
}

PyObject *PyClass::pycfunc(PyClassObject *self, PyObject *arg, void *p)
{
    PyCppFunction func = *(PyCppFunction*)&p;
    return (self->self->*func)(arg);
}

PyClass::PyClass()
{
    PyTypeObject Xxtype = {
        PyObject_HEAD_INIT(&PyType_Type)
        0,                      /*ob_size*/
        "xx",                   /*tp_name*/
        sizeof(PyClassObject),  /*tp_basicsize*/
        0,                      /*tp_itemsize*/
        /* methods */
        (destructor)dealloc,    /*tp_dealloc*/
        0,                      /*tp_print*/
        (getattrfunc)getattr,   /*tp_getattr*/
        (setattrfunc)0,         /*tp_setattr*/
        0,                      /*tp_compare*/
        0,                      /*tp_repr*/
        0,                      /*tp_as_number*/
        0,                      /*tp_as_sequence*/
        0,                      /*tp_as_mapping*/
        0,                      /*tp_hash*/
    };
    typeobject = Xxtype;
    obj = PyObject_NEW(PyClassObject, &typeobject);
    obj->self = this;
    PyMethodDef md = {0};
    methods.push_back(md);
}

PyObject *PyClass::getattr(PyClassObject *self, char *name)
{
    return Py_FindMethod(&self->self->methods[0], (PyObject*)self, name);
}

This class is meant to be a base class for other classes that represent Python types. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr. (fdrake) Date: 2001-07-04 06:48 Message: Logged In: YES user_id=3066 The patch is easy enough to understand, but the motivation for this is not at all clear. Rejecting as code bloat. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=437733&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:02:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:02:18 -0800 Subject: [Patches] [ python-Patches-440407 ] Remote execution patch for IDLE Message-ID: Patches item #440407, was opened at 2001-07-11 15:35 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=440407&group_id=5470 Category: IDLE Group: None Status: Open >Resolution: Out of Date Priority: 3 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Guido van Rossum (gvanrossum) Summary: Remote execution patch for IDLE Initial Comment: This is the code I have for the remote execution patch. (Remote execution must be enabled with an explicit command line argument -r.)
Caveats:
- undocumented
- slow
- security issue: the subprocess should not be the server but the client, to prevent a hacker from gaining access

This should apply cleanly against IDLE as currently checked into the Python CVS tree. I don't want to check this in yet because of the security issue, and I don't have time to work on it. I hope the idlefork project will pick this up though and address the issues above. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:02 Message: Logged In: YES user_id=21627 It appears the patch is slightly outdated now; at least the chunk removing set_break does not apply anymore. Has this been integrated into idlefork? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-07-11 15:38 Message: Logged In: YES user_id=6380 Uploading the patch again. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=440407&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:04:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:04:30 -0800 Subject: [Patches] [ python-Patches-443899 ] Minor fix to gzip.py module. Message-ID: Patches item #443899, was opened at 2001-07-23 21:35 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=443899&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Titus Brown (titus) >Assigned to: Martin v. Löwis (loewis) Summary: Minor fix to gzip.py module. Initial Comment: ---

from cStringIO import StringIO
from gzip import GzipFile

stringFile = StringIO()
gzFile = GzipFile("test1", 'wb', 9, stringFile)
gzFile.write('howdy there!')
r = gzFile.read()

---

The above code fragment gave a nonintuitive error response (attribute missing).
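Current Python behaves the way this patch proposes: read() on a write-only GzipFile raises an OSError (the modern spelling of IOError) with EBADF. A Python 3 rendering of the fragment above:

```python
import gzip
import io

buf = io.BytesIO()
gz = gzip.GzipFile("test1", "wb", 9, buf)
gz.write(b"howdy there!")
try:
    gz.read()
except OSError as exc:
    # e.g. "read() on write-only GzipFile object"
    print("read rejected on write-only file:", exc)
```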
Now, an exception is raised stating that the file is not opened for reading or writing. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:04 Message: Logged In: YES user_id=21627 Taken the load from Jeremy. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-10-19 13:03 Message: Logged In: YES user_id=21627 I think gzip files should behave like file objects with respect to exceptions. Perhaps inconsistently, performing read or write on files that are opened only for the other operation raises an IOError (EBADF), since Posix says so, whereas performing close on a closed file raises a ValueError (it can't perform a system call since the file descriptor might have been recycled meanwhile). So I'm still in favour of applying this patch, with the ValueError changed to IOError, and perhaps passing EBADF as the error code in all cases of IOError. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-19 04:09 Message: Logged In: YES user_id=6380 Time to look at this again? ---------------------------------------------------------------------- Comment By: Titus Brown (titus) Date: 2001-08-16 22:33 Message: Logged In: YES user_id=23486 Re: context diff, thanks & sorry for the trouble; my newer patches are being submitted this way. Re: IOError, I wasn't sure which exception to use at the time. I therefore took my cue from other code in the gzip module, which raises a ValueError when self.fileobj is closed. The only IO errors raised in the module are those that pertain to incorrect file formats. I'd be happy to change any and all of the ValueErrors that are raised into IOErrors, but I think the current consistency of errors should be maintained ;). ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2001-08-16 20:42 Message: Logged In: YES user_id=21627 Please always submit context (-c) or unified (-u) diffs; I've reformatted your patch by retrieving 1.24, applying the patch, updating to the current version, and regenerating the patch. Apart from that, the patch looks fine to me, and I recommend approving it. One consideration is the exception being raised: Maybe IOError is more appropriate. ---------------------------------------------------------------------- Comment By: Titus Brown (titus) Date: 2001-08-14 21:15 Message: Logged In: YES user_id=23486 (sorry -- misunderstanding of how the changelog view works) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=443899&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:24:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:24:05 -0800 Subject: [Patches] [ python-Patches-448038 ] a move function for shutil.py Message-ID: Patches item #448038, was opened at 2001-08-05 02:08 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=448038&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 3 Submitted By: William McVey (wamcvey) Assigned to: Nobody/Anonymous (nobody) Summary: a move function for shutil.py Initial Comment: Although shutil.py has some nice copy functions, it has no real equivalent of mv(1). This is a very simple implementation (as in not a whole lot of stuff has been implemented) but it's functional. It simply calls rename, and if that fails tries to copy and unlink. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:24 Message: Logged In: YES user_id=21627 Here is an attempt to provide error handling for copytree. It collects all exceptions in a list, and raises them as shutil.Error.
This would be inconsistent with shutil.rmtree, which offers the choice of ignoring errors, invoking an error callback, or raising an exception at the point of the problem. Which of these alternatives would you like to see implemented? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-08-08 17:41 Message: Logged In: YES user_id=6380 This is OK, but only perpetuates the problem with this module -- it doesn't have a decent error handling strategy (prints to stdout!?!?!?!). If someone wants to put some more effort into this, I would recommend at least adding an option to copytree() to control error handling. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=448038&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:30:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:30:36 -0800 Subject: [Patches] [ python-Patches-450583 ] Extend/embed tools for AIX Message-ID: Patches item #450583, was opened at 2001-08-13 21:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=450583&group_id=5470 Category: Demos and tools Group: None >Status: Closed >Resolution: Duplicate Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Nobody/Anonymous (nobody) Summary: Extend/embed tools for AIX Initial Comment: The support tools for extending and embedding with AIX are installed into ${LIBPL}, but "configure" still creates a pointer in the Makefile as if they were installed into ${LIBP} instead. This patch corrects "configure"'s behavior to match the install behavior in Makefile. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:30 Message: Logged In: YES user_id=21627 What version was this originally against?
It appears that this is a duplicate of #103679, applied as configure.in 1.201. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-08-14 16:04 Message: Logged In: YES user_id=21627 Attached patch as requested in Xns90FCA43515C62beablebeable@30.146.28.98 Your comments do show up; just don't use the "Back" button of your browser without reloading the page. Also, you may consider getting an account so that you don't appear anonymous here. ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2001-08-14 13:39 Message: Logged In: NO Hmmm ... it seems that my followup comments are not being displayed; are they being retained somewhere for moderating? For the third time, the patch is: *** configure.in.orig Mon Aug 13 15:45:14 2001 --- configure.in Mon Aug 13 15:55:33 2001 *************** *** 571,577 **** case $ac_sys_system/$ac_sys_release in AIX*) BLDSHARED="\/Modules/ld_so_aix \ -bI:Modules/python.exp" ! LDSHARED="\/ld_so_aix \ -bI:\/python.exp" ;; BeOS*) BLDSHARED="\/Modules/ld_so_beos $LDLIBRARY" --- 571,577 ---- case $ac_sys_system/$ac_sys_release in AIX*) BLDSHARED="\/Modules/ld_so_aix \ -bI:Modules/python.exp" ! LDSHARED="\/config/ld_so_aix \ -bI:\/config/python.exp" ;; BeOS*) BLDSHARED="\/Modules/ld_so_beos $LDLIBRARY" ---------------------------------------------------------------------- Comment By: Nobody/Anonymous (nobody) Date: 2001-08-14 13:35 Message: Logged In: NO The attachment won't seem to attach ... perhaps there wasn't quite enough testing of the patch manager for poor schmucks like myself who are trapped behind corporate firewalls. The patch contents: *** configure.in.orig Mon Aug 13 15:45:14 2001 --- configure.in Mon Aug 13 15:55:33 2001 *************** *** 571,577 **** case $ac_sys_system/$ac_sys_release in AIX*) BLDSHARED="\/Modules/ld_so_aix \ -bI:Modules/python.exp" !
LDSHARED="\/ld_so_aix \ -bI:\/python.exp" ;; BeOS*) BLDSHARED="\/Modules/ld_so_beos $LDLIBRARY" --- 571,577 ---- case $ac_sys_system/$ac_sys_release in AIX*) BLDSHARED="\/Modules/ld_so_aix \ -bI:Modules/python.exp" ! LDSHARED="\/config/ld_so_aix \ -bI:\/config/python.exp" ;; BeOS*) BLDSHARED="\/Modules/ld_so_beos $LDLIBRARY" ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-08-13 23:48 Message: Logged In: YES user_id=21627 Which patch? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=450583&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:40:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:40:20 -0800 Subject: [Patches] [ python-Patches-452232 ] timestamp function for time module Message-ID: Patches item #452232, was opened at 2001-08-17 23:37 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=452232&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Gareth Harris (garethharris) Assigned to: Nobody/Anonymous (nobody) Summary: timestamp function for time module Initial Comment: Timestamp creates timestamp strings in ISO or ODBC format in UTC or local timezones. It can also add microseconds where needed. Timestamps are often needed outside database or XML activities, so its proposed location is the time module. timestamp(secs=None,fmt='ISO',TZ=None,fracsec=None): '''Make ISO or ODBC timestamp from [current] time.
Parameters:
    secs    = float seconds, else default = time()
    fmt     = 'ISO' use ISO 8601 standard format
              = "YYYY-MM-DDTHH:MM:SS.mmmmmmZ" Zulu
              or "YYYY-MM-DDTHH:MM:SS.mmmmmm-hh:mm" local
              else "YYYY-MM-DD HH:MM:SS.mmmmmm" ODBC
    TZ      = None=GMT/UTC/Zulu, else local time zone
    fracsec = None, else add microseconds to string
''' Any improvement or standardization is welcome. Gareth Harris gharris@nrao.edu 2001-08-17T21:36:00Z ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:40 Message: Logged In: YES user_id=21627 If you want to see the code included, you'd need to provide a context diff, including docs and test cases. However, notice that there may be overlap with the emerging builtin DateTime type, see http://www.zope.org/Members/fdrake/DateTimeWiki/FrontPage ---------------------------------------------------------------------- Comment By: Gareth Harris (garethharris) Date: 2002-01-02 17:41 Message: Logged In: YES user_id=300900 Back from travel, other projects etc. [2001.01.02] Thanks for comments thus far. Maybe I will finally meet some of you in Feb. --- I proposed to put this in TIME module UNLESS someone has an idea for a better location. Who takes care of that module? Shall I provide: doc?, test suite? Is a companion decode function needed? OTHERWISE I will put it in sourceforge/activestate? Which is preferred? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-01-01 21:27 Message: Logged In: YES user_id=21627 Gareth, Can you please propose a strategy to advance this patch or withdraw it? If there is no action, I propose to close it by Feb 1, 2002. ---------------------------------------------------------------------- Comment By: Fred L. Drake, Jr.
(fdrake) Date: 2001-12-06 15:57 Message: Logged In: YES user_id=3066 Another possible alternate home for this would be the Python Snippet repository on SourceForge: http://sourceforge.net/snippet/browse.php?by=lang&lang=6 I'm not suggesting that this doesn't belong in the standard library, however. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-19 19:46 Message: Logged In: YES user_id=21627 Nice patch. If you want to see this included, you should complete it: Decide on location of the function, provide documentation and test cases. As the location, it may be that the calendar module could provide a home, but you may ask in the newsgroup. If you merely wanted to publish this code snippet, I suggest that you find a better home than the Python patch database, e.g. the Cookbook: http://aspn.activestate.com/ASPN/Cookbook/Python There are a number of other places that collect Python snippets; this is just one option. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=452232&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:44:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:44:09 -0800 Subject: [Patches] [ python-Patches-462754 ] no '_d' ending for mingw32 Message-ID: Patches item #462754, was opened at 2001-09-19 05:29 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462754&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Gerhard Häring (ghaering) Assigned to: Nobody/Anonymous (nobody) Summary: no '_d' ending for mingw32 Initial Comment: This patch prevents distutils from naming the extension modules _d.pyd when compiled with mingw32 on Windows in debug mode. Instead, the extension modules will get the normal name .pyd.
Technically, the patch doesn't prevent the behaviour for mingw32, but only adds the _d for MS Visual C++ and Borland compilers (though I don't know about the Borland case). The reason for this? Adding "_d" doesn't make any sense for GNU compilers. I think it's just MS Visual C++ madness. If you want to debug an extension module that was compiled with gcc, you have to use gdb anyway, because the debugging symbols of MSVC++ and gcc are incompatible. So you normally use a release Python version (from the python.org binary download) and compile your extensions with mingw32. To put it shortly: The current state is that you do a "setup.py build --compiler=mingw32 --debug" and then rename the extension modules, removing the _d. Then fire up gdb to debug your module. With this patch, the renaming isn't necessary anymore. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:44 Message: Logged In: YES user_id=21627 Does the patch actually work? It seems to me that, if compiled --with-pydebug, import will automatically search for the _d version, and complain if it is not found. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-01-04 12:52 Message: Logged In: YES user_id=21627 The rationale for using the debugging version of MSVCRT is not the debugging information alone, but also the additional functionality, like heap consistency checks and other assertions. So it is not obvious that you do not want to use the debugging version of this library in a debug build. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2002-01-04 03:50 Message: Logged In: YES user_id=163326 mingw links with msvcrt.dll. I've plans to add mingw32 support to the autoconf build process (hopefully soon enough for 2.3).
The GNU and MS debugger symbols are incompatible, though, so I think that mingw32 shouldn't link to the debug version of msvcrt (gdb doesn't understand the Microsoft debugger symbols; and the Visual Studio debugger has no idea what the debugging symbols of gcc are all about; isn't cross-platform and cross-compiler programming fun?). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-12-30 14:13 Message: Logged In: YES user_id=21627 How does the mingw port interact with the debugging libraries? With MSVC, the debug build will link to the debug versions of the CRT. What C library will mingw link with (I hope it won't use crtdll.dll)? ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2001-09-28 23:28 Message: Logged In: YES user_id=163326 Yes. But mingw32 isn't emulating Unix under Windows (that would be Cygwin). It's just a version of gcc and friends that targets native win32. It links against msvcrt (not a Posix emulation library like Cygwin does). This is a bit hypothetical because I didn't yet hack the autoconf build process for native win32 with mingw32. Currently, you cannot build a complete Python with mingw32, but you *can* build extension modules against an existing Python (compiled with M$ VC++). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-09-28 22:43 Message: Logged In: YES user_id=31435 All else being equal, a system emulating Unix under Windows should strive to make life comfortable for Unix folks. The question is thus whether all else is in fact equal. ---------------------------------------------------------------------- Comment By: Gerhard Häring (ghaering) Date: 2001-09-28 20:37 Message: Logged In: YES user_id=163326 Hmm. I don't like the _d endings at all.
But if the policy on win32 is that debug executables and libraries get a "_d" ending, then I'm unsure whether this patch should be applied. I have plans to hack the autoconf madness to build a native win32 Python with mingw32. But that won't be ready by tomorrow. And I don't think that I'll add "_d" endings there for debugging, because that would be inconsistent with the normal autoconf builds on Unix. I'm glad that *I* don't have to decide whether this patch is a Good Thing. Being consistent with the Python win32 build or with GNU (gcc/autoconf). Take your pick :-) ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-09-19 05:46 Message: Logged In: YES user_id=31435 FYI, MSVC never adds _d on its own -- Mark Hammond and/or Guido forced it to do that. I don't remember why, but one of them explained it to me long ago and it made good sense at the time. MSVC normally compiles debug and release builds into distinct subdirectories, and uses the same names in both. But *our* MSVC setup forces it to compile both flavors of build directly into the PCbuild directory, so it has to give the resulting DLLs and executables different names (else the second build would overwrite the results of the first build).
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=462754&group_id=5470 From noreply@sourceforge.net Sat Mar 9 11:54:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 03:54:57 -0800 Subject: [Patches] [ python-Patches-472523 ] Reminder: 2.3 should check tp_compare Message-ID: Patches item #472523, was opened at 2001-10-18 21:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=472523&group_id=5470 Category: None Group: None Status: Open Resolution: None >Priority: 6 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Nobody/Anonymous (nobody) Summary: Reminder: 2.3 should check tp_compare Initial Comment: In 2.3, the outcome of tp_compare should be required to be -1, 0 or 1; other values should be considered *illegal*. (In 2.2, the docs were changed to stress this but for backwards compatibility this isn't enforced.) ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:54 Message: Logged In: YES user_id=21627 Attached is a patch that implements this test, producing a warning if tp_compare does not follow that restriction.
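The restriction Martin's check enforces can be illustrated in pure Python. A minimal sketch (the helper name is ours, not part of the patch; Python 3 later dropped the built-in cmp(), and this sign-normalizing idiom is the usual replacement):

```python
def three_way_cmp(a, b):
    # Normalize a comparison to exactly -1, 0, or 1 -- the only
    # results the tp_compare contract treats as legal.
    return (a > b) - (a < b)

# The common C shortcut "return a - b" violates the contract:
# 10 - 3 == 7, which is none of -1, 0, 1.
```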
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=472523&group_id=5470 From noreply@sourceforge.net Sat Mar 9 12:00:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 04:00:16 -0800 Subject: [Patches] [ python-Patches-491936 ] Opt for tok_nextc Message-ID: Patches item #491936, was opened at 2001-12-12 09:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=491936&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: David Jacobs (dbj) Assigned to: Nobody/Anonymous (nobody) Summary: Opt for tok_nextc Initial Comment: tokenizer.c - revision 2.53 I tried to pick a routine that looked like it was heavily used and optimizations that do not increase the maintenance burden (I won't feel bad if you reject it though, I'll keep on trying as long as you don't consider it a burden :-). I changed one strcpy to a memcpy because the length had already been computed. I also changed the pattern:

    a = strchr(b,'\0');

to

    a = b + strlen(b);

which is an idiom I've seen in many other places in the code, so I don't think it makes it harder to understand, and strlen is significantly more efficient than strchr. Aloha, David Jacobs (your pico optimizer :-) ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 13:00 Message: Logged In: YES user_id=21627 Can you report some data about the resulting speedup? I seriously doubt that this is a significant change; unless data is forthcoming proving me wrong, I recommend rejecting this patch.
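A sketch of the kind of measurement being asked for. Since the change is in the C tokenizer, one coarse proxy is timing compile() over a synthetic module with timeit; the source size and repeat counts below are arbitrary choices, not part of the patch:

```python
import timeit

# Synthetic module source -- 200 trivial assignments.
SOURCE = "\n".join("x%d = %d" % (i, i) for i in range(200))

def tokenizer_benchmark(repeat=5, number=50):
    # compile() exercises the C tokenizer, so its timing bounds any
    # speedup a tok_nextc() micro-optimization could deliver.
    timer = timeit.Timer(lambda: compile(SOURCE, "<bench>", "exec"))
    return min(timer.repeat(repeat=repeat, number=number))

best = tokenizer_benchmark()
```

Running this before and after the patch on the same machine would give the before/after numbers Martin requests.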
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=491936&group_id=5470 From noreply@sourceforge.net Sat Mar 9 12:04:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 04:04:45 -0800 Subject: [Patches] [ python-Patches-494047 ] removes 64-bit ?: to cope on plan9 Message-ID: Patches item #494047, was opened at 2001-12-17 03:18 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=494047&group_id=5470 Category: Core (C code) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Russ Cox (rsc) Assigned to: Nobody/Anonymous (nobody) Summary: removes 64-bit ?: to cope on plan9 Initial Comment: The Plan 9 C compiler can't handle 64-bit numbers as the branches of a ternary operation. Rewrite a ? b : c into if (a) then b else c in two places. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 13:04 Message: Logged In: YES user_id=21627 Thanks for the patch, committed as longobject.c 1.115. I have not integrated it into 2.2.1, since I believe it is unlikely that all other plan9 changes are that trivial, so there is little chance that 2.2.1 will work out of the box on that system. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-17 03:55 Message: Logged In: YES user_id=6380 Thanks. We'll do this in 2.2.1 or 2.3, since (IMO) it's too close to the release date of 2.2.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=494047&group_id=5470 From noreply@sourceforge.net Sat Mar 9 12:11:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 04:11:23 -0800 Subject: [Patches] [ python-Patches-504224 ] add plan9 threads include to thread.c Message-ID: Patches item #504224, was opened at 2002-01-16 07:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504224&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Russ Cox (rsc) Assigned to: Nobody/Anonymous (nobody) Summary: add plan9 threads include to thread.c Initial Comment: Adds the usual #ifdef and #include. I still haven't submitted any of the Plan 9 specific files (e.g., thread-plan9.h) since they're still in flux. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 13:11 Message: Logged In: YES user_id=21627 Thanks, applied as thread.c 2.41.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504224&group_id=5470 From noreply@sourceforge.net Sat Mar 9 14:19:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 06:19:46 -0800 Subject: [Patches] [ python-Patches-440407 ] Remote execution patch for IDLE Message-ID: Patches item #440407, was opened at 2001-07-11 09:35 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=440407&group_id=5470 Category: IDLE Group: None Status: Open Resolution: Out of Date Priority: 3 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Guido van Rossum (gvanrossum) Summary: Remote execution patch for IDLE Initial Comment: This is the code I have for the remote execution patch. (Remote execution must be enabled with an explicit command line argument -r.) Caveats:
- undocumented
- slow
- security issue: the subprocess should not be the server but the client, to prevent a hacker from gaining access
This should apply cleanly against IDLE as currently checked into the Python CVS tree. I don't want to check this in yet because of the security issue, and I don't have time to work on it. I hope the idlefork project will pick this up though and address the issues above. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-09 09:19 Message: Logged In: YES user_id=6380 No, the IDLEfork project has stalled except for tweaking the configuration code (which would be good to merge into the Python IDLE tree when it's ready). I expect the patch failure is shallow so I won't bother fixing it. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-03-09 06:02 Message: Logged In: YES user_id=21627 It appears the patch is slightly outdated now, at least the chunk removing set_break does not apply anymore. Has this been integrated into idlefork? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-07-11 09:38 Message: Logged In: YES user_id=6380 Uploading the patch again. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=440407&group_id=5470 From noreply@sourceforge.net Sun Mar 10 05:31:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 21:31:41 -0800 Subject: [Patches] [ python-Patches-523415 ] Explicit proxies for urllib.urlopen() Message-ID: Patches item #523415, was opened at 2002-02-28 01:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Andy Gimblett (gimbo) Assigned to: Nobody/Anonymous (nobody) Summary: Explicit proxies for urllib.urlopen() Initial Comment: This patch extends urllib.urlopen() so that proxies may be specified explicitly. This is achieved by adding an optional "proxies" parameter. If this parameter is omitted, urlopen() acts exactly as before, ie gets proxy settings from the environment. This is useful if you want to tell urlopen() not to use the proxy: just pass an empty dictionary. Also included is a patch to the urllib documentation explaining the new parameter. Apologies if patch format is not exactly as required: this is my first submission. All feedback appreciated. :-) ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-10 16:31 Message: Logged In: YES user_id=250749 I think expanding the docs is the go here.
In looking at the 2.2 docs (11.4 urllib), the bits that I think could usefully be improved include:
- the paragraph describing the proxy environment variables should note that on Windows, browser (at least for InternetExplorer - I don't know about Netscape) registry settings for proxies will be used when available;
- a short para noting that proxies can be overridden using URLopener/FancyURLopener class instances, documented further down the page, placed just before the note about not supporting authenticating proxies;
- adding a description of the "proxies" parameter to the URLopener class definition;
- adding an example of bypassing proxies to the examples subsection (11.4.2).
If/when you upload a doc patch, I suggest that you assign it to Fred Drake, who is the chief docs person.
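For the record, the behaviour discussed here (an explicit proxy mapping, with an empty dict meaning "bypass proxies") survives in today's urllib.request via ProxyHandler. A sketch against the modern API rather than the 2002-era urllib; the proxy URL is made up:

```python
import urllib.request

# An empty mapping disables proxies entirely (the "pass an empty
# dictionary" case from the patch); a populated one overrides the
# environment variables.
bypass = urllib.request.ProxyHandler({})
explicit = urllib.request.ProxyHandler({"http": "http://proxy.example:3128"})

# An opener built with the empty handler never consults $http_proxy.
opener = urllib.request.build_opener(bypass)
```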
---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 14:32 Message: Logged In: YES user_id=250749 Having just looked at this myself, I can understand where you're coming from, however my reading between the lines of the docs is that if you care about the proxies then you are supposed to use urllib.FancyURLopener (or urllib.URLopener) directly. If this is the intent, the docs could be a little clearer about this. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 From noreply@sourceforge.net Sun Mar 10 05:45:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 09 Mar 2002 21:45:44 -0800 Subject: [Patches] [ python-Patches-528022 ] PEP 285 - Adding a bool type Message-ID: Patches item #528022, was opened at 2002-03-10 00:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=528022&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Nobody/Anonymous (nobody) Summary: PEP 285 - Adding a bool type Initial Comment: Here's a preliminary implementation of the PEP, including unittests checking the promises made in the PEP (test_bool.py) and (some) documentation. With this 12 tests fail for me (on Linux); I'll look into these later. They appear shallow (mostly doctests dying on True or False where 1 or 0 was expected). Note: the presence of this patch does not mean that the PEP is accepted -- it just means that a sample implementation exists in case someone wants to explore the effects of the PEP on their code. 
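The PEP's key promises, as exercised by the attached test_bool.py, can be checked directly in any Python since the patch landed (2.3 onward); the doctest failures Guido mentions come from reprs changing while the values stay numerically equal:

```python
# bool is a subclass of int with exactly two instances, True and False.
assert isinstance(True, int)
assert issubclass(bool, int)

# Numeric behaviour is unchanged, so existing arithmetic keeps working.
assert True == 1 and False == 0
assert True + True == 2

# Only the repr/str changed -- the source of the shallow doctest failures.
assert str(True) == "True" and str(1) == "1"
```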
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=528022&group_id=5470 From noreply@sourceforge.net Sun Mar 10 08:00:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 00:00:42 -0800 Subject: [Patches] [ python-Patches-499062 ] Minor typo in test_generators.py Message-ID: Patches item #499062, was opened at 2002-01-03 13:32 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=499062&group_id=5470 Category: Tests Group: None >Status: Closed >Resolution: Fixed Priority: 3 Submitted By: Uche Ogbuji (uche) Assigned to: Tim Peters (tim_one) Summary: Minor typo in test_generators.py Initial Comment: This one caused me a bit of confusion. Traditionally "leaves" refer to tree nodes with no children. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-10 03:00 Message: Logged In: YES user_id=31435 Changed, in dist/src/Lib/test/test_generators.py; new revision: 1.31 nondist/peps/pep-0255.txt; new revision: 1.18 ---------------------------------------------------------------------- Comment By: Uche Ogbuji (uche) Date: 2002-01-03 23:06 Message: Logged In: YES user_id=38966 No more argument. s/leaves/labels it is. Thanks. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-01-03 19:35 Message: Logged In: YES user_id=31435 Yes, I think "leaf" == "no kids" is universally accepted. I don't like changing it to plain "nodes", though, because the example code does not generate the nodes, it generates only the node labels -- someone confused by the misuse of "leaves" here is also likely to be confused by the misuse of "nodes" -- and I'm going to reduce the priority of this patch every time you argue back.
---------------------------------------------------------------------- Comment By: Uche Ogbuji (uche) Date: 2002-01-03 19:23 Message: Logged In: YES user_id=38966 It's s/leaves/nodes/. Maybe I've been working with DOM too much. At any rate, I have always thought of leaf nodes as only those with no children. It doesn't look as if anything from my patch made it through: neither the comment nor the patch. Sometimes I hate SF. I'll try again, though it hardly seems necessary... ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-01-03 19:18 Message: Logged In: YES user_id=31435 Assigned to me; added a "Tests" category and recategorized accordingly. Uche, if you tried to upload a patch, it didn't work (did you remember to check the upload box)? What is it that you want to see changed? s/leaves/labels/? Note that the example in the docstring test is lifted directly out of PEP 255, so tell me what would shut you up and I'll make the change in both places. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=499062&group_id=5470 From noreply@sourceforge.net Sun Mar 10 12:46:38 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 04:46:38 -0800 Subject: [Patches] [ python-Patches-528038 ] __nonzero__ being improperly called Message-ID: Patches item #528038, was opened at 2002-03-10 09:16 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=528038&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Burton Radons (loth) Assigned to: Nobody/Anonymous (nobody) Summary: __nonzero__ being improperly called Initial Comment: As noted in Bug #527816, if you call the __nonzero__ method of a builtin type directly it will SIGSEGV you.
The reason is that internally the nonzero slot is being called with "PyObject *(*) (PyObject *)" casting, rather than the actual "int (*) (PyObject *)". This small patch adds a new static function that's just a copy of wrap_hashfunc and gets it called properly later on. If this isn't how we want bugfixes handled, please advise and I'll revise. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-10 13:46 Message: Logged In: YES user_id=21627 The patch looks good. However, wouldn't it be simpler to use wrap_inquiry instead? (esp. since nb_nonzero is defined as inquiryfunc). Also, a test case (perhaps inside test_descr) which currently crashes but succeeds under your patch would be appreciated. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=528038&group_id=5470 From noreply@sourceforge.net Sun Mar 10 14:28:07 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 06:28:07 -0800 Subject: [Patches] [ python-Patches-528038 ] __nonzero__ being improperly called Message-ID: Patches item #528038, was opened at 2002-03-10 03:16 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=528038&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Burton Radons (loth) Assigned to: Guido van Rossum (gvanrossum) Summary: __nonzero__ being improperly called Initial Comment: As noted in Bug #527816, if you call the __nonzero__ method of a builtin type directly it will SIGSEGV you. The reason is that internally the nonzero slot is being called with "PyObject *(*) (PyObject *)" casting, rather than the actual "int (*) (PyObject *)". This small patch adds a new static function that's just a copy of wrap_hashfunc and gets it called properly later on.
If this isn't how we want bugfixes handled, please advise and I'll revise. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-10 09:11 Message: Logged In: YES user_id=6380 Thanks! Fixed in CVS, using Martin's approach. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-10 07:46 Message: Logged In: YES user_id=21627 The patch looks good. However, wouldn't it be simpler to use wrap_inquiry instead? (esp. since nb_nonzero is defined as inquiryfunc). Also, a test case (perhaps inside test_descr) which currently crashes but succeeds under your patch would be appreciated. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=528038&group_id=5470 From noreply@sourceforge.net Sun Mar 10 18:46:59 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 10:46:59 -0800 Subject: [Patches] [ python-Patches-500311 ] Work around for buggy https servers Message-ID: Patches item #500311, was opened at 2002-01-07 09:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500311&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: Michel Van den Bergh (vdbergh) >Assigned to: Martin v. Löwis (loewis) Summary: Work around for buggy https servers Initial Comment: Python 2.2. Tested on RH 7.1. This is a workaround for http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=494762 The problem is that some https servers close an ssl connection without properly resetting it first. In the above bug description it is suggested that this only occurs for IIS but apparently some (modified) Apache servers also suffer from it (see telemeter.telenet.be).
One of the suggested workarounds is to modify httplib.py so as to ignore the combination of err[0]==SSL_ERROR_SYSCALL and err[1]=="EOF occurred in violation of protocol". However I think one should never compare error strings since in principle they may depend on language etc... So I decided to modify _socket.c slightly so that it becomes possible to return error codes which are not in ssl.h. When an ssl connection is closed without reset I now return the error code SSL_ERROR_EOF. Then I ignore this (apparently benign) error in httplib.py. In addition I fixed what I think was an error in PySSL_SetError(SSL *ssl, int ret) in socketmodule.c. Originally there was:

    case SSL_ERROR_SSL:
    {
        unsigned long e = ERR_get_error();
        if (e == 0) {
            /* an EOF was observed that violates the protocol */
            errstr = "EOF occurred in violation of protocol";

etc... but if I understand the documentation for SSL_get_error then the test should be: e==0 && ret==0. A similar error occurs a few lines later. ---------------------------------------------------------------------- Comment By: Michel Van den Bergh (vdbergh) Date: 2002-01-09 11:25 Message: Logged In: YES user_id=10252 Due to some problems with sourceforge and incompetence on my part I submitted this several times. Please see patch 500311.
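The condition debated here eventually got its own exception type in the modern ssl module: ssl.SSLEOFError is raised when a peer closes without a proper SSL shutdown, so callers can catch it by type instead of matching the "EOF occurred in violation of protocol" message string. A sketch of the resulting pattern; sock is assumed to be a connected ssl.SSLSocket:

```python
import ssl

def recv_tolerant(sock, bufsize=4096):
    # 'sock' is assumed to be a connected ssl.SSLSocket talking to a
    # server that may drop the connection without close_notify.
    try:
        return sock.recv(bufsize)
    except ssl.SSLEOFError:
        # Buggy server closed without SSL shutdown -- treat as EOF
        # rather than comparing error strings, per the concern above.
        return b""
```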
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=500311&group_id=5470 From noreply@sourceforge.net Sun Mar 10 22:14:40 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 14:14:40 -0800 Subject: [Patches] [ python-Patches-514662 ] On the update_slot() behavior Message-ID: Patches item #514662, was opened at 2002-02-07 23:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Naofumi Honda (naofumi-h) Assigned to: Guido van Rossum (gvanrossum) Summary: On the update_slot() behavior Initial Comment: Inherited method __getitem__ of list type in the new subclass is unexpectedly slow. For example,

    x = list([1,2,3])
    r = xrange(1, 1000000)
    for i in r: x[1] = 2

==> execution time: real 0m2.390s

    class nlist(list): pass
    x = nlist([1,2,3])
    r = xrange(1, 1000000)
    for i in r: x[1] = 2

==> execution time: real 0m7.040s

about 3 times slower!!! The reason is: for the __getitem__ attribute, there are two slotdefs in typeobject.c (one for the mapping type, and the other for the sequence type). In the creation of new_type of list type, fixup_slot_dispatchers() and update_slot() functions in typeobject.c allocate the functions to both sq_item and mp_subscript slots (the mp_subscript slot had originally no function, because the list type is a sequence type), and it's an unexpected allocation for the mapping slot since the descriptor type of __getitem__ is now WrapperType for the sequence operations. If you will trace x[1] using gdb, you will find that in PyObject_GetItem() m->mp_subscript = slot_mp_subscript is called instead of a sequence operation because mp_subscript slot was allocated by fixup_slot_dispatchers(). In the slot_mp_subscript(), call_method(self, "__getitem__", ...)
is invoked, which turns out to call a wrapper descriptor for sq_item. As a result, the method of the list type is finally called, but it needs many unexpected function calls. I will fix the behavior of fixup_slot_dispatchers() and update_slot() as follows: Only in the case where *) two or more slotdefs have the same attribute name where at most one corresponding slot has a non-null pointer *) the descriptor type of the attribute is WrapperType, these functions will allocate only one function to the appropriate slot. In the other cases, the behavior is not changed, to keep compatibility! (in particular, considering the case where user override methods exist!) The following patch also includes speed-up routines to find the slotdef duplications, but it's not essential! ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-10 17:14 Message: Logged In: YES user_id=6380 Thanks for the analysis! Would you mind submitting a new patch without the #ifdef ORIGINAL_CODE stuff? Just delete/replace old code as needed -- cvs diff will show me the original code. The ORIGINAL_CODE stuff makes it harder for me to get the point of the diff. Also, maybe you could leave the speedup code out, to show the absolutely minimal amount of code needed. 
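The wall-clock comparison in the report can be reproduced with `timeit`. A sketch for a modern interpreter follows; the exact ratio is highly version-dependent (the ~3x figure above comes from a Python 2.2-era build), so treat any numbers as indicative only:

```python
import timeit

# Time item assignment on a plain list vs. on a trivial list subclass,
# mirroring the report's benchmark (x[1] = 2 in a loop).
plain = timeit.timeit("x[1] = 2", setup="x = [1, 2, 3]", number=100_000)
sub = timeit.timeit(
    "x[1] = 2",
    setup="class NList(list):\n    pass\nx = NList([1, 2, 3])",
    number=100_000,
)
print(f"plain list: {plain:.4f}s  subclass: {sub:.4f}s  ratio: {sub / plain:.2f}")
```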
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 From noreply@sourceforge.net Mon Mar 11 00:25:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 16:25:28 -0800 Subject: [Patches] [ python-Patches-498109 ] fileobject truncate support for win32 Message-ID: Patches item #498109, was opened at 2001-12-31 11:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=498109&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Wolfgang Strobl (strobl) Assigned to: Tim Peters (tim_one) Summary: fileobject truncate support for win32 Initial Comment: Python 2.2 has large file support on Windows, but f.truncate() throws an overflow exception when f.tell() >2G. I've changed file_truncate in fileobject.c to use SetEndOfFile iff truncate is called without a parameter, on Win32. Tested on W2k (Ger), see the diff to test_largefile.py. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-10 19:25 Message: Logged In: YES user_id=31435 I'm rejecting the patch (because it does too little), but implemented the suggested solution instead and checked it in (so you should be happy it's rejected ): Doc/lib/libstdtypes.tex; new revision: 1.82 Lib/test/test_largefile.py; new revision: 1.13 Misc/NEWS; new revision: 1.361 Objects/fileobject.c; new revision: 2.145 ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-10 03:25 Message: Logged In: YES user_id=31435 I wonder why you're settling for so little here, and I'm not sure it's a real improvement to fix one special large file case while letting others continue to blow up, and especially not when leaving it all undocumented. 
Did you consider using SetFilePointer() before SetEndOfFile(), in order to handle all cases (the former allows setting to 64-bit file positions)? This is trickier (e.g., .truncate() should never *grow* the file, and it would get you into the obscure Windows LARGE_INTEGER business), but would be much more satisfying. In any case, note that you should #define WIN32_LEAN_AND_MEAN before including windows.h in Python. ---------------------------------------------------------------------- Comment By: Wolfgang Strobl (strobl) Date: 2002-01-02 02:49 Message: Logged In: YES user_id=311771 Right. See the attached diff. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-01-01 14:15 Message: Logged In: YES user_id=21627 Wouldn't it be better to include a header file instead of declaring a SetEndOfFile prototype? ---------------------------------------------------------------------- Comment By: Wolfgang Strobl (strobl) Date: 2001-12-31 14:56 Message: Logged In: YES user_id=311771 Oops. While removing some obsolete personal notes, I accidentally removed the leading comment. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-31 11:42 Message: Logged In: YES user_id=6380 For Tim. I presume the chunk of the diff that removes the leading comment of the file is a mistake? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=498109&group_id=5470 From noreply@sourceforge.net Mon Mar 11 06:47:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 10 Mar 2002 22:47:27 -0800 Subject: [Patches] [ python-Patches-443899 ] Minor fix to gzip.py module. 
Message-ID: Patches item #443899, was opened at 2001-07-23 21:35 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=443899&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Titus Brown (titus) Assigned to: Martin v. Löwis (loewis) Summary: Minor fix to gzip.py module. Initial Comment: --- from cStringIO import StringIO from gzip import GzipFile stringFile = StringIO() gzFile = GzipFile("test1", 'wb', 9, stringFile) gzFile.write('howdy there!') r = gzFile.read() --- The above code fragment gave a nonintuitive error response (attribute missing). Now, an exception is raised stating that the file is not opened for reading or writing. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-11 07:47 Message: Logged In: YES user_id=21627 Thanks again for the patch; committed (in modified form) as gzip.py 1.29. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:04 Message: Logged In: YES user_id=21627 Taken the load from Jeremy. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-10-19 13:03 Message: Logged In: YES user_id=21627 I think gzip files should behave like fileobjects with respect to exceptions. Perhaps inconsistently, performing read or write on files that are opened only for the other operation raises an IOError (EBADF), since Posix says so, whereas performing close on a closed file raises a ValueError (it can't perform a system call since the file descriptor might have been recycled meanwhile). So I'm still in favour of applying this patch, with the ValueError changed to IOError, and perhaps passing EBADF as the error code in all cases of IOError. 
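The behavior being discussed can be seen with the report's own fragment, updated for a modern Python (bytes I/O, and IOError long since merged into OSError). The name `test1` is just the reporter's example filename:

```python
import io
import gzip

# Reading from a GzipFile opened for writing. With the fix (and in all
# modern versions) this raises an IOError/OSError rather than the obscure
# AttributeError the reporter originally hit.
buf = io.BytesIO()
gz = gzip.GzipFile("test1", "wb", 9, buf)
gz.write(b"howdy there!")
try:
    gz.read()
    error = None
except OSError as exc:  # IOError is an alias of OSError in Python 3
    error = exc
print("read on write-only GzipFile raised:", error)
```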
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-19 04:09 Message: Logged In: YES user_id=6380 Time to look at this again? ---------------------------------------------------------------------- Comment By: Titus Brown (titus) Date: 2001-08-16 22:33 Message: Logged In: YES user_id=23486 Re: context diff, thanks & sorry for the trouble; my newer patches are being submitted this way. Re: IOError, I wasn't sure which exception to use at the time. I therefore took my cue from other code in the gzip module, which raises a ValueError when self.fileobj is closed. The only IO errors raised in the module are those that pertain to incorrect file formats. I'd be happy to change any and all of the ValueErrors that are raised into IOErrors, but I think the current consistency of errors should be maintained ;). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-08-16 20:42 Message: Logged In: YES user_id=21627 Please always submit context (-c) or unified (-u) diffs; I've reformatted your patch by retrieving 1.24, applying the patch, updating to the current version, and regenerating the patch. Apart from that, the patch looks fine to me, and I recommend approving it. One consideration is the exception being raised: Maybe IOError is more appropriate. 
---------------------------------------------------------------------- Comment By: Titus Brown (titus) Date: 2001-08-14 21:15 Message: Logged In: YES user_id=23486 (sorry -- misunderstanding of how the changelog view works) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=443899&group_id=5470 From noreply@sourceforge.net Mon Mar 11 17:00:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Mar 2002 09:00:44 -0800 Subject: [Patches] [ python-Patches-525109 ] Extension to Calltips / Show attributes Message-ID: Patches item #525109, was opened at 2002-03-03 06:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 Category: IDLE Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Martin Liebmann (mliebmann) >Assigned to: Guido van Rossum (gvanrossum) Summary: Extension to Calltips / Show attributes Initial Comment: The attached files (unified diff files) implement a (quick and dirty but useful) extension to IDLE 0.8 (Python 2.2) - Tested on WINDOWS 95/98/NT/2000 - Similar to "CallTips" this extension shows (context sensitive) all available member functions and attributes of the current object after hitting the 'dot'-key. The toplevel help widget now supports scrolling. (Key-Up and Key-Down events) ...that is why I changed, among other things, the first argument of 'showtip' from a 'text string' to a 'list of text strings' ... The 'space'-key is used to insert the topmost item of the help widget into an IDLE text window. ...the event handling seems to be a critical part of the current IDLE implementation. That is why I added the new functionality as a patch of CallTips.py and CallTipWindow.py. Maybe you still have a better implementation ... 
Greetings Martin Liebmann ---------------------------------------------------------------------- Comment By: Martin Liebmann (mliebmann) Date: 2002-03-07 16:41 Message: Logged In: YES user_id=475133 Patched and more robust version of the extended files CallTips.py and CallTipWindows.py. (Now more compatible with earlier versions of Python) ---------------------------------------------------------------------- Comment By: Martin Liebmann (mliebmann) Date: 2002-03-03 17:02 Message: Logged In: YES user_id=475133 '' must be substituted by '.' within CallTip.py! (Linux does not support an event named ) Running IDLE on Linux, I found the warning that 'import *' is not allowed within function '_dir_main' of CallTip.py ??? Nevertheless CallTips works fine on Linux ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 From noreply@sourceforge.net Tue Mar 12 02:49:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Mar 2002 18:49:36 -0800 Subject: [Patches] [ python-Patches-514662 ] On the update_slot() behavior Message-ID: Patches item #514662, was opened at 2002-02-08 04:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Naofumi Honda (naofumi-h) Assigned to: Guido van Rossum (gvanrossum) Summary: On the update_slot() behavior Initial Comment: The inherited __getitem__ method of a list subclass is unexpectedly slow. For example, x = list([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m2.390s class nlist(list): pass x = nlist([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m7.040s about 3 times slower!!! 
The reason is: for the __getitem__ attribute, there are two slotdefs in typeobject.c (one for the mapping type, and the other for the sequence type). In the creation of new_type of list type, fixup_slot_dispatchers() and update_slot() functions in typeobject.c allocate the functions to both sq_item and mp_subscript slots (the mp_subscript slot had originally no function, because the list type is a sequence type), and it's an unexpected allocation for the mapping slot since the descriptor type of __getitem__ is now WrapperType for the sequence operations. If you trace x[1] using gdb, you will find that in PyObject_GetItem() m->mp_subscript = slot_mp_subscript is called instead of a sequence operation because the mp_subscript slot was allocated by fixup_slot_dispatchers(). In slot_mp_subscript(), call_method(self, "__getitem__", ...) is invoked, which turns out to call a wrapper descriptor for sq_item. As a result, the method of the list type is finally called, but it needs many unexpected function calls. I will fix the behavior of fixup_slot_dispatchers() and update_slot() as follows: Only in the case where *) two or more slotdefs have the same attribute name where at most one corresponding slot has a non-null pointer *) the descriptor type of the attribute is WrapperType, these functions will allocate only one function to the appropriate slot. In the other cases, the behavior is not changed, to keep compatibility! (in particular, considering the case where user override methods exist!) The following patch also includes speed-up routines to find the slotdef duplications, but it's not essential! ---------------------------------------------------------------------- >Comment By: Naofumi Honda (naofumi-h) Date: 2002-03-12 02:49 Message: Logged In: YES user_id=452575 I will post a new patch containing an essential part of the previous one (i.e. without ifdef and almost all speed-up routines). 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-10 22:14 Message: Logged In: YES user_id=6380 Thanks for the analysis! Would you mind submitting a new patch without the #ifdef ORIGINAL_CODE stuff? Just delete/replace old code as needed -- cvs diff will show me the original code. The ORIGINAL_CODE stuff makes it harder for me to get the point of the diff. Also, maybe you could leave the speedup code out, to show the absolutely minimal amount of code needed. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 From noreply@sourceforge.net Tue Mar 12 02:49:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 11 Mar 2002 18:49:55 -0800 Subject: [Patches] [ python-Patches-514662 ] On the update_slot() behavior Message-ID: Patches item #514662, was opened at 2002-02-08 04:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Naofumi Honda (naofumi-h) Assigned to: Guido van Rossum (gvanrossum) Summary: On the update_slot() behavior Initial Comment: The inherited __getitem__ method of a list subclass is unexpectedly slow. For example, x = list([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m2.390s class nlist(list): pass x = nlist([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m7.040s about 3 times slower!!! The reason is: for the __getitem__ attribute, there are two slotdefs in typeobject.c (one for the mapping type, and the other for the sequence type). 
In the creation of new_type of list type, fixup_slot_dispatchers() and update_slot() functions in typeobject.c allocate the functions to both sq_item and mp_subscript slots (the mp_subscript slot had originally no function, because the list type is a sequence type), and it's an unexpected allocation for the mapping slot since the descriptor type of __getitem__ is now WrapperType for the sequence operations. If you trace x[1] using gdb, you will find that in PyObject_GetItem() m->mp_subscript = slot_mp_subscript is called instead of a sequence operation because the mp_subscript slot was allocated by fixup_slot_dispatchers(). In slot_mp_subscript(), call_method(self, "__getitem__", ...) is invoked, which turns out to call a wrapper descriptor for sq_item. As a result, the method of the list type is finally called, but it needs many unexpected function calls. I will fix the behavior of fixup_slot_dispatchers() and update_slot() as follows: Only in the case where *) two or more slotdefs have the same attribute name where at most one corresponding slot has a non-null pointer *) the descriptor type of the attribute is WrapperType, these functions will allocate only one function to the appropriate slot. In the other cases, the behavior is not changed, to keep compatibility! (in particular, considering the case where user override methods exist!) The following patch also includes speed-up routines to find the slotdef duplications, but it's not essential! ---------------------------------------------------------------------- >Comment By: Naofumi Honda (naofumi-h) Date: 2002-03-12 02:49 Message: Logged In: YES user_id=452575 I will post a new patch containing an essential part of the previous one (i.e. without ifdef and almost all speed-up routines). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-10 22:14 Message: Logged In: YES user_id=6380 Thanks for the analysis! Would you mind submitting a new patch without the #ifdef ORIGINAL_CODE stuff? Just delete/replace old code as needed -- cvs diff will show me the original code. The ORIGINAL_CODE stuff makes it harder for me to get the point of the diff. Also, maybe you could leave the speedup code out, to show the absolutely minimal amount of code needed. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 From noreply@sourceforge.net Tue Mar 12 21:40:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Mar 2002 13:40:22 -0800 Subject: [Patches] [ python-Patches-523271 ] Docstrings for os.stat and time.localtim Message-ID: Patches item #523271, was opened at 2002-02-27 00:32 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523271&group_id=5470 Category: Documentation Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Sean Reifschneider (jafo) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Docstrings for os.stat and time.localtim Initial Comment: This patch updates the first line of the docstrings for os.stat(), os.lstat(), and time.*time() so that it reflects the attribute names on the tuple-like struct object returned. It changes: localtime(...) localtime([seconds]) -> (year,month,day,hour,minute,second,weekday,dayofyear,dst) into: gmtime([seconds]) -> (tm_year,tm_mon,tm_day,tm_hour,tm_min,tm_sec,tm_wday,tm_yday,tm_isdst) ---------------------------------------------------------------------- >Comment By: Fred L. Drake, Jr. 
(fdrake) Date: 2002-03-12 16:40 Message: Logged In: YES user_id=3066 Checked in a modified patch as Modules/posixmodule.c revisions 2.216.4.3 and 2.225, and Modules/timemodule.c revisions 2.118.6.2 and 2.124. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523271&group_id=5470 From noreply@sourceforge.net Wed Mar 13 03:16:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 12 Mar 2002 19:16:42 -0800 Subject: [Patches] [ python-Patches-515015 ] inspect.py raise exception if code not found Message-ID: Patches item #515015, was opened at 2002-02-08 17:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515015&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) >Assigned to: Neal Norwitz (nnorwitz) Summary: inspect.py raise exception if code not found Initial Comment: There is a comment which says the suffixes should be sorted by length, but there is no comparison function. This patch adds a comparison (lambda). Also, there are two functions which are documented to raise IOError if there are problems, but if the function reaches the end, there were no raises. This patch adds the missing raise IOError statements. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-12 22:16 Message: Logged In: YES user_id=33168 Checked in as inspect.py 1.27. Only findsource() is documented to raise an IOError, so that is the only function that is fixed. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:36 Message: Logged In: YES user_id=6380 Neal, can you check this in and mark as bugfix? 
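A quick illustration of the fixed findsource() behavior on a modern Python, where IOError is an alias of OSError. The exec-defined function `f` here is a hypothetical example chosen because it has no retrievable source file:

```python
import inspect

# Per the patch, findsource() (and helpers built on it, such as
# getsource()) raises IOError when the source can't be located, instead
# of silently falling off the end of the function.
ns = {}
exec("def f():\n    return 42", ns)  # compiled from a string: no source file

try:
    inspect.getsource(ns["f"])
    error = None
except OSError as exc:  # IOError is an alias of OSError in Python 3
    error = exc
print("getsource failed as documented:", error)
```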
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-02-09 09:16 Message: Logged In: YES user_id=33168 Sorry, I saw the map/lambda above, but misread the code. Attached is a new file (just contains the 2 raises). I really need to add a test for this as well. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-02-08 18:10 Message: Logged In: YES user_id=31435 Please remove the lambda trick from the patch. The comment is explaining why the negation of the length is the first element of the tuples being sorted (that's what guarantees the longest suffix is checked first in case of overlap). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=515015&group_id=5470 From noreply@sourceforge.net Wed Mar 13 12:15:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Mar 2002 04:15:01 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 23:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) Assigned to: Nobody/Anonymous (nobody) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Wed Mar 13 19:54:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Mar 2002 11:54:18 -0800 Subject: [Patches] [ python-Patches-529586 ] 
Missing character in BNF Message-ID: Patches item #529586, was opened at 2002-03-13 19:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529586&group_id=5470 Category: Documentation Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jeremy Yallop (yallop) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Missing character in BNF Initial Comment: The bitwise inversion operator isn't displayed in the Python grammar (reference manual, section 5.5). Tilde needs to be escaped in the LaTeX source. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529586&group_id=5470 From noreply@sourceforge.net Wed Mar 13 21:51:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Mar 2002 13:51:31 -0800 Subject: [Patches] [ python-Patches-529586 ] Missing character in BNF Message-ID: Patches item #529586, was opened at 2002-03-13 19:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529586&group_id=5470 Category: Documentation Group: None Status: Open Resolution: None >Priority: 2 Submitted By: Jeremy Yallop (yallop) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Missing character in BNF Initial Comment: The bitwise inversion operator isn't displayed in the Python grammar (reference manual, section 5.5). Tilde needs to be escaped in the LaTeX source. 
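For context, a bare `~` in LaTeX typesets as a non-breaking space, which is why the inversion operator vanishes from the rendered grammar. A sketch of the common escapes follows; which one the Python doc sources actually ended up using is not shown here:

```latex
% A bare ~ is a non-breaking space, so \verbatim-free grammar text
% needs one of these to produce a visible tilde:
\textasciitilde{}x   % text-mode ASCII tilde
$\sim$x              % math-mode tilde (an approximation glyph)
\~{}x                % tilde accent over an empty box
```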
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529586&group_id=5470 From noreply@sourceforge.net Wed Mar 13 22:44:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Mar 2002 14:44:23 -0800 Subject: [Patches] [ python-Patches-476814 ] foreign-platform newline support Message-ID: Patches item #476814, was opened at 2001-10-31 17:41 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=476814&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jack Jansen (jackjansen) Assigned to: Jack Jansen (jackjansen) Summary: foreign-platform newline support Initial Comment: This patch enables Python to interpret all known newline conventions, CR, LF or CRLF, on all platforms. This support is enabled by configuring with --with-universal-newlines (so by default it is off, and everything should behave as usual). With universal newline support enabled, two things happen: - When importing or otherwise parsing .py files any newline convention is accepted. - Python code can pass a new "t" mode parameter to open() which reads files with any newline convention. "t" cannot be combined with any other mode flags like "w" or "+", for obvious reasons. File objects have a new attribute "newlines" which contains the type of newlines encountered in the file (or None when no newline has been seen, or "mixed" if there were various types of newlines). Also included is a test script which tests both file I/O and parsing. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-03-13 23:44 Message: Logged In: YES user_id=45365 A new version of the patch. Main differences are that U is now the mode character to trigger universal newline input and --with-universal-newlines is on by default. 
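For historical context, the behavior this patch pioneered later became standard: in Python 3, text-mode files translate all three conventions by default, and the `newlines` attribute the patch introduces survives on `io.TextIOWrapper` (as a tuple of the conventions actually seen, rather than the string "mixed"). A sketch:

```python
import os
import tempfile

# Write a file mixing all three newline conventions in binary mode,
# then read it back in text mode, where universal-newline translation
# is the default.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"unix\nwindows\r\nmac\rend\n")

with open(path) as f:              # text mode: universal newlines
    lines = f.read().split("\n")   # every convention translated to '\n'
    seen = f.newlines              # tuple of conventions encountered

print(lines)
print(seen)
os.remove(path)
```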
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-01-16 23:47 Message: Logged In: YES user_id=45365 This version of the patch addresses the bug in Py_UniversalNewlineFread and fixes up some minor details. Tim's other issues are addressed (at least: I think they are:-) in a forthcoming PEP. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-12-14 00:57 Message: Logged In: YES user_id=31435 Back to Jack -- and sorry for sitting on it so long. Clearly this isn't making it into 2.2 in the core. As I said on Python-Dev, I believe this needs a PEP: the design decisions are debatable, so *should* be debated outside the Mac community too. Note, though, that I can't stop you from adding it to the 2.2 Mac distribution (if you want it badly enough there). If a PEP won't be written, I suggest finding someone else to review it again; maybe Guido. Note that the patch needs doc changes too. The patch to regrtest.py doesn't belong here (I assume it just slipped in). There seems to be a lot of code in support of the f_newlinetypes member, and the value of that member isn't clear -- I can't imagine a good use for it (maybe it's a Mac thing?). The implementation of Py_UniversalNewlineFread appears incorrect to me: it reads n bytes *every* time around the outer loop, no matter how few characters are still required, and n doesn't change inside the loop. The business about the GIL may be due to the lack of docs: are, or are not, people supposed to release the GIL themselves around calls to these guys? It's not documented, and it appears your intent differed from my guess. Finally, it would be better to call ferror() after calling fread() instead of before it. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2001-11-14 16:13 Message: Logged In: YES user_id=45365 Here's a new version of the patch. 
To address your issues one by one: - get_line and Py_UniversalNewlineFgets are too difficult to integrate; at least, I don't see how I could do it. The storage management of get_line gets in the way. - The global lock comment I don't understand. The Universal... routines are replacements for fgets() and fread(), so have nothing to do with the interpreter lock. - The logic of all three routines (get_line too) has changed and I've put comments in. I hope this addresses some of the points. - If universal_newline is false for a certain PyFileObject we now immediately take a quick exit via fgets() or fread(). There's also a new test script that tests some more border cases (like lines longer than 100 characters, and a lone CR just before end of file). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-11-05 09:16 Message: Logged In: YES user_id=31435 It would be better if get_line just called Py_UniversalNewlineFgets (when appropriate) instead of duplicating its logic inline. Py_UniversalNewlineFgets and Py_UniversalNewlineFread should deal with releasing the global lock themselves -- the correct granularity for lock release/reacquire is around the C-level input routines (esp. for fread). The new routines never check for I/O errors! Why not? It seems essential. The new Fgets checks for EOF at the end of the loop instead of the top. This is surprising, and I stared a long time in vain trying to guess why. Setting newlinetypes |= NEWLINE_CR; immediately after seeing an '\r' would be as fast (instead of waiting to see EOF and then inferring the prior existence of '\r' indirectly from the state of the skipnextlf flag). Speaking of which, the fobj tests in the inner loop waste cycles. Set the local flag variables whether or not fobj is NULL. When you're *out* of the inner loop you can simply decline to store the new masks when fobj is NULL (and you're already doing the latter anyway). 
A test and branch inside the loop is much more expensive than or'ing in a flag bit inside the loop, ditto harder to understand. Floating the univ_newline test out of the loop (and duplicating the loop body, one way for univ_newline true and the other for it false) would also save a test and branch on every character. Doing fread one character at a time is very inefficient. Since you know you need to obtain n characters in the end, and that these transformations require reading at least n characters, you could very profitably read n characters in one gulp at the start, then switch to k at a time where k is the number of \r\n pairs seen since the last fread call. This is easier to code than it sounds. It would be fine by me if you included (and initialized) the new file-object fields all the time, whether or not universal newlines are configured. I'd rather waste a few bytes in a file object than see #ifdefs spread thru the code. I'll be damned if I can think of a quick way to do this stuff on Windows -- native Windows fgets() is still the only Windows handle we have on avoiding crushing thread overhead inside MS's C library. I'll think some more about it (the thrust still being to eliminate the 't' mode flag, as whined about on Python-Dev). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-31 18:38 Message: Logged In: YES user_id=6380 Tim, can you review this or pass it on to someone else who has time? Jack developed this patch after a discussion in which I was involved in some of the design, but I won't have time to look at it until December. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=476814&group_id=5470 From noreply@sourceforge.net Thu Mar 14 07:13:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 13 Mar 2002 23:13:26 -0800 Subject: [Patches] [ python-Patches-529768 ] Speed-up getattr Message-ID: Patches item #529768, was opened at 2002-03-14 08:13 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Löwis (loewis) Assigned to: Nobody/Anonymous (nobody) Summary: Speed-up getattr Initial Comment: This patch moves the string check in getattr before the Unicode check, reducing the number of IsSubType checks originating from getattr to 50% in a typical application. For the attached artificial benchmark, this gives a 7% speed-up. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470 From noreply@sourceforge.net Thu Mar 14 14:46:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 06:46:25 -0800 Subject: [Patches] [ python-Patches-529586 ] Missing character in BNF Message-ID: Patches item #529586, was opened at 2002-03-13 14:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529586&group_id=5470 Category: Documentation Group: None >Status: Closed >Resolution: Wont Fix Priority: 2 Submitted By: Jeremy Yallop (yallop) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: Missing character in BNF Initial Comment: The bitwise inversion operator isn't displayed in the python grammar (reference manual, section 5.5). Tilde needs to be escaped in the LaTeX source.
---------------------------------------------------------------------- >Comment By: Fred L. Drake, Jr. (fdrake) Date: 2002-03-14 09:46 Message: Logged In: YES user_id=3066 This problem will no longer be present after bug #523117 is fixed; that bug has been assigned a high priority. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529586&group_id=5470 From noreply@sourceforge.net Thu Mar 14 16:24:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 08:24:30 -0800 Subject: [Patches] [ python-Patches-502415 ] optimize attribute lookups Message-ID: Patches item #502415, was opened at 2002-01-11 18:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Zooko O'Whielacronx (zooko) Assigned to: Nobody/Anonymous (nobody) Summary: optimize attribute lookups Initial Comment: This patch optimizes the string comparisons in class_getattr(), class_setattr(), instance_getattr1(), and instance_setattr(). I pulled out the relevant section of class_setattr() and measured its performance, yielding the following results:

* in the case that the argument does *not* begin with "__", the new version is 1.03 times as fast as the old. (This is a mystery to me, as the path through the code looks the same, in C. I examined the assembly that GCC v3.0.3 generated in -O3 mode, and it is true that the assembly for the new version is smaller/faster, although I don't really understand why.)
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "X_" (where X is a random alphabetic character), the new version is 1.12 times as fast as the old.
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and does *not* end with "_", the new version is 1.16 times as fast as the old.
* in the case that the argument is (randomly) one of the six special names, the new version is 2.7 times as fast as the old.
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "__" (but is not one of the six special names), the new version is 3.7 times as fast as the old.

---------------------------------------------------------------------- >Comment By: Zooko O'Whielacronx (zooko) Date: 2002-03-14 16:24 Message: Logged In: YES user_id=52562 update: I did a real app benchmark of this patch by running one of the unit tests from PyXML-0.6.6. (Which one? The one that I guessed would favor my optimization the most. Unfortunately I've lost my notes and I don't remember which one.) I also separated out the "unroll strcmp" optimization from the "use macros" optimization on request. I have lost my notes, but I recall that my results showed what I expected: between 0.5 and 3 percent app-level speed-up for the unroll strcmp optimization. Interesting detail: a quirk in GCC 3 makes the unroll strcmp version slightly faster than the current strcmp version *even* in the (common) case that the first two characters of the attribute name are *not* '__'. What should happen next: 1. Someone who has the authority to approve or reject this patch should tell me what kind of benchmark would be persuasive to them. I mean: what specific program I can run with and without my patch for a useful comparison. (If you require more than a 5% app-level speed-up, then let's give up on this patch now!) 2. Someone should volunteer to test this patch with the MSFT compiler, as I don't have one right now. Some people are still using the Windows platform, I've noticed [1], so it is worth benchmarking.
Actually, someone should volunteer to benchmark GCC+Linux-or-MacOSX, too, as my computer is a laptop with variable-speed CPU and is really crummy for benchmarking. By the way, PEP 266 is a better solution to the problem but until it's implemented, this patch is the better patch. ;-) Note: this is one of those patches that looks uglier in "diff -u" format than in actual source code. Please browse the actual source side-by-side [2] to see how ugly it really is. Regards Zooko [1] http://www.google.com/press/zeitgeist/jan02-pie.gif [2] search for "class_getattr" in: http://zooko.com/classobject.c http://zooko.com/classobject-strcmpunroll.c --- zooko.com Security and Distributed Systems Engineering --- ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-18 00:22 Message: Logged In: YES user_id=52562 Okay I've done some "mini benchmarks". The earlier reported micro-benchmarks were the result of running the inner loop itself, in C. These mini benchmarks are the result of running this Python script:

class A:
    def __init__(self):
        self.a = 0

a = A()
for i in xrange(2**20):
    a.a = i
print a.a

and then using different attribute names in place of `a'. The results are as expected: the optimized version is faster than the current one, depending on the shape of the attribute name, and dampened by the fact that there is now other work being done. The case that shows the smallest difference is when the attribute name neither begins nor ends with an '_'. In that case the above script runs about 2% faster with the optimizations. The case that shows the biggest difference is when the attribute begins and ends with '__', as in `__a__'. Then the above script runs about 15% faster. This still isn't a *real* application benchmark. I'm looking for one that is a reasonable case for real Python users but that also uses attribute lookups heavily.
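The shape of the "unroll strcmp" optimization — rejecting the common case with one or two cheap character comparisons before doing any full string compares — can be illustrated in Python. The real patch is C in classobject.c; the function name and the particular list of six special names below are assumptions for illustration only:

```python
# Illustrative list of the "six special names" class_setattr guards
# (assumed here; the real set lives in classobject.c).
_SPECIAL = ('__dict__', '__bases__', '__name__',
            '__getattr__', '__setattr__', '__delattr__')

def is_special_name(name):
    """Hypothetical sketch of the fast-path test.

    Most attribute names do not start with "__", so two single-character
    comparisons reject them before any full string comparisons happen --
    the Python analogue of unrolling the first iterations of strcmp().
    """
    if len(name) < 2 or name[0] != '_' or name[1] != '_':
        return False  # common case: cheap rejection, no full compares
    return name in _SPECIAL  # rare case: fall back to full comparisons
```

This mirrors why the dunder-heavy benchmark cases speed up the most: they are the ones that previously paid for six full strcmp() calls.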
---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-17 20:33 Message: Logged In: YES user_id=52562 Yeah, the optimized version is less readable than the original. I'll try to come up with a benchmark application. Any ideas? Maybe some unit tests from Zope that use attribute lookups heavily? My guess is that the actual results in an application will be "marginal", like maybe between 0.5% and 3% improvement. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 18:29 Message: Logged In: YES user_id=31392 This seems to add a lot of complexity for a few special cases. How important are these particular attributes? Do you have any benchmark applications that show real improvement? It seems like microbenchmarks overstate the benefit, since we don't know how often these attributes are looked up by most applications. It would also be interesting to see how much of the benefit for non __ names is the result of the PyString_AS_STRING() macro. Maybe that's all the change we really need :-). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 From noreply@sourceforge.net Thu Mar 14 23:05:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 15:05:47 -0800 Subject: [Patches] [ python-Patches-503202 ] backward compat. on calendar.py Message-ID: Patches item #503202, was opened at 2002-01-14 00:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 Category: Library (Lib) >Group: Python 2.2.x Status: Open Resolution: None >Priority: 7 Submitted By: Hye-Shik Chang (perky) Assigned to: Barry Warsaw (bwarsaw) Summary: backward compat.
on calendar.py Initial Comment: Many applications fails on 2.2 by this problem: under 2.1.1 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun >>> calendar.month_abbr[7:] ['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] 2.2 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () ValueError: year out of range >>> calendar.month_abbr[7:] Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () TypeError: an integer is required >>> ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-14 18:05 Message: Logged In: YES user_id=31435 Based on Guido's comment, categorized as 2.2.x and boosted priority to 7. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-14 01:18 Message: Logged In: YES user_id=6380 You're right. Assigned to Barry. I propose that the test suite should be changed to test for this. This would be a 2.2.1 bugfix candidate! 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 From noreply@sourceforge.net Thu Mar 14 23:58:07 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 15:58:07 -0800 Subject: [Patches] [ python-Patches-530105 ] file object may not be subtyped Message-ID: Patches item #530105, was opened at 2002-03-14 23:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Gustavo Niemeyer (niemeyer) Assigned to: Nobody/Anonymous (nobody) Summary: file object may not be subtyped Initial Comment: PyFileObject should be defined in fileobject.h, so it may be properly subtyped. This patch fixes that, and also removes a word that was typed twice in a comment. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470 From noreply@sourceforge.net Fri Mar 15 01:41:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 17:41:43 -0800 Subject: [Patches] [ python-Patches-503202 ] backward compat. on calendar.py Message-ID: Patches item #503202, was opened at 2002-01-13 23:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Hye-Shik Chang (perky) Assigned to: Barry Warsaw (bwarsaw) Summary: backward compat. on calendar.py Initial Comment: Many applications fail on 2.2 because of this problem: under 2.1.1 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ...
Mon Tue Wed Thu Fri Sat Sun >>> calendar.month_abbr[7:] ['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] 2.2 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () ValueError: year out of range >>> calendar.month_abbr[7:] Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () TypeError: an integer is required >>> ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 19:41 Message: Logged In: YES user_id=44345 Looks to me like adding if item > 6 or item < -7: raise IndexError to the start of _localized_name.__getitem__ will do the trick. (Should a test for non-integer items also be added?) Skip ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-14 17:05 Message: Logged In: YES user_id=31435 Based on Guido's comment, categorized as 2.2.x and boosted priority to 7. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-14 00:18 Message: Logged In: YES user_id=6380 You're right. Assigned to Barry. I propose that the test suite should be changed to test for this. This would be a 2.2.1 bugfix candidate! 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 From noreply@sourceforge.net Fri Mar 15 01:45:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 17:45:23 -0800 Subject: [Patches] [ python-Patches-480902 ] allow dumbdbm to reuse space Message-ID: Patches item #480902, was opened at 2001-11-12 07:30 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=480902&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Skip Montanaro (montanaro) Summary: allow dumbdbm to reuse space Initial Comment: This patch to dumbdbm does two things: * allows it to reuse holes in the .dat file * provides a somewhat more complete test The first change should be considered only for 2.3. Barry may or may not want to check out the test case rewrite for incorporation into 2.2. Accordingly, I've assigned it to him. Skip ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 19:45 Message: Logged In: YES user_id=44345 Unless someone else has an objection, I'm going to close this. Barry already incorporated the expanded test case and the space reuse is not really that important in my mind since dumbdbm is generally only a fallback when no other database is available. If someone wants to use a database bad enough, they will probably figure out a way to use something more powerful. Skip ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2001-11-13 14:16 Message: Logged In: YES user_id=12800 I've accepted the second half -- the improvement to the test suite -- but as recommended, I'm postponing the first half until Py 2.3. 
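The space-reuse half of the dumbdbm patch amounts to free-list allocation over the .dat file. A hypothetical sketch of the bookkeeping only (offsets and sizes, no actual file I/O; class and method names are illustrative, not dumbdbm's API):

```python
class BlockAllocator:
    """Sketch: reuse holes left by deleted/rewritten records instead of
    always appending to the end of the data file."""

    def __init__(self):
        self._free = []   # list of (offset, size) holes
        self._end = 0     # current logical end of file

    def free(self, offset, size):
        # A record was deleted or outgrew its slot: remember the hole.
        self._free.append((offset, size))

    def allocate(self, size):
        # First-fit: reuse the first hole large enough for the new record.
        for i, (off, holesize) in enumerate(self._free):
            if holesize >= size:
                del self._free[i]
                if holesize > size:
                    # Return the unused tail of the hole to the free list.
                    self._free.append((off + size, holesize - size))
                return off
        # No hole fits: fall back to appending, as dumbdbm always did.
        off = self._end
        self._end += size
        return off
```

First-fit keeps the logic trivial, matching dumbdbm's role as a simple fallback database; anything smarter (best-fit, coalescing adjacent holes) is arguably not worth the complexity here.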
Assigning back to Skip so he'll remember to deal with this again later. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=480902&group_id=5470 From noreply@sourceforge.net Fri Mar 15 03:08:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 19:08:17 -0800 Subject: [Patches] [ python-Patches-503202 ] backward compat. on calendar.py Message-ID: Patches item #503202, was opened at 2002-01-14 00:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 7 Submitted By: Hye-Shik Chang (perky) >Assigned to: Skip Montanaro (montanaro) Summary: backward compat. on calendar.py Initial Comment: Many applications fails on 2.2 by this problem: under 2.1.1 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun >>> calendar.month_abbr[7:] ['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] 2.2 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () ValueError: year out of range >>> calendar.month_abbr[7:] Traceback (most recent call last): File "", line 1, in ? 
File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () TypeError: an integer is required >>> ---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-14 22:08 Message: Logged In: YES user_id=12800 Go for it Skip! ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 20:41 Message: Logged In: YES user_id=44345 Looks to me like adding if item > 6 or item < -7: raise IndexError to the start of _localized_name.__getitem__ will do the trick. (Should a test for non-integer items also be added?) Skip ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-14 18:05 Message: Logged In: YES user_id=31435 Based on Guido's comment, categorized as 2.2.x and boosted priority to 7. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-14 01:18 Message: Logged In: YES user_id=6380 You're right. Assigned to Barry. I propose that the test suite should be changed to test for this. This would be a 2.2.1 bugfix candidate! ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 From noreply@sourceforge.net Fri Mar 15 04:09:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 20:09:49 -0800 Subject: [Patches] [ python-Patches-503202 ] backward compat. 
on calendar.py Message-ID: Patches item #503202, was opened at 2002-01-13 23:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed >Resolution: Fixed Priority: 7 Submitted By: Hye-Shik Chang (perky) Assigned to: Skip Montanaro (montanaro) Summary: backward compat. on calendar.py Initial Comment: Many applications fails on 2.2 by this problem: under 2.1.1 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun >>> calendar.month_abbr[7:] ['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] 2.2 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () ValueError: year out of range >>> calendar.month_abbr[7:] Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () TypeError: an integer is required >>> ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 22:09 Message: Logged In: YES user_id=44345 fixed by calendar.py 1.23 and test_calendar.py 1.2. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-14 21:08 Message: Logged In: YES user_id=12800 Go for it Skip! 
---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 19:41 Message: Logged In: YES user_id=44345 Looks to me like adding if item > 6 or item < -7: raise IndexError to the start of _localized_name.__getitem__ will do the trick. (Should a test for non-integer items also be added?) Skip ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-14 17:05 Message: Logged In: YES user_id=31435 Based on Guido's comment, categorized as 2.2.x and boosted priority to 7. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-14 00:18 Message: Logged In: YES user_id=6380 You're right. Assigned to Barry. I propose that the test suite should be changed to test for this. This would be a 2.2.1 bugfix candidate! ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 From noreply@sourceforge.net Fri Mar 15 07:16:47 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 23:16:47 -0800 Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr Message-ID: Patches item #517521, was opened at 2002-02-15 01:19 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Optimization for PyObject_Get/SetAttr Initial Comment: The attached patch is based on the assumption that the vast majority of calls to PyObject_GetAttr and PyObject_SetAttr use a PyString (rather than a PyUnicode) as the name parameter. 
Because these routines perform a PyUnicode_Check first, every call (with a PyString as name) requires a call to PyType_IsSubType. By reorganizing so that PyString_Check is called first, the call to PyType_IsSubType is avoided in the common case. The same reorganization is done for PyObject_GenericGet/SetAttr. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:16 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 From noreply@sourceforge.net Fri Mar 15 07:18:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 23:18:05 -0800 Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr Message-ID: Patches item #517521, was opened at 2002-02-15 01:19 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Optimization for PyObject_Get/SetAttr Initial Comment: The attached patch is based on the assumption that the vast majority of calls to PyObject_GetAttr and PyObject_SetAttr use a PyString (rather than a PyUnicode) as the name parameter.
Because these routines perform a PyUnicode_Check first, every call (with a PyString as name) requires a call to PyType_IsSubType. By reorganizing so that PyString_Check is called first, the call to PyType_IsSubType is avoided in the common case. The same reorganization is done for PyObject_GenericGet/SetAttr. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:18 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:16 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 From noreply@sourceforge.net Fri Mar 15 07:18:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 23:18:31 -0800 Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr Message-ID: Patches item #517521, was opened at 2002-02-15 01:19 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Optimization for PyObject_Get/SetAttr Initial Comment: The attached patch is based on the assumption that the vast majority of calls to PyObject_GetAttr and PyObject_SetAttr use a PyString (rather than a PyUnicode) as the name parameter. Because these routines perform a PyUnicode_Check first, every call (with a PyString as name) requires a call to PyType_IsSubType. By reorganizing so that PyString_Check is called first, the call to PyType_IsSubType is avoided in the common case. The same reorganization is done for PyObject_GenericGet/SetAttr. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:18 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing.
---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:18 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:16 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 From noreply@sourceforge.net Fri Mar 15 07:25:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 23:25:04 -0800 Subject: [Patches] [ python-Patches-529768 ] Speed-up getattr Message-ID: Patches item #529768, was opened at 2002-03-14 18:13 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v.
Lцwis (loewis) Assigned to: Nobody/Anonymous (nobody) Summary: Speed-up getattr Initial Comment: This patch moves the string check in getattr before the Unicode check, reducing the number of IsSubType checks originating from getattr to 50% in a typical application. For the attached artificial benchmark, this gives a 7% speed-up. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:25 Message: Logged In: YES user_id=250749 This seems to be pretty close to patch # 517521. That patch only gains about 2% overall according to my PyBench 1.0 tests. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470 From noreply@sourceforge.net Fri Mar 15 07:25:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 14 Mar 2002 23:25:37 -0800 Subject: [Patches] [ python-Patches-529768 ] Speed-up getattr Message-ID: Patches item #529768, was opened at 2002-03-14 18:13 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin v. Lцwis (loewis) Assigned to: Nobody/Anonymous (nobody) Summary: Speed-up getattr Initial Comment: This patch moves the string check in getattr before the Unicode check, reducing the number of IsSubType checks originating from getattr to 50% in a typical application. For the attached artificial benchmark, this gives a 7% speed-up. ---------------------------------------------------------------------- >Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:25 Message: Logged In: YES user_id=250749 This seems to be pretty close to patch # 517521. That patch only gains about 2% overall according to my PyBench 1.0 tests. 
---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 18:25 Message: Logged In: YES user_id=250749 This seems to be pretty close to patch # 517521. That patch only gains about 2% overall according to my PyBench 1.0 tests. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470 From noreply@sourceforge.net Fri Mar 15 08:19:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 00:19:18 -0800 Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr Message-ID: Patches item #517521, was opened at 2002-02-14 15:19 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Optimization for PyObject_Get/SetAttr Initial Comment: The attached patch is based on the assumption that the vast majority of calls to PyObject_GetAttr and PyObject_SetAttr use a PyString (rather than a PyUnicode) as the name parameter. Because these routines perform a PyUnicode_Check first, every call (with a PyString as name) requires a call to PyType_IsSubType. By reorganizing so that PyString_Check is called first, the call to PyType_IsSubType is avoided in the common case. The same reorganization is done for PyObject_GenericGet/SetAttr. ---------------------------------------------------------------------- >Comment By: Martin v. Lцwis (loewis) Date: 2002-03-15 09:19 Message: Logged In: YES user_id=21627 It is a fairly trivial change, and it has no ill effects, so I think this it is worth the trouble (in particular since a duplicate has been submitted as 529768). 
Whether PEP 263 affects it depends on the implementation strategy taken in phase 2; most likely, attribute accesses remain byte strings (it is already decided that they remain restricted to ASCII). Unless there are any strong objections to this patch, I'd like to integrate it.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470

From noreply@sourceforge.net Fri Mar 15 08:31:43 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 00:31:43 -0800
Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr
Message-ID:

Patches item #517521, was opened at 2002-02-14 09:19
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470

Category: Core (C code)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Chapman (glchapman)
Assigned to: Nobody/Anonymous (nobody)
Summary: Optimization for PyObject_Get/SetAttr

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-03-15 03:31

Message:
Logged In: YES
user_id=31435

+1 on integrating the patch. Better 2% today than 200% that may never materialize!

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470

From noreply@sourceforge.net Fri Mar 15 08:54:32 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 00:54:32 -0800
Subject: [Patches] [ python-Patches-525532 ] Add support for POSIX semaphores
Message-ID:

Patches item #525532, was opened at 2002-03-04 09:50
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470

Category: Core (C code)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Gerald S. Williams (gsw_agere)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add support for POSIX semaphores

Initial Comment:
thread_pthread.h can be modified to use POSIX semaphores if available. This is more efficient than emulating them with mutexes and condition variables, and at least one platform that supports POSIX semaphores has a race condition in its condition variable support. The new file would still be supporting POSIX threads, although from both <pthread.h> and <semaphore.h>, so it perhaps ought to be renamed if this patch is accepted.
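The semantics the patch gives the interpreter lock are just those of a counting semaphore initialized to 1, replacing the mutex-plus-condition-variable emulation. Sketched here in Python (with `threading.Semaphore` standing in for POSIX `sem_t`; the patch itself does the equivalent in C via `sem_init`/`sem_wait`/`sem_post`):

```python
import threading

class SemLock:
    """Sketch of a lock built directly on a semaphore.

    Illustrative only: the real patch implements this in C inside
    thread_pthread.h; the Python class here just mirrors the semantics.
    """
    def __init__(self):
        # A semaphore with initial count 1 behaves as an unlocked lock,
        # analogous to sem_init(&sem, 0, 1).
        self._sem = threading.Semaphore(1)

    def acquire(self, blocking=True):
        # Analogous to sem_wait (blocking) / sem_trywait (non-blocking).
        return self._sem.acquire(blocking)

    def release(self):
        # Analogous to sem_post.
        self._sem.release()
```

The efficiency claim in the comment comes from the fact that a native semaphore wait/post pair avoids the extra mutex lock/unlock and condition signal of the emulated version.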
----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-03-15 03:54

Message:
Logged In: YES
user_id=31435

Can someone on a pthreads platform please continue with this? I'm +1 on it via eyeballing.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:41:37 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:41:37 -0800
Subject: [Patches] [ python-Patches-529768 ] Speed-up getattr
Message-ID:

Patches item #529768, was opened at 2002-03-14 08:13
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470

Category: Core (C code)
Group: None
>Status: Closed
>Resolution: Duplicate
Priority: 5
Submitted By: Martin v. Löwis (loewis)
Assigned to: Nobody/Anonymous (nobody)
Summary: Speed-up getattr

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-03-15 14:41

Message:
Logged In: YES
user_id=21627

Closing it as a duplicate.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529768&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:41:06 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:41:06 -0800
Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr
Message-ID:

Patches item #517521, was opened at 2002-02-14 15:19
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470

Category: Core (C code)
Group: Python 2.2.x
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Greg Chapman (glchapman)
Assigned to: Nobody/Anonymous (nobody)
Summary: Optimization for PyObject_Get/SetAttr

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-03-15 14:41

Message:
Logged In: YES
user_id=21627

Thanks for the patch. Committed as object.c 2.164.
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:45:24 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:45:24 -0800
Subject: [Patches] [ python-Patches-530105 ] file object may not be subtyped
Message-ID:

Patches item #530105, was opened at 2002-03-15 00:58
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Gustavo Niemeyer (niemeyer)
Assigned to: Nobody/Anonymous (nobody)
Summary: file object may not be subtyped

Initial Comment:
PyFileObject should be defined in fileobject.h, so it may be properly subtyped. This patch fixes that, and also a comment word typed twice.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-03-15 14:45

Message:
Logged In: YES
user_id=21627

This patch looks good to me.
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:48:54 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:48:54 -0800
Subject: [Patches] [ python-Patches-527434 ] Double inclusion of thread.o on Sol2.8
Message-ID:

Patches item #527434, was opened at 2002-03-08 16:53
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527434&group_id=5470

Category: Build
Group: Python 2.2.x
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Ondrej Palkovsky (ondrap)
Assigned to: Nobody/Anonymous (nobody)
Summary: Double inclusion of thread.o on Sol2.8

Initial Comment:
When compiling on Solaris 2.8 (sparc), thread.o gets included twice in the list of objects. The problem arises when compiling Python as a shared library, as you may not specify the same thing twice. This patch avoids checking for -lthread if POSIX threads are already defined.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-03-15 14:48

Message:
Logged In: YES
user_id=21627

Thanks for the patch. Applied as configure 1.287; configure.in 1.297.
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527434&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:54:21 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:54:21 -0800
Subject: [Patches] [ python-Patches-527427 ] minidom fails to use NodeList sometimes
Message-ID:

Patches item #527427, was opened at 2002-03-08 16:39
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527427&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Cesar Eduardo Barros (cesarb)
Assigned to: Nobody/Anonymous (nobody)
Summary: minidom fails to use NodeList sometimes

Initial Comment:
(why is the summary box so small?)

xml.dom.minidom doesn't use a NodeList as the return type of getElementsByTagName{,NS} as it should. The patch (against 2.2 or HEAD) fixes it.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-03-15 14:54

Message:
Logged In: YES
user_id=21627

Thanks for the patch. Committed as 1.44 and 1.43.6.1 (Python) and 1.39 (PyXML).

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527427&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:54:52 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:54:52 -0800
Subject: [Patches] [ python-Patches-503202 ] backward compat. on calendar.py
Message-ID:

Patches item #503202, was opened at 2002-01-13 23:47
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Closed
Resolution: Fixed
Priority: 7
Submitted By: Hye-Shik Chang (perky)
Assigned to: Skip Montanaro (montanaro)
Summary: backward compat. on calendar.py

Initial Comment:
Many applications fail on 2.2 because of this problem:

under 2.1.1:

>>> import calendar
>>> for n in calendar.day_abbr:
...     print n,
...
Mon Tue Wed Thu Fri Sat Sun
>>> calendar.month_abbr[7:]
['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

under 2.2:

>>> import calendar
>>> for n in calendar.day_abbr:
...     print n,
...
Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__
    return strftime(self.format, (item,)*9).capitalize()
ValueError: year out of range
>>> calendar.month_abbr[7:]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__
    return strftime(self.format, (item,)*9).capitalize()
TypeError: an integer is required
>>>

----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2002-03-15 07:54

Message:
Logged In: YES
user_id=44345

Further update - 1.24 adds slicing capability. I missed the patch attached to the original report (I thought it was a bug report and didn't even notice the diff - my apologies to perky).
----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-03-14 22:09

Message:
Logged In: YES
user_id=44345

Fixed by calendar.py 1.23 and test_calendar.py 1.2.

----------------------------------------------------------------------

Comment By: Barry Warsaw (bwarsaw)
Date: 2002-03-14 21:08

Message:
Logged In: YES
user_id=12800

Go for it Skip!

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-03-14 19:41

Message:
Logged In: YES
user_id=44345

Looks to me like adding

    if item > 6 or item < -7:
        raise IndexError

to the start of _localized_name.__getitem__ will do the trick. (Should a test for non-integer items also be added?)

Skip

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-14 17:05

Message:
Logged In: YES
user_id=31435

Based on Guido's comment, categorized as 2.2.x and boosted priority to 7.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-01-14 00:18

Message:
Logged In: YES
user_id=6380

You're right. Assigned to Barry. I propose that the test suite should be changed to test for this. This would be a 2.2.1 bugfix candidate!
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470

From noreply@sourceforge.net Fri Mar 15 13:55:40 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 15 Mar 2002 05:55:40 -0800
Subject: [Patches] [ python-Patches-527427 ] minidom fails to use NodeList sometimes
Message-ID:

Patches item #527427, was opened at 2002-03-08 16:39
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527427&group_id=5470

Category: Library (Lib)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Cesar Eduardo Barros (cesarb)
Assigned to: Nobody/Anonymous (nobody)
Summary: minidom fails to use NodeList sometimes

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-15 14:54

Message:
Logged In: YES
user_id=21627

Thanks for the patch. Committed as 1.44 and 1.43.6.1 (Python) and 1.39 (PyXML).
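For context, the DOM's NodeList interface that minidom is expected to return is, in spirit, just a Python list that also answers item() and length (per DOM Level 2 Core). A minimal sketch of that interface (illustrative; minidom's actual class differs in detail):

```python
class NodeList(list):
    """Sketch of the DOM NodeList interface layered on a plain list.

    Illustrative only: the real minidom NodeList lives in the standard
    library; this just shows why returning a bare list breaks DOM code
    that calls item() or reads length.
    """
    def item(self, index):
        # DOM semantics: out-of-range indices return null, not an error.
        if 0 <= index < len(self):
            return self[index]
        return None

    @property
    def length(self):
        return len(self)
```

Because it subclasses list, existing Python code that slices or iterates the result keeps working, while DOM-style callers get item() and length.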
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527427&group_id=5470 From noreply@sourceforge.net Fri Mar 15 14:05:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 06:05:01 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared, used different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building shared library disabled by default, while these architectures had it enabled. - it rectifies a small problem on solaris2.8, that makes double inclusion of thread.o (this produces error on 'ld' for shared library). ---------------------------------------------------------------------- >Comment By: Martin v. Lцwis (loewis) Date: 2002-03-15 15:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments to #497102. Also, I still like to get a clarification as to who is the author of this code. 
---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 17:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I understand correctly that you want: --enable-shared/--enable-static instead of --enable-shared-python, --disable-shared-python - Do you agree with the way it is done in the patch (ppython.diff) or do you propose another way? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 15:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 12:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments; I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 11:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 11:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. 
Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (I think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 19:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 18:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Fri Mar 15 14:08:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 06:08:21 -0800 Subject: [Patches] [ python-Patches-503202 ] backward compat. on calendar.py Message-ID: Patches item #503202, was opened at 2002-01-13 23:47 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Fixed Priority: 7 Submitted By: Hye-Shik Chang (perky) Assigned to: Skip Montanaro (montanaro) Summary: backward compat. on calendar.py Initial Comment: Many applications fails on 2.2 by this problem: under 2.1.1 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun >>> calendar.month_abbr[7:] ['Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] 2.2 --- >>> import calendar >>> for n in calendar.day_abbr: ... print n, ... Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon Tue Traceback (most recent call last): File "", line 1, in ? File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () ValueError: year out of range >>> calendar.month_abbr[7:] Traceback (most recent call last): File "", line 1, in ? 
File "/usr/pkg/lib/python2.2/calendar.py", line 31, in __getitem__ return strftime(self.format, (item,)*9).capitalize () TypeError: an integer is required >>> ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-15 08:08 Message: Logged In: YES user_id=44345 further update - 1.24 adds slicing capability - I missed the patch attached to the original report (thought it was a bug report and didn't even notice the diff - my apologies to perky). ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 22:09 Message: Logged In: YES user_id=44345 fixed by calendar.py 1.23 and test_calendar.py 1.2. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-14 21:08 Message: Logged In: YES user_id=12800 Go for it Skip! ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 19:41 Message: Logged In: YES user_id=44345 Looks to me like adding if item > 6 or item < -7: raise IndexError to the start of _localized_name.__getitem__ will do the trick. (Should a test for non-integer items also be added?) Skip ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-14 17:05 Message: Logged In: YES user_id=31435 Based on Guido's comment, categorized as 2.2.x and boosted priority to 7. 
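Skip's suggested bounds check can be sketched as follows (a toy stand-in for calendar's _localized_name; the class name and data here are illustrative, not the committed 1.23/1.24 code). Without the IndexError, the old-style iteration protocol, which calls __getitem__ with 0, 1, 2, ... until an IndexError is raised, never stops, which is exactly the runaway "Mon Tue Wed ..." output in the report:

```python
class LocalizedDayAbbr:
    """Toy stand-in for calendar._localized_name (illustrative)."""
    _names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

    def __getitem__(self, item):
        if isinstance(item, slice):          # 1.24 added slicing support
            return self._names[item]
        if item > 6 or item < -7:            # Skip's suggested guard
            raise IndexError("day index out of range")
        return self._names[item]

day_abbr = LocalizedDayAbbr()
# Iteration now terminates, because __getitem__(7) raises IndexError:
assert list(day_abbr) == ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
# Slicing works instead of raising TypeError:
assert day_abbr[5:] == ['Sat', 'Sun']
```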
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-14 00:18 Message: Logged In: YES user_id=6380 You're right. Assigned to Barry. I propose that the test suite should be changed to test for this. This would be a 2.2.1 bugfix candidate! ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=503202&group_id=5470 From noreply@sourceforge.net Fri Mar 15 14:37:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 06:37:21 -0800 Subject: [Patches] [ python-Patches-517521 ] Optimization for PyObject_Get/SetAttr Message-ID: Patches item #517521, was opened at 2002-02-14 09:19 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Closed Resolution: Accepted Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Nobody/Anonymous (nobody) Summary: Optimization for PyObject_Get/SetAttr Initial Comment: The attached patch is based on the assumption that the vast majority of calls to PyObject_GetAttr and PyObject_SetAttr use a PyString (rather than a PyUnicode) as the name parameter. Because these routines perform a PyUnicode_Check first, every call (with a PyString as name) requires a call to PyType_IsSubType. By reorganizing so that PyString_Check is called first, the call to PyType_IsSubType is avoided in the common case. The same reorganization is done for PyObject_GenericGet/SetAttr. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-15 09:37 Message: Logged In: YES user_id=6380 You mean 2.165, surely. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-03-15 08:44 Message: Logged In: YES user_id=21627 Thanks for the patch. Committed as object.c 2.164. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-15 03:31 Message: Logged In: YES user_id=31435 +1 on integrating the patch. Better 2% today than 200% that may never materialize! ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 03:19 Message: Logged In: YES user_id=21627 It is a fairly trivial change, and it has no ill effects, so I think it is worth the trouble (in particular since a duplicate has been submitted as 529768). Whether PEP 263 affects it depends on the implementation strategy taken in phase 2; most likely, attribute accesses remain as byte strings (it is already decided that they remain restricted to ASCII). Unless there are any strong objections to this patch, I'd like to integrate it. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-15 02:18 Message: Logged In: YES user_id=250749 I've tried this patch and it appears to have no ill effects on a FreeBSD 4.4 system, though I haven't exhaustively checked it. My testing (using pystone.py and PyBench 1.0) shows only about 2% gain, which in isolation is hardly worth the bother (though a number of 2% gains can cumulatively be attractive). I don't know at this point whether PEP 263 (if accepted) would have any effect on the change implemented by the patch; if so, it may not be worth pursuing. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517521&group_id=5470 From noreply@sourceforge.net Fri Mar 15 15:30:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 07:30:17 -0800 Subject: [Patches] [ python-Patches-530105 ] file object may not be subtyped Message-ID: Patches item #530105, was opened at 2002-03-14 18:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470 Category: None Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Gustavo Niemeyer (niemeyer) Assigned to: Nobody/Anonymous (nobody) Summary: file object may not be subtyped Initial Comment: PyFileObject should be defined in fileobject.h, so it may be properly subtyped. This patch fixes this, and also a word that was typed twice in a comment. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-15 10:30 Message: Logged In: YES user_id=6380 Looks good to me too. Check it in. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 08:45 Message: Logged In: YES user_id=21627 This patch looks good to me. 
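To illustrate what exposing the file object's layout enables, here is the kind of subclassing the patch makes possible. In 2.2 terms the point was `class MyFile(file): ...`; the sketch below uses io.FileIO as a modern stand-in, and the class name is made up for the demo:

```python
import io
import os
import tempfile

class CountingFile(io.FileIO):
    """A file subtype that counts bytes written (illustrative)."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.written = 0

    def write(self, data):
        n = super().write(data)
        self.written += n
        return n

fd, path = tempfile.mkstemp()
os.close(fd)
with CountingFile(path, "w") as f:
    f.write(b"hello ")
    f.write(b"world")
    assert f.written == 11
os.remove(path)
```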
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470 From noreply@sourceforge.net Fri Mar 15 17:03:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 09:03:45 -0800 Subject: [Patches] [ python-Patches-492105 ] Import from Zip archive Message-ID: Patches item #492105, was opened at 2001-12-12 17:21 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=492105&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: James C. Ahlstrom (ahlstromjc) Assigned to: Nobody/Anonymous (nobody) Summary: Import from Zip archive Initial Comment: This is the "final" patch to support imports from zip archives, and directory caching using os.listdir(). It replaces patch 483466 and 476047. It is a separate patch since I can't delete file attachments. It adds support for importing from "" and from relative paths. ---------------------------------------------------------------------- >Comment By: James C. Ahlstrom (ahlstromjc) Date: 2002-03-15 17:03 Message: Logged In: YES user_id=64929 I still can't delete files, but I added a new file which contains all diffs as a single file, and is made from the current CVS tree (Mar 15, 2002). 
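This line of work eventually became the stdlib zipimport machinery. A sketch of what importing from a zip archive looks like with today's standard library (the archive and module names below are made up for the demo):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny archive containing one module (names are made up).
archive = os.path.join(tempfile.mkdtemp(), "demo_lib.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("ziptest_mod.py", "GREETING = 'hello from the zip'\n")

# Adding the archive to sys.path makes its modules importable.
sys.path.insert(0, archive)
import ziptest_mod
assert ziptest_mod.GREETING == "hello from the zip"
```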
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=492105&group_id=5470 From noreply@sourceforge.net Fri Mar 15 17:06:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 09:06:11 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 15:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 6 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function, that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-15 18:06 Message: Logged In: YES user_id=89016 For encoding it's always (end-start)*u"?": >>> u"ää".encode("ascii", "replace") '??' But for decoding, it is neither of these: >>> "\Ux\U".decode("unicode-escape", "replace") u'\ufffd\ufffd' i.e. a sequence of 5 illegal characters was replaced by two replacement characters. This might mean that decoders can't collect all the illegal characters and call the callback once. 
They might have to call the callback for every single illegal byte sequence to get the old behaviour. (It seems that this patch would be much, much simpler, if we only change the encoders) ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 19:36 Message: Logged In: YES user_id=38388 Hmm, whatever it takes to maintain backwards compatibility. Do you have an example ? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 18:31 Message: Logged In: YES user_id=89016 What should replace do: Return u"?" or (end-start)*u"?" ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 16:15 Message: Logged In: YES user_id=38388 Sounds like a good idea. Please keep the encoder and decoder APIs symmetric, though, ie. add the slice information to both APIs. The slice should use the same format as Python's standard slices, that is left inclusive, right exclusive. I like the highlighting feature ! ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 00:09 Message: Logged In: YES user_id=89016 I'm thinking about extending the API a little bit: Consider the following example: >>> "\u1".decode("unicode-escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape The error message is a lie: it is not the '1' in position 2 that is the problem, but the complete truncated sequence '\u1'. For this the decoder should pass a start and an end position to the handler. For encoding this would be useful too: Suppose I want to have an encoder that colors the unencodable character via ANSI escape sequences. Then I could do the following: >>> import codecs >>> def color(enc, uni, pos, why, sta): ... 
return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1) ... >>> codecs.register_unicodeencodeerrorhandler("color", color) >>> u"aäüöo".encode("ascii", "color") 'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b [0mo' But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position). This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in case of custom encoding names. (And it makes the implementation very interesting ;)) What do you think? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 02:29 Message: Logged In: YES user_id=89016 I started from scratch, and the current state is this: Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler). For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. As callback name string comparison results are cached it might even be faster than the original. The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 17:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? 
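Walter's "collect as many unencodable characters as possible" idea is essentially how the error-callback API later shipped (PEP 293): the registered handler receives an exception object whose start/end span the whole run of bad characters. A sketch with the modern codecs.register_error (the handler name is made up for the demo):

```python
import codecs

def bracket_run(exc):
    # exc.start:exc.end covers the entire run of unencodable
    # characters, so consecutive bad characters arrive in ONE call.
    run = exc.object[exc.start:exc.end]
    replacement = "<%s>" % ",".join(str(ord(c)) for c in run)
    return (replacement, exc.end)   # (replacement, resume position)

codecs.register_error("bracket-run", bracket_run)   # made-up name

# Three consecutive non-ASCII characters, one callback invocation:
assert "a\xe4\xfc\xf6o".encode("ascii", "bracket-run") == b"a<228,252,246>o"
```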
---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 12:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 12:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on the 10.09. Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 05:55 Message: Logged In: YES user_id=89016 Changing the decoding API is done now. There are new functions codec.register_unicodedecodeerrorhandler and codec.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered. There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.: >>> "\U1111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape >>> "\U11111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character For symmetry I added this to the encoding API too: >>> u"\xff".encode("ascii") Traceback (most recent call last): File "", line 1, in ? 
UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128) The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere: >>> unicode("a\xffb\xffc", "ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' >>> "a\xffb\xffc".decode("ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise we would have the problem that the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway. I changed all the old functions to call the new ones so bugfixes don't have to be done in two places. There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots) This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx. There are still a few spots that call the old API: E.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere even if strict encoding/decoding is used? The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.) 
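The reason argument and the (replacement, new position) return protocol discussed above are, in the API as it eventually landed, carried on the exception object handed to a registered handler. A sketch of the modern equivalent of the `lambda enc, uni, pos, rea, sta: (u"", pos+1)` example (the handler name is made up for the demo):

```python
import codecs

def skip_bad_bytes(exc):
    # encoding, object, start, end and reason attributes replace the
    # positional (enc, uni, pos, rea, sta) parameters from the thread.
    assert isinstance(exc, UnicodeDecodeError)
    return ("", exc.end)            # drop the bad bytes, resume after them

codecs.register_error("skip-bad", skip_bad_bytes)   # made-up name

assert b"a\xffb\xffc".decode("ascii", "skip-bad") == "abc"

# The 'reason' text Walter added is available as an attribute:
try:
    b"\xff".decode("ascii")
except UnicodeDecodeError as exc:
    assert exc.reason == "ordinal not in range(128)"
```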
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 19:03 Message: Logged In: YES user_id=89016 New version of the patch with the error handling callback registry. > > OK, done, now there's a > > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > > codecs.escapereplace_unicodeencode_errors > > that uses \u (or \U if x>0xffff (with a wide build > > of Python)). > > Great! Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate. > > [...] > > But for special one-shot error handlers, it might still be > > useful to pass the error handler directly, so maybe we > > should leave error as PyObject *, but implement the > > registry anyway? > > Good idea ! > > One minor nit: codecs.registerError() should be named > codecs.register_errorhandler() to be more inline with > the Python coding style guide. OK, but these functions are specific to unicode encoding, so now the functions are called: codecs.register_unicodeencodeerrorhandler codecs.lookup_unicodeencodeerrorhandler Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in the codecs.c/_PyCodecRegistry_Init so using them is really simple: u"gürk".encode("ascii", "xmlcharrefreplace") ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-13 13:26 Message: Logged In: YES user_id=38388 > > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > > with \uxxxx replacement callback. > > > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > > I'd rather leave the special encoder in place, > > > > since it is being used a lot in Python and > > > > probably some applications too. > > > > > > It would be a slowdown. But callbacks open many > > > possibilities. 
> > > > True, but in this case I believe that we should stick with > > the native implementation for "unicode-escape". Having > > a standard callback error handler which does the \uXXXX > > replacement would be nice to have though, since this would > > also be usable with lots of other codecs (e.g. all the > > code page ones). > > OK, done, now there's a > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > codecs.escapereplace_unicodeencode_errors > that uses \u (or \U if x>0xffff (with a wide build > of Python)). Great ! > > [...] > > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK! I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? > > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API. ("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. 
> I implemented this and changed the encoders to only > lookup the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoders where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done, when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed? No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data). > Here is the current todo list: > 1. implement a new TranslateCharmap and fix the old. > 2. New encoding API for string objects too. > 3. Decoding > 4. Documentation > 5. Test cases > > I'm thinking about a different strategy for implementing > callbacks > (see http://mail.python.org/pipermail/i18n-sig/2001- > July/001262.html) > > We could have an error handler registry, which maps names > to error handlers, then it would be possible to keep the > errors argument as "const char *" instead of "PyObject *". > Currently PyCodec_UnicodeEncodeHandlerForObject is a > backwards compatibility hack that will never go away, > because > it's always more convenient to type > u"...".encode("...", "strict") > instead of > import codecs > u"...".encode("...", codecs.raise_encode_errors) > > But with an error handler registry this function would > become the official lookup method for error handlers. > (PyCodec_LookupUnicodeEncodeErrorHandler?) 
> Python code would look like this: > --- > def xmlreplace(encoding, unicode, pos, state): > return (u"&#%d;" % ord(uni[pos]), pos+1) > > import codec > > codec.registerError("xmlreplace",xmlreplace) > --- > and then the following call can be made: > u"äöü".encode("ascii", "xmlreplace") > As soon as the first error is encountered, the encoder uses > its builtin error handling method if it recognizes the name > ("strict", "replace" or "ignore") or looks up the error > handling function in the registry if it doesn't. In this way > the speed for the backwards compatible features is the same > as before and "const char *error" can be kept as the > parameter to all encoding functions. For speed common error > handling names could even be implemented in the encoder > itself. > > But for special one-shot error handlers, it might still be > useful to pass the error handler directly, so maybe we > should leave error as PyObject *, but implement the > registry anyway? Good idea ! One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more inline with the Python coding style guide. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-12 13:03 Message: Logged In: YES user_id=89016 > > [...] > > so I guess we could change the replace handler > > to always return u'?'. This would make the > > implementation a little bit simpler, but the > > explanation of the callback feature *a lot* > > simpler. > > Go for it. OK, done! > [...] > > > Could you add these docs to the Misc/unicode.txt > > > file ? I will eventually take that file and turn > > > it into a PEP which will then serve as general > > > documentation for these things. > > > > I could, but first we should work out how the > > decoding callback API will work. > > Ok. BTW, Barry Warsaw already did the work of converting > the unicode.txt to PEP 100, so the docs should eventually > go there. OK. 
I guess it would be best to do this when everything is finished. > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > with \uxxxx replacement callback. > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > I'd rather leave the special encoder in place, > > > since it is being used a lot in Python and > > > probably some applications too. > > > > It would be a slowdown. But callbacks open many > > possibilities. > > True, but in this case I believe that we should stick with > the native implementation for "unicode-escape". Having > a standard callback error handler which does the \uXXXX > replacement would be nice to have though, since this would > also be usable with lots of other codecs (e.g. all the > code page ones). OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/ codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)). > > For example: > > > > Why can't I print u"gürk"? > > > > is probably one of the most frequently asked > > questions in comp.lang.python. For printing > > Unicode stuff, print could be extended to use an > > error handling callback for Unicode strings (or > > objects where __str__ or tp_str returns a Unicode > > object) instead of using str() which always > > returns an 8bit string and uses strict encoding. > > There might even be a > > sys.setprintencodehandler()/sys.getprintencodehandler() > > There already is a print callback in Python (forgot the > name of the hook though), so this should be possible by > providing the encoding logic in the hook. True: sys.displayhook > [...] > > Should the old TranslateCharmap map to the new > > TranslateCharmapEx and inherit the > > "multicharacter replacement" feature, > > or should I leave it as it is? > > If possible, please also add the multichar replacement > to the old API. 
I think it is very useful and since the > old APIs work on raw buffers it would be a benefit to have > the functionality in the old implementation too. OK! I will try to find the time to implement that in the next few days. > [Decoding error callbacks] > > About the return value: > > I'd suggest to always use the same tuple interface, e.g. > > callback(encoding, input_data, input_position, state) -> > (output_to_be_appended, new_input_position) > > (I think it's better to use absolute values for the > position rather than offsets.) > > Perhaps the encoding callbacks should use the same > interface... what do you think ? This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API ("almost" because for the encoder output_to_be_appended will be reencoded, while for the decoder it will simply be appended), so I'm for it. I implemented this and changed the encoders to only lookup the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. (This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done when an unencodable error occurs, so the UCS1 encoder should be as fast as it was without error callbacks. Do we want to enforce new_input_position>input_position, or should jumping back be allowed? > > > > One additional note: It is vital that errors > > > > is an assignable attribute of the StreamWriter. > > > > > > It is already ! > > > > I know, but IMHO it should be documented that an > > assignable errors attribute must be supported > > as part of the official codec API. > > > > Misc/unicode.txt is not clear on that: > > """ > > It is not required by the Unicode implementation > > to use these base classes, only the interfaces must > > match; this allows writing Codecs as extension types. > > """ > > Good point. 
I'll add that to the PEP 100. OK. Here is the current todo list: 1. implement a new TranslateCharmap and fix the old. 2. New encoding API for string objects too. 3. Decoding 4. Documentation 5. Test cases I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001- July/001262.html) We could have an error handler registry, which maps names to error handlers; then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type u"...".encode("...", "strict") instead of import codecs u"...".encode("...", codecs.raise_encode_errors) But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?) Python code would look like this: --- def xmlreplace(encoding, unicode, pos, state): return (u"&#%d;" % ord(unicode[pos]), pos+1) import codecs codecs.registerError("xmlreplace",xmlreplace) --- and then the following call can be made: u"äöü".encode("ascii", "xmlreplace") As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed common error handling names could even be implemented in the encoder itself. But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway? ---------------------------------------------------------------------- Comment By: M.-A. 
Lemburg (lemburg) Date: 2001-07-10 14:29 Message: Logged In: YES user_id=38388 Ok, here we go... > > > raise an exception). U+FFFD characters in the > replacement > > > string will be replaced with a character that the > encoder > > > chooses ('?' in all cases). > > > > Nice. > > But the special casing of U+FFFD makes the interface > somewhat > less clean than it could be. It was only done to be 100% > backwards compatible. With the original "replace" > error > handling the codec chose the replacement character. But as > far as I can tell none of the codecs uses anything other > than '?', True. > so I guess we could change the replace handler > to always return u'?'. This would make the implementation a > little bit simpler, but the explanation of the callback > feature *a lot* simpler. Go for it. > And if you still want to handle > an unencodable U+FFFD, you can write a special callback for > that, e.g. > > def FFFDreplace(enc, uni, pos): > if uni[pos] == u"\ufffd": > return u"?" > else: > raise UnicodeError(...) > > > ...docs... > > > > Could you add these docs to the Misc/unicode.txt file ? I > > will eventually take that file and turn it into a PEP > which > > will then serve as general documentation for these things. > > I could, but first we should work out how the decoding > callback API will work. Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > > replacement callback. > > > > Hmm, wouldn't that result in a slowdown ? If so, I'd > rather > > leave the special encoder in place, since it is being > used a > > lot in Python and probably some applications too. > > It would be a slowdown. But callbacks open many > possibilities. True, but in this case I believe that we should stick with the native implementation for "unicode-escape". 
Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones). > For example: > > Why can't I print u"gürk"? > > is probably one of the most frequently asked questions in > comp.lang.python. For printing Unicode stuff, print could be > extended to use an error handling callback for Unicode > strings (or objects where __str__ or tp_str returns a > Unicode object) instead of using str() which always returns > an 8bit string and uses strict encoding. There might even > be a > sys.setprintencodehandler()/sys.getprintencodehandler() There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook. > > > I have not touched PyUnicode_TranslateCharmap yet, > > > should this function also support error callbacks? Why > > > would one want to insert None into the mapping to > call > > > the callback? > > > > 1. Yes. > > 2. The user may want to e.g. restrict usage of certain > > character ranges. In this case the codec would be used to > > verify the input and an exception would indeed be useful > > (e.g. say you want to restrict input to Hangul + ASCII). > > OK, do we want TranslateCharmap to work exactly like > encoding, > i.e. in case of an error should the returned replacement > string again be mapped through the translation mapping or > should it be copied to the output directly? The former would > be more in line with encoding, but IMHO the latter would > be much more useful. It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat. > BTW, when I implement it I can implement patch #403100 > ("Multicharacter replacements in > PyUnicode_TranslateCharmap") > along the way. 
I've seen it; will comment on it later. > Should the old TranslateCharmap map to the new > TranslateCharmapEx > and inherit the "multicharacter replacement" feature, > or > should I leave it as it is? If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too. [Decoding error callbacks] > > > A remaining problem is how to implement decoding error > > > callbacks. In Python 2.1 encoding and decoding errors > are > > > handled in the same way with a string value. But with > > > callbacks it doesn't make sense to use the same > callback > > > for encoding and decoding (like > codecs.StreamReaderWriter > > > and codecs.StreamRecoder do). Decoding callbacks have > a > > > different API. Which arguments should be passed to the > > > decoding callback, and what is the decoding callback > > > supposed to do? > > > > I'd suggest adding another set of PyCodec_UnicodeDecode... () > > APIs for this. We'd then have to augment the base classes > of > > the StreamCodecs to provide two attributes for .errors > with > > a fallback solution for the string case (i.e. "strict" > can > > still be used for both directions). > > Sounds good. Now what is the decoding callback supposed to > do? > I guess it will be called in the same way as the encoding > callback, i.e. with encoding name, original string and > position of the error. It might return a Unicode string > (i.e. an object of the decoding target type), that will be > emitted from the codec instead of the one offending byte. Or > it might return a tuple with replacement Unicode object and > a resynchronisation offset, i.e. returning (u"?", 1) > means > emit a '?' and skip the offending character. 
But to make > the offset really useful the callback has to know something > about the encoding, perhaps the codec should be allowed to > pass an additional state object to the callback? > > Maybe the same should be added to the encoding callbacks too? > Maybe the encoding callback should be able to tell > the encoder if the replacement returned should be reencoded > (in which case it's a Unicode object), or directly emitted > (in which case it's an 8bit string)? I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (simplifies the interface). About the return value: I'd suggest to always use the same tuple interface, e.g. callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position) (I think it's better to use absolute values for the position rather than offsets.) Perhaps the encoding callbacks should use the same interface... what do you think ? > > > One additional note: It is vital that errors is an > > > assignable attribute of the StreamWriter. > > > > It is already ! > > I know, but IMHO it should be documented that an assignable > errors attribute must be supported as part of the official > codec API. > > Misc/unicode.txt is not clear on that: > """ > It is not required by the Unicode implementation to use > these base classes, only the interfaces must match; this > allows writing Codecs as extension types. > """ Good point. I'll add that to the PEP 100. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-22 22:51 Message: Logged In: YES user_id=38388 Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy... ---------------------------------------------------------------------- Comment By: M.-A. 
Lemburg (lemburg) Date: 2001-06-13 19:00 Message: Logged In: YES user_id=38388 On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 17:49 Message: Logged In: YES user_id=89016 Guido van Rossum wrote in python-dev: > True, the "codec" pattern can be used for other > encodings than Unicode. But it seems to me that the > entire codecs architecture is rather strongly geared > towards en/decoding Unicode, and it's not clear > how well other codecs fit in this pattern (e.g. I > noticed that all the non-Unicode codecs ignore the > error handling parameter or assert that > it is set to 'strict'). I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 15:57 Message: Logged In: YES user_id=89016 > > [...] > > raise an exception). U+FFFD characters in the replacement > > string will be replaced with a character that the encoder > > chooses ('?' in all cases). > > Nice. But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. 
But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g. def FFFDreplace(enc, uni, pos): if uni[pos] == u"\ufffd": return u"?" else: raise UnicodeError(...) > > The implementation of the loop through the string is done > > in the following way. A stack with two strings is kept > > and the loop always encodes a character from the string > > at the stacktop. If an error is encountered and the stack > > has only one entry (during encoding of the original string) > > the callback is called and the unicode object returned is > > pushed on the stack, so the encoding continues with the > > replacement string. If the stack has two entries when an > > error is encountered, the replacement string itself has > > an unencodable character and a normal exception is raised. > > When the encoder has reached the end of its current string > > there are two possibilities: when the stack contains two > > entries, this was the replacement string, so the replacement > > string will be popped from the stack and encoding continues > > with the next character from the original string. If the > > stack had only one entry, encoding is finished. > > Very elegant solution ! I'll put it as a comment in the source. > > (I hope that's enough explanation of the API and > implementation) > > Could you add these docs to the Misc/unicode.txt file ? I > will eventually take that file and turn it into a PEP which > will then serve as general documentation for these things. I could, but first we should work out how the decoding callback API will work. > > I have renamed the static ...121 function to all lowercase > > names. > > Ok. 
> > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > replacement callback. > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather > leave the special encoder in place, since it is being used a > lot in Python and probably some applications too. It would be a slowdown. But callbacks open many possibilities. For example: Why can't I print u"gürk"? is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler() > [...] > I think it would be worthwhile to rename the callbacks to > include "Unicode" somewhere, e.g. > PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but > then it points out the application field of the callback > rather well. Same for the callbacks exposed through the > _codecsmodule. OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;)) > > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why > > would one want to insert None into the mapping to call > > the callback? > > 1. Yes. > 2. The user may want to e.g. restrict usage of certain > character ranges. In this case the codec would be used to > verify the input and an exception would indeed be useful > (e.g. say you want to restrict input to Hangul + ASCII). OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful. 
BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is? > > A remaining problem is how to implement decoding error > > callbacks. In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback > > for encoding and decoding (like codecs.StreamReaderWriter > > and codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? > > I'd suggest adding another set of PyCodec_UnicodeDecode... () > APIs for this. We'd then have to augment the base classes of > the StreamCodecs to provide two attributes for .errors with > a fallback solution for the string case (i.e. "strict" can > still be used for both directions). Sounds good. Now what is the decoding callback supposed to do? I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type), that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding, perhaps the codec should be allowed to pass an additional state object to the callback? Maybe the same should be added to the encoding callbacks too? 
Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)? > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. > > It is already ! I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API. Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """ ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 10:05 Message: Logged In: YES user_id=38388 > How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object. > PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object. When an > unencodable character is encountered the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice. 
> The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string. If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception is raised. When > the encoder has reached the end of its current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be popped from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution ! > (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too. 
> PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as Python access wrapper for the internal codecs and nothing more. One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. > I have not touched PyUnicode_TranslateCharmap yet, > should this function also support error callbacks? Why would > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > A remaining problem is how to implement decoding error > callbacks. In Python 2.1 encoding and decoding errors are > handled in the same way with a string value. 
But with > callbacks it doesn't make sense to use the same callback for > encoding and decoding (like codecs.StreamReaderWriter and > codecs.StreamRecoder do). Decoding callbacks have a > different API. Which arguments should be passed to the > decoding callback, and what is the decoding callback > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > One additional note: It is vital that errors is an > assignable attribute of the StreamWriter. It is already ! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used. When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > a HTML textarea! ;) I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 21:18 Message: Logged In: YES user_id=89016 One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used. 
When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment") BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 20:59 Message: Logged In: YES user_id=89016 How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object. When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. 
If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation) I have renamed the static ...121 function to all lowercase names. BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. 
But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 20:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names. If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the time). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject * version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data. 
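The two-string stack loop Walter describes a few comments up can be sketched in Python. This is an illustrative reconstruction, not the patch's actual C code: the function name `encode_latin1_with_callback` and the helper `charref` are made up for the example, but the callback signature (encoding name, unicode object, error position) and the one-level-of-replacement rule follow his description.

```python
def encode_latin1_with_callback(text, errors):
    # Sketch of the patch's encoding loop: the original string plus at
    # most one replacement string live on a two-entry "stack".
    out = bytearray()
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ord(ch) < 256:              # encodable in latin-1
            out.append(ord(ch))
            pos += 1
            continue
        # Callback gets (encoding, unicode, position) and returns a
        # replacement unicode string to encode instead.
        replacement = errors("latin-1", text, pos)
        for rch in replacement:
            if ord(rch) >= 256:
                # Unencodable character *inside* the replacement string:
                # no second level of callbacks, just raise.
                raise UnicodeError("unencodable character in replacement")
            out.append(ord(rch))
        pos += 1
    return bytes(out)

# XML charref replacement, as in the initial comment of this item.
charref = lambda enc, uni, pos: "&#%d;" % ord(uni[pos])
print(encode_latin1_with_callback("a\u20acb", charref))  # b'a&#8364;b'
```

The point of the two-entry stack is visible here: the replacement string is encoded with the same inner loop as the original, but an error inside it raises instead of recursing into the callback again.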
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? Encode one-to-one; it implements both ASCII and Latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones? I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h; of those PyCodec_EncodeHandlerForObject is vital, because it is used to map old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string) instead of UNICODE*/int and PyObject * made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode. 
PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 16:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive ! I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ? * module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Fri Mar 15 17:19:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 09:19:42 -0800 Subject: [Patches] [ python-Patches-432401 ] unicode encoding error callbacks Message-ID: Patches item #432401, was opened at 2001-06-12 15:43 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Postponed Priority: 6 Submitted By: Walter Dörwald (doerwalter) Assigned to: M.-A. Lemburg (lemburg) Summary: unicode encoding error callbacks Initial Comment: This patch adds unicode error handling callbacks to the encode functionality. With this patch it's possible to not only pass 'strict', 'ignore' or 'replace' as the errors argument to encode, but also a callable function that will be called with the encoding name, the original unicode object and the position of the unencodable character. The callback must return a replacement unicode object that will be encoded instead of the original character. For example, replacing unencodable characters with XML character references can be done in the following way. u"aäoöuüß".encode( "ascii", lambda enc, uni, pos: u"&#x%x;" % ord(uni[pos]) ) ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-15 18:19 Message: Logged In: YES user_id=89016 So this means that the encoder can collect illegal characters and pass them to the callback. "replace" will replace this with (end-start)*u"?". Decoders don't collect all illegal byte sequences, but call the callback once for every byte sequence that has been found illegal and "replace" will replace it with u"?". Does this make sense? 
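The encode/decode asymmetry Walter describes here is exactly how the error handling that grew out of this patch behaves in today's Python. A quick check, assuming a modern Python 3 interpreter:

```python
# Encoding: "replace" substitutes one '?' per unencodable character,
# i.e. (end - start) * '?' for a collected run of bad characters.
assert "ää".encode("ascii", "replace") == b"??"

# Decoding: the handler fires once per illegal byte sequence, so each
# bad sequence yields a single U+FFFD replacement character.
assert b"a\xff\xffb".decode("utf-8", "replace") == "a\ufffd\ufffdb"
```

So the encoder may batch a run of bad characters into one handler call, while the decoder reports each illegal sequence separately, just as proposed.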
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-15 18:06 Message: Logged In: YES user_id=89016 For encoding it's always (end-start)*u"?": >>> u"ää".encode("ascii", "replace") '??' But for decoding, it is neither of these: >>> "\Ux\U".decode("unicode-escape", "replace") u'\ufffd\ufffd' i.e. a sequence of 5 illegal characters was replaced by two replacement characters. This might mean that decoders can't collect all the illegal characters and call the callback once. They might have to call the callback for every single illegal byte sequence to get the old behaviour. (It seems that this patch would be much, much simpler if we only change the encoders) ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 19:36 Message: Logged In: YES user_id=38388 Hmm, whatever it takes to maintain backwards compatibility. Do you have an example ? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 18:31 Message: Logged In: YES user_id=89016 What should replace do: Return u"?" or (end-start)*u"?" ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-08 16:15 Message: Logged In: YES user_id=38388 Sounds like a good idea. Please keep the encoder and decoder APIs symmetric, though, i.e. add the slice information to both APIs. The slice should use the same format as Python's standard slices, that is left inclusive, right exclusive. I like the highlighting feature ! ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-08 00:09 Message: Logged In: YES user_id=89016 I'm thinking about extending the API a little bit: Consider the following example: >>> "\u1".decode("unicode-escape") Traceback (most recent call last): File "", line 1, in ? 
UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 2: truncated \uXXXX escape The error message is a lie: The problem is not the '1' in position 2, but the complete truncated sequence '\u1'. For this the decoder should pass a start and an end position to the handler. For encoding this would be useful too: Suppose I want to have an encoder that colors the unencodable character via ANSI escape sequences. Then I could do the following: >>> import codecs >>> def color(enc, uni, pos, why, sta): ... return (u"\033[1m<%d>\033[0m" % ord(uni[pos]), pos+1) ... >>> codecs.register_unicodeencodeerrorhandler("color", color) >>> u"aäüöo".encode("ascii", "color") 'a\x1b[1m<228>\x1b[0m\x1b[1m<252>\x1b[0m\x1b[1m<246>\x1b[0mo' But here the sequences "\x1b[0m\x1b[1m" are not needed. To fix this problem the encoder could collect as many unencodable characters as possible and pass those to the error callback in one go (passing a start and end+1 position). This fixes the above problem and reduces the number of calls to the callback, so it should speed up the algorithms in case of custom error handlers. (And it makes the implementation very interesting ;)) What do you think? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-07 02:29 Message: Logged In: YES user_id=89016 I started from scratch, and the current state is this: Encoding mostly works (except that I haven't changed TranslateCharmap and EncodeDecimal yet) and most of the decoding stuff works (DecodeASCII and DecodeCharmap are still unchanged) and the decoding callback helper isn't optimized for the "builtin" names yet (i.e. it still calls the handler). For encoding the callback helper knows how to handle "strict", "replace", "ignore" and "xmlcharrefreplace" itself and won't call the callback. This should make the encoder fast enough. 
As callback name string comparison results are cached, it might even be faster than the original. The patch so far didn't require any changes to unicodeobject.h, stringobject.h or stringobject.c. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-05 17:49 Message: Logged In: YES user_id=38388 Walter, are you making any progress on the new scheme we discussed on the mailing list (adding an error handler registry much like the codec registry itself instead of trying to redo the complete codec API) ? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-09-20 12:38 Message: Logged In: YES user_id=38388 I am postponing this patch until the PEP process has started. This feature won't make it into Python 2.2. Walter, you may want to reference this patch in the PEP. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-08-16 12:53 Message: Logged In: YES user_id=38388 I think we ought to summarize these changes in a PEP to get some more feedback and testing from others as well. I'll look into this after I'm back from vacation on the 10.09. Given the release schedule I am not sure whether this feature will make it into 2.2. The size of the patch is huge and probably needs a lot of testing first. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-27 05:55 Message: Logged In: YES user_id=89016 Changing the decoding API is done now. There are new functions codec.register_unicodedecodeerrorhandler and codec.lookup_unicodedecodeerrorhandler. Only the standard handlers for 'strict', 'ignore' and 'replace' are preregistered. 
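The register/lookup pair described here eventually shipped (via PEP 293) as codecs.register_error / codecs.lookup_error. A decoding handler equivalent to the skip-the-bad-byte lambda Walter demonstrates in this thread, written against the modern API; the handler name "skipbad" is made up for the example:

```python
import codecs

def skipbad(exc):
    # Modern handlers receive the exception object instead of separate
    # (encoding, object, position, reason, state) arguments; the slice
    # information lives in exc.start/exc.end.
    if isinstance(exc, UnicodeDecodeError):
        return ("", exc.end)   # emit nothing, resume after the bad bytes
    raise exc

codecs.register_error("skipbad", skipbad)   # name is arbitrary
print(b"a\xffb\xffc".decode("ascii", "skipbad"))  # abc
```

Note that one handler name serves both directions in the modern API; the handler distinguishes encode from decode errors by the exception type it receives.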
There may be many reasons for decoding errors in the byte string, so I added an additional argument to the decoding API: reason, which gives the reason for the failure, e.g.: >>> "\U1111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 8: truncated \UXXXXXXXX escape >>> "\U11111111".decode("unicode_escape") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'unicodeescape' can't decode byte 0x31 in position 9: illegal Unicode character For symmetry I added this to the encoding API too: >>> u"\xff".encode("ascii") Traceback (most recent call last): File "", line 1, in ? UnicodeError: encoding 'ascii' can't decode byte 0xff in position 0: ordinal not in range(128) The parameters passed to the callbacks now are: encoding, unicode, position, reason, state. The encoding and decoding API for strings has been adapted too, so now the new API should be usable everywhere: >>> unicode("a\xffb\xffc", "ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' >>> "a\xffb\xffc".decode("ascii", ... lambda enc, uni, pos, rea, sta: (u"", pos+1)) u'abc' I had a problem with the decoding API: all the functions in _codecsmodule.c used the t# format specifier. I changed that to O! with &PyString_Type, because otherwise we would have the problem that the decoding API would have to pass buffer objects around instead of strings, and the callback would have to call str() on the buffer anyway to access a specific character, so this wouldn't be any faster than calling str() on the buffer before decoding. It seems that buffers aren't used anyway. I changed all the old functions to call the new ones so bugfixes don't have to be done in two places. 
There are two exceptions: I didn't change PyString_AsEncodedString and PyString_AsDecodedString because they are documented as deprecated anyway (although they are called in a few spots). This means that I duplicated part of their functionality in PyString_AsEncodedObjectEx and PyString_AsDecodedObjectEx. There are still a few spots that call the old API: E.g. PyString_Format still calls PyUnicode_Decode (but with strict decoding) because it passes the rest of the format string to PyUnicode_Format when it encounters a Unicode object. Should we switch to the new API everywhere even if strict encoding/decoding is used? The size of this patch begins to scare me. I guess we need an extensive test script for all the new features and documentation. I hope you have time to do that, as I'll be busy with other projects in the next weeks. (BTW, I haven't touched PyUnicode_TranslateCharmap yet.) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-23 19:03 Message: Logged In: YES user_id=89016 New version of the patch with the error handling callback registry. > > OK, done, now there's a > > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > > codecs.escapereplace_unicodeencode_errors > > that uses \u (or \U if x>0xffff (with a wide build > > of Python)). > > Great! Now PyCodec_EscapeReplaceUnicodeEncodeErrors uses \x in addition to \u and \U where appropriate. > > [...] > > But for special one-shot error handlers, it might still be > > useful to pass the error handler directly, so maybe we > > should leave error as PyObject *, but implement the > > registry anyway? > > Good idea ! > > One minor nit: codecs.registerError() should be named > codecs.register_errorhandler() to be more in line with > the Python coding style guide. 
OK, but these functions are specific to unicode encoding, so now the functions are called: codecs.register_unicodeencodeerrorhandler codecs.lookup_unicodeencodeerrorhandler Now all callbacks (including the new ones: "xmlcharrefreplace" and "escapereplace") are registered in the codecs.c/_PyCodecRegistry_Init so using them is really simple: u"gürk".encode("ascii", "xmlcharrefreplace") ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-13 13:26 Message: Logged In: YES user_id=38388 > > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > > with \uxxxx replacement callback. > > > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > > I'd rather leave the special encoder in place, > > > > since it is being used a lot in Python and > > > > probably some applications too. > > > > > > It would be a slowdown. But callbacks open many > > > possibilities. > > > > True, but in this case I believe that we should stick with > > the native implementation for "unicode-escape". Having > > a standard callback error handler which does the \uXXXX > > replacement would be nice to have though, since this would > > also be usable with lots of other codecs (e.g. all the > > code page ones). > > OK, done, now there's a > PyCodec_EscapeReplaceUnicodeEncodeErrors/ > codecs.escapereplace_unicodeencode_errors > that uses \u (or \U if x>0xffff (with a wide build > of Python)). Great ! > > [...] > > > Should the old TranslateCharmap map to the new > > > TranslateCharmapEx and inherit the > > > "multicharacter replacement" feature, > > > or should I leave it as it is? > > > > If possible, please also add the multichar replacement > > to the old API. I think it is very useful and since the > > old APIs work on raw buffers it would be a benefit to have > > the functionality in the old implementation too. > > OK! 
I will try to find the time to implement that in the > next days. Good. > > [Decoding error callbacks] > > > > About the return value: > > > > I'd suggest to always use the same tuple interface, e.g. > > > > callback(encoding, input_data, input_position, > state) -> > > (output_to_be_appended, new_input_position) > > > > (I think it's better to use absolute values for the > > position rather than offsets.) > > > > Perhaps the encoding callbacks should use the same > > interface... what do you think ? > > This would make the callback feature hypergeneric and a > little slower, because tuples have to be created, but it > (almost) unifies the encoding and decoding API. ("almost" > because, for the encoder output_to_be_appended will be > reencoded, for the decoder it will simply be appended.), > so I'm for it. That's the point. Note that I don't think the tuple creation will hurt much (see the make_tuple() API in codecs.c) since small tuples are cached by Python internally. > I implemented this and changed the encoders to only > look up the error handler on the first error. The UCS1 > encoder now no longer uses the two-item stack strategy. > (This strategy only makes sense for those encoders where > the encoding itself is much more complicated than the > looping/callback etc.) So now memory overflow tests are > only done when an unencodable error occurs, so now the > UCS1 encoder should be as fast as it was without > error callbacks. > > Do we want to enforce new_input_position>input_position, > or should jumping back be allowed? No; moving backwards should be allowed (this may be useful in order to resynchronize with the input data). > Here is the current todo list: > 1. implement a new TranslateCharmap and fix the old. > 2. New encoding API for string objects too. > 3. Decoding > 4. Documentation > 5. 
Test cases > > I'm thinking about a different strategy for implementing > callbacks > (see http://mail.python.org/pipermail/i18n-sig/2001- > July/001262.html) > > We could have an error handler registry, which maps names > to error handlers, then it would be possible to keep the > errors argument as "const char *" instead of "PyObject *". > Currently PyCodec_UnicodeEncodeHandlerForObject is a > backwards compatibility hack that will never go away, > because > it's always more convenient to type > u"...".encode("...", "strict") > instead of > import codecs > u"...".encode("...", codecs.raise_encode_errors) > > But with an error handler registry this function would > become the official lookup method for error handlers. > (PyCodec_LookupUnicodeEncodeErrorHandler?) > Python code would look like this: > --- > def xmlreplace(encoding, unicode, pos, state): > return (u"&#%d;" % ord(unicode[pos]), pos+1) > > import codec > > codec.registerError("xmlreplace", xmlreplace) > --- > and then the following call can be made: > u"äöü".encode("ascii", "xmlreplace") > As soon as the first error is encountered, the encoder uses > its builtin error handling method if it recognizes the name > ("strict", "replace" or "ignore") or looks up the error > handling function in the registry if it doesn't. In this way > the speed for the backwards compatible features is the same > as before and "const char *error" can be kept as the > parameter to all encoding functions. For speed common error > handling names could even be implemented in the encoder > itself. > > But for special one-shot error handlers, it might still be > useful to pass the error handler directly, so maybe we > should leave error as PyObject *, but implement the > registry anyway? Good idea ! One minor nit: codecs.registerError() should be named codecs.register_errorhandler() to be more in line with the Python coding style guide. 
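The registry sketched in the comment above is essentially what later shipped as codecs.register_error. For comparison, the same xmlreplace handler in the final API, where the handler receives a UnicodeEncodeError and returns a (replacement, resume_position) tuple; the registered name "xmlreplace" is kept from the thread, the rest is a modern restatement:

```python
import codecs

def xmlreplace(exc):
    # exc.object is the original string, exc.start:exc.end the bad slice.
    if isinstance(exc, UnicodeEncodeError):
        s = "".join("&#%d;" % ord(c) for c in exc.object[exc.start:exc.end])
        return (s, exc.end)
    raise exc

codecs.register_error("xmlreplace", xmlreplace)
print("äöü".encode("ascii", "xmlreplace"))  # b'&#228;&#246;&#252;'
```

As proposed here, unknown names trigger a registry lookup while "strict", "replace" and "ignore" stay on the encoder's fast path (Python 3 also ships "xmlcharrefreplace" built in, making this particular handler redundant in practice).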
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-07-12 13:03 Message: Logged In: YES user_id=89016 > > [...] > > so I guess we could change the replace handler > > to always return u'?'. This would make the > > implementation a little bit simpler, but the > > explanation of the callback feature *a lot* > > simpler. > > Go for it. OK, done! > [...] > > > Could you add these docs to the Misc/unicode.txt > > > file ? I will eventually take that file and turn > > > it into a PEP which will then serve as general > > > documentation for these things. > > > > I could, but first we should work out how the > > decoding callback API will work. > > Ok. BTW, Barry Warsaw already did the work of converting > the unicode.txt to PEP 100, so the docs should eventually > go there. OK. I guess it would be best to do this when everything is finished. > > > > BTW, I guess PyUnicode_EncodeUnicodeEscape > > > > could be reimplemented as PyUnicode_EncodeASCII > > > > with \uxxxx replacement callback. > > > > > > Hmm, wouldn't that result in a slowdown ? If so, > > > I'd rather leave the special encoder in place, > > > since it is being used a lot in Python and > > > probably some applications too. > > > > It would be a slowdown. But callbacks open many > > possibilities. > > True, but in this case I believe that we should stick with > the native implementation for "unicode-escape". Having > a standard callback error handler which does the \uXXXX > replacement would be nice to have though, since this would > also be usable with lots of other codecs (e.g. all the > code page ones). OK, done, now there's a PyCodec_EscapeReplaceUnicodeEncodeErrors/ codecs.escapereplace_unicodeencode_errors that uses \u (or \U if x>0xffff (with a wide build of Python)). > > For example: > > > > Why can't I print u"gürk"? > > > > is probably one of the most frequently asked > > questions in comp.lang.python. 
For printing > > Unicode stuff, print could be extended to use an > > error handling callback for Unicode strings (or > > objects where __str__ or tp_str returns a Unicode > > object) instead of using str() which always > > returns an 8bit string and uses strict encoding. > > There might even be a > > sys.setprintencodehandler()/sys.getprintencodehandler() > > There already is a print callback in Python (forgot the > name of the hook though), so this should be possible by > providing the encoding logic in the hook. True: sys.displayhook > [...] > > Should the old TranslateCharmap map to the new > > TranslateCharmapEx and inherit the > > "multicharacter replacement" feature, > > or should I leave it as it is? > > If possible, please also add the multichar replacement > to the old API. I think it is very useful and since the > old APIs work on raw buffers it would be a benefit to have > the functionality in the old implementation too. OK! I will try to find the time to implement that in the next days. > [Decoding error callbacks] > > About the return value: > > I'd suggest to always use the same tuple interface, e.g. > > callback(encoding, input_data, input_position, state) -> > (output_to_be_appended, new_input_position) > > (I think it's better to use absolute values for the > position rather than offsets.) > > Perhaps the encoding callbacks should use the same > interface... what do you think ? This would make the callback feature hypergeneric and a little slower, because tuples have to be created, but it (almost) unifies the encoding and decoding API. ("almost" because, for the encoder output_to_be_appended will be reencoded, for the decoder it will simply be appended.), so I'm for it. I implemented this and changed the encoders to only look up the error handler on the first error. The UCS1 encoder now no longer uses the two-item stack strategy. 
(This strategy only makes sense for those encoders where the encoding itself is much more complicated than the looping/callback etc.) So now memory overflow tests are only done when an unencodable error occurs, so now the UCS1 encoder should be as fast as it was without error callbacks. Do we want to enforce new_input_position>input_position, or should jumping back be allowed? > > > > One additional note: It is vital that errors > > > > is an assignable attribute of the StreamWriter. > > > > > > It is already ! > > > > I know, but IMHO it should be documented that an > > assignable errors attribute must be supported > > as part of the official codec API. > > > > Misc/unicode.txt is not clear on that: > > """ > > It is not required by the Unicode implementation > > to use these base classes, only the interfaces must > > match; this allows writing Codecs as extension types. > > """ > > Good point. I'll add that to the PEP 100. OK. Here is the current todo list: 1. implement a new TranslateCharmap and fix the old. 2. New encoding API for string objects too. 3. Decoding 4. Documentation 5. Test cases I'm thinking about a different strategy for implementing callbacks (see http://mail.python.org/pipermail/i18n-sig/2001- July/001262.html) We could have an error handler registry, which maps names to error handlers, then it would be possible to keep the errors argument as "const char *" instead of "PyObject *". Currently PyCodec_UnicodeEncodeHandlerForObject is a backwards compatibility hack that will never go away, because it's always more convenient to type u"...".encode("...", "strict") instead of import codecs u"...".encode("...", codecs.raise_encode_errors) But with an error handler registry this function would become the official lookup method for error handlers. (PyCodec_LookupUnicodeEncodeErrorHandler?) 
Python code would look like this: --- def xmlreplace(encoding, unicode, pos, state): return (u"&#%d;" % ord(unicode[pos]), pos+1) import codec codec.registerError("xmlreplace", xmlreplace) --- and then the following call can be made: u"äöü".encode("ascii", "xmlreplace") As soon as the first error is encountered, the encoder uses its builtin error handling method if it recognizes the name ("strict", "replace" or "ignore") or looks up the error handling function in the registry if it doesn't. In this way the speed for the backwards compatible features is the same as before and "const char *error" can be kept as the parameter to all encoding functions. For speed common error handling names could even be implemented in the encoder itself. But for special one-shot error handlers, it might still be useful to pass the error handler directly, so maybe we should leave error as PyObject *, but implement the registry anyway? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-07-10 14:29 Message: Logged In: YES user_id=38388 Ok, here we go... > > > raise an exception). U+FFFD characters in the > replacement > > > string will be replaced with a character that the > encoder > > > chooses ('?' in all cases). > > > > Nice. > > But the special casing of U+FFFD makes the interface > somewhat > less clean than it could be. It was only done to be 100% > backwards compatible. With the original "replace" > error > handling the codec chose the replacement character. But as > far as I can tell none of the codecs uses anything other > than '?', True. > so I guess we could change the replace handler > to always return u'?'. This would make the implementation a > little bit simpler, but the explanation of the callback > feature *a lot* simpler. Go for it. > And if you still want to handle > an unencodable U+FFFD, you can write a special callback for > that, e.g. 
> > def FFFDreplace(enc, uni, pos): > if uni[pos] == "\ufffd": > return u"?" > else: > raise UnicodeError(...) > > > ...docs... > > > > Could you add these docs to the Misc/unicode.txt file ? I > > will eventually take that file and turn it into a PEP > which > > will then serve as general documentation for these things. > > I could, but first we should work out how the decoding > callback API will work. Ok. BTW, Barry Warsaw already did the work of converting the unicode.txt to PEP 100, so the docs should eventually go there. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > > replacement callback. > > > > Hmm, wouldn't that result in a slowdown ? If so, I'd > rather > > leave the special encoder in place, since it is being > used a > > lot in Python and probably some applications too. > > It would be a slowdown. But callbacks open many > possibilities. True, but in this case I believe that we should stick with the native implementation for "unicode-escape". Having a standard callback error handler which does the \uXXXX replacement would be nice to have though, since this would also be usable with lots of other codecs (e.g. all the code page ones). > For example: > > Why can't I print u"gürk"? > > is probably one of the most frequently asked questions in > comp.lang.python. For printing Unicode stuff, print could be > extended to use an error handling callback for Unicode > strings (or objects where __str__ or tp_str returns a > Unicode object) instead of using str() which always returns > an 8bit string and uses strict encoding. There might even > be a > sys.setprintencodehandler()/sys.getprintencodehandler() There already is a print callback in Python (forgot the name of the hook though), so this should be possible by providing the encoding logic in the hook. > > > I have not touched PyUnicode_TranslateCharmap yet, > > > should this function also support error callbacks? 
Why > > > would one want to insert None into the mapping to > call > > > the callback? > > > > 1. Yes. > > 2. The user may want to e.g. restrict usage of certain > > character ranges. In this case the codec would be used to > > verify the input and an exception would indeed be useful > > (e.g. say you want to restrict input to Hangul + ASCII). > > OK, do we want TranslateCharmap to work exactly like > encoding, > i.e. in case of an error should the returned replacement > string again be mapped through the translation mapping or > should it be copied to the output directly? The former would > be more in line with encoding, but IMHO the latter would > be much more useful. It's better to take the second approach (copy the callback output directly to the output string) to avoid endless recursion and other pitfalls. I suppose this will also simplify the implementation somewhat. > BTW, when I implement it I can implement patch #403100 > ("Multicharacter replacements in > PyUnicode_TranslateCharmap") > along the way. I've seen it; will comment on it later. > Should the old TranslateCharmap map to the new > TranslateCharmapEx > and inherit the "multicharacter replacement" feature, > or > should I leave it as it is? If possible, please also add the multichar replacement to the old API. I think it is very useful and since the old APIs work on raw buffers it would be a benefit to have the functionality in the old implementation too. [Decoding error callbacks] > > > A remaining problem is how to implement decoding error > > > callbacks. In Python 2.1 encoding and decoding errors > are > > > handled in the same way with a string value. But with > > > callbacks it doesn't make sense to use the same > callback > > > for encoding and decoding (like > codecs.StreamReaderWriter > > > and codecs.StreamRecoder do). Decoding callbacks have > a > > > different API. Which arguments should be passed to the > > > decoding callback, and what is the decoding callback > > > supposed to do? 
> > > > I'd suggest adding another set of PyCodec_UnicodeDecode... () > > APIs for this. We'd then have to augment the base classes > of > > the StreamCodecs to provide two attributes for .errors > with > > a fallback solution for the string case (i.e. "strict" > can > > still be used for both directions). > > Sounds good. Now what is the decoding callback supposed to > do? > I guess it will be called in the same way as the encoding > callback, i.e. with encoding name, original string and > position of the error. It might return a Unicode string > (i.e. an object of the decoding target type), that will be > emitted from the codec instead of the one offending byte. Or > it might return a tuple with replacement Unicode object and > a resynchronisation offset, i.e. returning (u"?", 1) > means > emit a '?' and skip the offending character. But to make > the offset really useful the callback has to know something > about the encoding, perhaps the codec should be allowed to > pass an additional state object to the callback? > > Maybe the same should be added to the encoding callbacks too? > Maybe the encoding callback should be able to tell the > encoder if the replacement returned should be reencoded > (in which case it's a Unicode object), or directly emitted > (in which case it's an 8bit string)? I like the idea of having an optional state object (basically this should be a codec-defined arbitrary Python object) which then allows the callback to apply additional tricks. The object should be documented to be modifiable in place (simplifies the interface). About the return value: I'd suggest to always use the same tuple interface, e.g. callback(encoding, input_data, input_position, state) -> (output_to_be_appended, new_input_position) (I think it's better to use absolute values for the position rather than offsets.) Perhaps the encoding callbacks should use the same interface... what do you think ? 
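The tuple interface proposed here, (output_to_be_appended, new_input_position), is essentially what later shipped as the standard codec error-handler registry (PEP 293): instead of a separate state object, the handler receives the UnicodeError instance, which already carries the encoding name, the input object, and the error position. A sketch using today's `codecs.register_error` API; the handler name "fffdreplace" is made up for illustration:

```python
import codecs

def fffdreplace(exc):
    # Replace only an unencodable U+FFFD with '?'; re-raise anything else.
    if isinstance(exc, UnicodeEncodeError) and \
            exc.object[exc.start:exc.end] == "\ufffd":
        return ("?", exc.end)  # (replacement, position to resume encoding at)
    raise exc

codecs.register_error("fffdreplace", fffdreplace)

print("a\ufffdz".encode("ascii", "fffdreplace"))  # b'a?z'
```

Note that the second tuple element is an absolute resume position, matching the "absolute values rather than offsets" preference stated above.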
> > > One additional note: It is vital that errors is an > > > assignable attribute of the StreamWriter. > > > > It is already ! > > I know, but IMHO it should be documented that an assignable > errors attribute must be supported as part of the official > codec API. > > Misc/unicode.txt is not clear on that: > """ > It is not required by the Unicode implementation to use > these base classes, only the interfaces must match; this > allows writing Codecs as extension types. > """ Good point. I'll add that to PEP 100. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-22 22:51 Message: Logged In: YES user_id=38388 Sorry to keep you waiting, Walter. I will look into this again next week -- this week was way too busy... ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 19:00 Message: Logged In: YES user_id=38388 On your comment about the non-Unicode codecs: let's keep this separated from the current patch. Don't have much time today. I'll comment on the other things tomorrow. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 17:49 Message: Logged In: YES user_id=89016 Guido van Rossum wrote in python-dev: > True, the "codec" pattern can be used for other > encodings than Unicode. But it seems to me that the > entire codecs architecture is rather strongly geared > towards en/decoding Unicode, and it's not clear > how well other codecs fit in this pattern (e.g. I > noticed that all the non-Unicode codecs ignore the > error handling parameter or assert that > it is set to 'strict'). I noticed that too. Asserting that errors=='strict' would mean that the encoder is not able to deal in any other way with unencodable stuff than by raising an error. 
But that is not the problem here, because for zlib, base64, quopri, hex and uu encoding there can be no unencodable characters. The encoders can simply ignore the errors parameter. Should I remove the asserts from those codecs and change the docstrings accordingly, or will this be done separately? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-13 15:57 Message: Logged In: YES user_id=89016 > > [...] > > raise an exception). U+FFFD characters in the replacement > > string will be replaced with a character that the encoder > > chooses ('?' in all cases). > > Nice. But the special casing of U+FFFD makes the interface somewhat less clean than it could be. It was only done to be 100% backwards compatible. With the original "replace" error handling the codec chose the replacement character. But as far as I can tell none of the codecs uses anything other than '?', so I guess we could change the replace handler to always return u'?'. This would make the implementation a little bit simpler, but the explanation of the callback feature *a lot* simpler. And if you still want to handle an unencodable U+FFFD, you can write a special callback for that, e.g.

def FFFDreplace(enc, uni, pos):
    if uni[pos] == "\ufffd":
        return u"?"
    else:
        raise UnicodeError(...)

> > The implementation of the loop through the string is done > > in the following way. A stack with two strings is kept > > and the loop always encodes a character from the string > > at the stacktop. If an error is encountered and the stack > > has only one entry (during encoding of the original string) > > the callback is called and the unicode object returned is > > pushed on the stack, so the encoding continues with the > > replacement string. If the stack has two entries when an > > error is encountered, the replacement string itself has > > an unencodable character and a normal exception is raised. 
> > When the encoder has reached the end of its current string > > there are two possibilities: when the stack contains two > > entries, this was the replacement string, so the replacement > > string will be popped from the stack and encoding continues > > with the next character from the original string. If the > > stack had only one entry, encoding is finished. > > Very elegant solution ! I'll put it as a comment in the source. > > (I hope that's enough explanation of the API and implementation) > > Could you add these docs to the Misc/unicode.txt file ? I > will eventually take that file and turn it into a PEP which > will then serve as general documentation for these things. I could, but first we should work out how the decoding callback API will work. > > I have renamed the static ...121 function to all lowercase > > names. > > Ok. > > > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > > replacement callback. > > Hmm, wouldn't that result in a slowdown ? If so, I'd rather > leave the special encoder in place, since it is being used a > lot in Python and probably some applications too. It would be a slowdown. But callbacks open many possibilities. For example: Why can't I print u"gürk"? is probably one of the most frequently asked questions in comp.lang.python. For printing Unicode stuff, print could be extended to use an error handling callback for Unicode strings (or objects where __str__ or tp_str returns a Unicode object) instead of using str() which always returns an 8bit string and uses strict encoding. There might even be a sys.setprintencodehandler()/sys.getprintencodehandler() > [...] > I think it would be worthwhile to rename the callbacks to > include "Unicode" somewhere, e.g. > PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but > then it points out the application field of the callback > rather well. Same for the callbacks exposed through the > _codecsmodule. 
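For reference, the replacement styles discussed in this thread (charref escapes, \uXXXX escapes, plain '?') did end up as standard error handler names in later Python releases via PEP 293. The u"gürk" example from above encodes like this in a current Python:

```python
s = "g\u00fcrk"  # u"gürk" from the question above

print(s.encode("ascii", "ignore"))             # b'grk'
print(s.encode("ascii", "replace"))            # b'g?rk'
print(s.encode("ascii", "xmlcharrefreplace"))  # b'g&#252;rk'
print(s.encode("ascii", "backslashreplace"))   # b'g\\xfcrk'
```

The last two handlers are exactly the "usable with lots of other codecs" generic replacements argued for above; they work with any encoding, not just ASCII.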
OK, done (and PyCodec_XMLCharRefReplaceUnicodeEncodeErrors really is a long name ;)) > > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why > > would one want the insert None into the mapping to call > > the callback? > > 1. Yes. > 2. The user may want to e.g. restrict usage of certain > character ranges. In this case the codec would be used to > verify the input and an exception would indeed be useful > (e.g. say you want to restrict input to Hangul + ASCII). OK, do we want TranslateCharmap to work exactly like encoding, i.e. in case of an error should the returned replacement string again be mapped through the translation mapping or should it be copied to the output directly? The former would be more in line with encoding, but IMHO the latter would be much more useful. BTW, when I implement it I can implement patch #403100 ("Multicharacter replacements in PyUnicode_TranslateCharmap") along the way. Should the old TranslateCharmap map to the new TranslateCharmapEx and inherit the "multicharacter replacement" feature, or should I leave it as it is? > > A remaining problem is how to implement decoding error > > callbacks. In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback > > for encoding and decoding (like codecs.StreamReaderWriter > > and codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? > > I'd suggest adding another set of PyCodec_UnicodeDecode... () > APIs for this. We'd then have to augment the base classes of > the StreamCodecs to provide two attributes for .errors with > a fallback solution for the string case (i.s. "strict" can > still be used for both directions). Sounds good. Now what is the decoding callback supposed to do? 
I guess it will be called in the same way as the encoding callback, i.e. with encoding name, original string and position of the error. It might return a Unicode string (i.e. an object of the decoding target type), that will be emitted from the codec instead of the one offending byte. Or it might return a tuple with replacement Unicode object and a resynchronisation offset, i.e. returning (u"?", 1) means emit a '?' and skip the offending character. But to make the offset really useful the callback has to know something about the encoding, perhaps the codec should be allowed to pass an additional state object to the callback? Maybe the same should be added to the encoding callbacks too? Maybe the encoding callback should be able to tell the encoder if the replacement returned should be reencoded (in which case it's a Unicode object), or directly emitted (in which case it's an 8bit string)? > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. > > It is already ! I know, but IMHO it should be documented that an assignable errors attribute must be supported as part of the official codec API. Misc/unicode.txt is not clear on that: """ It is not required by the Unicode implementation to use these base classes, only the interfaces must match; this allows writing Codecs as extension types. """ ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-13 10:05 Message: Logged In: YES user_id=38388 > How the callbacks work: > > A PyObject * named errors is passed in. This may be NULL, > Py_None, 'strict', u'strict', 'ignore', u'ignore', > 'replace', u'replace' or a callable object. 
> PyCodec_EncodeHandlerForObject maps all of these objects to > one of the three builtin error callbacks > PyCodec_RaiseEncodeErrors (raises an exception), > PyCodec_IgnoreEncodeErrors (returns an empty replacement > string, in effect ignoring the error), > PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode > replacement character to signify to the encoder that it > should choose a suitable replacement character) or directly > returns errors if it is a callable object. When an > unencodable character is encounterd the error handling > callback will be called with the encoding name, the original > unicode object and the error position and must return a > unicode object that will be encoded instead of the offending > character (or the callback may of course raise an > exception). U+FFFD characters in the replacement string will > be replaced with a character that the encoder chooses ('?' > in all cases). Nice. > The implementation of the loop through the string is done in > the following way. A stack with two strings is kept and the > loop always encodes a character from the string at the > stacktop. If an error is encountered and the stack has only > one entry (during encoding of the original string) the > callback is called and the unicode object returned is pushed > on the stack, so the encoding continues with the replacement > string. If the stack has two entries when an error is > encountered, the replacement string itself has an > unencodable character and a normal exception raised. When > the encoder has reached the end of it's current string there > are two possibilities: when the stack contains two entries, > this was the replacement string, so the replacement string > will be poppep from the stack and encoding continues with > the next character from the original string. If the stack > had only one entry, encoding is finished. Very elegant solution ! 
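The two-string stack scheme quoted above can be sketched in pure Python. This is an illustration only (the real implementation lives in C in unicodeobject.c); since a replacement string never pushes a further replacement, a simple inner loop stands in for the two-entry stack, and the `can_encode`/`callback` parameters are made up for the sketch:

```python
def encode_with_callback(s, can_encode, callback):
    # Outer loop: encode characters of the original string.
    # Inner loop: encode the replacement string; an error there is fatal,
    # which is exactly the "stack has two entries" case described above.
    out = []
    for pos, ch in enumerate(s):
        if can_encode(ch):
            out.append(ch)
            continue
        replacement = callback("sketch-encoding", s, pos)  # may raise
        for rch in replacement:
            if not can_encode(rch):
                raise UnicodeError("unencodable character in replacement")
            out.append(rch)
    return "".join(out)

# Behave like the default "replace" handler: '?' for anything unencodable.
print(encode_with_callback("g\u00fcrk",
                           lambda c: ord(c) < 128,
                           lambda enc, uni, pos: "?"))  # g?rk
```

The inner loop also shows why copying the replacement straight to the output (rather than re-running the callback on it) avoids the endless-recursion pitfall mentioned for TranslateCharmap.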
> (I hope that's enough explanation of the API and implementation) Could you add these docs to the Misc/unicode.txt file ? I will eventually take that file and turn it into a PEP which will then serve as general documentation for these things. > I have renamed the static ...121 function to all lowercase > names. Ok. > BTW, I guess PyUnicode_EncodeUnicodeEscape could be > reimplemented as PyUnicode_EncodeASCII with a \uxxxx > replacement callback. Hmm, wouldn't that result in a slowdown ? If so, I'd rather leave the special encoder in place, since it is being used a lot in Python and probably some applications too. > PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, > PyCodec_ReplaceEncodeErrors are globally visible because > they have to be available in _codecsmodule.c to wrap them as > Python function objects, but they can't be implemented in > _codecsmodule, because they need to be available to the > encoders in unicodeobject.c (through > PyCodec_EncodeHandlerForObject), but importing the codecs > module might result in an endless recursion, because > importing a module requires unpickling of the bytecode, > which might require decoding utf8, which ... (but this will > only happen, if we implement the same mechanism for the > decoding API) I think that codecs.c is the right place for these APIs. _codecsmodule.c is only meant as Python access wrapper for the internal codecs and nothing more. One thing I noted about the callbacks: they assume that they will always get Unicode objects as input. This is certainly not true in the general case (it is for the codecs you touch in the patch). I think it would be worthwhile to rename the callbacks to include "Unicode" somewhere, e.g. PyCodec_UnicodeReplaceEncodeErrors(). It's a long name, but then it points out the application field of the callback rather well. Same for the callbacks exposed through the _codecsmodule. 
> > I have not touched PyUnicode_TranslateCharmap yet, > > should this function also support error callbacks? Why would > > one want to insert None into the mapping to call the callback? 1. Yes. 2. The user may want to e.g. restrict usage of certain character ranges. In this case the codec would be used to verify the input and an exception would indeed be useful (e.g. say you want to restrict input to Hangul + ASCII). > > A remaining problem is how to implement decoding error > > callbacks. In Python 2.1 encoding and decoding errors are > > handled in the same way with a string value. But with > > callbacks it doesn't make sense to use the same callback for > > encoding and decoding (like codecs.StreamReaderWriter and > > codecs.StreamRecoder do). Decoding callbacks have a > > different API. Which arguments should be passed to the > > decoding callback, and what is the decoding callback > > supposed to do? I'd suggest adding another set of PyCodec_UnicodeDecode...() APIs for this. We'd then have to augment the base classes of the StreamCodecs to provide two attributes for .errors with a fallback solution for the string case (i.e. "strict" can still be used for both directions). > > One additional note: It is vital that errors is an > > assignable attribute of the StreamWriter. It is already ! > Consider the XML example: For writing an XML DOM tree one > StreamWriter object is used. When a text node is written, > the error handling has to be set to > codecs.xmlreplace_encode_errors, but inside a comment or > processing instruction replacing unencodable characters with > charrefs is not possible, so here codecs.raise_encode_errors > should be used (or better a custom error handler that raises > an error that says "sorry, you can't have unencodable > characters inside a comment") Sure. > BTW, should we continue the discussion in the i18n SIG > mailing list? An email program is much more comfortable than > a HTML textarea! 
;) I'd rather keep the discussions on this patch here -- forking it off to the i18n sig will make it very hard to follow up on it. (This HTML area is indeed damn small ;-) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 21:18 Message: Logged In: YES user_id=89016 One additional note: It is vital that errors is an assignable attribute of the StreamWriter. Consider the XML example: For writing an XML DOM tree one StreamWriter object is used. When a text node is written, the error handling has to be set to codecs.xmlreplace_encode_errors, but inside a comment or processing instruction replacing unencodable characters with charrefs is not possible, so here codecs.raise_encode_errors should be used (or better a custom error handler that raises an error that says "sorry, you can't have unencodable characters inside a comment") BTW, should we continue the discussion in the i18n SIG mailing list? An email program is much more comfortable than a HTML textarea! ;) ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 20:59 Message: Logged In: YES user_id=89016 How the callbacks work: A PyObject * named errors is passed in. This may be NULL, Py_None, 'strict', u'strict', 'ignore', u'ignore', 'replace', u'replace' or a callable object. PyCodec_EncodeHandlerForObject maps all of these objects to one of the three builtin error callbacks PyCodec_RaiseEncodeErrors (raises an exception), PyCodec_IgnoreEncodeErrors (returns an empty replacement string, in effect ignoring the error), PyCodec_ReplaceEncodeErrors (returns U+FFFD, the Unicode replacement character to signify to the encoder that it should choose a suitable replacement character) or directly returns errors if it is a callable object. 
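The assignable errors attribute argued for in the XML example above survives in today's codecs.StreamWriter, whose write() consults self.errors on every call, so the handler can be switched mid-stream. A sketch using the handler names that eventually shipped (the stream and content are made up for illustration):

```python
import codecs
import io

buf = io.BytesIO()
writer = codecs.getwriter("ascii")(buf)

writer.errors = "xmlcharrefreplace"  # text node: escape as charrefs
writer.write("caf\u00e9")

writer.errors = "strict"             # inside a comment: must fail instead
try:
    writer.write("<!-- caf\u00e9 -->")
except UnicodeEncodeError:
    pass  # nothing is written for the failed call

print(buf.getvalue())  # b'caf&#233;'
```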
When an unencodable character is encountered the error handling callback will be called with the encoding name, the original unicode object and the error position and must return a unicode object that will be encoded instead of the offending character (or the callback may of course raise an exception). U+FFFD characters in the replacement string will be replaced with a character that the encoder chooses ('?' in all cases). The implementation of the loop through the string is done in the following way. A stack with two strings is kept and the loop always encodes a character from the string at the stacktop. If an error is encountered and the stack has only one entry (during encoding of the original string) the callback is called and the unicode object returned is pushed on the stack, so the encoding continues with the replacement string. If the stack has two entries when an error is encountered, the replacement string itself has an unencodable character and a normal exception is raised. When the encoder has reached the end of its current string there are two possibilities: when the stack contains two entries, this was the replacement string, so the replacement string will be popped from the stack and encoding continues with the next character from the original string. If the stack had only one entry, encoding is finished. (I hope that's enough explanation of the API and implementation) I have renamed the static ...121 function to all lowercase names. BTW, I guess PyUnicode_EncodeUnicodeEscape could be reimplemented as PyUnicode_EncodeASCII with a \uxxxx replacement callback. 
PyCodec_RaiseEncodeErrors, PyCodec_IgnoreEncodeErrors, PyCodec_ReplaceEncodeErrors are globally visible because they have to be available in _codecsmodule.c to wrap them as Python function objects, but they can't be implemented in _codecsmodule, because they need to be available to the encoders in unicodeobject.c (through PyCodec_EncodeHandlerForObject), but importing the codecs module might result in an endless recursion, because importing a module requires unpickling of the bytecode, which might require decoding utf8, which ... (but this will only happen, if we implement the same mechanism for the decoding API) I have not touched PyUnicode_TranslateCharmap yet, should this function also support error callbacks? Why would one want to insert None into the mapping to call the callback? A remaining problem is how to implement decoding error callbacks. In Python 2.1 encoding and decoding errors are handled in the same way with a string value. But with callbacks it doesn't make sense to use the same callback for encoding and decoding (like codecs.StreamReaderWriter and codecs.StreamRecoder do). Decoding callbacks have a different API. Which arguments should be passed to the decoding callback, and what is the decoding callback supposed to do? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 20:00 Message: Logged In: YES user_id=38388 About the Py_UNICODE*data, int size APIs: Ok, point taken. In general, I think we ought to keep the callback feature as open as possible, so passing in pointers and sizes would not be very useful. BTW, could you summarize how the callback works in a few lines ? About _Encode121: I'd name this _EncodeUCS1 since that's what it is ;-) About the new functions: I was referring to the new static functions which you gave PyUnicode_... names. 
If these are not supposed to turn into non-static functions, I'd rather have them use lower case names (since that's how the Python internals work too -- most of the times). ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:56 Message: Logged In: YES user_id=89016 > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. Another problem is that the callback requires a Python object, so in the PyObject *version, the refcount is incref'd and the object is passed to the callback. The Py_UNICODE*/int version would have to create a new Unicode object from the data. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2001-06-12 18:32 Message: Logged In: YES user_id=89016 > * please don't place more than one C statement on one line > like in: > """ > + unicode = unicode2; unicodepos = > unicode2pos; > + unicode2 = NULL; unicode2pos = 0; > """ OK, done! > * Comments should start with a capital letter and be > prepended > to the section they apply to Fixed! > * There should be spaces between arguments in compares > (a == b) not (a==b) Fixed! > * Where does the name "...Encode121" originate ? Encode one-to-one; it implements both ASCII and latin-1 encoding. > * module internal APIs should use lower case names (you > converted some of these to PyUnicode_...() -- this is > normally reserved for APIs which are either marked as > potential candidates for the public API or are very > prominent in the code) Which ones? 
I introduced a new function for every old one that had a "const char *errors" argument, and a few new ones in codecs.h. Of those, PyCodec_EncodeHandlerForObject is vital, because it is used to map old string arguments to the new function objects. PyCodec_RaiseEncodeErrors can be used in the encoder implementation to raise an encode error, but it could be made static in unicodeobject.h so only those encoders implemented there have access to it. > One thing which I don't like about your API change is that > you removed the Py_UNICODE*data, int size style arguments > -- > this makes it impossible to use the new APIs on non-Python > data or data which is not available as Unicode object. I looked through the code and found no situation where the Py_UNICODE*/int version is really used, and having two (PyObject *)s (the original and the replacement string), instead of UNICODE*/int and PyObject *, made the implementation a little easier, but I can fix that. > Please separate the errors.c patch from this patch -- it > seems totally unrelated to Unicode. PyCodec_RaiseEncodeErrors uses this to have a \Uxxxx with four hex digits. I removed it. I'll upload a revised patch as soon as it's done. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-06-12 16:29 Message: Logged In: YES user_id=38388 Thanks for the patch -- it looks very impressive! I'll give it a try later this week. Some first cosmetic tidbits: * please don't place more than one C statement on one line like in: """ + unicode = unicode2; unicodepos = unicode2pos; + unicode2 = NULL; unicode2pos = 0; """ * Comments should start with a capital letter and be prepended to the section they apply to * There should be spaces between arguments in compares (a == b) not (a==b) * Where does the name "...Encode121" originate ? 
* module internal APIs should use lower case names (you converted some of these to PyUnicode_...() -- this is normally reserved for APIs which are either marked as potential candidates for the public API or are very prominent in the code) One thing which I don't like about your API change is that you removed the Py_UNICODE*data, int size style arguments -- this makes it impossible to use the new APIs on non-Python data or data which is not available as Unicode object. Please separate the errors.c patch from this patch -- it seems totally unrelated to Unicode. Thanks. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=432401&group_id=5470 From noreply@sourceforge.net Fri Mar 15 17:27:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 09:27:58 -0800 Subject: [Patches] [ python-Patches-492105 ] Import from Zip archive Message-ID: Patches item #492105, was opened at 2001-12-12 17:21 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=492105&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: James C. Ahlstrom (ahlstromjc) Assigned to: Nobody/Anonymous (nobody) Summary: Import from Zip archive Initial Comment: This is the "final" patch to support imports from zip archives, and directory caching using os.listdir(). It replaces patch 483466 and 476047. It is a separate patch since I can't delete file attachments. It adds support for importing from "" and from relative paths. ---------------------------------------------------------------------- >Comment By: James C. Ahlstrom (ahlstromjc) Date: 2002-03-15 17:27 Message: Logged In: YES user_id=64929 I added a diff -c version of the patch. ---------------------------------------------------------------------- Comment By: James C. 
Ahlstrom (ahlstromjc) Date: 2002-03-15 17:03 Message: Logged In: YES user_id=64929 I still can't delete files, but I added a new file which contains all diffs as a single file, and is made from the current CVS tree (Mar 15, 2002). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=492105&group_id=5470 From noreply@sourceforge.net Fri Mar 15 17:43:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 09:43:05 -0800 Subject: [Patches] [ python-Patches-530105 ] file object may not be subtyped Message-ID: Patches item #530105, was opened at 2002-03-15 00:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470 Category: None Group: None >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Gustavo Niemeyer (niemeyer) Assigned to: Nobody/Anonymous (nobody) Summary: file object may not be subtyped Initial Comment: PyFileObject should be defined in fileobject.h, so it may be properly subtyped. This patch fixes this, and also a comment word typed twice. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 18:43 Message: Logged In: YES user_id=21627 Applied as fileobject.c 2.147; fileobject.h 2.26. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-15 16:30 Message: Logged In: YES user_id=6380 Looks good to me too. Check it in. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 14:45 Message: Logged In: YES user_id=21627 This patch looks good to me. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530105&group_id=5470 From noreply@sourceforge.net Fri Mar 15 19:41:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 11:41:13 -0800 Subject: [Patches] [ python-Patches-525532 ] Add support for POSIX semaphores Message-ID: Patches item #525532, was opened at 2002-03-04 15:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) >Assigned to: Martin v. Löwis (loewis) Summary: Add support for POSIX semaphores Initial Comment: thread_pthread.h can be modified to use POSIX semaphores if available. This is more efficient than emulating them with mutexes and condition variables, and at least one platform that supports POSIX semaphores has a race condition in its condition variable support. The new file would still be supporting POSIX threads, although from both <pthread.h> and <semaphore.h>, so perhaps ought to be renamed if this patch is accepted. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-15 09:54 Message: Logged In: YES user_id=31435 Can someone on a pthreads platform please continue with this? I'm +1 on it via eyeballing. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 From noreply@sourceforge.net Sat Mar 16 00:01:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 16:01:57 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 00:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Martin v. Löwis (loewis) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Sat Mar 16 00:54:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 16:54:42 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 01:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Martin v. Löwis (loewis) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 01:54 Message: Logged In: YES user_id=21627 -1. --with-pymalloc should remain an option; there are still the heuristics in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc. 
I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Sat Mar 16 03:50:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 15 Mar 2002 19:50:54 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 00:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Martin v. Löwis (loewis) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. 
---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-16 03:50 Message: Logged In: YES user_id=35752 Okay, --with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc. There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 00:54 Message: Logged In: YES user_id=21627 -1. --with-pymalloc should remain an option; there are still the heuristics in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc. I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break. 
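[Editor's note: the allocator-pairing concern above can be illustrated with a toy Python model. This is purely illustrative, not CPython code: a pool allocator can only safely free blocks it allocated itself, which is why PyObject_New could not silently switch to pymalloc while existing extensions still paired it with a PyObject_DEL that calls free().]

```python
class ToyPool:
    """Toy allocator: free() only understands blocks from this pool."""

    def __init__(self, name):
        self.name = name
        self.owned = set()

    def malloc(self):
        block = object()           # stand-in for a raw memory block
        self.owned.add(id(block))
        return block

    def free(self, block):
        # In C this mismatch is silent heap corruption; here we can detect it.
        if id(block) not in self.owned:
            raise RuntimeError(f"{self.name}: freeing a foreign block")
        self.owned.discard(id(block))

pymalloc_pool = ToyPool("pymalloc")
system_heap = ToyPool("system")

b = pymalloc_pool.malloc()
try:
    system_heap.free(b)            # analogue of the New/DEL mismatch
except RuntimeError as exc:
    print(exc)
```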
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Sat Mar 16 08:24:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 00:24:22 -0800 Subject: [Patches] [ python-Patches-504714 ] hasattr catches only AttributeError Message-ID: Patches item #504714, was opened at 2002-01-17 03:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504714&group_id=5470 Category: Core (C code) Group: Python 2.1.2 Status: Open Resolution: None Priority: 5 Submitted By: Quinn Dunkan (quinn_dunkan) Assigned to: Nobody/Anonymous (nobody) Summary: hasattr catches only AttributeError Initial Comment: Curse me for a fool. I reported this exact same thing in getattr but failed to look 30 lines down to notice hasattr. hasattr(foo, 'bar') catches all exceptions. I think it should only catch AttributeError. Example: >>> class Foo: ... def __getattr__(self, attr): ... assert 0 ... >>> f = Foo() >>> hasattr(f, 'bar') 0 # should have gotten an AssertionError >>> This patch makes hasattr only catch AttributeError. I changed the docstring to reflect that, and also changed the getattr docstring to read a little more naturally. ---------------------------------------------------------------------- >Comment By: Just van Rossum (jvr) Date: 2002-03-16 09:24 Message: Logged In: YES user_id=92689 (The patch seems to be reversed.) The patch otherwise looks fine to me, but it will break code that depends on the current behavior. It can be argued that if getattr() raises *any* error, the attr doesn't exist, so the current behavior is in fact correct. 
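[Editor's note: the difference between the two behaviors is easy to demonstrate. The sketch below shows the semantics the patch proposes, assuming hasattr() swallows only AttributeError (later Python 3 releases behave this way); the class names are invented for the example.]

```python
class Brakes:
    def __getattr__(self, attr):
        # A buggy __getattr__: failing here is a bug, not a missing attribute.
        assert 0, "brakes crumbled to dust"

b = Brakes()
# Old behavior: hasattr(b, "engaged") returns 0 and masks the bug.
# Patched behavior: the AssertionError propagates to the caller.
try:
    hasattr(b, "engaged")
    outcome = "masked"
except AssertionError:
    outcome = "surfaced"
print(outcome)
```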
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504714&group_id=5470 From noreply@sourceforge.net Sat Mar 16 08:55:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 00:55:23 -0800 Subject: [Patches] [ python-Patches-504714 ] hasattr catches only AttributeError Message-ID: Patches item #504714, was opened at 2002-01-17 02:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504714&group_id=5470 Category: Core (C code) Group: Python 2.1.2 Status: Open Resolution: None Priority: 5 Submitted By: Quinn Dunkan (quinn_dunkan) Assigned to: Nobody/Anonymous (nobody) Summary: hasattr catches only AttributeError Initial Comment: Curse me for a fool. I reported this exact same thing in getattr but failed to look 30 lines down to notice hasattr. hasattr(foo, 'bar') catches all exceptions. I think it should only catch AttributeError. Example: >>> class Foo: ... def __getattr__(self, attr): ... assert 0 ... >>> f = Foo() >>> hasattr(f, 'bar') 0 # should have gotten an AssertionError >>> This patch makes hasattr only catch AttributeError. I changed the docstring to reflect that, and also changed the getattr docstring to read a little more naturally. ---------------------------------------------------------------------- >Comment By: Quinn Dunkan (quinn_dunkan) Date: 2002-03-16 08:55 Message: Logged In: YES user_id=429749 That's true, but the current behavior can mask bugs unexpectedly. For example, if you ask someone if the brakes are engaged, and they discover that the brakes have crumbled to dust and fallen off, you probably want a different answer than "no". :) getattr() (now) only catches AttributeErrors, so there's a consistency thing too. 
Anyway, it's your call :) ---------------------------------------------------------------------- Comment By: Just van Rossum (jvr) Date: 2002-03-16 08:24 Message: Logged In: YES user_id=92689 (The patch seems to be reversed.) The patch otherwise looks fine to me, but it will break code that depends on the current behavior. It can be argued that if getattr() raises *any* error, the attr doesn't exist, so the current behavior is in fact correct. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504714&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:38:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:38:08 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 16:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared, used different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building shared library disabled by default, while these architectures had it enabled. 
- it rectifies a small problem on solaris2.8, that makes double inclusion of thread.o (this produces error on 'ld' for shared library). ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-15 14:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments to #497102. Also, I still like to get a clarification as to who is the author of this code. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 16:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I get correctly, that you want: --enable-shared/--enable-static instead of --enable-shared-python, --disable-shared-python - Do you agree with the way it is done in the patch (ppython.diff) or do you propose another way? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 14:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-08 11:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments; I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. 
---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 10:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 10:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 18:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 17:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:38:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:38:35 -0800 Subject: [Patches] [ python-Patches-518675 ] Adding galeon support Message-ID: Patches item #518675, was opened at 2002-02-17 05:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=518675&group_id=5470 Category: Library (Lib) >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Supreet Sethi (supreet) Assigned to: Nobody/Anonymous (nobody) Summary: Adding galeon support Initial Comment: It adds support galeon browser support in webbrowser lib. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:38 Message: Logged In: YES user_id=6656 Feature --> not in 2.2.1 ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-02-19 17:53 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. 
(This is a SourceForge annoyance that we can do nothing about. :-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=518675&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:40:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:40:23 -0800 Subject: [Patches] [ python-Patches-525763 ] minor fix for regen on IRIX Message-ID: Patches item #525763, was opened at 2002-03-05 02:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525763&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Michael Pruett (mpruett) >Assigned to: Jack Jansen (jackjansen) Summary: minor fix for regen on IRIX Initial Comment: The Lib/plat-irix6/regen script does not catch IRIX 6 (only IRIX 4 and 5), and it doesn't handle systems which report themselves as running 'IRIX64' rather than just 'IRIX'. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:40 Message: Logged In: YES user_id=6656 Jack, can you look at this? It looks fine to me, but I've never even been near IRIX. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525763&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:40:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:40:54 -0800 Subject: [Patches] [ python-Patches-525109 ] Extension to Calltips / Show attributes Message-ID: Patches item #525109, was opened at 2002-03-03 11:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 Category: IDLE >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Martin Liebmann (mliebmann) Assigned to: Guido van Rossum (gvanrossum) Summary: Extension to Calltips / Show attributes Initial Comment: The attached files (unified diff files) implement a (quick and dirty but useful) extension to IDLE 0.8 (Python 2.2) - Tested on WINDOWS 95/98/NT/2000 - Similar to "CallTips" this extension shows (context sensitive) all available member functions and attributes of the current object after hitting the 'dot'-key. The toplevel help widget now supports scrolling. (Key-Up and Key-Down events) ...that is why I changed, among other things, the first argument of 'showtip' from 'text string' to a 'list of text strings' ... The 'space'-key is used to insert the topmost item of the help widget into an IDLE text window. ...the event handling seems to be a critical part of the current IDLE implementation. That is why I added the new functionality as a patch of CallTips.py and CallTipWindow.py. Maybe you still have a better implementation ... 
Greetings Martin Liebmann ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:40 Message: Logged In: YES user_id=6656 feature --> not in 2.2.x ---------------------------------------------------------------------- Comment By: Martin Liebmann (mliebmann) Date: 2002-03-07 21:41 Message: Logged In: YES user_id=475133 Patched and more robust version of the extended files CallTips.py and CallTipWindows.py. (Now more compatible with earlier versions of Python) ---------------------------------------------------------------------- Comment By: Martin Liebmann (mliebmann) Date: 2002-03-03 22:02 Message: Logged In: YES user_id=475133 '' must be substituted by '.' within CallTip.py ! ( Linux does not support an event named ) Running IDLE on Linux, I found the warning that 'import *' is not allowed within function '_dir_main' of CallTip.py ??? Nevertheless CallTips works fine on Linux ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525109&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:42:06 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:42:06 -0800 Subject: [Patches] [ python-Patches-523944 ] imputil.py can't import "\r\n" .py files Message-ID: Patches item #523944, was opened at 2002-02-28 17:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Mitch Chapman (mitchchapman) Assigned to: Greg Stein (gstein) >Summary: imputil.py can't import "\r\n" .py files Initial Comment: __builtin__.compile() requires that codestring line endings consist of "\n". imputil._compile() does not enforce this. One result is that imputil may be unable to import modules created on Win32. 
The attached patch to the latest (CVS revision 1.23) imputil.py replaces both "\r\n" and "\r" with "\n" before passing a code string to __builtin__.compile(). This is consistent with the behavior of e.g. Lib/py_compile.py. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:42 Message: Logged In: YES user_id=6656 Greg any chance of comments before 2.2.1c1, i.e. Monday? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-06 17:14 Message: Logged In: YES user_id=38388 Assigning to Greg Stein -- imputil.py is his baby. ---------------------------------------------------------------------- Comment By: Mitch Chapman (mitchchapman) Date: 2002-03-06 17:03 Message: Logged In: YES user_id=348188 Please pardon if it's inappropriate to assign patches to project developers. I'm doing so on the advice of a post by Skip Montanaro. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:43:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:43:50 -0800 Subject: [Patches] [ python-Patches-521478 ] mailbox / fromline matching Message-ID: Patches item #521478, was opened at 2002-02-22 14:54 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: Rejected Priority: 5 Submitted By: Camiel Dobbelaar (camield) Assigned to: Barry Warsaw (bwarsaw) Summary: mailbox / fromline matching Initial Comment: mailbox.py does not parse this 'From' line correctly: >From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200 ^^^^^ This is because of the trailing timezone information, that the regex does not account for. 
Also, 'From' should match at the beginning of the line. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:43 Message: Logged In: YES user_id=6656 Anything going to happen here by Monday? ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-02 16:47 Message: Logged In: YES user_id=12800 Re-opening and assigning to myself. I'll take a look at your patches asap. ---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-02 14:34 Message: Logged In: YES user_id=466784 PortableUnixMailbox is not that useful, because it only matches '^From '. From-quoting is an even bigger mess then From-headerlines, so that does not really help. I submit a new diff that matches '\n\nFrom ' or 'From ', which makes PortableUnixMailbox useful for my purposes. It is not that intrusive as the comment in the mailbox.py suggests. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-01 21:42 Message: Logged In: YES user_id=12800 IMO, Jamie Zawinski (author of the original mail/news reader in Netscape among other accomplishments), wrote the definitive answer on From_ http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html As far as Python's support for this in the mailbox module, for backwards compatibility, the UnixMailbox class has a strict-ish interpretation of the From_ delimiter, which I think should not change. It also has a class called PortableUnixMailbox which recognizes delimiters as specified in JWZ's document. Personally, if I was trolling over a real world mbox file I'd only use PortableUnixMailbox (as long as non-delimiter From_ lines were properly escaped -- I have some code in Mailman which tries to intelligently "fix" non-escaped mbox files). I agree with the Rejected resolution. 
---------------------------------------------------------------------- Comment By: Camiel Dobbelaar (camield) Date: 2002-03-01 11:34 Message: Logged In: YES user_id=466784 I have tracked this down to Pine, the mailreader. In imap/src/c-client/mail.c, it has this flag: static int notimezones = NIL; /* write timezones in "From " header */ (so timezones are written in the "From" lines by default) I also found the following comment in imap/docs/FAQ in the Pine distribution: """ So, good mail reading software only considers a line to be a "From " line if it follows the actual specification for a "From " line. This means, among other things, that the day of week is fixed-format: "May 14", but "May  7" (note the extra space) as opposed to "May 7". ctime() format for the date is the most common, although POSIX also allows a numeric timezone after the year. """ While I don't consider Pine to be the ultimate mailreader, its heritage may warrant that the 'From ' lines it creates are considered 'standard'. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 22:37 Message: Logged In: YES user_id=6380 That From line is simply illegal, or at least nonstandard. If your system uses this nonstandard format, you can extend the mailbox parser by overriding the ._isrealfromline method. The pattern doesn't need ^ because match() is used, which only matches at the start of the line. Rejected. 
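[Editor's note: the parsing disagreement can be reproduced with two illustrative patterns, written for this note rather than copied from mailbox.py. A strict ctime-style From_ matcher rejects Pine's line, while the same pattern extended with an optional POSIX numeric timezone accepts it.]

```python
import re

# Strict ctime-style From_ line: must end with the four-digit year.
strict = re.compile(
    r"From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+\d?\d:\d\d(:\d\d)?\s+\d\d\d\d\s*$")
# Extended: allow an optional numeric timezone ("+0200") after the year.
extended = re.compile(
    r"From \s*\S+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+\d?\d:\d\d(:\d\d)?\s+\d\d\d\d"
    r"(\s+[+-]\d{4})?\s*$")

pine_line = "From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200"
print(bool(strict.match(pine_line)), bool(extended.match(pine_line)))  # → False True
```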
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=521478&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:42:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:42:36 -0800 Subject: [Patches] [ python-Patches-525532 ] Add support for POSIX semaphores Message-ID: Patches item #525532, was opened at 2002-03-04 14:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Martin v. Lцwis (loewis) Summary: Add support for POSIX semaphores Initial Comment: thread_pthread.h can be modified to use POSIX semaphores if available. This is more efficient than emulating them with mutexes and condition variables, and at least one platform that supports POSIX semaphores has a race condition in its condition variable support. The new file would still be supporting POSIX threads, although from both and , so perhaps ought to be renamed if this patch is accepted. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:42 Message: Logged In: YES user_id=6656 Does this belong in the 2.2.x group? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-15 08:54 Message: Logged In: YES user_id=31435 Can someone on a pthreads platform please continue with this? I'm +1 on it via eyeballing. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 From noreply@sourceforge.net Sat Mar 16 16:53:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 08:53:58 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 12:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) >Assigned to: Tim Peters (tim_one) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Sat Mar 16 17:36:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 09:36:37 -0800 Subject: [Patches] [ python-Patches-525532 ] Add support for POSIX semaphores Message-ID: Patches item #525532, was opened at 2002-03-04 09:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 Category: Core (C code) >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Martin v. Lцwis (loewis) Summary: Add support for POSIX semaphores Initial Comment: thread_pthread.h can be modified to use POSIX semaphores if available. 
This is more efficient than emulating them with mutexes and condition variables, and at least one platform that supports POSIX semaphores has a race condition in its condition variable support. The new file would still be supporting POSIX threads, although from both and , so perhaps ought to be renamed if this patch is accepted. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-16 12:36 Message: Logged In: YES user_id=31435 Changed Group to 2.3. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 11:42 Message: Logged In: YES user_id=6656 Does this belong in the 2.2.x group? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-15 03:54 Message: Logged In: YES user_id=31435 Can someone on a pthreads platform please continue with this? I'm +1 on it via eyeballing. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 From noreply@sourceforge.net Sat Mar 16 17:38:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 16 Mar 2002 09:38:21 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 07:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) Assigned to: Tim Peters (tim_one) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-16 12:38 
Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 11:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Sun Mar 17 09:54:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 01:54:37 -0800 Subject: [Patches] [ python-Patches-525532 ] Add support for POSIX semaphores Message-ID: Patches item #525532, was opened at 2002-03-04 15:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Martin v. Löwis (loewis) Summary: Add support for POSIX semaphores Initial Comment: thread_pthread.h can be modified to use POSIX semaphores if available. This is more efficient than emulating them with mutexes and condition variables, and at least one platform that supports POSIX semaphores has a race condition in its condition variable support. The new file would still be supporting POSIX threads, although from both <pthread.h> and <semaphore.h>, so it perhaps ought to be renamed if this patch is accepted. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 10:54 Message: Logged In: YES user_id=21627 Thanks for the patch; committed as thread_pthread.h 2.39. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-16 18:36 Message: Logged In: YES user_id=31435 Changed Group to 2.3.
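The committed patch is C code inside thread_pthread.h, but the core idea — a counting semaphore initialized to 1 is already a lock, so no mutex-plus-condition-variable emulation is needed — can be sketched in Python. This is an illustration of the concept only, not the patch's code:

```python
import threading

# A counting semaphore initialized to 1 acts as a mutual-exclusion lock,
# which mirrors what the patch does with sem_init(&sem, 0, 1) plus
# sem_wait()/sem_post() in place of a mutex + condition variable pair.
lock = threading.Semaphore(1)
counter = 0

def bump(n):
    global counter
    for _ in range(n):
        lock.acquire()   # like sem_wait(): blocks until the count is > 0
        counter += 1
        lock.release()   # like sem_post(): increments the count, waking a waiter

threads = [threading.Thread(target=bump, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: every increment happened under the "lock"
```

The efficiency argument in the patch is that sem_wait/sem_post map directly onto the lock operations, whereas the emulation pays for a mutex acquire plus a condition-variable signal on every release.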
---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:42 Message: Logged In: YES user_id=6656 Does this belong in the 2.2.x group? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-15 09:54 Message: Logged In: YES user_id=31435 Can someone on a pthreads platform please continue with this? I'm +1 on it via eyeballing. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525532&group_id=5470 From noreply@sourceforge.net Sun Mar 17 10:12:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 02:12:43 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 01:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Martin v. Lцwis (loewis) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. 
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 11:12 Message: Logged In: YES user_id=21627 The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free? Also, it appears that there is no function wrapper around this allocator: A module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-16 04:50 Message: Logged In: YES user_id=35752 Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc. There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 01:54 Message: Logged In: YES user_id=21627 -1. --with-pymalloc should remain an option; there are still the heuristics in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc. I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del.
None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Sun Mar 17 13:30:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 05:30:30 -0800 Subject: [Patches] [ python-Patches-517256 ] poor performance in xmlrpc response Message-ID: Patches item #517256, was opened at 2002-02-14 00:48 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 Category: Library (Lib) Group: Python 2.1.2 Status: Open Resolution: Accepted Priority: 5 Submitted By: James Rucker (jamesrucker) Assigned to: Fredrik Lundh (effbot) Summary: poor performance in xmlrpc response Initial Comment: xmlrpclib.Transport.parse_response() (called from xmlrpclib.Transport.request()) is exhibiting poor performance - approx. 10x slower than expected. I investigated using a simple app that sent a msg to a server, where all the server did was return the message back to the caller. From profiling, it became clear that the return trip was taking 10x the time consumed by the client->server trip, and that the time was spent getting things across the wire. parse_response() reads from a file object created via socket.makefile(), and as a result exhibits performance that is about an order of magnitude worse than what it would be if socket.recv() were used on the socket. The patch provided uses socket.recv() when possible, to improve performance. The patch provided is against revision 1.15. Its use provides performance for the return trip that is more or less equivalent to that of the forward trip.
---------------------------------------------------------------------- >Comment By: Fredrik Lundh (effbot) Date: 2002-03-17 14:30 Message: Logged In: YES user_id=38376 James, what platform(s) did you use? I'm not sure changing the parse_response() interface is a good idea, but if this is a Windows-only problem, there may be a slightly cleaner way to get the same end result. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:14 Message: Logged In: YES user_id=6380 My guess makefile() isn't buffering properly. This has been a long-standing problem on Windows; I'm not sure if it's an issue on Unix. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-01 15:34 Message: Logged In: YES user_id=38376 looks fine to me. I'll merge it with SLAB changes, and will check it into the 2.3 codebase asap. (we probably should try to figure out why makefile causes a 10x slowdown too -- xmlrpclib isn't exactly the only client library reading from a buffered socket) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 00:23 Message: Logged In: YES user_id=6380 Fredrik, does this look OK to you? 
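For reference, the kind of change under discussion — draining the response with socket.recv() in a loop instead of reading through a socket.makefile() file object — looks roughly like this. This is a simplified sketch of the technique, not the actual patch:

```python
import socket

def read_all(sock, bufsize=8192):
    """Drain a socket with recv() until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:          # empty read: peer closed the socket
            break
        chunks.append(data)
    return b"".join(chunks)

# Demonstration with a connected socket pair standing in for the
# client/server connection; the "server" side sends a response and closes.
a, b = socket.socketpair()
a.sendall(b"<methodResponse>...</methodResponse>")
a.close()                     # close so read_all sees end-of-stream
resp = read_all(b)
print(resp)
```

The payoff comes from avoiding the per-read overhead of the file-object layer; the recv() loop hands each network buffer straight to the parser.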
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 From noreply@sourceforge.net Sun Mar 17 13:33:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 05:33:43 -0800 Subject: [Patches] [ python-Patches-527371 ] Fix for sre bug 470582 Message-ID: Patches item #527371, was opened at 2002-03-08 14:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 Category: None Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Greg Chapman (glchapman) Assigned to: Fredrik Lundh (effbot) Summary: Fix for sre bug 470582 Initial Comment: Bug report 470582 points out that nested groups can produce matches in sre even if the groups within which they are nested do not match:

>>> m = sre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
>>> m.groups()
(None, '3', '34', '123')
>>> m = pre.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
>>> m.groups()
(None, None, '34', '123')

I believe this is because in the handling of SRE_OP_MAX_UNTIL, state->lastmark is being reduced (after "((\d)\:)" fails) without NULLing out the now-invalid entries at the end of the state->mark array. In the other two cases where state->lastmark is reduced (specifically in SRE_OP_BRANCH and SRE_OP_REPEAT_ONE) memset is used to NULL out the entries at the end of the array. The attached patch does the same thing for the SRE_OP_MAX_UNTIL case. This fixes the above case and does not break anything in test_re.py. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-08 19:28 Message: Logged In: YES user_id=31435 Assigned to /F -- he's the expert here.
---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 16:23 Message: Logged In: YES user_id=86307 I'm pretty sure the memset is correct; state->lastmark is the index of the last mark written to (not the index of the next potential write). Also, it occurred to me that there is another related error here:

>>> m = sre.search(r'^((\d)\:)?\d\d\.\d\d\d$', '34.123')
>>> m.groups()
(None, None)
>>> m.lastindex
2

In other words, lastindex claims that group 2 was the last that matched, even though it didn't really match. Since lastindex is undocumented, this probably doesn't matter too much. Still, it probably should be reset if it is pointing to a group which gets "unmatched" when state->lastmark is reduced. Perhaps a function like the following should be added for use in the three places where state->lastmark is reset to a previous value:

void lastmark_restore(SRE_STATE *state, int lastmark)
{
    assert(lastmark >= 0);
    if (state->lastmark > lastmark) {
        int lastvalidindex = (lastmark == 0) ? -1 : (lastmark-1)/2+1;
        if (state->lastindex > lastvalidindex)
            state->lastindex = lastvalidindex;
        memset(state->mark + lastmark + 1, 0,
               (state->lastmark - lastmark) * sizeof(void*));
    }
    state->lastmark = lastmark;
}

---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-08 14:29 Message: Logged In: YES user_id=33168 Confirmed that the test w/o fix fails and the test passes with the fix to _sre.c. But I'm not sure if the memset can go too far:

memset(state->mark + lastmark + 1, 0,
       (state->lastmark - lastmark) * sizeof(void*));

I can try under purify, but that doesn't guarantee anything. ---------------------------------------------------------------------- Comment By: Greg Chapman (glchapman) Date: 2002-03-08 14:20 Message: Logged In: YES user_id=86307 I forgot: here's a patch for re_tests.py which adds the case from the bug report as a test.
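With the lastmark fix applied (this is the behavior in current Python, where re is backed by the sre engine), a nested group no longer reports a match when its enclosing optional group failed, and lastindex points at a group that really matched:

```python
import re

# The optional outer group ((\d)\:)? does not participate in the match of
# "34.123", so both group 1 and the nested group 2 must come back as None;
# only groups 3 and 4 actually matched.
m = re.search(r"^((\d)\:)?(\d\d)\.(\d\d\d)$", "34.123")
print(m.groups())     # (None, None, '34', '123')
print(m.lastindex)    # 4 -- the last group that actually matched
```

This reproduces the corrected (pre-style) output quoted in the bug report, rather than sre's buggy (None, '3', '34', '123').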
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527371&group_id=5470 From noreply@sourceforge.net Sun Mar 17 16:13:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 08:13:18 -0800 Subject: [Patches] [ python-Patches-517256 ] poor performance in xmlrpc response Message-ID: Patches item #517256, was opened at 2002-02-13 15:48 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 Category: Library (Lib) Group: Python 2.1.2 Status: Open Resolution: Accepted Priority: 5 Submitted By: James Rucker (jamesrucker) Assigned to: Fredrik Lundh (effbot) Summary: poor performance in xmlrpc response Initial Comment: xmlrpclib.Transport.parse_response() (called from xmlrpclib.Transport.request()) is exhibiting poor performance - approx. 10x slower than expected. I investigated based on using a simple app that sent a msg to a server, where all the server did was return the message back to the caller. From profiling, it became clear that the return trip was taken 10x the time consumed by the client->server trip, and that the time was spent getting things across the wire. parse_response() reads from a file object created via socket.makefile(), and as a result exhibits performance that is about an order of magnitude worse than what it would be if socket.recv() were used on the socket. The patch provided uses socket.recv() when possible, to improve performance. The patch provided is against revision 1.15. Its use provides performance for the return trip that is more or less equivalent to that of the forward trip. ---------------------------------------------------------------------- >Comment By: James Rucker (jamesrucker) Date: 2002-03-17 08:13 Message: Logged In: YES user_id=351540 The problem was discovered under FreeBSD 4.4. 
---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-17 05:30 Message: Logged In: YES user_id=38376 James, what platform(s) did you use? I'm not sure changing the parse_response() interface is a good idea, but if this is a Windows-only problem, there may be a slightly cleaner way to get the same end result. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 08:14 Message: Logged In: YES user_id=6380 My guess makefile() isn't buffering properly. This has been a long-standing problem on Windows; I'm not sure if it's an issue on Unix. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-01 06:34 Message: Logged In: YES user_id=38376 looks fine to me. I'll merge it with SLAB changes, and will check it into the 2.3 codebase asap. (we probably should try to figure out why makefile causes a 10x slowdown too -- xmlrpclib isn't exactly the only client library reading from a buffered socket) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 15:23 Message: Logged In: YES user_id=6380 Fredrik, does this look OK to you? 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 From noreply@sourceforge.net Sun Mar 17 17:10:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 09:10:23 -0800 Subject: [Patches] [ python-Patches-518675 ] Adding galeon support Message-ID: Patches item #518675, was opened at 2002-02-17 06:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=518675&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Supreet Sethi (supreet) Assigned to: Nobody/Anonymous (nobody) Summary: Adding galeon support Initial Comment: It adds galeon browser support to the webbrowser lib. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 18:10 Message: Logged In: YES user_id=21627 Since the actual code isn't forthcoming, I'm rejecting the patch. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:38 Message: Logged In: YES user_id=6656 Feature --> not in 2.2.1 ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-19 18:53 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about.
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=518675&group_id=5470 From noreply@sourceforge.net Sun Mar 17 17:11:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 09:11:48 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 00:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) >Assigned to: Tim Peters (tim_one) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-17 17:11 Message: Logged In: YES user_id=35752 I'm not sure exactly what Tim meant by that comment. If we want to make PyMalloc available to EXTENSION modules then, yes, we need to remove the leading underscore and make a wrapper for it.
I would prefer to keep it private for now since it gives us more freedom on how PyMalloc_New is implemented. Tim? Regarding the names, I have no problem with Py_Malloc. If we change, should we keep PyMalloc_{New,NewVar,Del}? Py_New seems a little too short. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 10:12 Message: Logged In: YES user_id=21627 The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free? Also, it appears that there is no function wrapper around this allocator: A module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-16 03:50 Message: Logged In: YES user_id=35752 Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc. There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 00:54 Message: Logged In: YES user_id=21627 -1. --with-pymalloc should remain an option; there are still the heuristics in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc.
I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Sun Mar 17 18:22:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 10:22:46 -0800 Subject: [Patches] [ python-Patches-485959 ] Final set of patches to Demo/tix Message-ID: Patches item #485959, was opened at 2001-11-27 12:16 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=485959&group_id=5470 Category: Tkinter Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Internet Discovery (idiscovery) Assigned to: Martin v. Löwis (loewis) Summary: Final set of patches to Demo/tix Initial Comment: Final set of patches to Demo/tix - this should be it for a while. Tix.py: Fixed tix_configure and fixed some of the doc strings. tixwidgets.py: fixed loop, added some more docstrings, and made some progress on the global image1 problem. Look for the code around 'if 0:' - it may point towards a bug in the Tkinter Image class. Also if I can understand this problem, maybe I can solve the long outstanding bug described in Demo/tix/BUG.txt. samples/ Fixed SHList1 and 2 not quitting when run from samples/ and fixed a bug in all of the demos that was leaving zombie pythonw processes under Windows in 2.1.0 ---------------------------------------------------------------------- >Comment By: Martin v.
Löwis (loewis) Date: 2002-03-17 19:22 Message: Logged In: YES user_id=21627 Thanks for the patches, applied as tixwidgets.py 1.6 Balloon.py 1.2 BtnBox.py 1.2 CmpImg.py 1.2 ComboBox.py 1.2 Control.py 1.3 DirList.py 1.2 DirTree.py 1.2 NoteBook.py 1.2 OptMenu.py 1.2 PopMenu.py 1.2 SHList1.py 1.3 SHList2.py 1.3 Tree.py 1.2 ---------------------------------------------------------------------- Comment By: Internet Discovery (idiscovery) Date: 2001-12-10 01:01 Message: Logged In: YES user_id=33229 Does the attached Tix.py patch run cleanly (it does for me)? The bugs in Tix.py and tixwidgets.py should be fixed before 2.2 goes final - the others are not so important. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-12-09 18:48 Message: Logged In: YES user_id=21627 I don't think I can find the time to look at this once more before 2.2 - it was already quite time-consuming the last time. I would apply a patch at the last minute with little inspection if it applies cleanly. As for the nature of the problems: I believe for at least one of the files, the modified file was CRLF converted, so I can't use the file you provided, either. I recommend that you obtain a copy of cvs.exe for Windows, so that you don't have to use a web browser to download the files, and produce a diff. ---------------------------------------------------------------------- Comment By: Internet Discovery (idiscovery) Date: 2001-12-09 07:52 Message: Logged In: YES user_id=33229 Are the rejects significant or just spurious linefeeds at the end of the file? Please apply the patches of the files that run cleanly as the patches are all independent and are all against the current files. The Tix.py and tixwidgets.py contain important bug fixes. Let me know if the rejects are significant; I don't always have access to unix cvs, and the SF CVS download option under Windows adds spurious linefeeds at the end.
That's why I add the .dst files to the tar so you can see if any rejects are trivial. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-12-02 13:35 Message: Logged In: YES user_id=21627 These patches don't apply cleanly; I get patch rejects in Control.py.rej SHList1.py.rej SHList2.py.rej Please obtain the current version through CVS, and produce a 'cvs diff -u', instead of individual diff files. We only need the diffs; the original files aren't needed (so you don't need to produce a tar file, either). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=485959&group_id=5470 From noreply@sourceforge.net Sun Mar 17 18:38:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 10:38:39 -0800 Subject: [Patches] [ python-Patches-430706 ] Persistent connections in BaseHTTPServer Message-ID: Patches item #430706, was opened at 2001-06-06 17:33 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=430706&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Chris Lawrence (lordsutch) Assigned to: Martin v. Löwis (loewis) Summary: Persistent connections in BaseHTTPServer Initial Comment: This patch provides HTTP/1.1 persistent connection support in BaseHTTPServer.py. It is not enabled by default (for backwards compatibility) because Content-Length headers must be supplied for persistent connections to work correctly. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 19:38 Message: Logged In: YES user_id=21627 Thanks for the patch. Applied as BaseHTTPServer.py 1.19, SimpleHTTPServer.py 1.18, libbasehttp.tex 1.14, NEWS 1.364.
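As applied, the feature is opt-in: a handler advertises HTTP/1.1 by setting protocol_version, and must send Content-Length so the client can tell where each response ends and reuse the connection. A minimal sketch using the modern module names (http.server/http.client; in the 2002 code these were BaseHTTPServer and httplib):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from http.client import HTTPConnection

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # opt in to persistent connections

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # Without Content-Length the client cannot tell where the response
        # ends, so the connection could not be kept alive.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection("127.0.0.1", server.server_address[1])
bodies = []
for _ in range(2):                  # two requests over one connection
    conn.request("GET", "/")
    bodies.append(conn.getresponse().read())
conn.close()
server.shutdown()
print(bodies)                       # [b'hello', b'hello']
```

Both requests travel over the same TCP connection; dropping either the protocol_version override or the Content-Length header falls back to one-request-per-connection HTTP/1.0 behavior.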
---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2002-01-07 20:39 Message: Logged In: YES user_id=6757 Here's my current version of the patch; the main change is that errors now result in closing the connection. A cleaner approach for HTTP 1.1 would be to use Chunked Transfer Encoding for this, so the connection could remain available. I still get spurious IOErrors (due to SIGPIPEs) that result from clients closing connections. I believe this is because a lot of clients aren't well-behaved; i.e. they read the HTTP/1.1 response line then close the connection immediately. Using TCP_CORK on Linux for sockets might help there, but it's not a general solution. Also, I'm not really sure if these exceptions should be caught here or just left to subclasses to deal with... ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-01-01 21:21 Message: Logged In: YES user_id=21627 Any chance that an updated patch is forthcoming? ---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2001-09-22 00:01 Message: Logged In: YES user_id=6757 I've tracked that one down and will have an updated patch in a day or two... basically it just needs another else condition to handle the empty readline(). There are also some issues for subclasses that probably need to be documented to play nicely with bad clients like wget that claim to be HTTP 1.0 but do HTTP 1.1 stuff. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2001-09-18 18:36 Message: Logged In: YES user_id=21627 It still doesn't work right. 
If I access SimpleHTTPServer from a Netscape client, I get error messages like

localhost - - [18/Sep/2001 18:32:22] code 400, message Bad request syntax ('')
localhost - - [18/Sep/2001 18:32:22] "" 400 -

These are caused because the client closes the connection after the first request (likely, after it finds out that the document it got contains no references to the same server anymore). However, the server continues to invoke handle_one_request, which reads the empty line and fails to recognize that the client has closed the connection. ---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2001-09-15 10:15 Message: Logged In: YES user_id=6757 I reworked the patch a bit to ensure HTTP 1.1 mode is only used if the handler class is in HTTP 1.1 mode, and modified the test() functions in the server classes to add a "protocol" option. I also modified SimpleHTTPServer to send Content-Length headers for the implemented classes. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-09-04 13:40 Message: Logged In: YES user_id=21627 The patch in its current form seems to be broken. To see the problem, please run SimpleHTTPServer on some directory, then access it with an HTTP/1.1 client (e.g. Netscape 4.7). The server will use the protocol version HTTP/1.0, but the client will initially send 1.1, and send a Connection: Keep-alive header. As a result, self.close_connection is set to 0, despite using HTTP/1.0. In turn, the HTTP server won't send a content length, and won't close the connection either. Netscape waits forever for some completion which never occurs, since the server waits for the next request on the same connection. It might be useful to enhance the SimpleHTTPServer test() function to optionally operate in HTTP/1.1 mode (including sending a proper Content-Length). Doing the same for the CGI HTTP server is probably less useful.
---------------------------------------------------------------------- Comment By: Chris Lawrence (lordsutch) Date: 2001-08-30 05:21 Message: Logged In: YES user_id=6757 I have updated the patch against current CVS and have added documentation. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2001-08-08 22:43 Message: Logged In: YES user_id=21627 I haven't studied the patch in detail, yet, but I have a few comments on the style: - there is no need to quote all authors of the RFC. Also, the reference to long-ago expired HTTP draft should go; just replace it with a single reference to the RFC number (giving an URL for the RFC might be convenient) - Where is the documentation? A patch to Doc/lib/libbasehttp.tex would be appreciated. If you don't speak TeX, don't worry: Just write plain text, we'll do the mark-up. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=430706&group_id=5470 From noreply@sourceforge.net Sun Mar 17 19:32:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 11:32:46 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-15 19:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) >Assigned to: Neil Schemenauer (nascheme) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. 
A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-17 14:32 Message: Logged In: YES user_id=31435 I certainly want, e.g., that our Unicode implementation can choose to use obmalloc.c for its raw string storage, despite that it isn't "object storage" (in the sense of Vladimir's level "+2" in the diagram at the top of obmalloc.c; the current CVS code restricts obmalloc use to level +2, while raw string storage is at level "+1"). Allowing to use pymalloc at level +1 changes Vladimir's original intent, and we have no experience with it, so I'm fine with restricting that ability to the core at the start. About names, we've been calling this package "pymalloc" for years, and the general form of external name throughout Python is ["_"] "Py" Package "_" Function _PyMalloc_{Malloc, Free, etc} fit that pattern perfectly. I don't see the attraction to giving functions from this package idiosyncratic names, and we've got so many ways to spell "get memory" that I expect it will be a genuine help to keep on making it clear, from the name alone, to which "family" a given variant of "new" (etc) belongs. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-17 12:11 Message: Logged In: YES user_id=35752 I'm not sure exactly what Tim meant by that comment. 
If we want to make PyMalloc available to EXTENSION modules then, yes, we need to remove the leading underscore and make a wrapper for it. I would prefer to keep it private for now since it gives us more freedom on how PyMalloc_New is implemented. Tim? Regarding the names, I have no problem with Py_Malloc. If we change, should we keep PyMalloc_{New,NewVar,Del}? Py_New seems a little too short. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 05:12 Message: Logged In: YES user_id=21627 The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free? Also, it appears that there is no function wrapper around this allocator: A module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-15 22:50 Message: Logged In: YES user_id=35752 Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc. There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 19:54 Message: Logged In: YES user_id=21627 -1. 
--with-pymalloc should remain an option; there are still heuristics in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc. I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Sun Mar 17 19:42:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 11:42:26 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 07:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) >Assigned to: Nobody/Anonymous (nobody) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-17 14:42 Message: Logged In: YES user_id=31435 Michael, this definitely doesn't belong in 2.2.1 as-is, because it removes a currently-exported name (buggy or not, sensible or not, somebody may be using random.stdgamma now and be happy with it). John, if you're going to remove stdgamma, you need also to remove its (string) name from the module's __all__ list (right before the _verify() function). 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-16 12:38 Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 11:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Sun Mar 17 20:46:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 12:46:45 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 23:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) Assigned to: Nobody/Anonymous (nobody) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: John Machin (sjmachin) Date: 2002-03-18 07:46 Message: Logged In: YES user_id=480138 OK; I understand the problems with the patch. Not sure about the way forward -- shall I prepare a patch that just fixes gammavariate() and leaves stdgamma() there (with warning in the comments: deprecated? will be removed in 2.x?)? Do you want it real soon now (for 2.2.1)? 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 06:42 Message: Logged In: YES user_id=31435 Michael, this definitely doesn't belong in 2.2.1 as-is, because it removes a currently-exported name (buggy or not, sensible or not, somebody may be using random.stdgamma now and be happy with it). John, if you're going to remove stdgamma, you need also to remove its (string) name from the module's __all__ list (right before the _verify() function). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 04:38 Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-17 03:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Sun Mar 17 21:47:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 13:47:20 -0800 Subject: [Patches] [ python-Patches-525763 ] minor fix for regen on IRIX Message-ID: Patches item #525763, was opened at 2002-03-05 03:59 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525763&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Michael Pruett (mpruett) Assigned to: Jack Jansen (jackjansen) Summary: minor fix for regen on IRIX Initial Comment: The Lib/plat-irix6/regen script does not catch IRIX 6 (only IRIX 4 and 5), and it doesn't handle systems which report themselves as running 'IRIX64' rather than just 'IRIX'. 
---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-03-17 22:47 Message: Logged In: YES user_id=45365 Checked in as rev 1.3 ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:40 Message: Logged In: YES user_id=6656 Jack, can you look at this? It looks fine to me, but I've never even been near IRIX. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525763&group_id=5470 From noreply@sourceforge.net Sun Mar 17 21:51:34 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 13:51:34 -0800 Subject: [Patches] [ python-Patches-490100 ] Lets Tkinter work with MacOSX native Tk Message-ID: Patches item #490100, was opened at 2001-12-07 03:44 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=490100&group_id=5470 Category: Macintosh Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Tony Lownds (tonylownds) Assigned to: Jack Jansen (jackjansen) Summary: Lets Tkinter work with MacOSX native Tk Initial Comment: There is a new Tcl/Tk in alpha that works on MacOSX's windowing layer natively. This patch adds calls necessary for Tkinter to work with it. The Tcl/Tk alpha can be picked up here: http://sourceforge.net/project/showfiles.php?group_id=10894 NOTE: The amount of extra code needed to interface with Tcl/Tk will probably go down with the next alpha of Tcl/Tk. ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-03-17 22:51 Message: Logged In: YES user_id=45365 This patch was applied 3 months ago, and no one seems to be willing to write a readme. Still, people seem successful in getting this to work, so let's forget about the readme and close the patch. 
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2001-12-10 00:17 Message: Logged In: YES user_id=45365 The mods to _tkinter.c and tkappinit.c are in the repository. What still needs to be done is a readme file explaining where to obtain the X11 headers, what to put into Setup.local and how to run your Tkinter scripts. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2001-12-07 11:34 Message: Logged In: YES user_id=45365 I assume the sprintf change was a mistake (I've undone it after I applied the patch). Aside from that the patch looks harmless to other platforms, but I haven't gotten it to work yet. It fails compilation with a missing X11/Xlib.h include. If I can get it to compile at least once I'll put it in CVS before 2.2 (even though it is only useful to the real die-hards: it requires a Tk alpha, and only works under the experimental framework-based Python.app). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-12-07 10:20 Message: Logged In: YES user_id=21627 Please review your patches carefully before submitting them. Why does this change PyOS_snprintf to sprintf? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=490100&group_id=5470 From noreply@sourceforge.net Sun Mar 17 21:55:53 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 13:55:53 -0800 Subject: [Patches] [ python-Patches-496096 ] Mach-O MacPython IDE! 
Message-ID: Patches item #496096, was opened at 2001-12-22 14:41 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=496096&group_id=5470 Category: Macintosh Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Donovan Preston (dsposx) Assigned to: Jack Jansen (jackjansen) Summary: Mach-O MacPython IDE! Initial Comment: Here it is... the moment we've all been waiting for... the MacPython IDE running in a bundle under Unix Python! It's a beautiful thing. Most everything works flawlessly. One major point though... it's always asking you to convert UNIX line endings to mac line endings! Heh. p.s. Jack: I took the quick route and assumed paths passed to FSSpec_New were slash- delimited. It works at least, and the ability to specify the delimiter can be added later. I wanted to get this in CVS ASAP. Donovan ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-03-17 22:55 Message: Logged In: YES user_id=45365 I think this patch can be closed by now. Most of it was applied, and as the IDE seems to work in MachoPython I guess the bits that weren't applied were fixed in a different way. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-01-21 23:33 Message: Logged In: YES user_id=45365 Donovan, I can't apply your patches: something seems to have gone wrong with tabs and spaces. I'll apply the most important ones manually (those in IDE, mainly) insofar as I didn't have a similar patch myself already. If you could later try to regenerate your patch for the other files that would be great! ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-01-21 22:52 Message: Logged In: YES user_id=45365 Donovan, I can't apply your patches: something seems to have gone wrong with tabs and spaces. 
I'll apply the most important ones manually (those in IDE, mainly) insofar as I didn't have a similar patch myself already. If you could later try to regenerate your patch for the other files that would be great! ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=496096&group_id=5470 From noreply@sourceforge.net Sun Mar 17 21:56:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 13:56:14 -0800 Subject: [Patches] [ python-Patches-496096 ] Mach-O MacPython IDE! Message-ID: Patches item #496096, was opened at 2001-12-22 14:41 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=496096&group_id=5470 Category: Macintosh Group: Python 2.3 Status: Closed Resolution: Accepted Priority: 5 Submitted By: Donovan Preston (dsposx) Assigned to: Jack Jansen (jackjansen) Summary: Mach-O MacPython IDE! Initial Comment: Here it is... the moment we've all been waiting for... the MacPython IDE running in a bundle under Unix Python! It's a beautiful thing. Most everything works flawlessly. One major point though... it's always asking you to convert UNIX line endings to mac line endings! Heh. p.s. Jack: I took the quick route and assumed paths passed to FSSpec_New were slash- delimited. It works at least, and the ability to specify the delimiter can be added later. I wanted to get this in CVS ASAP. Donovan ---------------------------------------------------------------------- >Comment By: Jack Jansen (jackjansen) Date: 2002-03-17 22:56 Message: Logged In: YES user_id=45365 I think this patch can be closed by now. Most of it was applied, and as the IDE seems to work in MachoPython I guess the bits that weren't applied were fixed in a different way. 
---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-03-17 22:55 Message: Logged In: YES user_id=45365 I think this patch can be closed by now. Most of it was applied, and as the IDE seems to work in MachoPython I guess the bits that weren't applied were fixed in a different way. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-01-21 23:33 Message: Logged In: YES user_id=45365 Donovan, I can't apply your patches: something seems to have gone wrong with tabs and spaces. I'll apply the most important ones manually (those in IDE, mainly) insofar as I didn't have a similar patch myself already. If you could later try to regenerate your patch for the other files that would be great! ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-01-21 22:52 Message: Logged In: YES user_id=45365 Donovan, I can't apply your patches: something seems to have gone wrong with tabs and spaces. I'll apply the most important ones manually (those in IDE, mainly) insofar as I didn't have a similar patch myself already. If you could later try to regenerate your patch for the other files that would be great! 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=496096&group_id=5470 From noreply@sourceforge.net Sun Mar 17 22:11:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 14:11:39 -0800 Subject: [Patches] [ python-Patches-480902 ] allow dumbdbm to reuse space Message-ID: Patches item #480902, was opened at 2001-11-12 07:30 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=480902&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Skip Montanaro (montanaro) Summary: allow dumbdbm to reuse space Initial Comment: This patch to dumbdbm does two things: * allows it to reuse holes in the .dat file * provides a somewhat more complete test The first change should be considered only for 2.3. Barry may or may not want to check out the test case rewrite for incorporation into 2.2. Accordingly, I've assigned it to him. Skip ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-14 19:45 Message: Logged In: YES user_id=44345 Unless someone else has an objection, I'm going to close this. Barry already incorporated the expanded test case and the space reuse is not really that important in my mind since dumbdbm is generally only a fallback when no other database is available. If someone wants to use a database bad enough, they will probably figure out a way to use something more powerful. Skip ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2001-11-13 14:16 Message: Logged In: YES user_id=12800 I've accepted the second half -- the improvement to the test suite -- but as recommended, I'm postponing the first half until Py 2.3. 
Assigning back to Skip so he'll remember to deal with this again later. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=480902&group_id=5470 From noreply@sourceforge.net Mon Mar 18 05:32:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 21:32:50 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 07:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) >Assigned to: Tim Peters (tim_one) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-18 00:32 Message: Logged In: YES user_id=31435 John, if I were you I'd leave stdgamma alone, except for adding this code to its start: import warnings warnings.warn("The stdgamma function is deprecated; " "use gammavariate() instead", DeprecationWarning) Then we can remove stdgamma in 2.4. 2.2.1 will probably go out on Monday night, so it would be nice to get this done before then. OTOH, I expect there will be a 2.2.2 later, so not a tragedy if it's not. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-17 15:46 Message: Logged In: YES user_id=480138 OK; I understand the problems with the patch. Not sure about the way forward -- shall I prepare a patch that just fixes gammavariate() and leaves stdgamma() there (with warning in the comments: deprecated? will be removed in 2.x?)? Do you want it real soon now (for 2.2.1)? 
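Tim's suggestion is the standard warnings-module deprecation pattern. A minimal runnable sketch of it (the real stdgamma signature and the gamma math are simplified away here; only the warning mechanics matter):

```python
import warnings

def gammavariate(alpha, beta):
    # stand-in for the real sampler; the point is the deprecation shim
    return alpha * beta

def stdgamma(alpha):
    # emit the warning Tim suggests, pointing at the caller's frame
    warnings.warn("The stdgamma function is deprecated; "
                  "use gammavariate() instead",
                  DeprecationWarning, stacklevel=2)
    return gammavariate(alpha, 1.0)
```

Existing callers keep working through the deprecation period, and the name can then be removed in a later release, as Tim proposes for 2.4.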
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 14:42 Message: Logged In: YES user_id=31435 Michael, this definitely doesn't belong in 2.2.1 as-is, because it removes a currently-exported name (buggy or not, sensible or not, somebody may be using random.stdgamma now and be happy with it). John, if you're going to remove stdgamma, you need also to remove its (string) name from the module's __all__ list (right before the _verify() function). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-16 12:38 Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 11:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Mon Mar 18 07:07:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 17 Mar 2002 23:07:14 -0800 Subject: [Patches] [ python-Patches-523944 ] imputil.py can't import "\r\n" .py files Message-ID: Patches item #523944, was opened at 2002-02-28 09:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Mitch Chapman (mitchchapman) Assigned to: Greg Stein (gstein) >Summary: imputil.py can't import "\r\n" .py files Initial Comment: __builtin__.compile() requires that codestring line endings consist of "\n". imputil._compile() does not enforce this. One result is that imputil may be unable to import modules created on Win32. 
The attached patch to the latest (CVS revision 1.23) imputil.py replaces both "\r\n" and "\r" with "\n" before passing a code string to __builtin__.compile(). This is consistent with the behavior of e.g. Lib/py_compile.py. ---------------------------------------------------------------------- >Comment By: Greg Stein (gstein) Date: 2002-03-17 23:07 Message: Logged In: YES user_id=6501 I've been out this weekend, so no... won't make it by Monday the 18th. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 08:42 Message: Logged In: YES user_id=6656 Greg any chance of comments before 2.2.1c1, i.e. Monday? ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-06 09:14 Message: Logged In: YES user_id=38388 Assigning to Greg Stein -- imputil.py is his baby. ---------------------------------------------------------------------- Comment By: Mitch Chapman (mitchchapman) Date: 2002-03-06 09:03 Message: Logged In: YES user_id=348188 Please pardon if it's inappropriate to assign patches to project developers. I'm doing so on the advice of a post by Skip Montanaro. 
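The fix is the same normalization py_compile applies. A small sketch of it (note that modern Python's compile() accepts "\r\n" directly, so this matters mainly for the old __builtin__.compile behavior described above):

```python
def normalize_newlines(codestring):
    """Map Windows (CR LF) and old-Mac (CR) line endings to LF before
    compiling; CR LF must be replaced first so the CR pass does not
    split it into two newlines."""
    return codestring.replace("\r\n", "\n").replace("\r", "\n")

# a module body as it might arrive from a Win32-created .py file
source = "x = 1\r\nif x:\r\n    x = x + 1\r"
code = compile(normalize_newlines(source) + "\n", "<imported>", "exec")
namespace = {}
exec(code, namespace)
```

After normalization the source compiles and runs regardless of which platform wrote the file.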
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523944&group_id=5470 From noreply@sourceforge.net Mon Mar 18 08:37:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 00:37:45 -0800 Subject: [Patches] [ python-Patches-525870 ] urllib2: duplicate call, stat attrs Message-ID: Patches item #525870, was opened at 2002-03-05 09:58 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525870&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: urllib2: duplicate call, stat attrs Initial Comment: This patch removes a duplicate call to os.stat in urllib2.FileHandler.open_local_file(). In addition to that, it uses the new stat attributes, so importing stat is no longer necessary. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 09:37 Message: Logged In: YES user_id=21627 Thanks for the patch. Committed as urllib2.py 1.26. 
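The before/after of Walter's change, sketched on a throwaway file: indexed access through the stat module's constants versus the named attributes on the os.stat() result (this is the general pattern, not the urllib2 code itself):

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

st = os.stat(path)

# old style: tuple indexing via constants from the stat module
old_size = st[stat.ST_SIZE]

# new style (what the patch switches to): named attributes,
# so "import stat" is no longer needed for this
new_size = st.st_size

os.remove(path)
```

Both spellings read the same field; the attribute form is self-describing and drops the extra import.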
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=525870&group_id=5470 From noreply@sourceforge.net Mon Mar 18 08:42:52 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 00:42:52 -0800 Subject: [Patches] [ python-Patches-523424 ] Finding "home" in "user.py" for Windows Message-ID: Patches item #523424, was opened at 2002-02-27 16:03 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523424&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: Gilles Lenfant (glenfant) Assigned to: Nobody/Anonymous (nobody) >Summary: Finding "home" in "user.py" for Windows Initial Comment: On my win2k French box + python 2.1.2: >>> import user >>> user.home 'C:\' This isn't a great issue but this means that all users of this win2k box will share the same ".pythonrc.py". The code provided by Jeff Bauer can be changed easily because the standard Python distro now has a "_winreg" module. This patch gives the real user $HOME like folder for any user on whatever Windows localization: >>> import user >>> user.home u'C:\Documents and Settings\MyWindowsUsername\Mes documents' This has been successfully tested with Win98 and Win2000. This should be tested on XP, NT4, and 95 but I can't. Sorry for the "context or unified diffs" (dunno what it means) but the module is short and my patch is clearly emphasized. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 09:42 Message: Logged In: YES user_id=21627 If there are no further comments in favour of accepting this patch, it will be rejected. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-02-27 23:13 Message: Logged In: YES user_id=21627 If it returns "My Documents", it is definitely *not* the home directory of the user; \Documents and Settings\username would be the home directory. Furthermore, on many installations, HOME *is* set, and it is the Administrator's choice where that points to; the typical installation (in a domain) indeed is to assign HOMEDRIVE. So I'm not in favour of that change. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523424&group_id=5470 From noreply@sourceforge.net Mon Mar 18 08:48:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 00:48:00 -0800 Subject: [Patches] [ python-Patches-514628 ] bug in pydoc on python 2.2 release Message-ID: Patches item #514628, was opened at 2002-02-08 03:09 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514628&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Raj Kunjithapadam (mmaster25) Assigned to: Tim Peters (tim_one) Summary: bug in pydoc on python 2.2 release Initial Comment: pydoc has a bug when trying to generate HTML documentation; more importantly, it has a bug in the method writedoc(). Attached is my fix. 
Here is the diff between my fix and the regular dist 1338c1338 < def writedoc(thing, forceload=0): --- > def writedoc(key, forceload=0): 1340,1346c1340,1343 < object = thing < if type(thing) is type(''): < try: < object = locate(thing, forceload) < except ErrorDuringImport, value: < print value < return --- > try: > object = locate(key, forceload) > except ErrorDuringImport, value: > print value 1351c1348 < file = open(thing.__name__ + '.html', 'w') --- > file = open(key + '.html', 'w') 1354c1351 < print 'wrote', thing.__name__ + '.html' --- > print 'wrote', key + '.html' 1356c1353 < print 'no Python documentation found for %s' % repr(thing) --- > print 'no Python documentation found for %s' % repr(key) ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 09:48 Message: Logged In: YES user_id=21627 Can you please provide an example that demonstrates the problem? Also, can you please regenerate your changes as context (-c) or unified (-u) diffs, and attach those to this report (do *not* paste them into the comment field)? In their current form, the patch is pretty useless: SF messed up the indentation, and it is an old-style patch, and pydoc.py is already at 1.58. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 23:45 Message: Logged In: YES user_id=6380 assigned to Tim; this may be Ping's terrain but Ping is typically not responsive. 
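The writedoc() change above boils down to "accept either an object or a dotted name". The general shape of that pattern, with a stand-in resolver where pydoc would use its locate() function:

```python
def resolve(thing, locate):
    """Accept an object or a string naming one: strings are looked up
    through `locate` (a stand-in here for pydoc's locate), anything
    else is used as-is, mirroring the writedoc() patch above."""
    if isinstance(thing, str):
        return locate(thing)
    return thing

import math

# both spellings reach the same module object
by_name = resolve("math", __import__)
by_object = resolve(math, None)
```

Accepting both forms lets callers pass whatever they already have, without a second lookup of an object they hold.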
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514628&group_id=5470 From noreply@sourceforge.net Mon Mar 18 08:48:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 00:48:55 -0800 Subject: [Patches] [ python-Patches-513329 ] build, install in HP-UX10.20 Message-ID: Patches item #513329, was opened at 2002-02-05 16:48 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=513329&group_id=5470 Category: Build Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Claudio Scafuri (scafuri) Assigned to: Nobody/Anonymous (nobody) Summary: build, install in HP-UX10.20 Initial Comment: a) python must be linked with c++ because at least one file is compiled with c++. b) in hpux "install -d" does not create a directory. Use "mkdir" instead. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-24 17:07 Message: Logged In: YES user_id=21627 If there isn't any further feedback by March 1, this report will be closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-13 02:25 Message: Logged In: YES user_id=21627 There is already code in configure[.in] that tests whether using c++ to link is needed (it isn't needed on all systems). Please report why this test fails on HP-UX, and try providing a patch that corrects the test. The relevant code is after the comment # If CXX is set, and if it is needed to link a main function that was # compiled with CXX, LINKCC is CXX instead. Also, please contribute changes as unified or context diffs; see the Python SourceForge usage guidelines for details. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=513329&group_id=5470 From noreply@sourceforge.net Mon Mar 18 08:57:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 00:57:16 -0800 Subject: [Patches] [ python-Patches-512466 ] Script to move faqwiz entries. Message-ID: Patches item #512466, was opened at 2002-02-03 21:17 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=512466&group_id=5470 Category: Demos and tools Group: Python 2.1.2 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Christian Reis (kiko_async) Assigned to: Nobody/Anonymous (nobody) Summary: Script to move faqwiz entries. Initial Comment: Moves entries from one section (number actually) to another. Doesn't do anything smart like renumber questions, but at least it doesn't clobber them. Usage: blackjesus:~> ./move-faqwiz.sh 2\.1 3\.2 Moving FAQ question 02.001 to 03.002 ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 09:57 Message: Logged In: YES user_id=21627 Thanks for the patch. Added as move-faqwiz.sh 1.1, README 1.13. ---------------------------------------------------------------------- Comment By: Christian Reis (kiko_async) Date: 2002-02-13 03:23 Message: Logged In: YES user_id=222305 Added file (duh). And of course you can: use Bugzilla :-) it's free software. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-02-13 02:30 Message: Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about.
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=512466&group_id=5470 From noreply@sourceforge.net Mon Mar 18 08:59:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 00:59:27 -0800 Subject: [Patches] [ python-Patches-511219 ] suppress type restrictions on locals() Message-ID: Patches item #511219, was opened at 2002-01-31 15:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=511219&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Cesar Douady (douady) Assigned to: Nobody/Anonymous (nobody) Summary: suppress type restrictions on locals() Initial Comment: This patch removes the restriction that global and local dictionaries bypass overloaded __getitem__ and __setitem__ when passed an object derived from class dict. An exception is made for the builtin insertion and reference in the global dict, to make sure this object exists and to free the derived class from having to handle this implementation-dependent detail. The behavior of eval and exec has been updated for code objects which have the CO_NEWLOCALS flag set: if explicitly passed a local dict, a new local dict is not generated. This allows one to pass an explicit local dict to the code object of a function (which otherwise cannot be achieved). If this cannot be done because of backward compatibility problems, an alternative would be to use the "new" module to create a code object from a function with CO_NEWLOCALS reset, but it seems logical to me to use the information explicitly provided. Free and cell variables are not managed in this version.
If the patch is accepted, I am willing to finish the job and implement free and cell variables, but this requires a serious rework of the Cell object: free variables should be accessed using the methods of the dict in which they reside, and today this dict is not accessible from the Cell object. Robustness: currently, the plain test suite passes (with a modification of test_desctut which precisely verifies that the suppressed restriction is enforced). I have introduced a new test (test_subdict.py) which verifies the new behavior. For performance, the plain case (when the local dict is a plain dict) is optimized so that differences in performance are not measurable (within 1%) when run on the test suite (i.e. I timed make test). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 09:59 Message: Logged In: YES user_id=21627 This is quite a complex change. If you want to see it integrated, I recommend that you find people who try it out and report their experience here. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=511219&group_id=5470 From noreply@sourceforge.net Mon Mar 18 10:43:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 02:43:45 -0800 Subject: [Patches] [ python-Patches-499513 ] robotparser.py fails on some URLs Message-ID: Patches item #499513, was opened at 2002-01-04 19:21 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=499513&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Bastian Kleineidam (calvin) Assigned to: Guido van Rossum (gvanrossum) Summary: robotparser.py fails on some URLs Initial Comment: I am using Python 2.1.1. The URL http://www.chaosreigns.com/robots.txt results in an empty RobotParser object.
Reason is that the file object returned from the URLOpener does not have a readlines() attribute. I patched robotparser.py to use readline() instead of readlines(). Furthermore I removed the unnecessary redirection limit code which is already in FancyURLopener. Greetings, Bastian ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 11:43 Message: Logged In: YES user_id=21627 Thanks for the patch. Committed as robotparser.py 1.12. ---------------------------------------------------------------------- Comment By: Bastian Kleineidam (calvin) Date: 2002-01-04 20:02 Message: Logged In: YES user_id=9205 Updated patch with copyright ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-04 19:49 Message: Logged In: YES user_id=6380 I'll gladly apply your patch. Would you mind also supplying a patch for the copyright statement? It says "Python 2.0 open source license" but that's no longer the current license. How about the PSF license agreement for Python 2.2?
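The readline()-based parsing fixed here survives in the modern urllib.robotparser module; a small usage sketch that feeds the parser a list of lines directly, so no network access is needed:

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# parse() accepts an iterable of robots.txt lines
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# paths outside the disallowed prefix are fetchable, others are not
print(rp.can_fetch("*", "http://www.chaosreigns.com/index.html"))
print(rp.can_fetch("*", "http://www.chaosreigns.com/private/x"))
```

Feeding lines through parse() is also a convenient way to unit-test robots.txt handling without depending on a live server.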
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=499513&group_id=5470 From noreply@sourceforge.net Mon Mar 18 12:45:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 04:45:12 -0800 Subject: [Patches] [ python-Patches-495598 ] add an -q (quiet) option to pycompile Message-ID: Patches item #495598, was opened at 2001-12-20 21:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=495598&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: add an -q (quiet) option to pycompile Initial Comment: this patch is applied to Debian's python packages for more than two years allowing quiet batch compilations. --- python2.2-2.2.orig/Lib/compileall.py Wed Apr 18 03:20:21 2001 +++ python2.2-2.2/Lib/compileall.py Sun Sep 30 22:30:32 2001 @@ -4,6 +4,8 @@ given as arguments recursively; the -l option prevents it from recursing into directories. +DEBIAN adds an -q option for more quiet operation. + Without arguments, if compiles all modules on sys.path, without recursing into subdirectories. (Even though it should do so for packages -- for now, you'll have to deal with packages separately.) @@ -19,7 +21,7 @@ __all__ = ["compile_dir","compile_path"] -def compile_dir(dir, maxlevels=10, ddir=None, force=0, rx=None): +def compile_dir(dir, maxlevels=10, ddir=None, force=0, rx=None, quiet=0): """Byte-compile all modules in the given directory tree. Arguments (only dir is required): @@ -29,9 +31,10 @@ ddir: if given, purported directory name (this is the directory name that will show up in error messages) force: if 1, force compilation, even if timestamps are up-to-date + quiet: if 1, be quiet during compilation """ - print 'Listing', dir, '...' 
+ if not quiet: print 'Listing', dir, '...' try: names = os.listdir(dir) except os.error: @@ -57,7 +60,7 @@ try: ctime = os.stat(cfile) [stat.ST_MTIME] except os.error: ctime = 0 if (ctime > ftime) and not force: continue - print 'Compiling', fullname, '...' + if not quiet: print 'Compiling', fullname, '...' try: ok = py_compile.compile(fullname, None, dfile) except KeyboardInterrupt: @@ -77,11 +80,11 @@ name != os.curdir and name != os.pardir and \ os.path.isdir(fullname) and \ not os.path.islink(fullname): - if not compile_dir(fullname, maxlevels - 1, dfile, force, rx): + if not compile_dir(fullname, maxlevels - 1, dfile, force, rx, quiet): success = 0 return success -def compile_path(skip_curdir=1, maxlevels=0, force=0): +def compile_path(skip_curdir=1, maxlevels=0, force=0, quiet=0): """Byte-compile all module on sys.path. Arguments (all optional): @@ -89,6 +92,7 @@ skip_curdir: if true, skip current directory (default true) maxlevels: max recursion level (default 0) force: as for compile_dir() (default 0) + quiet: as for compile_dir() (default 0) """ success = 1 @@ -96,20 +100,21 @@ if (not dir or dir == os.curdir) and skip_curdir: print 'Skipping current directory' else: - success = success and compile_dir(dir, maxlevels, None, force) + success = success and compile_dir(dir, maxlevels, None, force, quiet) return success def main(): """Script main program.""" import getopt try: - opts, args = getopt.getopt(sys.argv [1:], 'lfd:x:') + opts, args = getopt.getopt(sys.argv [1:], 'lfqd:x:') except getopt.error, msg: print msg - print "usage: python compileall.py [-l] [-f] [-d destdir] " \ + print "usage: python compileall.py [-l] [-f] [-q] [-d destdir] " \ "[-s regexp] [directory ...]" print "-l: don't recurse down" print "-f: force rebuild even if timestamps are up-to-date" + print "-q: quiet operation" print "-d destdir: purported directory name for error messages" print " if no directory arguments, -l sys.path is assumed" print "-x regexp: skip files matching the 
regular expression regexp" @@ -118,11 +123,13 @@ maxlevels = 10 ddir = None force = 0 + quiet = 0 rx = None for o, a in opts: if o == '-l': maxlevels = 0 if o == '-d': ddir = a if o == '-f': force = 1 + if o == '-q': quiet = 1 if o == '-x': import re rx = re.compile(a) @@ -134,7 +141,7 @@ try: if args: for dir in args: - if not compile_dir(dir, maxlevels, ddir, force, rx): + if not compile_dir(dir, maxlevels, ddir, force, rx, quiet): success = 0 else: success = compile_path() ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 13:45 Message: Logged In: YES user_id=21627 Thanks for the patch. Committed as NEWS 1.365, compileall.py 1.10, libcompileall.tex 1.3. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=495598&group_id=5470 From noreply@sourceforge.net Mon Mar 18 12:53:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 04:53:58 -0800 Subject: [Patches] [ python-Patches-458534 ] ncurses form module Message-ID: Patches item #458534, was opened at 2001-09-04 23:23 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=458534&group_id=5470 Category: Modules Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: A.M. Kuchling (akuchling) Assigned to: A.M. Kuchling (akuchling) Summary: ncurses form module Initial Comment: >From an e-mail sent to me privately: hello. i written extension for curses module this is not 100% jet Lambach Bartosz lda@lupa.pl ps. sorry, english is not my favorit ;) ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 13:53 Message: Logged In: YES user_id=21627 Rejecting the patch. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2001-09-05 16:16 Message: Logged In: YES user_id=21627 In the current form, the module seems to be unacceptable. It comes with no documentation, and no examples. I'd strongly encourage the author to provide a sample application. If he's willing to write some documentation, that would be also good. If he can write only Polish, I could help finding somebody who translates that into English afterwards. Note that there is also a complete interface to forms, and the rest of ncurses, in http://pyncurses.sourceforge.net/ ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=458534&group_id=5470 From noreply@sourceforge.net Mon Mar 18 12:57:52 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 04:57:52 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 23:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) Assigned to: Tim Peters (tim_one) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: John Machin (sjmachin) Date: 2002-03-18 23:57 Message: Logged In: YES user_id=480138 Patch file random2.dif uploaded. stdgamma() deprecated as per TP suggestion.
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 16:32 Message: Logged In: YES user_id=31435 John, if I were you I'd leave stdgamma alone, except for adding this code to its start: import warnings warnings.warn("The stdgamma function is deprecated; " "use gammavariate() instead", DeprecationWarning) Then we can remove stdgamma in 2.4. 2.2.1 will probably go out on Monday night, so it would be nice to get this done before then. OTOH, I expect there will be a 2.2.2 later, so not a tragedy if it's not. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-18 07:46 Message: Logged In: YES user_id=480138 OK; I understand the problems with the patch. Not sure about the way forward -- shall I prepare a patch that just fixes gammavariate() and leaves stdgamma() there (with warning in the comments: deprecated? will be removed in 2.x?)? Do you want it real soon now (for 2.2.1)? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 06:42 Message: Logged In: YES user_id=31435 Michael, this definitely doesn't belong in 2.2.1 as-is, because it removes a currently-exported name (buggy or not, sensible or not, somebody may be using random.stdgamma now and be happy with it). John, if you're going to remove stdgamma, you need also to remove its (string) name from the module's __all__ list (right before the _verify() function). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 04:38 Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-17 03:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? 
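Tim's suggested shim remains the standard way to deprecate a function while keeping it importable. A self-contained sketch of the pattern (the delegation to gammavariate() here is an assumption standing in for the real body; the historical signature is taken from the discussion above):

```python
import random
import warnings

def stdgamma(alpha, ainv, bbb, ccc):
    # deprecation shim: warn, then delegate to the replacement
    warnings.warn("The stdgamma function is deprecated; "
                  "use gammavariate() instead",
                  DeprecationWarning, stacklevel=2)
    return random.gammavariate(alpha, 1.0)

# demonstrate that callers still work and the warning is emitted
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = stdgamma(2.0, 0.5, 1.0, 1.0)

print(value, caught[0].category.__name__)
```

stacklevel=2 attributes the warning to the caller of stdgamma rather than to the shim itself, which is what makes such warnings actionable.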
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Mon Mar 18 13:05:07 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 05:05:07 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 23:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) Assigned to: Tim Peters (tim_one) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: John Machin (sjmachin) Date: 2002-03-19 00:05 Message: Logged In: YES user_id=480138 Attached is test script test_gamma.py. Passing test means: eye-balling of relative "errors" reveals no nasties for at least alpha >= 0.1 Note that Python's gammavariate() is not very accurate at all for alpha < 0.1 approx. However neither are another two methods that I tried (details in the file). I'll leave it at that -- evidently alpha < 1.0 is "rare and difficult" according to Marsaglia & Tsang. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-18 23:57 Message: Logged In: YES user_id=480138 Patch file random2.dif uploaded. stdgamma() deprecated as per TP suggestion. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 16:32 Message: Logged In: YES user_id=31435 John, if I were you I'd leave stdgamma alone, except for adding this code to its start: import warnings warnings.warn("The stdgamma function is deprecated; " "use gammavariate() instead", DeprecationWarning) Then we can remove stdgamma in 2.4. 2.2.1 will probably go out on Monday night, so it would be nice to get this done before then. OTOH, I expect there will be a 2.2.2 later, so not a tragedy if it's not. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-18 07:46 Message: Logged In: YES user_id=480138 OK; I understand the problems with the patch. Not sure about the way forward -- shall I prepare a patch that just fixes gammavariate() and leaves stdgamma() there (with warning in the comments: deprecated? will be removed in 2.x?)? Do you want it real soon now (for 2.2.1)? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 06:42 Message: Logged In: YES user_id=31435 Michael, this definitely doesn't belong in 2.2.1 as-is, because it removes a currently-exported name (buggy or not, sensible or not, somebody may be using random.stdgamma now and be happy with it). John, if you're going to remove stdgamma, you need also to remove its (string) name from the module's __all__ list (right before the _verify() function). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 04:38 Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-17 03:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? 
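The fix makes gammavariate usable across the 0 < alpha < 1 range that previously failed; a quick sanity check in modern Python (where this patch's behavior has long been merged), exploiting the fact that Gamma(alpha, beta) samples are always positive and have mean alpha*beta:

```python
import random

random.seed(12345)
# alpha < 0.5 was the broken region reported in bug #527139
samples = [random.gammavariate(0.3, 1.0) for _ in range(1000)]
mean = sum(samples) / len(samples)
print(min(samples) > 0, mean)  # mean should be near 0.3
```

This is only a smoke test, not the accuracy study John describes; as he notes, small-alpha accuracy is a harder problem than mere positivity.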
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Mon Mar 18 13:08:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 05:08:25 -0800 Subject: [Patches] [ python-Patches-529408 ] fix random.gammavariate bug #527139 Message-ID: Patches item #529408, was opened at 2002-03-13 12:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: John Machin (sjmachin) Assigned to: Tim Peters (tim_one) Summary: fix random.gammavariate bug #527139 Initial Comment: random.gammavariate() doesn't work for gamma < 0.5 See detailed comment on bug # 527139 ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-18 13:08 Message: Logged In: YES user_id=6656 I'm afraid this isn't going to make 2.2.1c1. I'll try to consider it before 2.2.1 final, but I'd want to be very certain about things before applying it there. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-18 13:05 Message: Logged In: YES user_id=480138 Attached is test script test_gamma.py. Passing test means: eye-balling of relative "errors" reveals no nasties for at least alpha >= 0.1 Note that Python's gammavariate() is not very accurate at all for alpha < 0.1 approx. However neither are another two methods that I tried (details in the file). I'll leave it at that -- evidently alpha < 1.0 is "rare and difficult" according to Marsaglia & Tsang. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-18 12:57 Message: Logged In: YES user_id=480138 Patch file random2.dif uploaded. 
stdgamma() deprecated as per TP suggestion. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 05:32 Message: Logged In: YES user_id=31435 John, if I were you I'd leave stdgamma alone, except for adding this code to its start: import warnings warnings.warn("The stdgamma function is deprecated; " "use gammavariate() instead", DeprecationWarning) Then we can remove stdgamma in 2.4. 2.2.1 will probably go out on Monday night, so it would be nice to get this done before then. OTOH, I expect there will be a 2.2.2 later, so not a tragedy if it's not. ---------------------------------------------------------------------- Comment By: John Machin (sjmachin) Date: 2002-03-17 20:46 Message: Logged In: YES user_id=480138 OK; I understand the problems with the patch. Not sure about the way forward -- shall I prepare a patch that just fixes gammavariate() and leaves stdgamma() there (with warning in the comments: deprecated? will be removed in 2.x?)? Do you want it real soon now (for 2.2.1)? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 19:42 Message: Logged In: YES user_id=31435 Michael, this definitely doesn't belong in 2.2.1 as-is, because it removes a currently-exported name (buggy or not, sensible or not, somebody may be using random.stdgamma now and be happy with it). John, if you're going to remove stdgamma, you need also to remove its (string) name from the module's __all__ list (right before the _verify() function). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-16 17:38 Message: Logged In: YES user_id=31435 Possibly, depending on whether it belongs in 2.3 -- I'm spread too thin to review it now. 
---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:53 Message: Logged In: YES user_id=6656 Tim, do you think this should go into 2.2.1? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=529408&group_id=5470 From noreply@sourceforge.net Mon Mar 18 13:45:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 05:45:21 -0800 Subject: [Patches] [ python-Patches-504943 ] call warnings.warn with Warning instance Message-ID: Patches item #504943, was opened at 2002-01-17 11:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: call warnings.warn with Warning instance Initial Comment: This patch makes it possible to pass Warning instances as the first argument to warnings.warn. In this case the category argument will be ignored. The message text used will be str(warninginstance). This makes it possible to implement special logic in a custom Warning class by implementing the __str__ method. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 08:45 Message: Logged In: YES user_id=6380 Nice idea. Where's the documentation patch?
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 From noreply@sourceforge.net Mon Mar 18 13:58:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 05:58:09 -0800 Subject: [Patches] [ python-Patches-499513 ] robotparser.py fails on some URLs Message-ID: Patches item #499513, was opened at 2002-01-04 13:21 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=499513&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Accepted Priority: 5 Submitted By: Bastian Kleineidam (calvin) >Assigned to: Martin v. Löwis (loewis) Summary: robotparser.py fails on some URLs Initial Comment: I am using Python 2.1.1. The URL http://www.chaosreigns.com/robots.txt results in an empty RobotParser object. Reason is that the file object returned from the URLOpener does not have a readlines() attribute. I patched robotparser.py to use readline() instead of readlines(). Furthermore I removed the unnecessary redirection limit code which is already in FancyURLopener. Greetings, Bastian ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 05:43 Message: Logged In: YES user_id=21627 Thanks for the patch. Committed as robotparser.py 1.12. ---------------------------------------------------------------------- Comment By: Bastian Kleineidam (calvin) Date: 2002-01-04 14:02 Message: Logged In: YES user_id=9205 Updated patch with copyright ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-01-04 13:49 Message: Logged In: YES user_id=6380 I'll gladly apply your patch. Would you mind also supplying a patch for the copyright statement? It says "Python 2.0 open source license" but that's no longer the current license.
How about the PSF license agreement for Python 2.2? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=499513&group_id=5470 From noreply@sourceforge.net Mon Mar 18 15:01:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 07:01:43 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared, used different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building shared library disabled by default, while these architectures had it enabled. - it rectifies a small problem on solaris2.8, that makes double inclusion of thread.o (this produces error on 'ld' for shared library). 
---------------------------------------------------------------------- >Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-18 16:01 Message: Logged In: YES user_id=88611 As far as I can see, the problems are: relocation of binary/library path (this is solved by adding -R to LDSHARED depending on platform), and SOVERSION - some systems like it, some do not. If you do SOVERSION, you must create a link to the proper version in the installation phase. IMO we can just avoid versioning at all and let the distribution builders do it themselves. The other way is to attach the full version of python as SOVERSION (e.g. 2.1.1 -> libpython2.1.so.2.1.1). I'm the author of the patch (ppython.diff). I'm not the author of the file dynamic.diff; I have included it here by accident, and if it is possible to delete it from this page, it should be done. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 15:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments to #497102. Also, I would still like to get a clarification as to who is the author of this code. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 17:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I get it correctly that you want: --enable-shared/--enable-static instead of --enable-shared-python, --disable-shared-python - Do you agree with the way it is done in the patch (ppython.diff) or do you propose another way?
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 15:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 12:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments, I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 11:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way, though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 11:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 19:36 Message: Logged In: YES user_id=21627

As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made on #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both PIC and non-PIC objects; please no compiler-specific flags in the makefile; why LD_PRELOAD.

----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 18:09 Message: Logged In: YES user_id=6380

Could you submit the thread.o double-inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment?

P.S. It would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch.
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470

From noreply@sourceforge.net Mon Mar 18 15:46:42 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Mar 2002 07:46:42 -0800
Subject: [Patches] [ python-Patches-504943 ] call warnings.warn with Warning instance
Message-ID:

Patches item #504943, was opened at 2002-01-17 17:50
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470
Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Nobody/Anonymous (nobody)
Summary: call warnings.warn with Warning instance

Initial Comment:
This patch makes it possible to pass Warning instances as the first argument to warnings.warn. In this case the category argument will be ignored. The message text used will be str(warninginstance). This makes it possible to implement special logic in a custom Warning class by implementing the __str__ method.

----------------------------------------------------------------------
>Comment By: Walter Dörwald (doerwalter) Date: 2002-03-18 16:46 Message: Logged In: YES user_id=89016

The new version includes a patch to the documentation and an entry in Misc/NEWS.

----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 14:45 Message: Logged In: YES user_id=6380

Nice idea. Where's the documentation patch?
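The proposed usage above can be sketched as follows. This is a minimal illustration of the idea, not the patch itself: the `FeatureWarning` class and its message text are invented for the example, and `warnings.catch_warnings` is the modern capture mechanism rather than anything from the 2002 patch. (In today's Python the category is taken from the instance's class rather than being ignored outright.)

```python
import warnings

class FeatureWarning(UserWarning):
    """Hypothetical warning class that builds its message in __str__."""
    def __init__(self, feature):
        self.feature = feature

    def __str__(self):
        # The "special logic" lives here: the message text is computed lazily.
        return "feature %r is going away" % self.feature

# Pass a Warning *instance* as the first argument; no category argument needed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn(FeatureWarning("spam"))

print(str(caught[0].message))   # feature 'spam' is going away
```

The recorded `message` attribute is the warning instance itself, so `str()` of it invokes the custom `__str__`.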
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470

From noreply@sourceforge.net Mon Mar 18 18:38:10 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Mar 2002 10:38:10 -0800
Subject: [Patches] [ python-Patches-531480 ] Use new GC API (generators, iters, ...)
Message-ID:

Patches item #531480, was opened at 2002-03-18 18:38
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531480&group_id=5470
Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5
Submitted By: Neil Schemenauer (nascheme)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Use new GC API (generators, iters, ...)

Initial Comment:
I just noticed that iterators, generators and method objects are still using the old GC API. I thought I fixed these. Is it possible that branch merging backed out the changes? Maybe my memory is bad. Anyhow, this patch restores GC of these objects.

----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531480&group_id=5470

From noreply@sourceforge.net Mon Mar 18 19:31:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Mar 2002 11:31:19 -0800
Subject: [Patches] [ python-Patches-531491 ] PEP 4 update: deprecations
Message-ID:

Patches item #531491, was opened at 2002-03-18 14:31
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531491&group_id=5470
Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5
Submitted By: Barry Warsaw (bwarsaw)
Assigned to: Martin v. Löwis (loewis)
Summary: PEP 4 update: deprecations

Initial Comment:
The following modules should be deprecated for Python 2.3: mimify.py, rfc822.py, MIMEWriter.py, and mimetools.py.
All are supplanted by Python 2.2's email package. Attached is the proposed mod to PEP 4 as per procedure described therein. I am not including mods to the module documents as those should be easy to add. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531491&group_id=5470 From noreply@sourceforge.net Mon Mar 18 19:37:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 11:37:04 -0800 Subject: [Patches] [ python-Patches-531493 ] drop PyCore_* API layer Message-ID: Patches item #531493, was opened at 2002-03-18 19:37 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531493&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Tim Peters (tim_one) Summary: drop PyCore_* API layer Initial Comment: I think we need to sort out the pymalloc situation using smaller steps. I already checked in the first step. This patch is the next step and removes the PyCore_* API layer. Vladimir said it could go and I agree. 
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531493&group_id=5470

From noreply@sourceforge.net Mon Mar 18 19:41:10 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Mar 2002 11:41:10 -0800
Subject: [Patches] [ python-Patches-473586 ] SimpleXMLRPCServer - fixes and CGI
Message-ID:

Patches item #473586, was opened at 2001-10-22 00:26
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=473586&group_id=5470
Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5
Submitted By: Brian Quinlan (bquinlan)
Assigned to: Fredrik Lundh (effbot)
Summary: SimpleXMLRPCServer - fixes and CGI

Initial Comment:
Changes:
o treats xmlrpclib.Fault exceptions correctly (no longer absorbs them as generic exceptions)
o changed failed marshal to generate a useful Fault instead of an internal server error
o adds a new class to make writing XML-RPC functions embedded in other servers, using CGI, easier (tested with Apache)
o to support the above, added a new dispatch helper class, SimpleXMLRPCDispatcher

----------------------------------------------------------------------
>Comment By: Brian Quinlan (bquinlan) Date: 2002-03-18 11:41 Message: Logged In: YES user_id=108973

OK, I fixed the backwards-compatibility problem. Also added:
o support for the XML-RPC introspection methods system.listMethods and system.methodHelp
o support for the XML-RPC boxcarring method system.multicall

----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan) Date: 2001-12-04 11:51 Message: Logged In: YES user_id=108973

Please do not accept this patch until past the 2.2 release; there are some non-backwards-compatible changes that need to be thought through.
----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan) Date: 2001-10-23 11:02 Message: Logged In: YES user_id=108973

- a few extra comments
- moved an xmlrpclib.loads() call inside an exception handler so an XML-RPC fault is generated for malformed requests

----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan) Date: 2001-10-22 11:59 Message: Logged In: YES user_id=108973

The advantage of the entire patch being accepted before 2.2 is that there is an API change and, once 2.2 is released, we will probably have to make a bit of an attempt to maintain backwards compatibility. If this patch is too high-risk for 2.2, then I can certainly design a bug-fix patch for 2.2 and submit a new patch for 2.3 (that is API compatible with 2.2).

----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2001-10-22 11:43 Message: Logged In: YES user_id=21627

Brian, please note that Python 2.2b1 has been released, so no new features are acceptable until after 2.2. So unless Fredrik Lundh wants to accept your entire patch, I think it has little chance to get integrated for the next few months. If you want pieces of it accepted, I'd recommend splitting it into bug fixes and new features; bug fixes are still acceptable.

----------------------------------------------------------------------
Comment By: Brian Quinlan (bquinlan) Date: 2001-10-22 11:27 Message: Logged In: YES user_id=108973

I just can't stop mucking with it. This time there are only documentation changes. I should also have pointed out that this patch changes the mechanism for overriding the dispatch mechanism: you used to subclass the request handler, now you subclass the server. I believe that this change is correct because the server actually has the required state information to do the dispatching.
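The "subclass the server/dispatcher, not the request handler" pattern discussed above can be sketched with the modern Python 3 spelling of this module (`xmlrpc.server`). The `LoggingDispatcher` class and its `calls` log are invented for this sketch; only `SimpleXMLRPCDispatcher`, `register_function`, and `_dispatch` come from the standard library.

```python
from xmlrpc.server import SimpleXMLRPCDispatcher

class LoggingDispatcher(SimpleXMLRPCDispatcher):
    """Override dispatch on the dispatcher itself: it holds the function
    registry, so it has the state needed to resolve a method name."""
    def __init__(self):
        super().__init__(allow_none=False, encoding=None)
        self.calls = []                       # hypothetical bookkeeping

    def _dispatch(self, method, params):
        self.calls.append(method)             # record every call, then delegate
        return super()._dispatch(method, params)

dispatcher = LoggingDispatcher()
dispatcher.register_function(lambda a, b: a + b, "add")
result = dispatcher._dispatch("add", (2, 3))
print(result, dispatcher.calls)   # 5 ['add']
```

The same dispatcher object backs both `SimpleXMLRPCServer` and the CGI handler, which is why moving the override here makes it reusable across transports.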
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2001-10-22 00:35 Message: Logged In: YES user_id=108973 Changed a name to fit other naming conventions ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=473586&group_id=5470 From noreply@sourceforge.net Mon Mar 18 20:09:24 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 12:09:24 -0800 Subject: [Patches] [ python-Patches-531480 ] Use new GC API (generators, iters, ...) Message-ID: Patches item #531480, was opened at 2002-03-18 13:38 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531480&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Guido van Rossum (gvanrossum) Summary: Use new GC API (generators, iters, ...) Initial Comment: I just noticed that iterators, generators and method objects are still using the old GC API. I thought I fixed these. Is it possible that branch merging backed out the changes? Maybe my memory is bad. Anyhow, this patch restores GC of these objects. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 15:09 Message: Logged In: YES user_id=6380 Oops. I suggest you check this in, and mark it as a 2.2 bugfix candidate. We'll see what Michael says. 
:-) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531480&group_id=5470 From noreply@sourceforge.net Mon Mar 18 20:30:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 12:30:21 -0800 Subject: [Patches] [ python-Patches-531493 ] drop PyCore_* API layer Message-ID: Patches item #531493, was opened at 2002-03-18 14:37 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531493&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) >Assigned to: Neil Schemenauer (nascheme) Summary: drop PyCore_* API layer Initial Comment: I think we need to sort out the pymalloc situation using smaller steps. I already checked in the first step. This patch is the next step and removes the PyCore_* API layer. Vladimir said it could go and I agree. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-18 15:30 Message: Logged In: YES user_id=31435 I like it too. +1. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531493&group_id=5470 From noreply@sourceforge.net Mon Mar 18 20:47:40 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 12:47:40 -0800 Subject: [Patches] [ python-Patches-531480 ] Use new GC API (generators, iters, ...) 
Message-ID:

Patches item #531480, was opened at 2002-03-18 18:38
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531480&group_id=5470
Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5
Submitted By: Neil Schemenauer (nascheme)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Use new GC API (generators, iters, ...)

Initial Comment:
I just noticed that iterators, generators and method objects are still using the old GC API. I thought I fixed these. Is it possible that branch merging backed out the changes? Maybe my memory is bad. Anyhow, this patch restores GC of these objects.

----------------------------------------------------------------------
>Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 20:47 Message: Logged In: YES user_id=35752

Checked in but not marked as a bugfix candidate. I don't think this counts as a bug fix. I guess Michael can make up his own mind.

----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 20:09 Message: Logged In: YES user_id=6380

Oops. I suggest you check this in, and mark it as a 2.2 bugfix candidate. We'll see what Michael says.
:-) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531480&group_id=5470 From noreply@sourceforge.net Mon Mar 18 22:28:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 14:28:14 -0800 Subject: [Patches] [ python-Patches-531493 ] drop PyCore_* API layer Message-ID: Patches item #531493, was opened at 2002-03-18 19:37 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531493&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Neil Schemenauer (nascheme) >Assigned to: Tim Peters (tim_one) Summary: drop PyCore_* API layer Initial Comment: I think we need to sort out the pymalloc situation using smaller steps. I already checked in the first step. This patch is the next step and removes the PyCore_* API layer. Vladimir said it could go and I agree. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 22:28 Message: Logged In: YES user_id=35752 Checked in. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-18 20:30 Message: Logged In: YES user_id=31435 I like it too. +1. 
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531493&group_id=5470

From noreply@sourceforge.net Mon Mar 18 23:00:32 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Mar 2002 15:00:32 -0800
Subject: [Patches] [ python-Patches-531629 ] Add multicall support to xmlrpclib
Message-ID:

Patches item #531629, was opened at 2002-03-18 15:00
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531629&group_id=5470
Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5
Submitted By: Brian Quinlan (bquinlan)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add multicall support to xmlrpclib

Initial Comment:
Adds a new object to xmlrpclib that allows the user to boxcar XML-RPC requests, e.g.:

    server_proxy = ServerProxy(...)
    multicall = MultiCall(server_proxy)
    multicall.add(2, 3)
    multicall.get_address("Guido")
    add_result, address = multicall()

see http://www.xmlrpc.com/discuss/msgReader$1208

----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531629&group_id=5470

From noreply@sourceforge.net Mon Mar 18 23:08:15 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 18 Mar 2002 15:08:15 -0800
Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc
Message-ID:

Patches item #530556, was opened at 2002-03-16 00:01
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470
Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5
Submitted By: Neil Schemenauer (nascheme)
Assigned to: Neil Schemenauer (nascheme)
Summary: Enable pymalloc

Initial Comment:
The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but
free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 23:08 Message: Logged In: YES user_id=35752 Update patch to latest CVS. It's now about 1/3 of its original size. We still need documentation for PyMalloc_{New,NewVar,Del}. Other than the docs, the only thing left to do is decide if we want the new API. The situation with extension modules is not as bad as I originally thought. The xxmodule.c example has been correct since version 1.6. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 19:32 Message: Logged In: YES user_id=31435 I certainly want, e.g., that our Unicode implementation can choose to use obmalloc.c for its raw string storage, despite that it isn't "object storage" (in the sense of Vladimir's level "+2" in the diagram at the top of obmalloc.c; the current CVS code restricts obmalloc use to level +2, while raw string storage is at level "+1"). Allowing to use pymalloc at level +1 changes Vladimir's original intent, and we have no experience with it, so I'm fine with restricting that ability to the core at the start. 
About names, we've been calling this package "pymalloc" for years, and the general form of an external name throughout Python is

    ["_"] "Py" Package "_" Function

_PyMalloc_{Malloc, Free, etc} fit that pattern perfectly. I don't see the attraction of giving functions from this package idiosyncratic names, and we've got so many ways to spell "get memory" that I expect it will be a genuine help to keep making it clear, from the name alone, to which "family" a given variant of "new" (etc.) belongs.

----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-17 17:11 Message: Logged In: YES user_id=35752

I'm not sure exactly what Tim meant by that comment. If we want to make PyMalloc available to EXTENSION modules then, yes, we need to remove the leading underscore and make a wrapper for it. I would prefer to keep it private for now since it gives us more freedom in how PyMalloc_New is implemented. Tim?

Regarding the names, I have no problem with Py_Malloc. If we change, should we keep PyMalloc_{New,NewVar,Del}? Py_New seems a little too short.

----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 10:12 Message: Logged In: YES user_id=21627

The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free?

Also, it appears that there is no function wrapper around this allocator: a module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-16 03:50 Message: Logged In: YES user_id=35752

Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc.

There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc.

----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 00:54 Message: Logged In: YES user_id=21627

-1. --with-pymalloc should remain an option; there is still the heuristic for releasing memory that may make people uncomfortable. Also, on systems with a super-efficient malloc, you may not want to use pymalloc.

I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Mon Mar 18 23:23:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 18 Mar 2002 15:23:37 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 00:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Neil Schemenauer (nascheme) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 23:23 Message: Logged In: YES user_id=35752 Oops, forgot one important change in the last update. PyObject_MALLOC needs to use PyMem_MALLOC not _PyMalloc_MALLOC. Clear as mud, no? 
:-) ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 23:08 Message: Logged In: YES user_id=35752 Update patch to latest CVS. It's now about 1/3 of its original size. We still need documentation for PyMalloc_{New,NewVar,Del}. Other than the docs, the only thing left to do is decide if we want the new API. The situation with extension modules is not as bad as I originally thought. The xxmodule.c example has been correct since version 1.6. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 19:32 Message: Logged In: YES user_id=31435 I certainly want, e.g., that our Unicode implementation can choose to use obmalloc.c for its raw string storage, despite that it isn't "object storage" (in the sense of Vladimir's level "+2" in the diagram at the top of obmalloc.c; the current CVS code restricts obmalloc use to level +2, while raw string storage is at level "+1"). Allowing to use pymalloc at level +1 changes Vladimir's original intent, and we have no experience with it, so I'm fine with restricting that ability to the core at the start. About names, we've been calling this package "pymalloc" for years, and the general form of external name throughout Python is ["_"] "Py" Package "_" Function _PyMalloc_{Malloc, Free, etc} fit that pattern perfectly. I don't see the attraction to giving functions from this package idiosyncratic names, and we've got so many ways to spell "get memory" that I expect it will be a genuine help to keep on making it clear, from the name alone, to which "family" a given variant of "new" (etc) belongs. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-17 17:11 Message: Logged In: YES user_id=35752 I'm not sure exactly what Tim meant by that comment. 
If we want to make PyMalloc available to EXTENSION modules then, yes, we need to remove the leading underscore and make a wrapper for it. I would prefer to keep it private for now since it gives us more freedom in how PyMalloc_New is implemented. Tim?

Regarding the names, I have no problem with Py_Malloc. If we change, should we keep PyMalloc_{New,NewVar,Del}? Py_New seems a little too short.

----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 10:12 Message: Logged In: YES user_id=21627

The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free?

Also, it appears that there is no function wrapper around this allocator: a module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled.

----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-16 03:50 Message: Logged In: YES user_id=35752

Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc.

There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc.

----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 00:54 Message: Logged In: YES user_id=21627

-1.
--with-pymalloc should remain an option; there is still the heuristic for releasing memory that may make people uncomfortable. Also, on systems with a super-efficient malloc, you may not want to use pymalloc.

I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break.

----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470

From noreply@sourceforge.net Tue Mar 19 09:24:30 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Tue, 19 Mar 2002 01:24:30 -0800
Subject: [Patches] [ python-Patches-517256 ] poor performance in xmlrpc response
Message-ID:

Patches item #517256, was opened at 2002-02-14 00:48
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470
Category: Library (Lib) Group: Python 2.1.2 Status: Open Resolution: Accepted Priority: 5
Submitted By: James Rucker (jamesrucker)
Assigned to: Fredrik Lundh (effbot)
Summary: poor performance in xmlrpc response

Initial Comment:
xmlrpclib.Transport.parse_response() (called from xmlrpclib.Transport.request()) is exhibiting poor performance - approx. 10x slower than expected. I investigated based on a simple app that sent a msg to a server, where all the server did was return the message back to the caller. From profiling, it became clear that the return trip took 10x the time consumed by the client->server trip, and that the time was spent getting things across the wire.
parse_response() reads from a file object created via socket.makefile(), and as a result exhibits performance that is about an order of magnitude worse than what it would be if socket.recv() were used on the socket. The patch provided uses socket.recv() when possible, to improve performance; it is against revision 1.15. Its use makes the return trip's performance more or less equivalent to that of the forward trip.

----------------------------------------------------------------------
>Comment By: Fredrik Lundh (effbot) Date: 2002-03-19 10:24 Message: Logged In: YES user_id=38376

What server did you use? In all my test setups, h._conn.sock is None at the time parse_response is called...

----------------------------------------------------------------------
Comment By: James Rucker (jamesrucker) Date: 2002-03-17 17:13 Message: Logged In: YES user_id=351540

The problem was discovered under FreeBSD 4.4.

----------------------------------------------------------------------
Comment By: Fredrik Lundh (effbot) Date: 2002-03-17 14:30 Message: Logged In: YES user_id=38376

James, what platform(s) did you use? I'm not sure changing the parse_response() interface is a good idea, but if this is a Windows-only problem, there may be a slightly cleaner way to get the same end result.

----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:14 Message: Logged In: YES user_id=6380

My guess is that makefile() isn't buffering properly. This has been a long-standing problem on Windows; I'm not sure if it's an issue on Unix.

----------------------------------------------------------------------
Comment By: Fredrik Lundh (effbot) Date: 2002-03-01 15:34 Message: Logged In: YES user_id=38376

Looks fine to me. I'll merge it with SLAB changes, and will check it into the 2.3 codebase asap.
(we probably should try to figure out why makefile causes a 10x slowdown too -- xmlrpclib isn't exactly the only client library reading from a buffered socket) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 00:23 Message: Logged In: YES user_id=6380 Fredrik, does this look OK to you? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 From noreply@sourceforge.net Tue Mar 19 10:57:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 02:57:37 -0800 Subject: [Patches] [ python-Patches-511219 ] suppress type restrictions on locals() Message-ID: Patches item #511219, was opened at 2002-01-31 15:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=511219&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Cesar Douady (douady) Assigned to: Nobody/Anonymous (nobody) Summary: suppress type restrictions on locals() Initial Comment: This patch removes the restriction that global and local dictionaries never invoke overloaded __getitem__ and __setitem__ when passed an object derived from class dict. An exception is made for the builtin insertion and reference in the global dict, to make sure this object exists and to spare the derived class from having to handle this implementation-dependent detail. The behavior of eval and exec has been updated for code objects which have the CO_NEWLOCALS flag set: if explicitly passed a local dict, a new local dict is not generated. This allows one to pass an explicit local dict to the code object of a function (which otherwise cannot be achieved).
If this cannot be done because of backward compatibility problems, then an alternative would be to use the "new" module to create a code object from a function with CO_NEWLOCALS reset, but it seems logical to me to use the information explicitly provided. Free and cell variables are not managed in this version. If the patch is accepted, I am willing to finish the job and implement free and cell variables, but this requires a serious rework of the Cell object: free variables should be accessed using the methods of the dict in which they reside, and today this dict is not accessible from the Cell object. Robustness: currently, the plain test suite passes (with a modification of test_desctut, which precisely verifies that the suppressed restriction is enforced). I have introduced a new test (test_subdict.py) which verifies the new behavior. For performance, the plain case (when the local dict is a plain dict) is optimized so that differences in performance are not measurable (within 1%) when run on the test suite (i.e. I timed make test). ---------------------------------------------------------------------- >Comment By: Cesar Douady (douady) Date: 2002-03-19 11:57 Message: Logged In: YES user_id=428521 Granted. Seems fair. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-18 09:59 Message: Logged In: YES user_id=21627 This is quite a complex change. If you want to see it integrated, I recommend that you find people who will try it out and report their experience here.
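The behavior the patch is after can be sketched with a small, invented example. TraceDict below is a hypothetical dict subclass made up for illustration; note that in the 2.2-era interpreter under discussion such overrides were bypassed, whereas modern CPython does route name binding in exec'd code through an overridden __setitem__ on a non-exact-dict locals mapping, which is essentially what this patch proposed:

```python
# Illustrative sketch only: TraceDict is a made-up dict subclass that
# records every name stored through it by exec'd code.

class TraceDict(dict):
    def __init__(self):
        super().__init__()
        self.stored = []          # names bound by the exec'd code, in order

    def __setitem__(self, key, value):
        self.stored.append(key)            # observe the binding...
        super().__setitem__(key, value)    # ...then store it normally

ns = TraceDict()
exec("x = 1\ny = x + 1", {}, ns)
print(ns.stored)   # → ['x', 'y']  (the override saw both bindings)
```

With the restriction in place, the assignments would have gone through the plain dict slot and `ns.stored` would stay empty; that is the observable difference the patch (and its test_subdict.py) is about.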
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=511219&group_id=5470 From noreply@sourceforge.net Tue Mar 19 14:35:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 06:35:21 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with the '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation; on linux the version is currently '0.0', but this can be easily changed - tested on linux, solaris (gcc), tru64 (cc) and HP-UX 11.0 (aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure it didn't break them (someone should check DGUX and BeOS). It also leaves building the shared library disabled by default, whereas these architectures previously had it enabled. - it rectifies a small problem on solaris 2.8 that causes double inclusion of thread.o (this produces an error from 'ld' when linking the shared library). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 15:35 Message: Logged In: YES user_id=21627 The patch looks quite good. There are a number of remaining issues that need to be resolved, though: - please regenerate the patch against the current CVS.
As is, it fails to apply; parts of it are already in the CVS (the thr_create changes) - I think the SOVERSION should be 1.0, at least initially: for most Python releases, there will be only a single release of the shared library, which should be named 1.0. - Why do you think that no rpath is needed on Linux? It is not needed if prefix is /usr, and on many installations, it is also not needed if prefix is /usr/local. For all other configurations, you still need an rpath on Linux. - IMO, there could be a default case, assuming SysV-ish configurations. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-18 16:01 Message: Logged In: YES user_id=88611 As far as I can see, the problems are: relocation of the binary/library path (this is solved by adding -R to LDSHARED depending on the platform); and SOVERSION - some systems like it, some do not. If you do SOVERSION, you must create a link to the proper version in the installation phase. IMO we can just avoid versioning at all and let the distribution builders do it themselves. The other way is to attach the full version of python as the SOVERSION (e.g. 2.1.1 -> libpython2.1.so.2.1.1). I'm the author of the patch (ppython.diff). I'm not the author of the file dynamic.diff; I included it here by accident, and if it is possible to delete it from this page, it should be done. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 15:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments on #497102. Also, I would still like to get a clarification as to who is the author of this code.
---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 17:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I get it correctly that you want --enable-shared/--enable-static instead of --enable-shared-python/--disable-shared-python? Do you agree with the way it is done in the patch (ppython.diff), or do you propose another way? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 15:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 12:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments, I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python's configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 11:52 Message: Logged In: YES user_id=88611 Sorry, I was inspired by the former patch and mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 11:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only.
Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (I think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 19:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made on #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 18:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Tue Mar 19 15:13:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 07:13:46 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 16:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with the '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation; on linux the version is currently '0.0', but this can be easily changed - tested on linux, solaris (gcc), tru64 (cc) and HP-UX 11.0 (aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure it didn't break them (someone should check DGUX and BeOS). It also leaves building the shared library disabled by default, whereas these architectures previously had it enabled. - it rectifies a small problem on solaris 2.8 that causes double inclusion of thread.o (this produces an error from 'ld' when linking the shared library). ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-19 15:13 Message: Logged In: YES user_id=10327 A SOVERSION of 0.0 makes perfect sense for the CVS head. Release versions should probably use 1.0. I don't quite know, though, if builds from CVS should keep a fixed SOVERSION -- after all, the API can change.
One idea would be to use the tip version number of Doc/api/api.tex, i.e. libpython2.3.so.0.154 or libpython2.3.154.so.0.0. That way, installing a newer CVS version won't instantly break everything people have built with it. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 14:35 Message: Logged In: YES user_id=21627 The patch looks quite good. There are a number of remaining issues that need to be resolved, though: - please regenerate the patch against the current CVS. As is, it fails to apply; parts of it are already in the CVS (the thr_create changes) - I think the SOVERSION should be 1.0, at least initially: for most Python releases, there will be only a single release of the shared library, which should be named 1.0. - Why do you think that no rpath is needed on Linux? It is not needed if prefix is /usr, and on many installations, it is also not needed if prefix is /usr/local. For all other configurations, you still need an rpath on Linux. - IMO, there could be a default case, assuming SysV-ish configurations. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-18 15:01 Message: Logged In: YES user_id=88611 As far as I can see, the problems are: relocation of the binary/library path (this is solved by adding -R to LDSHARED depending on the platform); and SOVERSION - some systems like it, some do not. If you do SOVERSION, you must create a link to the proper version in the installation phase. IMO we can just avoid versioning at all and let the distribution builders do it themselves. The other way is to attach the full version of python as the SOVERSION (e.g. 2.1.1 -> libpython2.1.so.2.1.1). I'm the author of the patch (ppython.diff). I'm not the author of the file dynamic.diff; I included it here by accident, and if it is possible to delete it from this page, it should be done.
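Ondrej's full-version idea from the comment above (2.1.1 -> libpython2.1.so.2.1.1) amounts to a trivial naming rule; the helper below is invented purely to make the mapping concrete:

```python
# Sketch of the naming scheme suggested above: attach the full
# interpreter version as the SOVERSION. soname_for() is a hypothetical
# helper, not part of any patch.
def soname_for(version):
    major_minor = ".".join(version.split(".")[:2])   # "2.1.1" -> "2.1"
    return "libpython%s.so.%s" % (major_minor, version)

print(soname_for("2.1.1"))   # → libpython2.1.so.2.1.1
```

The installation phase would then still need the symlink Ondrej mentions, from the unversioned link-time name (libpython2.1.so) to the versioned file.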
---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 14:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments on #497102. Also, I would still like to get a clarification as to who is the author of this code. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 16:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I get it correctly that you want --enable-shared/--enable-static instead of --enable-shared-python/--disable-shared-python? Do you agree with the way it is done in the patch (ppython.diff), or do you propose another way? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 14:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 11:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments, I completely disagree on libtool - only over my dead body. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 10:52 Message: Logged In: YES user_id=88611 Sorry, I was inspired by the former patch and mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU).
I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 10:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (I think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 18:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally? The same comments that I made on #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 17:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process.
I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Tue Mar 19 15:53:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 07:53:46 -0800 Subject: [Patches] [ python-Patches-531901 ] binary packagers Message-ID: Patches item #531901, was opened at 2002-03-19 15:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) Assigned to: Nobody/Anonymous (nobody) Summary: binary packagers Initial Comment: zip file with updated Solaris and HP-UX packagers. Replaces 415226, 415227, 415228. Changes made to take advantage of new PEP241 changes in the Distribution class. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 From noreply@sourceforge.net Tue Mar 19 16:13:53 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 08:13:53 -0800 Subject: [Patches] [ python-Patches-531901 ] binary packagers Message-ID: Patches item #531901, was opened at 2002-03-19 15:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) >Assigned to: M.-A. Lemburg (lemburg) Summary: binary packagers Initial Comment: zip file with updated Solaris and HP-UX packagers. Replaces 415226, 415227, 415228. 
Changes made to take advantage of the new PEP 241 changes in the Distribution class. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 From noreply@sourceforge.net Tue Mar 19 17:05:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 09:05:05 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with the '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation; on linux the version is currently '0.0', but this can be easily changed - tested on linux, solaris (gcc), tru64 (cc) and HP-UX 11.0 (aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure it didn't break them (someone should check DGUX and BeOS). It also leaves building the shared library disabled by default, whereas these architectures previously had it enabled. - it rectifies a small problem on solaris 2.8 that causes double inclusion of thread.o (this produces an error from 'ld' when linking the shared library). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 18:05 Message: Logged In: YES user_id=21627 The CVS version will usually use a completely different library name (e.g. libpython23.so), so there will be no conflicts with prior versions.
---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-19 16:13 Message: Logged In: YES user_id=10327 A SOVERSION of 0.0 makes perfect sense for the CVS head. Release versions should probably use 1.0. I don't quite know, though, if builds from CVS should keep a fixed SOVERSION -- after all, the API can change. One idea would be to use the tip version number of Doc/api/api.tex, i.e. libpython2.3.so.0.154 or libpython2.3.154.so.0.0. That way, installing a newer CVS version won't instantly break everything people have built with it. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 15:35 Message: Logged In: YES user_id=21627 The patch looks quite good. There are a number of remaining issues that need to be resolved, though: - please regenerate the patch against the current CVS. As is, it fails to apply; parts of it are already in the CVS (the thr_create changes) - I think the SOVERSION should be 1.0, at least initially: for most Python releases, there will be only a single release of the shared library, which should be named 1.0. - Why do you think that no rpath is needed on Linux? It is not needed if prefix is /usr, and on many installations, it is also not needed if prefix is /usr/local. For all other configurations, you still need an rpath on Linux. - IMO, there could be a default case, assuming SysV-ish configurations. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-18 16:01 Message: Logged In: YES user_id=88611 As far as I can see, the problems are: relocation of the binary/library path (this is solved by adding -R to LDSHARED depending on the platform); and SOVERSION - some systems like it, some do not. If you do SOVERSION, you must create a link to the proper version in the installation phase.
IMO we can just avoid versioning at all and let the distribution builders do it themselves. The other way is to attach the full version of python as the SOVERSION (e.g. 2.1.1 -> libpython2.1.so.2.1.1). I'm the author of the patch (ppython.diff). I'm not the author of the file dynamic.diff; I included it here by accident, and if it is possible to delete it from this page, it should be done. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 15:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments on #497102. Also, I would still like to get a clarification as to who is the author of this code. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 17:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I get it correctly that you want --enable-shared/--enable-static instead of --enable-shared-python/--disable-shared-python? Do you agree with the way it is done in the patch (ppython.diff), or do you propose another way? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 15:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-08 12:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments, I completely disagree on libtool - only over my dead body.
libtool is broken, and it is a good thing that Python's configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 11:52 Message: Logged In: YES user_id=88611 Sorry, I was inspired by the former patch and mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatible with other makes (not only GNU). I'll try to learn libtool and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 11:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (I think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 19:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who created that code originally?
The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 18:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Tue Mar 19 17:14:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 09:14:01 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 16:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. 
- enables building shared python with the '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation; on linux the version is currently '0.0', but this can be easily changed - tested on linux, solaris (gcc), tru64 (cc) and HP-UX 11.0 (aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure it didn't break them (someone should check DGUX and BeOS). It also leaves building the shared library disabled by default, whereas these architectures previously had it enabled. - it rectifies a small problem on solaris 2.8 that causes double inclusion of thread.o (this produces an error from 'ld' when linking the shared library). ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-19 17:14 Message: Logged In: YES user_id=10327 This is exactly the problem -- if today's libpython23.so replaces last week's libpython23.so, then everything I built during the last week is going to break if the ABI changes. That's why I think that incorporating the version number from api.tex is a good idea -- call me an optimist, but I think that any change will be documented. ;-) This kind of problem is NOT pretty. I went through it a few years ago when the GNU libc transitioned to versioned linking. It managed to cause a LOT of almost-intractable incompatibilities during that time, and I don't care at all to repeat that experience with Python. :-( ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 17:05 Message: Logged In: YES user_id=21627 The CVS version will usually use a completely different library name (e.g. libpython23.so), so there will be no conflicts with prior versions.
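Matthias's api.tex-derived scheme, mentioned in the comment above, would fold the C API documentation revision into the library name so that CVS builds with different APIs get different sonames. The helper below is invented for illustration only, following his libpython2.3.so.0.154 example:

```python
# Hypothetical sketch of the api.tex-based naming Matthias proposes:
# combine the Python version with the minor part of the Doc/api/api.tex
# CVS revision (e.g. revision "1.154" -> suffix "0.154").
def cvs_soname(py_version, api_revision):
    api_minor = api_revision.split(".")[-1]   # "1.154" -> "154"
    return "libpython%s.so.0.%s" % (py_version, api_minor)

print(cvs_soname("2.3", "1.154"))   # → libpython2.3.so.0.154
```

Each documented API change would bump the revision and hence the soname, so binaries linked against an older CVS build keep resolving to the library they were built with.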
---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-19 15:13 Message: Logged In: YES user_id=10327 A SOVERSION of 0.0 makes perfect sense for the CVS head. Release versions should probably use 1.0. I don't quite know, though, if builds from CVS should keep a fixed SOVERSION -- after all, the API can change. One idea would be to use the tip version number of Doc/api/api.tex, i.e. libpython2.3.so.0.154 or libpython2.3.154.so.0.0. That way, installing a newer CVS version won't instantly break everything people have built with it. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 14:35 Message: Logged In: YES user_id=21627 The patch looks quite good. There are a number of remaining issues that need to be resolved, though:
- please regenerate the patch against the current CVS. As is, it fails to apply; parts of it are already in the CVS (the thr_create changes)
- I think the SOVERSION should be 1.0, at least initially: for most Python releases, there will be only a single release of the shared library, which should be named 1.0.
- Why do you think that no rpath is needed on Linux? It is not needed if prefix is /usr, and on many installations, it is also not needed if prefix is /usr/local. For all other configurations, you still need a rpath on Linux.
- IMO, there could be a default case, assuming SysV-ish configurations.
---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-18 15:01 Message: Logged In: YES user_id=88611 As far as I can see, the problems are: relocation of binary/library path (this is solved by adding -R to LDSHARED depending on platform); SOVERSION - some systems like it, some do not. If you do SOVERSION, you must create a link to the proper version in the installation phase. 
IMO we can just avoid versioning at all and let the distribution builders do it themselves. The other way is to attach the full version of Python as SOVERSION (e.g. 2.1.1 -> libpython2.1.so.2.1.1). I'm the author of the patch (ppython.diff). I'm not the author of the file dynamic.diff; I included it here by accident, and if it is possible to delete it from this page, it should be done. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 16:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 14:05 Message: Logged In: YES user_id=21627 Yes, that is all right. The approach, in general, is also good, but please review my comments to #497102. Also, I would still like to get a clarification as to who is the author of this code. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 16:10 Message: Logged In: YES user_id=88611 Ok, so no libtool. Did I get it correctly that you want --enable-shared/--enable-static instead of --enable-shared-python/--disable-shared-python? Do you agree with the way it is done in the patch (ppython.diff), or do you propose another way? 
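The SOVERSION debate above boils down to the usual install-time link layout for versioned shared libraries: the real file carries the full version, and one or two symlinks provide the soname and the linker name. A minimal sketch, with purely illustrative file names and version numbers (this is not what the patch itself does):

```shell
# Illustrative versioned-install layout, assuming the libpython2.1.so.2.1.1
# naming scheme mentioned above; all names here are examples only.
dir=$(mktemp -d)
cd "$dir"
touch libpython2.1.so.2.1.1                    # the real shared object
ln -s libpython2.1.so.2.1.1 libpython2.1.so.1  # soname link, used at run time
ln -s libpython2.1.so.1 libpython2.1.so        # linker name, used by -lpython2.1
ls -l libpython2.1.so*
```

The point of the indirection is exactly what smurf raises: the soname link can stay stable across compatible rebuilds, while an ABI break bumps it and leaves older binaries pointing at the old library.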
libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 10:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatibile with other makes (not only GNU). I'll try to learn libttool and and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 10:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-07 18:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who wrote created that code originally? 
The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 17:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Tue Mar 19 19:47:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 11:47:14 -0800 Subject: [Patches] [ python-Patches-517256 ] poor performance in xmlrpc response Message-ID: Patches item #517256, was opened at 2002-02-13 15:48 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 Category: Library (Lib) Group: Python 2.1.2 Status: Open Resolution: Accepted Priority: 5 Submitted By: James Rucker (jamesrucker) Assigned to: Fredrik Lundh (effbot) Summary: poor performance in xmlrpc response Initial Comment: xmlrpclib.Transport.parse_response() (called from xmlrpclib.Transport.request()) is exhibiting poor performance - approx. 10x slower than expected. I investigated based on using a simple app that sent a msg to a server, where all the server did was return the message back to the caller. 
From profiling, it became clear that the return trip took 10x the time consumed by the client->server trip, and that the time was spent getting things across the wire. parse_response() reads from a file object created via socket.makefile(), and as a result exhibits performance that is about an order of magnitude worse than what it would be if socket.recv() were used on the socket. The patch provided uses socket.recv() when possible, to improve performance. The patch provided is against revision 1.15. Its use provides performance for the return trip that is more or less equivalent to that of the forward trip. ---------------------------------------------------------------------- >Comment By: James Rucker (jamesrucker) Date: 2002-03-19 11:47 Message: Logged In: YES user_id=351540 HTTPConnection.getresponse() will close the socket and set self.sock to null after instantiating response_class (by default, this is HTTPResponse; note that HTTPResponse does a makefile() and stores the result in self.fp) iff the newly created response class instance's 'will_close' attribute is true. My server is setting the Keep-alive header with a value of 1 (it is based on xmlrpcserver.py), which causes will_close to evaluate to false. In your case, I'm presuming that will_close is being evaluated as false and thus the socket (accessed via h._conn.sock) has been set to None. Note that when I removed the Keep-alive header, I witnessed the behaviour you're seeing. Thus, it seems that as it stands, the benefit of the change will only be realized if Keep-alive is set or HTTP/1.1 is used (and Keep-alive is either not specified or is set to non-zero). The following from httplib.py shows and explains how 'will_close' will be set:

    conn = self.msg.getheader('connection')
    if conn:
        conn = conn.lower()
        # a "Connection: close" will always close the connection. if we
        # don't see that and this is not HTTP/1.1, then the connection will
        # close unless we see a Keep-Alive header.
        self.will_close = conn.find('close') != -1 or \
                          ( self.version != 11 and \
                            not self.msg.getheader('keep-alive') )
    else:
        # for HTTP/1.1, the connection will always remain open
        # otherwise, it will remain open IFF we see a Keep-Alive header
        self.will_close = self.version != 11 and \
                          not self.msg.getheader('keep-alive')

---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-19 01:24 Message: Logged In: YES user_id=38376 What server did you use? In all my test setups, h._conn.sock is None at the time parse_response is called... ---------------------------------------------------------------------- Comment By: James Rucker (jamesrucker) Date: 2002-03-17 08:13 Message: Logged In: YES user_id=351540 The problem was discovered under FreeBSD 4.4. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-17 05:30 Message: Logged In: YES user_id=38376 James, what platform(s) did you use? I'm not sure changing the parse_response() interface is a good idea, but if this is a Windows-only problem, there may be a slightly cleaner way to get the same end result. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 08:14 Message: Logged In: YES user_id=6380 My guess is that makefile() isn't buffering properly. This has been a long-standing problem on Windows; I'm not sure if it's an issue on Unix. ---------------------------------------------------------------------- Comment By: Fredrik Lundh (effbot) Date: 2002-03-01 06:34 Message: Logged In: YES user_id=38376 Looks fine to me. I'll merge it with SLAB changes, and will check it into the 2.3 codebase asap. 
(we probably should try to figure out why makefile causes a 10x slowdown too -- xmlrpclib isn't exactly the only client library reading from a buffered socket) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-02-28 15:23 Message: Logged In: YES user_id=6380 Fredrik, does this look OK to you? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=517256&group_id=5470 From noreply@sourceforge.net Tue Mar 19 22:28:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 14:28:46 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 14:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 07:28:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 23:28:57 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 23:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend saving the result of the first computation. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 07:35:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 23:35:37 -0800 Subject: [Patches] [ python-Patches-531901 ] binary packagers Message-ID: Patches item #531901, was opened at 2002-03-19 16:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) Assigned to: M.-A. 
Lemburg (lemburg) Summary: binary packagers Initial Comment: zip file with updated Solaris and HP-UX packagers. Replaces 415226, 415227, 415228. Changes made to take advantage of new PEP241 changes in the Distribution class. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:35 Message: Logged In: YES user_id=21627 Which of the three attached files is the right one (19633, 19634, or 19635)? Unless they are all needed, we should delete the extra copies. I recommend applying PEP 2 to this patch: A library PEP is needed (which could be quite short), documentation, perhaps test cases. Most importantly, there must be an identified maintainer of these modules. Are you willing to act as the maintainer? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 From noreply@sourceforge.net Wed Mar 20 07:41:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 19 Mar 2002 23:41:33 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library.
- enables building shared python with '--enable-shared-python' configuration option
- builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed
- tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). 
It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure if it didn't break them (someone should check DGUX and BeOS). It also makes building the shared library disabled by default, while these architectures had it enabled.
- it rectifies a small problem on solaris2.8 that caused double inclusion of thread.o (this produces an error from 'ld' for the shared library).
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:41 Message: Logged In: YES user_id=21627 The API version is maintained in modsupport.h:API_VERSION. I'm personally not concerned about breakage of the API during the development of a new release. Absolutely no breakage should occur in maintenance releases. After all, a maintenance release will replace pythonxy.dll on Windows with no protection against API breakage; thus, it is a bug if the API changes in a maintenance release. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Wed Mar 20 14:49:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 06:49:08 -0800 Subject: [Patches] [ python-Patches-531901 ] binary packagers Message-ID: Patches item #531901, was opened at 2002-03-19 15:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) Assigned to: M.-A. 
Lemburg (lemburg) Summary: binary packagers Initial Comment: zip file with updated Solaris and HP-UX packagers. Replaces 415226, 415227, 415228. Changes made to take advantage of new PEP241 changes in the Distribution class. ---------------------------------------------------------------------- >Comment By: Mark Alexander (mwa) Date: 2002-03-20 14:49 Message: Logged In: YES user_id=12810 Any of the three (they're all the same). SourceForge hiccuped during the upload, and I don't have permission to delete the duplicates. I don't exactly understand what you mean by applying PEP 2. I uploaded this per Marc Lemburg's request for the latest versions of patches 41522[6-8]. He's acting as the integrator in this case (see http://mail.python.org/pipermail/distutils-sig/2001-December/002659.html). I let him know about the duplicate uploads, so hopefully he'll correct it. If you can and want, feel free to delete the 2 of your choice. I agree they need to be documented. As soon as I can, I'll submit changes to the Distutils documentation. Finally, yes, I'll act as maintainer. I'm on the Distutils-sig, and as soon as some other poor soul who has to deal with Solaris or HP-UX tries them, I'm there to work out issues. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 From noreply@sourceforge.net Wed Mar 20 15:02:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 07:02:12 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appears in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). 
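Tim's suggestion is easy to prototype: the repr() of any finite float consists only of digits, a sign, a period, and possibly an exponent marker, so scanning for 'n'/'N' catches 'inf', 'nan', 'INF', and Windows' '1.#INF'/'1.#IND' spellings alike. A sketch of the idea (the helper name dump_double is hypothetical, not xmlrpclib's actual marshalling code):

```python
def dump_double(value):
    # Hypothetical helper illustrating Tim's check: no "normal" float
    # repr contains the letter n, but every infinity/NaN spelling does.
    text = repr(value)
    if 'n' in text or 'N' in text:
        raise ValueError("cannot marshal %s in XML-RPC" % text)
    return "<double>%s</double>" % text

print(dump_double(1.5))
print(dump_double(-2.25))
```

The check costs one string scan and needs no platform-specific knowledge of how the C library spells non-finite values.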
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 15:35:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 07:35:35 -0800 Subject: [Patches] [ python-Patches-531901 ] binary packagers Message-ID: Patches item #531901, was opened at 2002-03-19 16:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) Assigned to: M.-A. Lemburg (lemburg) Summary: binary packagers Initial Comment: zip file with updated Solaris and HP-UX packagers. Replaces 415226, 415227, 415228. Changes made to take advantage of new PEP241 changes in the Distribution class. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 16:35 Message: Logged In: YES user_id=21627 You volunteering as the maintainer is part of the prerequisites of accepting new modules, when following PEP 2, see http://python.sourceforge.net/peps/pep-0002.html It says: "developers ... will first form a group of maintainers. Then, this group shall produce a PEP called a library PEP." So existence of a PEP describing these library extensions would be a prerequisite for accepting them. If MAL wants to waive this requirement, it would be fine with me. However, such a PEP could also share text with the documentation, so it might not be wasted effort. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 From noreply@sourceforge.net Wed Mar 20 16:03:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 08:03:02 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 23:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 17:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 16:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output.
Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 16:23:42 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 08:23:42 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given.
What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 11:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 02:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation.
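Tim's repr()-based check can be sketched in a few lines of Python. Here `dump_double` is a hypothetical stand-in for xmlrpclib's marshalling routine, not the code from the actual patch; it also computes repr() only once, per Martin's note about the duplicated computation:

```python
def dump_double(value):
    # Compute repr() once and reuse it.
    r = repr(value)
    # IEEE specials render with an 'n' or 'N' in them ('inf',
    # 'INF', 'NaN', '1.#IND'), while no finite float's repr does.
    if 'n' in r or 'N' in r:
        raise ValueError("cannot marshal %s in XML-RPC" % r)
    return "<double>%s</double>" % r
```

This rejects infinities and NaNs on any platform, regardless of how the local C library spells them.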
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 17:26:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 09:26:04 -0800 Subject: [Patches] [ python-Patches-504943 ] call warnings.warn with Warning instance Message-ID: Patches item #504943, was opened at 2002-01-17 17:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Nobody/Anonymous (nobody) Summary: call warnings.warn with Warning instance Initial Comment: This patch makes it possible to pass Warning instances as the first argument to warnings.warn. In this case the category argument will be ignored. The message text used will be str(warninginstance). This makes it possible to implement special logic in a custom Warning class by implementing the __str__ method. ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-20 18:26 Message: Logged In: YES user_id=89016 Now that I have write access, can I check this in? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-18 16:46 Message: Logged In: YES user_id=89016 The new version includes a patch to the documentation and an entry in Misc/NEWS ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 14:45 Message: Logged In: YES user_id=6380 Nice idea. Where's the documentation patch?
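The behavior this patch introduces can be exercised from Python roughly as below; the `ConfigDeprecationWarning` class is an invented example, not part of the patch itself:

```python
import warnings

class ConfigDeprecationWarning(UserWarning):
    """A Warning subclass that builds its own message via __str__."""
    def __init__(self, option):
        self.option = option

    def __str__(self):
        return "config option %r is deprecated" % self.option

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Passing an instance as the first argument: the category
    # argument is ignored, and str(instance) is the message text.
    warnings.warn(ConfigDeprecationWarning("spam"))
```

The recorded warning's category is the instance's class, and its message text comes from the custom `__str__` method.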
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 From noreply@sourceforge.net Wed Mar 20 17:31:34 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 09:31:34 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 14:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 09:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 08:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:03 Message: Logged In: YES user_id=21627 You are right.
An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 07:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 23:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 17:53:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 09:53:43 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 12:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. 
Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 11:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 02:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 18:08:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 10:08:03 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 13:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . 
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 12:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 11:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't.
It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 02:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 18:21:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 10:21:39 -0800 Subject: [Patches] [ python-Patches-504943 ] call warnings.warn with Warning instance Message-ID: Patches item #504943, was opened at 2002-01-17 11:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 Category: Library (Lib) >Group: Python 2.3 Status: Open >Resolution: Accepted Priority: 5 Submitted By: Walter Dörwald (doerwalter) >Assigned to: Walter Dörwald (doerwalter) Summary: call warnings.warn with Warning instance Initial Comment: This patch makes it possible to pass Warning instances as the first argument to warnings.warn. In this case the category argument will be ignored. The message text used will be str(warninginstance). This makes it possible to implement special logic in a custom Warning class by implementing the __str__ method. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-20 13:21 Message: Logged In: YES user_id=6380 Looks OK. Give it a try.
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-20 12:26 Message: Logged In: YES user_id=89016 Now that I have write access, can I check this in? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-18 10:46 Message: Logged In: YES user_id=89016 The new version includes a patch to the documentation and an entry in Misc/NEWS ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 08:45 Message: Logged In: YES user_id=6380 Nice idea. Where's the documentation patch? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 From noreply@sourceforge.net Wed Mar 20 18:42:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 10:42:49 -0800 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 12:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when import m m.a reported AttributeError: 'module' object has no attribute 'a' The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation.
My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads who both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c. Example for objects with and without __name__ attributes: >>> "".foo Traceback (most recent call last): File "", line 1, in ? AttributeError: str object has no attribute 'foo' >>> import string >>> string.foo Traceback (most recent call last): File "", line 1, in ?
AttributeError: module object 'string' has no attribute 'foo' Skip ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 From noreply@sourceforge.net Wed Mar 20 18:56:04 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 10:56:04 -0800 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 13:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when import m m.a reported AttributeError: 'module' object has no attribute 'a' The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation. My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads who both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c.
Example for objects with and without __name__ attributes: >>> "".foo Traceback (most recent call last): File "", line 1, in ? AttributeError: str object has no attribute 'foo' >>> import string >>> string.foo Traceback (most recent call last): File "", line 1, in ? AttributeError: module object 'string' has no attribute 'foo' Skip ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 13:56 Message: Logged In: YES user_id=31435 I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 From noreply@sourceforge.net Wed Mar 20 18:57:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 10:57:09 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 14:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. 
---------------------------------------------------------------------- >Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 10:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 09:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 09:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 08:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? 
This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 07:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 23:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 19:04:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 11:04:15 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 14:04 Message: Logged In: YES user_id=31435 Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account . I can't parse your question about the C library (like, I don't know what you mean by "decimal format"). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 13:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 13:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 12:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 11:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number".
At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 02:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation. 
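Tim's repr()-based guard from the comments above is small enough to show in full. A sketch, with an illustrative function name (`dump_double` is not the actual xmlrpclib marshalling routine):

```python
def dump_double(value):
    r = repr(value)
    # repr() of every infinity/NaN spelling ('inf', 'INF', '1.#INF',
    # 'NaN', '1.#IND', ...) contains an 'n' or 'N'; no finite float's
    # repr does, so this check is cheap and cross-platform.
    if 'n' in r or 'N' in r:
        raise ValueError("cannot marshal %s in XML-RPC" % r)
    return "<value><double>%s</double></value>" % r
```

Because repr() round-trips exactly, this keeps the original code's property that a marshalled finite double converts back to the identical float.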
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 19:32:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 11:32:28 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 14:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 11:32 Message: Logged In: YES user_id=108973 I think that we should be flexible about the data that we accept but rigorous about the data that we generate. So the sign should always be sent but not required. "decimal format" appears in the Python documentation (http://www.python.org/doc/current/lib/typesseq- strings.html) so it is probably a documentation bug if the meaning is not widely known. I parsed it as "not exponential format". My question was whether the %f Python format specifier simply mapped to the C %f format specifier. But, based on the output of a simple C program, that does not appear to be the case. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:04 Message: Logged In: YES user_id=31435 Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account . I can't parse your question about the C library (like, I don't know what you mean by "decimal format"). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 10:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 09:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 09:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 08:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 07:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 23:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. 
I recommend to save the result of the first computation. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 19:55:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 11:55:03 -0800 Subject: [Patches] [ python-Patches-531901 ] binary packagers Message-ID: Patches item #531901, was opened at 2002-03-19 15:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 Category: Distutils and setup.py Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Alexander (mwa) Assigned to: M.-A. Lemburg (lemburg) Summary: binary packagers Initial Comment: zip file with updated Solaris and HP-UX packagers. Replaces 415226, 415227, 415228. Changes made to take advantage of new PEP241 changes in the Distribution class. ---------------------------------------------------------------------- >Comment By: Mark Alexander (mwa) Date: 2002-03-20 19:55 Message: Logged In: YES user_id=12810 OK, the PEP seems to me to mean most of this is done. These additions are not library modules, they are Distutils "commands". So the way i read it, the Distutils-SIG (where I've been hanging around for some time) are the Maintainers. The documentation will be 2 new chapters for the Distutils manual "Creating Solaris packages" and "Creating HP-UX packages" each looking a whole lot like "Creating RPM packages". Does that clarify anything, or am I still missing a clue? p.s. Thanks for cleaning up the extra uploads! ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-03-20 15:35 Message: Logged In: YES user_id=21627 You volunteering as the maintainer is part of the prerequisites of accepting new modules, when following PEP 2, see http://python.sourceforge.net/peps/pep-0002.html It says: "developers ... will first form a group of maintainers. Then, this group shall produce a PEP called a library PEP." So existence of a PEP describing these library extensions would be a prerequisite for accepting them. If MAL wants to waive this requirement, it would be fine with me. However, such a PEP could also share text with the documentation, so it might not be wasted effort. ---------------------------------------------------------------------- Comment By: Mark Alexander (mwa) Date: 2002-03-20 14:49 Message: Logged In: YES user_id=12810 Any of the three (they're all the same). SourceForge hiccuped during the upload, and I don't have permission to delete the duplicates. I don't exactly understand what you mean by applying PEP 2. I uploaded this per Marc Lemburg's request for the latest versions of patches 41522[6-8]. He's acting as the integrator in this case (see http://mail.python.org/pipermail/distutils-sig/2001-December/002659.html). I let him know about the duplicate uploads, so hopefully he'll correct it. If you can and want, feel free to delete the 2 of your choice. I agree they need to be documented. As soon as I can, I'll submit changes to the Distutils documentation. Finally, yes, I'll act as maintainer. I'm on the Distutils-sig and as soon as some other poor soul who has to deal with Solaris or HP-UX tries them, I'm there to work out issues. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 07:35 Message: Logged In: YES user_id=21627 Which of the three attached files is the right one (19633, 19634, or 19635)? Unless they are all needed, we should delete the extra copies. 
I recommend to apply PEP 2 to this patch: A library PEP is needed (which could be quite short), documentation, perhaps test cases. Most importantly, there must be an identified maintainer of these modules. Are you willing to act as the maintainer? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=531901&group_id=5470 From noreply@sourceforge.net Wed Mar 20 20:13:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 12:13:48 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 15:13 Message: Logged In: YES user_id=31435 If you think XML-RPC users are keen to see multi-hundred character strings produced for ordinary doubles, Python isn't going to be much help (you'll have to write your own float -> string conversion); or if you think they're happy to get an exception if they want to pass (e.g.) 1e20, you can keep using repr() and complain because repr(1e20) produces an exponent. "decimal format" is simply two extremely common words pasted together <+.9 wink>. I expect the Python docs here ended up so vague because whoever wrote this part of the docs didn't know the full story and didn't have time to figure it out. 
But I expect the same is true of the part of this spec dealing with doubles (it doesn't define what it means by "double-precision", and then goes on to say stuff that doesn't make sense for what C or Java mean by double, or by what IEEE-754 means by double precision -- it's off in its own world, so if you take it at face value you'll have to guess what the world is, and implement it yourself). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 14:32 Message: Logged In: YES user_id=108973 I think that we should be flexible about the data that we accept but rigorous about the data that we generate. So the sign should always be send but not required. "decimal format" appears in the Python documentation (http://www.python.org/doc/current/lib/typesseq- strings.html) so it is probably a documentation bug if the meaning is not widely known. I parsed it as "not exponential format". My question was whether the %f Python format specifier simply mapped to the C %f format specifier. But, based on the output of a simple C program, that does not appear to be the case. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 14:04 Message: Logged In: YES user_id=31435 Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account . I can't parse your question about the C library (like, I don't know what you mean by "decimal format"). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 13:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it. 
I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 13:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 12:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 11:03 Message: Logged In: YES user_id=21627 You are right. 
An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 02:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 20:48:23 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 12:48:23 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 14:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Nobody/Anonymous (nobody) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 12:48 Message: Logged In: YES user_id=108973 Ooops, I already wrote the converter (see new patch). I'm not very concerned about sending 300 character strings for large doubles, but I guess someone might be. I am concerned about how large and ugly the code is. XML-RPC is very poorly specified but the grammar for doubles seems reasonably clear (silly, but clear). If you don't like my double marshalling code, could you please just check in your infinity/NaN detection code (also part of my patch)? 
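A converter along the lines Brian describes can be sketched with the modern decimal module (which did not exist in 2002); `format_double_plain` is a hypothetical helper, not the code in Brian's patch. It expands any exponent into the plain decimal-point notation the quoted grammar demands, at the cost of the very long strings discussed above:

```python
from decimal import Decimal

def format_double_plain(value):
    r = repr(value)
    # Infinities and NaNs have no XML-RPC representation at all.
    if 'n' in r or 'N' in r:
        raise ValueError("cannot marshal %s in XML-RPC" % r)
    # Decimal preserves the float's repr exactly; format code 'f'
    # never produces exponent notation.
    s = format(Decimal(r), 'f')
    if '.' not in s:
        s += '.0'  # the grammar requires a period
    return s
```

Note that a value like 1e20 round-trips exactly here (the expansion only moves the decimal point), but the output for 1e-250 is several hundred characters long, which is exactly the trade-off being debated.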
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:13 Message: Logged In: YES user_id=31435 If you think XML-RPC users are keen to see multi-hundred character strings produced for ordinary doubles, Python isn't going to be much help (you'll have to write your own float -> string conversion); or if you think they're happy to get an exception if they want to pass (e.g.) 1e20, you can keep using repr() and complain because repr(1e20) produces an exponent. "decimal format" is simply two extremely common words pasted together <+.9 wink>. I expect the Python docs here ended up so vague because whoever wrote this part of the docs didn't know the full story and didn't have time to figure it out. But I expect the same is true of the part of this spec dealing with doubles (it doesn't define what it means by "double-precision", and then goes on to say stuff that doesn't make sense for what C or Java mean by double, or by what IEEE-754 means by double precision -- it's off in its own world, so if you take it at face value you'll have to guess what the world is, and implement it yourself). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 11:32 Message: Logged In: YES user_id=108973 I think that we should be flexible about the data that we accept but rigorous about the data that we generate. So the sign should always be send but not required. "decimal format" appears in the Python documentation (http://www.python.org/doc/current/lib/typesseq- strings.html) so it is probably a documentation bug if the meaning is not widely known. I parsed it as "not exponential format". My question was whether the %f Python format specifier simply mapped to the C %f format specifier. But, based on the output of a simple C program, that does not appear to be the case. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:04 Message: Logged In: YES user_id=31435 Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account . I can't parse your question about the C library (like, I don't know what you mean by "decimal format"). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 10:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 09:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 09:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 08:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 07:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 23:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. 
I recommend to save the result of the first computation. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Wed Mar 20 21:07:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 13:07:29 -0800 Subject: [Patches] [ python-Patches-532729 ] build (link) fails on Solaris 8-sem_init Message-ID: Patches item #532729, was opened at 2002-03-20 16:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532729&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: build (link) fails on Solaris 8-sem_init Initial Comment: The build fails on Solaris 8 because sem_init() is in -lrt. Attached is a patch which works. Actually, there will be 3 patches. 1 to configure.in, 1 to configure which has many changes (my autoconf must be different than whoever generates configure normally) and a minimal configure diff. Probably would be best to have the correct person generate a new configure. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532729&group_id=5470 From noreply@sourceforge.net Wed Mar 20 21:50:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 13:50:58 -0800 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 12:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when import m m.a reported AttributeError: 'module' object has no attribute 'a' The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation. My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads who both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c. 
Example for objects with and without __name__ attributes: >>> "".foo Traceback (most recent call last): File "", line 1, in ? AttributeError: str object has no attribute 'foo' >>> import string >>> string.foo Traceback (most recent call last): File "", line 1, in ? AttributeError: module object 'string' has no attribute 'foo' Skip ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 15:50 Message: Logged In: YES user_id=44345 hmmm... How much would I have to modify it to get you to change your mind? I'm pretty sure I can get rid of the call to PyObject_HasAttrString without a lot of effort. I can't do much about avoiding at least one PyObject_GetAttrString call though, which obviously means you could wind up back in bytecode. I jumped on this after seeing the request in c.l.py mostly because I've wanted it from time-to-time as well. The extra information is useful at times. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:56 Message: Logged In: YES user_id=31435 I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!). 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 From noreply@sourceforge.net Wed Mar 20 22:53:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 14:53:55 -0800 Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug Message-ID: Patches item #532180, was opened at 2002-03-19 17:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) >Assigned to: Fredrik Lundh (effbot) Summary: fix xmlrpclib float marshalling bug Initial Comment: As it stands now, xmlrpclib can send doubles, such as 1.#INF, that are not part of the XML-RPC standard. This patch causes a ValueError to be raised instead. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 17:53 Message: Logged In: YES user_id=31435 I don't use XML-RPC, so I'm assigning this to /F (it was his code at the start, and he wants to keep it in synch with his company's version). Formatting floats is a difficult job if you pay attention to accuracy. The original code had the property that converting a Python float to an XML-RPC string, then back to a float again, reproduced the original input exactly. The code in the patch enjoys that property only by accident; much of the time a roundtrip conversion using it won't reproduce the number that was passed in. Is that OK? There's no way to tell, since the XML-RPC spec has scant idea what it's doing here, so leaves important questions unanswered. OTOH, it seems to me that the *point* of this protocol is to transport values across boxes, so of course it should move heaven and earth to transport them faithfully. Is it OK that it loses accuracy? 
Is it OK that it produces 16 trailing zeroes for 1e-250? Is it OK that it raises OverflowError for the normal double 1e-300? No matter what's asked, the spec has no answers. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 15:48 Message: Logged In: YES user_id=108973 Ooops, I already wrote the converter (see new patch). I'm not very concerned about sending 300 character strings for large doubles, but I guess someone might be. I am concerned about how large and ugly the code is. XML-RPC is very poorly specified but the grammar for doubles seems reasonably clear (silly, but clear). If you don't like my double marshalling code, you could please just checkin your infinity/NaN detection code (also part of my patch)? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 15:13 Message: Logged In: YES user_id=31435 If you think XML-RPC users are keen to see multi-hundred character strings produced for ordinary doubles, Python isn't going to be much help (you'll have to write your own float -> string conversion); or if you think they're happy to get an exception if they want to pass (e.g.) 1e20, you can keep using repr() and complain because repr(1e20) produces an exponent. "decimal format" is simply two extremely common words pasted together <+.9 wink>. I expect the Python docs here ended up so vague because whoever wrote this part of the docs didn't know the full story and didn't have time to figure it out. But I expect the same is true of the part of this spec dealing with doubles (it doesn't define what it means by "double-precision", and then goes on to say stuff that doesn't make sense for what C or Java mean by double, or by what IEEE-754 means by double precision -- it's off in its own world, so if you take it at face value you'll have to guess what the world is, and implement it yourself). 
----------------------------------------------------------------------

Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-20 14:32

Message:
Logged In: YES
user_id=108973

I think that we should be flexible about the data that we accept but rigorous about the data that we generate. So the sign should always be sent but not required.

"decimal format" appears in the Python documentation (http://www.python.org/doc/current/lib/typesseq-strings.html) so it is probably a documentation bug if the meaning is not widely known. I parsed it as "not exponential format".

My question was whether the %f Python format specifier simply mapped to the C %f format specifier. But, based on the output of a simple C program, that does not appear to be the case.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 14:04

Message:
Logged In: YES
user_id=31435

Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account .

I can't parse your question about the C library (like, I don't know what you mean by "decimal format").

----------------------------------------------------------------------

Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-20 13:57

Message:
Logged In: YES
user_id=108973

Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say:

    f    Floating point decimal format

I wonder if it is the underlying C library refusing to write large float values in decimal format.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 13:08

Message:
Logged In: YES
user_id=31435

Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python.
It's clear as mud whether "the spec" *intended* to outlaw exponent notation.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 12:53

Message:
Logged In: YES
user_id=31435

"%f" can produce exponent notation too, which is also not allowed by this pseudo-spec.

    r = repr(some_double)
    if 'n' in r or 'N' in r:
        raise ValueError(...)

is robust, will work fine x-platform, and isn't insane .

----------------------------------------------------------------------

Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-20 12:31

Message:
Logged In: YES
user_id=108973

Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 11:23

Message:
Logged In: YES
user_id=31435

The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-20 11:03

Message:
Logged In: YES
user_id=21627

You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says

    # There is no representation for infinity or negative
    # infinity or "not a number". At this time, only decimal
    # point notation is allowed, a plus or a minus, followed by
    # any number of numeric characters, followed by a period
    # and any number of numeric characters. Whitespace is not
    # allowed. The range of allowable values is
    # implementation-dependent, is not specified.

That would be best validated with a regular expression.
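The two checks discussed in this thread can be sketched as follows (names and messages here are invented for illustration, not taken from the actual patch):

```python
import re

def check_marshallable(x):
    """Tim's check: the repr of any 'normal' float contains no letter n,
    while 'inf', 'INF', 'NaN' and Windows' '1.#IND' all do."""
    r = repr(x)
    if 'n' in r or 'N' in r:
        raise ValueError("cannot marshal %s in XML-RPC" % r)
    return r

# A literal reading of the quoted grammar: a mandatory sign, any number
# (possibly zero) of digits, a period, and more digits -- which is exactly
# Tim's complaint: it accepts "+." and rejects "1.0".
SPEC_DOUBLE = re.compile(r'^[+-][0-9]*\.[0-9]*$')

assert SPEC_DOUBLE.match('+.')         # absurd, but allowed by the grammar
assert not SPEC_DOUBLE.match('1.0')    # reasonable, but has no sign
```

A practical validator would of course make the sign optional and require at least one digit on each side of the period; the regex above follows the spec's wording to show why taking it at face value is untenable.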
----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 10:02

Message:
Logged In: YES
user_id=31435

Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n).

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-03-20 02:28

Message:
Logged In: YES
user_id=21627

It seems repr of the float is computed twice in every case. I recommend saving the result of the first computation.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470

From noreply@sourceforge.net Wed Mar 20 23:09:36 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 20 Mar 2002 15:09:36 -0800
Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting
Message-ID: 

Patches item #532638, was opened at 2002-03-20 13:42
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Nobody/Anonymous (nobody)
Summary: Better AttributeError formatting

Initial Comment:
A user in c.l.py was confused when

    import m
    m.a

reported

    AttributeError: 'module' object has no attribute 'a'

The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation.
My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute, or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads that both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code.

Perhaps a similar technique can be provided for other error formatting operations in object.c.

Example for objects with and without __name__ attributes:

    >>> "".foo
    Traceback (most recent call last):
      File "", line 1, in ?
    AttributeError: str object has no attribute 'foo'
    >>> import string
    >>> string.foo
    Traceback (most recent call last):
      File "", line 1, in ?
    AttributeError: module object 'string' has no attribute 'foo'

Skip

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-03-20 18:09

Message:
Logged In: YES
user_id=31435

If it's one cycle slower than it is today when the exception is ignored, Zope will notice it (it uses hasattr for blood). Then Guido will get fired, have to pump gas in Amsterdam for a living, and we'll never hear from him again. How badly do you want to destroy Python ? It may be fruitful to hammer out an efficient alternative on PythonDev.

It's not an argument about whether more info would be useful, although on c.l.py Dale seemed happy enough as soon as someone explained what 'module' was doing in his msg.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-03-20 16:50

Message:
Logged In: YES
user_id=44345

hmmm...
How much would I have to modify it to get you to change your mind? I'm pretty sure I can get rid of the call to PyObject_HasAttrString without a lot of effort. I can't do much about avoiding at least one PyObject_GetAttrString call though, which obviously means you could wind up back in bytecode.

I jumped on this after seeing the request in c.l.py mostly because I've wanted it from time to time as well. The extra information is useful at times.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-20 13:56

Message:
Logged In: YES
user_id=31435

I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!).
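Tim's "lazy" idea can be sketched in Python (a hypothetical class for illustration, not Skip's C patch): defer all formatting to __str__, so an AttributeError that is caught and ignored never pays for the nicer message.

```python
class LazyAttributeError(AttributeError):
    """Hypothetical sketch: formatting is deferred until someone actually
    prints the exception, so ignored errors cost nothing extra.  (Per Tim's
    caveat, the object's __name__ could change before __str__ runs.)"""

    def __init__(self, obj, attr):
        self.obj = obj
        self.attr = attr

    def __str__(self):
        # The formatting work happens here, only on demand.
        name = getattr(self.obj, '__name__', None)
        if name is not None:
            return "%s object %r has no attribute %r" % (
                type(self.obj).__name__, name, self.attr)
        return "%s object has no attribute %r" % (
            type(self.obj).__name__, self.attr)


import types
mod = types.ModuleType('string')  # stand-in for an imported module
# str(LazyAttributeError(mod, 'foo')) -> "module object 'string' has no attribute 'foo'"
# str(LazyAttributeError("", 'foo'))  -> "str object has no attribute 'foo'"
```

The message shapes mirror Skip's examples above; the cost argument is that constructing the exception stores only two references, while the string work runs only when the traceback is actually displayed.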
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470

From noreply@sourceforge.net Wed Mar 20 23:24:47 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 20 Mar 2002 15:24:47 -0800
Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug
Message-ID: 

Patches item #532180, was opened at 2002-03-19 14:28
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470

----------------------------------------------------------------------

>Comment By: Brian Quinlan (bquinlan)
Date: 2002-03-20 15:24

Message:
Logged In: YES
user_id=108973

OK, this floating point stuff is over my head.

Is it OK that it loses accuracy? - No
Is it OK that it produces 16 trailing zeroes for 1e-250? - Yes
Is it OK that it raises OverflowError for the normal double 1e-300? - No

Would exposing and using the C %f specifier, along with repr, make for identical roundtrips?

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470

From noreply@sourceforge.net Wed Mar 20 23:55:06 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Wed, 20 Mar 2002 15:55:06 -0800
Subject: [Patches] [ python-Patches-532180 ] fix xmlrpclib float marshalling bug
Message-ID: 

Patches item #532180, was opened at 2002-03-19 17:28
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2002-03-20 18:55

Message:
Logged In: YES
user_id=31435

Python's internal format buffers are too small to use C %f in its full generality, so you're suggesting something there that's much harder to get done than you suspect. Note that %f isn't a cureall anyway, as in either Python or C, e.g., '%f' % 1e-10 throws away all information, producing a string of zeroes. What you did is usually much better than that. Let's wait to hear what /F wants to do.
If he's inclined to take this part of the spec at face value, I can work with him to write a "conforming" float->string that's numerically sound. Else it's a lot of tedious work for no reason. ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 18:24 Message: Logged In: YES user_id=108973 OK, this floating point stuff is over my head. Is it OK that it loses accuracy? - No Is it OK that it produces 16 trailing zeroes for 1e-250? - Yes Is it OK that it raises OverflowError for the normal double 1e-300? - No Would exposing and using the C %f specifier, along with repr, make for identical roundtrips? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 17:53 Message: Logged In: YES user_id=31435 I don't use XML-RPC, so I'm assigning this to /F (it was his code at the start, and he wants to keep it in synch with his company's version). Formatting floats is a difficult job if you pay attention to accuracy. The original code had the property that converting a Python float to an XML-RPC string, then back to a float again, reproduced the original input exactly. The code in the patch enjoys that property only by accident; much of the time a roundtrip conversion using it won't reproduce the number that was passed in. Is that OK? There's no way to tell, since the XML-RPC spec has scant idea what it's doing here, so leaves important questions unanswered. OTOH, it seems to me that the *point* of this porotocol is to transport values across boxes, so of course it should move heaven and earth to transport them faithfully. Is it OK that it loses accuracy? Is it OK that it produces 16 trailing zeroes for 1e-250? Is it OK that it raises OverflowError for the normal double 1e-300? No matter what's asked, the spec has no answers. 
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 15:48 Message: Logged In: YES user_id=108973 Ooops, I already wrote the converter (see new patch). I'm not very concerned about sending 300 character strings for large doubles, but I guess someone might be. I am concerned about how large and ugly the code is. XML-RPC is very poorly specified but the grammar for doubles seems reasonably clear (silly, but clear). If you don't like my double marshalling code, you could please just checkin your infinity/NaN detection code (also part of my patch)? ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 15:13 Message: Logged In: YES user_id=31435 If you think XML-RPC users are keen to see multi-hundred character strings produced for ordinary doubles, Python isn't going to be much help (you'll have to write your own float -> string conversion); or if you think they're happy to get an exception if they want to pass (e.g.) 1e20, you can keep using repr() and complain because repr(1e20) produces an exponent. "decimal format" is simply two extremely common words pasted together <+.9 wink>. I expect the Python docs here ended up so vague because whoever wrote this part of the docs didn't know the full story and didn't have time to figure it out. But I expect the same is true of the part of this spec dealing with doubles (it doesn't define what it means by "double-precision", and then goes on to say stuff that doesn't make sense for what C or Java mean by double, or by what IEEE-754 means by double precision -- it's off in its own world, so if you take it at face value you'll have to guess what the world is, and implement it yourself). 
---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 14:32 Message: Logged In: YES user_id=108973 I think that we should be flexible about the data that we accept but rigorous about the data that we generate. So the sign should always be send but not required. "decimal format" appears in the Python documentation (http://www.python.org/doc/current/lib/typesseq- strings.html) so it is probably a documentation bug if the meaning is not widely known. I parsed it as "not exponential format". My question was whether the %f Python format specifier simply mapped to the C %f format specifier. But, based on the output of a simple C program, that does not appear to be the case. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 14:04 Message: Logged In: YES user_id=31435 Well, Brian, the spec clearly disallows 1.0 too -- if you want to take that spec seriously, you can implement what it says and we'll redirect the complaints to your personal email account . I can't parse your question about the C library (like, I don't know what you mean by "decimal format"). ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 13:57 Message: Logged In: YES user_id=108973 Whether it was intended or not, the spec clearly disallows it. I noticed the %f behavior too, which is interesting because the Python docs say: f Floating point decimal format I wonder if it is the underlying C library refusing to write large float values in decimal format. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 13:08 Message: Logged In: YES user_id=31435 Ack, I take part of that back: it's Python's implementation of '%f' that can produce exponent notation. There's no simple way to get the effect of C's %f from Python. 
It's clear as mud whether "the spec" *intended* to outlaw exponent notation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:53 Message: Logged In: YES user_id=31435 "%f" can produce exponent notation too, which is also not allowed by this pseudo-spec. r = repr(some_double) if 'n' in r or 'N' in r: raise ValueError(...) is robust, will work fine x-platform, and isn't insane . ---------------------------------------------------------------------- Comment By: Brian Quinlan (bquinlan) Date: 2002-03-20 12:31 Message: Logged In: YES user_id=108973 Eric Kidd's XML-RPC C uses sprintf("%f") for marshalling and strtod for unmarshalling. Let me design a more robust patch. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 11:23 Message: Logged In: YES user_id=31435 The spec appears worse than useless to me here -- whoever wrote it just made stuff up. They don't appear to know anything about floats or about grammar specification. Do you really want to allow "+." and disallow "1.0"? This seems a case where the spec is so braindead that nobody (in their mind ) will implement it as given. What do other implementations do? ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-20 11:03 Message: Logged In: YES user_id=21627 You are right. An even better patch would check for compliance with the protocol. Currently, the xmlrpc spec says # There is no representation for infinity or negative # infinity or "not a number". At this time, only decimal # point notation is allowed, a plus or a minus, followed by # any number of numeric characters, followed by a period # and any number of numeric characters. Whitespace is not # allowed. The range of allowable values is # implementation-dependent, is not specified. That would be best validated with a regular expression. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 10:02 Message: Logged In: YES user_id=31435 Note that the patch only catches "the problem" on a platform whose C library can't read back its own float output. Windows is in that class, but many other platforms aren't. It would be better to see whether 'n' or 'N' appear in the repr() (that would catch variations of 'inf', 'INF', 'NaN' and 'IND', while no "normal" float contains n). ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-20 02:28 Message: Logged In: YES user_id=21627 It seems repr of the float is computed twice in every case. I recommend to save the result of the first computation. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532180&group_id=5470 From noreply@sourceforge.net Thu Mar 21 00:36:40 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 16:36:40 -0800 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 18:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when import m m.a reported AttributeError: 'module' object has no attribute 'a' The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation. 
My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ ttribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads who both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c. Example for objects with and without __name__ attributes: >>> "".foo Traceback (most recent call last): File "", line 1, in ? AttributeError: str object has no attribute 'foo' >>> import string >>> string.foo Traceback (most recent call last): File "", line 1, in ? AttributeError: module object 'string' has no attribute 'foo' Skip ---------------------------------------------------------------------- Comment By: Dale Strickland-Clark (dalesc) Date: 2002-03-21 00:36 Message: Logged In: YES user_id=457577 Surely Tim's is more an argument for fixing hasattr so it doesn't depend on an exception? To limit meaningful error messages because they slow normal program flow screams 'bad design' to me. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 23:09 Message: Logged In: YES user_id=31435 If it's one cycle slower than it is today when the exception is ignored, Zope will notice it (it uses hasattr for blood). Then Guido will get fired, have to pump gas in Amsterdam for a living, and we'll never hear from him again. How badly do you want to destroy Python ? It may be fruitful to hammer out an efficient alternative on PythonDev. 
It's not an argument about whether more info would be useful, although on c.l.py Dale seemed happy enough as soon as someone explained what 'module' was doing in his msg. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 21:50 Message: Logged In: YES user_id=44345 hmmm... How much would I have to modify it to get you to change your mind? I'm pretty sure I can get rid of the call to PyObject_HasAttrString without a lot of effort. I can't do much about avoiding at least one PyObject_GetAttrString call though, which obviously means you could wind up back in bytecode. I jumped on this after seeing the request in c.l.py mostly because I've wanted it from time-to-time as well. The extra information is useful at times. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 18:56 Message: Logged In: YES user_id=31435 I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!). 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 From noreply@sourceforge.net Thu Mar 21 01:50:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 17:50:28 -0800 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 12:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when "import m; m.a" reported "AttributeError: 'module' object has no attribute 'a'". The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation. My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads that both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c.
Example for objects with and without __name__ attributes:

>>> "".foo
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: str object has no attribute 'foo'
>>> import string
>>> string.foo
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: module object 'string' has no attribute 'foo'

Skip ---------------------------------------------------------------------- >Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 19:50 Message: Logged In: YES user_id=44345 In theory. Python's getattr capability is so dynamic though I suspect there's little hasattr() can do but call getattr() and react to the result. ---------------------------------------------------------------------- Comment By: Dale Strickland-Clark (dalesc) Date: 2002-03-20 18:36 Message: Logged In: YES user_id=457577 Surely Tim's is more an argument for fixing hasattr so it doesn't depend on an exception? To limit meaningful error messages because they slow normal program flow screams 'bad design' to me. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 17:09 Message: Logged In: YES user_id=31435 If it's one cycle slower than it is today when the exception is ignored, Zope will notice it (it uses hasattr for blood). Then Guido will get fired, have to pump gas in Amsterdam for a living, and we'll never hear from him again. How badly do you want to destroy Python ? It may be fruitful to hammer out an efficient alternative on PythonDev. It's not an argument about whether more info would be useful, although on c.l.py Dale seemed happy enough as soon as someone explained what 'module' was doing in his msg. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 15:50 Message: Logged In: YES user_id=44345 hmmm... How much would I have to modify it to get you to change your mind?
I'm pretty sure I can get rid of the call to PyObject_HasAttrString without a lot of effort. I can't do much about avoiding at least one PyObject_GetAttrString call though, which obviously means you could wind up back in bytecode. I jumped on this after seeing the request in c.l.py mostly because I've wanted it from time-to-time as well. The extra information is useful at times. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 12:56 Message: Logged In: YES user_id=31435 I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470
From noreply@sourceforge.net Thu Mar 21 02:25:11 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 20 Mar 2002 18:25:11 -0800 Subject: [Patches] [ python-Patches-532638 ] Better AttributeError formatting Message-ID: Patches item #532638, was opened at 2002-03-20 13:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: Better AttributeError formatting Initial Comment: A user in c.l.py was confused when "import m; m.a" reported "AttributeError: 'module' object has no attribute 'a'". The attached patch displays the object's name in the error message if it has a __name__ attribute. This is a bit tricky because of the recursive nature of looking up an attribute during a getattr operation. My solution was to pull the error formatting code into a separate static routine (the same basic thing happens in three places) and define a static variable there that breaks any recursion. While this might not be thread-safe, I think it's okay in this situation. The worst that should happen is you get either an extra round of recursion while looking up a non-existent __name__ attribute or fail to even check for __name__ and use the default formatting when the object actually has a __name__ attribute. This can only happen if you have two threads that both get attribute errors at the same time, and then only if the process of looking things up takes you back into Python code. Perhaps a similar technique can be provided for other error formatting operations in object.c. Example for objects with and without __name__ attributes:

>>> "".foo
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: str object has no attribute 'foo'
>>> import string
>>> string.foo
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: module object 'string' has no attribute 'foo'

Skip ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-20 21:25 Message: Logged In: YES user_id=31435 hasattr() is defined in terms of whether PyObject_GetAttr() raises an exception, and thanks to __getattr__ hooks can't be computed any faster than calling PyObject_GetAttr(). Which is what the code does:

    v = PyObject_GetAttr(v, name);
    if (v == NULL) {
        PyErr_Clear();
        Py_INCREF(Py_False);
        return Py_False;
    }
    Py_DECREF(v);
    Py_INCREF(Py_True);
    return Py_True;

It's simply not going to get faster than that. I'm not saying you can't have a "better" message here (although since an object's __name__ field doesn't bear any necessary relationship to the variable name(s) through which the object is referenced, it's unclear that the message won't actually be worse in real non-trivial cases: the type name is an object invariant, but the name can be misleading). I am saying the tradeoff is real and needs to be addressed. That's part of "good design", Dale; doing what feels good in the last case you remember is arguably not. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 20:50 Message: Logged In: YES user_id=44345 In theory. Python's getattr capability is so dynamic though I suspect there's little hasattr() can do but call getattr() and react to the result. ---------------------------------------------------------------------- Comment By: Dale Strickland-Clark (dalesc) Date: 2002-03-20 19:36 Message: Logged In: YES user_id=457577 Surely Tim's is more an argument for fixing hasattr so it doesn't depend on an exception? To limit meaningful error messages because they slow normal program flow screams 'bad design' to me.
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 18:09 Message: Logged In: YES user_id=31435 If it's one cycle slower than it is today when the exception is ignored, Zope will notice it (it uses hasattr for blood). Then Guido will get fired, have to pump gas in Amsterdam for a living, and we'll never hear from him again. How badly do you want to destroy Python ? It may be fruitful to hammer out an efficient alternative on PythonDev. It's not an argument about whether more info would be useful, although on c.l.py Dale seemed happy enough as soon as someone explained what 'module' was doing in his msg. ---------------------------------------------------------------------- Comment By: Skip Montanaro (montanaro) Date: 2002-03-20 16:50 Message: Logged In: YES user_id=44345 hmmm... How much would I have to modify it to get you to change your mind? I'm pretty sure I can get rid of the call to PyObject_HasAttrString without a lot of effort. I can't do much about avoiding at least one PyObject_GetAttrString call though, which obviously means you could wind up back in bytecode. I jumped on this after seeing the request in c.l.py mostly because I've wanted it from time-to-time as well. The extra information is useful at times. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-20 13:56 Message: Logged In: YES user_id=31435 I'm -1 on this because of the expense: many apps routinely provoke AttributeErrors that are deliberately ignored. All the time that goes into making nice messages is wasted then. A "lazy" exception object that produced a string only when actually needed would be fine (although perhaps an object may manage to change its computed __name__ by then!). 
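Tim's C snippet translates almost line for line into Python. A sketch of that equivalence (note: the 2.x C code shown above cleared *any* error with PyErr_Clear(); the sketch below narrows that to AttributeError for clarity):

```python
def hasattr_sketch(obj, name):
    """Python rendering of the quoted C code: hasattr() simply calls
    getattr() and maps failure to False -- there is no cheaper test,
    because __getattr__ hooks mean the answer can only be discovered
    by attempting the lookup."""
    try:
        getattr(obj, name)
    except AttributeError:
        return False
    return True

print(hasattr_sketch([], "append"))    # True
print(hasattr_sketch([], "appendix"))  # False
```

This is why any extra work done while *constructing* the AttributeError message is pure overhead for every hasattr()-style probe that ignores the exception.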
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532638&group_id=5470 From noreply@sourceforge.net Thu Mar 21 10:25:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 02:25:20 -0800 Subject: [Patches] [ python-Patches-526840 ] PEP 263 Implementation Message-ID: Patches item #526840, was opened at 2002-03-07 09:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Martin v. Löwis (loewis) Assigned to: M.-A. Lemburg (lemburg) Summary: PEP 263 Implementation Initial Comment: The attached patch implements PEP 263. The following differences to the PEP (rev. 1.8) are known: - The implementation interprets "ASCII compatible" as meaning "bytes below 128 always denote ASCII characters", although this property is only used for ",', and \. There have been other readings of "ASCII compatible", so this should probably be elaborated in the PEP. - The check whether all bytes follow the declared or system encoding (including comments and string literals) is only performed if the encoding is "ascii". ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-21 11:25 Message: Logged In: YES user_id=21627 Version 2 of this patch implements revision 1.11 of the PEP (phase 1). The check of the complete source file for compliance with the declared encoding is implemented by decoding the input line-by-line; I believe that for all supported encodings, this is not different compared to decoding the entire source file at once. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2002-03-07 19:24 Message: Logged In: YES user_id=21627 Changing the decoding functions will not result in one additional function, but in two of them: you'll also get PyUnicode_DecodeRawUnicodeEscapeFromUnicode. That seems quite unmaintainable to me: any change now needs to propagate into four functions. OTOH, I don't think that the code that allows parsing variable-sized strings is overly complicated. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 19:01 Message: Logged In: YES user_id=38388 Ok, I've had a look at the patch. It looks good except for the overly complicated implementation of the unicode-escape codec. Even though there's a bit of code duplication, I'd prefer to have two separate functions here: one for the standard char* pointer type and another one for Py_UNICODE*, ie. PyUnicode_DecodeUnicodeEscape(char*...) and PyUnicode_DecodeUnicodeEscapeFromUnicode(Py_UNICODE*...) This is easier to support and gives better performance since the compiler can optimize the two functions making different assumptions. You'll also need to include a name mangling at the top of the header for the new API. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 15:06 Message: Logged In: YES user_id=6380 I've set the group to Python 2.3 so the priority has some context (I'd rather you move the priority down to 5 but I understand this is your personal priority). I haven't accepted the PEP yet (although I expect I will), so please don't check this in yet (if you feel it needs to be saved in CVS, use a branch). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-03-07 12:06 Message: Logged In: YES user_id=38388 Thank you ! I'll add a note to the PEP about the way the first two lines are processed (removing the ASCII mention...).
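The "first two lines" processing the PEP specifies can be sketched with a simplified version of the PEP's published pattern. This is an illustrative Python sketch (the real implementation runs in the C tokenizer on raw bytes, and the helper name here is invented):

```python
import re

# Simplified form of the coding-declaration pattern from PEP 263;
# the real rule also constrains where in the comment it may appear.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

def declared_encoding(line1, line2=""):
    """Return the encoding declared in the first or second line of a
    source file, or None.  Only comment lines are searched, and a
    non-comment first line ends the search."""
    for line in (line1, line2):
        if not line.startswith("#"):
            return None
        m = CODING_RE.search(line)
        if m:
            return m.group(1)
    return None

print(declared_encoding("# -*- coding: iso-8859-1 -*-"))            # iso-8859-1
print(declared_encoding("#!/usr/bin/env python", "# coding: utf-8"))  # utf-8
```

Checking the whole file against the declared encoding, as Martin describes, then amounts to decoding each subsequent line with the declared codec and treating a decode error as a syntax error.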
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-07 10:11 Message: Logged In: YES user_id=21627 A note on the implementation strategy: it turned out that communicating the encoding into the abstract syntax was the biggest challenge. To solve this, I introduced an encoding_decl pseudo node: it is an unused non-terminal whose STR() is the encoding, and whose only child is the true root of the syntax tree. As such, it is the only non-terminal which has a STR value. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=526840&group_id=5470 From noreply@sourceforge.net Thu Mar 21 10:40:09 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 02:40:09 -0800 Subject: [Patches] [ python-Patches-504943 ] call warnings.warn with Warning instance Message-ID: Patches item #504943, was opened at 2002-01-17 17:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: Walter Dörwald (doerwalter) Summary: call warnings.warn with Warning instance Initial Comment: This patch makes it possible to pass Warning instances as the first argument to warnings.warn. In this case the category argument will be ignored. The message text used will be str(warninginstance). This makes it possible to implement special logic in a custom Warning class by implementing the __str__ method.
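A minimal sketch of the behavior the patch describes, assuming the patched warnings module (this is how the feature still works in later Python versions: the instance's class becomes the category, and the message text is str(instance); the warning class and names below are invented for illustration):

```python
import warnings

class ApiDeprecation(Warning):
    """Hypothetical custom warning carrying structured data, with the
    message text computed in __str__ as the patch allows."""
    def __init__(self, old, new):
        self.old, self.new = old, new

    def __str__(self):
        return "%s() is deprecated; use %s()" % (self.old, self.new)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # No category argument: the Warning instance supplies everything.
    warnings.warn(ApiDeprecation("urlopen", "open_url"))

print(caught[0].category.__name__)  # ApiDeprecation
print(str(caught[0].message))       # urlopen() is deprecated; use open_url()
```

The design point is that a plain string message forces the caller to format everything up front, while a Warning instance can defer formatting to __str__ and keep the structured fields available to filters and handlers.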
---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-21 11:40 Message: Logged In: YES user_id=89016 Checked in as: Lib/warnings.py 1.10 Doc/lib/libwarnings.tex 1.8 ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-20 19:21 Message: Logged In: YES user_id=6380 Looks OK. Give it a try. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-20 18:26 Message: Logged In: YES user_id=89016 Now that I have write access can I check this in? ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2002-03-18 16:46 Message: Logged In: YES user_id=89016 The new version includes a patch to the documentation and an entry in Misc/NEWS ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-18 14:45 Message: Logged In: YES user_id=6380 Nice idea. Where's the documentation patch? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504943&group_id=5470 From noreply@sourceforge.net Thu Mar 21 11:08:03 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 03:08:03 -0800 Subject: [Patches] [ python-Patches-523415 ] Explicit proxies for urllib.urlopen() Message-ID: Patches item #523415, was opened at 2002-02-27 14:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Andy Gimblett (gimbo) >Assigned to: Fred L. Drake, Jr.
(fdrake) Summary: Explicit proxies for urllib.urlopen() Initial Comment: This patch extends urllib.urlopen() so that proxies may be specified explicitly. This is achieved by adding an optional "proxies" parameter. If this parameter is omitted, urlopen() acts exactly as before, i.e. gets proxy settings from the environment. This is useful if you want to tell urlopen() not to use the proxy: just pass an empty dictionary. Also included is a patch to the urllib documentation explaining the new parameter. Apologies if patch format is not exactly as required: this is my first submission. All feedback appreciated. :-) ---------------------------------------------------------------------- >Comment By: Andy Gimblett (gimbo) Date: 2002-03-21 11:08 Message: Logged In: YES user_id=262849 OK, have updated docs as suggested by aimacintyre, attached as urllib_proxies_docs.cdiff I also added an example for explicit proxy specification, since it illustrates how the proxies dictionary should be structured. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-10 05:31 Message: Logged In: YES user_id=250749 I think expanding the docs is the go here. In looking at the 2.2 docs (11.4 urllib), the bits that I think could usefully be improved include:

- the paragraph describing the proxy environment variables should note that on Windows, browser (at least for Internet Explorer - I don't know about Netscape) registry settings for proxies will be used when available;
- a short para noting that proxies can be overridden using URLopener/FancyURLopener class instances, documented further down the page, placed just before the note about not supporting authenticating proxies;
- adding a description of the "proxies" parameter to the URLopener class definition;
- adding an example of bypassing proxies to the examples subsection (11.4.2).
If/when you upload a doc patch, I suggest that you assign it to Fred Drake, who is the chief docs person. ---------------------------------------------------------------------- Comment By: Andy Gimblett (gimbo) Date: 2002-03-04 09:33 Message: Logged In: YES user_id=262849 Thanks for feedback re: diffs. Have now found out about context diffs and attached new version - hope this is better. Regarding the patch itself, this arose out of a newbie question on c.l.py and I was reminded that this was an issue I'd come across in my early days too. Personally I'd never picked up the hint that you should use FancyURLopener directly. If preferred, I could have a go at patching the docs to make that clearer? ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 03:34 Message: Logged In: YES user_id=250749 BTW, the patch guidelines indicate a strong preference for context diffs with unified diffs a poor second. ---------------------------------------------------------------------- Comment By: Andrew I MacIntyre (aimacintyre) Date: 2002-03-03 03:32 Message: Logged In: YES user_id=250749 Having just looked at this myself, I can understand where you're coming from, however my reading between the lines of the docs is that if you care about the proxies then you are supposed to use urllib.FancyURLopener (or urllib.URLopener) directly. If this is the intent, the docs could be a little clearer about this. 
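The shape of the "proxies" mapping the patch adds is the same scheme-keyed dictionary urllib already builds from the environment. A sketch (the proxy host is hypothetical, and the urlopen() calls are shown commented because they are the patched Python 2.x API and need network access):

```python
# Scheme -> proxy URL: the mapping urllib normally derives from
# http_proxy / ftp_proxy environment variables when no explicit
# 'proxies' argument is given.
explicit = {"http": "http://www.someproxy.com:3128/"}   # hypothetical proxy
no_proxy = {}   # empty dict: tell urlopen() to use no proxy at all

# With the patch applied, under Python 2.x:
#   import urllib
#   f = urllib.urlopen("http://www.python.org/", proxies=explicit)
#   g = urllib.urlopen("http://www.python.org/", proxies=no_proxy)
```

Passing the empty dictionary is the documented way to bypass environment proxy settings entirely, which is otherwise awkward without constructing a URLopener/FancyURLopener by hand.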
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=523415&group_id=5470 From noreply@sourceforge.net Thu Mar 21 11:09:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 03:09:37 -0800 Subject: [Patches] [ python-Patches-533008 ] specifying headers for extensions Message-ID: Patches item #533008, was opened at 2002-03-21 12:09 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533008&group_id=5470 Category: Distutils and setup.py Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Thomas Heller (theller) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: specifying headers for extensions Initial Comment: This patch makes it possible to specify that C header files are part of source files for dependency checking. The 'sources' list in Extension instances can be simple filenames as before, but they can also be SourceFile instances created by SourceFile("myfile.c", headers=["inc1.h", "inc2.h"]). Unfortunately, not only did changes to command.build_ext and command.build_clib have to be made; all the ccompiler (sub)classes had to be changed as well, because the ccompiler does the actual dependency checking. I updated all the ccompiler subclasses except mwerkscompiler.py, but only msvccompiler has actually been tested. The argument list which dep_util.newer_pairwise() now accepts has changed: the first arg must now be a sequence of SourceFile instances. This may be problematic; it would IMO be better to move this function (with a new name?) into ccompiler.
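The dependency rule the patch wires into the compilers can be sketched independently of distutils. This is a hypothetical helper illustrating the idea (not the patch's code): an object file needs rebuilding if it is missing, or older than its source or any header the SourceFile declares.

```python
import os

def needs_rebuild(obj_file, source, headers=()):
    """True if obj_file is missing, or older than the source file or
    any of its declared headers -- the check that listing headers in
    SourceFile("myfile.c", headers=["inc1.h", "inc2.h"]) enables."""
    if not os.path.exists(obj_file):
        return True
    obj_mtime = os.path.getmtime(obj_file)
    deps = (source,) + tuple(headers)
    return any(os.path.getmtime(dep) > obj_mtime for dep in deps)
```

For example, needs_rebuild("myfile.o", "myfile.c", ["inc1.h", "inc2.h"]) becomes true as soon as either header is touched, which is exactly what plain filename-based dependency checking misses.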
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533008&group_id=5470 From noreply@sourceforge.net Thu Mar 21 13:25:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 05:25:10 -0800 Subject: [Patches] [ python-Patches-533070 ] Silence AIX C Compiler Warnings. Message-ID: Patches item #533070, was opened at 2002-03-21 13:25 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Ralph Corderoy (ralph) Assigned to: Nobody/Anonymous (nobody) Summary: Silence AIX C Compiler Warnings. Initial Comment: AIX 3.2.5 C compiler gives warnings during compile of Objects/object.c and Modules/signalmodule.c due to superfluous use of the ampersand address operator in front of a function name. Since the code elsewhere consistently uses plain `foo' to represent a pointer to the function foo and not `&foo' it seems best to make the code consistent and silence these warnings at the same time. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 From noreply@sourceforge.net Thu Mar 21 13:29:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 05:29:31 -0800 Subject: [Patches] [ python-Patches-533070 ] Silence AIX C Compiler Warnings. Message-ID: Patches item #533070, was opened at 2002-03-21 13:25 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None >Priority: 3 Submitted By: Ralph Corderoy (ralph) >Assigned to: Michael Hudson (mwh) Summary: Silence AIX C Compiler Warnings. 
Initial Comment: AIX 3.2.5 C compiler gives warnings during compile of Objects/object.c and Modules/signalmodule.c due to superfluous use of the ampersand address operator in front of a function name. Since the code elsewhere consistently uses plain `foo' to represent a pointer to the function foo and not `&foo' it seems best to make the code consistent and silence these warnings at the same time. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 From noreply@sourceforge.net Thu Mar 21 15:13:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 07:13:08 -0800 Subject: [Patches] [ python-Patches-532729 ] build (link) fails on Solaris 8-sem_init Message-ID: Patches item #532729, was opened at 2002-03-20 22:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532729&group_id=5470 Category: Build Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) >Assigned to: Martin v. Löwis (loewis) Summary: build (link) fails on Solaris 8-sem_init Initial Comment: The build fails on Solaris 8 because sem_init() is in -lrt. Attached is a patch which works. Actually, there will be 3 patches: 1 to configure.in, 1 to configure which has many changes (my autoconf must be different than whoever generates configure normally) and a minimal configure diff. Probably would be best to have the correct person generate a new configure. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-21 16:13 Message: Logged In: YES user_id=21627 Committed as configure 1.289; configure.in 1.299; pyconfig.h.in 1.24. Python currently uses autoconf 2.13; 2.12 should also work.
autoconf 2.50 is a quite different beast - even though the resulting configure should work fine, it has many macros changed and thus results in huge differences to the CVS configure. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532729&group_id=5470 From noreply@sourceforge.net Thu Mar 21 15:48:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 07:48:55 -0800 Subject: [Patches] [ python-Patches-532729 ] build (link) fails on Solaris 8-sem_init Message-ID: Patches item #532729, was opened at 2002-03-20 16:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532729&group_id=5470 Category: Build Group: Python 2.3 Status: Closed Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Martin v. Löwis (loewis) Summary: build (link) fails on Solaris 8-sem_init Initial Comment: The build fails on Solaris 8 because sem_init() is in -lrt. Attached is a patch which works. Actually, there will be 3 patches: 1 to configure.in, 1 to configure which has many changes (my autoconf must be different than whoever generates configure normally) and a minimal configure diff. Probably would be best to have the correct person generate a new configure. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-21 10:48 Message: Logged In: YES user_id=33168 That's odd:

    autoconf --version
    Autoconf version 2.13
    uname -a
    Linux epoch 2.4.7-10 #1 Thu Sep 6 16:46:36 EDT 2001 i686 unknown

Oh well. Thanks Martin. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-21 10:13 Message: Logged In: YES user_id=21627 Committed as configure 1.289; configure.in 1.299; pyconfig.h.in 1.24. Python currently uses autoconf 2.13; 2.12 should also work.
autoconf 2.50 is a quite different beast - even though the resulting configure should work fine, it has many macros changed and thus results in huge differences to the CVS configure. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=532729&group_id=5470 From noreply@sourceforge.net Thu Mar 21 16:40:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 08:40:49 -0800 Subject: [Patches] [ python-Patches-533165 ] add expected test failures on solaris 8 Message-ID: Patches item #533165, was opened at 2002-03-21 11:40 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470 Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: add expected test failures on solaris 8 Initial Comment: This patch makes the following skipped tests expected on sunos5: test_al test_bsddb test_cd test_cl test_gl test_imgfile test_linuxaudiodev test_nis test_openpty test_winreg test_winsound I'll try to fix the problem that sunos5 should really be something like sunos5.6, 5.7, 5.8, etc. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470 From noreply@sourceforge.net Thu Mar 21 17:17:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 09:17:16 -0800 Subject: [Patches] [ python-Patches-533070 ] Silence AIX C Compiler Warnings. 
Message-ID: Patches item #533070, was opened at 2002-03-21 08:25 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 3 Submitted By: Ralph Corderoy (ralph) Assigned to: Michael Hudson (mwh) Summary: Silence AIX C Compiler Warnings. Initial Comment: AIX 3.2.5 C compiler gives warnings during compile of Objects/object.c and Modules/signalmodule.c due to superfluous use of the ampersand address operator in front of a function name. Since the code elsewhere consistently uses plain `foo' to represent a pointer to the function foo and not `&foo' it seems best to make the code consistent and silence these warnings at the same time. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-21 12:17 Message: Logged In: YES user_id=33168 Builds for me without warnings on Linux gcc 2.96 & solaris 8, gcc 2.95.3. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 From noreply@sourceforge.net Thu Mar 21 18:17:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 10:17:44 -0800 Subject: [Patches] [ python-Patches-533070 ] Silence AIX C Compiler Warnings. Message-ID: Patches item #533070, was opened at 2002-03-21 13:25 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 3 Submitted By: Ralph Corderoy (ralph) Assigned to: Michael Hudson (mwh) Summary: Silence AIX C Compiler Warnings. Initial Comment: AIX 3.2.5 C compiler gives warnings during compile of Objects/object.c and Modules/signalmodule.c due to superfluous use of the ampersand address operator in front of a function name. 
Since the code elsewhere consistently uses plain `foo' to represent a pointer to the function foo and not `&foo' it seems best to make the code consistent and silence these warnings at the same time. ---------------------------------------------------------------------- >Comment By: Ralph Corderoy (ralph) Date: 2002-03-21 18:17 Message: Logged In: YES user_id=911 Dear nnorwitz, I'm aware that gcc doesn't issue the warning. However, AIX 3.2.5's C compiler does. And the source consistently omits the ampersand elsewhere. So there seems little reason not to make the change and increase the number of `clean' builds out there. Cheers, Ralph. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-21 17:17 Message: Logged In: YES user_id=33168 Builds for me without warnings on Linux gcc 2.96 & solaris 8, gcc 2.95.3. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 From noreply@sourceforge.net Thu Mar 21 18:25:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 10:25:22 -0800 Subject: [Patches] [ python-Patches-533070 ] Silence AIX C Compiler Warnings. Message-ID: Patches item #533070, was opened at 2002-03-21 08:25 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 3 Submitted By: Ralph Corderoy (ralph) Assigned to: Michael Hudson (mwh) Summary: Silence AIX C Compiler Warnings. Initial Comment: AIX 3.2.5 C compiler gives warnings during compile of Objects/object.c and Modules/signalmodule.c due to superfluous use of the ampersand address operator in front of a function name. 
Since the code elsewhere consistently uses plain `foo' to represent a pointer to the function foo and not `&foo' it seems best to make the code consistent and silence these warnings at the same time. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-21 13:25 Message: Logged In: YES user_id=33168 I'm sorry, you misunderstood. I agree that the patch should be applied. I was reporting that there were no problems created on other platforms. -- Neal ---------------------------------------------------------------------- Comment By: Ralph Corderoy (ralph) Date: 2002-03-21 13:17 Message: Logged In: YES user_id=911 Dear nnorwitz, I'm aware that gcc doesn't issue the warning. However, AIX 3.2.5's C compiler does. And the source consistently omits the ampersand elsewhere. So there seems little reason not to make the change and increase the number of `clean' builds out there. Cheers, Ralph. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-21 12:17 Message: Logged In: YES user_id=33168 Builds for me without warnings on Linux gcc 2.96 & solaris 8, gcc 2.95.3. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533070&group_id=5470 From noreply@sourceforge.net Fri Mar 22 06:59:07 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 21 Mar 2002 22:59:07 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-15 19:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Neil Schemenauer (nascheme) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-22 01:59 Message: Logged In: YES user_id=31435 Neil, I'm in favor of forcing this issue: check it in now, while we're still far from the first 2.3 alpha. People will gripe, but that will give them the motivation to help too. 
It's not going to go anywhere if we wait for all answers to all issues in advance (it's been in that limbo state for a couple years already ...). Note that I already made pymalloc the default on Windows. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 18:23 Message: Logged In: YES user_id=35752 Oops, forgot one important change in the last update. PyObject_MALLOC needs to use PyMem_MALLOC not _PyMalloc_MALLOC. Clear as mud, no? :-) ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 18:08 Message: Logged In: YES user_id=35752 Update patch to latest CVS. It's now about 1/3 of its original size. We still need documentation for PyMalloc_{New,NewVar,Del}. Other than the docs, the only thing left to do is decide if we want the new API. The situation with extension modules is not as bad as I originally thought. The xxmodule.c example has been correct since version 1.6. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-17 14:32 Message: Logged In: YES user_id=31435 I certainly want, e.g., that our Unicode implementation can choose to use obmalloc.c for its raw string storage, despite that it isn't "object storage" (in the sense of Vladimir's level "+2" in the diagram at the top of obmalloc.c; the current CVS code restricts obmalloc use to level +2, while raw string storage is at level "+1"). Allowing to use pymalloc at level +1 changes Vladimir's original intent, and we have no experience with it, so I'm fine with restricting that ability to the core at the start. About names, we've been calling this package "pymalloc" for years, and the general form of external name throughout Python is ["_"] "Py" Package "_" Function _PyMalloc_{Malloc, Free, etc} fit that pattern perfectly. 
I don't see the attraction to giving functions from this package idiosyncratic names, and we've got so many ways to spell "get memory" that I expect it will be a genuine help to keep on making it clear, from the name alone, to which "family" a given variant of "new" (etc) belongs.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme) Date: 2002-03-17 12:11
Message: Logged In: YES user_id=35752
I'm not sure exactly what Tim meant by that comment. If we want to make PyMalloc available to EXTENSION modules then, yes, we need to remove the leading underscore and make a wrapper for it. I would prefer to keep it private for now since it gives us more freedom on how PyMalloc_New is implemented. Tim? Regarding the names, I have no problem with Py_Malloc. If we change, should we keep PyMalloc_{New,NewVar,Del}? Py_New seems a little too short.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 05:12
Message: Logged In: YES user_id=21627
The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free? Also, it appears that there is no function wrapper around this allocator: a module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme) Date: 2002-03-15 22:50
Message: Logged In: YES user_id=35752
Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use.
People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc. There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 19:54
Message: Logged In: YES user_id=21627
-1. --with-pymalloc should remain an option; there is still the heuristic in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc. I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break.

----------------------------------------------------------------------

You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470

From noreply@sourceforge.net Fri Mar 22 08:04:39 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 22 Mar 2002 00:04:39 -0800
Subject: [Patches] [ python-Patches-533482 ] small seek tweak upon reads (gzip)
Message-ID: 

Patches item #533482, was opened at 2002-03-22 03:04
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470
Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5
Submitted By: Todd Warner (icode)
Assigned to: Nobody/Anonymous (nobody)
Summary: small seek tweak upon reads (gzip)

Initial Comment: Upon actual read of a gzipped file, there is a check to see if you are already at the end of the file. This is done by saving your position, seeking to the end, and comparing that tell().
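[Editorial note: the seek-to-the-end EOF check described above can be sketched as follows. This is a minimal stand-alone illustration, not gzip.py's actual code; the function name is made up.]

```python
import gzip
import io

# Build a small gzip stream in memory so the sketch is self-contained.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(b"hello world")
buf.seek(0)

def at_eof_by_seeking(fileobj):
    """The check described above: save the current position, seek to
    the end, compare tell(), then restore the position."""
    pos = fileobj.tell()
    fileobj.seek(0, 2)        # 2 == os.SEEK_END: jump to end of file
    end = fileobj.tell()
    fileobj.seek(pos)         # restore the saved position
    return pos == end

assert not at_eof_by_seeking(buf)   # fresh stream: not at EOF
buf.seek(0, 2)
assert at_eof_by_seeking(buf)       # after seeking to the end: at EOF
```

The patch's point is that this costs two extra seeks per read; tracking the position yourself is cheaper.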
It is more efficient to simply increment the position by 1. The efficiency gain is nearly insignificant, but this patch will greatly decrease the size of my next one. :) NOTE: all versions of gzip.py do this.

----------------------------------------------------------------------

You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470
From noreply@sourceforge.net Fri Mar 22 15:20:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 07:20:36 -0800 Subject: [Patches] [ python-Patches-533621 ] Remove pymalloc hooks Message-ID: Patches item #533621, was opened at 2002-03-22 15:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Tim Peters (tim_one) Summary: Remove pymalloc hooks Initial Comment: Just to make sure Vladimir hates me. :-) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 From noreply@sourceforge.net Fri Mar 22 17:10:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 09:10:41 -0800 Subject: [Patches] [ python-Patches-530556 ] Enable pymalloc Message-ID: Patches item #530556, was opened at 2002-03-16 00:01 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Neil Schemenauer (nascheme) Summary: Enable pymalloc Initial Comment: The attached patch removes the PyCore_* memory management layer and gives up on the hope that PyObject_DEL() will ever be anything but free(). pymalloc is given a visible API in the form of PyMalloc_Malloc, PyMalloc_Realloc, PyMalloc_Free. A new object memory interface is implemented on top of pymalloc in the form of PyMalloc_{New,NewVar,Del}. Those are ugly names. Please suggest alternatives. Some objects are changed to use pymalloc. The GC memory functions are changed to use pymalloc. The configure support for enabling pymalloc was also removed. 
Perhaps that should be left in so people can disable pymalloc on low memory machines. I left typeobject using the system allocator (new style classes will not use pymalloc). Fixing that is probably a job for Guido. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-22 17:10 Message: Logged In: YES user_id=35752 A slightly modified version of this patch has been checked in. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-22 06:59 Message: Logged In: YES user_id=31435 Neil, I'm in favor of forcing this issue: check it in now, while we're still far from the first 2.3 alpha. People will gripe, but that will give them the motivation to help too. It's not going to go anywhere if we wait for all answers to all issues in advance (it's been in that limbo state for a couple years already ...). Note that I already made pymalloc the default on Windows. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 23:23 Message: Logged In: YES user_id=35752 Oops, forgot one important change in the last update. PyObject_MALLOC needs to use PyMem_MALLOC not _PyMalloc_MALLOC. Clear as mud, no? :-) ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-18 23:08 Message: Logged In: YES user_id=35752 Update patch to latest CVS. It's now about 1/3 of its original size. We still need documentation for PyMalloc_{New,NewVar,Del}. Other than the docs, the only thing left to do is decide if we want the new API. The situation with extension modules is not as bad as I originally thought. The xxmodule.c example has been correct since version 1.6. 
----------------------------------------------------------------------

Comment By: Tim Peters (tim_one) Date: 2002-03-17 19:32
Message: Logged In: YES user_id=31435
I certainly want, e.g., that our Unicode implementation can choose to use obmalloc.c for its raw string storage, despite that it isn't "object storage" (in the sense of Vladimir's level "+2" in the diagram at the top of obmalloc.c; the current CVS code restricts obmalloc use to level +2, while raw string storage is at level "+1"). Allowing to use pymalloc at level +1 changes Vladimir's original intent, and we have no experience with it, so I'm fine with restricting that ability to the core at the start. About names, we've been calling this package "pymalloc" for years, and the general form of external name throughout Python is ["_"] "Py" Package "_" Function. _PyMalloc_{Malloc, Free, etc} fit that pattern perfectly. I don't see the attraction to giving functions from this package idiosyncratic names, and we've got so many ways to spell "get memory" that I expect it will be a genuine help to keep on making it clear, from the name alone, to which "family" a given variant of "new" (etc) belongs.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme) Date: 2002-03-17 17:11
Message: Logged In: YES user_id=35752
I'm not sure exactly what Tim meant by that comment. If we want to make PyMalloc available to EXTENSION modules then, yes, we need to remove the leading underscore and make a wrapper for it. I would prefer to keep it private for now since it gives us more freedom on how PyMalloc_New is implemented. Tim? Regarding the names, I have no problem with Py_Malloc. If we change, should we keep PyMalloc_{New,NewVar,Del}? Py_New seems a little too short.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis) Date: 2002-03-17 10:12
Message: Logged In: YES user_id=21627
The patch looks good, except that it does not meet one of Tim's requirements: there is no way to spell "give me memory from the allocator that PyMalloc_New uses". _PyMalloc_Malloc is clearly not for general use, since it starts with an underscore. What about calling this allocator (which could be either PyMalloc or malloc) Py_Malloc, Py_Realloc, Py_Free? Also, it appears that there is no function wrapper around this allocator: a module that uses the PyMalloc allocator will break in a configuration where pymalloc is disabled.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme) Date: 2002-03-16 03:50
Message: Logged In: YES user_id=35752
Okay, with-pymalloc is back but defaults to enabled. The functions PyMalloc_{Malloc,Realloc,Free} have been renamed to _PyMalloc_{Malloc,Realloc,Free}. Maybe their ugly names will discourage their use. People should use PyMalloc_{New,NewVar,Del} if they want to allocate objects using pymalloc. There's no way we can reuse PyObject_{New,NewVar,Del}. Memory can be allocated with PyObject_New and freed with PyObject_DEL. That would not work if PyObject_New used pymalloc.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis) Date: 2002-03-16 00:54
Message: Logged In: YES user_id=21627
-1. --with-pymalloc should remain an option; there is still the heuristic in releasing memory that may make people uncomfortable. Also, on systems with super-efficient malloc, you may not want to use pymalloc. I dislike the name PyMalloc_Malloc; it may be acceptable for the allocation algorithm itself (although it sounds funny). However, for the PyObject allocator, something else needs to be found. I can't really see the problem with calling it PyObject_New/_NewVar/_Del. None of these were available in Python 1.5.2, so I don't think 1.5.2 code could break.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=530556&group_id=5470 From noreply@sourceforge.net Fri Mar 22 17:20:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 09:20:48 -0800 Subject: [Patches] [ python-Patches-533681 ] Apply semaphore code to Cygwin Message-ID: Patches item #533681, was opened at 2002-03-22 17:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Apply semaphore code to Cygwin Initial Comment: The current version of Cygwin does not define _POSIX_SEMAPHORES by default, although requires the new semaphore interface since its condition variables interface contains a race condition. This patch simply specifies that semaphores should be used if _POSIX_SEMAPHORES OR __CYGWIN__ is defined. 
----------------------------------------------------------------------

You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470

From noreply@sourceforge.net Fri Mar 22 17:26:11 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 22 Mar 2002 09:26:11 -0800
Subject: [Patches] [ python-Patches-533165 ] add expected test failures on solaris 8
Message-ID: 

Patches item #533165, was opened at 2002-03-21 17:40
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470
Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: add expected test failures on solaris 8

Initial Comment: This patch makes the following skipped tests expected on sunos5: test_al test_bsddb test_cd test_cl test_gl test_imgfile test_linuxaudiodev test_nis test_openpty test_winreg test_winsound I'll try to fix the problem that sunos5 should really be something like sunos5.6, 5.7, 5.8, etc.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 18:26
Message: Logged In: YES user_id=21627
-1. The list of skipped modules will vary widely across installations, even if you take Solaris versions into account. For example, test_nis will pass for many users, since NIS is really common in Solaris environments. Likewise, bsddb tests will pass if bsddb is installed in /usr/local. OTOH, test_sunaudiodev is known to fail on server systems which don't have a /dev/audio. Instead, I would like to see a more flexible scheme for expected skips, which includes detection that some resources are unavailable - if that is the cause, the skipped test does not indicate a problem with the Python installation.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470 From noreply@sourceforge.net Fri Mar 22 17:40:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 09:40:29 -0800 Subject: [Patches] [ python-Patches-533621 ] Remove pymalloc hooks Message-ID: Patches item #533621, was opened at 2002-03-22 10:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 Category: Core (C code) Group: None Status: Open >Resolution: Accepted Priority: 5 Submitted By: Neil Schemenauer (nascheme) >Assigned to: Neil Schemenauer (nascheme) Summary: Remove pymalloc hooks Initial Comment: Just to make sure Vladimir hates me. :-) ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-22 12:40 Message: Logged In: YES user_id=31435 Well, I hate you too, but it's still a good idea . Accepted & back to you. 
----------------------------------------------------------------------

You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470

From noreply@sourceforge.net Fri Mar 22 17:48:10 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 22 Mar 2002 09:48:10 -0800
Subject: [Patches] [ python-Patches-533165 ] add expected test failures on solaris 8
Message-ID: 

Patches item #533165, was opened at 2002-03-21 11:40
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470
Category: Tests Group: Python 2.3 Status: Open Resolution: None Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Nobody/Anonymous (nobody)
Summary: add expected test failures on solaris 8

Initial Comment: This patch makes the following skipped tests expected on sunos5: test_al test_bsddb test_cd test_cl test_gl test_imgfile test_linuxaudiodev test_nis test_openpty test_winreg test_winsound I'll try to fix the problem that sunos5 should really be something like sunos5.6, 5.7, 5.8, etc.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-22 12:48
Message: Logged In: YES user_id=33168
I agree that the skipped-test handling is inadequate, but this is more of a general problem. I actually used linux2 as the template, but only applied the tests which really failed on the sun. linux2 also adds curses, socket_ssl, socketserver to the list, even though these are probably successful with the -u curses -u network flags. I could certainly pare down the list to not include nis, bsddb. Are you saying that you would like new code added to regrtest.py to handle TestUnavailable or something like that? So if NIS is not available it would raise this exception. This is probably a good idea, but would mean a bunch of tests being modified and mostly getting rid of the current known skipped list.
If you want to head down this route, I suggest closing this patch. We can start a discussion on python-dev or at least ask if anyone has a problem with the approach. Also, we should fix the problem you noted before that sunos5 is not sufficient. We need to be more fine-grained, i.e., 5.6, 5.7, 5.8.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 12:26
Message: Logged In: YES user_id=21627
-1. The list of skipped modules will vary widely across installations, even if you take Solaris versions into account. For example, test_nis will pass for many users, since NIS is really common in Solaris environments. Likewise, bsddb tests will pass if bsddb is installed in /usr/local. OTOH, test_sunaudiodev is known to fail on server systems which don't have a /dev/audio. Instead, I would like to see a more flexible scheme for expected skips, which includes detection that some resources are unavailable - if that is the cause, the skipped test does not indicate a problem with the Python installation.

----------------------------------------------------------------------

You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470

From noreply@sourceforge.net Fri Mar 22 17:55:21 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 22 Mar 2002 09:55:21 -0800
Subject: [Patches] [ python-Patches-403679 ] AIX and BeOS build quirk revisions
Message-ID: 

Patches item #403679, was opened at 2001-02-08 08:04
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403679&group_id=5470
Category: Build Group: None Status: Closed Resolution: None Priority: 5
Submitted By: Donn Cave (donnc)
Assigned to: Nobody/Anonymous (nobody)
Summary: AIX and BeOS build quirk revisions

Initial Comment: This obsoletes #103487. It deals with scripts like ld_so_aix.
Please move the scripts in the BeOS subdirectory to Modules: $ mv BeOS/ar-fake Modules/ar_beos $ mv BeOS/linkmodule Modules/ld_so_beos; you may also $ mv BeOS/README Misc/BeOS-NOTES and delete the rest of BeOS if you like. This patch doesn't modify either of those files, but another patch will. The new top level Makefile is a good thing here, by the way.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 18:55
Message: Logged In: YES user_id=21627
Bug python.org/sf/533306 claims that there is no cc_r on AIX, yet your patch changes the compiler name. Can you please explain?

----------------------------------------------------------------------

Comment By: Donn Cave (donnc) Date: 2001-02-15 19:24
Message: As an alert reader brought to my attention, I appear to have converted to CRLF line endings when I pasted this patch in, so it applies better after a tr -d '\015' or similar. Will be happy to resubmit better copies of this and the other couple of patches I botched the same way, if it's a problem at all.

----------------------------------------------------------------------

You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403679&group_id=5470

From noreply@sourceforge.net Fri Mar 22 17:57:02 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Fri, 22 Mar 2002 09:57:02 -0800
Subject: [Patches] [ python-Patches-403679 ] AIX and BeOS build quirk revisions
Message-ID: 

Patches item #403679, was opened at 2001-02-08 08:04
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403679&group_id=5470
Category: Build Group: None Status: Closed Resolution: None Priority: 5
Submitted By: Donn Cave (donnc)
Assigned to: Nobody/Anonymous (nobody)
Summary: AIX and BeOS build quirk revisions

Initial Comment: This obsoletes #103487. It deals with scripts like ld_so_aix.
Please move the scripts in the BeOS subdirectory to Modules: $ mv BeOS/ar-fake Modules/ar_beos $ mv BeOS/linkmodule Modules/ld_so_beos; you may also $ mv BeOS/README Misc/BeOS-NOTES and delete the rest of BeOS if you like. This patch doesn't modify either of those files, but another patch will. The new top level Makefile is a good thing here, by the way. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 18:57 Message: Logged In: YES user_id=21627 ... got the number wrong; it is 533188. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 18:55 Message: Logged In: YES user_id=21627 Bug python.org/sf/533306 claims that there is no cc_r on AIX, yet your patch changes the compiler name. Can you please explain? ---------------------------------------------------------------------- Comment By: Donn Cave (donnc) Date: 2001-02-15 19:24 Message: An alert reader brought to my attention that I appear to have converted to CRLF line endings when I pasted this patch in, so it applies better after a tr -d '\015' or similar. Will be happy to resubmit better copies of this and the other couple of patches I botched the same way, if it's a problem at all. 
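Donn's `tr -d '\015'` tip above amounts to stripping carriage returns from the pasted patch before applying it. The same cleanup can be sketched in Python; the helper name and the in-place rewrite are illustrative, not part of any patch discussed here:

```python
def strip_carriage_returns(path):
    """Rewrite a text file in place with CR (octal 015) bytes removed."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(b"\r", b""))
```

After this, `patch` should accept the file the same way it would after `tr -d '\015' < old > new`.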
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403679&group_id=5470 From noreply@sourceforge.net Fri Mar 22 18:04:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 10:04:26 -0800 Subject: [Patches] [ python-Patches-533165 ] add expected test failures on solaris 8 Message-ID: Patches item #533165, was opened at 2002-03-21 17:40 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470 Category: Tests Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: add expected test failures on solaris 8 Initial Comment: This patch makes the following skipped tests expected on sunos5: test_al test_bsddb test_cd test_cl test_gl test_imgfile test_linuxaudiodev test_nis test_openpty test_winreg test_winsound I'll try to fix the problem that sunos5 should really be something like sunos5.6, 5.7, 5.8, etc. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 19:04 Message: Logged In: YES user_id=21627 This is indeed what I'd prefer to happen - but you probably need BDFL support before changing it. Closing it for now - if alternatives are rejected, we can reopen it. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-22 18:48 Message: Logged In: YES user_id=33168 I agree that the skipped test list is inadequate, but this is more of a general problem. I actually used linux2 as the template, but only applied the tests which really failed on the sun. linux2 also adds curses, socket_ssl, and socketserver to the list, even though these are probably successful with the -u curses -u network flags. 
I could certainly pare down the list to not include nis, bsddb. Are you saying that you would like new code added to regrtest.py to handle TestUnavailable or something like that? So if NIS is not available it would raise this exception. This is probably a good idea, but would mean a bunch of tests being modified and mostly getting rid of the current known skipped list. If you want to head down this route, I suggest closing this patch. We can start a discussion on python-dev or at least ask if anyone has a problem with the approach. Also, we should fix the problem you noted before that sunos5 is not sufficient. We need to be more fine-grained, i.e., 5.6, 5.7, 5.8. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 18:26 Message: Logged In: YES user_id=21627 -1. The list of skipped modules will vary widely across installations, even if you take Solaris versions into account. For example, test_nis will pass for many users, since NIS is really common in Solaris environments. Likewise, bsddb tests will pass if bsddb is installed in /usr/local. OTOH, test_sunaudiodev is known to fail on server systems which don't have a /dev/audio. Instead, I would like to see a more flexible scheme for expected skips, which includes detection that some resources are unavailable - if that is the cause, the skipped test does not indicate a problem with the Python installation. 
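The "more flexible scheme" Martin asks for above, where a skip only counts as expected when a required resource is genuinely unavailable or disabled, can be sketched like this. The class and function names here are hypothetical illustrations, not the actual regrtest.py API (the standard library's test support later grew a mechanism along these lines):

```python
class ResourceDenied(Exception):
    """A test was skipped because a required resource is unavailable."""

def requires(resource, enabled=("network", "curses")):
    """Raise ResourceDenied unless `resource` was enabled (e.g. via -u flags)."""
    if resource not in enabled:
        raise ResourceDenied("resource %r is not enabled" % (resource,))

# A test module would begin with e.g. requires("network"), and the test
# runner would treat ResourceDenied as an expected skip, not a failure.
```

This replaces a hand-maintained per-platform skip list with a check made at run time, which is exactly the distinction Martin draws between "resource missing" and "broken installation".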
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533165&group_id=5470 From noreply@sourceforge.net Fri Mar 22 18:42:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 10:42:18 -0800 Subject: [Patches] [ python-Patches-403679 ] AIX and BeOS build quirk revisions Message-ID: Patches item #403679, was opened at 2001-02-08 07:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403679&group_id=5470 Category: Build Group: None Status: Closed Resolution: None Priority: 5 Submitted By: Donn Cave (donnc) Assigned to: Nobody/Anonymous (nobody) Summary: AIX and BeOS build quirk revisions Initial Comment: This obsoletes #103487. It deals with scripts like ld_so_aix. Please move the scripts in the BeOS subdirectory to Modules: $ mv BeOS/ar-fake Modules/ar_beos $ mv BeOS/linkmodule Modules/ld_so_beos; you may also $ mv BeOS/README Misc/BeOS-NOTES and delete the rest of BeOS if you like. This patch doesn't modify either of those files, but another patch will. The new top level Makefile is a good thing here, by the way. ---------------------------------------------------------------------- >Comment By: Donn Cave (donnc) Date: 2002-03-22 18:42 Message: Logged In: YES user_id=42839 Response posted to 533188. cc_r is needed for reentrant library functions. IBM may charge extra for reentrant library functions, I have no idea. Usage: xlc [ option | inputfile ]... cc [ option | inputfile ]... c89 [ option | inputfile ]... xlc128 [ option | inputfile ]... cc128 [ option | inputfile ]... xlc_r [ option | inputfile ]... cc_r [ option | inputfile ]... xlc_r4 [ option | inputfile ]... cc_r4 [ option | inputfile ]... xlc_r7 [ option | inputfile ]... cc_r7 [ option | inputfile ]... ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-03-22 17:57 Message: Logged In: YES user_id=21627 ... got the number wrong; it is 533188. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 17:55 Message: Logged In: YES user_id=21627 Bug python.org/sf/533306 claims that there is no cc_r on AIX, yet your patch changes the compiler name. Can you please explain? ---------------------------------------------------------------------- Comment By: Donn Cave (donnc) Date: 2001-02-15 18:24 Message: An alert reader brought to my attention that I appear to have converted to CRLF line endings when I pasted this patch in, so it applies better after a tr -d '\015' or similar. Will be happy to resubmit better copies of this and the other couple of patches I botched the same way, if it's a problem at all. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403679&group_id=5470 From noreply@sourceforge.net Fri Mar 22 20:03:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 12:03:56 -0800 Subject: [Patches] [ python-Patches-533681 ] Apply semaphore code to Cygwin Message-ID: Patches item #533681, was opened at 2002-03-22 18:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Apply semaphore code to Cygwin Initial Comment: The current version of Cygwin does not define _POSIX_SEMAPHORES by default, although it requires the new semaphore interface since its condition variables interface contains a race condition. This patch simply specifies that semaphores should be used if _POSIX_SEMAPHORES OR __CYGWIN__ is defined. 
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 21:03 Message: Logged In: YES user_id=21627 -1. Cygwin really ought to define _POSIX_SEMAPHORES if they support them, so if they support them and don't define the feature test macro, it is a Cygwin bug. Work-arounds for platform bugs are generally discouraged in Python. On python-dev, you indicate that _POSIX_SEMAPHORES is only defined if __rtems__ is also defined. What is the rationale for that? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 From noreply@sourceforge.net Fri Mar 22 21:18:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 13:18:41 -0800 Subject: [Patches] [ python-Patches-533681 ] Apply semaphore code to Cygwin Message-ID: Patches item #533681, was opened at 2002-03-22 17:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Apply semaphore code to Cygwin Initial Comment: The current version of Cygwin does not define _POSIX_SEMAPHORES by default, although it requires the new semaphore interface since its condition variables interface contains a race condition. This patch simply specifies that semaphores should be used if _POSIX_SEMAPHORES OR __CYGWIN__ is defined. ---------------------------------------------------------------------- >Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 21:18 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. 
Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 20:03 Message: Logged In: YES user_id=21627 -1. Cygwin really ought to define _POSIX_SEMAPHORES if they support them, so if they support them and don't define the feature test macro, it is a Cygwin bug. Work-arounds for platform bugs are generally discouraged in Python. On python-dev, you indicate that _POSIX_SEMAPHORES is only defined if __rtems__ is also defined. What is the rationale for that? 
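Martin's point concerns the `_POSIX_SEMAPHORES` feature-test macro checked at C compile time, but the same POSIX option can also be probed from a running Python via `sysconf`. A small sketch; the availability of the `SC_SEMAPHORES` name varies by platform, so the function degrades to `False` rather than raising:

```python
import os

def posix_semaphores_advertised():
    """Return True if the platform advertises the POSIX semaphores option."""
    if not hasattr(os, "sysconf") or "SC_SEMAPHORES" not in os.sysconf_names:
        return False
    try:
        # POSIX systems report the option's revision year, or -1 if absent.
        return os.sysconf("SC_SEMAPHORES") > 0
    except (OSError, ValueError):
        return False
```

On a Cygwin build of the era discussed above, this kind of runtime probe and the compile-time macro could disagree, which is exactly the inconsistency the patch works around.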
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 From noreply@sourceforge.net Fri Mar 22 21:19:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 13:19:08 -0800 Subject: [Patches] [ python-Patches-533681 ] Apply semaphore code to Cygwin Message-ID: Patches item #533681, was opened at 2002-03-22 17:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Apply semaphore code to Cygwin Initial Comment: The current version of Cygwin does not define _POSIX_SEMAPHORES by default, although it requires the new semaphore interface since its condition variables interface contains a race condition. This patch simply specifies that semaphores should be used if _POSIX_SEMAPHORES OR __CYGWIN__ is defined. ---------------------------------------------------------------------- >Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 21:19 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. 
The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 21:18 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 20:03 Message: Logged In: YES user_id=21627 -1. 
Cygwin really ought to define _POSIX_SEMAPHORES if they support them, so if they support them and don't define the feature test macro, it is a Cygwin bug. Work-arounds for platform bugs are generally discouraged in Python. On python-dev, you indicate that _POSIX_SEMAPHORES is only defined if __rtems__ is also defined. What is the rationale for that? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 From noreply@sourceforge.net Fri Mar 22 21:28:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 13:28:20 -0800 Subject: [Patches] [ python-Patches-533681 ] Apply semaphore code to Cygwin Message-ID: Patches item #533681, was opened at 2002-03-22 12:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 Category: Core (C code) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Apply semaphore code to Cygwin Initial Comment: The current version of Cygwin does not define _POSIX_SEMAPHORES by default, although it requires the new semaphore interface since its condition variables interface contains a race condition. This patch simply specifies that semaphores should be used if _POSIX_SEMAPHORES OR __CYGWIN__ is defined. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-22 16:28 Message: Logged In: YES user_id=31435 I'm afraid I agree with Martin here: the crusty old historical examples you dug up are exactly why we avoid doing similar stuff now. Nobody understands why that code is there anymore, and it will never go away. For example, I happen to know that KSR went bankrupt in 1994, and anything keying off __ksr__ has been worse than useless since then. 
---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 16:19 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 16:18 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. 
This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-22 15:03 Message: Logged In: YES user_id=21627 -1. Cygwin really ought to define _POSIX_SEMAPHORES if they support them, so if they support them and don't define the feature test macro, it is a Cygwin bug. Work-arounds for platform bugs are generally discouraged in Python. On python-dev, you indicate that _POSIX_SEMAPHORES is only defined if __rtems__ is also defined. What is the rationale for that? 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 From noreply@sourceforge.net Fri Mar 22 23:20:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 15:20:33 -0800 Subject: [Patches] [ python-Patches-533621 ] Remove pymalloc hooks Message-ID: Patches item #533621, was opened at 2002-03-22 15:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: Accepted Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Neil Schemenauer (nascheme) Summary: Remove pymalloc hooks Initial Comment: Just to make sure Vladimir hates me. :-) ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-22 23:20 Message: Logged In: YES user_id=35752 Checked in as obmalloc.c 2.5. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-22 17:40 Message: Logged In: YES user_id=31435 Well, I hate you too, but it's still a good idea. Accepted & back to you. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 From noreply@sourceforge.net Fri Mar 22 23:20:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 15:20:58 -0800 Subject: [Patches] [ python-Patches-533621 ] Remove pymalloc hooks Message-ID: Patches item #533621, was opened at 2002-03-22 15:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 Category: Core (C code) Group: None >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Neil Schemenauer (nascheme) Assigned to: Neil Schemenauer (nascheme) Summary: Remove pymalloc hooks Initial Comment: Just to make sure Vladimir hates me. :-) ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-22 23:20 Message: Logged In: YES user_id=35752 Checked in as obmalloc.c 2.5. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-22 17:40 Message: Logged In: YES user_id=31435 Well, I hate you too, but it's still a good idea. Accepted & back to you. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533621&group_id=5470 From noreply@sourceforge.net Sat Mar 23 03:41:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 19:41:05 -0800 Subject: [Patches] [ python-Patches-440407 ] Remote execution patch for IDLE Message-ID: Patches item #440407, was opened at 2001-07-11 09:35 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=440407&group_id=5470 Category: IDLE Group: None Status: Open Resolution: Out of Date >Priority: 1 Submitted By: Guido van Rossum (gvanrossum) Assigned to: Guido van Rossum (gvanrossum) Summary: Remote execution patch for IDLE Initial Comment: This is the code I have for the remote execution patch. (Remote execution must be enabled with an explicit command line argument -r.) Caveats: - undocumented - slow - security issue: the subprocess should not be the server but the client, to prevent a hacker from gaining access This should apply cleanly against IDLE as currently checked into the Python CVS tree. I don't want to check this in yet because of the security issue, and I don't have time to work on it. I hope the idlefork project will pick this up though and address the issues above. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-09 09:19 Message: Logged In: YES user_id=6380 No, the IDLEfork project has stalled except for tweaking the configuration code (which would be good to merge into the Python IDLE tree when it's ready). I expect the patch failure is shallow so I won't bother fixing it. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2002-03-09 06:02 Message: Logged In: YES user_id=21627 It appears the patch is slightly outdated now, at least the chunk removing set_break does not apply anymore. Has this been integrated into idlefork? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-07-11 09:38 Message: Logged In: YES user_id=6380 Uploading the patch again. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=440407&group_id=5470 From noreply@sourceforge.net Sat Mar 23 03:47:16 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Fri, 22 Mar 2002 19:47:16 -0800 Subject: [Patches] [ python-Patches-514662 ] On the update_slot() behavior Message-ID: Patches item #514662, was opened at 2002-02-07 23:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None >Priority: 6 Submitted By: Naofumi Honda (naofumi-h) Assigned to: Guido van Rossum (gvanrossum) Summary: On the update_slot() behavior Initial Comment: The inherited method __getitem__ of the list type in a new subclass is unexpectedly slow. For example, x = list([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m2.390s class nlist(list): pass x = nlist([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m7.040s about 3 times slower!!! The reason is: for the __getitem__ attribute, there are two slotdefs in typeobject.c (one for the mapping type, and the other for the sequence type). 
In the creation of the new_type of the list type, the fixup_slot_dispatchers() and update_slot() functions in typeobject.c allocate the functions to both the sq_item and mp_subscript slots (the mp_subscript slot originally had no function, because the list type is a sequence type), and it's an unexpected allocation for the mapping slot since the descriptor type of __getitem__ is now WrapperType for the sequence operations. If you trace x[1] using gdb, you will find that in PyObject_GetItem() m->mp_subscript = slot_mp_subscript is called instead of a sequence operation because the mp_subscript slot was allocated by fixup_slot_dispatchers(). In slot_mp_subscript(), call_method(self, "__getitem__", ...) is invoked, which turns out to call a wrapper descriptor for sq_item. As a result, the method of the list type is finally called, but it needs many unexpected function calls. I will fix the behavior of fixup_slot_dispatchers() and update_slot() as follows: Only in the case where *) two or more slotdefs have the same attribute name where at most one corresponding slot has a non-null pointer *) the descriptor type of the attribute is WrapperType, these functions will allocate only one function to the appropriate slot. In the other cases, the behavior is not changed, to keep compatibility! (in particular, considering the case where user override methods exist!) The following patch also includes speed up routines to find the slotdef duplications, but it's not essential! ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-22 22:47 Message: Logged In: YES user_id=6380 Is slot-1.dif the promised new patch? ---------------------------------------------------------------------- Comment By: Naofumi Honda (naofumi-h) Date: 2002-03-11 21:49 Message: Logged In: YES user_id=452575 I will post a new patch containing an essential part of the previous one (i.e. without ifdef and almost all speed up routines). 
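The slowdown Naofumi reports can be measured with the timeit module; a sketch (absolute numbers vary by machine, and later CPython releases changed this slot dispatch, so the ratio observed here need not match the roughly 3x reported above for the 2.2-era interpreter):

```python
import timeit

# Time plain item access on a list versus an otherwise-empty list subclass.
setup_list = "x = [1, 2, 3]"
setup_sub = "class NList(list):\n    pass\nx = NList([1, 2, 3])"

t_list = timeit.timeit("x[1]", setup=setup_list, number=100000)
t_sub = timeit.timeit("x[1]", setup=setup_sub, number=100000)
print("list: %.4fs  subclass: %.4fs" % (t_list, t_sub))
```

Any gap between the two timings is the extra dispatch through the subclass's slots that the patch aims to eliminate.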
---------------------------------------------------------------------- Comment By: Naofumi Honda (naofumi-h) Date: 2002-03-11 21:49 Message: Logged In: YES user_id=452575 I will post a new patch containing an essential part of the previous one (i.e. without ifdef and almost all speed up routines). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-10 17:14 Message: Logged In: YES user_id=6380 Thanks for the analysis! Would you mind submitting a new patch without the #ifdef ORIGINAL_CODE stuff? Just delete/replace old code as needed -- cvs diff will show me the original code. The ORIGINAL_CODE stuff makes it harder for me to get the point of the diff. Also, maybe you could leave the speedup code out, to show the absolutely minimal amount of code needed. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 From noreply@sourceforge.net Sat Mar 23 08:40:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 00:40:39 -0800 Subject: [Patches] [ python-Patches-514662 ] On the update_slot() behavior Message-ID: Patches item #514662, was opened at 2002-02-08 04:49 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 6 Submitted By: Naofumi Honda (naofumi-h) Assigned to: Guido van Rossum (gvanrossum) Summary: On the update_slot() behavior Initial Comment: The inherited method __getitem__ of the list type in a new subclass is unexpectedly slow. For example, x = list([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m2.390s class nlist(list): pass x = nlist([1,2,3]) r = xrange(1, 1000000) for i in r: x[1] = 2 ==> execution time: real 0m7.040s about 3 times slower!!! 
The reason is: for the __getitem__ attribute, there are two slotdefs in typeobject.c (one for the mapping type, and the other for the sequence type). In the creation of the new_type of the list type, the fixup_slot_dispatchers() and update_slot() functions in typeobject.c allocate the functions to both the sq_item and mp_subscript slots (the mp_subscript slot originally had no function, because the list type is a sequence type), and it's an unexpected allocation for the mapping slot since the descriptor type of __getitem__ is now WrapperType for the sequence operations. If you trace x[1] using gdb, you will find that in PyObject_GetItem() m->mp_subscript = slot_mp_subscript is called instead of a sequence operation because the mp_subscript slot was allocated by fixup_slot_dispatchers(). In slot_mp_subscript(), call_method(self, "__getitem__", ...) is invoked, which turns out to call a wrapper descriptor for sq_item. As a result, the method of the list type is finally called, but it needs many unexpected function calls. I will fix the behavior of fixup_slot_dispatchers() and update_slot() as follows: Only in the case where *) two or more slotdefs have the same attribute name where at most one corresponding slot has a non-null pointer *) the descriptor type of the attribute is WrapperType, these functions will allocate only one function to the appropriate slot. In the other cases, the behavior is not changed, to keep compatibility! (in particular, considering the case where user override methods exist!) The following patch also includes speed up routines to find the slotdef duplications, but it's not essential! ---------------------------------------------------------------------- >Comment By: Naofumi Honda (naofumi-h) Date: 2002-03-23 08:40 Message: Logged In: YES user_id=452575 Yes. slot-1.dif is a new version. At least, I purged ifdef ... as you want. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-23 03:47 Message: Logged In: YES user_id=6380 Is slot-1.dif the promised new patch? ---------------------------------------------------------------------- Comment By: Naofumi Honda (naofumi-h) Date: 2002-03-12 02:49 Message: Logged In: YES user_id=452575 I will post a new patch containing an essential part of the previous one (i.e. without the ifdefs and almost all of the speed-up routines). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-10 22:14 Message: Logged In: YES user_id=6380 Thanks for the analysis! Would you mind submitting a new patch without the #ifdef ORIGINAL_CODE stuff? Just delete/replace old code as needed -- cvs diff will show me the original code. The ORIGINAL_CODE stuff makes it harder for me to get the point of the diff. Also, maybe you could leave the speedup code out, to show the absolutely minimal amount of code needed.
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514662&group_id=5470 From noreply@sourceforge.net Sat Mar 23 22:41:56 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 14:41:56 -0800 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 23:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). It attempts to behave as much like time.strptime() as is reasonable. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by me as well as by some other people who happened to have caught the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes. All of it is self-explanatory in the doc strings.
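The behavior being standardized can be exercised through the time module as it exists today (a sketch; the directives shown are the standard ones a portable strptime() must support, and the result assumes the default C locale for %b):

```python
import time

# Parse a date string portably; %d, %b and %Y are standard directives.
t = time.strptime("23 Mar 2002", "%d %b %Y")
print(t.tm_year, t.tm_mon, t.tm_mday)  # 2002 3 23
```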
It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 22:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Sat Mar 23 23:35:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 15:35:30 -0800 Subject: [Patches] [ python-Patches-479615 ] Fast-path for interned string compares Message-ID: Patches item #479615, was opened at 2001-11-08 15:19 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=479615&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: M.-A. Lemburg (lemburg) Assigned to: M.-A. Lemburg (lemburg) Summary: Fast-path for interned string compares Initial Comment: This patch adds a fast-path for comparing equality of interned strings. The patch boosts performance for comparing identical string objects by some 20% on my machine while not causing any noticeable slow-down for other operations (according to tests done with pybench). More info and benchmarks later... ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 23:35 Message: Logged In: YES user_id=35752 Attached is an updated version of this patch.
I'm -0 on it since it doesn't seem to help much except for artificial benchmarks. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2001-11-08 15:26 Message: Logged In: YES user_id=38388 Output from pybench comparing today's CVS Python with patch (eqpython) and without patch (stdpython):

PYBENCH 1.0
Benchmark: eqpython.bench (rounds=10, warp=20)

Tests:                        per run     per oper.   diff *)
------------------------------------------------------------------------
BuiltinFunctionCalls:        125.55 ms    0.98 us    -1.68%
BuiltinMethodLookup:         180.10 ms    0.34 us    +1.75%
CompareFloats:               107.30 ms    0.24 us    +2.04%
CompareFloatsIntegers:       185.15 ms    0.41 us    -0.05%
CompareIntegers:             163.50 ms    0.18 us    -1.77%
CompareInternedStrings:       79.50 ms    0.16 us   -20.78%
^^^^^^^^^^^^^^^^^^^^ This is the interesting line :-) ^^^^^^^^^^^^^^^^^^^^^^^^^^
CompareLongs:                110.25 ms    0.24 us    +0.09%
CompareStrings:              143.40 ms    0.29 us    +2.14%
CompareUnicode:              118.00 ms    0.31 us    +1.68%
ConcatStrings:               189.55 ms    1.26 us    -1.61%
ConcatUnicode:               226.55 ms    1.51 us    +1.34%
CreateInstances:             202.35 ms    4.82 us    -1.87%
CreateStringsWithConcat:     221.00 ms    1.11 us    +0.45%
CreateUnicodeWithConcat:     240.00 ms    1.20 us    +1.27%
DictCreation:                213.25 ms    1.42 us    +0.47%
DictWithFloatKeys:           263.50 ms    0.44 us    +1.15%
DictWithIntegerKeys:         158.50 ms    0.26 us    -1.86%
DictWithStringKeys:          147.60 ms    0.25 us    +0.75%
ForLoops:                    144.90 ms   14.49 us    -4.64%
IfThenElse:                  174.15 ms    0.26 us    -0.00%
ListSlicing:                  88.80 ms   25.37 us    -1.11%
NestedForLoops:              136.95 ms    0.39 us    +3.01%
NormalClassAttribute:        177.80 ms    0.30 us    -2.68%
NormalInstanceAttribute:     166.85 ms    0.28 us    -0.54%
PythonFunctionCalls:         152.20 ms    0.92 us    +1.40%
PythonMethodCalls:           133.70 ms    1.78 us    +1.60%
Recursion:                   119.45 ms    9.56 us    +0.04%
SecondImport:                124.65 ms    4.99 us    -6.03%
SecondPackageImport:         130.70 ms    5.23 us    -5.73%
SecondSubmoduleImport:       161.65 ms    6.47 us    -5.88%
SimpleComplexArithmetic:     245.50 ms    1.12 us    +2.08%
SimpleDictManipulation:      108.50 ms    0.36 us    +0.05%
SimpleFloatArithmetic:       125.80 ms    0.23 us    +0.84%
SimpleIntFloatArithmetic:    128.50 ms    0.19 us    -1.46%
SimpleIntegerArithmetic:     128.45 ms    0.19 us    -0.77%
SimpleListManipulation:      159.15 ms    0.59 us    -5.32%
SimpleLongArithmetic:        189.55 ms    1.15 us    +2.65%
SmallLists:                  293.70 ms    1.15 us    -5.26%
SmallTuples:                 230.00 ms    0.96 us    +0.44%
SpecialClassAttribute:       175.70 ms    0.29 us    -2.79%
SpecialInstanceAttribute:    199.70 ms    0.33 us    -1.55%
StringMappings:              196.85 ms    1.56 us    -2.48%
StringPredicates:            133.00 ms    0.48 us    -8.28%
StringSlicing:               165.45 ms    0.95 us    -3.47%
TryExcept:                   193.60 ms    0.13 us    +0.57%
TryRaiseExcept:              175.40 ms   11.69 us    +0.69%
TupleSlicing:                156.85 ms    1.49 us    -0.00%
UnicodeMappings:             175.90 ms    9.77 us    +1.76%
UnicodePredicates:           141.35 ms    0.63 us    +0.78%
UnicodeProperties:           184.35 ms    0.92 us    -2.10%
UnicodeSlicing:              179.45 ms    1.03 us    -1.10%
------------------------------------------------------------------------
Average round time:         9855.00 ms               -1.13%

*) measured against: stdpython.bench (rounds=10, warp=20)

As you can see, the rest of the results don't change much, and the ones that do indicate some additional benefit gained by the patch. All slow-downs are way below the noise limit of around 5-10% (depending on the platform/machine/compiler).
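The fast path relies on the fact that interned strings are unique objects, so equality of two interned strings can be decided by a pointer comparison alone. The invariant is visible from Python through sys.intern (a sketch; the join is only there to defeat compile-time constant folding):

```python
import sys

# Interning guarantees one shared object per string value, so equal
# interned strings are also *identical* objects (pointer-equal in C).
a = sys.intern("".join(["fast", "path"]))
b = sys.intern("fastpath")
print(a is b)  # True
```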
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=479615&group_id=5470 From noreply@sourceforge.net Sat Mar 23 23:45:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 15:45:41 -0800 Subject: [Patches] [ python-Patches-490026 ] Namespace selection for rlcompleter Message-ID: Patches item #490026, was opened at 2001-12-06 21:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=490026&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Fernando Pérez (fer_perez) Assigned to: Nobody/Anonymous (nobody) Summary: Namespace selection for rlcompleter Initial Comment: The standard rlcompleter is hardwired to work with __main__.__dict__. This is limiting, as one may have applications which execute in specially constructed 'sandboxed' namespaces. This patch extends rlcompleter with a constructor which provides an optional namespace specifier. This optional parameter defaults to __main__.__dict__, so the patch is 100% backwards compatible. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 23:45 Message: Logged In: YES user_id=35752 Looks good. Checked in with minor modifications as rlcompleter.py 1.10. ---------------------------------------------------------------------- Comment By: Fernando Pérez (fer_perez) Date: 2001-12-11 18:44 Message: Logged In: YES user_id=395388 Updated with a one-line fix (a mistyped variable name). Deleted v2 of the patch with the typo. ---------------------------------------------------------------------- Comment By: Fernando Pérez (fer_perez) Date: 2001-12-09 07:16 Message: Logged In: YES user_id=395388 I've uploaded a new version of the patch with those changes.
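The accepted interface can be used like this (a sketch against the rlcompleter module as checked in; 'spam_count' is just an illustrative name):

```python
import rlcompleter

# A completer bound to a sandboxed namespace instead of __main__.__dict__.
sandbox = {"spam_count": 42}
completer = rlcompleter.Completer(sandbox)
print(completer.complete("spam_c", 0))  # completes to 'spam_count'
```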
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-09 03:32 Message: Logged In: YES user_id=6380 Yes, that's about right. ---------------------------------------------------------------------- Comment By: Fernando Pérez (fer_perez) Date: 2001-12-09 02:53 Message: Logged In: YES user_id=395388 I could rewrite it to use a namespace=None default in the constructor instead. If a namespace is given it will be used, otherwise at completion time a check will be made:

    if self.namespace is None:
        self.namespace = __main__.__dict__

This means an extra if in the completer, but would address your concern. Do you want me to do that? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-09 01:38 Message: Logged In: YES user_id=6380 Since this is obviously a new feature, I'll postpone this until after 2.2. One thing that worries me: you capture the identity of __main__.__dict__ early on in this patch. The original code uses whatever __main__.__dict__ is at the time it is needed. ---------------------------------------------------------------------- Comment By: Fernando Pérez (fer_perez) Date: 2001-12-08 18:39 Message: Logged In: YES user_id=395388 Oops, sorry. You can tell I've never used the system before. I put the file in, but I just didn't see the stupid extra checkbox. Lack of orthogonality in an interface is always a recipe for problems. Anyway, it should be ok now. Cheers, Fernando. PS. And the obvious, *THANKS* a lot for putting such a fantastic tool out. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-08 17:15 Message: Logged In: YES user_id=6380 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about.
:-( ) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=490026&group_id=5470 From noreply@sourceforge.net Sat Mar 23 23:51:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 15:51:27 -0800 Subject: [Patches] [ python-Patches-490374 ] make inspect.stack() work with PyShell Message-ID: Patches item #490374, was opened at 2001-12-07 19:57 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=490374&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Jason Orendorff (jorend) Assigned to: Nobody/Anonymous (nobody) Summary: make inspect.stack() work with PyShell Initial Comment: I'm on Python 2.2b2 on Windows. Changed the 'inspect' module to use 'linecache' for loading source code. This is more efficient. Also, 'inspect' now can see the source code of stuff entered in the IDLE PyShell. E.g. In IDLE, type: >>> import inspect >>> inspect.stack()[0] Without the patch, the output would be like this: (, None, 1, '?', None, None) With this patch: (, '', 1, '?', ['inspect.stack()[0]'], 0) ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 23:51 Message: Logged In: YES user_id=35752 Checked in as inspect.py 1.29. ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2002-02-20 14:51 Message: Logged In: YES user_id=18139 >poke< ---------------------------------------------------------------------- Comment By: Jason Orendorff (jorend) Date: 2001-12-07 20:07 Message: Logged In: YES user_id=18139 I'm afraid it's definitely a feature. 
(sigh) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-07 20:04 Message: Logged In: YES user_id=6380 Assigned to Tim for review, since he knows inspect.py inside-out. :-) It's probably too late for 2.2, unless you can prove this is a bugfix and not a feature. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=490374&group_id=5470 From noreply@sourceforge.net Sun Mar 24 00:02:58 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 16:02:58 -0800 Subject: [Patches] [ python-Patches-491936 ] Opt for tok_nextc Message-ID: Patches item #491936, was opened at 2001-12-12 08:00 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=491936&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: David Jacobs (dbj) Assigned to: Nobody/Anonymous (nobody) Summary: Opt for tok_nextc Initial Comment: tokenizer.c - revision 2.53 I tried to pick a routine that looked like it was heavily used, and optimizations that do not increase the maintenance burden (I won't feel bad if you reject it though, I'll keep on trying as long as you don't consider it a burden :-). I changed one strcpy to a memcpy because the length had already been computed. I also changed the pattern

    a = strchr(b, '\0');

to

    a = b + strlen(b);

which is an idiom I've seen in many other places in the code, so I don't think it makes the code harder to understand, and strlen is significantly more efficient than strchr. Aloha, David Jacobs (your pico optimizer :-) ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 00:02 Message: Logged In: YES user_id=35752 It _seems_ to give about a 2% speedup when running compileall.py on Lib.
That's in the noise. I'm rejecting this patch. It's just not worth it. David, don't let this discourage you. Optimizing Python is hard since all the low-hanging fruit has been picked by other people. I think replacing strncpy with strlcpy might yield better results. Look at bug 487703. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-09 12:00 Message: Logged In: YES user_id=21627 Can you report some data about the resulting speedup? I seriously doubt that this is a significant change; unless data is forthcoming proving me wrong, I recommend rejecting this patch. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=491936&group_id=5470 From noreply@sourceforge.net Sun Mar 24 00:25:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 16:25:57 -0800 Subject: [Patches] [ python-Patches-489066 ] Include RLIM_INFINITY constant Message-ID: Patches item #489066, was opened at 2001-12-04 20:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=489066&group_id=5470 Category: Modules Group: None Status: Open Resolution: Later Priority: 5 Submitted By: Eric Huss (ehuss) Assigned to: Jeremy Hylton (jhylton) Summary: Include RLIM_INFINITY constant Initial Comment: The following is a patch to the resource module to include the RLIM_INFINITY constant. It should handle platforms where RLIM_INFINITY is not a LONG_LONG, but I have no means to test that. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 00:25 Message: Logged In: YES user_id=35752 This doesn't seem to work on my Linux machine. RLIM_INFINITY is an unsigned long. It becomes -1L in the resource module.
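The constant did end up in the module, and the usual idiom compares a limit against it (a sketch; the resource module is Unix-only, and RLIMIT_NOFILE is just one example resource):

```python
import resource

# RLIM_INFINITY is the sentinel for "no limit" in getrlimit/setrlimit values.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if hard == resource.RLIM_INFINITY:
    print("open-file hard limit: unlimited")
else:
    print("open-file hard limit:", hard)
```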
I'm attaching an updated patch that uses PyModule_AddObject and applies cleanly to the current CVS. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2001-12-13 20:43 Message: Logged In: YES user_id=31392 I'd rather see this go through a beta release where we can verify that it works for both the LONG_LONG and non-LONG_LONG cases. Among other things, it looks possible (though probably unlikely) that there are platforms that do not have long long and do not represent rlim_t as long. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-12 05:24 Message: Logged In: YES user_id=6380 Jeremy, please review and apply or reject (or postpone and lower priority). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=489066&group_id=5470 From noreply@sourceforge.net Sun Mar 24 01:12:13 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 17:12:13 -0800 Subject: [Patches] [ python-Patches-494066 ] Access to readline history elements Message-ID: Patches item #494066, was opened at 2001-12-17 04:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=494066&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Chuck Blake (cblake) Assigned to: Nobody/Anonymous (nobody) Summary: Access to readline history elements Initial Comment: The current readlinemodule.c has a relatively minimal wrapper around the functionality of the GNU readline and history libraries. That may be fine, and since some try to use libeditline instead, it may be for the best. However, the current module does not enable any access from within Python to the libreadline-maintained list of input lines.
The ideal thing would be to actually export that dynamically maintained C list as a Python object. In lieu of that more complex change, my patch simply adds very simple history_get() and history_len() methods. This is the least one needs to access the list. I'm pretty sure the library functions go waaaay back, probably to the merger of the history and readline libraries. This patch also adds one final little ingredient: a call to rl_redisplay() in the wrapper for rl_insert_text(). Without this the user cannot see the inserted text until they type another character, which seems pretty undesirable. Together these two updates allow the regular Unix readline-enabled shell to perform "auto indentation", i.e., inserting into the edit buffer the leading whitespace from the preceding non-result-producing line. Since the line can be edited, one can just backspace a couple of times to reverse the autoindent. This makes the basic readline-enabled read-eval-print loop substantially more pleasant. I can provide an example PYTHONSTARTUP file that shows how to use it. Only a tiny 8-line or so pre_input_hook is needed, plus a slightly smart sys.ps1 or sys.ps2 object that communicates via a variable to our hook function whether or not the parser is expecting more input. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:12 Message: Logged In: YES user_id=35752 Checked in as readline 2.45. I renamed the functions to get_history_item and get_current_history_length. The last one's a bit unwieldy but hopefully clear. get_history_length really should have been called get_max_history_length. Too late for that unfortunately. I also added a redisplay function instead of adding the redisplay call to insert_text. ---------------------------------------------------------------------- Comment By: Chuck Blake (cblake) Date: 2001-12-17 20:41 Message: Logged In: YES user_id=403855 Sounds quite reasonable.
Having a nice readline completer and history matching interface is pretty cool when you're using the shell over a network where remote X windows would be painful. It's been a very useful interface for a while, and likely will be for the foreseeable future. When I get a chance I'll work on seeing what parts of readline have been around for a very long time (e.g. since readline 2.0 or so) and try to wrap the basically available features more intelligently with Python objects, e.g. a tuple or list for command input history. Hopefully not too much will need to be conditionalized on readline versions. A lot of added functionality could be written trivially in Python if there is access to the library structures and exporting of hook/event type functions. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-17 20:13 Message: Logged In: YES user_id=6380 OK, for the sake of stability, let's not do any of this in 2.2 then. Sounds like there are plenty of things we *could* do. I'm not against expanding the readline module -- but I don't have a use for it in mind myself. For fancy editing I much prefer IDLE's command line editor, since it lets you edit an entire multi-line command as a single unit, rather than on a per-line basis as readline does... ---------------------------------------------------------------------- Comment By: Chuck Blake (cblake) Date: 2001-12-17 20:00 Message: Logged In: YES user_id=403855 I have something like this in my ~/.py/rc.py (STARTUP file). The just_did_a_result var is also maintained by sys.ps1.

    def auto_indent():
        global just_did_a_result
        if just_did_a_result:
            just_did_a_result = 0
            return
        last = readline.history_get(readline.history_len())
        spc = len(last) - len(last.lstrip())
        if spc > 0:
            readline.insert_text(last[ : spc])

    readline.set_pre_input_hook(auto_indent)

I don't know if you have a system where set_pre_input_hook is available.
Unless you have access to the history, or at least the very last input line, from within Python, it doesn't seem very useful. That is because there is no way for your input_hook to know when/what it should stuff text into your command buffer. The redisplay() is innocuous when it happens to be unnecessary, so it shouldn't be very objectionable. It's an interactive prompt, so hyper-optimization isn't very important or noticeable. Even on a slow terminal it is only a few characters in one command prompt being re-drawn. If it is really an issue, though, then an alternative to adding my redisplay() fix would be to export another function from readline to Python, namely rl_redisplay(). Anyone's Python code could then just call it as necessary. Longer term, it seems like an awful lot more libhistory and libreadline functionality could profitably be included in the readline module. That's surely a 2.3 or later change, but the exporting of rl_redisplay() might be a closer step in that direction. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-17 19:40 Message: Logged In: YES user_id=6380 Hm, I was going to see if the insert_text fix was a simple enough fix to apply to 2.2, but I don't have an example of where this is needed. If I call it from the startup hook the text I insert is already being displayed.
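The names that were finally checked in can be exercised like this (a sketch; the readline module is only available on Unix-like builds, and get_history_item uses 1-based indexing):

```python
import readline

# Append an entry, then read it back through the accepted API.
readline.add_history("print('hello')")
n = readline.get_current_history_length()
last = readline.get_history_item(n)  # 1-based: item n is the newest
print(last)  # print('hello')
```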
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=494066&group_id=5470 From noreply@sourceforge.net Sun Mar 24 01:27:32 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 17:27:32 -0800 Subject: [Patches] [ python-Patches-494871 ] test exceptions in various types/methods Message-ID: Patches item #494871, was opened at 2001-12-19 02:16 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=494871&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: test exceptions in various types/methods Initial Comment: Add a bunch of tests for various methods, including numeric stuff like:

    float('')
    float('5\0')
    5.0 / 0.0
    5.0 // 0.0
    5.0 % 0.0
    5 << -5

sequence stuff like:

    ()[0]
    x += ()
    [].pop()
    [].extend(None)
    {}.values()
    {}.items()

not sure if buffer stuff should go here. if so, need to update X.X.X to be a real number, not sure if there is any correlation of the numbers or should the next available be used (6.7) ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:27 Message: Logged In: YES user_id=35752 Checked in as test_types.py 1.26. I left out the section number for "Buffers". Having section numbers in the testing output seems insane to me. What if a section is added to the documentation? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2001-12-28 23:27 Message: Logged In: YES user_id=33168 I didn't see buffers mentioned in section 2.2 at all. The buffer() function is mentioned in 2.1. Perhaps the buffer tests should be moved into a test of their own? There appear to be very few uses of buffer throughout the tests.
Also, I saw in test_StringIO.py that jython doesn't have buffers, so the whole test should be skipped/pass for jython it seems (see lines 79-80). Other than the buffer change in the patch, the other tests should be in the appropriate location. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2001-12-28 22:19 Message: Logged In: YES user_id=21627 The numbers are the section numbers of the documentation, of what is now section 2.2 (dunno in what release and document this was section 6). I also don't know how useful it is to keep the numbering, however, if you easily can, please re-organize your tests to fit into the most appropriate sections. Optionally, you a) may want to check that the things you are testing are really mentioned in the section, and b) may want to update the tests to the current section numbers. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=494871&group_id=5470 From noreply@sourceforge.net Sun Mar 24 01:39:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 17:39:35 -0800 Subject: [Patches] [ python-Patches-497097 ] location of mbox Message-ID: Patches item #497097, was opened at 2001-12-27 18:14 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497097&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Matthias Klose (doko) Assigned to: Nobody/Anonymous (nobody) Summary: location of mbox Initial Comment: Most mail spools now are under /var, so this seems to be a better default.
--- python2.1-2.1.1.orig/Lib/mailbox.py
+++ python2.1-2.1.1/Lib/mailbox.py
@@ -267,7 +267,7 @@
     if mbox[:1] == '+':
         mbox = os.environ['HOME'] + '/Mail/' + mbox[1:]
     elif not '/' in mbox:
-        mbox = '/usr/mail/' + mbox
+        mbox = '/var/mail/' + mbox
     if os.path.isdir(mbox):
         if os.path.isdir(os.path.join(mbox, 'cur')):
             mb = Maildir(mbox)

---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:39 Message: Logged In: YES user_id=35752 I don't know why you care since that code is inside the _test() function. Fixed in mailbox.py 1.35 anyhow. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497097&group_id=5470 From noreply@sourceforge.net Sun Mar 24 01:42:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 17:42:00 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-30 01:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Barry Warsaw (bwarsaw) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea:

- It's something from another layer (DNS/IP), not from SMTP.
- It breaks when the name of the computer is not a FQDN (as on many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before.
- It breaks computers with a TCP tunnel to another host from which the connection originates, if the relay does strict EHLO/HELO checking.
- It breaks computers using NAT: the host that the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking.
- It's considered spyware, as you are sending information some companies or people don't want to disclose: the internal structure of the network.

No major mail client resolves the name. Look at Netscape Messenger or KMail. In fact, KMail and Perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most-used email clients do this. I send you the bugfix. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-30 02:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
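Present-day smtplib handles this concern with a local_hostname parameter: the caller supplies the EHLO/HELO name explicitly and no DNS lookup of the local host takes place (a sketch; no network connection is made here, and 'client.internal' is an illustrative name):

```python
import smtplib

# With no host argument the constructor does not connect; it just stores
# local_hostname for use in later EHLO/HELO greetings.
client = smtplib.SMTP(local_hostname="client.internal")
print(client.local_hostname)  # client.internal
```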
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Sun Mar 24 01:54:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 17:54:36 -0800 Subject: [Patches] [ python-Patches-501713 ] compileall.py -d errors Message-ID: Patches item #501713, was opened at 2002-01-10 10:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=501713&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Bastian Kleineidam (calvin) >Assigned to: Guido van Rossum (gvanrossum) Summary: compileall.py -d errors Initial Comment: the option -d is not handled properly, the compileall.py script generates files in the wrong directory. Patch is for Python 2.1.1. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:54 Message: Logged In: YES user_id=35752 Attached is an updated version of the patch that cleanly applies to the current CVS tree. I can't figure out what the -d option is supposed to do however. The documentation says "-d destdir: purported directory name for error messages if no directory arguments, -l sys.path is assumed". What does that mean? Assigning to Guido since it looks like he added the -d option. 
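For what it's worth, compileall's -d flag corresponds to py_compile's dfile parameter, the "purported" file name recorded for error messages and tracebacks; it does not change where the bytecode is written. A minimal demonstration in today's Python (the path names are made up):

```python
import os
import py_compile
import tempfile

# Write a throwaway module, then compile it while recording a
# different, "purported" file name -- the value compileall -d supplies.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, 'spam.py')
    with open(src, 'w') as f:
        f.write('X = 42\n')
    cfile = py_compile.compile(src, dfile='/shared/lib/spam.py')
    # The bytecode still lands next to the source; only the file name
    # embedded in the compiled code changed.
    print(os.path.exists(cfile))
```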
---------------------------------------------------------------------- Comment By: Bastian Kleineidam (calvin) Date: 2002-01-17 16:49 Message: Logged In: YES user_id=9205 I updated the patch to correct the case where dfile is None. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=501713&group_id=5470 From noreply@sourceforge.net Sun Mar 24 01:57:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 17:57:18 -0800 Subject: [Patches] [ python-Patches-502415 ] optimize attribute lookups Message-ID: Patches item #502415, was opened at 2002-01-11 18:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Zooko O'Whielacronx (zooko) Assigned to: Nobody/Anonymous (nobody) Summary: optimize attribute lookups Initial Comment: This patch optimizes the string comparisons in class_getattr(), class_setattr(), instance_getattr1(), and instance_setattr(). I pulled out the relevant section of class_setattr() and measured its performance, yielding the following results: * in the case that the argument does *not* begin with "__", then the new version is 1.03 times as fast as the old. (This is a mystery to me, as the path through the code looks the same, in C. I examined the assembly that GCC v3.0.3 generated in -O3 mode, and it is true that the assembly for the new version is smaller/faster, although I don't really understand why.) * in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "X_" (where X is a random alphabetic character), then the new version is 1.12 times as fast as the old.
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and does *not* end with "_", then the new version is 1.16 times as fast as the old. * in the case that the argument is (randomly) one of the six special names, then the new version is 2.7 times as fast as the old. * in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "__" (but is not one of the six special names), then the new version is 3.7 times as fast as the old. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:57 Message: Logged In: YES user_id=35752 Based on the complexity added by the patch, I would say at least a 5% speedup would be needed to offset the maintenance cost. -1 on the current patch. ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-03-14 16:24 Message: Logged In: YES user_id=52562 update: I did a real app benchmark of this patch by running one of the unit tests from PyXML-0.6.6. (Which one? The one that I guessed would favor my optimization the most. Unfortunately I've lost my notes and I don't remember which one.) I also separated out the "unroll strcmp" optimization from the "use macros" optimization on request. I have lost my notes, but I recall that my results showed what I expected: between 0.5 and 3 percent app-level speed-up for the unroll strcmp optimization. Interesting detail: a quirk in GCC 3 makes the unroll strcmp version slightly faster than the current strcmp version *even* in the (common) case that the first two characters of the attribute name are *not* '__'. What should happen next: 1. Someone who has the authority to approve or reject this patch should tell me what kind of benchmark would be persuasive to them.
I mean: which specific program can I run with and without my patch for a useful comparison? (If you require more than a 5% app-level speed-up, then let's give up on this patch now!) 2. Someone should volunteer to test this patch with the MSFT compiler, as I don't have one right now. Some people are still using the Windows platform, I've noticed [1], so it is worth benchmarking. Actually, someone should volunteer to benchmark GCC+Linux-or-MacOSX, too, as my computer is a laptop with a variable-speed CPU and is really crummy for benchmarking. By the way, PEP 266 is a better solution to the problem, but until it's implemented, this patch is the better patch. ;-) Note: this is one of those patches that looks uglier in "diff -u" format than in actual source code. Please browse the actual source side-by-side [2] to see how ugly it really is. Regards, Zooko [1] http://www.google.com/press/zeitgeist/jan02-pie.gif [2] search for "class_getattr" in: http://zooko.com/classobject.c http://zooko.com/classobject-strcmpunroll.c --- zooko.com Security and Distributed Systems Engineering --- ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-18 00:22 Message: Logged In: YES user_id=52562 Okay, I've done some "mini benchmarks". The earlier reported micro-benchmarks were the result of running the inner loop itself, in C. These mini benchmarks are the result of running this Python script:

    class A:
        def __init__(self):
            self.a = 0
    a = A()
    for i in xrange(2**20):
        a.a = i
    print a.a

and then using different attribute names in place of `a'. The results are as expected: the optimized version is faster than the current one, depending on the shape of the attribute name, and dampened by the fact that there is now other work being done. The case that shows the smallest difference is when the attribute name neither begins nor ends with an '_'. In that case the above script runs about 2% faster with the optimizations.
The case that shows the biggest difference is when the attribute begins and ends with '__', as in `__a__'. Then the above script runs about 15% faster. This still isn't a *real* application benchmark. I'm looking for one that is a reasonable case for real Python users but that also uses attribute lookups heavily. ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-17 20:33 Message: Logged In: YES user_id=52562 Yeah, the optimized version is less readable than the original. I'll try to come up with a benchmark application. Any ideas? Maybe some unit tests from Zope that use attribute lookups heavily? My guess is that the actual results in an application will be "marginal", like maybe between 0.5% and 3% improvement. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 18:29 Message: Logged In: YES user_id=31392 This seems to add a lot of complexity for a few special cases. How important are these particular attributes? Do you have any benchmark applications that show real improvement? It seems like microbenchmarks overstate the benefit, since we don't know how often these attributes are looked up by most applications. It would also be interesting to see how much of the benefit for non __ names is the result of the PyString_AS_STRING() macro. Maybe that's all the change we really need :-).
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 From noreply@sourceforge.net Sun Mar 24 02:01:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 18:01:10 -0800 Subject: [Patches] [ python-Patches-504889 ] make setup.py less chatty by default Message-ID: Patches item #504889, was opened at 2002-01-17 15:02 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504889&group_id=5470 Category: Distutils and setup.py Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Jeremy Hylton (jhylton) Assigned to: Nobody/Anonymous (nobody) Summary: make setup.py less chatty by default Initial Comment: I don't like the amount of output that setup.py produces by default, and I don't like the way that -q and -v affect the amount of output. In general, I want setup.py to tell me what it is doing and not what it is skipping. It's fine to say nothing with -q, but it shouldn't say more without -v. The attached patch is a bit of a kludge, but I'm not familiar enough with distutils to do any better. One problem is that -v/--verbose was previously handled as a flag, either on or off. (There is a curiously large amount of code that compares this boolean to see if it's greater than some number!) I changed the options processor to treat self.verbose as a count of -v options. So -vv is more verbose than -v. Then I changed the specific prints and announcements that I've seen with setup.py that I didn't want to see. The messages I don't want to see (unless verbose is high) are about skipping builds of Extensions and not copying files that are already up-to-date. With this patch in place, setup.py tells me only the extensions it actually builds and the files it actually copies.
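The "-v as a count" idea translates directly to standard option machinery; in today's Python it is one line with argparse. This is a sketch of the behaviour the patch wires into distutils' own option processor, not distutils code, and the message levels are an assumption for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
# Each -v bumps the level, so -vv is more verbose than -v.
parser.add_argument('-v', '--verbose', action='count', default=0)
parser.add_argument('-q', '--quiet', action='store_true')

def announce(opts, msg, level=1):
    """Print msg only at sufficient verbosity.  'skipping ...' and
    'not copying ...' notices would be issued at level=2, so they
    appear only under -vv."""
    if not opts.quiet and opts.verbose >= level:
        print(msg)

opts = parser.parse_args(['-vv'])
announce(opts, 'building spam extension', level=1)
announce(opts, 'skipping up-to-date eggs extension', level=2)
```

With plain -v, only the "building" line would appear; with -q, nothing.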
---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 02:01 Message: Logged In: YES user_id=35752 I would prefer it if setup.py would only print what it's compiling and not what it's skipping. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-18 14:53 Message: Logged In: YES user_id=31392 Good suggestion. I hadn't planned to change anything, but wanted to capture the feature request and share the code. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-01-18 09:05 Message: Logged In: YES user_id=38388 Jeremy, if that's what you want, you should at least post to the distutils list before going ahead and changing things. E.g. I can't see why "skip" notices are any less important than "building..." notices: they tell you that distutils has found some components up-to-date, and that may sometimes not be what you'd really expect. We should first discuss what distutils developers want as the default, and then go ahead and fix up distutils to meet those demands. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 18:25 Message: Logged In: YES user_id=31392 MAL, I really want to change distutils, not Python's setup.py. I use distutils for all sorts of projects and the default chattiness is always a nuisance. When I'm doing development, I invariably have to wade through hundreds of lines of useless output to find the one or two lines that confirm a change was made. You could still get the skip notices for your stuff, you'd just have to run in extra verbose mode.
---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 18:17 Message: Logged In: YES user_id=31392 If I had to guess, I'd say cleaning up and rationalizing the use of self.verbose and print vs self.announce() vs the other methods that print things would teach you a lot about the internals. Hey, and reformat the code while you're at it. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2002-01-17 18:17 Message: Logged In: YES user_id=38388 Jeremy, the patch touches the distutils code, but what you really want is to change the behaviour in one single use-case (the setup.py which Python uses). The "right" way to fix this would be to subclass the various distutils classes to implement the change. If this becomes too complicated, then distutils ought to be tweaked to make this easier in a way that doesn't break existing code (e.g. I don't want to miss the skip notices for my stuff). ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-01-17 17:25 Message: Logged In: YES user_id=6656 You're not wrong :| The "assert 0" is on the install path though. Right. I'm currently fighting emacs to let me print source duplex, but I want to understand distutils' innards at some point, might as well be now. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 16:50 Message: Logged In: YES user_id=31392 The distutils package is a maze of twisty little passages that all look the same. I added an assert 0 to make sure that the execution path that generated the output wasn't the one with the assert 0. (It wasn't.) Didn't intend for the patch to make it in. But I'd still be surprised if this patch is the right thing. More likely it demonstrates good behavior that could be implemented more cleanly.
---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-01-17 16:45 Message: Logged In: YES user_id=6656 Hokay, next question: why the "assert 0" in cmd.py? Are you sure you've finished? ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 16:32 Message: Logged In: YES user_id=31392 Er, context diff. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-01-17 15:49 Message: Logged In: YES user_id=6656 Um, context diff? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=504889&group_id=5470 From noreply@sourceforge.net Sun Mar 24 02:04:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 18:04:54 -0800 Subject: [Patches] [ python-Patches-514997 ] remove extra SET_LINENOs Message-ID: Patches item #514997, was opened at 2002-02-08 21:22 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 Category: Parser/Compiler Group: None Status: Open Resolution: None Priority: 3 Submitted By: Neal Norwitz (nnorwitz) >Assigned to: Neil Schemenauer (nascheme) Summary: remove extra SET_LINENOs Initial Comment: This patch removes consecutive SET_LINENOs. The patch fixes test_hotshot, but does not fix a failure in inspect. I wasn't sure what the problem was or why SET_LINENO would matter for inspect. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 22:42 Message: Logged In: YES user_id=6380 Can you find someone interested in answering the inspect question? Otherwise this patch is stalled...
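The transformation itself is easy to state: of any run of consecutive SET_LINENO instructions, only the last can ever be observed, so the earlier ones can be dropped. A toy model over (opcode, argument) pairs, with an illustrative opcode number (the real change lives in the compiler's C code-emission path and must also keep the co_lnotab table consistent):

```python
SET_LINENO = 127  # illustrative opcode number, not taken from the patch

def collapse_set_lineno(ops):
    """Drop SET_LINENO instructions that are immediately overwritten
    by another SET_LINENO.  `ops` is a list of (opcode, argument)
    pairs; a toy model of the peephole step, not compile.c code."""
    out = []
    for op, arg in ops:
        if op == SET_LINENO and out and out[-1][0] == SET_LINENO:
            out[-1] = (op, arg)        # keep only the newest line number
        else:
            out.append((op, arg))
    return out

code = [(SET_LINENO, 1), (SET_LINENO, 2), (100, 0), (SET_LINENO, 3)]
print(collapse_set_lineno(code))   # the SET_LINENO for line 1 is gone
```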
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 From noreply@sourceforge.net Sun Mar 24 02:05:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 18:05:27 -0800 Subject: [Patches] [ python-Patches-516297 ] iterator for lineinput Message-ID: Patches item #516297, was opened at 2002-02-12 03:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) >Assigned to: Neil Schemenauer (nascheme) Summary: iterator for lineinput Initial Comment: Taking the route of least invasiveness, I have come up with a VERY simple iterator interface for fileinput. Basically, __iter__() returns self and next() calls __getitem__() with the proper number. This was done to have the patch only add methods and not change any existing ones, thus minimizing any chance of breaking existing code. Now the module on the whole, however, could possibly stand an update now that generators are coming. I have a recipe up at the Cookbook that uses generators to implement fileinput w/o in-place editing (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/112506). If there is enough interest, I would be quite willing to rewrite fileinput using generators. And if some of the unneeded methods could be deprecated (__getitem__, readline), then the whole module could probably be cleaned up a decent amount and have a possible speed improvement.
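The described approach — __iter__() returning self, and next() delegating to __getitem__() with a running index — looks roughly like this on a toy class. A plain list wrapper stands in for fileinput.FileInput, and the method is spelled __next__ as in today's Python (it was next() in the Python of the patch).

```python
class SequentialLines:
    """Toy stand-in for fileinput.FileInput: like the real class of
    the era, it only supports in-order __getitem__ access."""
    def __init__(self, lines):
        self._lines = lines
        self._index = 0

    def __getitem__(self, i):
        if i != self._index:
            raise RuntimeError('accessing lines out of order')
        if i >= len(self._lines):
            raise IndexError('end of input')
        self._index += 1
        return self._lines[i]

    # The patch's two additions -- nothing else changes:
    def __iter__(self):
        return self

    def __next__(self):
        try:
            return self[self._index]
        except IndexError:
            raise StopIteration

print(list(SequentialLines(['a\n', 'b\n'])))
```

Because the iterator protocol is bolted on top of the existing sequential access, code that still indexes the object in order keeps working unchanged.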
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 From noreply@sourceforge.net Sun Mar 24 02:06:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 18:06:00 -0800 Subject: [Patches] [ python-Patches-522587 ] Fixes pydoc http/ftp URL matching Message-ID: Patches item #522587, was opened at 2002-02-25 18:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=522587&group_id=5470 Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Brian Quinlan (bquinlan) >Assigned to: Neil Schemenauer (nascheme) Summary: Fixes pydoc http/ftp URL matching Initial Comment: The current URL matching pattern used by pydoc only excludes whitespace. My patch also excludes the following characters:

' & " - excludes the quotes in:
< & > - As stated in RFC-1738: """The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text""" We don't want to include the delimiters as part of the URL. And including an unescaped "<" in an attribute value is not legal markup.

Also, remove the word boundary requirement for http/ftp URIs, because otherwise the "/" would not be included in the following URL: "http://www.python.org/" Attached is the patch and some simple test code.
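The resulting pattern can be approximated like this (an approximation for illustration only; the real pydoc pattern also handles other schemes and HTML escaping):

```python
import re

# URLs run until whitespace or one of the excluded delimiters: the
# two quote characters and the <...> brackets RFC 1738 calls unsafe.
# Dropping the trailing \b keeps a final '/' inside the match.
url_re = re.compile(r"""(?:http|ftp)://[^\s'"<>]+""")

text = 'see <http://www.python.org/> or "ftp://ftp.example.org/pub"'
print(url_re.findall(text))
```

Both the trailing `/` and the `/pub` path survive, while the surrounding brackets and quotes are left out of the match.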
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=522587&group_id=5470 From noreply@sourceforge.net Sun Mar 24 02:07:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 18:07:05 -0800 Subject: [Patches] [ python-Patches-533482 ] small seek tweak upon reads (gzip) Message-ID: Patches item #533482, was opened at 2002-03-22 08:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Todd Warner (icode) >Assigned to: Neil Schemenauer (nascheme) Summary: small seek tweak upon reads (gzip) Initial Comment: Upon actual read of a gzipped file, there is a check to see if you are already at the end of the file. This is done by saving your position, seeking to the end, and comparing the two tell() values. It is more efficient to simply increment position + 1. The efficiency gain is nearly insignificant, but this patch will greatly decrease the size of my next one. :) NOTE: all versions of gzip.py do this.
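For comparison, the existing check looks like the first function below, and one way to avoid the seek-to-end round trip is a single probe of the next byte. This is a sketch of the kind of tweak described, not the literal diff, which isn't reproduced in this thread.

```python
import io

def at_eof_old(fileobj):
    """The existing gzip.py test: save position, seek to the end,
    compare tell() values, seek back."""
    pos = fileobj.tell()
    fileobj.seek(0, 2)                 # seek to end of file
    if pos == fileobj.tell():
        return True
    fileobj.seek(pos)                  # restore position
    return False

def at_eof_new(fileobj):
    """Single-probe variant: try to read one byte past the current
    position instead of measuring the whole file."""
    pos = fileobj.tell()
    if fileobj.read(1):
        fileobj.seek(pos)              # not at EOF; rewind the probe
        return False
    return True

f = io.BytesIO(b'data')
print(at_eof_old(f), at_eof_new(f))    # neither sees EOF yet
f.read()                               # consume everything
print(at_eof_old(f), at_eof_new(f))    # both now report EOF
```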
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 From noreply@sourceforge.net Sun Mar 24 05:29:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 23 Mar 2002 21:29:19 -0800 Subject: [Patches] [ python-Patches-514997 ] remove extra SET_LINENOs Message-ID: Patches item #514997, was opened at 2002-02-08 16:22 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 Category: Parser/Compiler Group: None Status: Open Resolution: None Priority: 3 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Neil Schemenauer (nascheme) Summary: remove extra SET_LINENOs Initial Comment: This patch removes consecutive SET_LINENOs. The patch fixes test_hotshot, but does not fix a failure in inspect. I wasn't sure what the problem was or why SET_LINENO would matter for inspect. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-24 00:29 Message: Logged In: YES user_id=31435 Neal, do you have your editor set to insert spaces instead of tabs, and to consider "a tab" to be four spaces? Guido wrote this file using hard tabs considered as 8-space gimmicks, and the after-patch code is kinda gruesome due to the mixture of indentation styles. Second, why do you think a hard-coded 0xffff is something interesting for line numbers? Or are you just giving up when line numbers are >= 2**16? The code is mysterious here and needs a comment. It's probably not good to leave the code in a state where adjacent SET_LINENOs are collapsed if and only if the line numbers "aren't big" (then code using line numbers can't guess whether they are or aren't collapsed without duplicating the same lumpy logic). Third, c_lnotab is extremely delicate, historically subject to miserable rare bugs.
If you've read the long comment block explaining it near the top of this file, I'd appreciate an argument (in code comments more than here) for why just mucking with the last pair in a sequence of offset pairs can't break the subtle correctness property explained in the comment block. Finally, it's definitely worth tracking down why test_inspect fails: that test is difficult to understand, but the bottom line is that it's provoking an exception traceback and asserting that the computed line numbers correspond to the actual lines that are failing. The failing case provokes a three-frame traceback, and 2 of the 3 line numbers are wrong after the patch (the first is off by 1, and the third is off by 3; the frame in the middle gets the right line number). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:42 Message: Logged In: YES user_id=6380 Can you find someone interested in answering the inspect question? Otherwise this patch is stalled... ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470
From noreply@sourceforge.net Sun Mar 24 12:02:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 04:02:48 -0800 Subject: [Patches] [ python-Patches-489066 ] Include RLIM_INFINITY constant Message-ID: Patches item #489066, was opened at 2001-12-04 15:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=489066&group_id=5470 Category: Modules >Group: Python 2.3 Status: Open >Resolution: Accepted Priority: 5 Submitted By: Eric Huss (ehuss) >Assigned to: Neil Schemenauer (nascheme) Summary: Include RLIM_INFINITY constant Initial Comment: The following is a patch to the resource module to include the RLIM_INFINITY constant. It should handle platforms where RLIM_INFINITY is not a LONG_LONG, but I have no means to test that. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:02 Message: Logged In: YES user_id=6380 Comments: (1) RLIM_INFINITY is used unconditionally elsewhere in the module, so the #ifdef is unnecessary. (2) The extra #if/#endif around the closing curly is ugly. I'd avoid this by moving the corresponding opening curly outside the first block. (3) resource.RLIM_INFINITY is -1 on my system too. But does that matter? This is just a symbolic constant to be used to set limits to infinity, and if it happens to be -1, who cares? It's got 32 1-bits, which is what counts. So I'd accept it. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 19:25 Message: Logged In: YES user_id=35752 This doesn't seem to work on my Linux machine. RLIM_INFINITY is an unsigned long. It becomes -1L in the resource module. I'm attaching an updated patch that uses PyModule_AddObject and applies cleanly to the current CVS.
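Whatever integer the constant ends up being on a given platform, user code treats it as an opaque token, which is the point of comment (3); for example (Unix-only, and the -1 value is specific to platforms where rlim_t is all one-bits):

```python
import resource

# RLIM_INFINITY is a token, not a number for arithmetic: compare
# getrlimit() results against it, or pass it to setrlimit() to lift
# a limit.  On Linux it prints as -1; elsewhere it may be a large
# positive value.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
if hard == resource.RLIM_INFINITY:
    print('hard core-file limit: unlimited')
else:
    print('hard core-file limit:', hard)
```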
---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2001-12-13 15:43 Message: Logged In: YES user_id=31392 I'd rather see this go through a beta release where we can verify that it works for both the LONG_LONG and non-LONG_LONG cases. Among other things, it looks possible (though probably unlikely) that there are platforms that do not have long long and do not represent rlim_t as long. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-12 00:24 Message: Logged In: YES user_id=6380 Jeremy, please review and apply or reject (or postpone and lower priority). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=489066&group_id=5470 From noreply@sourceforge.net Sun Mar 24 12:06:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 04:06:19 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-29 20:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open >Resolution: Accepted Priority: 5 Submitted By: Eduardo Pérez (eperez) >Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP. - It breaks when the name of the computer is not a FQDN (as with many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before.
- It breaks computers with a TCP tunnel to another host from which the connection originates, if the relay does strict EHLO/HELO checking. - It breaks computers using NAT: the host the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to disclose: the internal structure of the network. No important mail client resolves the name. Look at Netscape Messenger or KMail. In fact, KMail and Perl's Net::SMTP do exactly what my patch does. Please don't resolve the names: this approach works, and the most used email clients do this. I send you the bugfix. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, I'm reassigning it to Neil and setting the status to Accepted. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-29 21:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Sun Mar 24 12:17:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 04:17:10 -0800 Subject: [Patches] [ python-Patches-501713 ] compileall.py -d errors Message-ID: Patches item #501713, was opened at 2002-01-10 05:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=501713&group_id=5470 >Category: Library (Lib) Group: None >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Bastian Kleineidam (calvin) Assigned to: Guido van Rossum (gvanrossum) Summary: compileall.py -d errors Initial Comment: the option -d is not handled properly, the compileall.py script generates files in the wrong directory. Patch is for Python 2.1.1. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:17 Message: Logged In: YES user_id=6380 Good question. The patch is bogus, it turns out! Bastian didn't understand -d either. The patch changes the semantics of the -d option. What -d is *supposed* to do (and what it does without the patch) is to lie about the filename embedded in code objects. I think the use case is a setup Bill Janssen at Xerox PARC described: they mount a shared lib directory as e.g. /shared/local/lib/python2.2/, which is read-only; there's a different pathname for it that's only accessible on the server machine, e.g. /writable/local/lib/python2.2/. When compiling the modules, they write the .pyc and .pyo files in the /writable/ mounted filesystem, but they want the co_filename attribute of the code to start with /shared/. 
The -d option lets them do this by saying compileall -d /shared/local/lib/python2.2/ /writable/local/lib/python2.2/ Bastian's patch changes the -d option to make the -d argument the destination where the .pyc files are written, which would defeat the purpose. Bastian, if you want a way to change the destination directory (which would be a useful feature too), please submit a new patch. The -o option seems to make sense to specify the output directory. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:54 Message: Logged In: YES user_id=35752 Attached is an updated version of the patch that cleanly applies to the current CVS tree. I can't figure out what the -d option is supposed to do, however. The documentation says "-d destdir: purported directory name for error messages if no directory arguments, -l sys.path is assumed". What does that mean? Assigning to Guido since it looks like he added the -d option. ---------------------------------------------------------------------- Comment By: Bastian Kleineidam (calvin) Date: 2002-01-17 11:49 Message: Logged In: YES user_id=9205 I updated the patch to correct the case where dfile is None. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=501713&group_id=5470 From noreply@sourceforge.net Sun Mar 24 13:52:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 05:52:25 -0800 Subject: [Patches] [ python-Patches-534304 ] PEP 263 phase 2 Implementation Message-ID: Patches item #534304, was opened at 2002-03-24 22:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 Category: Parser/Compiler Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: SUZUKI Hisao (suzuki_hisao) Assigned to: Nobody/Anonymous (nobody) Summary: PEP 263
phase 2 Implementation Initial Comment: This is a sample implementation of PEP 263 phase 2. This implementation behaves just as normal Python does if no other coding hints are given. Thus it does not hurt anyone who uses Python now. Note that it is strictly compatible with the PEP in that every program valid in the PEP is also valid in this implementation. This implementation also accepts files in UTF-16 with BOM. They are read as UTF-8 internally. Please try "utf16sample.py" included. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 From noreply@sourceforge.net Sun Mar 24 15:12:25 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 07:12:25 -0800 Subject: [Patches] [ python-Patches-502415 ] optimize attribute lookups Message-ID: Patches item #502415, was opened at 2002-01-11 18:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Zooko O'Whielacronx (zooko) Assigned to: Nobody/Anonymous (nobody) Summary: optimize attribute lookups Initial Comment: This patch optimizes the string comparisons in class_getattr(), class_setattr(), instance_getattr1(), and instance_setattr(). I pulled out the relevant section of class_setattr() and measured its performance, yielding the following results: * in the case that the argument does *not* begin with "__", then the new version is 1.03 times as fast as the old. (This is a mystery to me, as the path through the code looks the same, in C. I examined the assembly that GCC v3.0.3 generated in -O3 mode, and it is true that the assembly for the new version is smaller/faster, although I don't really understand why.) 
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "X_" (where X is a random alphabetic character), then the new version is 1.12 times as fast as the old. * in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and does *not* end with "_", then the new version is 1.16 times as fast as the old. * in the case that the argument is (randomly) one of the six special names, then the new version is 2.7 times as fast as the old. * in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "__" (but is not one of the six special names), then the new version is 3.7 times as fast as the old. ---------------------------------------------------------------------- >Comment By: Zooko O'Whielacronx (zooko) Date: 2002-03-24 15:12 Message: Logged In: YES user_id=52562 Okay, I just want to double-check these two points: 1. You did look at the actual resulting source code and not just the patch, right? Here's a side-by-side: http://zooko.com/temp.html 2. You realize that my promise that the actual speedup is < 5% is in a realistic application-level benchmark. For microbenchmarks, the speed-up varies but is generally much higher than 5%, as described in this patch tracker entry. Given these two facts, then please reject this patch and spend your time on the new cached attribute lookups architecture instead. ;-) Regards, Zooko ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:57 Message: Logged In: YES user_id=35752 Based on the complexity added by the patch I would say at least a 5% speedup would be needed to offset the maintenance cost. -1 on the current patch.
---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-03-14 16:24 Message: Logged In: YES user_id=52562 update: I did a real app benchmark of this patch by running one of the unit tests from PyXML-0.6.6. (Which one? The one that I guessed would favor my optimization the most. Unfortunately I've lost my notes and I don't remember which one.) I also separated out the "unroll strcmp" optimization from the "use macros" optimization on request. I have lost my notes, but I recall that my results showed what I expected: between 0.5 and 3 percent app-level speed-up for the unroll strcmp optimization. Interesting detail: a quirk in GCC 3 makes the unroll strcmp version slightly faster than the current strcmp version *even* in the (common) case that the first two characters of the attribute name are *not* '__'. What should happen next: 1. Someone who has the authority to approve or reject this patch should tell me what kind of benchmark would be persuasive to them. I mean: what specific program I can run with and without my patch for a useful comparison. (If you require more than a 5% app-level speed-up, then let's give up on this patch now!) 2. Someone should volunteer to test this patch with the MSFT compiler, as I don't have one right now. Some people are still using the Windows platform, I've noticed [1], so it is worth benchmarking. Actually, someone should volunteer to benchmark GCC+Linux-or-MacOSX, too, as my computer is a laptop with variable-speed CPU and is really crummy for benchmarking. By the way, PEP 266 is a better solution to the problem but until it's implemented, this patch is the better patch. ;-) Note: this is one of those patches that looks uglier in "diff -u" format than in actual source code. Please browse the actual source side-by-side [2] to see how ugly it really is.
Regards Zooko [1] http://www.google.com/press/zeitgeist/jan02-pie.gif [2] search for "class_getattr" in: http://zooko.com/classobject.c http://zooko.com/classobject-strcmpunroll.c --- zooko.com Security and Distributed Systems Engineering --- ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-18 00:22 Message: Logged In: YES user_id=52562 Okay I've done some "mini benchmarks". The earlier reported micro-benchmarks were the result of running the inner loop itself, in C. These mini benchmarks are the result of running this Python script:

class A:
    def __init__(self):
        self.a = 0

a = A()
for i in xrange(2**20):
    a.a = i
print a.a

and then using different attribute names in place of `a'. The results are as expected: the optimized version is faster than the current one, depending on the shape of the attribute name, and dampened by the fact that there is now other work being done. The case that shows the smallest difference is when the attribute name neither begins nor ends with an '_'. In that case the above script runs about 2% faster with the optimizations. The case that shows the biggest difference is when the attribute begins and ends with '__', as in `__a__'. Then the above script runs about 15% faster. This still isn't a *real* application benchmark. I'm looking for one that is a reasonable case for real Python users but that also uses attribute lookups heavily. ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-17 20:33 Message: Logged In: YES user_id=52562 Yeah, the optimized version is less readable than the original. I'll try to come up with a benchmark application. Any ideas? Maybe some unit tests from Zope that use attribute lookups heavily? My guess is that the actual results in an application will be "marginal", like maybe between 0.5% and 3% improvement.
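Zooko's mini benchmark can be reproduced with the stdlib timeit module; the attribute names below are arbitrary stand-ins for the "shapes" he describes, and absolute numbers are machine-dependent (only the relative ordering is meaningful):

```python
import timeit

# Rough equivalent of the mini benchmark: time attribute assignment for
# differently-shaped names (plain, single-underscore, dunder-like).
class A:
    pass

a = A()
for name in ("plain", "_single_", "__dunder__"):
    secs = timeit.timeit(f"a.{name} = 1", globals={"a": a}, number=100_000)
    print(f"{name:12s} {secs:.4f}s")
```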
---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 18:29 Message: Logged In: YES user_id=31392 This seems to add a lot of complexity for a few special cases. How important are these particular attributes? Do you have any benchmark applications that show real improvement? It seems like microbenchmarks overstate the benefit, since we don't know how often these attributes are looked up by most applications. It would also be interesting to see how much of the benefit for non __ names is the result of the PyString_AS_STRING() macro. Maybe that's all the change we really need :-). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 From noreply@sourceforge.net Sun Mar 24 15:37:18 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 07:37:18 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-30 01:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP - It breaks when the name of the computer is not a FQDN (as many dial-ins do) and the SMTP server does strict EHLO/HELO checking as stated before. - It breaks computers with a TCP tunnel to another host from which the connection is originated if the relay does strict EHLO/HELO checking.
- It breaks computers using NAT: the host the server sees is not the one that sends the message if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network. No important mail client resolves the name. Look at netscape messenger or kmail. In fact kmail and perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 15:37 Message: Logged In: YES user_id=35752 I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb is "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 12:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, reassigning to Neil, and setting the status to Accepted. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO.
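The local_hostname argument Neil describes is easy to demonstrate: constructing an SMTP object without a host makes no network connection, and the name to announce in HELO/EHLO is simply stored. client.example.com below is a placeholder:

```python
import smtplib

# With local_hostname given, smtplib announces this name in HELO/EHLO
# instead of whatever gethostbyname()/getfqdn() guesses for the local
# machine.  No connection is attempted until connect()/sendmail().
client = smtplib.SMTP(local_hostname="client.example.com")
print(client.local_hostname)   # client.example.com
```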
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-30 02:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Sun Mar 24 18:25:14 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 10:25:14 -0800 Subject: [Patches] [ python-Patches-502415 ] optimize attribute lookups Message-ID: Patches item #502415, was opened at 2002-01-11 18:07 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 Category: Core (C code) Group: Python 2.3 >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Zooko O'Whielacronx (zooko) Assigned to: Nobody/Anonymous (nobody) Summary: optimize attribute lookups Initial Comment: This patch optimizes the string comparisons in class_getattr(), class_setattr(), instance_getattr1(), and instance_setattr(). I pulled out the relevant section of class_setattr() and measured its performance, yielding the following results: * in the case that the argument does *not* begin with "__", then the new version is 1.03 times as fast as the old. (This is a mystery to me, as the path through the code looks the same, in C. I examined the assembly that GCC v3.0.3 generated in -O3 mode, and it is true that the assembly for the new version is smaller/faster, although I don't really understand why.) 
* in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "X_" (where X is a random alphabetic character), then the new version is 1.12 times as fast as the old. * in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and does *not* end with "_", then the new version is 1.16 times as fast as the old. * in the case that the argument is (randomly) one of the six special names, then the new version is 2.7 times as fast as the old. * in the case that the argument is a string of random length between 1 and 19 inclusive, and it begins with "__" and ends with "__" (but is not one of the six special names), then the new version is 3.7 times as fast as the old. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 18:25 Message: Logged In: YES user_id=35752 I've played with your patch for about 2 hours today. I benchmarked it, tried to clean it up using macros or inlined functions. I also tried a variation that exploited the fact that most names were interned strings. It's not worth it. Spend time on rattlesnake, psyco, or the namespace optimizations. ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-03-24 15:12 Message: Logged In: YES user_id=52562 Okay, I just want to double-check these two points: 1. You did look at the actual resulting source code and not just the patch, right? Here's a side-by-side: http://zooko.com/temp.html 2. You realize that my promise that the actual speedup is < 5% is in a realistic application-level benchmark. For microbenchmarks, the speed-up varies but is generally much higher than 5%, as described in this patch tracker entry. Given these two facts, then please reject this patch and spend your time on the new cached attribute lookups architecture instead.
;-) Regards, Zooko ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:57 Message: Logged In: YES user_id=35752 Based on the complexity added by the patch I would say at least a 5% speedup would be needed to offset the maintenance cost. -1 on the current patch. ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-03-14 16:24 Message: Logged In: YES user_id=52562 update: I did a real app benchmark of this patch by running one of the unit tests from PyXML-0.6.6. (Which one? The one that I guessed would favor my optimization the most. Unfortunately I've lost my notes and I don't remember which one.) I also separated out the "unroll strcmp" optimization from the "use macros" optimization on request. I have lost my notes, but I recall that my results showed what I expected: between 0.5 and 3 percent app-level speed-up for the unroll strcmp optimization. Interesting detail: a quirk in GCC 3 makes the unroll strcmp version slightly faster than the current strcmp version *even* in the (common) case that the first two characters of the attribute name are *not* '__'. What should happen next: 1. Someone who has the authority to approve or reject this patch should tell me what kind of benchmark would be persuasive to them. I mean: what specific program I can run with and without my patch for a useful comparison. (If you require more than a 5% app-level speed-up, then let's give up on this patch now!) 2. Someone should volunteer to test this patch with the MSFT compiler, as I don't have one right now. Some people are still using the Windows platform, I've noticed [1], so it is worth benchmarking. Actually, someone should volunteer to benchmark GCC+Linux-or-MacOSX, too, as my computer is a laptop with variable-speed CPU and is really crummy for benchmarking.
By the way, PEP 266 is a better solution to the problem but until it's implemented, this patch is the better patch. ;-) Note: this is one of those patches that looks uglier in "diff -u" format than in actual source code. Please browse the actual source side-by-side [2] to see how ugly it really is. Regards Zooko [1] http://www.google.com/press/zeitgeist/jan02-pie.gif [2] search for "class_getattr" in: http://zooko.com/classobject.c http://zooko.com/classobject-strcmpunroll.c --- zooko.com Security and Distributed Systems Engineering --- ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-18 00:22 Message: Logged In: YES user_id=52562 Okay I've done some "mini benchmarks". The earlier reported micro-benchmarks were the result of running the inner loop itself, in C. These mini benchmarks are the result of running this Python script:

class A:
    def __init__(self):
        self.a = 0

a = A()
for i in xrange(2**20):
    a.a = i
print a.a

and then using different attribute names in place of `a'. The results are as expected: the optimized version is faster than the current one, depending on the shape of the attribute name, and dampened by the fact that there is now other work being done. The case that shows the smallest difference is when the attribute name neither begins nor ends with an '_'. In that case the above script runs about 2% faster with the optimizations. The case that shows the biggest difference is when the attribute begins and ends with '__', as in `__a__'. Then the above script runs about 15% faster. This still isn't a *real* application benchmark. I'm looking for one that is a reasonable case for real Python users but that also uses attribute lookups heavily. ---------------------------------------------------------------------- Comment By: Zooko O'Whielacronx (zooko) Date: 2002-01-17 20:33 Message: Logged In: YES user_id=52562 Yeah, the optimized version is less readable than the original.
I'll try to come up with a benchmark application. Any ideas? Maybe some unit tests from Zope that use attribute lookups heavily? My guess is that the actual results in an application will be "marginal", like maybe between 0.5% and 3% improvement. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2002-01-17 18:29 Message: Logged In: YES user_id=31392 This seems to add a lot of complexity for a few special cases. How important are these particular attributes? Do you have any benchmark applications that show real improvement? It seems like microbenchmarks overstate the benefit, since we don't know how often these attributes are looked up by most applications. It would also be interesting to see how much of the benefit for non __ names is the result of the PyString_AS_STRING() macro. Maybe that's all the change we really need :-). ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=502415&group_id=5470 From noreply@sourceforge.net Sun Mar 24 18:39:20 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 10:39:20 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-30 01:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Open Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail.
Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP - It breaks when the name of the computer is not a FQDN (as many dial-ins do) and the SMTP server does strict EHLO/HELO checking as stated before. - It breaks computers with a TCP tunnel to another host from which the connection is originated if the relay does strict EHLO/HELO checking. - It breaks computers using NAT: the host the server sees is not the one that sends the message if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network. No important mail client resolves the name. Look at netscape messenger or kmail. In fact kmail and perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix. ---------------------------------------------------------------------- >Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 18:39 Message: Logged In: YES user_id=60347 RFC 1123 was written 11 years ago when there weren't dial-ins, TCP tunnels, nor NATs. This patch fixes scripts that run on computers that have the explained SMTP access, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know the cases explained above where the current approach doesn't work and this patch works successfully. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 15:37 Message: Logged In: YES user_id=35752 I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb is "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO.
Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 12:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, reassigning to Neil, and setting the status to Accepted. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-30 02:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Sun Mar 24 21:51:57 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 13:51:57 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-30 01:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP - It breaks when the name of the computer is not a FQDN (as many dial-ins do) and the SMTP server does strict EHLO/HELO checking as stated before. - It breaks computers with a TCP tunnel to another host from which the connection is originated if the relay does strict EHLO/HELO checking. - It breaks computers using NAT: the host the server sees is not the one that sends the message if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network. No important mail client resolves the name. Look at netscape messenger or kmail. In fact kmail and perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix.
---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 21:51 Message: Logged In: YES user_id=35752 Did you read what I wrote?

220 cranky ESMTP Postfix (Debian/GNU)
HELO localhost.localdomain
250 cranky
MAIL FROM:
250 Ok
RCPT TO:
DATA
450 : Helo command rejected: Host not found
554 Error: no valid recipients

Bring it up again in another few years and we will change the default. ---------------------------------------------------------------------- Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 18:39 Message: Logged In: YES user_id=60347 RFC 1123 was written 11 years ago when there weren't dial-ins, TCP tunnels, nor NATs. This patch fixes scripts that run on computers that have the explained SMTP access, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know the cases explained above where the current approach doesn't work and this patch works successfully. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 15:37 Message: Logged In: YES user_id=35752 I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb is "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs.
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 12:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, reassigning to Neil, and setting the status to Accepted. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-30 02:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Sun Mar 24 22:05:02 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 14:05:02 -0800 Subject: [Patches] [ python-Patches-533008 ] specifying headers for extensions Message-ID: Patches item #533008, was opened at 2002-03-21 06:09 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533008&group_id=5470 Category: Distutils and setup.py Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Thomas Heller (theller) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: specifying headers for extensions Initial Comment: This patch makes it possible to specify that C header files are part of source files for dependency checking.
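The dependency rule such a check applies can be sketched in pure Python. SourceFile below is a hypothetical stand-in modeled on the interface the patch proposes, not a released distutils API:

```python
import os

class SourceFile:
    """Illustrative stand-in for the patch's SourceFile: a source file
    plus the headers it depends on."""
    def __init__(self, filename, headers=()):
        self.filename = filename
        self.headers = list(headers)

def needs_rebuild(source, objfile):
    # Rebuild if the object file is absent, or older than the source
    # or any of its declared headers.
    if not os.path.exists(objfile):
        return True
    obj_mtime = os.path.getmtime(objfile)
    deps = [source.filename] + source.headers
    return any(os.path.getmtime(dep) > obj_mtime for dep in deps)
```

Under this rule, touching a listed header marks the .c file for recompilation even though the .c file itself is unchanged, which is exactly the behavior plain filename-based dependency checking misses.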
The 'sources' list in Extension instances can be simple filenames as before, but they can also be SourceFile instances created by SourceFile("myfile.c", headers=["inc1.h", "inc2.h"]). Unfortunately, changes had to be made not only to command.build_ext and command.build_clib but also to all the ccompiler (sub)classes, because the ccompiler does the actual dependency checking. I updated all the ccompiler subclasses except mwerkscompiler.py, but only msvccompiler has actually been tested. The argument list which dep_util.newer_pairwise() now accepts has changed: the first arg must now be a sequence of SourceFile instances. This may be problematic; it would IMO be better to move this function (with a new name?) into ccompiler. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 17:05 Message: Logged In: YES user_id=6380 Why is this priority 7?????? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533008&group_id=5470 From noreply@sourceforge.net Sun Mar 24 22:28:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 14:28:12 -0800 Subject: [Patches] [ python-Patches-489066 ] Include RLIM_INFINITY constant Message-ID: Patches item #489066, was opened at 2001-12-04 20:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=489066&group_id=5470 Category: Modules Group: Python 2.3 >Status: Closed Resolution: Accepted Priority: 5 Submitted By: Eric Huss (ehuss) Assigned to: Neil Schemenauer (nascheme) Summary: Include RLIM_INFINITY constant Initial Comment: The following is a patch to the resource module to include the RLIM_INFINITY constant. It should handle platforms where RLIM_INFINITY is not a LONG_LONG, but I have no means to test that.
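For context, the constant in question is used like this (a sketch against the Unix-only resource module; the choice of RLIMIT_NOFILE is arbitrary):

```python
import resource

# RLIM_INFINITY is a sentinel meaning "no limit".  On some platforms it
# compares equal to -1, which is harmless: only the bit pattern matters
# when the value is passed back to setrlimit().
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if hard == resource.RLIM_INFINITY:
    print("no hard limit on open files")
else:
    print("hard limit:", hard)
```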
---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 22:28 Message: Logged In: YES user_id=35752 Checked in as resource.c 2.23. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 12:02 Message: Logged In: YES user_id=6380 Comments: (1) RLIM_INFINITY is used unconditionally elsewhere in the module, so the #ifdef is unnecessary. (2) The extra #if/#endif around the closing curly is ugly. I'd avoid this by moving the corresponding opening curly outside the first block. (3) resource.RLIM_INFINITY is -1 on my system too. But does that matter? This is just a symbolic constant to be used to set limits to infinity, and if it happens to be -1, who cares? It's got 32 1-bits, which is what counts. So I'd accept it. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 00:25 Message: Logged In: YES user_id=35752 This doesn't seem to work on my Linux machine. RLIM_INFINITY is an unsigned long. It becomes -1L in the resource module. I'm attaching an updated patch that uses PyModule_AddObject and applies cleanly to the current CVS. ---------------------------------------------------------------------- Comment By: Jeremy Hylton (jhylton) Date: 2001-12-13 20:43 Message: Logged In: YES user_id=31392 I'd rather see this go through a beta release where we can verify that it works for both the LONG_LONG and non-LONG_LONG cases. Among other things, it looks possible (though probably unlikely) that there are platforms that do not have long long and do not represent rlim_t as long. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-12 05:24 Message: Logged In: YES user_id=6380 Jeremy, please review and apply or reject (or postpone and lower priority).
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=489066&group_id=5470 From noreply@sourceforge.net Sun Mar 24 22:35:26 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 14:35:26 -0800 Subject: [Patches] [ python-Patches-533482 ] small seek tweak upon reads (gzip) Message-ID: Patches item #533482, was opened at 2002-03-22 08:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 Category: Library (Lib) Group: Python 2.2.x >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Todd Warner (icode) Assigned to: Neil Schemenauer (nascheme) Summary: small seek tweak upon reads (gzip) Initial Comment: Upon actual read of a gzipped file, there is a check to see if you are already at the end of the file. This is done by saving your position, seeking to the end, and comparing the tell() results. It is more efficient to simply increment position + 1. Efficiency gain is nearly insignificant, but this patch will greatly decrease the size of my next one. :) NOTE: all versions of gzip.py do this. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 22:35 Message: Logged In: YES user_id=35752 This looks like a pointless change to me. It's probably less efficient with the patch because there is an extra Python int add. Why don't you just submit the real patch? :) Rejected. 
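The check being debated can be paraphrased as follows, using io.BytesIO as a stand-in for the underlying file object (this is a sketch of the pattern described in the initial comment, not the literal gzip.py code):

```python
import io

# The EOF test gzip.GzipFile._read performs before each read
# (paraphrased): remember the offset, seek to the end, compare the
# tell() results, then restore the position.
def at_eof(fileobj):
    pos = fileobj.tell()
    fileobj.seek(0, 2)        # whence=2: seek relative to end of file
    end = fileobj.tell()
    fileobj.seek(pos)         # restore the read position
    return pos == end

buf = io.BytesIO(b"some uncompressed bytes")
print(at_eof(buf))   # False: nothing consumed yet
buf.read()
print(at_eof(buf))   # True: offset now equals the file size
```

Note that this pattern only works on seekable objects, which is exactly why Todd's GzipStream idea (below) has to override _read for sockets and other non-seekable streams.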
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 From noreply@sourceforge.net Sun Mar 24 23:01:15 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 15:01:15 -0800 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 16:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). It attempts to operate as much like time.strptime() within reason. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by myself as well as some other people who happened to have caught the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes. All of it is self-explanatory in the doc strings. 
It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2002-03-24 15:01 Message: Logged In: YES user_id=357491 Oops. I thought I had removed the clause. Feel free to remove it. I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait on the cleaned-up one, go ahead. Speaking of which, should I just reply to this bugfix when I get around to the update, or start a new patch? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 14:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html. 
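The interface the pure-Python module mirrors is the standard time.strptime() call: parse a string against a format, yielding a struct_time. A minimal usage sketch of that interface:

```python
import time

# Parse a date string against an explicit format; the result is a
# struct_time whose fields can be read by attribute.
parsed = time.strptime("25 Mar 2002", "%d %b %Y")
print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)   # -> 2002 3 25
```

The portability problem PEP 42 complains about is that the C-library strptime() behind this call varies across platforms; a pure-Python version pins the behavior down.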
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Sun Mar 24 23:13:29 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 15:13:29 -0800 Subject: [Patches] [ python-Patches-522587 ] Fixes pydoc http/ftp URL matching Message-ID: Patches item #522587, was opened at 2002-02-25 18:50 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=522587&group_id=5470 Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Brian Quinlan (bquinlan) Assigned to: Neil Schemenauer (nascheme) Summary: Fixes pydoc http/ftp URL matching Initial Comment: The current URL matching pattern used by pydoc only excludes whitespace. My patch also excludes the following characters: ' & " - excludes the quotes in: < & > - As stated in RFC-1738: """The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text""" We don't want to include the delimiters as part of the URL. And including unescaped "<" in an attribute value is not legal markup. Also, remove the word boundary requirement for http/ftp URIs because otherwise the "/" would not be included in the following URL: "http://www.python.org/" Attached is the patch and some simple test code. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 23:13 Message: Logged In: YES user_id=35752 Fixed in pydoc 1.60. I dropped the trailing \b. Instead of restricting the characters in the URL I changed the code to properly quote it. 
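An illustrative pattern in the spirit of Brian's description: match http/ftp URLs but stop at whitespace, quotes, and the <...> delimiters from RFC 1738. (This is a sketch only; the pattern actually committed in pydoc 1.60 differs, since Neil chose to quote the URL rather than restrict its characters.)

```python
import re

# Stop the URL at whitespace, single/double quotes, and the RFC 1738
# free-text delimiters "<" and ">".  No trailing \b, so a final "/" is
# kept -- the problem the original trailing word boundary caused.
url_pat = re.compile(r'\b((?:http|ftp)://[^\s\'"<>]+)')

text = 'See <http://www.python.org/> and "ftp://ftp.python.org/pub"'
print(url_pat.findall(text))
```

Here the trailing slash of the first URL and the closing quote/bracket delimiters are handled as the patch intends: delimiters excluded, path characters kept.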
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=522587&group_id=5470 From noreply@sourceforge.net Sun Mar 24 23:15:22 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 15:15:22 -0800 Subject: [Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42) Message-ID: Patches item #474274, was opened at 2001-10-23 23:15 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 Category: Modules Group: None Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Nobody/Anonymous (nobody) Summary: Pure Python strptime() (PEP 42) Initial Comment: The attached file contains a pure Python version of strptime(). It attempts to operate as much like time.strptime() within reason. Where vagueness or obvious platform dependence existed, I tried to standardize and be reasonable. PEP 42 makes a request for a portable, consistent version of time.strptime(): - Add a portable implementation of time.strptime() that works in clearly defined ways on all platforms. This module attempts to close that feature request. The code has been tested thoroughly by myself as well as some other people who happened to have caught the post I made to c.l.p a while back and used the module. It is available at the Python Cookbook (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036). It has been approved by the editors there and thus is listed as approved. It is also being considered for inclusion in the book (thanks, Alex, for encouraging this submission). A PyUnit testing suite for the module is available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime along with the code for the function itself. Localization has been handled in a modular way using regexes. All of it is self-explanatory in the doc strings. 
It is very straightforward to include your own localization settings or modify the two languages included in the module (English and Swedish). If the code needs to have its license changed, I am quite happy to do it (I have already given the OK to the Python Cookbook). -Brett Cannon ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 23:15 Message: Logged In: YES user_id=35752 Go ahead and reuse this item. I'll wait for the updated version. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-24 23:01 Message: Logged In: YES user_id=357491 Oops. I thought I had removed the clause. Feel free to remove it. I am going to be cleaning up the module, though, so if you would rather not bother reviewing this version and wait on the cleaned-up one, go ahead. Speaking of which, should I just reply to this bugfix when I get around to the update, or start a new patch? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 22:41 Message: Logged In: YES user_id=35752 I'm pretty sure this code needs a different license before it can be accepted. The current license contains the "BSD advertising clause". See http://www.gnu.org/philosophy/bsd.html. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470 From noreply@sourceforge.net Mon Mar 25 01:21:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 17:21:36 -0800 Subject: [Patches] [ python-Patches-533482 ] small seek tweak upon reads (gzip) Message-ID: Patches item #533482, was opened at 2002-03-22 03:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Todd Warner (icode) Assigned to: Neil Schemenauer (nascheme) Summary: small seek tweak upon reads (gzip) Initial Comment: Upon actual read of a gzipped file, there is a check to see if you are already at the end of the file. This is done by saving your position, seeking to the end, and comparing that tell(). It is more efficient to simply increment position + 1. Efficiency gain is nearly insignificant, but this patch will greatly decrease the size of my next one. :) NOTE: all version of gzip.py do this. ---------------------------------------------------------------------- >Comment By: Todd Warner (icode) Date: 2002-03-24 20:21 Message: Logged In: YES user_id=87721 It is more efficient for the majority of gzipped files (if very small files are not in the majority). The "real" patch will be (once I give it a bit more polish/tuning --- using in production code soon) a class called GzipStream. Ie. it will allow high level access to any arbitrary file-like "stream" (eg. a gzipped socket stream) which are not generally "seekable". I do this via inheriting GzipFile and extending upon it... but I rewrite the _read method with a one line change. Anyway, that is my logic. Question to you: should this be included within gzip.py or in its own module (eg. gzipstream)? 
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 17:35 Message: Logged In: YES user_id=35752 This looks like a pointless change to me. It's probably less efficient with the patch because there is an extra Python int add. Why don't you just submit the real patch? :) Rejected. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 From noreply@sourceforge.net Mon Mar 25 01:30:50 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 17:30:50 -0800 Subject: [Patches] [ python-Patches-533482 ] small seek tweak upon reads (gzip) Message-ID: Patches item #533482, was opened at 2002-03-22 03:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Todd Warner (icode) Assigned to: Neil Schemenauer (nascheme) Summary: small seek tweak upon reads (gzip) Initial Comment: Upon actual read of a gzipped file, there is a check to see if you are already at the end of the file. This is done by saving your position, seeking to the end, and comparing that tell(). It is more efficient to simply increment position + 1. Efficiency gain is nearly insignificant, but this patch will greatly decrease the size of my next one. :) NOTE: all version of gzip.py do this. ---------------------------------------------------------------------- >Comment By: Todd Warner (icode) Date: 2002-03-24 20:30 Message: Logged In: YES user_id=87721 It is more efficient for the majority of gzipped files (if very small files are not in the majority). The "real" patch will be (once I give it a bit more polish/tuning --- using in production code soon) a class called GzipStream. Ie. 
it will allow high level access to any arbitrary file-like "stream" (eg. a gzipped socket stream) which are not generally "seekable". I do this via inheriting GzipFile and extending upon it... but I rewrite the _read method with a one line change. Anyway, that is my logic. Question to you: should this be included within gzip.py or in its own module (eg. gzipstream)? ---------------------------------------------------------------------- Comment By: Todd Warner (icode) Date: 2002-03-24 20:21 Message: Logged In: YES user_id=87721 It is more efficient for the majority of gzipped files (if very small files are not in the majority). The "real" patch will be (once I give it a bit more polish/tuning --- using in production code soon) a class called GzipStream. Ie. it will allow high level access to any arbitrary file-like "stream" (eg. a gzipped socket stream) which are not generally "seekable". I do this via inheriting GzipFile and extending upon it... but I rewrite the _read method with a one line change. Anyway, that is my logic. Question to you: should this be included within gzip.py or in its own module (eg. gzipstream)? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 17:35 Message: Logged In: YES user_id=35752 This looks like a pointless change to me. It's probably less efficient with the patch because there is an extra Python int add. Why don't you just submit the real patch? :) Rejected. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 From noreply@sourceforge.net Mon Mar 25 03:33:44 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 19:33:44 -0800 Subject: [Patches] [ python-Patches-533482 ] small seek tweak upon reads (gzip) Message-ID: Patches item #533482, was opened at 2002-03-22 08:04 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 Category: Library (Lib) Group: Python 2.2.x Status: Closed Resolution: Rejected Priority: 5 Submitted By: Todd Warner (icode) Assigned to: Neil Schemenauer (nascheme) Summary: small seek tweak upon reads (gzip) Initial Comment: Upon actual read of a gzipped file, there is a check to see if you are already at the end of the file. This is done by saving your position, seeking to the end, and comparing that tell(). It is more efficient to simply increment position + 1. Efficiency gain is nearly insignificant, but this patch will greatly decrease the size of my next one. :) NOTE: all version of gzip.py do this. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 03:33 Message: Logged In: YES user_id=35752 Why would it be more efficient? Assuming the OS is not implemented by a silly person, a seek just updates an offset in the in-memory file descriptor structure. Regarding your GzipStream, it sounds like making it part of gzip.py would be okay. ---------------------------------------------------------------------- Comment By: Todd Warner (icode) Date: 2002-03-25 01:30 Message: Logged In: YES user_id=87721 It is more efficient for the majority of gzipped files (if very small files are not in the majority). 
The "real" patch will be (once I give it a bit more polish/tuning --- using in production code soon) a class called GzipStream. Ie. it will allow high level access to any arbitrary file-like "stream" (eg. a gzipped socket stream) which are not generally "seekable". I do this via inheriting GzipFile and extending upon it... but I rewrite the _read method with a one line change. Anyway, that is my logic. Question to you: should this be included within gzip.py or in its own module (eg. gzipstream)? ---------------------------------------------------------------------- Comment By: Todd Warner (icode) Date: 2002-03-25 01:21 Message: Logged In: YES user_id=87721 It is more efficient for the majority of gzipped files (if very small files are not in the majority). The "real" patch will be (once I give it a bit more polish/tuning --- using in production code soon) a class called GzipStream. Ie. it will allow high level access to any arbitrary file-like "stream" (eg. a gzipped socket stream) which are not generally "seekable". I do this via inheriting GzipFile and extending upon it... but I rewrite the _read method with a one line change. Anyway, that is my logic. Question to you: should this be included within gzip.py or in its own module (eg. gzipstream)? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 22:35 Message: Logged In: YES user_id=35752 This looks like a pointless change to me. It's probably less efficient with the patch because there is an extra Python int add. Why don't you just submit the real patch? :) Rejected. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533482&group_id=5470 From noreply@sourceforge.net Mon Mar 25 04:00:19 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 20:00:19 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-29 20:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP - It breaks when the name of the computer is not a FQDN (as many dial-ins do) and the SMTP server does strict EHLO/HELO checking as stated before. - It breaks computers with a TCP tunnel to another host from which the connection is originated, if the relay does strict EHLO/HELO checking. - It breaks computers using NAT: the host that the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network. No important mail client resolves the name. Look at netscape messenger or kmail. In fact kmail and perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix. 
---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-24 23:00 Message: Logged In: YES user_id=12800 Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting: [HELO and EHLO] are used to identify the SMTP client to the SMTP server. The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system. Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right. If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 16:51 Message: Logged In: YES user_id=35752 Did you read what I wrote? 
220 cranky ESMTP Postfix (Debian/GNU) HELO localhost.localdomain 250 cranky MAIL FROM: 250 Ok RCPT TO: DATA 450 : Helo command rejected: Host not found 554 Error: no valid recipients Bring it up again in another few years and we will change the default. ---------------------------------------------------------------------- Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 13:39 Message: Logged In: YES user_id=60347 RFC 1123 was written 11 years ago when there weren't dial-ins, TCP tunnels, nor NATs. This patch fixes scripts that run on computers that have the explained SMTP access, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the patch proposed fails? I know the cases explained above where the current approach doesn't work and this patch works successfully. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 10:37 Message: Logged In: YES user_id=35752 I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb is "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided it is used as the local hostname for the HELO and EHLO verbs. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, reassigning to Neil, and set status to Accepted. 
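The local_hostname argument Neil describes ended up in the SMTP constructor; when supplied it is used verbatim in the HELO/EHLO greeting instead of the name derived via socket.getfqdn(). A sketch (the mail host below is illustrative only, and no connection is attempted here):

```python
import smtplib
import socket

# Without local_hostname, smtplib derives the greeting name like this:
print("default HELO name would be:", socket.getfqdn())

# With the fix, a caller on a NAT'd or tunneled host can supply an
# address literal instead, as RFC 2821 suggests:
# server = smtplib.SMTP("mail.example.com", local_hostname="[192.0.2.1]")

# The parameter is part of the constructor signature:
print("local_hostname" in smtplib.SMTP.__init__.__code__.co_varnames)
```

This keeps the RFC-friendly default while letting clients behind NAT or tunnels avoid sending a bogus or privacy-leaking name.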
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-29 21:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Mon Mar 25 04:35:32 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 20:35:32 -0800 Subject: [Patches] [ python-Patches-516297 ] iterator for lineinput Message-ID: Patches item #516297, was opened at 2002-02-12 03:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Neil Schemenauer (nascheme) Summary: iterator for lineinput Initial Comment: Taking the route of least invasiveness, I have come up with a VERY simple iterator interface for fileinput. Basically, __iter__() returns self and next() calls __getitem__() with the proper number. This was done to have the patch only add methods and not change any existing ones, thus minimizing any chance of breaking existing code. Now the module on the whole, however, could possibly stand an update now that generators are coming. 
I have a recipe up at the Cookbook that uses generators to implement fileinput w/o in-place editing (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/112506). If there is enough interest, I would be quite willing to rewrite fileinput using generators. And if some of the unneeded methods could be deprecated (__getitem__, readline), then the whole module could probably be cleaned up a decent amount and have a possible speed improvement. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 04:35 Message: Logged In: YES user_id=35752 Why do you need fileinput to have an __iter__ method? As far as I can see it only slows things down. As it is now iter(fileinput.input()) works just fine. Adding __iter__ and next() just adds another layer of method calls. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 From noreply@sourceforge.net Mon Mar 25 06:42:34 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 24 Mar 2002 22:42:34 -0800 Subject: [Patches] [ python-Patches-533681 ] Apply semaphore code to Cygwin Message-ID: Patches item #533681, was opened at 2002-03-22 12:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 Category: Core (C code) Group: Python 2.2.x >Status: Closed >Resolution: Rejected Priority: 5 Submitted By: Gerald S. Williams (gsw_agere) Assigned to: Nobody/Anonymous (nobody) Summary: Apply semaphore code to Cygwin Initial Comment: The current version of Cygwin does not define _POSIX_SEMAPHORES by default, although it requires the new semaphore interface since its condition variables interface contains a race condition. This patch simply specifies that semaphores should be used if _POSIX_SEMAPHORES OR __CYGWIN__ is defined. 
---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-25 01:42 Message: Logged In: YES user_id=31435 I'm rejecting the patch based on Jason Tishler's comments in: http://mail.python.org/pipermail/python-dev/2002-March/021675.html Please work with Jason to find a better solution. If you and Jason can't find a better one, and Jason goes along with this patch, we can reopen it. In the meantime, you motivated me to get rid of the old __ksr__ cruft. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-22 16:28 Message: Logged In: YES user_id=31435 I'm afraid I agree with Martin here: the crusty old historical examples you dug up are exactly why we avoid doing similar stuff now. Nobody understands why that code is there anymore, and it will never go away. For example, I happen to know that KSR went bankrupt in 1994, and anything keying off __ksr__ has been worse than useless since then. ---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 16:19 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. 
In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Gerald S. Williams (gsw_agere) Date: 2002-03-22 16:18 Message: Logged In: YES user_id=329402 Before _POSIX_SEMAPHORES is specified by default for Cygwin, it will probably have to be shown that it is 100% compliant with POSIX. Whether or not this is the case, the POSIX semaphore implementation is the one that should be used for Cygwin (it has been verified and approved by the Cygwin Python maintainer, etc.). Prior to this, threading had been disabled for Cygwin Python, so this is really more of a port-to-Cygwin than a workaround. This could have been implemented in a new file (thread_cygwin.h), although during implementation it was discovered that the change for Cygwin would also benefit POSIX semaphore users in general. The threading module overall is highly platform-specific, especially with regard to redefining POSIX symbols for specific platforms. In particular, this is done for the following platforms: __DGUX __sgi __ksr__ anything using SOLARIS_THREADS __MWERKS__ However, except for those using SOLARIS_THREADS, these are specified in thread.c. I will therefore resubmit the patch as a change to thread.c instead. The reference to __rtems__ actually comes from newlib, which Cygwin uses. It doesn't apply to Cygwin. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-22 15:03 Message: Logged In: YES user_id=21627 -1. 
Cygwin really ought to define _POSIX_SEMAPHORES if they support them, so if they support them and don't define the feature test macro, it is a Cygwin bug. Workarounds for platform bugs are generally discouraged in Python. On python-dev, you indicate that _POSIX_SEMAPHORES is only defined if __rtems__ is also defined. What is the rationale for that? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533681&group_id=5470 From noreply@sourceforge.net Mon Mar 25 09:03:05 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 01:03:05 -0800 Subject: [Patches] [ python-Patches-533008 ] specifying headers for extensions Message-ID: Patches item #533008, was opened at 2002-03-21 12:09 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533008&group_id=5470 Category: Distutils and setup.py Group: Python 2.3 Status: Open Resolution: None Priority: 7 Submitted By: Thomas Heller (theller) Assigned to: Fred L. Drake, Jr. (fdrake) Summary: specifying headers for extensions Initial Comment: This patch allows specifying that C header files are part of the source files for dependency checking. The 'sources' list in Extension instances can contain simple filenames as before, but they can also be SourceFile instances created by SourceFile("myfile.c", headers=["inc1.h", "inc2.h"]). Unfortunately, not only did changes to command.build_ext and command.build_clib have to be made; all the ccompiler (sub)classes had to be changed as well, because the ccompiler does the actual dependency checking. I updated all the ccompiler subclasses except mwerkscompiler.py, but only msvccompiler has actually been tested. The argument list which dep_util.newer_pairwise() now accepts has changed: the first arg must now be a sequence of SourceFile instances. This may be problematic; IMO it would be better to move this function (with a new name?) 
into ccompiler. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2002-03-25 10:03 Message: Logged In: YES user_id=11105 Fred requested it this way: http://mail.python.org/pipermail/distutils-sig/2002-March/002806.html ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 23:05 Message: Logged In: YES user_id=6380 Why is this priority 7?????? ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=533008&group_id=5470 From noreply@sourceforge.net Mon Mar 25 12:27:49 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 04:27:49 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) Assigned to: Nobody/Anonymous (nobody) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with the '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure whether it broke them (someone should check DGUX and BeOS). It also makes building the shared library disabled by default, while these architectures had it enabled. 
- it rectifies a small problem on solaris2.8 that causes double inclusion of thread.o (this produces an error from 'ld' for the shared library). ---------------------------------------------------------------------- >Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-25 13:27 Message: Logged In: YES user_id=88611 I have rebuilt the patch against CVS - --enable-shared instead of --enable-shared-python - sets rpath on Linux and Tru64 too - I didn't change the SOVERSION stuff. I think we should come to a conclusion about versioning first. BTW: am I correct that make install should create the symlink .sl -> .sl.1.0 when we use versioning? - this patch may break BeOS and DgUX. I think someone with access to these platforms should test it (he should use --enable-shared, as this patch changes the default behavior to --disable-shared for all platforms). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-20 08:41 Message: Logged In: YES user_id=21627 The API version is maintained in modsupport.h:API_VERSION. I'm personally not concerned about breakage of the API during the development of a new release. Absolutely no breakage should occur in maintenance releases. After all, a maintenance release will replace pythonxy.dll on Windows with no protection against API breakage; thus, it is a bug if the API changes in a maintenance release. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-19 18:14 Message: Logged In: YES user_id=10327 This is exactly the problem -- if today's libpython23.so replaces last week's libpython23.so, then everything I built during the last week is going to break if the ABI changes. That's why I think that incorporating the version number from api.tex is a good idea -- call me an optimist, but I think that any change will be documented. ;-) This kind of problem is NOT pretty. 
I went through it a few years ago when the GNU libc transitioned to versioned linking. It managed to cause a LOT of almost-intractable incompatibilities during that time, and I don't care at all to repeat that experience with Python. :-( ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 18:05 Message: Logged In: YES user_id=21627 The CVS version will usually use a completely different library name (e.g. libpython23.so), so there will be no conflicts with prior versions. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-19 16:13 Message: Logged In: YES user_id=10327 A SOVERSION of 0.0 makes perfect sense for the CVS head. Release versions should probably use 1.0. I don't quite know, though, if builds from CVS should keep a fixed SOVERSION -- after all, the API can change. One idea would be to use the tip version number of Doc/api/api.tex, i.e. libpython2.3.so.0.154 or libpython2.3.154.so.0.0. That way, installing a newer CVS version won't instantly break everything people have built with it. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-19 15:35 Message: Logged In: YES user_id=21627 The patch looks quite good. There are a number of remaining issues that need to be resolved, though: - please regenerate the patch against the current CVS. As is, it fails to apply; parts of it are already in the CVS (the thr_create changes) - I think the SOVERSION should be 1.0, at least initially: for most Python releases, there will be only a single release of the shared library, which should be named 1.0. - Why do you think that no rpath is needed on Linux? It is not needed if prefix is /usr, and on many installations, it is also not needed if prefix is /usr/local. For all other configurations, you still need a rpath on Linux. 
- IMO, there could be a default case, assuming SysV-ish configurations. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-18 16:01 Message: Logged In: YES user_id=88611 As far as I can see, the problems are: relocation of the binary/library path (this is solved by adding -R to LDSHARED depending on platform), and SOVERSION - some systems like it, some do not. If you do SOVERSION, you must create a link to the proper version in the installation phase. IMO we can just avoid versioning at all and let the distribution builders do it themselves. The other way is to attach the full version of Python as the SOVERSION (e.g. 2.1.1 -> libpython2.1.so.2.1.1). I'm the author of the patch (ppython.diff). I'm not the author of the file dynamic.diff; I have included it here by accident, and if it is possible to delete it from this page, it should be done. ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-16 17:38 Message: Logged In: YES user_id=6656 This ain't gonna happen on the 2.2.x branch, so changing group. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-15 15:05 Message: Logged In: YES user_id=21627 Yes, that is all right. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-08 15:44 Message: Logged In: YES user_id=6380 libtool sucks. Case closed. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-08 12:09 Message: Logged In: YES user_id=21627 While I agree on the "not Linux only" and "use standard configure options" comments; I completely disagree on libtool - only over my dead body. libtool is broken, and it is a good thing that Python configure knows the compiler command line options on its own. ---------------------------------------------------------------------- Comment By: Ondrej Palkovsky (ondrap) Date: 2002-03-08 11:52 Message: Logged In: YES user_id=88611 Sorry, I've been inspired by the former patch and I have mistakenly included it here. My patch doesn't use LD_PRELOAD and creates the .a with -fPIC, so it is compatibile with other makes (not only GNU). I'll try to learn libttool and and try to do it that way though. ---------------------------------------------------------------------- Comment By: Matthias Urlichs (smurf) Date: 2002-03-08 11:22 Message: Logged In: YES user_id=10327 IMHO this patch has a couple of problems. The main one is that GNU configure has standard options for enabling shared library support, --enable/disable-shared/static. They should be used! The other is that it's Linux-only. Shared library support tends to work well, for varying definitions of "well" anyway, on lots of platforms, but you really need to use libtool for it. That would also get rid of the LD_PRELOAD, since that'd be encapsulated by libtool. It's a rather bigger job to convert something like Python to libtool properly instead of hacking the Makefile a bit, and the build will definitely get somewhat slower as a result, BUT if we agree that a shared Python library is a good idea (i think it is!), the work is definitely worth doing. 
---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-07 19:36 Message: Logged In: YES user_id=21627 As the first issue, I'd like to clarify ownership of this code. This is the same patch as #497102, AFAICT, but contributed by a different submitter. So who wrote created that code originally? The same comments that I made to #497102 apply to this patch as well: why 0.0; please no unrelated changes (Hurd); why create both pic and non-pic objects; please no compiler-specific flags in the makefile; why LD_PRELOAD. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-07 18:09 Message: Logged In: YES user_id=6380 Could you submit the thread.o double inclusion patch separately? It's much less controversial. I like the idea of building Python as a shared lib, but I'm hesitant to add more code to an already super complex area of the configuration and build process. I need more reviewers. Maybe the submitter can get some other developers to comment? P.S. it would be better if you used the current CVS or at least the final 2.2 release as a basis for your patch. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Mon Mar 25 12:40:55 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 04:40:55 -0800 Subject: [Patches] [ python-Patches-514997 ] remove extra SET_LINENOs Message-ID: Patches item #514997, was opened at 2002-02-08 16:22 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 Category: Parser/Compiler Group: None >Status: Closed >Resolution: Rejected Priority: 3 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Neil Schemenauer (nascheme) Summary: remove extra SET_LINENOs Initial Comment: This patch removes consecutive SET_LINENOs. The patch fixes test_hotspot, but does not fix a failure in inspect. I wasn't sure what the problem was or why SET_LINENO would matter for inspect. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-25 07:40 Message: Logged In: YES user_id=33168 I'm rejecting this patch because it would take a lot of work to get it to the point where it would be good enough for inclusion. Now to answer Tim's questions. Tabs vs. spaces: depends on the day. I use both emacs & vi; emacs does convert to spaces, but I must have screwed something up. 0xffff was only a hack to avoid dealing with line numbers > 2**16. I was going for bang for the buck. I agree it would be best to remove this limitation. ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2002-03-24 00:29 Message: Logged In: YES user_id=31435 Neal, do you have your editor set to insert spaces instead of tabs, and to consider "a tab" to be four spaces? 
Guido wrote this file using hard tabs considered as 8-space gimmicks, and the after-patch code is kinda gruesome due to the mixture of indentation styles. Second, why do you think a hard-coded 0xffff is something interesting for line numbers? Or are you just giving up when line numbers are >= 2**16? The code is mysterious here and needs a comment. It's probably not good to leave the code in a state where adjacent SET_LINENOs are collapsed if and only if the line numbers "aren't big" (then code using line numbers can't guess whether they are or aren't collapsed without duplicating the same lumpy logic). Third, c_lnotab is extremely delicate, historically subject to miserable rare bugs. If you've read the long comment block explaining it near the top of this file, I'd appreciate an argument (in code comments more than here ) for why just mucking with the last pair in a sequence of offset pairs can't break the subtle correctness property explained in the comment block. Finally, it's definitely worth tracking down why test_inspect fails: that test is difficult to understand, but the bottom line is that it's provoking an exception traceback and asserting that the computed line numbers correspond to the actual lines that are failing. The failing case provokes a three-frame traceback, and 2 of the 3 line numbers are wrong after the patch (the first is off by 1, and the third is off by 3; the frame in the middle gets the right line number). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-01 17:42 Message: Logged In: YES user_id=6380 Can you find someone interested in answering the inspect question? Otherwise this patch is stalled... 
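The offset-to-line mapping Tim describes still exists in today's code objects (c_lnotab/co_lnotab then, co_lines() now), and the decoded form can be inspected with the dis module. A small illustrative sketch, not the patch under discussion:

```python
import dis

def sample():
    a = 1
    b = 2
    return a + b

# findlinestarts() yields (bytecode offset, source line) pairs -- the
# decoded form of the delta-encoded table discussed above.  Collapsing
# adjacent SET_LINENOs corresponds to never emitting two entries for
# the same offset.  (Recent interpreters may report None for synthetic
# instructions, so those entries are filtered out here.)
starts = list(dis.findlinestarts(sample.__code__))
lines = [line for _offset, line in starts if line is not None]
print(lines == sorted(lines) and len(lines) >= 3)
```

For a straight-line function like this, the line numbers come out strictly increasing, which is exactly the invariant the c_lnotab comment block in compile.c is protecting.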
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=514997&group_id=5470 From noreply@sourceforge.net Mon Mar 25 12:47:41 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 04:47:41 -0800 Subject: [Patches] [ python-Patches-505826 ] demo warning for expressions w/no effect Message-ID: Patches item #505826, was opened at 2002-01-19 14:24 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=505826&group_id=5470 Category: Parser/Compiler Group: Python 2.3 >Status: Closed Resolution: None Priority: 5 Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: demo warning for expressions w/no effect Initial Comment: This patch is not meant to be applied as is. It is for discussion purposes. It modifies the compiler to warn about statements that have no effect. It does a printf() when it determines an expression has no effect. The sample definition is: a POP_TOP preceded by a BINARY_* or a LOAD_* operation. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-25 07:47 Message: Logged In: YES user_id=33168 I don't think this is useful any more. 
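The check this patch prototyped inside the compiler can be approximated from pure Python with the ast module: flag expression statements whose value is a bare name, constant, or binary operation. A rough sketch (the helper name is mine; real linters such as pyflakes do this more carefully, e.g. excluding docstrings):

```python
import ast

# Node types whose value, as a bare statement, has no effect --
# roughly the "POP_TOP preceded by BINARY_*/LOAD_*" pattern above.
# Note: a docstring is also an Expr(Constant), so a real tool would
# skip the first statement of a module, class, or function body.
NO_EFFECT = (ast.Name, ast.Constant, ast.BinOp, ast.Compare)

def find_no_effect(source: str):
    """Return the line numbers of expression statements with no effect."""
    tree = ast.parse(source)
    return [node.lineno
            for node in ast.walk(tree)
            if isinstance(node, ast.Expr) and isinstance(node.value, NO_EFFECT)]

print(find_no_effect("x = 1\nx + 1\nprint(x)\n"))  # flags line 2 only
```

The call on line 3 is not flagged because a call expression may have side effects, which is the same distinction Neal's bytecode heuristic was drawing.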
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=505826&group_id=5470 From noreply@sourceforge.net Mon Mar 25 13:23:00 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 05:23:00 -0800 Subject: [Patches] [ python-Patches-534304 ] PEP 263 phase 2 Implementation Message-ID: Patches item #534304, was opened at 2002-03-24 14:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 Category: Parser/Compiler Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: SUZUKI Hisao (suzuki_hisao) Assigned to: Nobody/Anonymous (nobody) Summary: PEP 263 phase 2 Implementation Initial Comment: This is a sample implementation of PEP 263 phase 2. This implementation behaves just as normal Python does if no other coding hints are given. Thus it does not hurt anyone who uses Python now. Note that it is strictly compatible with the PEP in that every program valid in the PEP is also valid in this implementation. This implementation also accepts files in UTF-16 with BOM. They are read as UTF-8 internally. Please try "utf16sample.py" included. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-25 14:23 Message: Logged In: YES user_id=21627 The patch looks good, but needs a number of improvements. 1. I have problems building this code. When trying to build pgen, I get an error message of Parser/parsetok.c: In function `parsetok': Parser/parsetok.c:175: `encoding_decl' undeclared The problem here is that graminit.h hasn't been built yet, but parsetok refers to the symbol. 2. For some reason, error printing for incorrect encodings does not work - it appears that it prints the wrong line in the traceback. 3. The escape processing in Unicode literals is incorrect. 
For example, u"\" should denote only the non-ascii character. However, your implementation replaces the non-ASCII character with \u, resulting in \u, so the first backslash unescapes the second one. 4. I believe the escape processing in byte strings is also incorrect for encodings that allow \ in the second byte. Before processing escape characters, you convert back into the source encoding. If this produces a backslash character, escape processing will misinterpret that byte as an escape character. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 From noreply@sourceforge.net Mon Mar 25 14:01:36 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 06:01:36 -0800 Subject: [Patches] [ python-Patches-527027 ] Allow building python as shared library Message-ID: Patches item #527027, was opened at 2002-03-07 17:45 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 Category: Build Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Ondrej Palkovsky (ondrap) >Assigned to: Martin v. Löwis (loewis) Summary: Allow building python as shared library Initial Comment: This patch allows building python as a shared library. - enables building shared python with the '--enable-shared-python' configuration option - builds the file '.so' by default and changes the name on installation, so it is currently enabled on linux to be '0.0', but this can be easily changed - tested on linux, solaris(gcc), tru64(cc) and HP-UX 11.0(aCC). It produces the library using LDSHARED -o, while some architectures that were already building shared used a different algorithm. I'm not sure whether it broke them (someone should check DGUX and BeOS). It also makes building the shared library disabled by default, while these architectures had it enabled. 
- it rectifies a small problem on solaris2.8 that causes double inclusion of thread.o (this produces an error from 'ld' for the shared library). ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2002-03-25 15:01 Message: Logged In: YES user_id=21627 I think the remaining issues are only shallow: Few users will care about --enable-shared on BeOS and DG/UX; those will hopefully contribute patches. Likewise, for .sl libraries - I don't know HP-UX shared linking well enough to determine whether it supports library versions. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=527027&group_id=5470 From noreply@sourceforge.net Mon Mar 25 15:23:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 07:23:31 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-29 20:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have an FQDN and the mail server requires an FQDN in HELO, the current code will fail. Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP. - It breaks when the name of the computer is not an FQDN (as on many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before. - It breaks computers with a TCP tunnel to another host from which the connection is originated, if the relay does strict EHLO/HELO checking. - It breaks computers using NAT: the host that the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network. No important mail client resolves the name. Look at netscape messenger or kmail. In fact kmail and perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix. 
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 10:23 Message: Logged In: YES user_id=6380 Sorry, but what's a domain literal? I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it. I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. :-) ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-24 23:00 Message: Logged In: YES user_id=12800 Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting: [HELO and EHLO] are used to identify the SMTP client to the SMTP server. The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system. Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). 
By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right. If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 16:51
Message: Logged In: YES user_id=35752
Did you read what I wrote?
220 cranky ESMTP Postfix (Debian/GNU)
HELO localhost.localdomain
250 cranky
MAIL FROM:
250 Ok
RCPT TO:
DATA
450 : Helo command rejected: Host not found
554 Error: no valid recipients
Bring it up again in another few years and we will change the default.
----------------------------------------------------------------------
Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 13:39
Message: Logged In: YES user_id=60347
RFC 1123 was written 11 years ago, when there weren't dial-ins, TCP tunnels, or NATs. This patch fixes scripts that run on computers that have the SMTP access explained above, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know that in the cases explained above, where the current approach doesn't work, this patch works successfully.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 10:37
Message: Logged In: YES user_id=35752
I'm rejecting this patch.
RFC 1123 requires that the name sent after the HELO verb be "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:06
Message: Logged In: YES user_id=6380
Since Barry has not expressed any interest in this patch, I'm reassigning to Neil and setting the status to Accepted.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:42
Message: Logged In: YES user_id=35752
This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-29 21:24
Message: Logged In: YES user_id=6380
Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
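[Editor's note: Neil's "local_hostname" fallback described above can be sketched roughly as below. choose_helo_name is an illustrative helper, not smtplib's actual code; the real mechanism is the local_hostname argument on smtplib.SMTP.__init__.]

```python
import socket

def choose_helo_name(local_hostname=None):
    """Pick the name to announce with HELO/EHLO.

    An explicitly supplied local_hostname wins; otherwise fall back to
    the machine's (possibly unqualified) name via socket.getfqdn(),
    mirroring the fallback smtplib uses.
    """
    if local_hostname is not None:
        return local_hostname
    return socket.getfqdn()

# An explicit name skips any DNS lookup entirely.
print(choose_helo_name("client.example.org"))  # -> client.example.org
```

Passing local_hostname explicitly is exactly how a script on a machine without a FQDN avoids the strict-HELO rejections Eduardo describes.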
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

From noreply@sourceforge.net Mon Mar 25 16:00:19 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 25 Mar 2002 08:00:19 -0800
Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct
Message-ID:

Patches item #497736, was opened at 2001-12-29 20:20
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470
Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5
Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme)
Summary: smtplib.py SMTP EHLO/HELO correct

Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea:
- It belongs to another layer (DNS/IP), not to SMTP.
- It breaks when the name of the computer is not a FQDN (as on many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before.
- It breaks computers with a TCP tunnel to another host, from which the connection originates, if the relay does strict EHLO/HELO checking.
- It breaks computers using NAT: the host that the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking.
- It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network.
No important mail client resolves the name; look at Netscape Messenger or KMail. In fact, KMail and Perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most-used email clients do this. I send you the bugfix.
---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 11:00 Message: Logged In: YES user_id=12800 Oh sorry. A domain literal is something like [192.168.1.2] IOW, the IP address octets surrounded by square brackets. Should be easy enough to calculate. Attached is a proposed patch. As for the privacy violation, I don't think it's on the same level as the ftp issue because we're not divulging any information about the user. It could be argued that leaking the hostname might be enough to link the information to a specific user, and I might buy that argument, although it personally doesn't bother me too much (the IP address might be just as sufficient for linking and even NAT'd or DHCP'd addresses might be static enough to guess -- witness your own supposedly dynamic IP address :). And the IP will always be available via the socket peer. OTOH, Eduardo's claim isn't totally without merit. I'd like to be able to retain the ability to be properly RFC compliant, but could accept that the default be localhost.localdomain. If you (Guido) have a suggestion for an appropriate API for both these requirements, that would be great. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 10:23 Message: Logged In: YES user_id=6380 Sorry, but what's a domain literal? I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it. 
I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. :-) ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-24 23:00 Message: Logged In: YES user_id=12800 Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting: [HELO and EHLO] are used to identify the SMTP client to the SMTP server. The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system. Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right. If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it. 
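[Editor's note: the address-literal format Barry describes ("[192.168.1.2]", per RFC 2821 section 4.1.3) can be sketched as below. domain_literal is an illustrative helper, not part of the proposed patch.]

```python
import socket

def domain_literal(ip):
    """Wrap an IPv4 address in square brackets, the RFC 2821
    address-literal form a client may send when it has no FQDN."""
    socket.inet_aton(ip)  # raises OSError if ip is not a valid IPv4 address
    return "[%s]" % ip

print(domain_literal("192.168.1.2"))  # -> [192.168.1.2]
```

In practice the octets would come from the client's side of the connected socket, e.g. sock.getsockname()[0] after connecting to the server.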
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 16:51
Message: Logged In: YES user_id=35752
Did you read what I wrote?
220 cranky ESMTP Postfix (Debian/GNU)
HELO localhost.localdomain
250 cranky
MAIL FROM:
250 Ok
RCPT TO:
DATA
450 : Helo command rejected: Host not found
554 Error: no valid recipients
Bring it up again in another few years and we will change the default.
----------------------------------------------------------------------
Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 13:39
Message: Logged In: YES user_id=60347
RFC 1123 was written 11 years ago, when there weren't dial-ins, TCP tunnels, or NATs. This patch fixes scripts that run on computers that have the SMTP access explained above, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know that in the cases explained above, where the current approach doesn't work, this patch works successfully.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 10:37
Message: Logged In: YES user_id=35752
I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb be "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:06
Message: Logged In: YES user_id=6380
Since Barry has not expressed any interest in this patch, I'm reassigning to Neil and setting the status to Accepted.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:42
Message: Logged In: YES user_id=35752
This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-29 21:24
Message: Logged In: YES user_id=6380
Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

From noreply@sourceforge.net Mon Mar 25 16:10:59 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 25 Mar 2002 08:10:59 -0800
Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct
Message-ID:

Patches item #497736, was opened at 2001-12-30 01:20
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470
Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5
Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme)
Summary: smtplib.py SMTP EHLO/HELO correct

Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail.
Resolving the name is a very bad idea:
- It belongs to another layer (DNS/IP), not to SMTP.
- It breaks when the name of the computer is not a FQDN (as on many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before.
- It breaks computers with a TCP tunnel to another host, from which the connection originates, if the relay does strict EHLO/HELO checking.
- It breaks computers using NAT: the host that the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking.
- It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network.
No important mail client resolves the name; look at Netscape Messenger or KMail. In fact, KMail and Perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most-used email clients do this. I send you the bugfix.
----------------------------------------------------------------------
>Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 16:10
Message: Logged In: YES user_id=35752
There is no way that smtplib can automatically and reliably find the FQDN. socket.getfqdn() is a hack, IMHO. It doesn't really matter, though. The chances of an email server rejecting email based on the domain name following the HELO verb are very small. I recall seeing only one in actual use. I still think the code is fine as it is. socket.getfqdn() always returns something. Most mail servers don't care what it returns. Changing the default to 'localhost.localdomain' doesn't really solve anything. In your example, the script would still not work for the user trying to send email through a misconfigured server. It would reject 'localhost.localdomain' just like it rejected whatever socket.getfqdn() returned. The only possible arguments for using 'localhost.localdomain' are that it's faster (doesn't require a DNS lookup) and that it gives away less information.
It doesn't give away much information though. The remote server already has the sender's IP address. The hostname shouldn't mean very much. If someone is that paranoid they can pass 'localhost.localdomain' to SMTP.__init__. Eventually we should make 'localhost.localdomain' the default. Like I said, getfqdn() is a hack. We could probably make the change now and no one would care. I'm just being very conservative. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 16:00 Message: Logged In: YES user_id=12800 Oh sorry. A domain literal is something like [192.168.1.2] IOW, the IP address octets surrounded by square brackets. Should be easy enough to calculate. Attached is a proposed patch. As for the privacy violation, I don't think it's on the same level as the ftp issue because we're not divulging any information about the user. It could be argued that leaking the hostname might be enough to link the information to a specific user, and I might buy that argument, although it personally doesn't bother me too much (the IP address might be just as sufficient for linking and even NAT'd or DHCP'd addresses might be static enough to guess -- witness your own supposedly dynamic IP address :). And the IP will always be available via the socket peer. OTOH, Eduardo's claim isn't totally without merit. I'd like to be able to retain the ability to be properly RFC compliant, but could accept that the default be localhost.localdomain. If you (Guido) have a suggestion for an appropriate API for both these requirements, that would be great. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 15:23 Message: Logged In: YES user_id=6380 Sorry, but what's a domain literal? 
I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it. I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. :-) ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 04:00 Message: Logged In: YES user_id=12800 Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting: [HELO and EHLO] are used to identify the SMTP client to the SMTP server. The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system. Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right. 
If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 21:51
Message: Logged In: YES user_id=35752
Did you read what I wrote?
220 cranky ESMTP Postfix (Debian/GNU)
HELO localhost.localdomain
250 cranky
MAIL FROM:
250 Ok
RCPT TO:
DATA
450 : Helo command rejected: Host not found
554 Error: no valid recipients
Bring it up again in another few years and we will change the default.
----------------------------------------------------------------------
Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 18:39
Message: Logged In: YES user_id=60347
RFC 1123 was written 11 years ago, when there weren't dial-ins, TCP tunnels, or NATs. This patch fixes scripts that run on computers that have the SMTP access explained above, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know that in the cases explained above, where the current approach doesn't work, this patch works successfully.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 15:37
Message: Logged In: YES user_id=35752
I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb be "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however.
Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 12:06
Message: Logged In: YES user_id=6380
Since Barry has not expressed any interest in this patch, I'm reassigning to Neil and setting the status to Accepted.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 01:42
Message: Logged In: YES user_id=35752
This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-30 02:24
Message: Logged In: YES user_id=6380
Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
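[Editor's note: Neil's claim earlier in this thread that socket.getfqdn() always returns something, while gethostname() reports only the kernel's idea of the host name, can be checked with a quick sketch; the printed values depend on the local resolver configuration.]

```python
import socket

# gethostname() returns the bare machine name as configured locally;
# getfqdn() additionally tries to qualify it via resolver lookups and
# falls back to the bare name when no better answer is available.
print(socket.gethostname())
print(socket.getfqdn())
```

This is why the thread treats getfqdn() as best-effort: on a machine with no reverse mapping, it may return an unqualified or misleading name, but never nothing.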
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

From noreply@sourceforge.net Mon Mar 25 17:16:42 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 25 Mar 2002 09:16:42 -0800
Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct
Message-ID:

Patches item #497736, was opened at 2001-12-29 20:20
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470
Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5
Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme)
Summary: smtplib.py SMTP EHLO/HELO correct

Initial Comment: If the machine from which you are sending mail doesn't have a FQDN and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea:
- It belongs to another layer (DNS/IP), not to SMTP.
- It breaks when the name of the computer is not a FQDN (as on many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before.
- It breaks computers with a TCP tunnel to another host, from which the connection originates, if the relay does strict EHLO/HELO checking.
- It breaks computers using NAT: the host that the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking.
- It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network.
No important mail client resolves the name; look at Netscape Messenger or KMail. In fact, KMail and Perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most-used email clients do this. I send you the bugfix.
----------------------------------------------------------------------
>Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 12:16
Message: Logged In: YES user_id=6380
Neil: coping with a misconfigured server wasn't part of my scenario; only coping with a client that simply doesn't have a fqdn was. Some questions remain: (1) Why can't we use localhost.localdomain today? (2) Why is getfqdn() a hack? (Apart from it being in the wrong module.) Hm, I just thought of something. Why shouldn't gethostname() be used as the default? Why bother with getfqdn() at all? At least when gethostname() returns something inappropriate for a particular server, it can be fixed locally by root by fixing the hostname. (This may explain why you think getfqdn() is a hack.)
Barry: an appropriate API could be to change the default for local_hostname in __init__ to "localhost.localdomain", but to leave the code that sticks in socket.getfqdn() (or maybe just socket.gethostname()) if the value is explicitly given as None or empty.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 11:10
Message: Logged In: YES user_id=35752
There is no way that smtplib can automatically and reliably find the FQDN. socket.getfqdn() is a hack, IMHO. It doesn't really matter, though. The chances of an email server rejecting email based on the domain name following the HELO verb are very small. I recall seeing only one in actual use. I still think the code is fine as it is. socket.getfqdn() always returns something. Most mail servers don't care what it returns. Changing the default to 'localhost.localdomain' doesn't really solve anything. In your example, the script would still not work for the user trying to send email through a misconfigured server. It would reject 'localhost.localdomain' just like it rejected whatever socket.getfqdn() returned.
The only possible arguments for using 'localhost.localdomain' are that it's faster (doesn't require a DNS lookup) and that it gives away less information. It doesn't give away much information though. The remote server already has the sender's IP address. The hostname shouldn't mean very much. If someone is that paranoid they can pass 'localhost.localdomain' to SMTP.__init__. Eventually we should make 'localhost.localdomain' the default. Like I said, getfqdn() is a hack. We could probably make the change now and no one would care. I'm just being very conservative. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 11:00 Message: Logged In: YES user_id=12800 Oh sorry. A domain literal is something like [192.168.1.2] IOW, the IP address octets surrounded by square brackets. Should be easy enough to calculate. Attached is a proposed patch. As for the privacy violation, I don't think it's on the same level as the ftp issue because we're not divulging any information about the user. It could be argued that leaking the hostname might be enough to link the information to a specific user, and I might buy that argument, although it personally doesn't bother me too much (the IP address might be just as sufficient for linking and even NAT'd or DHCP'd addresses might be static enough to guess -- witness your own supposedly dynamic IP address :). And the IP will always be available via the socket peer. OTOH, Eduardo's claim isn't totally without merit. I'd like to be able to retain the ability to be properly RFC compliant, but could accept that the default be localhost.localdomain. If you (Guido) have a suggestion for an appropriate API for both these requirements, that would be great. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 10:23 Message: Logged In: YES user_id=6380 Sorry, but what's a domain literal? 
I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it. I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. :-) ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-24 23:00 Message: Logged In: YES user_id=12800 Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting: [HELO and EHLO] are used to identify the SMTP client to the SMTP server. The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system. Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right. 
If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 16:51
Message: Logged In: YES user_id=35752
Did you read what I wrote?
220 cranky ESMTP Postfix (Debian/GNU)
HELO localhost.localdomain
250 cranky
MAIL FROM:
250 Ok
RCPT TO:
DATA
450 : Helo command rejected: Host not found
554 Error: no valid recipients
Bring it up again in another few years and we will change the default.
----------------------------------------------------------------------
Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 13:39
Message: Logged In: YES user_id=60347
RFC 1123 was written 11 years ago, when there weren't dial-ins, TCP tunnels, or NATs. This patch fixes scripts that run on computers that have the SMTP access explained above, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know that in the cases explained above, where the current approach doesn't work, this patch works successfully.
----------------------------------------------------------------------
Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 10:37
Message: Logged In: YES user_id=35752
I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb be "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid, however.
Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, I'm reassigning it to Neil and setting the status to Accepted. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-29 21:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)
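Neil's change above can be illustrated with a small sketch of the selection logic: an explicitly supplied local_hostname is used verbatim for HELO/EHLO, otherwise the library falls back to socket.getfqdn(). `helo_name()` here is a hypothetical stand-in for what SMTP.__init__ does with the new argument, not actual smtplib code.

```python
import socket

def helo_name(local_hostname=None):
    # Mirror of the patched behavior: an explicit local_hostname wins;
    # otherwise fall back to the (possibly unreliable) FQDN lookup.
    if local_hostname is not None:
        return local_hostname
    return socket.getfqdn()
```

With the patch applied, a caller that knows its own name can then write something like `smtplib.SMTP('mail.example.com', local_hostname='client.example.com')` — the hostnames here are placeholders.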
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Mon Mar 25 17:31:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 09:31:10 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-30 01:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct Initial Comment: If the machine from which you are sending mail doesn't have a FQDN, and the mail server requires a FQDN in HELO, the current code will fail. Resolving the name is a very bad idea: - It's something from another layer (DNS/IP), not from SMTP. - It breaks when the name of the computer is not a FQDN (as on many dial-ins) and the SMTP server does strict EHLO/HELO checking, as stated before. - It breaks computers with a TCP tunnel to another host from which the connection originates, if the relay does strict EHLO/HELO checking. - It breaks computers using NAT: the host the server sees is not the one that sends the message, if the relay does strict EHLO/HELO checking. - It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network. No important mail client resolves the name. Look at Netscape Messenger or KMail. In fact, KMail and Perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix. 
---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 17:31 Message: Logged In: YES user_id=35752 So much discussion for such a little issue. :-) A misconfigured server must be part of your scenario. It's the only case where the hostname makes any difference. Using localhost.localdomain will work fine on 99.99% of mail servers. For the remaining 0.01%, using socket.getfqdn() has a higher chance of working than using localhost.localdomain. If socket.getfqdn() can find a hostname that resolves back to the IP of the client side of the connection, then it works. Using localhost.localdomain in that case will not work. If socket.getfqdn() cannot find the FQDN (due to NAT, tunnelling or whatever), things work just as well as if localhost.localdomain was used as the default. Changing the default to localhost.localdomain fixes nothing! getfqdn() is a hack because it relies on DNS. People always screw that up. :-) Regarding your suggested API change, I don't see how it would help. I doubt any code actually passes socket.getfqdn() to SMTP.helo(). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 17:16 Message: Logged In: YES user_id=6380 Neil: coping with a misconfigured server wasn't part of my scenario; only coping with a client that simply doesn't have a fqdn was. Some questions remain: (1) why can't we use localhost.localdomain today? (2) Why is getfqdn() a hack? (Apart from it being in the wrong module.) Hm, I just thought of something. Why shouldn't gethostname() be used as the default? Why bother with getfqdn() at all? At least when gethostname() returns something inappropriate for a particular server, it can be fixed locally by root by fixing the hostname. (This may explain why you think getfqdn() is a hack.) 
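Guido's question contrasts two socket calls: gethostname() simply reports the configured host name as-is, while getfqdn() additionally consults the resolver (DNS or /etc/hosts) to expand it to a fully qualified name — which is exactly where the fragility Neil complains about comes in. A quick way to compare the two on any machine (output is machine-dependent, so none is shown):

```python
import socket

short_name = socket.gethostname()  # configured host name, no resolver round trip
fqdn = socket.getfqdn()            # may consult DNS or /etc/hosts to qualify it
print(short_name, fqdn)
```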
Barry: an appropriate API could be to change the default for local_hostname in __init__ to "localhost.localdomain" but to leave the code that sticks in socket.getfqdn() (or maybe just socket.gethostname()) if the value is explicitly given as None or empty. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 16:10 Message: Logged In: YES user_id=35752 There is no way that smtplib can automatically and reliably find the FQDN. socket.getfqdn() is a hack, IMHO. It doesn't really matter, though. The chance of an email server rejecting email based on the domain name following the HELO verb is very small. I recall seeing only one in actual use. I still think the code is fine as it is. socket.getfqdn() always returns something. Most mail servers don't care what it returns. Changing the default to 'localhost.localdomain' doesn't really solve anything. In your example, the script would still not work for the user trying to send email through a misconfigured server. It would reject 'localhost.localdomain' just like it rejected whatever socket.getfqdn() returned. The only possible arguments for using 'localhost.localdomain' are that it's faster (doesn't require a DNS lookup) and that it gives away less information. It doesn't give away much information, though. The remote server already has the sender's IP address. The hostname shouldn't mean very much. If someone is that paranoid they can pass 'localhost.localdomain' to SMTP.__init__. Eventually we should make 'localhost.localdomain' the default. Like I said, getfqdn() is a hack. We could probably make the change now and no one would care. I'm just being very conservative. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 16:00 Message: Logged In: YES user_id=12800 Oh sorry. A domain literal is something like [192.168.1.2]. IOW, the IP address octets surrounded by square brackets. 
Should be easy enough to calculate. Attached is a proposed patch. As for the privacy violation, I don't think it's on the same level as the ftp issue because we're not divulging any information about the user. It could be argued that leaking the hostname might be enough to link the information to a specific user, and I might buy that argument, although it personally doesn't bother me too much (the IP address might be just as sufficient for linking and even NAT'd or DHCP'd addresses might be static enough to guess -- witness your own supposedly dynamic IP address :). And the IP will always be available via the socket peer. OTOH, Eduardo's claim isn't totally without merit. I'd like to be able to retain the ability to be properly RFC compliant, but could accept that the default be localhost.localdomain. If you (Guido) have a suggestion for an appropriate API for both these requirements, that would be great. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 15:23 Message: Logged In: YES user_id=6380 Sorry, but what's a domain literal? I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it. I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. 
:-)
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Mon Mar 25 17:41:45 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 09:41:45 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-29 20:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 12:41 Message: Logged In: YES user_id=6380 OK. So is socket.gethostname() better than socket.getfqdn() or not?
----------------------------------------------------------------------
You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Mon Mar 25 18:04:07 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 10:04:07 -0800 Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct Message-ID: Patches item #497736, was opened at 2001-12-29 20:20 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Closed Resolution: Rejected Priority: 5 Submitted By: Eduardo Pérez (eperez) Assigned to: Neil Schemenauer (nascheme) Summary: smtplib.py SMTP EHLO/HELO correct
---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 13:04 Message: Logged In: YES user_id=12800 Hold on. We're conflating issues here. To address the privacy issue, "localhost.localdomain" should be used. I don't see anything else being an appropriate defense against identity leakage (but IMHO, it's a limited defense anyway because you'll *always* leak your IP address) To be "correct" IMO means adhering to RFC 2821 as closely as is possible. Which means use the fqdn if available, otherwise use the domain literal. See attached patch for that. If we don't want to be RFC-correct but we want to be liberal enough to handle misconfigured client systems, then gethostname() is probably fine, but so would be localhost.localdomain. If we want to be robust in the face of overly strict smtp servers, then I think you're in a losing battle because they may only accept fqdn's that are reverse resolvable. But that may be impossible for the (perhaps misconfigured) client to calculate. And if that's the case, then the client likely has bigger problems. My preference would be for the default to be RFC-correct (i.e. fqdn w/domain literal fallback), and allow overrides via method arguments, as the code with my proposed patch would implement. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 12:41 Message: Logged In: YES user_id=6380 OK. So is socket.gethostname() better than socket.getfqdn() or not? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 12:31 Message: Logged In: YES user_id=35752 So much discussion for such a little issue. :-) A misconfigured server must be part of your scenario. It's the only case were the hostname makes any difference. Using localhost.localdomain will work find on 99.99% of mail servers. 
For the remaining 0.01%, using socket.getfqdn() has a higher chance of working than using localhost.localdomain. If socket.getfqdn() can find a hostname that resolves back to the IP of the client side of the connection then it works. Using localhost.localdomain in that case will not work. If socket.getfqdn() cannot find the FQDN (due to NAT, tunnelling or whatever) things work just as well as if localhost.localdomain was used a default. Changing the default to localhost.localdomain fixes nothing! getfqdn() is a hack because it's relies on DNS. People always screw that up. :-) Regarding your suggested API change, I don't see how it would help. I doubt any code actually passes socket.getfqdn() to SMPT.helo(). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 12:16 Message: Logged In: YES user_id=6380 Neil: coping with a misconfigured server wasn't part of my scenario; only coping with a client that simply doesn't have a fqdn was. Some questions remain: (1) why can't we use localhost.localdomain today? (2) Why is getfqdn() a hack? (Apart from it being in the wrong module.) Hm, I just thought of something. Why shouldn't gethostname() be used as the default? Why bother with getfqdn() at all? At least when gethostname() returms something inappropriate for a particular server, it can be fixed locally by root by fixing the hostname. (This may explain why you think getfqdn() is a hack.) Barry: an appropriate API could be to change the default for local_hostname in __init__ to "localhost.localdomain" but to leave the code that sticks in socket.getfqdn() (or maybe just socket.gethostname()) if the value is explicitly given as None or empty. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 11:10 Message: Logged In: YES user_id=35752 There is no way that smtplib can automatically and reliably find the FQDN. 
socket.getfqdn() is a hack, IMHO. It doesn't really matter though. The chances of an email server rejecting email based on the domain name following the HELO verb is very small. I recall seeing only one in actual use. I still think the code is fine as it is. socket.getfqdn() aways returns something. Most mail servers don't care what it returns. Changing the default to 'localhost.localdomain' doesn't really solve anything. In your example, the script would still not work for the user trying to send email through a misconfigured server. It would reject 'localhost.localdomain' just like it rejected whatever socket.getfqdn() returned. The only possible arguments for using 'localhost.localdomain' are that it's faster (doesn't require a DNS lookup) and that it gives away less information. It doesn't give away much information though. The remote server already has the sender's IP address. The hostname shouldn't mean very much. If someone is that paranoid they can pass 'localhost.localdomain' to SMTP.__init__. Eventually we should make 'localhost.localdomain' the default. Like I said, getfqdn() is a hack. We could probably make the change now and no one would care. I'm just being very conservative. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 11:00 Message: Logged In: YES user_id=12800 Oh sorry. A domain literal is something like [192.168.1.2] IOW, the IP address octets surrounded by square brackets. Should be easy enough to calculate. Attached is a proposed patch. As for the privacy violation, I don't think it's on the same level as the ftp issue because we're not divulging any information about the user. 
It could be argued that leaking the hostname might be enough to link the information to a specific user, and I might buy that argument, although it personally doesn't bother me too much (the IP address might be just as sufficient for linking and even NAT'd or DHCP'd addresses might be static enough to guess -- witness your own supposedly dynamic IP address :). And the IP will always be available via the socket peer.

OTOH, Eduardo's claim isn't totally without merit. I'd like to be able to retain the ability to be properly RFC compliant, but could accept that the default be localhost.localdomain. If you (Guido) have a suggestion for an appropriate API for both these requirements, that would be great.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-25 10:23

Message:
Logged In: YES user_id=6380

Sorry, but what's a domain literal?

I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it.

I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. :-)

----------------------------------------------------------------------

Comment By: Barry Warsaw (bwarsaw)
Date: 2002-03-24 23:00

Message:
Logged In: YES user_id=12800

Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting:

[HELO and EHLO] are used to identify the SMTP client to the SMTP server.
The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system.

Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right.

If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 16:51

Message:
Logged In: YES user_id=35752

Did you read what I wrote?

220 cranky ESMTP Postfix (Debian/GNU)
HELO localhost.localdomain
250 cranky
MAIL FROM:
250 Ok
RCPT TO:
DATA
450 : Helo command rejected: Host not found
554 Error: no valid recipients

Bring it up again in another few years and we will change the default.
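Barry's suggested "more RFC-compliant" check (use the discovered name only if it is genuinely qualified, otherwise craft an address literal) could be sketched as below. This is only an illustration, not code from the patch: the helper name helo_name and the dot-in-the-name heuristic are assumptions.

```python
def helo_name(fqdn, local_ip):
    """Pick a HELO/EHLO argument along the lines of RFC 2821:
    use the discovered name if it looks fully qualified, else
    fall back to an address literal such as [192.168.1.2]."""
    if '.' in fqdn:
        # The name contains a domain part; treat it as an FQDN.
        return fqdn
    # Bare hostname only: send the IP in square brackets instead.
    return '[%s]' % local_ip
```

In a real client the local IP would come from the already-open socket, e.g. sock.getsockname()[0].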
----------------------------------------------------------------------

Comment By: Eduardo Pérez (eperez)
Date: 2002-03-24 13:39

Message:
Logged In: YES user_id=60347

RFC 1123 was written 11 years ago when there weren't dial-ins, TCP tunnels, nor NATs. This patch fixes scripts that run on computers that have the kind of SMTP access explained above, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the proposed patch fails? I know that in the cases explained above the current approach doesn't work and this patch works successfully.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 10:37

Message:
Logged In: YES user_id=35752

I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb is "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work.

The concern raised is still valid, however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided, it is used as the local hostname for the HELO and EHLO verbs.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-24 07:06

Message:
Logged In: YES user_id=6380

Since Barry has not expressed any interest in this patch, reassigning to Neil, and setting status to Accepted.

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 20:42

Message:
Logged In: YES user_id=35752

This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO.
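Neil's new constructor argument survives in today's smtplib, so the override can be demonstrated without touching the network (the hostname client.example.org is made up; with no host argument the constructor does not open a connection):

```python
import smtplib

# local_hostname overrides the socket.getfqdn() guess used for
# HELO/EHLO; no host argument means nothing is connected yet.
client = smtplib.SMTP(local_hostname='client.example.org')
print(client.local_hostname)  # -> client.example.org
```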
----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-29 21:24

Message:
Logged In: YES user_id=6380

Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.)

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

From noreply@sourceforge.net Mon Mar 25 18:41:54 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 25 Mar 2002 10:41:54 -0800
Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct
Message-ID:

Patches item #497736, was opened at 2001-12-29 20:20
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Closed
Resolution: Rejected
Priority: 5
Submitted By: Eduardo Pérez (eperez)
Assigned to: Neil Schemenauer (nascheme)
Summary: smtplib.py SMTP EHLO/HELO correct

Initial Comment:
If the machine from which you are sending mail doesn't have an FQDN and the mail server requires an FQDN in HELO, the current code will fail. Resolving the name is a very bad idea:

- It's something from another layer (DNS/IP), not from SMTP.
- It breaks when the name of the computer is not an FQDN (as many dial-ins do) and the SMTP server does strict EHLO/HELO checking, as stated before.
- It breaks computers with a TCP tunnel to another host from which the connection is originated, if the relay does strict EHLO/HELO checking.
- It breaks computers using NAT (the host that the server sees is not the one that sends the message), if the relay does strict EHLO/HELO checking.
- It's considered spyware, as you are sending information some companies or people don't want to reveal: the internal structure of the network.

No important mail client resolves the name. Look at netscape messenger or kmail. In fact kmail and perl's Net::SMTP do exactly what my patch does. Please don't resolve the names, as this approach works and the most used email clients do this. I send you the bugfix.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-25 13:41

Message:
Logged In: YES user_id=6380

I'm skeptical about the effectiveness of providing overrides through defaulted arguments; this is something the author of the program using smtplib must anticipate and give its user an option to override. (And because of that, I'm at best -0 on adding the local_hostname argument to the constructor, as Neil checked in.)

I now agree that leaking the fqdn isn't much of a privacy breach. I agree that fqdn w/domain literal fallback is the best compromise.

----------------------------------------------------------------------

Comment By: Barry Warsaw (bwarsaw)
Date: 2002-03-25 13:04

Message:
Logged In: YES user_id=12800

Hold on. We're conflating issues here.

To address the privacy issue, "localhost.localdomain" should be used. I don't see anything else being an appropriate defense against identity leakage (but IMHO, it's a limited defense anyway because you'll *always* leak your IP address).

To be "correct" IMO means adhering to RFC 2821 as closely as is possible. Which means use the fqdn if available, otherwise use the domain literal. See attached patch for that.

If we don't want to be RFC-correct but we want to be liberal enough to handle misconfigured client systems, then gethostname() is probably fine, but so would be localhost.localdomain.
If we want to be robust in the face of overly strict smtp servers, then I think you're in a losing battle because they may only accept fqdn's that are reverse resolvable. But that may be impossible for the (perhaps misconfigured) client to calculate. And if that's the case, then the client likely has bigger problems.

My preference would be for the default to be RFC-correct (i.e. fqdn w/domain literal fallback), and allow overrides via method arguments, as the code with my proposed patch would implement.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-03-25 12:41

Message:
Logged In: YES user_id=6380

OK. So is socket.gethostname() better than socket.getfqdn() or not?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-25 12:31

Message:
Logged In: YES user_id=35752

So much discussion for such a little issue. :-) A misconfigured server must be part of your scenario. It's the only case where the hostname makes any difference. Using localhost.localdomain will work fine on 99.99% of mail servers. For the remaining 0.01%, using socket.getfqdn() has a higher chance of working than using localhost.localdomain. If socket.getfqdn() can find a hostname that resolves back to the IP of the client side of the connection then it works. Using localhost.localdomain in that case will not work. If socket.getfqdn() cannot find the FQDN (due to NAT, tunnelling or whatever) things work just as well as if localhost.localdomain was used as the default. Changing the default to localhost.localdomain fixes nothing! getfqdn() is a hack because it relies on DNS. People always screw that up. :-)

Regarding your suggested API change, I don't see how it would help. I doubt any code actually passes socket.getfqdn() to SMTP.helo().
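The difference Guido is asking about can be seen directly; the printed values are machine-dependent, so none are shown here:

```python
import socket

# What smtplib sends today: getfqdn() may consult DNS to expand the
# bare hostname into a fully qualified one, which is why Neil calls
# it a hack -- the result depends on resolver configuration.
fqdn = socket.getfqdn()

# Guido's alternative: the kernel's idea of the hostname, no DNS
# involved; root can fix it locally if it is wrong.
name = socket.gethostname()

print(fqdn, name)
```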
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

From noreply@sourceforge.net Mon Mar 25 18:56:52 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Mon, 25 Mar 2002 10:56:52 -0800
Subject: [Patches] [ python-Patches-497736 ] smtplib.py SMTP EHLO/HELO correct
Message-ID:

Patches item #497736, was opened at 2001-12-29 20:20
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Closed
Resolution: Rejected
Priority: 5
Submitted By: Eduardo Pérez (eperez)
Assigned to: Neil Schemenauer (nascheme)
Summary: smtplib.py SMTP EHLO/HELO correct
---------------------------------------------------------------------- >Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 13:56 Message: Logged In: YES user_id=12800 Cool, I will apply my patch and update the documentation. I'll leave the default argument as Neil implemented. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 13:41 Message: Logged In: YES user_id=6380 I'm skeptical about the effectiveness of providing overrides through defaulted arguments; this is something the author of the program using smtplib must anticipate and give its user an option to override. (And because of that, I'm at best -0 on adding the local_hostname argument to the constructor, as Neil checked in.) I now agree that leaking the fqdn isn't much of a provacy breach. I agree that fqdn w/domain literal fallback is the best compromise. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 13:04 Message: Logged In: YES user_id=12800 Hold on. We're conflating issues here. To address the privacy issue, "localhost.localdomain" should be used. I don't see anything else being an appropriate defense against identity leakage (but IMHO, it's a limited defense anyway because you'll *always* leak your IP address) To be "correct" IMO means adhering to RFC 2821 as closely as is possible. Which means use the fqdn if available, otherwise use the domain literal. See attached patch for that. If we don't want to be RFC-correct but we want to be liberal enough to handle misconfigured client systems, then gethostname() is probably fine, but so would be localhost.localdomain. If we want to be robust in the face of overly strict smtp servers, then I think you're in a losing battle because they may only accept fqdn's that are reverse resolvable. But that may be impossible for the (perhaps misconfigured) client to calculate. 
And if that's the case, then the client likely has bigger problems. My preference would be for the default to be RFC-correct (i.e. fqdn w/domain literal fallback), and allow overrides via method arguments, as the code with my proposed patch would implement. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 12:41 Message: Logged In: YES user_id=6380 OK. So is socket.gethostname() better than socket.getfqdn() or not? ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 12:31 Message: Logged In: YES user_id=35752 So much discussion for such a little issue. :-) A misconfigured server must be part of your scenario. It's the only case were the hostname makes any difference. Using localhost.localdomain will work find on 99.99% of mail servers. For the remaining 0.01%, using socket.getfqdn() has a higher chance of working than using localhost.localdomain. If socket.getfqdn() can find a hostname that resolves back to the IP of the client side of the connection then it works. Using localhost.localdomain in that case will not work. If socket.getfqdn() cannot find the FQDN (due to NAT, tunnelling or whatever) things work just as well as if localhost.localdomain was used a default. Changing the default to localhost.localdomain fixes nothing! getfqdn() is a hack because it's relies on DNS. People always screw that up. :-) Regarding your suggested API change, I don't see how it would help. I doubt any code actually passes socket.getfqdn() to SMPT.helo(). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 12:16 Message: Logged In: YES user_id=6380 Neil: coping with a misconfigured server wasn't part of my scenario; only coping with a client that simply doesn't have a fqdn was. Some questions remain: (1) why can't we use localhost.localdomain today? 
(2) Why is getfqdn() a hack? (Apart from it being in the wrong module.) Hm, I just thought of something. Why shouldn't gethostname() be used as the default? Why bother with getfqdn() at all? At least when gethostname() returms something inappropriate for a particular server, it can be fixed locally by root by fixing the hostname. (This may explain why you think getfqdn() is a hack.) Barry: an appropriate API could be to change the default for local_hostname in __init__ to "localhost.localdomain" but to leave the code that sticks in socket.getfqdn() (or maybe just socket.gethostname()) if the value is explicitly given as None or empty. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 11:10 Message: Logged In: YES user_id=35752 There is no way that smtplib can automatically and reliably find the FQDN. socket.getfqdn() is a hack, IMHO. It doesn't really matter though. The chances of an email server rejecting email based on the domain name following the HELO verb is very small. I recall seeing only one in actual use. I still think the code is fine as it is. socket.getfqdn() aways returns something. Most mail servers don't care what it returns. Changing the default to 'localhost.localdomain' doesn't really solve anything. In your example, the script would still not work for the user trying to send email through a misconfigured server. It would reject 'localhost.localdomain' just like it rejected whatever socket.getfqdn() returned. The only possible arguments for using 'localhost.localdomain' are that it's faster (doesn't require a DNS lookup) and that it gives away less information. It doesn't give away much information though. The remote server already has the sender's IP address. The hostname shouldn't mean very much. If someone is that paranoid they can pass 'localhost.localdomain' to SMTP.__init__. Eventually we should make 'localhost.localdomain' the default. 
Like I said, getfqdn() is a hack. We could probably make the change now and no one would care. I'm just being very conservative. ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-25 11:00 Message: Logged In: YES user_id=12800 Oh sorry. A domain literal is something like [192.168.1.2] IOW, the IP address octets surrounded by square brackets. Should be easy enough to calculate. Attached is a proposed patch. As for the privacy violation, I don't think it's on the same level as the ftp issue because we're not divulging any information about the user. It could be argued that leaking the hostname might be enough to link the information to a specific user, and I might buy that argument, although it personally doesn't bother me too much (the IP address might be just as sufficient for linking and even NAT'd or DHCP'd addresses might be static enough to guess -- witness your own supposedly dynamic IP address :). And the IP will always be available via the socket peer. OTOH, Eduardo's claim isn't totally without merit. I'd like to be able to retain the ability to be properly RFC compliant, but could accept that the default be localhost.localdomain. If you (Guido) have a suggestion for an appropriate API for both these requirements, that would be great. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 10:23 Message: Logged In: YES user_id=6380 Sorry, but what's a domain literal? I think that it's better not to get the client involved in getting this right; for example, someone might write a useful tool that sends email around, and then someone else might try to use this tool from a machine that doesn't have a fqdn. The author might not have thought of this (rather uncommon) situation; the user might not have enough Python whizz to know how to fix it. 
I'd like to hear also what you think of Eduardo's opinion that sending the fqdn is a privacy violation of the same kind as ftplib defaulting to sending username@hostname as the default password for anonymous login (which we did fix). If *you* (Barry) think this is without merit, it must be without merit. :-) ---------------------------------------------------------------------- Comment By: Barry Warsaw (bwarsaw) Date: 2002-03-24 23:00 Message: Logged In: YES user_id=12800 Sorry to take so long to respond on this one. RFC 2821 is the latest standard that smtplib.py should adhere to. Quoting: [HELO and EHLO] are used to identify the SMTP client to the SMTP server. The argument field contains the fully-qualified domain name of the SMTP client if one is available. In situations in which the SMTP client system does not have a meaningful domain name (e.g., when its address is dynamically allocated and no reverse mapping record is available), the client SHOULD send an address literal (see section 4.1.3), optionally followed by information that will help to identify the client system. Thus, I believe that sending the FQDN is the right default, although socket.getfqdn() should be used for portability. Neil's patch is the correct one (although there's a typo in the docstring, which I'll fix). By default the fqdn is used, but the user has the option to supply the local hostname as an argument to the SMTP constructor. Since RFC 2821's admonition is that the client SHOULD use a domain literal if the fqdn isn't available, I'm happy to leave it up to the client to get any supplied argument right. If we wanted to be more RFC-compliant, SMTP.__init__() could possibly check socket.getfqdn() to see if the return value was indeed fully-qualified, and if not, craft a domain literal for the HELO/EHLO. Since this is a SHOULD and not a MUST, I'm happy with the current behavior, but if you want to provide a patch for better RFC compliance here, I'd be happy to review it. 
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 16:51 Message: Logged In: YES user_id=35752 Did you read what I wrote? 220 cranky ESMTP Postfix (Debian/GNU) HELO localhost.localdomain 250 cranky MAIL FROM: 250 Ok RCPT TO: DATA 450 : Helo command rejected: Host not found 554 Error: no valid recipients Bring it up again in another few years and we will change the default. ---------------------------------------------------------------------- Comment By: Eduardo Pérez (eperez) Date: 2002-03-24 13:39 Message: Logged In: YES user_id=60347 RFC 1123 was written 11 years ago when there weren't dial-ins, TCP tunnels, nor NATs. This patch fixes scripts that run on computers that have the explained SMTP access, and it doesn't break any script I know about. Could you tell me cases where the current approach works and the patch proposed fails? I know the cases explained above where the current approach doesn't work and this patch works successfully. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 10:37 Message: Logged In: YES user_id=35752 I'm rejecting this patch. RFC 1123 requires that the name sent after the HELO verb is "a valid principal host domain name for the client host". While RFC 1123 goes on to prohibit HELO-based rejections, it is possible that some servers do reject mail based on HELO. Thus, changing the hostname sent to "localhost.localdomain" could potentially break scripts that currently work. The concern raised is still valid however. Finding the FQDN using gethostbyname() is unreliable. To address this concern I've added a "local_hostname" argument to the SMTP __init__ method. If provided it is used as the local hostname for the HELO and EHLO verbs. 
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-24 07:06 Message: Logged In: YES user_id=6380 Since Barry has not expressed any interest in this patch, reassigning to Neil, and set status to Accepted. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-23 20:42 Message: Logged In: YES user_id=35752 This patch looks correct in theory to me. Trying to find the FQDN is wrong, IMHO. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-12-29 21:24 Message: Logged In: YES user_id=6380 Seems reasonable to me, but I lack the SMTP knowledge to understand all the issues. Assigned to Barry Warsaw for review. (Barry: Eduardo found a similar privacy violation in ftplib, which I fixed. You might also ask Thomas Wouters for a review of the underlying idea.) ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=497736&group_id=5470 From noreply@sourceforge.net Mon Mar 25 21:07:12 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 13:07:12 -0800 Subject: [Patches] [ python-Patches-476814 ] foreign-platform newline support Message-ID: Patches item #476814, was opened at 2001-10-31 11:41 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=476814&group_id=5470 Category: Core (C code) Group: None Status: Open Resolution: None Priority: 5 Submitted By: Jack Jansen (jackjansen) Assigned to: Jack Jansen (jackjansen) Summary: foreign-platform newline support Initial Comment: This patch enables Python to interpret all known newline conventions, CR, LF or CRLF, on all platforms. 
This support is enabled by configuring with --with-universal-newlines (so by default it is off, and everything should behave as usual). With universal newline support enabled two things happen: - When importing or otherwise parsing .py files any newline convention is accepted. - Python code can pass a new "t" mode parameter to open() which reads files with any newline convention. "t" cannot be combined with any other mode flags like "w" or "+", for obvious reasons. File objects have a new attribute "newlines" which contains the type of newlines encountered in the file (or None when no newline has been seen, or "mixed" if there were various types of newlines). Also included is a test script which tests both file I/O and parsing. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2002-03-25 16:07 Message: Logged In: YES user_id=6380 Thanks! But there's no documentation. Could I twist your arm for a separate doc patch? I'm tempted to give this a +1, but I'd like to hear from MvL and MAL to see if they foresee any interaction with their PEP 262 implementation. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-03-13 17:44 Message: Logged In: YES user_id=45365 A new version of the patch. Main differences are that U is now the mode character to trigger universal newline input and --with-universal-newlines is default on. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2002-01-16 17:47 Message: Logged In: YES user_id=45365 This version of the patch addresses the bug in Py_UniversalNewlineFread and fixes up some minor details. Tim's other issues are addressed (at least: I think they are:-) in a forthcoming PEP. 
---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-12-13 18:57 Message: Logged In: YES user_id=31435 Back to Jack -- and sorry for sitting on it so long. Clearly this isn't making it into 2.2 in the core. As I said on Python-Dev, I believe this needs a PEP: the design decisions are debatable, so *should* be debated outside the Mac community too. Note, though, that I can't stop you from adding it to the 2.2 Mac distribution (if you want it badly enough there). If a PEP won't be written, I suggest finding someone else to review it again; maybe Guido. Note that the patch needs doc changes too. The patch to regrtest.py doesn't belong here (I assume it just slipped in). There seems to be a lot of code in support of the f_newlinetypes member, and the value of that member isn't clear -- I can't imagine a good use for it (maybe it's a Mac thing?). The implementation of Py_UniversalNewlineFread appears incorrect to me: it reads n bytes *every* time around the outer loop, no matter how few characters are still required, and n doesn't change inside the loop. The business about the GIL may be due to the lack of docs: are, or are not, people supposed to release the GIL themselves around calls to these guys? It's not documented, and it appears your intent differed from my guess. Finally, it would be better to call ferror() after calling fread() instead of before it. ---------------------------------------------------------------------- Comment By: Jack Jansen (jackjansen) Date: 2001-11-14 10:13 Message: Logged In: YES user_id=45365 Here's a new version of the patch. To address your issues one by one: - get_line and Py_UniversalNewlineFgets are too difficult to integrate, at least, I don't see how I could do it. The storage management of get_line gets in the way. - The global lock comment I don't understand. The Universal... 
routines are replacements for fgets() and fread(), so have nothing to do with the interpreter lock. - The logic of all three routines (get_line too) has changed and I've put comments in. I hope this addresses some of the points. - If universal_newline is false for a certain PyFileObject we now immediately take a quick exit via fgets() or fread(). There's also a new test script that tests some more border cases (like lines longer than 100 characters, and a lone CR just before end of file). ---------------------------------------------------------------------- Comment By: Tim Peters (tim_one) Date: 2001-11-05 03:16 Message: Logged In: YES user_id=31435 It would be better if get_line just called Py_UniversalNewlineFgets (when appropriate) instead of duplicating its logic inline. Py_UniversalNewlineFgets and Py_UniversalNewlineFread should deal with releasing the global lock themselves -- the correct granularity for lock release/reacquire is around the C-level input routines (esp. for fread). The new routines never check for I/O errors! Why not? It seems essential. The new Fgets checks for EOF at the end of the loop instead of the top. This is surprising, and I stared a long time in vain trying to guess why. Setting newlinetypes |= NEWLINE_CR; immediately after seeing an '\r' would be as fast (instead of waiting to see EOF and then inferring the prior existence of '\r' indirectly from the state of the skipnextlf flag). Speaking of which, the fobj tests in the inner loop waste cycles. Set the local flag vrbls whether or not fobj is NULL. When you're *out* of the inner loop you can simply decline to store the new masks when fobj is NULL (and you're already doing the latter anyway). A test and branch inside the loop is much more expensive than or'ing in a flag bit inside the loop, ditto harder to understand. 
Floating the univ_newline test out of the loop (and duplicating the loop body, one way for univ_newline true and the other for it false) would also save a test and branch on every character. Doing fread one character at a time is very inefficient. Since you know you need to obtain n characters in the end, and that these transformations require reading at least n characters, you could very profitably read n characters in one gulp at the start, then switch to k at a time where k is the number of \r\n pairs seen since the last fread call. This is easier to code than it sounds. It would be fine by me if you included (and initialized) the new file-object fields all the time, whether or not universal newlines are configured. I'd rather waste a few bytes in a file object than see #ifdefs spread thru the code. I'll be damned if I can think of a quick way to do this stuff on Windows -- native Windows fgets() is still the only Windows handle we have on avoiding crushing thread overhead inside MS's C library. I'll think some more about it (the thrust still being to eliminate the 't' mode flag, as whined about on Python-Dev). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2001-10-31 12:38 Message: Logged In: YES user_id=6380 Tim, can you review this or pass it on to someone else who has time? Jack developed this patch after a discussion in which I was involved in some of the design, but I won't have time to look at it until December. 
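[Editor's sketch] For readers following the review, the translation the patch performs in C behaves roughly like this pure-Python sketch. It is illustrative only: the real code works incrementally on a FILE* with a skipnextlf flag rather than on a whole in-memory string, and the function name here is invented.

```python
def translate_newlines(data):
    # Map CR and CRLF to LF, recording which conventions were seen --
    # the same information the patch exposes as the file object's new
    # "newlines" attribute (None / one type / "mixed").
    seen = set()
    out = []
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c == '\r':
            if i + 1 < n and data[i + 1] == '\n':
                seen.add('\r\n')
                i += 1  # swallow the LF half of the CRLF pair
            else:
                seen.add('\r')  # lone CR, e.g. just before EOF
            out.append('\n')
        else:
            if c == '\n':
                seen.add('\n')
            out.append(c)
        i += 1
    return ''.join(out), seen
```

With mixed input such as 'a\r\nb\rc\n', every line comes out LF-terminated and all three conventions are recorded.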
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=476814&group_id=5470 From noreply@sourceforge.net Mon Mar 25 21:12:21 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 13:12:21 -0800 Subject: [Patches] [ python-Patches-534862 ] help asyncore recover from repr() probs Message-ID: Patches item #534862, was opened at 2002-03-25 15:12 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534862&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: help asyncore recover from repr() probs Initial Comment: I've had this patch in my copy of asyncore.py for quite a while. It works for me as a way to recover from repr() bogosities, though I'm unfamiliar enough with repr/str issues and asyncore to know if this is the right way to make it more bulletproof (or if it should even be made more bulletproof). 
Skip ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534862&group_id=5470 From noreply@sourceforge.net Mon Mar 25 21:33:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 13:33:08 -0800 Subject: [Patches] [ python-Patches-516297 ] iterator for lineinput Message-ID: Patches item #516297, was opened at 2002-02-11 19:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Neil Schemenauer (nascheme) Summary: iterator for lineinput Initial Comment: Taking the route of least invasiveness, I have come up with a VERY simple iterator interface for fileinput. Basically, __iter__() returns self and next() calls __getitem__() with the proper number. This was done to have the patch only add methods and not change any existing ones, thus minimizing any chance of breaking existing code. Now the module on the whole, however, could possibly stand an update now that generators are coming. I have a recipe up at the Cookbook that uses generators to implement fileinput w/o in-place editing (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/112506). If there is enough interest, I would be quite willing to rewrite fileinput using generators. And if some of the unneeded methods could be deprecated (__getitem__, readline), then the whole module could probably be cleaned up a decent amount and have a possible speed improvement. ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2002-03-25 13:33 Message: Logged In: YES user_id=357491 Adding an iterator interface that returns itself means that you only need to keep track of a single object. 
Using the iter() fxn on the original fileinput returns a canned iterator that has none of the methods that a FileInput instance has. This means that if you want to stop iterating over the current file and move on to the next one in the FileInput instance, you have to call .nextfile() on the original object; you can't call it on the iterator. Having the __iter__() method return the instance itself means that you can call .nextfile() on the iterator (or the original since they are the same). It also updates the module (albeit in a hackish way) to be a little bit more modern. Also note that I uploaded a new diff and deleted the old one; I accidentally left out the return command in the original diff. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 20:35 Message: Logged In: YES user_id=35752 Why do you need fileinput to have a __iter__ method? As far as I can see it only slows things down. As it is now iter(fileinput.input()) works just fine. Adding __iter__ and next() just adds another layer of method calls. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 From noreply@sourceforge.net Mon Mar 25 21:43:01 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 13:43:01 -0800 Subject: [Patches] [ python-Patches-516297 ] iterator for lineinput Message-ID: Patches item #516297, was opened at 2002-02-12 03:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Neil Schemenauer (nascheme) Summary: iterator for lineinput Initial Comment: Taking the route of least invasiveness, I have come up with a VERY simple iterator interface for fileinput. 
Basically, __iter__() returns self and next() calls __getitem__() with the proper number. This was done to have the patch only add methods and not change any existing ones, thus minimizing any chance of breaking existing code. Now the module on the whole, however, could possibly stand an update now that generators are coming. I have a recipe up at the Cookbook that uses generators to implement fileinput w/o in-place editing (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/112506). If there is enough interest, I would be quite willing to rewrite fileinput using generators. And if some of the unneeded methods could be deprecated (__getitem__, readline), then the whole module could probably be cleaned up a decent amount and have a possible speed improvement. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 21:43 Message: Logged In: YES user_id=35752 I'm still not getting it. The only way to get an 'iterator' object wrapping the FileInput instance is to call iter() on it. Why would you want to do that? Just use readline() and nextfile(). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-25 21:33 Message: Logged In: YES user_id=357491 Adding an iterator interface that returns itself means that you only need to keep track of a single object. Using the iter() fxn on the original fileinput returns a canned iterator that has none of the methods that a FileInput instance has. This means that if you want to stop iterating over the current file and move on to the next one in the FileInput instance, you have to call .nextfile() on the original object; you can't call it on the iterator. Having the __iter__() method return the instance itself means that you can call .nextfile() on the iterator (or the original since they are the same). It also updates the module (albeit in a hackish way) to be a little bit more modern. 
Also note that I uploaded a new diff and deleted the old one; I accidentally left out the return command in the original diff. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 04:35 Message: Logged In: YES user_id=35752 Why do you need fileinput to have a __iter__ method? As far as I can see it only slows things down. As it is now iter(fileinput.input()) works just fine. Adding __iter__ and next() just adds another layer of method calls. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 From noreply@sourceforge.net Mon Mar 25 22:45:32 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Mon, 25 Mar 2002 14:45:32 -0800 Subject: [Patches] [ python-Patches-516297 ] iterator for lineinput Message-ID: Patches item #516297, was opened at 2002-02-11 19:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Neil Schemenauer (nascheme) Summary: iterator for lineinput Initial Comment: Taking the route of least invasiveness, I have come up with a VERY simple iterator interface for fileinput. Basically, __iter__() returns self and next() calls __getitem__() with the proper number. This was done to have the patch only add methods and not change any existing ones, thus minimizing any chance of breaking existing code. Now the module on the whole, however, could possibly stand an update now that generators are coming. I have a recipe up at the Cookbook that uses generators to implement fileinput w/o in-place editing (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/112506). If there is enough interest, I would be quite willing to rewrite fileinput using generators. 
And if some of the unneeded methods could be deprecated (__getitem__, readline), then the whole module could probably be cleaned up a decent amount and have a possible speed improvement. ---------------------------------------------------------------------- >Comment By: Brett Cannon (bcannon) Date: 2002-03-25 14:45 Message: Logged In: YES user_id=357491 The point of the patch was to put an iterator interface onto fileinput without requiring a wrapper. Basically it was to update it so that if for some reason for loops start to require an iterator interface, it is already done. It was also to make sure that if an iterator was needed that it would have all the methods it could need. A side-effect is the need for one less object if you want an iterator since __iter__ just returns self. One possible desire of this is passing around the instance. If you pass the iterator as fileinput is now implemented you don't have access to the original instance and thus can't use any of its methods. If you pass the FileInput instance you would have to regenerate the iterator every time you wanted to use it after being passed. With this implementation you just pass the original instance since it can act as a FileInput instance or an iterator. I realize this is not downright needed, I am not arguing there. I am just saying that it is a nice feature to have that does not add any excessive feature to the language. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 13:43 Message: Logged In: YES user_id=35752 I'm still not getting it. The only way to get an 'iterator' object wrapping the FileInput instance is to call iter() on it. Why would you want to do that? Just use readline() and nextfile(). 
---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-25 13:33 Message: Logged In: YES user_id=357491 Adding an iterator interface that returns itself means that you only need to keep track of a single object. Using the iter() fxn on the original fileinput returns a canned iterator that has none of the methods that a FileInput instance has. This means that if you want to stop iterating over the current file and move on to the next one in the FileInput instance, you have to call .nextfile() on the original object; you can't call it on the iterator. Having the __iter__() method return the instance itself means that you can call .nextfile() on the iterator (or the original since they are the same). It also updates the module (albeit in a hackish way) to be a little bit more modern. Also note that I uploaded a new diff and deleted the old one; I accidentally left out the return command in the original diff. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-24 20:35 Message: Logged In: YES user_id=35752 Why do you need fileinput to have a __iter__ method? As far as I can see it only slows things down. As it is now iter(fileinput.input()) works just fine. Adding __iter__ and next() just adds another layer of method calls. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 From noreply@sourceforge.net Tue Mar 26 18:42:31 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 26 Mar 2002 10:42:31 -0800 Subject: [Patches] [ python-Patches-535335 ] 2.2 patches for BSD/OS 5.0 Message-ID: Patches item #535335, was opened at 2002-03-26 13:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Jeffrey Honig (jchonig) Assigned to: Nobody/Anonymous (nobody) Summary: 2.2 patches for BSD/OS 5.0 Initial Comment: The following patches were necessary to get Python 2.2 to work on BSD/OS 5.0. More may follow as we are still attempting to resolve some issues related to the regression tests (although these may be OS issues). Thanks. 
Jeff ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 From noreply@sourceforge.net Tue Mar 26 18:53:46 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 26 Mar 2002 10:53:46 -0800 Subject: [Patches] [ python-Patches-535335 ] 2.2 patches for BSD/OS 5.0 Message-ID: Patches item #535335, was opened at 2002-03-26 13:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Jeffrey Honig (jchonig) Assigned to: Nobody/Anonymous (nobody) Summary: 2.2 patches for BSD/OS 5.0 Initial Comment: The following patches were necessary to get Python 2.2 to work on BSD/OS 5.0. More may follow as we are still attempting to resolve some issues related to the regression tests (although these may be OS issues). Thanks. Jeff ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-26 13:53 Message: Logged In: YES user_id=33168 Lib/posixfile.py & Lib/test/test_fcntl.py seem harmless. configure is generated, so configure.in will need the changes made to it. There seem to be many tests which fail, but perhaps shouldn't: fork1, locale, minidom, poll, pyexpat, sax, unicode_file? I'm also unsure of the benefit of adding contrib/{lib/include} to setup.py. This could be fine, but I don't know anything about distutils. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 From noreply@sourceforge.net Tue Mar 26 19:08:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 26 Mar 2002 11:08:27 -0800 Subject: [Patches] [ python-Patches-535335 ] 2.2 patches for BSD/OS 5.0 Message-ID: Patches item #535335, was opened at 2002-03-26 13:42 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 Category: Build Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Jeffrey Honig (jchonig) Assigned to: Nobody/Anonymous (nobody) Summary: 2.2 patches for BSD/OS 5.0 Initial Comment: The following patches were necessary to get Python 2.2 to work on BSD/OS 5.0. More may follow as we are still attempting to resolve some issues related to the regression tests (although these may be OS issues). Thanks. Jeff ---------------------------------------------------------------------- >Comment By: Jeffrey Honig (jchonig) Date: 2002-03-26 14:08 Message: Logged In: YES user_id=96862 Re: configure.in vs configure: we don't use autoconf here so modifying configure.in doesn't help us. I should have copied the changes and submitted them, but then they aren't too hard to figure out.... Re: contrib{lib/include}: We install many of the packages that we install from the net (which we call contrib packages) into the /usr/contrib hierarchy. They won't be found by setup.py unless those paths are present. Re: regrtest.py: Apologies about the regrtest.py content, there are some tests in there that shouldn't be, ignore it for now, I'll submit an update later. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-26 13:53 Message: Logged In: YES user_id=33168 Lib/posixfile.py & Lib/test/test_fcntl.py seem harmless. 
configure is generated, so configure.in will need the changes made to it. There seem to be many tests which fail, but perhaps shouldn't: fork1, locale, minidom, poll, pyexpat, sax, unicode_file? I'm also unsure of the benefit of adding contrib/{lib/include} to setup.py. This could be fine, but I don't know anything about distutils. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=535335&group_id=5470 From noreply@sourceforge.net Tue Mar 26 20:31:17 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Tue, 26 Mar 2002 12:31:17 -0800 Subject: [Patches] [ python-Patches-516297 ] iterator for lineinput Message-ID: Patches item #516297, was opened at 2002-02-12 03:56 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 Category: Library (Lib) Group: Python 2.3 >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: Brett Cannon (bcannon) Assigned to: Neil Schemenauer (nascheme) Summary: iterator for lineinput Initial Comment: Taking the route of least invasiveness, I have come up with a VERY simple iterator interface for fileinput. Basically, __iter__() returns self and next() calls __getitem__() with the proper number. This was done to have the patch only add methods and not change any existing ones, thus minimizing any chance of breaking existing code. Now the module on the whole, however, could possibly stand an update now that generators are coming. I have a recipe up at the Cookbook that uses generators to implement fileinput w/o in-place editing (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/112506). If there is enough interest, I would be quite willing to rewrite fileinput using generators. 
And if some of the unneeded methods could be deprecated (__getitem__, readline), then the whole module could probably be cleaned up a decent amount and have a possible speed improvement. ---------------------------------------------------------------------- >Comment By: Neil Schemenauer (nascheme) Date: 2002-03-26 20:31 Message: Logged In: YES user_id=35752 I've checked in a modified version of this patch. Instead of FileInput.next calling FileInput.__getitem__ I've made __getitem__ call next. This keeps the common case of "for line in fileinput.input()" fast. See fileinput 1.9. ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-25 22:45 Message: Logged In: YES user_id=357491 The point of the patch was to put an iterator interface onto fileinput without requiring a wrapper. Basically it was to update it so that if for some reason for loops start to require an iterator interface, it is already done. It was also to make sure that if an iterator was needed that it would have all the methods it could need. A side-effect is the need for one less object if you want an iterator since __iter__ just returns self. One possible desire of this is passing around the instance. If you pass the iterator as fileinput is now implemented you don't have access to the original instance and thus can't use any of its methods. If you pass the FileInput instance you would have to regenerate the iterator every time you wanted to use it after being passed. With this implementation you just pass the original instance since it can act as a FileInput instance or an iterator. I realize this is not downright needed, I am not arguing there. I am just saying that it is a nice feature to have that does not add any excessive feature to the language. 
---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 21:43 Message: Logged In: YES user_id=35752 I'm still not getting it. The only way to get an 'iterator' object wrapping the FileInput instance is to call iter() on it. Why would you want to do that? Just use readline() and nextfile(). ---------------------------------------------------------------------- Comment By: Brett Cannon (bcannon) Date: 2002-03-25 21:33 Message: Logged In: YES user_id=357491 Adding an iterator interface that returns itself means that you only need to keep track of a single object. Using the iter() fxn on the original fileinput returns a canned iterator that has none of the methods that a FileInput instance has. This means that if you want to stop iterating over the current file and move on to the next one in the FileInput instance, you have to call .nextfile() on the original object; you can't call it on the iterator. Having the __iter__() method return the instance itself means that you can call .nextfile() on the iterator (or the original since they are the same). It also updates the module (albeit in a hackish way) to be a little bit more modern. Also note that I uploaded a new diff and deleted the old one; I accidentally left out the return command in the original diff. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-25 04:35 Message: Logged In: YES user_id=35752 Why do you need fileinput to have a __iter__ method? As far as I can see it only slows things down. As it is now iter(fileinput.input()) works just fine. Adding __iter__ and next() just adds another layer of method calls. 
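[Editor's sketch] The shape of the accepted design for #516297 can be sketched as follows. This is an illustrative stand-in, not the actual fileinput source: per Neil's checked-in version, __getitem__ delegates to next() so plain iteration stays fast, and a __next__ alias is added here so the sketch also runs on modern Pythons.

```python
class FileInputSketch:
    def __init__(self, lines):
        self._lines = iter(lines)
        self._lineno = 0

    def __iter__(self):
        # The instance is its own iterator, so methods like nextfile()
        # stay reachable from the object you iterate over.
        return self

    def next(self):
        line = next(self._lines, None)
        if line is None:
            raise StopIteration
        self._lineno += 1
        return line

    __next__ = next  # modern spelling of the same protocol method

    def __getitem__(self, i):
        # Sequential access only, mirroring fileinput's historical contract.
        if i != self._lineno:
            raise RuntimeError('accessing lines out of order')
        try:
            return self.next()
        except StopIteration:
            raise IndexError('end of input reached')
```

Because __iter__ returns self, passing the object around gives the receiver both the iterator and the instance methods, which is Brett's argument above.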
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=516297&group_id=5470 From noreply@sourceforge.net Thu Mar 28 07:21:43 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 27 Mar 2002 23:21:43 -0800 Subject: [Patches] [ python-Patches-536117 ] Typo in turtle.py Message-ID: Patches item #536117, was opened at 2002-03-28 08:21 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536117&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: Typo in turtle.py Initial Comment: Guy Barre has detected a typo (a missing self.) in turtle.py. This patch comes from the correction he suggested in the python-fr mailing list. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536117&group_id=5470 From noreply@sourceforge.net Thu Mar 28 07:28:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 27 Mar 2002 23:28:48 -0800 Subject: [Patches] [ python-Patches-536120 ] splitext and leading point of hidden files Message-ID: Patches item #536120, was opened at 2002-03-28 08:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536120&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: splitext and leading point of hidden files Initial Comment: The posixpath.splitext function doesn't do the right thing with leading point of hidden files. For sample: splitext('.emacs')==('','.emacs'). The patch is intended to leave the leading point as part of the name. 
Existing code will possibly break, so this patch is probably quite controversial. If the behaviour change is rejected, the patch could be modified to improve performances without behaviour changes. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536120&group_id=5470 From noreply@sourceforge.net Thu Mar 28 07:33:59 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 27 Mar 2002 23:33:59 -0800 Subject: [Patches] [ python-Patches-536125 ] Typo in turtle.py Message-ID: Patches item #536125, was opened at 2002-03-28 08:33 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536125&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: Typo in turtle.py Initial Comment: Guy Barre has detected a typo (a missing self.) in turtle.py. This patch comes from the correction he suggested in the python-fr mailing list. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536125&group_id=5470 From noreply@sourceforge.net Thu Mar 28 07:38:39 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Wed, 27 Mar 2002 23:38:39 -0800 Subject: [Patches] [ python-Patches-536125 ] Typo in turtle.py Message-ID: Patches item #536125, was opened at 2002-03-28 08:33 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536125&group_id=5470 Category: Tkinter Group: Python 2.2.x Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: Typo in turtle.py Initial Comment: Guy Barre has detected a typo (a missing self.) in turtle.py. 
This patch comes from the correction he suggested in the python-fr mailing list. ---------------------------------------------------------------------- >Comment By: Sebastien Keim (s_keim) Date: 2002-03-28 08:38 Message: Logged In: YES user_id=498191 I made a small mistake. This patch is the same as patch number 536117. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536125&group_id=5470 From noreply@sourceforge.net Thu Mar 28 08:02:48 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 28 Mar 2002 00:02:48 -0800 Subject: [Patches] [ python-Patches-536120 ] splitext and leading point of hidden files Message-ID: Patches item #536120, was opened at 2002-03-28 02:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536120&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: splitext and leading point of hidden files Initial Comment: The posixpath.splitext function doesn't do the right thing with the leading point of hidden files. For example: splitext('.emacs')==('','.emacs'). The patch is intended to leave the leading point as part of the name. Existing code will possibly break, so this patch is probably quite controversial. If the behaviour change is rejected, the patch could be modified to improve performance without behaviour changes. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-28 03:02 Message: Logged In: YES user_id=31435 I expect this change has scant chance of being accepted. The idea that a leading dot means "hidden" is an arbitrary convention of the ls utility, and your desire to call a .name file "pure name" instead of "pure extension" seems arbitrary too. 
The behavior of splitext is perfectly predictable as-is across platforms now (note the implication: if you intend to change the semantics for posixpath, you'll also have to sell that it should be changed for dospath.py, ntpath.py, macpath.py, os2emxpath.py, and riscospath.py). Note that the patched function splits, e.g., '/usr/local/tim.one/seven' into '/usr/local/tim' and '.one/seven'. I assume that's not the result you intended. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536120&group_id=5470 From noreply@sourceforge.net Thu Mar 28 08:42:33 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Thu, 28 Mar 2002 00:42:33 -0800 Subject: [Patches] [ python-Patches-536120 ] splitext and leading point of hidden files Message-ID: Patches item #536120, was opened at 2002-03-28 08:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536120&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Sebastien Keim (s_keim) Assigned to: Nobody/Anonymous (nobody) Summary: splitext and leading point of hidden files Initial Comment: The posixpath.splitext function doesn't do the right thing with the leading point of hidden files. For example: splitext('.emacs')==('','.emacs'). The patch is intended to leave the leading point as part of the name. Existing code will possibly break, so this patch is probably quite controversial. If the behaviour change is rejected, the patch could be modified to improve performance without behaviour changes. ---------------------------------------------------------------------- >Comment By: Sebastien Keim (s_keim) Date: 2002-03-28 09:42 Message: Logged In: YES user_id=498191 Oops, you're right. I thought that the for loop was only a leftover from the time when the string module was coded in Python. 
In fact it seems that things are a little more complex than I intended :( But if we replace: if i<1 or p[i-1]=='/': by: if i<0 or i
From noreply@sourceforge.net Sat Mar 30 08:58:37 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 00:58:37 -0800 Subject: [Patches] [ python-Patches-536908 ] missing #include guards/extern "C" Message-ID: Patches item #536908, was opened at 2002-03-29 22:10 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536908&group_id=5470 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: David Abrahams (david_abrahams) Assigned to: Nobody/Anonymous (nobody) >Summary: missing #include guards/extern "C" Initial Comment: cvs server: Diffing . Index: cStringIO.h ====================================================== ============= RCS file: /cvsroot/python/python/dist/src/Include/cStringI O.h,v retrieving revision 2.15 diff -r2.15 cStringIO.h 2a3,5 > #ifdef __cplusplus > extern "C" { > #endif 130a134,136 > #ifdef __cplusplus > } > #endif Index: descrobject.h ====================================================== ============= RCS file: /cvsroot/python/python/dist/src/Include/descrobj ect.h,v retrieving revision 2.8 diff -r2.8 descrobject.h 1a2,6 > #ifndef Py_DESCROBJECT_H > #define Py_DESCROBJECT_H > #ifdef __cplusplus > extern "C" { > #endif 80a86,88 > #ifdef __cplusplus > } > #endif Index: iterobject.h ====================================================== ============= RCS file: /cvsroot/python/python/dist/src/Include/iterobje ct.h,v retrieving revision 1.3 diff -r1.3 iterobject.h 0a1,2 > #ifndef Py_ITEROBJECT_H > #define Py_ITEROBJECT_H 1a4,6 > #ifdef __cplusplus > extern "C" { > #endif 13a19,22 > #ifdef __cplusplus > } > #endif > #endif Py_ITEROBJECT_H ---------------------------------------------------------------------- >Comment By: Martin v. Lцwis (loewis) Date: 2002-03-30 09:58 Message: Logged In: YES user_id=21627 Thanks for the patch, applied as cStringIO.h 2.16 descrobject.h 2.9 iterobject.h 1.4 *Please* use context diffs in the future. 
---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-29 23:35 Message: Logged In: YES user_id=21627 Please attach the patch as a context (-c) or unified (-u) diff to this report. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536908&group_id=5470 From noreply@sourceforge.net Sat Mar 30 09:01:24 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 01:01:24 -0800 Subject: [Patches] [ python-Patches-536908 ] missing #include guards/extern "C" Message-ID: Patches item #536908, was opened at 2002-03-29 22:10 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536908&group_id=5470 Category: None Group: None >Status: Closed >Resolution: Accepted Priority: 5 Submitted By: David Abrahams (david_abrahams) Assigned to: Nobody/Anonymous (nobody) >Summary: missing #include guards/extern "C" Initial Comment: cvs server: Diffing . 
Index: cStringIO.h ====================================================== ============= RCS file: /cvsroot/python/python/dist/src/Include/cStringI O.h,v retrieving revision 2.15 diff -r2.15 cStringIO.h 2a3,5 > #ifdef __cplusplus > extern "C" { > #endif 130a134,136 > #ifdef __cplusplus > } > #endif Index: descrobject.h ====================================================== ============= RCS file: /cvsroot/python/python/dist/src/Include/descrobj ect.h,v retrieving revision 2.8 diff -r2.8 descrobject.h 1a2,6 > #ifndef Py_DESCROBJECT_H > #define Py_DESCROBJECT_H > #ifdef __cplusplus > extern "C" { > #endif 80a86,88 > #ifdef __cplusplus > } > #endif Index: iterobject.h ====================================================== ============= RCS file: /cvsroot/python/python/dist/src/Include/iterobje ct.h,v retrieving revision 1.3 diff -r1.3 iterobject.h 0a1,2 > #ifndef Py_ITEROBJECT_H > #define Py_ITEROBJECT_H 1a4,6 > #ifdef __cplusplus > extern "C" { > #endif 13a19,22 > #ifdef __cplusplus > } > #endif > #endif Py_ITEROBJECT_H ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-30 09:58 Message: Logged In: YES user_id=21627 Thanks for the patch, applied as cStringIO.h 2.16 descrobject.h 2.9 iterobject.h 1.4 *Please* use context diffs in the future. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-29 23:35 Message: Logged In: YES user_id=21627 Please attach the patch as a context (-c) or unified (-u) diff to this report. 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536908&group_id=5470 From noreply@sourceforge.net Sat Mar 30 11:16:54 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 03:16:54 -0800 Subject: [Patches] [ python-Patches-536241 ] string.zfill and unicode Message-ID: Patches item #536241, was opened at 2002-03-28 14:26 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536241&group_id=5470 Category: Library (Lib) Group: None Status: Closed Resolution: Fixed Priority: 5 Submitted By: Walter Dörwald (doerwalter) Assigned to: A.M. Kuchling (akuchling) Summary: string.zfill and unicode Initial Comment: This patch makes the function string.zfill work with unicode instances (and instances of str and unicode subclasses). Currently string.zfill(u"123", 10) results in "0000u'123'". With this patch the result is u'0000000123'. Should zfill be made a real str and unicode method? I noticed that a zfill implementation is available in unicodeobject.c, but commented out. ---------------------------------------------------------------------- >Comment By: Walter Dörwald (doerwalter) Date: 2002-03-30 12:16 Message: Logged In: YES user_id=89016 But Python could be compiled without unicode support (by undefining PY_USING_UNICODE), and string.zfill should work even in this case. What about making zfill a real str and unicode method? ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2002-03-29 17:24 Message: Logged In: YES user_id=11375 Thanks for your patch! I've checked it into CVS, with two modifications. First, I removed the code to handle the case where Python doesn't have a unicode() built-in; there's no expectation that you can take the standard library for Python version N and use it with version N-1, so this code isn't needed. 
Second, I changed string.zfill() to take the str() and not the repr() when it gets a non-string object because that seems to make more sense. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536241&group_id=5470 From noreply@sourceforge.net Sat Mar 30 11:25:27 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 03:25:27 -0800 Subject: [Patches] [ python-Patches-536241 ] string.zfill and unicode Message-ID: Patches item #536241, was opened at 2002-03-28 13:26 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536241&group_id=5470 Category: Library (Lib) Group: None >Status: Open Resolution: Fixed Priority: 5 Submitted By: Walter Dцrwald (doerwalter) Assigned to: A.M. Kuchling (akuchling) Summary: string.zfill and unicode Initial Comment: This patch makes the function string.zfill work with unicode instances (and instances of str and unicode subclasses). Currently string.zfill(u"123", 10) results in "0000u'123'". With this patch the result is u'0000000123'. Should zfill be made a real str und unicode method? I noticed that a zfill implementation is available in unicodeobject.c, but commented out. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-30 11:25 Message: Logged In: YES user_id=6656 Hah, I was going to say that but was distracted by IE wiping out the machine I'm sitting at. Re-opening. ---------------------------------------------------------------------- Comment By: Walter Dцrwald (doerwalter) Date: 2002-03-30 11:16 Message: Logged In: YES user_id=89016 But Python could be compiled without unicode support (by undefining PY_USING_UNICODE), and string.zfill should work even in this case. What about making zfill a real str and unicode method? 
---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2002-03-29 16:24 Message: Logged In: YES user_id=11375 Thanks for your patch! I've checked it into CVS, with two modifications. First, I removed the code to handle the case where Python doesn't have a unicode() built-in; there's no expection that you can take the standard library for Python version N and use it with version N-1, so this code isn't needed. Second, I changed string.zfill() to take the str() and not the repr() when it gets a non-string object because that seems to make more sense. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536241&group_id=5470 From noreply@sourceforge.net Sat Mar 30 11:27:10 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 03:27:10 -0800 Subject: [Patches] [ python-Patches-511219 ] suppress type restrictions on locals() Message-ID: Patches item #511219, was opened at 2002-01-31 14:55 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=511219&group_id=5470 Category: Core (C code) >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Cesar Douady (douady) Assigned to: Nobody/Anonymous (nobody) Summary: suppress type restrictions on locals() Initial Comment: This patch suppresses the restriction that global and local dictionaries do not access overloaded __getitem__ and __setitem__ if passed an object derived from class dict. An exception is made for the builtin insertion and reference in the global dict to make sure this object exists and to suppress the need for the derived class to take care of this implementation dependent detail. The behavior of eval and exec has been updated for code objects which have the CO_NEWLOCALS flag set : if explicitely passed a local dict, a new local dict is not generated. 
This allows one to pass an explicit local dict to the code object of a function (which otherwise cannot be achieved). If this cannot be done because of backward-compatibility problems, then an alternative would be to use the "new" module to create a code object from a function with CO_NEWLOCALS reset, but it seems logical to me to use the information explicitly provided. Free and cell variables are not managed in this version. If the patch is accepted, I am willing to finish the job and implement free and cell variables, but this requires a serious rework of the Cell object: free variables should be accessed using the methods of the dict in which they reside, and today this dict is not accessible from the Cell object. Robustness: Currently, the plain test suite passes (with a modification of test_desctut, which precisely verifies that the suppressed restriction is enforced). I have introduced a new test (test_subdict.py) which verifies the new behavior. For performance, the plain case (when the local dict is a plain dict) is optimized so that differences in performance are not measurable (within 1%) when run on the test suite (i.e. I timed make test). ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-30 11:27 Message: Logged In: YES user_id=6656 And there's precisely no way it's going into 2.2.x. 
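The kind of behaviour this patch was after can be illustrated in today's Python 3, where eval() and exec() do consult a dict subclass's own lookup hooks for the locals mapping (global lookups remain special-cased). This shows the modern interpreter's behaviour, not the 2002 patch itself; Defaulting is a made-up example class:

```python
# A dict subclass whose lookup hook supplies values for missing names.
# dict.__getitem__ calls __missing__ when a key is absent.
class Defaulting(dict):
    def __missing__(self, key):
        return 42

# Name lookup inside eval() tries the locals mapping first; because
# Defaulting is not a plain dict, its __missing__ hook is honoured.
value = eval("completely_unbound_name", {}, Defaulting())
```

So an otherwise-unbound name resolves through the subclass's hook instead of raising NameError.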
---------------------------------------------------------------------- Comment By: Cesar Douady (douady) Date: 2002-03-30 00:08 Message: Logged In: YES user_id=428521 to install this patch from python revision 2.2, follow these steps : - get the python.diff file from this page - cd Python-2.2 - run "patch -p1 Patches item #534304, was opened at 2002-03-24 13:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 Category: Parser/Compiler >Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: SUZUKI Hisao (suzuki_hisao) Assigned to: Nobody/Anonymous (nobody) Summary: PEP 263 phase 2 Implementation Initial Comment: This is a sample implementation of PEP 263 phase 2. This implementation behaves just as normal Python does if no other coding hints are given. Thus it does not hurt anyone who uses Python now. Note that it is strictly compatible with the PEP in that every program valid in the PEP is also valid in this implementation. This implementation also accepts files in UTF-16 with BOM. They are read as UTF-8 internally. Please try "utf16sample.py" included. ---------------------------------------------------------------------- >Comment By: Michael Hudson (mwh) Date: 2002-03-30 11:27 Message: Logged In: YES user_id=6656 Not going into 2.2.x. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-25 13:23 Message: Logged In: YES user_id=21627 The patch looks good, but needs a number of improvements. 1. I have problems building this code. When trying to build pgen, I get an error message of Parser/parsetok.c: In function `parsetok': Parser/parsetok.c:175: `encoding_decl' undeclared The problem here is that graminit.h hasn't been built yet, but parsetok refers to the symbol. 2. For some reason, error printing for incorrect encodings does not work - it appears that it prints the wrong line in the traceback. 3. 
The escape processing in Unicode literals is incorrect. For example, u"\" should denote only the non-ascii character. However, your implementation replaces the non-ASCII character with \u, resulting in \u, so the first backslash unescapes the second one. 4. I believe the escape processing in byte strings is also incorrect for encodings that allow \ in the second byte. Before processing escape characters, you convert back into the source encoding. If this produces a backslash character, escape processing will misinterpret that byte as an escape character. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 From noreply@sourceforge.net Sun Mar 31 04:12:28 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 20:12:28 -0800 Subject: [Patches] [ python-Patches-536278 ] force gzip to open files with 'b' Message-ID: Patches item #536278, was opened at 2002-03-28 09:53 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536278&group_id=5470 Category: Library (Lib) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Skip Montanaro (montanaro) Assigned to: Nobody/Anonymous (nobody) Summary: force gzip to open files with 'b' Initial Comment: It doesn't make sense that the gzip module should try to open a file in text mode. The attached patch forces a 'b' into the file open mode if it wasn't given. I also modified the test slightly to try and tickle this code, but I can't test it very effectively, because I don't do Windows... 
:-) ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-30 23:12 Message: Logged In: YES user_id=31435 I suggest fixing this via changing the test to if mode and 'b' not in mode: Then mode=None and mode='' will be left alone (as Neal says, the code already does the right thing for those). ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2002-03-28 10:04 Message: Logged In: YES user_id=33168 There is a problem (sorry, I have an evil mind). :-) If '' is passed as the mode, before the patch, this would have been converted to 'rb'. After the patch, mode will become 'b' and that will raise an exception: >>> open('/dev/null', 'b') IOError: [Errno 22] Invalid argument: b If you add an (and mode) condition and that should be fine. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536278&group_id=5470 From noreply@sourceforge.net Sun Mar 31 07:11:35 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sat, 30 Mar 2002 23:11:35 -0800 Subject: [Patches] [ python-Patches-536909 ] pymalloc for types and other cleanups Message-ID: Patches item #536909, was opened at 2002-03-29 16:11 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536909&group_id=5470 Category: Core (C code) Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: Neil Schemenauer (nascheme) >Assigned to: Neil Schemenauer (nascheme) Summary: pymalloc for types and other cleanups Initial Comment: This patch changes typeobject to use pymalloc for managing the memory of subclassable types. It also fixes a bug that caused an interpreter built without GC to crash. Testing this patch was a bitch. There are three knobs related to MM now (with-cycle-gc, with-pymalloc, and PYMALLOC_DEBUG). 
I think I found different bugs when testing with each possible combination. There's one bit of ugliness in this patch. Extension module writers have to use _PyMalloc_Del to initialize the tp_free pointer. There should be a "public" function for that. ---------------------------------------------------------------------- >Comment By: Tim Peters (tim_one) Date: 2002-03-31 02:11 Message: Logged In: YES user_id=31435 Neil, I appreciate the work! I'm afraid I screwed you at the same time. How do you want to proceed? I think "the plan" now is that we go back to the PyObject_XXX interface, and when pymalloc is enabled map most flavors of "free memory" ({Py{Mem, Object}_{Del, DEL, Free, FREE}) to the pymalloc free. You're not required to work on this, but if you've got some spare energy I could sure use the help. ---------------------------------------------------------------------- Comment By: Neil Schemenauer (nascheme) Date: 2002-03-29 18:09 Message: Logged In: YES user_id=35752 I'm counting on Tim to finish PyMem_NukeIt. ---------------------------------------------------------------------- Comment By: Martin v. Lцwis (loewis) Date: 2002-03-29 17:47 Message: Logged In: YES user_id=21627 I see another memory allocation family here: What function should objects allocated through PyType_GenericAlloc be released with? If you change the behaviour of PyType_GenericAlloc, all types in extensions written for 2.2 that use PyType_GenericAlloc will break, since they will still have PyObject_Del in their tp_free slot. I believe "families" should always be complete, so along with PyType_GenericAlloc goes PyType_GenericFree. If you want it fully backwards compatible, you need to introduce PyType_PyMallocAlloc... 
---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=536909&group_id=5470 
From noreply@sourceforge.net Sun Mar 31 16:16:08 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 31 Mar 2002 08:16:08 -0800 Subject: [Patches] [ python-Patches-534304 ] PEP 263 phase 2 Implementation Message-ID: Patches item #534304, was opened at 2002-03-24 22:52 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 Category: Parser/Compiler Group: Python 2.3 Status: Open Resolution: None Priority: 5 Submitted By: SUZUKI Hisao (suzuki_hisao) Assigned to: Nobody/Anonymous (nobody) Summary: PEP 263 phase 2 Implementation Initial Comment: This is a sample implementation of PEP 263 phase 2. This implementation behaves just as normal Python does if no other coding hints are given. Thus it does not hurt anyone who uses Python now. Note that it is strictly compatible with the PEP in that every program valid in the PEP is also valid in this implementation. This implementation also accepts files in UTF-16 with BOM. They are read as UTF-8 internally. Please try "utf16sample.py" included. ---------------------------------------------------------------------- >Comment By: SUZUKI Hisao (suzuki_hisao) Date: 2002-04-01 01:16 Message: Logged In: YES user_id=495142 Thank you for your review. Now 1. and 3. are fixed, and 2. is improved. (4. is not true.) ---------------------------------------------------------------------- Comment By: Michael Hudson (mwh) Date: 2002-03-30 20:27 Message: Logged In: YES user_id=6656 Not going into 2.2.x. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2002-03-25 22:23 Message: Logged In: YES user_id=21627 The patch looks good, but needs a number of improvements. 1. I have problems building this code. 
When trying to build pgen, I get an error message of Parser/parsetok.c: In function `parsetok': Parser/parsetok.c:175: `encoding_decl' undeclared The problem here is that graminit.h hasn't been built yet, but parsetok refers to the symbol. 2. For some reason, error printing for incorrect encodings does not work - it appears that it prints the wrong line in the traceback. 3. The escape processing in Unicode literals is incorrect. For example, u"\" should denote only the non-ascii character. However, your implementation replaces the non-ASCII character with \u, resulting in \u, so the first backslash unescapes the second one. 4. I believe the escape processing in byte strings is also incorrect for encodings that allow \ in the second byte. Before processing escape characters, you convert back into the source encoding. If this produces a backslash character, escape processing will misinterpret that byte as an escape character. ---------------------------------------------------------------------- You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=534304&group_id=5470 From noreply@sourceforge.net Sun Mar 31 21:10:30 2002 From: noreply@sourceforge.net (noreply@sourceforge.net) Date: Sun, 31 Mar 2002 13:10:30 -0800 Subject: [Patches] [ python-Patches-452110 ] socketmodule ssl: server & thread Message-ID: Patches item #452110, was opened at 2001-08-17 08:10 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=305470&aid=452110&group_id=5470 Category: Library (Lib) Group: None >Status: Deleted Resolution: None Priority: 5 Submitted By: Jozef Hatala (jhatala) Assigned to: Jeremy Hylton (jhylton) Summary: socketmodule ssl: server & thread Initial Comment: Simple enhancement to the SSL support in module socket : - support for writing SSL servers (as well as clients) - Py_*_ALLOW_THREADS arround blocking calls to openssl - rsa temp key to work with older export netscape - renamed attribute server to peer This patch 
allows for powerful application servers like the following one to be accessed with "netscape https://localhost:1443/":

    from socket import *
    p = socket(AF_INET, SOCK_STREAM)
    p.bind(('localhost', 1443))
    p.listen(1)
    while 1:
        s, a = p.accept()
        c = sslserver(s, 'server.key', 'server.crt')
        print "They said:", c.read()
        c.write('HTTP/1.0 200 OK\r\n')
        c.write('Content-Type: text/plain\r\n\r\n** Hi! **')
        c.close()

TODO: a kind of makefile() on the ssl object, like on a socket, would be welcome.

Have fun,
jh

----------------------------------------------------------------------

Comment By: Gerhard Häring (ghaering)
Date: 2001-10-22 06:51

Message:
Logged In: YES
user_id=163326

I don't think it is a good idea to add this. Python's builtin client-side SSL support is already pretty weak. This patch would add a minimal SSL server implementation, but it shares some of the same weaknesses, like missing the ability to set the SSL method (version 2, version 3, version 2 or 3).

I'd recommend not adding any more SSL features at this point, but for Python 2.2 only keeping the existing client-side functionality and fixing any remaining bugs there.

I'm working on something that would hopefully be better in the long run: an SSL API that the various Python SSL modules (m2crypto, POW, pyOpenSSL) can implement; Python would then use one of these third-party modules for https, smtp/tls etc. Sort of a plugin ability for an SSL module. If you add stuff to the broken SSL API now, you'll either have to carry it around for a long time or, if my proposal gets implemented and accepted, the workarounds will be clunkier.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-10-18 19:10

Message:
Logged In: YES
user_id=6380

Time to look at this again?
----------------------------------------------------------------------

Comment By: Jozef Hatala (jhatala)
Date: 2001-10-17 07:43

Message:
Logged In: YES
user_id=300564

This patch, now against Python 2.2a3, contains:
- SSL server support (SSL_accept) [as before]
additionally:
- allow threads around getaddrinfo & Co.
- more verbose exception messages (for failures in ssl() and sslserver())
- methods recv and send on the ssl object, as equivalents of read and write
- a makefile method on the ssl object (a look-alike; does no dup!)
- a client/server test (depends on os.fork())

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2001-10-16 09:05

Message:
Logged In: YES
user_id=31392

If you can provide test cases, I'll provide documentation. But hurry: if we don't get this done this week, we may miss Python 2.2.

----------------------------------------------------------------------

Comment By: Jozef Hatala (jhatala)
Date: 2001-10-16 03:21

Message:
Logged In: YES
user_id=300564

I'll submit a simple test with certificates and an enhanced patch for 2.2a2 (it does not patch cleanly any more) soon (this week) [time and inet access issues]. I haven't written any doc. There was none for ssl; I know that is no excuse... Does someone want to volunteer?

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2001-10-11 09:13

Message:
Logged In: YES
user_id=31392

Jozef, are you going to contribute tests and documentation?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-08-18 23:17

Message:
Logged In: YES
user_id=6380

Nice, but where's the documentation? (Thanks for the docstrings, though!) And the test suite?
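[Editorial note for readers of this archive: server-side TLS eventually landed in the standard ssl module, which also addresses Gerhard's objection about selecting the protocol method. A minimal modern sketch of what the patch's sslserver(sock, 'server.key', 'server.crt') corresponds to today; the helper name make_tls_server and its parameters are illustrative, not from the patch:]

```python
import socket
import ssl

def make_tls_server(host, port, certfile, keyfile):
    """Rough modern equivalent of the patch's sslserver() object:
    a listening socket whose accepted connections speak TLS."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    srv = socket.create_server((host, port))
    # wrap_socket(..., server_side=True) runs the SSL_accept()
    # handshake on each connection returned by accept()
    return ctx.wrap_socket(srv, server_side=True)
```

[The protocol version and ciphers are configured on the SSLContext, which is the "set the SSL method" knob the 2001 patch lacked.]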
----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=452110&group_id=5470

From noreply@sourceforge.net Sun Mar 31 23:12:23 2002
From: noreply@sourceforge.net (noreply@sourceforge.net)
Date: Sun, 31 Mar 2002 15:12:23 -0800
Subject: [Patches] [ python-Patches-537536 ] bug 535444 super() broken w/classmethods
Message-ID:

Patches item #537536, was opened at 2002-03-31 23:12
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=537536&group_id=5470

Category: Core (C code)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Phillip J. Eby (pje)
Assigned to: Nobody/Anonymous (nobody)
Summary: bug 535444 super() broken w/classmethods

Initial Comment:
This patch fixes bug #535444. It is against the current CVS version of Python, and addresses the problem by adding a 'starttype' variable to 'super_getattro', which works the same as 'starttype' in the pure-Python version of super in the descriptor tutorial. This variable is then passed to the descriptor __get__ function, a la 'descriptor.__get__(self.__obj__, starttype)'.

This patch does not correct the pure-Python version of 'super' in the descriptor tutorial; I don't know where that file is or how to submit a patch for it. It also does not include a regression test for the bug, as I do not know which test script would be the appropriate place for it. Thanks.

----------------------------------------------------------------------

You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=537536&group_id=5470
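[Editorial note: the 'starttype' logic described in the patch can be sketched in pure Python, along the lines of the descriptor tutorial's Super class. This is an illustrative reconstruction, not the C code from the patch; the point is that the attribute found along the MRO is bound via __get__(obj, starttype), so a classmethod receives the most derived class rather than the wrong one.]

```python
class Super(object):
    """Sketch of super() with the fix described above (illustrative)."""

    def __init__(self, type_, obj=None):
        self.__type__ = type_
        self.__obj__ = obj

    def __getattr__(self, attr):
        if isinstance(self.__obj__, self.__type__):
            starttype = self.__obj__.__class__   # instance-bound super
        else:
            starttype = self.__obj__             # class-bound super
        mro = iter(starttype.__mro__)
        # Skip ahead past self.__type__ in the MRO...
        for cls in mro:
            if cls is self.__type__:
                break
        # ...then look up the attribute in the remaining classes.
        # The fix: pass starttype (not self.__type__) to __get__,
        # so a classmethod binds to the most derived class.
        for cls in mro:
            if attr in cls.__dict__:
                x = cls.__dict__[attr]
                if hasattr(x, "__get__"):
                    x = x.__get__(self.__obj__, starttype)
                return x
        raise AttributeError(attr)
```

[With a classmethod tag() defined on A and B(A) inheriting it, Super(B, B()).tag() now sees cls == B, matching the builtin super after the fix.]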