From daniel at stutzbachenterprises.com Mon Nov 2 17:53:00 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 2 Nov 2009 10:53:00 -0600 Subject: [Python-ideas] UCS2 vs UCS4 ABIs Message-ID: Scope ----- This idea affects the CPython ABI for extension modules. It has no impact on the Python language syntax nor other Python implementations. The Problem ----------- Currently, Python can be built with an internal Unicode representation of UCS2 or UCS4. The two are binary incompatible, but the distinction is not included as part of the platform name. Consequently, if one installs a binary egg (e.g., with easy_install), there's a good chance one will get an error such as the following when trying to use it: undefined symbol: PyUnicodeUCS2_FromString In Python 2, some extension modules can blissfully link to either ABI, as the problem only arises for modules that call a PyUnicode_* macro (which expands to calling either a PyUnicodeUCS2_* or PyUnicodeUCS4_* function). For Python 3, every extension type will need to call a PyUnicode_* macro, since __repr__ must return a Unicode object. This problem has been known since at least 2006, as seen in this thread from the distutils-sig: http://markmail.org/message/bla5vrwlv3kn3n7e?q=thread:bla5vrwlv3kn3n7e In that thread, it was suggested that the Unicode representation become part of the platform name. That change would require a distutils and/or setuptools change, which has not happened and does not appear likely to happen in the near future. It would also mean that anyone who wants to provide binary eggs for common platforms will need to provide twice as many eggs. Solution -------- Get rid of the ABI difference for the 99% of extension modules that don't care about the internal representation of Unicode strings. From the extension module's point of view, PyObject is opaque. 
It will manipulate the Unicode string entirely through PyUnicode_* function calls and does not care about the internal representation. For example, PyUnicode_FromString has the following signature in the documentation: PyObject *PyUnicode_FromString(const char *u) Currently, it's #ifdef'ed to either PyUnicodeUCS2_FromString or PyUnicodeUCS4_FromString. Remove the macro and name the function PyUnicode_FromString regardless of which internal representation is being used. The vast majority of binary eggs will then work correctly on both UCS2 and UCS4 Pythons. Functions that explicitly use Py_UNICODE or PyUnicodeObject as part of their signature will continue to be #ifdef'ed, so extension modules that *do* care about the internal representation will still generate a link error. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC From guido at python.org Mon Nov 2 18:34:51 2009 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Nov 2009 09:34:51 -0800 Subject: [Python-ideas] UCS2 vs UCS4 ABIs In-Reply-To: References: Message-ID: On Mon, Nov 2, 2009 at 8:53 AM, Daniel Stutzbach wrote: > Scope > ----- > > This idea affects the CPython ABI for extension modules. It has no impact > on the Python language syntax nor other Python implementations. > > The Problem > ----------- > > Currently, Python can be built with an internal Unicode representation of > UCS2 or UCS4. The two are binary incompatible, but the distinction is not > included as part of the platform name. Consequently, if one installs a > binary egg (e.g., with easy_install), there's a good chance one will get an > error such as the following when trying to use it: > >
undefined symbol: PyUnicodeUCS2_FromString > > In Python 2, some extension modules can blissfully link to either ABI, as > the problem only arises for modules that call a PyUnicode_* macro (which > expands to calling either a PyUnicodeUCS2_* or PyUnicodeUCS4_* function). > For Python 3, every extension type will need to call a PyUnicode_* macro, > since __repr__ must return a Unicode object. > > This problem has been known since at least 2006, as seen in this thread from > the distutils-sig: > > http://markmail.org/message/bla5vrwlv3kn3n7e?q=thread:bla5vrwlv3kn3n7e > > In that thread, it was suggested that the Unicode representation become part > of the platform name. That change would require a distutils and/or > setuptools change, which has not happened and does not appear likely to > happen in the near future. It would also mean that anyone who wants to > provide binary eggs for common platforms will need to provide twice as many > eggs. > > Solution > -------- > > Get rid of the ABI difference for the 99% of extension modules that don't > care about the internal representation of Unicode strings. From the > extension module's point of view, PyObject is opaque. It will manipulate > the Unicode string entirely through PyUnicode_* function calls and does not > care about the internal representation. > > For example, PyUnicode_FromString has the following signature in the > documentation: > PyObject *PyUnicode_FromString(const char *u) > Currently, it's #ifdef'ed to either PyUnicodeUCS2_FromString or > PyUnicodeUCS4_FromString. > > Remove the macro and name the function PyUnicode_FromString regardless of > which internal representation is being used. The vast majority of binary > eggs will then work correctly on both UCS2 and UCS4 Pythons.
> > Functions that explicitly use Py_UNICODE or PyUnicodeObject as part of their > signature will continue to be #ifdef'ed, so extension modules that *do* care > about the internal representation will still generate a link error. IIUC your proposal doesn't get rid of the root of the problem (that there are two incompatible choices for Unicode string representation) but only proposes that there be a purely "abstract" API for working with string objects, which, if used religiously by extension modules, would allow them to be linked with either family of runtimes. This sounds attractive, but I kind of doubt that changing a single API is sufficient. Perhaps it would be useful to do a kind of review or survey of how many Unicode APIs are used by the typical extension? -- --Guido van Rossum (python.org/~guido) From daniel at stutzbachenterprises.com Mon Nov 2 18:45:11 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 2 Nov 2009 11:45:11 -0600 Subject: [Python-ideas] UCS2 vs UCS4 ABIs In-Reply-To: References: Message-ID: On Mon, Nov 2, 2009 at 11:34 AM, Guido van Rossum wrote: > This sounds attractive, but I kind of doubt that changing a single API > is sufficient. Perhaps it would be useful to do a kind of review or > survey of how many Unicode APIs are used by the typical extension? > I made an editing error. I meant to suggest altering all the PyUnicode_* macro/functions, except those that explicitly use Py_UNICODE or PyUnicodeObject in their signature. PyUnicode_FromString was just an example. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC
From guido at python.org Mon Nov 2 18:57:52 2009 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Nov 2009 09:57:52 -0800 Subject: [Python-ideas] UCS2 vs UCS4 ABIs In-Reply-To: References: Message-ID: On Mon, Nov 2, 2009 at 9:45 AM, Daniel Stutzbach wrote: > On Mon, Nov 2, 2009 at 11:34 AM, Guido van Rossum wrote: >> >> This sounds attractive, but I kind of doubt that changing a single API >> is sufficient. Perhaps it would be useful to do a kind of review or >> survey of how many Unicode APIs are used by the typical extension? > > I made an editing error. I meant to suggest altering all the PyUnicode_* > macro/functions, except those that explicitly use Py_UNICODE or > PyUnicodeObject in their signature. PyUnicode_FromString was just an > example. We'd also have to hide the macros that can be used to access the internals of a PyUnicodeObject, in order for that approach to be safe. Basically, an extension would have to include a second header file to use those macros and it would have to somehow indicate to the linker that it is using UCS2 or UCS4 internals as well. I would want to err on the safe side here -- if it was at all easy to create an extension that *seems* to be ABI-neutral but *actually* relies on knowledge about the UCS2 or UCS4 representation, we'd be creating a worse problem. Users don't like stuff not working, but they *really* don't like stuff crashing with random core dumps -- if it has to be broken, let it break very loudly and explicitly. The current approach satisfies that requirement -- it probably just errs too far on the "never assume it might work" side.
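The "break very loudly and explicitly" requirement can also be illustrated from the Python side. The following is only a sketch, not anything proposed in the thread: the helper names `unicode_abi_tag` and `require_unicode_abi` are hypothetical, and the only real API used is `sys.maxunicode`, which is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a wide (UCS4) build. On CPython 3.3 and later the narrow/wide distinction no longer exists and the value is always 0x10FFFF.

```python
import sys


def unicode_abi_tag():
    """Return "ucs2" for a narrow build and "ucs4" for a wide build.

    Hypothetical helper: relies only on sys.maxunicode.
    """
    return "ucs4" if sys.maxunicode > 0xFFFF else "ucs2"


def require_unicode_abi(expected, module_name):
    """Fail loudly at import time instead of with an undefined-symbol error.

    Hypothetical helper: `expected` is the tag the binary was built for.
    """
    actual = unicode_abi_tag()
    if actual != expected:
        raise ImportError(
            "%s was built for a %s Python, but this interpreter is %s"
            % (module_name, expected, actual)
        )
```

A pure-Python wrapper package could call `require_unicode_abi("ucs4", "mymodule")` before importing its compiled part, assuming the build step recorded which representation was used.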
-- --Guido van Rossum (python.org/~guido) From daniel at stutzbachenterprises.com Mon Nov 2 20:50:26 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Mon, 2 Nov 2009 13:50:26 -0600 Subject: [Python-ideas] UCS2 vs UCS4 ABIs In-Reply-To: References: Message-ID: On Mon, Nov 2, 2009 at 11:57 AM, Guido van Rossum wrote: > We'd also have to hide the macros that can be used to access the > internals of a PyUnicodeObject, in order for that approach to be safe. > Basically, an extension would have to include a second header file to > use those macros and it would have to somehow indicate to the linker > that it is using UCS2 or UCS4 internals as well. > I don't know of a portable way to indicate that to the linker simply by including a header file. I wish I did. Here is one idea that will cause a linker error if there's a mismatch and one of the macros is used. It does cause the macro to execute an extra CPU instruction or two, though.

In unicodeobject.h:

    /* Require the macro to reference a global variable that will only be
       present if the Unicode ABI matches correctly. Arrange for the global
       variable to always have the value zero, and add it to the return
       value of the macro. */
    #if Py_UNICODE_SIZE == 4
    extern const int Py_UnicodeZero_UCS4;
    #define Py_UNICODE_ZERO (Py_UnicodeZero_UCS4)
    #else
    extern const int Py_UnicodeZero_UCS2;
    #define Py_UNICODE_ZERO (Py_UnicodeZero_UCS2)
    #endif

    #define PyUnicode_AS_UNICODE(op) \
        (Py_UNICODE_ZERO + (((PyUnicodeObject *)(op))->str))

In unicodeobject.c:

    extern const int Py_UNICODE_ZERO = 0;

> I would want to err on the safe side here -- if it was at all easy to > create an extension that *seems* to be ABI-neutral but *actually* > relies on knowledge about the UCS2 or UCS4 representation, we'd be > creating a worse problem. Users don't like stuff not working, but they > *really* don't like stuff crashing with random core dumps -- if it has > to be broken, let it break very loudly and explicitly.
The current > approach satisfies that requirement -- it probably just errs too far > on the "never assume it might work" side. > Agreed. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC From terry at jon.es Tue Nov 3 22:41:45 2009 From: terry at jon.es (Terry Jones) Date: Tue, 3 Nov 2009 22:41:45 +0100 Subject: [Python-ideas] Anonymizing the PyCon review process Message-ID: <19184.41881.63114.226901@jon.es> Last night I got a couple of PyCon talks rejected, and someone else sent me a rejection email they'd received. I wasn't surprised at the rejections, but I was quite surprised that many of the review comments were at least in part based on the presenter (sometimes the incorrectly assumed presenter) instead of on the proposed talk. Out of the 14 reviews, 5 of them have comments about the author of the proposal. I'll just give 2 examples here, taken from 2 different reviewers on 2 different proposals. 1. Imagine you're a relatively unknown Python programmer, you submit a talk to PyCon, and get back a review whose first sentence reads: I don't know the reputation of this particular speaker, so I won't "+1" That sends a pretty unfortunate message: Because you're not a recognized Python person, I won't give you the thumbs up. Maybe I'm being naive or simplistic, but I'd have hoped one route to becoming recognized would be by giving a PyCon talk. From the POV of the review recipient the leading justification for a non-recommendation has nothing to do with the talk! 2. What if you gave a PyCon talk in an earlier year that wasn't rated as highly as other talks? You send in a PyCon proposal, and get this back: I like XXX but honestly his talk at pycon 09 went poorly. That was the *entire* review in this case. What's the message here? Sounds like: well, you gave a talk once and it wasn't so great, so part of my vote against your proposal is because of that.
It's like telling people to go away and not bother ever submitting again. Again, that's not based on the current talk proposal. People tend to get better at giving talks. If a proposal's content is technically good enough to get in, let them give another talk and help them to make it better. Just when is XXX supposed to re-send another PyCon talk, if ever? What makes this even worse is that XXX was not even the primary author of the proposal, and was not to be the speaker. So here we have a review that's negative *entirely* due to a talk that someone else gave in a previous year. How discouraging. Should the person in the future "take one for the team" and decline to be listed as a co-author on joint proposals - even though they're not going to speak - fearing that a reviewer will reply with a -1 and a one-line dismissal? That's the unfortunate dynamic that the above "review" has created. I hope this doesn't sound like personal sour grapes. It's not at all. I've had *tons* of rejection letters in my life (see http://bit.ly/1xytIr), including from PyCon. They're water off a duck's back at this point :-) I do however care about Python and the Python community. The most important point is the message that's sent back to aspiring speakers. Reviews that are based on the supposed character, or old talks, or how recognized you are or aren't, or on a guess as to which of multiple authors might be doing the presenting - all of those send a bad message. They make PyCon look insular and cliquey. If the committee of people is (or merely gives the impression of being) inwardly focused, the community and in the longer term perhaps the language itself will suffer through reduced diversity and through discouraging precisely the people who are animated enough and have the initiative and ambition to submit talks. Those are *exactly* the wrong folks to discourage. The obvious suggestion is to anonymize the review process. That's standard in mature conferences.
It doesn't eliminate bias (in fact you *don't want* to eliminate bias - you need it to survive, you need it to assess quality), but it does reduce the opportunity for judgment based on the wrong things. When I say "wrong" I mean: if you're going to judge based on stuff that's not just the proposal content, then ask for a CV, or a speaking record, or whatever you intend to consider in the review process. Anonymizing conference reviewing has healthy effects. I've seen it up close in academic circles. It's like a breath of fresh air and the results are surprising. When they did it in the genetic algorithms world, all of a sudden really interesting talks were being accepted from all over the world and many very experienced researchers were having multiple talks rejected. That was unexpected, refreshing, and generally agreed to be a very healthy and embracing/welcoming move. If PyCon doesn't move to anonymizing reviews, then at least *try* not to base acceptance decisions on who a speaker is (or, worse, who it's presumed to be). If for some reason you have to, it's *perhaps* better not to tell the poor submitter that they're being rejected in part based on who they are or aren't. The CFP requests a talk proposal, and that's what should be reacted to in the review response, even if there's more to the story. Some of the comments above *might* be appropriate for a conference committee meeting, but not for the first (or only!) line of a review. OK, rant over :-) Regards to everyone & thanks for all the PyCon work. I know how much work it is, and that it's not easy. I hope to be able to make it to Atlanta. 
Terry Jones From solipsis at pitrou.net Tue Nov 3 23:49:32 2009 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 3 Nov 2009 22:49:32 +0000 (UTC) Subject: [Python-ideas] Anonymizing the PyCon review process References: <19184.41881.63114.226901@jon.es> Message-ID: Hello Terry, Terry Jones writes: > Last night I got a couple of PyCon talks rejected, and someone else sent me > a rejection email they'd received. I wasn't surprised at the rejections, > but I was quite surprised that many of the review comments were at least in > part based on the presenter (sometimes the incorrectly assumed presenter) > instead of on the proposed talk. I am not part of the review board, and neither have I tried to submit a talk, and I'm not even sure this is the right mailing-list, but I found it interesting to read your personal report of your attempts to submit one. I also must say that, while I don't like popularity mechanisms myself, it's not very surprising to find them in the Python community since they exist everywhere else :) Thanks for the write-up. Antoine. From python at rcn.com Wed Nov 4 00:32:02 2009 From: python at rcn.com (Raymond Hettinger) Date: Tue, 3 Nov 2009 15:32:02 -0800 Subject: [Python-ideas] Anonymizing the PyCon review process References: <19184.41881.63114.226901@jon.es> Message-ID: [Terry Jones] > The obvious suggestion is to anonymize the review process. FWIW, that was tried and the people complained about that too. Who would you rather hear speak about the future of Python, Guido or someone else? About the state of Twisted, from someone on that team or from a user who read the Twisted book? About Unladen Swallow or AppEngine, someone on Google's team or someone who has played around with it for a while? Also, there are some folks like Alex Martelli whose talks I will seek out no matter what he's talking about (because it's always worthwhile). Likewise, it's not irrelevant if a speaker previously gave a talk that sucked.
Surely, the review process has room for improvements and better balance but anonymizing is a step too far IMO. Raymond From digitalxero at gmail.com Wed Nov 4 01:17:52 2009 From: digitalxero at gmail.com (Dj Gilcrease) Date: Tue, 3 Nov 2009 17:17:52 -0700 Subject: [Python-ideas] Anonymizing the PyCon review process In-Reply-To: References: <19184.41881.63114.226901@jon.es> Message-ID: On Tue, Nov 3, 2009 at 4:32 PM, Raymond Hettinger wrote: > Who would you rather hear speak about the future of Python, Guido or > someone else? > About the state of Twisted, from someone on that team or from a user who > read the Twisted book? > About Unladen Swallow or AppEngine, someone on Google's team or someone who > has played around with it for a while? This is where a two-stage review process helps. Stage one is to group all talks on the same topic and run a speaker validation step to check that the speaker on a given topic is an expert in that area. If no known or verifiable expert has proposed a talk on the topic, all talks on that topic should pass to the anonymized stage and be judged on their proposals alone. > Also, there are some folks like Alex Martelli whose talks I will seek out no > matter what he's talking about (because it's always worthwhile). This may be true, but should have little to no bearing on the talk proposal process, because if his talks are always good then his proposals are likely to be good as well and thus get approved on that basis. > Likewise, it's not irrelevant if a speaker previously gave a talk that sucked. This is true as well, and should be considered in stage one of the process, but a single talk that sucked is not, I would say, a basis to reject someone outright. I would say two poor talks @pycon in the past three years warrant a negative vote, but a single bad talk, unless the voter attended that talk in person, isn't justification for a negative vote.
From brett at python.org Wed Nov 4 02:27:22 2009 From: brett at python.org (Brett Cannon) Date: Tue, 3 Nov 2009 17:27:22 -0800 Subject: [Python-ideas] Anonymizing the PyCon review process In-Reply-To: References: <19184.41881.63114.226901@jon.es> Message-ID: On Tue, Nov 3, 2009 at 15:32, Raymond Hettinger wrote: > > [Terry Jones] >> >> The obvious suggestion is to anonymize the review process. > > FWIW, that was tried and the people complained about that too. > > Who would you rather hear speak about the future of Python, Guido or > someone else? > About the state of Twisted, from someone on that team or from a user who > read the Twisted book? > About Unladen Swallow or AppEngine, someone on Google's team or someone who > has played around with it for a while? > > Also, there are some folks like Alex Martelli whose talks I will seek out no > matter what he's talking about (because it's always worthwhile). Likewise, > it's not irrelevant if a speaker previously gave a talk that sucked. > > Surely, the review process has room for improvements and better balance but > anonymizing is a step too far IMO. I agree. The year we went fully anonymous did not turn out as well as previous years. And I would also like to say this is off-topic for python-ideas. This would be better discussed on the pycon-pc list or directly at some other PyCon mailing list. -Brett From jnoller at gmail.com Wed Nov 4 02:32:57 2009 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 3 Nov 2009 20:32:57 -0500 Subject: [Python-ideas] Anonymizing the PyCon review process In-Reply-To: <19184.41881.63114.226901@jon.es> References: <19184.41881.63114.226901@jon.es> Message-ID: <4222a8490911031732w477e8bdbm6a7007049b4035f7@mail.gmail.com> On Tue, Nov 3, 2009 at 4:41 PM, Terry Jones wrote: ...mega snip ... > > Terry Jones This is not the correct forum for this.
If you have issues with the process, you can email me, as I am/was the acting committee chair for talks, Van Lindberg as the pycon chair, or the pycon-org, or pycon-pc lists. Jesse From jnoller at gmail.com Wed Nov 4 02:42:35 2009 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 3 Nov 2009 20:42:35 -0500 Subject: [Python-ideas] Anonymizing the PyCon review process In-Reply-To: References: <19184.41881.63114.226901@jon.es> Message-ID: <4222a8490911031742pd1d297fm8ccbd97f1bd1628c@mail.gmail.com> On Tue, Nov 3, 2009 at 7:17 PM, Dj Gilcrease wrote: > On Tue, Nov 3, 2009 at 4:32 PM, Raymond Hettinger wrote: > This is true as well. And should be considered in stage one of the > process, but a single talk that sucked I would say is not a basis to > reject someone outright. I would say two poor talks @pycon in the past > three years warrants a negative vote, but a single bad talk, unless > the voter attended that talk in person, isn't justification for a > negative vote. You're welcome to offer your feedback to the pycon-pc, or pycon-org mailing lists. jesse From stephen at xemacs.org Wed Nov 4 04:08:05 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 04 Nov 2009 12:08:05 +0900 Subject: [Python-ideas] Anonymizing the PyCon review process In-Reply-To: References: <19184.41881.63114.226901@jon.es> Message-ID: <87ocnj0yy2.fsf@uwakimon.sk.tsukuba.ac.jp> Raymond Hettinger writes: > [Terry Jones] > > The obvious suggestion is to anonymize the review process. > > FWIW, that was tried and the people complained about that too. > > Who would you rather hear speak about the future of Python, Guido > or someone else? > About the state of Twisted, from someone on that team or from a > user who read the Twisted book? > About Unladen Swallow or AppEngine, someone on Google's team or > someone who has played around with it for a while? That's what invited talks are for. Guido van Rossum or Alex Martelli, you invite them to give a keynote.
But you can also salt the regular sessions with "invited" speakers. There's nothing that says that people can't suggest themselves for invitations. > Surely, the review process has room for improvements and better > balance but anonymizing is a step too far IMO. Anonymizing is the only way to get a reasonable balance between the very short-term view you are presenting, and the long-term view of encouraging new participants with good ideas and discouraging/warning old-timers whose ideas and views have gone stale, or even started to stink. Good proposals have a fairly high correlation with good talks; although you can't expect to win them all. You don't have to anonymize all the sessions/talks, either, but probably at least half should be refereed blind. From jnoller at gmail.com Wed Nov 4 04:38:29 2009 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 3 Nov 2009 22:38:29 -0500 Subject: [Python-ideas] Anonymizing the PyCon review process In-Reply-To: <87ocnj0yy2.fsf@uwakimon.sk.tsukuba.ac.jp> References: <19184.41881.63114.226901@jon.es> <87ocnj0yy2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4222a8490911031938w705d7c28raf8ac839b54af577@mail.gmail.com> On Tue, Nov 3, 2009 at 10:08 PM, Stephen J. Turnbull wrote: > Raymond Hettinger writes: > > > [Terry Jones] > > > The obvious suggestion is to anonymize the review process. > > > > FWIW, that was tried and the people complained about that too. > > > > Who would you rather hear speak about the future of Python, Guido > > or someone else? > > About the state of Twisted, from someone on that team or from a > > user who read the Twisted book? > > About Unladen Swallow or AppEngine, someone on Google's team or > > someone who has played around with it for a while? > > That's what invited talks are for. Guido van Rossum or Alex Martelli, > you invite them to give a keynote. But you can also salt the regular > sessions with "invited" speakers.
There's nothing that says that > people can't suggest themselves for invitations. > Invited speakers are invited, they don't self nominate. If you have suggestions to change this, please join the pycon mailing list(s). > Anonymizing is the only way to get a reasonable balance between the > very short-term view you are presenting, and the long-term view of > encouraging new participants with good ideas and discouraging/warning > old-timers whose ideas and views have gone stale, or even started to > stink. Good proposals have a fairly high correlation with good talks; > although you can't expect to win them all. You don't have to > anonymize all the sessions/talks, either, but probably at least half > should be refereed blind. I encourage you to participate in Pycon organization next year. This is not the proper forum. Jesse From tjreedy at udel.edu Wed Nov 4 20:36:59 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 04 Nov 2009 14:36:59 -0500 Subject: [Python-ideas] Anonymizing the PyCon review process In-Reply-To: <4222a8490911031732w477e8bdbm6a7007049b4035f7@mail.gmail.com> References: <19184.41881.63114.226901@jon.es> <4222a8490911031732w477e8bdbm6a7007049b4035f7@mail.gmail.com> Message-ID: Jesse Noller wrote: > On Tue, Nov 3, 2009 at 4:41 PM, Terry Jones wrote: > > ...mega snip ... >> Terry Jones > > This is not the correct forum for this. If you have issues with the > process, you can email me, as I am/was the acting committee chair for > talks, Van Lindberg as the pycon chair, or the pycon-org, or pycon-pc > lists. I am not going to join the list but I have a couple of suggestions: If you are going to judge speakers, do so openly; perhaps even tell poor speakers not to bother again. Proposals with multiple authors should then have blanks for 'intended speaker' and 'alternate speaker' so the right person can be judged. Editors of professional journals read reviews for acceptability before sending them on to authors. Consider the same here.
Terry Jan Reedy From fetchinson at googlemail.com Wed Nov 4 22:09:11 2009 From: fetchinson at googlemail.com (Daniel Fetchinson) Date: Wed, 4 Nov 2009 22:09:11 +0100 Subject: [Python-ideas] Anonymizing the PyCon review process In-Reply-To: <19184.41881.63114.226901@jon.es> References: <19184.41881.63114.226901@jon.es> Message-ID: > Last night I got a couple of PyCon talks rejected, and someone else sent me > a rejection email they'd received. I wasn't surprised at the rejections, > but I was quite surprised that many of the review comments were at least in > part based on the presenter (sometimes the incorrectly assumed presenter) > instead of on the proposed talk. [snip] It seems you are mistaken about the purpose of a conference (not only python, but any academic or professional gathering). Since the number of presenters is always much less than the number of people who listen, the goal of every good conference organizer should be to first look after the interest of the people who listen and only then look after the interest of the presenters. In other words, the number one goal is to have the audience enjoy the show. Once this is done, one can think about what is in the interest of the presenters. Whenever there is a conflict, the interest of the audience comes first. Making somebody well known in the python community by giving him a slot at pycon is not the number one goal of a conference. So if somebody is sad/unhappy/etc because he can not present, well, that's not a problem at all if the decision was made in the good faith that turning away the person and giving the slot to someone else will increase the enjoyment of the audience. I believe the cases you mentioned fall into this category. Cheers, Daniel -- Psss, psss, put it down! 
- http://www.cafepress.com/putitdown From stefan_ml at behnel.de Thu Nov 5 08:35:18 2009 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 05 Nov 2009 08:35:18 +0100 Subject: [Python-ideas] UCS2 vs UCS4 ABIs In-Reply-To: References: Message-ID: Daniel Stutzbach, 02.11.2009 17:53: > Scope > ----- > > This idea affects the CPython ABI for extension modules. It has no impact > on the Python language syntax nor other Python implementations. > > The Problem > ----------- > > Currently, Python can be built with an internal Unicode representation of > UCS2 or UCS4. The two are binary incompatible, but the distinction is not > included as part of the platform name. Isn't that the main issue here? IMHO, if EasyInstall was fixed to distinguish extensions for UCS2/UCS4 platforms, that would just make the issue go away for most users. Not for extension builders and package maintainers, admittedly, but certainly for most users. Stefan From daniel at stutzbachenterprises.com Thu Nov 5 14:46:41 2009 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Thu, 5 Nov 2009 07:46:41 -0600 Subject: [Python-ideas] UCS2 vs UCS4 ABIs In-Reply-To: References: Message-ID: On Thu, Nov 5, 2009 at 1:35 AM, Stefan Behnel wrote: > Isn't that the main issue here? IMHO, if EasyInstall was fixed to > distinguish extensions for UCS2/UCS4 platforms, that would just make the > issue go away for most users. Not for extension builders and package > maintainers, admittedly, but certainly for most users. > If easy_install were fixed in the way suggested by PJE [1], eggs could effectively be labeled as "UCS2", "UCS4", or "Don't Care". Right now, all eggs are essentially labeled "Don't Care", even if they will fail to link. My proposal would greatly expand the number of eggs that can legitimately be labeled "Don't Care". It's a complementary proposal; fixing easy_install is certainly still important. :-) [1] http://bit.ly/1bO62 -- Daniel Stutzbach, Ph.D. 
President, Stutzbach Enterprises, LLC From tjreedy at udel.edu Fri Nov 6 02:15:36 2009 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 05 Nov 2009 20:15:36 -0500 Subject: [Python-ideas] Add encoding attribute to bytes Message-ID: A Python interpreter has one encoding for floats, ints, and strings. sys.float_info and sys.int_info give details about the first two, although they are mostly invisible to user code. (I presume they are attached to sys rather than float and int precisely because of this.) A couple of recent posts have discussed making the unicode encoding (UCS2 v 4) both less visible and more discoverable to extensions. Bytes are nearly always an encoding of *something*, but the particular encoding used is instance-specific. As Guido has said, the programmer must keep track. But how? In an OO language, one obvious way is as an attribute of the instance. That would be carried with the instance and make it self-identifying. What I do not know is whether it is feasible to give an immutable instance of a builtin class a mutable attribute slot. If it were, I think this could make 3.x bytes easier and more transparent to use. When a string is encoded to bytes, the attribute would be set. If it were then pickled, the attribute would be stored with it and restored with it, and less easily lost. If it were then decoded, the attribute would be used. If it were sent to the net, the attribute would be used to set the appropriate headers. The reverse process would apply from net to bytes to (unicode) text. Bytes representing other types of data, such as media, could also be tagged, not just those representing text. This would be a proposal for 3.3 at the earliest. It would involve revising stdlib modules, as appropriate, to use the new info.
Terry Jan Reedy From python at mrabarnett.plus.com Fri Nov 6 03:19:35 2009 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 06 Nov 2009 02:19:35 +0000 Subject: [Python-ideas] Add encoding attribute to bytes In-Reply-To: References: Message-ID: <4AF387B7.1060808@mrabarnett.plus.com> Terry Reedy wrote: > A Python interpreter has one encoding for floats, ints, and strings. > sys.float_info and sys.int_info give details about the first two. > although they are mostly invisible to user code. (I presume they are > attached to sys rather than float and int precisely because this.) A > couple of recent posts have discussed making the unicode encoding (UCS2 > v 4) both less visible and more discoverable to extensions. > > Bytes are nearly always an encoding of *something*, but the particular > encoding used is instance-specific. As Guido has said, the programmer > must keep track. But how? In an OO language, one obvious way is as an > attribute of the instance. That would be carried with the instance and > make it self-identifying. > > What I do not know if it is feasible to give an immutable instance of a > builtin class a mutable attribute slot. If it were, I think this could > make 3.x bytes easier and more transparent to use. When a string is > encoded to bytes, the attribute would be set. If it were then pickled, > the attribute would be stored with it and restored with it, and less > easily lost. If it were then decoded, the attribute would be used. If it > were sent to the net, the attribute would be used to set the appropriate > headers. The reverse process would apply from net to bytes to (unicode) > text. > > Bytes representing other types of data, such as nedia could also be > tagged, not just those representing text. > > This would be a proposal for 3.3 at the earliest. It would involved > revising stdlib modules, as appropriate, to use the new info. > You said "give an immutable instance of a builtin class a mutable attribute slot". 
Why would the slot be mutable? Surely if the attribute said that the bytes represented a certain type of data then you shouldn't be able to change it. ("The attribute says that the bytes are UTF-8, but I'm going to change it so that it says they are ISO-8859-1.") I think that the attribute should be immutable. From stephen at xemacs.org Fri Nov 6 05:18:11 2009 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 06 Nov 2009 13:18:11 +0900 Subject: [Python-ideas] Add encoding attribute to bytes In-Reply-To: <4AF387B7.1060808@mrabarnett.plus.com> References: <4AF387B7.1060808@mrabarnett.plus.com> Message-ID: <87pr7w2sn0.fsf@uwakimon.sk.tsukuba.ac.jp> MRAB writes: > You said "give an immutable instance of a builtin class a mutable > attribute slot". Why would the slot be mutable? I think the idea is that in many cases you won't know what the encoding is until after you've read the bytes. But I don't really see this idea as that useful either way. The obvious use case for me would be in the email module. So you read in a message and create a bytes object, which you stash away for later use as necessary. The header and the body, each MIME part, each MIME part header and payload, and so on recursively are identified as slices of the BigBytesObject you read in at the beginning, which is implicitly a binary blob and doesn't need an encoding (strike one). Each header identifies the encoding (which here would have to refer ambiguously to Content-Type or Content-Transfer-Encoding, strike two) of the corresponding payload. And you'll need to deal with cases where Content-Type and Content-Transfer-Encoding are both relevant, strike three. You may as well keep the various layers of encoding explicitly in email-specific objects, so use case: email strikes out. That's only one use case, of course. 
But we can see what a use case would have to look like: you read in a bytes object, just enough to enable you to accurately parse the rest of the stream in the same way and tag each bytes part with an appropriate encoding. What are they? From ncoghlan at gmail.com Fri Nov 6 10:13:26 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 06 Nov 2009 19:13:26 +1000 Subject: [Python-ideas] Add encoding attribute to bytes In-Reply-To: References: Message-ID: <4AF3E8B6.1080002@gmail.com> Terry Reedy wrote: > Bytes are nearly always an encoding of *something*, but the particular > encoding used is instance-specific. As Guido has said, the programmer > must keep track. But how? In an OO language, one obvious way is as an > attribute of the instance. That would be carried with the instance and > make it self-identifying. I work in comms and spend a lot of time shuttling bytes from one place to another without caring in the least about the encoding. Caring about that kind of detail is application layer stuff and belongs in application layer objects. More importantly, such an attribute implies a defined responsibility for keeping it accurate. For application layer objects, it is possible to define that. For a low level data structure like bytes, it isn't. Attaching metadata to something without defining a responsible entity for keeping that metadata accurate and up to date is a recipe for trouble. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From g.brandl at gmx.net Sat Nov 7 00:17:36 2009 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 07 Nov 2009 00:17:36 +0100 Subject: [Python-ideas] Add encoding attribute to bytes In-Reply-To: References: Message-ID: Terry Reedy schrieb: > A Python interpreter has one encoding for floats, ints, and strings. > sys.float_info and sys.int_info give details about the first two. > although they are mostly invisible to user code. 
(I presume they are
> attached to sys rather than float and int precisely because this.) A
> couple of recent posts have discussed making the unicode encoding (UCS2
> v 4) both less visible and more discoverable to extensions.
>
> Bytes are nearly always an encoding of *something*, but the particular
> encoding used is instance-specific. As Guido has said, the programmer
> must keep track. But how? In an OO language, one obvious way is as an
> attribute of the instance. That would be carried with the instance and
> make it self-identifying.
>
> What I do not know if it is feasible to give an immutable instance of a
> builtin class a mutable attribute slot.

As soon as you can mutate an instance, it is not an immutable type anymore. Calling it "immutable" regardless will cause trouble. (The same bytes instance could be used somewhere else transparently, e.g. as a function default argument, or cached as a constant local.)

As for the usefulness, I often have to work with proprietary communication protocols between computer and devices, and there the bytes have no encoding whatsoever (though I agree that most bytes do have a meaningful encoding). However, a class as fundamental as "bytes" should not be burdened with an attribute that may not even apply -- it's easy to make a custom class to represent a (bytes, encoding) pair.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

From jimjjewett at gmail.com Sat Nov 7 22:07:06 2009
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sat, 7 Nov 2009 16:07:06 -0500
Subject: [Python-ideas] Add encoding attribute to bytes
In-Reply-To:
References:
Message-ID:

On Thu, Nov 5, 2009 at 8:15 PM, Terry Reedy wrote:
> A Python interpreter has one encoding for floats, ints, and strings.
> sys.float_info and sys.int_info give details about the first two. (Instead of changing bytes,) This suggests a sys.string_info that contains information about the default string representation --including whether the internal encoding is UCS2 or UCS4 or something else. That should at least make it possible to give better diagnostic messages. -jJ From zooko at zooko.com Sat Nov 7 22:36:40 2009 From: zooko at zooko.com (Zooko Wilcox-O'Hearn) Date: Sat, 7 Nov 2009 13:36:40 -0800 Subject: [Python-ideas] UCS2 vs UCS4 ABIs In-Reply-To: References: Message-ID: <08EF3DBA-5812-47FF-BFA9-CDE7BBA7A900@zooko.com> Please see also this thread: http://www.mail-archive.com/python-dev at python.org/msg42272.html This is a complementary proposal: that the Python devs should encourage the Linux distributors to converge on a common UCS2/4 choice. If the ABI improvement that you suggest is not adopted, then my proposal will help users. If the ABI improvement that you suggest is adopted, then my proposal will still help users. Likewise with the proposal to include the UCS2/4 configuration in the platform description on Linux: http://bugs.python.org/setuptools/ issue78 . If that proposal is not implemented, then my proposal will help users. If setuptools issue78 is implemented, then my proposal will still help users. Regards, Zooko From ncoghlan at gmail.com Sun Nov 8 05:24:36 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 08 Nov 2009 14:24:36 +1000 Subject: [Python-ideas] Refactoring the import system to be object-oriented at the top level Message-ID: <4AF64804.3010802@gmail.com> (Disclaimer: this is complicated Py-in-the-sky stuff, and I'm handwaving away a lot of major problems with the concept, not least of which is the sheer amount of work involved. 
I just wanted to get the idea published somewhere while I was thinking about it) I'm in the process of implementing a runpy.run_path function for 2.7/3.2 to allow Python code to use the zipfile and directory execution feature provided by the CPython command line in 2.6/3.1. It turns out the global state used for the import system is causing some major pain in the implementation. It's solvable, but it will probably involve a couple of rather ugly hacks and the result sure as hell isn't going to be thread-safe. Anyway, the gist of the idea in the subject line is to take all of the PEP 302 data stores and make them attributes of an ImportEngine class. This would affect at least: sys.modules sys.path sys.path_hooks sys.path_importer_cache sys.meta_path sys.dont_write_bytecode The underlying import machinery would migrate to instance methods of the new class. The standard import engine instance would be stored in a new sys module attribute (e.g. 'sys.import_engine'). For backwards compatibility, the existing sys attributes would remain as references to the relevant instance attributes of the standard engine. Modules would get a new special attribute (e.g. '__import_engine__') identifying the import engine that was used to import that module. __import__ would be modified to take the new special attribute into account. The main immediate benefit from my point of view would be to allow runpy to create *copies* of the standard import engine so that runpy.run_module and runpy.run_path could go do their thing without impacting the rest of the interpreter. At the moment that really isn't feasible, hence the lack of thread safety in that module. I suspect such a change would greatly simplify experimentation with Python security models as well: restricted code could be given a restricted import engine rather than the restrictions having to be baked in to the standard import engine. 
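To make the shape of the idea concrete, here is a minimal sketch (hypothetical throughout: ImportEngine does not exist, the method name copy_of_sys is invented, and only the data stores are modelled, none of the actual import machinery):

```python
import sys


class ImportEngine:
    """Hypothetical holder for the import state that currently lives in sys.

    Only the data stores are modelled; no import machinery is implemented.
    """

    def __init__(self, modules, path, path_hooks, path_importer_cache,
                 meta_path, dont_write_bytecode):
        self.modules = modules
        self.path = path
        self.path_hooks = path_hooks
        self.path_importer_cache = path_importer_cache
        self.meta_path = meta_path
        self.dont_write_bytecode = dont_write_bytecode

    @classmethod
    def copy_of_sys(cls):
        """Snapshot the interpreter-wide state into an independent engine."""
        return cls(
            modules=dict(sys.modules),
            path=list(sys.path),
            path_hooks=list(sys.path_hooks),
            path_importer_cache=dict(sys.path_importer_cache),
            meta_path=list(sys.meta_path),
            dont_write_bytecode=sys.dont_write_bytecode,
        )


engine = ImportEngine.copy_of_sys()
# Changes to the copy are invisible to the rest of the interpreter.
engine.path.insert(0, "/hypothetical/script/dir")
```

The copies are shallow: the containers are independent of the ones in sys, but the module objects inside engine.modules are still shared with sys.modules.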
From an OO design point of view, it's a classic migration of global state and multiple functions to manipulate that state into a single global instance of a new class.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From grosser.meister.morti at gmx.net Sun Nov 8 11:19:00 2009
From: grosser.meister.morti at gmx.net (Mathias Panzenböck)
Date: Sun, 08 Nov 2009 11:19:00 +0100
Subject: [Python-ideas] Refactoring the import system to be object-oriented at the top level
In-Reply-To: <4AF64804.3010802@gmail.com>
References: <4AF64804.3010802@gmail.com>
Message-ID: <4AF69B14.9090006@gmx.net>

Sounds like the ClassLoader in Java. I think it would be a good idea.

On 11/08/2009 05:24 AM, Nick Coghlan wrote:
> ...
> Anyway, the gist of the idea in the subject line is to take all of the
> PEP 302 data stores and make them attributes of an ImportEngine class.
> This would affect at least:
>
> sys.modules
> sys.path
> sys.path_hooks
> sys.path_importer_cache
> sys.meta_path
> sys.dont_write_bytecode
>
> The underlying import machinery would migrate to instance methods of the
> new class.
>
> The standard import engine instance would be stored in a new sys module
> attribute (e.g. 'sys.import_engine'). For backwards compatibility, the
> existing sys attributes would remain as references to the relevant
> instance attributes of the standard engine.
>
> Modules would get a new special attribute (e.g. '__import_engine__')
> identifying the import engine that was used to import that module.
> __import__ would be modified to take the new special attribute into account.
> ...

From fetchinson at googlemail.com Sun Nov 8 16:36:51 2009
From: fetchinson at googlemail.com (Daniel Fetchinson)
Date: Sun, 8 Nov 2009 16:36:51 +0100
Subject: [Python-ideas] [OT] Python history
Message-ID:

Are Guido's articles on python history at python-history.blogspot.com over?
I guess there are lots more anecdotes and stories still but the blog hasn't been updated for a long time so I was wondering whether I should expect anything down the pipe?

Cheers,
Daniel

--
Psss, psss, put it down! - http://www.cafepress.com/putitdown

From brett at python.org Sun Nov 8 21:54:06 2009
From: brett at python.org (Brett Cannon)
Date: Sun, 8 Nov 2009 12:54:06 -0800
Subject: [Python-ideas] Refactoring the import system to be object-oriented at the top level
In-Reply-To: <4AF64804.3010802@gmail.com>
References: <4AF64804.3010802@gmail.com>
Message-ID:

On Sat, Nov 7, 2009 at 20:24, Nick Coghlan wrote:
> (Disclaimer: this is complicated Py-in-the-sky stuff, and I'm handwaving
> away a lot of major problems with the concept, not least of which is the
> sheer amount of work involved. I just wanted to get the idea published
> somewhere while I was thinking about it)
>
> I'm in the process of implementing a runpy.run_path function for 2.7/3.2
> to allow Python code to use the zipfile and directory execution feature
> provided by the CPython command line in 2.6/3.1. It turns out the global
> state used for the import system is causing some major pain in the
> implementation. It's solvable, but it will probably involve a couple of
> rather ugly hacks and the result sure as hell isn't going to be thread-safe.
>
> Anyway, the gist of the idea in the subject line is to take all of the
> PEP 302 data stores and make them attributes of an ImportEngine class.
> This would affect at least:
>
> sys.modules
> sys.path
> sys.path_hooks
> sys.path_importer_cache
> sys.meta_path
> sys.dont_write_bytecode
>
> The underlying import machinery would migrate to instance methods of the
> new class.
>

Do you really mean methods or just instance attributes? I personally don't care, but it does require more of an API design otherwise.

> The standard import engine instance would be stored in a new sys module
> attribute (e.g. 'sys.import_engine').
For backwards compatibility, the
> existing sys attributes would remain as references to the relevant
> instance attributes of the standard engine.

How would that work? Because they are module attributes there is no way to use a property to have them return what the current sys.import_engine uses.

>
> Modules would get a new special attribute (e.g. '__import_engine__')
> identifying the import engine that was used to import that module.
> __import__ would be modified to take the new special attribute into account.
>

Take into account how? As in when importing a package to always use the import engine used for the parent module in the package?

> The main immediate benefit from my point of view would be to allow runpy
> to create *copies* of the standard import engine so that
> runpy.run_module and runpy.run_path could go do their thing without
> impacting the rest of the interpreter. At the moment that really isn't
> feasible, hence the lack of thread safety in that module.
>
> I suspect such a change would greatly simplify experimentation with
> Python security models as well: restricted code could be given a
> restricted import engine rather than the restrictions having to be baked
> in to the standard import engine.
>

Huh, I wonder what made you think about that example? =)

> From an OO design point of view, it's a classic migration of global
> state and multiple functions to manipulate that state into a single
> global instance of a new class.

If anything it makes it easier to discover everything that affects importing instead of having to crawl through the sys docs to find every attribute that happens to mention the word "import".
-Brett

From ncoghlan at gmail.com Sun Nov 8 21:56:01 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 09 Nov 2009 06:56:01 +1000
Subject: [Python-ideas] Refactoring the import system to be object-oriented at the top level
In-Reply-To: <4AF69B14.9090006@gmx.net>
References: <4AF64804.3010802@gmail.com> <4AF69B14.9090006@gmx.net>
Message-ID: <4AF73061.7040701@gmail.com>

Mathias Panzenböck wrote:
> Sounds like the ClassLoader in Java. I think it would be a good idea.

On further reflection, it occurred to me that it should be possible to do something along these lines with importlib, *without* necessarily replacing the builtin import machinery (i.e. having a special instance of the class that mapped its instance attributes to the appropriate sys module attributes).

Unfortunately, I doubt I'll have the cycles any time soon to pursue the idea further :P

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com Sun Nov 8 22:20:50 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 09 Nov 2009 07:20:50 +1000
Subject: [Python-ideas] Refactoring the import system to be object-oriented at the top level
In-Reply-To:
References: <4AF64804.3010802@gmail.com>
Message-ID: <4AF73632.9000405@gmail.com>

Brett Cannon wrote:
>> The underlying import machinery would migrate to instance methods of the
>> new class.
>>
>
> Do you really mean methods or just instance attributes? I personally
> don't care personally, but it does require more of an API design
> otherwise.

I did mean methods, but I also realise how much work would be involved in actually following up on this idea. (As the saying goes, real innovation is 1% inspiration, 99% perspiration!)

If you don't move the machinery itself into instance methods then you just end up having to pass the storage object around to various functions. Might as well make that parameter 'self' and use methods.
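The two styles compared above might look like this (a toy sketch: the names lookup_cached and cache_module are invented for illustration, and only a module cache is modelled, not real import logic):

```python
# Style 1: free functions that must be handed the storage object every time.
def lookup_cached(state, name):
    return state["modules"].get(name)


def cache_module(state, name, module):
    state["modules"][name] = module


# Style 2: the same operations as methods; the storage object is 'self'.
class ImportEngine:
    def __init__(self):
        self.modules = {}

    def lookup_cached(self, name):
        return self.modules.get(name)

    def cache_module(self, name, module):
        self.modules[name] = module


state = {"modules": {}}
cache_module(state, "spam", "spam-module-object")

engine = ImportEngine()
engine.cache_module("spam", "spam-module-object")
```

Both versions keep the same data; the second simply stops threading the storage object through every call.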
>> The standard import engine instance would be stored in a new sys module >> attribute (e.g. 'sys.import_engine'). For backwards compatibility, the >> existing sys attributes would remain as references to the relevant >> instance attributes of the standard engine. > > How would that work? Because they are module attributes there is no > way to use a property to have them return what the current > sys.import_engine uses. Yes, I eventually realised it would be better to turn the dependency around the other way (i.e. have an engine subclass that used properties to refer to the sys module attributes) >> Modules would get a new special attribute (e.g. '__import_engine__') >> identifying the import engine that was used to import that module. >> __import__ would be modified to take the new special attribute into account. >> > > Take into account how? As in when importing a package to always use > the import engine used for the parent module in the package? Yes, that's what I was thinking. That would be necessary to allow operations like the runpy methods to execute without having side effects on the main import machinery the way they do now. Having run_path and run_module functions that were as side-effect free (and hence thread-safe) as exec and execfile would be kind of cool. >> I suspect such a change would greatly simplify experimentation with >> Python security models as well: restricted code could be given a >> restricted import engine rather than the restrictions having to be baked >> in to the standard import engine. >> > > Huh, I wonder made you think about that example? =) Not *just* your efforts over the last few years, although those were definitely a major inspiration :) >> >From an OO design point of view, it's a classic migration of global >> state and multiple functions to manipulate that state into a single >> global instance of a new class. 
> > If anything it makes it easier to discover everything that affects > importing instead of having to crawl through the sys docs to find > every attribute that happens to mention the word "import". Heck, *I* had to stare at dir(sys) for a while to make the list of import-related attributes in my post and I've been working on import related code for years. Even then I almost missed 'dont_write_bytecode' and wouldn't be the least surprised if someone pointed out that I actually had missed something else :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From guido at python.org Sun Nov 8 22:43:06 2009 From: guido at python.org (Guido van Rossum) Date: Sun, 8 Nov 2009 13:43:06 -0800 Subject: [Python-ideas] [OT] Python history In-Reply-To: References: Message-ID: On Sun, Nov 8, 2009 at 7:36 AM, Daniel Fetchinson wrote: > Are Guido's articles on python history at python-history.blogspot.com over? > I guess there are lots more anecdotes and stories still but the blog > hasn't been updated for a long time so was wondering whether I should > expect anything down the pipe? I've got plenty more to post, eventually. "It's just resting." -- --Guido van Rossum (python.org/~guido) From brett at python.org Sun Nov 8 22:55:30 2009 From: brett at python.org (Brett Cannon) Date: Sun, 8 Nov 2009 13:55:30 -0800 Subject: [Python-ideas] Refactoring the import system to be object-oriented at the top level In-Reply-To: <4AF73632.9000405@gmail.com> References: <4AF64804.3010802@gmail.com> <4AF73632.9000405@gmail.com> Message-ID: On Sun, Nov 8, 2009 at 13:20, Nick Coghlan wrote: > Brett Cannon wrote: >>> The underlying import machinery would migrate to instance methods of the >>> new class. >>> >> >> Do you really mean methods or just instance attributes? I personally >> don't care personally, but it does require more of an API design >> otherwise. 
> > I did mean methods, but I also realise how much work would be involved > in actually following up on this idea. (As the saying goes, real > innovation is 1% inspiration, 99% perspiration!) > > If you don't move the machinery itself into instance methods then you > just end up having to pass the storage object around to various > functions. Might as well make that parameter 'self' and use methods. > I don't quite follow. What difference does it make if they are instance attributes compared to methods? The data still needs to be stored somewhere that is unique per instance to get the semantics you want. The other thing you could do with this is provide import_module() on the object so it is a fully self-contained object that can do an entire import on its own without having to touch anything else (heck, you could even go so far as to have their own module cache, but that might be too far as all loaders currently are expected to work with sys.modules). >>> The standard import engine instance would be stored in a new sys module >>> attribute (e.g. 'sys.import_engine'). For backwards compatibility, the >>> existing sys attributes would remain as references to the relevant >>> instance attributes of the standard engine. >> >> How would that work? Because they are module attributes there is no >> way to use a property to have them return what the current >> sys.import_engine uses. > > Yes, I eventually realised it would be better to turn the dependency > around the other way (i.e. have an engine subclass that used properties > to refer to the sys module attributes) Yeah, you could have it default to the attributes on the sys module if no instance attributes are set. > >>> Modules would get a new special attribute (e.g. '__import_engine__') >>> identifying the import engine that was used to import that module. >>> __import__ would be modified to take the new special attribute into account. >>> >> >> Take into account how? 
As in when importing a package to always use >> the import engine used for the parent module in the package? > > Yes, that's what I was thinking. That would be necessary to allow > operations like the runpy methods to execute without having side effects > on the main import machinery the way they do now. > > Having run_path and run_module functions that were as side-effect free > (and hence thread-safe) as exec and execfile would be kind of cool. > That would be nice to have. >>> I suspect such a change would greatly simplify experimentation with >>> Python security models as well: restricted code could be given a >>> restricted import engine rather than the restrictions having to be baked >>> in to the standard import engine. >>> >> >> Huh, I wonder made you think about that example? =) > > Not *just* your efforts over the last few years, although those were > definitely a major inspiration :) > >>> >From an OO design point of view, it's a classic migration of global >>> state and multiple functions to manipulate that state into a single >>> global instance of a new class. >> >> If anything it makes it easier to discover everything that affects >> importing instead of having to crawl through the sys docs to find >> every attribute that happens to mention the word "import". > > Heck, *I* had to stare at dir(sys) for a while to make the list of > import-related attributes in my post and I've been working on import > related code for years. Even then I almost missed 'dont_write_bytecode' > and wouldn't be the least surprised if someone pointed out that I > actually had missed something else :) Yeah, there are a lot of them. From fetchinson at googlemail.com Sun Nov 8 23:41:18 2009 From: fetchinson at googlemail.com (Daniel Fetchinson) Date: Sun, 8 Nov 2009 23:41:18 +0100 Subject: [Python-ideas] [OT] Python history In-Reply-To: References: Message-ID: >> Are Guido's articles on python history at python-history.blogspot.com >> over? 
>> I guess there are lots more anecdotes and stories still but the blog >> hasn't been updated for a long time so was wondering whether I should >> expect anything down the pipe? > > I've got plenty more to post, eventually. > > "It's just resting." Great! Looking forward to them! Cheers, Daniel -- Psss, psss, put it down! - http://www.cafepress.com/putitdown From ncoghlan at gmail.com Mon Nov 9 13:36:18 2009 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 09 Nov 2009 22:36:18 +1000 Subject: [Python-ideas] Refactoring the import system to be object-oriented at the top level In-Reply-To: References: <4AF64804.3010802@gmail.com> <4AF73632.9000405@gmail.com> Message-ID: <4AF80CC2.4070204@gmail.com> Brett Cannon wrote: > On Sun, Nov 8, 2009 at 13:20, Nick Coghlan wrote: >> Brett Cannon wrote: >>>> The underlying import machinery would migrate to instance methods of the >>>> new class. >>>> >>> Do you really mean methods or just instance attributes? I personally >>> don't care personally, but it does require more of an API design >>> otherwise. >> I did mean methods, but I also realise how much work would be involved >> in actually following up on this idea. (As the saying goes, real >> innovation is 1% inspiration, 99% perspiration!) >> >> If you don't move the machinery itself into instance methods then you >> just end up having to pass the storage object around to various >> functions. Might as well make that parameter 'self' and use methods. >> > > I don't quite follow. What difference does it make if they are > instance attributes compared to methods? The data still needs to be > stored somewhere that is unique per instance to get the semantics you > want. 
>
> The other thing you could do with this is provide import_module() on
> the object so it is a fully self-contained object that can do an
> entire import on its own without having to touch anything else (heck,
> you could even go so far as to have their own module cache, but that
> might be too far as all loaders currently are expected to work with
> sys.modules).

Slight miscommunication there: by "underlying import machinery" I meant the functions that currently do the heavy lifting for imports (i.e. most of the code in import.c), along with their equivalents in importlib. The sys attribute equivalents would indeed just be normal attributes on the as-yet-hypothetical ImportEngine instances.

I suspect you're right that there would be problems with the PEP 302 design currently encouraging loader and importer implementations to work with the sys attributes directly - backwards compatibility on that front is one of the big issues I was handwaving away in the original post.

A PEP 3115 inspired thought is it may make sense to allow loaders to split load_module() into two distinct steps (prepare_module() and exec_module()) and leave the sys.modules manipulation to the import engine.

That is (using the sample load_module() implementation from PEP 302), something along the lines of:

    def prepare_module(self, fullname, mod=None):
        if mod is None:
            mod = imp.new_module(fullname)
        mod.__file__ = "<%s>" % self.__class__.__name__
        mod.__loader__ = self
        if self._is_package(fullname):
            mod.__path__ = []
        return mod

    def exec_module(self, fullname, mod):
        exec self._get_code(fullname) in mod.__dict__

The key difference here is that module caching becomes entirely the responsibility of the import engine rather than relying on each loader to do it correctly. It would also give the import engine a chance to monkey with the module globals before the module code is executed (e.g.
ensuring __package__ is set, setting a new __import_engine__ variable, overriding __import__ to play nicely with the current import engine) If a non-global import system adopted such an alternate loader protocol it could easily avoid invoking standard loaders that directly manipulated the sys attributes. >>>> The standard import engine instance would be stored in a new sys module >>>> attribute (e.g. 'sys.import_engine'). For backwards compatibility, the >>>> existing sys attributes would remain as references to the relevant >>>> instance attributes of the standard engine. >>> How would that work? Because they are module attributes there is no >>> way to use a property to have them return what the current >>> sys.import_engine uses. >> Yes, I eventually realised it would be better to turn the dependency >> around the other way (i.e. have an engine subclass that used properties >> to refer to the sys module attributes) > > Yeah, you could have it default to the attributes on the sys module if > no instance attributes are set. I was actually thinking of a SysImportEngine subclass that turned them all into properties that referenced the appropriate objects in sys. I'm starting to convince myself that I should *find* the time to experiment with this in the sandbox... then again, I wouldn't be entirely surprised if Guido deemed all this outright abuse of the import system :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From brett at python.org Mon Nov 9 19:08:07 2009 From: brett at python.org (Brett Cannon) Date: Mon, 9 Nov 2009 10:08:07 -0800 Subject: [Python-ideas] Fwd: [Python-Dev] PEP 3003 - Python Language Moratorium In-Reply-To: References: <200911081115.01803.steve@pearwood.info> <200911091006.26433.steve@pearwood.info> Message-ID: FYI to everyone on this list. 

---------- Forwarded message ----------
From: Guido van Rossum
Date: Mon, Nov 9, 2009 at 09:56
Subject: Re: [Python-Dev] PEP 3003 - Python Language Moratorium
To: Brett Cannon
Cc: python-dev at python.org

Thanks Brett. I've moved the moratorium PEP to Status: Accepted. I've
added the words about inclusion of 3.2 and exclusion of 3.3 (which were
eaten by a svn conflict when I previously tried to add them) and added
a section to the end stating that an extension will require another PEP.

--Guido

[snip - non-critical email Guido was replying to]

From tjreedy at udel.edu Tue Nov 10 03:15:40 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 09 Nov 2009 21:15:40 -0500
Subject: [Python-ideas] Add encoding attribute to bytes
In-Reply-To:
References:
Message-ID:

Jim Jewett wrote:
> On Thu, Nov 5, 2009 at 8:15 PM, Terry Reedy wrote:
>> A Python interpreter has one encoding for floats, ints, and strings.
>> sys.float_info and sys.int_info give details about the first two.
>
> (Instead of changing bytes,)
>
> This suggests a sys.string_info that contains information about the
> default string representation --including whether the internal
> encoding is UCS2 or UCS4 or something else.
>
> That should at least make it possible to give better diagnostic messages.

What to do about interpreter-wide unicode string info, if anything, is
related but separate from what to do about instance-specific bytes info.

From tjreedy at udel.edu Tue Nov 10 03:22:14 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 09 Nov 2009 21:22:14 -0500
Subject: [Python-ideas] Add encoding attribute to bytes
In-Reply-To: <4AF387B7.1060808@mrabarnett.plus.com>
References: <4AF387B7.1060808@mrabarnett.plus.com>
Message-ID:

MRAB wrote:
> Terry Reedy wrote:
>> A Python interpreter has one encoding for floats, ints, and strings.
>> sys.float_info and sys.int_info give details about the first two,
>> although they are mostly invisible to user code.
(I presume they are
>> attached to sys rather than float and int precisely because of this.) A
>> couple of recent posts have discussed making the unicode encoding
>> (UCS2 v 4) both less visible and more discoverable to extensions.
>>
>> Bytes are nearly always an encoding of *something*, but the particular
>> encoding used is instance-specific. As Guido has said, the programmer
>> must keep track. But how? In an OO language, one obvious way is as an
>> attribute of the instance. That would be carried with the instance and
>> make it self-identifying.
>>
>> What I do not know is whether it is feasible to give an immutable
>> instance of a builtin class a mutable attribute slot. If it were, I
>> think this could make 3.x bytes easier and more transparent to use.
>> When a string is encoded to bytes, the attribute would be set. If it
>> were then pickled, the attribute would be stored with it and restored
>> with it, and less easily lost. If it were then decoded, the attribute
>> would be used. If it were sent to the net, the attribute would be used
>> to set the appropriate headers. The reverse process would apply from
>> net to bytes to (unicode) text.
>>
>> Bytes representing other types of data, such as media, could also be
>> tagged, not just those representing text.
>>
>> This would be a proposal for 3.3 at the earliest. It would involve
>> revising stdlib modules, as appropriate, to use the new info.
>>
> You said "give an immutable instance of a builtin class a mutable
> attribute slot". Why would the slot be mutable?

As Stephen said, in case the info is initially missing or determined to
be erroneous.

> Surely if the attribute
> said that the bytes represented a certain type of data then you
> shouldn't be able to change it. ("The attribute says that the bytes are
> UTF-8, but I'm going to change it so that it says they are ISO-8859-1.")
> I think that the attribute should be immutable.

Encoding set by unicode.encode or a wrapper thereof is definitionally
correct and should not be changed. Encoding inferred by mimetype header
or file extension might be erroneous. I had in mind that the difference
might be indicated somehow: 'utf8' versus 'utf8?', for instance.

Terry Jan Reedy

From tjreedy at udel.edu Tue Nov 10 03:44:11 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 09 Nov 2009 21:44:11 -0500
Subject: [Python-ideas] Add encoding attribute to bytes
In-Reply-To:
References:
Message-ID:

Georg Brandl wrote:

>> What I do not know is whether it is feasible to give an immutable
>> instance of a builtin class a mutable attribute slot.
>
> As soon as you can mutate an instance, it is not an immutable type anymore.
> Calling it "immutable" despite that will cause trouble. (The same bytes
> instance could be used somewhere else transparently, e.g. as a function
> default argument, or cached as a constant local.)

OK, scratch that implementation of my idea.

>
> As for the usefulness, I often have to work with proprietary communication
> protocols between computer and devices, and there the bytes have no encoding
> whatsoever

Random bits? It seems to me that protocol means some sort of encoding,
formatting, or structuring, some sort of agreed on interpretation, even
if private.

> (though I agree that most bytes do have a meaningful encoding).
> However, a class as fundamental as "bytes" should not be burdened with an
> attribute that may not even apply -- it's easy to make a custom class to
> represent a (bytes, encoding) pair.

The fundamental problem I am interested in is the separation of raw data
from how to use it info. Text encoding of bytes is only one instance,
though the most common that pops up on Python list.

I had also thought of something like (incomplete):

    class Textbytes:
        def __init__(self, text, code):
            if type(text) is str:
                text = text.encode(code)
            if type(text) is bytes:
                self.text = text
                self.code = code
            else:
                raise ValueError()
        def __str__(self):
            return self.text.decode(self.code)

    b = Textbytes('abc', 'utf8')
    print(b)

One problem is that it is a lot bulkier than a raw bytes object. Leaving
that aside, a custom class is just that: custom. Stdlib modules will
neither accept nor produce such a wrapper rather than bytes.

My underlying idea is that maybe the standard Python distribution should
promote encapsulation of encoding info with raw bytes to make bug-free
usage easier. Adding an attribute was one implementation idea. Adding a
standardized wrapper class (at least in a module) would be another.

Terry Jan Reedy

From python at mrabarnett.plus.com Tue Nov 10 03:54:45 2009
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 10 Nov 2009 02:54:45 +0000
Subject: [Python-ideas] Add encoding attribute to bytes
In-Reply-To:
References: <4AF387B7.1060808@mrabarnett.plus.com>
Message-ID: <4AF8D5F5.6010704@mrabarnett.plus.com>

Terry Reedy wrote:
> MRAB wrote:
>> Terry Reedy wrote:
>>> A Python interpreter has one encoding for floats, ints, and strings.
>>> sys.float_info and sys.int_info give details about the first two,
>>> although they are mostly invisible to user code. (I presume they are
>>> attached to sys rather than float and int precisely because of this.) A
>>> couple of recent posts have discussed making the unicode encoding
>>> (UCS2 v 4) both less visible and more discoverable to extensions.
>>>
>>> Bytes are nearly always an encoding of *something*, but the
>>> particular encoding used is instance-specific. As Guido has said, the
>>> programmer must keep track. But how? In an OO language, one obvious
>>> way is as an attribute of the instance. That would be carried with
>>> the instance and make it self-identifying.
>>>
>>> What I do not know is whether it is feasible to give an immutable
>>> instance of a builtin class a mutable attribute slot. If it were, I
>>> think this could make 3.x bytes easier and more transparent to use.
>>> When a string is encoded to bytes, the attribute would be set. If it
>>> were then pickled, the attribute would be stored with it and restored
>>> with it, and less easily lost. If it were then decoded, the attribute
>>> would be used. If it were sent to the net, the attribute would be
>>> used to set the appropriate headers. The reverse process would apply
>>> from net to bytes to (unicode) text.
>>>
>>> Bytes representing other types of data, such as media, could also be
>>> tagged, not just those representing text.
>>>
>>> This would be a proposal for 3.3 at the earliest. It would involve
>>> revising stdlib modules, as appropriate, to use the new info.
>>>
>> You said "give an immutable instance of a builtin class a mutable
>> attribute slot". Why would the slot be mutable?
>
> As Stephen said, in case the info is initially missing or determined to
> be erroneous.
>
>> Surely if the attribute
>> said that the bytes represented a certain type of data then you
>> shouldn't be able to change it. ("The attribute says that the bytes are
>> UTF-8, but I'm going to change it so that it says they are ISO-8859-1.")
>> I think that the attribute should be immutable.
>
> Encoding set by unicode.encode or a wrapper thereof is definitionally
> correct and should not be changed. Encoding inferred by mimetype header
> or file extension might be erroneous. I had in mind that the difference
> might be indicated somehow: 'utf8' versus 'utf8?', for instance.
>
I was thinking more along the lines of saying that the attribute
(default None) is specified when the bytes object is created.
You wouldn't be able to change it, but you could create a new bytes
object with a different attribute:

    new_bytes = bytes(old_bytes, "utf8")

The actual bytes themselves wouldn't need to be copied; they could be
safely shared because 'bytes' objects are immutable.

There then comes the question of whether new_bytes == old_bytes.

From stephen at xemacs.org Tue Nov 10 05:30:22 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 10 Nov 2009 13:30:22 +0900
Subject: [Python-ideas] Add encoding attribute to bytes
In-Reply-To:
References:
Message-ID: <87vdhjdmsh.fsf@uwakimon.sk.tsukuba.ac.jp>

Terry Reedy writes:

> The fundamental problem I am interested in is the separation of raw data
> from how to use it info.

But this is ambiguous. Take reStructuredText. It *is* text/plain. But
it also *is* application/x-structuredtext. Not to forget
application/octet-stream. An MUA will treat it as the first, docutils
as the second, and gzip as the third.

> My underlying idea is that maybe the standard Python distribution
> should promote encapsulation of encoding info with raw bytes to
> make bug-free usage easier.

I think you will find that every use case makes different demands on
this feature, and that it typically interacts with higher-level needs
of the application. There's a reason that ASN.1 is insanely complex
and only applications that really need it ever use it. This feature
will either be too simple to serve most practical needs, or too complex
to serve most practical programmers.

And "bug-free" usage is hopeless. Much, perhaps the vast majority, of
the coding information will be automatically derived from sources you
deprecate as "heuristic", like MIME Content-Type headers. It will get
attached to the bytes as an attribute, and after that you can't know
how reliable it is.

If you have a practical example of such a simple class (bytes +
encoding attribute) that serves as a base for more complex
applications, I'd really like to see them.
But until there are real use cases on the table, I have to say I can't
see the proposed facility as being particularly useful to the email
package, for example.

From g.brandl at gmx.net Tue Nov 10 09:20:15 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 10 Nov 2009 08:20:15 +0000
Subject: [Python-ideas] Add encoding attribute to bytes
In-Reply-To:
References:
Message-ID:

Terry Reedy wrote:
> Georg Brandl wrote:
>
>>> What I do not know is whether it is feasible to give an immutable
>>> instance of a builtin class a mutable attribute slot.
>>
>> As soon as you can mutate an instance, it is not an immutable type anymore.
>> Calling it "immutable" despite that will cause trouble. (The same bytes
>> instance could be used somewhere else transparently, e.g. as a function
>> default argument, or cached as a constant local.)
>
> OK, scratch that implementation of my idea.
>>
>> As for the usefulness, I often have to work with proprietary communication
>> protocols between computer and devices, and there the bytes have no encoding
>> whatsoever
>
> Random bits? It seems to me that protocol means some sort of encoding,
> formatting, or structuring, some sort of agreed on interpretation, even
> if private.

Sure, but nothing you could map entirely onto a string of Unicode
characters.

Georg

From ncoghlan at gmail.com Tue Nov 10 11:41:26 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 10 Nov 2009 20:41:26 +1000
Subject: [Python-ideas] Add encoding attribute to bytes
In-Reply-To:
References:
Message-ID: <4AF94356.6040502@gmail.com>

Terry Reedy wrote:
>> As for the usefulness, I often have to work with proprietary
>> communication
>> protocols between computer and devices, and there the bytes have no
>> encoding
>> whatsoever
>
> Random bits? It seems to me that protocol means some sort of encoding,
> formatting, or structuring, some sort of agreed on interpretation, even
> if private.

This is true, but the encoding scheme *isn't* a property of the binary
data in and of itself. It's metadata about it that guides the
application as to how the stream should be interpreted.

For a lot of the things I've done in the past, I haven't cared at all
about the encoding of binary data - I've just been schlepping bits from
point A to point B and back without caring what they actually *meant*.
Other times I didn't have to guess or pass any metadata around because
the comms port was hardwired to a particular device that only knew one
way of communicating - the definition of the protocol was implicit in
the implementation of the interface software.

In fact, one of the key features typically desired in a communications
protocol is for it to be content neutral: you push binary data in one
end and get the same binary data out of the other end. Peer applications
using the channel to communicate with each other don't need to care what
the channel is doing with the data, but equally importantly, the
software implementing the comms channel doesn't need to know how to
interpret the bits it is transporting*.

For other applications, the Unicode encoding might be important to know.
Some will care more about the MIME type, or use some other defined
binary encoding (what is the Unicode encoding of an sqlite or bsddb
database file?). Other applications may be interested in a proprietary
binary format that is formally defined solely by the code that knows how
to read and write it.

Can bytes be used to store encoded Unicode data? Sure they can. But they
can be used for a whole host of other things as well, so burdening them
with an attribute that is occasionally helpful, but more often dead
weight or even outright misleading would be a mistake.

Cheers,
Nick.

* Sometimes a bit more coupling makes sense when there are engineering
advantages to be had, but this is usually an application specific thing
(e.g.
IP has a protocol field that identifies different application layer
protocols such as TCP, UDP and ESP which have different network
performance expectations. This allows IP network routers to apply
different rules without having to peek inside the payload of each IP
packet)

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From tjreedy at udel.edu Tue Nov 10 21:10:13 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 10 Nov 2009 15:10:13 -0500
Subject: [Python-ideas] Add encoding attribute to bytes
In-Reply-To:
References:
Message-ID:

Georg Brandl wrote:
> Terry Reedy wrote:

>> Random bits? It seems to me that protocol means some sort of encoding,
>> formatting, or structuring, some sort of agreed on interpretation, even
>> if private.
>
> Sure, but nothing you could map entirely onto a string of Unicode characters.

My idea is not limited to unicode encodings. But I see that one
field/attribute can be either too many or too few, and hence not a
universal solution.

From tjreedy at udel.edu Tue Nov 10 21:12:50 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 10 Nov 2009 15:12:50 -0500
Subject: [Python-ideas] Add encoding attribute to bytes
In-Reply-To: <4AF94356.6040502@gmail.com>
References: <4AF94356.6040502@gmail.com>
Message-ID:

Nick Coghlan wrote:
> Terry Reedy wrote:
>>> As for the usefulness, I often have to work with proprietary
>>> communication
>>> protocols between computer and devices, and there the bytes have no
>>> encoding
>>> whatsoever
>> Random bits? It seems to me that protocol means some sort of encoding,
>> formatting, or structuring, some sort of agreed on interpretation, even
>> if private.
>
> This is true, but the encoding scheme *isn't* a property of the binary
> data in and of itself. It's metadata about it that guides the
> application as to how the stream should be interpreted.
>
> For a lot of the things I've done in the past, I haven't cared at all
> about the encoding of binary data - I've just been schlepping bits from
> point A to point B and back without caring what they actually *meant*.
> Other times I didn't have to guess or pass any metadata around because
> the comms port was hardwired to a particular device that only knew one
> way of communicating - the definition of the protocol was implicit in
> the implementation of the interface software.
>
> In fact, one of the key features typically desired in a communications
> protocol is for it to be content neutral: you push binary data in one
> end and get the same binary data out of the other end. Peer applications
> using the channel to communicate with each other don't need to care what
> the channel is doing with the data, but equally importantly, the
> software implementing the comms channel doesn't need to know how to
> interpret the bits it is transporting*.
>
> For other applications, the Unicode encoding might be important to know.
> Some will care more about the MIME type, or use some other defined
> binary encoding (what is the Unicode encoding of an sqlite or bsddb
> database file?). Other applications may be interested in a proprietary
> binary format that is formally defined solely by the code that knows how
> to read and write it.
>
> Can bytes be used to store encoded Unicode data? Sure they can. But they
> can be used for a whole host of other things as well, so burdening them
> with an attribute that is occasionally helpful, but more often dead weight
> or even outright misleading would be a mistake.
>
> Cheers,
> Nick.
>
> * Sometimes a bit more coupling makes sense when there are engineering
> advantages to be had, but this is usually an application specific thing
> (e.g.
IP has a protocol field that identifies different application
> layer protocols such as TCP, UDP and ESP which have different network
> performance expectations. This allows IP network routers to apply
> different rules without having to peek inside the payload of each IP packet)

Your experience has been different from mine. Thanks for the exposition.
I can see why you prefer metadata to either be in the stream itself or
as part of a wrapper object.

Terry Jan Reedy

From ncoghlan at gmail.com Tue Nov 10 22:26:44 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 11 Nov 2009 07:26:44 +1000
Subject: [Python-ideas] Add encoding attribute to bytes
In-Reply-To:
References: <4AF94356.6040502@gmail.com>
Message-ID: <4AF9DA94.6000502@gmail.com>

Terry Reedy wrote:
> Your experience has been different from mine. Thanks for the exposition.
> I can see why you prefer metadata to either be in the stream itself or
> as part of a wrapper object.

One of the things I've learned on python-list/-dev/-ideas is that the
*kind* of software one writes regularly makes a big difference to what
seems like a good idea. I tend to write fairly low level hardware
control code, so that's the way I tend to think. Others come from the
financial world or from an academic/scientific background or are
interested in Python for education purposes or in building big
frameworks that try to solve the world (or at least a particular problem
space within it ;).

It says a lot about Python's flexibility as a language that it applies
so well to so many different problem domains, but it can lead to some
interesting discussions when we try to align the interests of all those
different ill-defined groups :)

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From greg.ewing at canterbury.ac.nz Wed Nov 11 06:42:41 2009
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 11 Nov 2009 18:42:41 +1300
Subject: [Python-ideas] Add encoding attribute to bytes
In-Reply-To: <4AF9DA94.6000502@gmail.com>
References: <4AF94356.6040502@gmail.com> <4AF9DA94.6000502@gmail.com>
Message-ID: <4AFA4ED1.4030002@canterbury.ac.nz>

Nick Coghlan wrote:

> It says a lot about Python's flexibility as a language that it applies
> so well to so many different problem domains, but it can lead to some
> interesting discussions when we try to align the interests of all those
> different ill-defined groups :)

Yes, and I think that because of this diversity of requirements, it's
very important to keep the basic building blocks of the language as
simple and focused as possible. The fundamental types should each
concentrate on doing just one thing and doing it well.

Seems to me the bytes type is just right as it is -- basic raw data
that you can use any way you see fit. Anything more specialised should
be built by the user to suit their use case.

--
Greg

From richismyname at gmail.com Wed Nov 11 17:33:30 2009
From: richismyname at gmail.com (Richard Saunders)
Date: Wed, 11 Nov 2009 09:33:30 -0700
Subject: [Python-ideas] Ordered Dictionary Literals
Message-ID:

Hey All,

We were exploring new features of Python 3.0 at our Tucson User's Group
here in Tucson (TuPLE: Tucson Python Language Enthusiasts), in
particular, the OrderedDict. See

http://groups.google.com/group/TuPLEgroup/browse_thread/thread/40af73f8e194a4f8

Has there been any discussion about making a "better" OrderedDict
literal? I did some googling and didn't find anything.

Basically, the thought was there might be a place for a slightly better
literal for OrderedDict in Python 3.0

    od = OrderedDict([('a',1),('b',2)])   # seems clumsy

The two ideas floated were:

    od = ['a':1, 'b':2, 'c':3]   # [ ] imply order, the ':' implies key-value

or

    od = o{'a':1, 'b':2, 'c':3}   # { } imply dict, o implies order

Apologies if this is the wrong place for this discussion. There have
been a lot of opinions flying here at work and at TuPLE which I will be
happy to share if this is the right place. ;)

Gooday,
Richie

From masklinn at masklinn.net Wed Nov 11 17:44:12 2009
From: masklinn at masklinn.net (Masklinn)
Date: Wed, 11 Nov 2009 17:44:12 +0100
Subject: [Python-ideas] Ordered Dictionary Literals
In-Reply-To:
References:
Message-ID:

On 11 Nov 2009, at 17:33 , Richard Saunders wrote:
> Hey All,
>
> We were exploring new features of Python 3.0 at our Tucson User's Group
> here in Tucson (TuPLE: Tucson Python Language Enthusiasts), in particular,
> the OrderedDict. See
>
> http://groups.google.com/group/TuPLEgroup/browse_thread/thread/40af73f8e194a4f8
>
> Has there been any discussion about making a "better" OrderedDict literal? I
> did
> some googling and didn't find anything.
>
> Basically, the thought was there might be a place for a slightly better
> literal for OrderedDict
> in Python 3.0
> od = OrderedDict([('a',1),('b',2)]) # seems clumsy
>
> The two ideas floated were:
> od = ['a':1, 'b':2, 'c':3] # [ ] imply order, the ':' implies key-value
>
> or
>
> od = o{'a':1, 'b':2, 'c':3} # { } imply dict, o implies order
>
> Apologies if this is the wrong place for this discussion. There have been
> a lot of opinions flying here at work and at TuPLE which I will be happy
> to share if this is the right place.
;)

The first one is, I think, pretty smart considering the built-in set
syntax in 3.x is the same as the dict's, except without the colons
({1, 2, 3} is a set, {'a':1, 'b':2, 'c':3} is a dict).

Sadly (for the proposal) since PEP 3003 on the Python Language
Moratorium (http://www.python.org/dev/peps/pep-3003/) was accepted there
can be no change to the language's syntax until the end of the
moratorium, and this would be a syntactic alteration of the language.

Your only option is therefore to stash it until the end of the
moratorium (maybe take the time to try it out/implement it, and submit a
full PEP with a ready-made implementation when the moratorium ends).

Anyway, if you choose to take the time to implement a proof of concept
during the moratorium, I personally prefer the first idea to the second
one.

From tjreedy at udel.edu Wed Nov 11 20:09:44 2009
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 11 Nov 2009 14:09:44 -0500
Subject: [Python-ideas] Ordered Dictionary Literals
In-Reply-To:
References:
Message-ID:

Richard Saunders wrote:
>
> Hey All,
>
> We were exploring new features of Python 3.0 at our Tucson User's Group

You should actually be using 3.1 if you are not.

> The two ideas floated were:
> od = ['a':1, 'b':2, 'c':3] # [ ] imply order, the ':' implies key-value

Interesting idea, but this would mean making ordered dict a fundamental
builtin type that all implementations must include rather than a
somewhat optional module import. I do not think it qualifies, at least
not yet.

>
> or
>
> od = o{'a':1, 'b':2, 'c':3} # { } imply dict, o implies order
>
> Apologies if this is the wrong place for this discussion.

Perfect place for such an idea.

Terry Jan Reedy

From python at mrabarnett.plus.com Wed Nov 11 20:39:37 2009
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 11 Nov 2009 19:39:37 +0000
Subject: [Python-ideas] Ordered Dictionary Literals
In-Reply-To:
References:
Message-ID: <4AFB12F9.1010102@mrabarnett.plus.com>

Terry Reedy wrote:
> Richard Saunders wrote:
>>
>> Hey All,
>>
>> We were exploring new features of Python 3.0 at our Tucson User's Group
>
> You should actually be using 3.1 if you are not.
>
>
>> The two ideas floated were:
>> od = ['a':1, 'b':2, 'c':3] # [ ] imply order, the ':' implies
>> key-value
>
> Interesting idea, but this would mean making ordered dict a fundamental
> builtin type that all implementations must include rather than a
> somewhat optional module import. I do not think it qualifies, at least
> not yet.
>>
>> or
>>
>> od = o{'a':1, 'b':2, 'c':3} # { } imply dict, o implies order
>>
>> Apologies if this is the wrong place for this discussion.
>
> Perfect place for such an idea.
>
This was discussed in mid June. Guido said -100.

From ncoghlan at gmail.com Wed Nov 11 22:13:07 2009
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 12 Nov 2009 07:13:07 +1000
Subject: [Python-ideas] Ordered Dictionary Literals
In-Reply-To:
References:
Message-ID: <4AFB28E3.4080509@gmail.com>

Richard Saunders wrote:
> Has there been any discussion about making a "better" OrderedDict
> literal? I did
> some googling and didn't find anything.

I think we want to see it get some more field testing as part of the
collections module before bringing it into the core of the language is
seriously considered. If it's popular and useful, then doing so is
definitely a possibility though. The history of the set builtin (i.e.
first introduced in a module, then made a builtin, then given literal
syntax) is indicative of the kind of time frames we're talking about
here.

Rather than immediately jumping to a literal though, it might prove to
be more fruitful to explore the use of an ordered dictionary for keyword
arguments. Without that, convenient shortcuts like
"OrderedDict(a=1, b=2, c=3)" would never become possible.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From cmjohnson.mailinglist at gmail.com Thu Nov 12 05:43:13 2009
From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson)
Date: Wed, 11 Nov 2009 18:43:13 -1000
Subject: [Python-ideas] Ordered Dictionary Literals
In-Reply-To: <4AFB12F9.1010102@mrabarnett.plus.com>
References: <4AFB12F9.1010102@mrabarnett.plus.com>
Message-ID: <3bdda690911112043q483d02b6s62c0a888c5d3f12c@mail.gmail.com>

2009/11/11 MRAB:
> This was discussed in mid June. Guido said -100.

Indeed. See
http://mail.python.org/pipermail/python-ideas/2009-June/thread.html#4916
and
http://mail.python.org/pipermail/python-ideas/2009-June/004924.html

From solipsis at pitrou.net Thu Nov 12 17:46:26 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 12 Nov 2009 16:46:26 +0000 (UTC)
Subject: [Python-ideas] Ordered Dictionary Literals
References:
Message-ID:

Richard Saunders writes:
>
> Basically, the thought was there might be a place for a slightly better
> literal for OrderedDict
> in Python 3.0
> od = OrderedDict([('a',1),('b',2)]) # seems clumsy

How about something like:

    od = OrderedDict.from_literal("""
        {'a': 1, 'b': 2} """)

Of course, you need to hook/reimplement a full-blown parser :)

From grosser.meister.morti at gmx.net Thu Nov 12 22:27:40 2009
From: grosser.meister.morti at gmx.net (Mathias Panzenböck)
Date: Thu, 12 Nov 2009 22:27:40 +0100
Subject: [Python-ideas] Ordered Dictionary Literals
In-Reply-To:
References:
Message-ID: <4AFC7DCC.2090401@gmx.net>

On 11/12/2009 05:46 PM, Antoine Pitrou wrote:
> Richard Saunders writes:
>>
>> Basically, the thought was there might be a place for a
slightly better
>> literal for OrderedDict
>> in Python 3.0
>> od = OrderedDict([('a',1),('b',2)]) # seems clumsy
>
> How about something like:
>
> od = OrderedDict.from_literal("""
>     {'a': 1, 'b': 2} """)
>
> Of course, you need to hook/reimplement a full-blown parser :)
>

this would eliminate the [ ]:

    def odict(*items):
        return OrderedDict(items)

    od = odict(('a', 1), ('b', 2))

well, 2 chars isn't much. however, I don't think it's worth the effort.

From arnodel at googlemail.com Fri Nov 13 11:07:26 2009
From: arnodel at googlemail.com (Arnaud Delobelle)
Date: Fri, 13 Nov 2009 10:07:26 +0000
Subject: [Python-ideas] Ordered Dictionary Literals
In-Reply-To: <4AFC7DCC.2090401@gmx.net>
References: <4AFC7DCC.2090401@gmx.net>
Message-ID: <9bfc700a0911130207s2eefe15s2462c5301e7647bb@mail.gmail.com>

2009/11/12 Mathias Panzenböck :
> On 11/12/2009 05:46 PM, Antoine Pitrou wrote:
>> Richard Saunders writes:
>>>
>>> Basically, the thought was there might be a place for a slightly better
>>> literal for OrderedDict
>>> in Python 3.0
>>> od = OrderedDict([('a',1),('b',2)]) # seems clumsy
>>
>> How about something like:
>>
>> od = OrderedDict.from_literal("""
>>     {'a': 1, 'b': 2} """)
>>
>> Of course, you need to hook/reimplement a full-blown parser :)
>>
>
> this would eliminate the [ ]:
> def odict(*items):
>     return OrderedDict(items)
>
> od = odict(('a', 1), ('b', 2))
>
> well, 2 chars isn't much. however, I don't think it's worth the effort.

Or:

    def odict(*items):
        iteritems = iter(items)
        # Zipping one iterator with itself pairs up consecutive
        # arguments: (key, value), (key, value), ...
        return OrderedDict(zip(iteritems, iteritems))

    od = odict('a',1, 'b',2)

Not worth it either :)

--
Arnaud

From solipsis at pitrou.net Tue Nov 24 19:39:03 2009
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 24 Nov 2009 18:39:03 +0000 (UTC)
Subject: [Python-ideas] Remove GIL with CAS instructions?
References: <4ADE2AA9.4030604@molden.no>
Message-ID:

Ok, out of curiosity I gave it a try: replacing INCREF/DECREF with atomic
instructions (*) slows down the interpreter by 20 to 40% depending on the
workload. And keep in mind this is the tip of the iceberg: to remove the GIL
you'd also have to add fine-grained locking in other places (dicts etc.).

Which makes me agree with the commonly expressed opinion that CPython would
probably need to ditch refcounting (at least in the critical paths) if we want
to remove the GIL.

Regards

Antoine.

(*) using gcc's atomic primitives which, I have checked, are inlined as
carefully optimized assembler:
http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html#Atomic-Builtins

From guido at python.org Tue Nov 24 20:10:14 2009
From: guido at python.org (Guido van Rossum)
Date: Tue, 24 Nov 2009 11:10:14 -0800
Subject: [Python-ideas] Remove GIL with CAS instructions?
In-Reply-To:
References: <4ADE2AA9.4030604@molden.no>
Message-ID:

Note that Greg Stein reached this same conclusion (and similar
numbers) over 10 years ago...

On Tue, Nov 24, 2009 at 10:39 AM, Antoine Pitrou wrote:
>
> Ok, out of curiosity I gave it a try: replacing INCREF/DECREF with atomic
> instructions (*) slows down the interpreter by 20 to 40% depending on the
> workload. And keep in mind this is the tip of the iceberg: to remove the GIL
> you'd also have to add fine-grained locking in other places (dicts etc.).
>
> Which makes me agree with the commonly expressed opinion that CPython would
> probably need to ditch refcounting (at least in the critical paths) if we want
> to remove the GIL.
>
> Regards
>
> Antoine.
>
>
> (*) using gcc's atomic primitives which, I have checked, are inlined as
> carefully optimized assembler:
> http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html#Atomic-Builtins
>

--
--Guido van Rossum (python.org/~guido)

From collinw at gmail.com  Tue Nov 24 20:21:05 2009
From: collinw at gmail.com (Collin Winter)
Date: Tue, 24 Nov 2009 11:21:05 -0800
Subject: [Python-ideas] Remove GIL with CAS instructions?
In-Reply-To: 
References: <4ADE2AA9.4030604@molden.no>
Message-ID: <43aa6ff70911241121n188cf23dqcb612ed05c8a366d@mail.gmail.com>

On Tue, Nov 24, 2009 at 11:10 AM, Guido van Rossum wrote:
> Note that Greg Stein reached this same conclusion (and similar
> numbers) over 10 years ago...

It's worth repeating this kind of experiment; the hardware landscape
has changed a lot in 10 years. It's interesting that the results are
the same a decade later.

Collin

> On Tue, Nov 24, 2009 at 10:39 AM, Antoine Pitrou wrote:
>>
>> Ok, out of curiosity I gave it a try: replacing INCREF/DECREF with atomic
>> instructions (*) slows down the interpreter by 20 to 40% depending on the
>> workload. And keep in mind this is the tip of the iceberg: to remove the GIL
>> you'd also have to add fine-grained locking in other places (dicts etc.).
>>
>> Which makes me agree with the commonly expressed opinion that CPython would
>> probably need to ditch refcounting (at least in the critical paths) if we want
>> to remove the GIL.
>>
>> Regards
>>
>> Antoine.
>>
>>
>> (*) using gcc's atomic primitives which, I have checked, are inlined as
>> carefully optimized assembler:
>> http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html#Atomic-Builtins
>>
>
> --
> --Guido van Rossum (python.org/~guido)

From lie.1296 at gmail.com  Thu Nov 26 23:00:27 2009
From: lie.1296 at gmail.com (Lie Ryan)
Date: Fri, 27 Nov 2009 09:00:27 +1100
Subject: [Python-ideas] XOR
In-Reply-To: 
References: <4AE7770D.5070303@molden.no>
Message-ID: 

Alexander Belopolsky wrote:
> On Tue, Oct 27, 2009 at 6:41 PM, Sturla Molden wrote:
>> Why does Python have a bitwise but not a logical xor operator?
>
> .. because it does: !=
>
> >>> True != True
> False
> >>> True != False
> True
> >>> False != False
> False
>
> In 2.x you can even use <> if you like syntactic sugar. :-)
>
> On arbitrary types a xor b is arguably bool(a) != bool(b) rather than
> simple a != b, but it is rare enough not to warrant additional syntax.
>
> I thought I've seen this answered in an FAQ list somewhere.

I've seen this in Java. But the field is different there: with no operator
overloading, != is always equivalent to logical XOR.

I'm +0 on the proposal, since it is very rarely needed.
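
The difference between a != b and bool(a) != bool(b) mentioned in this
thread can be illustrated with a minimal sketch. The helper name
logical_xor is hypothetical; it is not a builtin and nobody in the thread
proposed it under that name.

```python
def logical_xor(a, b):
    """Return True when exactly one of a and b is truthy."""
    return bool(a) != bool(b)


# For genuine bools, != and the bitwise ^ both agree with logical xor.
for a in (False, True):
    for b in (False, True):
        assert (a != b) == (a ^ b) == logical_xor(a, b)

# On arbitrary objects, a plain != compares values rather than truthiness.
print("spam" != "eggs")             # True: the two strings differ
print(logical_xor("spam", "eggs"))  # False: both operands are truthy
print(logical_xor("", [0]))         # True: only the second operand is truthy
```

The helper always returns a bool, unlike `and` and `or`, which return one
of their operands.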