From tjreedy at udel.edu Wed Feb 1 01:29:36 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 31 Jan 2012 19:29:36 -0500 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> Message-ID: On 1/31/2012 1:06 PM, Eric Snow wrote: > +1 for reconsidering the d.[name] / d.(name) / d!name syntax. d.[name] is too much like d[name]. The . that modifies the meaning of 'name' is too far away. d.(name) is like d.name except to me the () means to use the value of name rather than 'name' itself. This is just what you are trying to say. I believe () is used elsewhere with that meaning. I could live with this. d!name has the advantage? of no brackets, but just looks crazy since ! meant 'not' in Python. -- Terry Jan Reedy From ethan at stoneleaf.us Wed Feb 1 01:42:24 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 31 Jan 2012 16:42:24 -0800 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> Message-ID: <4F288A70.5000705@stoneleaf.us> Terry Reedy wrote: > On 1/31/2012 1:06 PM, Eric Snow wrote: > >> +1 for reconsidering the d.[name] / d.(name) / d!name syntax. > > d.[name] is too much like d[name]. The . that modifies the meaning of > 'name' is too far away. > > d.(name) is like d.name except to me the () means to use the value of > name rather than 'name' itself. This is just what you are trying to say. > I believe () is used elsewhere with that meaning. I could live with this. > > d!name has the advantage? of no brackets, but just looks crazy since ! > meant 'not' in Python. I'm not a fan of any of the .[], .(), .{} patterns, nor of .! . What about the colon?
d:name #use the value of name ~Ethan~ From python at mrabarnett.plus.com Wed Feb 1 02:06:18 2012 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 01 Feb 2012 01:06:18 +0000 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: <4F288A70.5000705@stoneleaf.us> References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> <4F288A70.5000705@stoneleaf.us> Message-ID: <4F28900A.3080505@mrabarnett.plus.com> On 01/02/2012 00:42, Ethan Furman wrote: > Terry Reedy wrote: >> On 1/31/2012 1:06 PM, Eric Snow wrote: >> >>> +1 for reconsidering the d.[name] / d.(name) / d!name syntax. >> >> d.[name] is too much like d[name]. The . that modifies the meaning of >> 'name' is too far away. >> >> d.(name) is like d.name except to me the () means to use the value of >> name rather than 'name' itself. This is just what you are trying to say. >> I believe () is used elsewhere with that meaning. I could live with this. >> >> d!name has the advantage? of no brackets, but just looks crazy since ! >> meant 'not' in Python. > > > I'm not a fan of any of the .[], .(), .{} patterns, nor of .! . > .() looks the most sensible to me. > What about the colon? > > d:name #use the value of name > Surely you jest? :-) From simon.sapin at kozea.fr Wed Feb 1 09:06:33 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Wed, 01 Feb 2012 09:06:33 +0100 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: <4F28900A.3080505@mrabarnett.plus.com> References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> <4F288A70.5000705@stoneleaf.us> Message-ID: <4F28F289.3080407@kozea.fr> Le 01/02/2012 02:06, MRAB a écrit : >> > I'm not a fan of any of the .[], .(), .{} patterns, nor of .! .
>> > > .() looks the most sensible to me. > If .[] looks like indexing, .() looks like calling. (I'm not for or against either of these, just pointing out that they have the same problem.) Regards, -- Simon Sapin From jsbueno at python.org.br Wed Feb 1 12:33:46 2012 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Wed, 1 Feb 2012 09:33:46 -0200 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: <4F28F289.3080407@kozea.fr> References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> <4F288A70.5000705@stoneleaf.us> <4F28900A.3080505@mrabarnett.plus.com> <4F28F289.3080407@kozea.fr> Message-ID: On Wed, Feb 1, 2012 at 6:06 AM, Simon Sapin wrote: > Le 01/02/2012 02:06, MRAB a écrit : > >>> > I'm not a fan of any of the .[], .(), .{} patterns, nor of .! . >>> > >> >> .() looks the most sensible to me. >> > > If .[] looks like indexing, .() looks like calling. (I'm not for or against > either of these, just pointing out that they have the same problem.) Still, there should be something with a closing token. Try to imagine three of these in a chain, if the syntax is a colon: name1:name2:name3:name4 -> which could mean either of: name1:(name2:(name3:name4)), (name1:name2):(name3:name4) and so on - (not to mention other expressions involving names, though these would be less ambiguous due to operator precedence.
js -><- > > Regards, > > -- > Simon Sapin > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From p.f.moore at gmail.com Wed Feb 1 14:05:44 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 1 Feb 2012 13:05:44 +0000 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> <4F288A70.5000705@stoneleaf.us> <4F28900A.3080505@mrabarnett.plus.com> <4F28F289.3080407@kozea.fr> Message-ID: On 1 February 2012 11:33, Joao S. O. Bueno wrote: > Still, there should be something with a closing token. > Try to imagine three of these in a chain, if the syntax is a colon: > > name1:name2:name3:name4 -> which could mean either of: > > name1:(name2:(name3:name4)), (name1:name2):(name3:name4) > and so on - (not to mention other expressions involving names, though these > would be less ambiguous due to operator precedence. No more so than a.b.c.d.e. I would expect a:b to behave exactly the same as a.b, except that it uses getitem rather than getattr under the hood (was that the proposal? I'm completely confused by now as to what this new syntax is intended to achieve...) But I don't like the idea in any case, so I remain -1 on the whole proposal. Paul.
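For context, the behaviour this thread keeps circling around can already be prototyped with no new syntax at all; a minimal sketch (the class name AttrDict is purely illustrative, not something anyone in the thread is proposing):

```python
# Minimal sketch of the object under discussion: a dict whose keys can
# also be read and written as attributes. AttrDict is an illustrative
# name only; it is not proposed anywhere in this thread.
class AttrDict(dict):
    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so real
        # attributes and dict methods like keys() still win.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value


d = AttrDict(spam=1)
d.egg = 2                 # item assignment via attribute syntax
assert d.spam == 1 and d["egg"] == 2
```

The limitation motivating the syntax proposals is still visible here: a key held in a variable must be reached with d[name] or getattr(d, name), which is exactly the getitem-versus-getattr distinction Paul raises.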
From massimo.dipierro at gmail.com Wed Feb 1 14:32:45 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Wed, 1 Feb 2012 07:32:45 -0600 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> <4F288A70.5000705@stoneleaf.us> <4F28900A.3080505@mrabarnett.plus.com> <4F28F289.3080407@kozea.fr> Message-ID: Using x:[....] wouldn't it create ambiguities when parsing (lambda x:[....])? How about x. as a shortcut for x.__dict__ so we can do x.key -> x.__dict__['key'] x.[key] -> x.__dict__[key] x..keys() -> x.__dict__.keys() x..values() -> x.__dict__.values() for attribute in x.: print 'x.'+attribute and leave open the possibility of 3 dots for ranges 1...5 -> range(1,5) 1,2...10 -> range(1,10,2-1) On Feb 1, 2012, at 7:05 AM, Paul Moore wrote: > On 1 February 2012 11:33, Joao S. O. Bueno wrote: >> Still, there should be something with a closing token. >> Try to imagine three of these in a chain, if the syntax is a colon: >> >> name1:name2:name3:name4 -> which could mean either of: >> >> name1:(name2:(name3:name4)), (name1:name2):(name3:name4) >> and so on - (not to mention other expressions involving names, though these >> would be less ambiguous due to operator precedence. > > No more so than a.b.c.d.e > > I would expect a:b to behave exactly the same as a.b, except that it > uses getitem rather than getattr under the hood (was that the > proposal? I'm completely confused by now as to what this new syntax is > intended to achieve...) > > But I don't like the idea in any case, so I remain -1 on the whole proposal. > > Paul.
From masklinn at masklinn.net Wed Feb 1 14:44:09 2012 From: masklinn at masklinn.net (Masklinn) Date: Wed, 1 Feb 2012 14:44:09 +0100 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> <4F288A70.5000705@stoneleaf.us> <4F28900A.3080505@mrabarnett.plus.com> <4F28F289.3080407@kozea.fr> Message-ID: <88A63247-5EBD-4213-B065-36A81B4610F6@masklinn.net> On 2012-02-01, at 14:32 , Massimo Di Pierro wrote: > Using x:[....] wouldn't it create ambiguities when parsing (lambda x:[....])? > > How about x. as a shortcut for x.__dict__ so we can do > > x.key -> x.__dict__['key'] > x.[key] -> x.__dict__[key] > x..keys() -> x.__dict__.keys() > x..values() -> x.__dict__.values() > > for attribute in x.: > print 'x.'+attribute > > and leave open the possibility of 3 dots for ranges > 1...5 -> range(1,5) > 1,2...10 -> range(1,10,2-1) Yeah, readability schmeadability. Also, >>> 1..__int__() 1 that's going to look good. From g.brandl at gmx.net Wed Feb 1 20:35:04 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 01 Feb 2012 20:35:04 +0100 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> <4F288A70.5000705@stoneleaf.us> <4F28900A.3080505@mrabarnett.plus.com> <4F28F289.3080407@kozea.fr> Message-ID: Am 01.02.2012 14:32, schrieb Massimo Di Pierro: > Using x:[....] wouldn't it create ambiguities when parsing (lambda x:[....])? > > How about x.
as a shortcut for x.__dict__ so we can do > > x.key -> x.__dict__['key'] > x.[key] -> x.__dict__[key] > x..keys() -> x.__dict__.keys() > x..values() -> x.__dict__.values() > > for attribute in x.: > print 'x.'+attribute > > and leave open the possibility of 3 dots for ranges > 1...5 -> range(1,5) Actually no, because 1. is a float literal. So 1...keys() would already be valid, and you have to use four dots for ranges. I would suggest five to be on the safe side (plus it has as many dots as there are letters in "range", therefore easy to remember). SCNR, Georg From jeanpierreda at gmail.com Thu Feb 2 07:40:14 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Thu, 2 Feb 2012 01:40:14 -0500 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: <4F288A70.5000705@stoneleaf.us> References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> <4F288A70.5000705@stoneleaf.us> Message-ID: On Tue, Jan 31, 2012 at 7:42 PM, Ethan Furman wrote: > What about the colon? The colon would be confusing in some circumstances. It's already used inside dict literals and slices. -- Devin From techtonik at gmail.com Thu Feb 2 09:41:39 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 2 Feb 2012 11:41:39 +0300 Subject: [Python-ideas] PEP x: Static module/package inspection In-Reply-To: References: <29228470.233.1324982829840.JavaMail.geo-discussion-forums@yqbl25> <20544069.58.1325067324546.JavaMail.geo-discussion-forums@yqiz15> Message-ID: A rather user-friendly proof of concept with the `ast` module is ready. http://pypi.python.org/pypi/astdump/ `astdump` contains a get_top_vars() method, which extracts sufficient information from a module's AST to generate setup.py for itself. This capability can already be reused for plugin version discovery mechanisms.
ISTM the working library should motivate authors better than a PEP convention. =) `astdump` doesn't provide complete module introspection capabilities. I've primarily focused on getting the output done, so for a proper API it would be nice to study use case examples first. `astdump` contains a tree walker with filtering capabilities by node type and level. What "python-object" should expose and how to make this convenient is not completely clear to me. -- anatoly t. From ncoghlan at gmail.com Thu Feb 2 14:35:10 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 2 Feb 2012 23:35:10 +1000 Subject: [Python-ideas] PEP x: Static module/package inspection In-Reply-To: References: <29228470.233.1324982829840.JavaMail.geo-discussion-forums@yqbl25> <20544069.58.1325067324546.JavaMail.geo-discussion-forums@yqiz15> Message-ID: On Thu, Dec 29, 2011 at 1:28 AM, Michael Foord wrote: > On a simple level, all of this is already "obtainable" by using the ast > module that can parse Python code. I would love to see a "python-object" > layer on top of this that will take an ast for a module (or other object) > and return something that represents the same object as the ast. > > So all module level objects will have corresponding objects - where they are > Python objects (builtin-literals) then they will be represented exactly. For > classes and functions you'll get an object back that has the same attributes > plus some metadata (e.g. for functions / methods what arguments they take > etc). > > That is certainly doable and would make introspecting-without-executing a > lot simpler. The existing 'pyclbr' (class browser) module in the stdlib also attempts to play in this same space. I wouldn't say it does it particularly *well* (since it's easy to confuse with valid Python constructs), but it tries. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From yselivanov.ml at gmail.com Fri Feb 3 16:09:49 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 3 Feb 2012 10:09:49 -0500 Subject: [Python-ideas] unpacking context managers in WITH statement Message-ID: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> Hello, With the removal of "contextlib.nested" in python 3.2 nothing was introduced to replace it. However, I found it pretty useful, despite the fact that it had its own quirks. These quirks can (at least partially) be addressed by allowing unpacking syntax in the context manager. Consider the following snippet of code: ctxs = () if args.profile: ctxs += (ApplicationProfilerContext(),) if args.logging: ctxs += (ApplicationLoggingContext(),) with *ctxs: Application.run() As of now, without "nested" we have either the option of reimplementing it, or writing lots of ugly code with nested 'try..except's. So the feature was taken out, but nothing replaced it. What do you think, guys? Thanks, Yury From grosser.meister.morti at gmx.net Fri Feb 3 17:30:12 2012 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Fri, 03 Feb 2012 17:30:12 +0100 Subject: [Python-ideas] unpacking context managers in WITH statement In-Reply-To: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> Message-ID: <4F2C0B94.6090005@gmx.net> Of course there is something to replace nested: >>> with open("egg.txt","w") as egg, open("spam.txt","w") as spam: >>> egg.write("egg") >>> spam.write("spam") The nested function was removed because it is broken. E.g. take this: >>> with nested(open("egg.txt","w"), open("spam.txt","w")) as (egg, spam): >>> egg.write("egg") >>> spam.write("spam") What if opening of spam.txt produces an exception? Then egg.txt will never be closed! The new with syntax takes care of this.
It basically rewrites it as: >>> with open("egg.txt","w") as egg: >>> with open("spam.txt","w") as spam: >>> egg.write("egg") >>> spam.write("spam") On 02/03/2012 04:09 PM, Yury Selivanov wrote: > Hello, > > With the removal of "contextlib.nested" in python 3.2 nothing was introduced to replace it. However, I found it pretty useful, despite the fact that it had its own quirks. These quirks can (at least partially) be addressed by allowing unpacking syntax in the context manager. > > Consider the following snippet of code: > > ctxs = () > if args.profile: > ctxs += (ApplicationProfilerContext(),) > if args.logging: > ctxs += (ApplicationLoggingContext(),) > with *ctxs: > Application.run() > > As of now, without "nested" we have either the option of reimplementing it, or writing lots of ugly code with nested 'try..except's. So the feature was taken out, but nothing replaced it. > > What do you think, guys? > > Thanks, > Yury From grosser.meister.morti at gmx.net Fri Feb 3 17:35:10 2012 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Fri, 03 Feb 2012 17:35:10 +0100 Subject: [Python-ideas] unpacking context managers in WITH statement In-Reply-To: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> Message-ID: <4F2C0CBE.5010708@gmx.net> Oh, wait. You do something a bit different. Hm, yes, when you have a list of context managers it's something different. Still, I'm not sure if it is a good thing to do it like you've proposed. After all, usually the constructor of a nested context manager should only be called if the parent context could be entered. You would construct all context managers before you enter any.
Maybe it's ok for your case, but it might send the wrong signal to the developers and might be used like nested was (see my other mail). On 02/03/2012 04:09 PM, Yury Selivanov wrote: > Hello, > > With the removal of "contextlib.nested" in python 3.2 nothing was introduced to replace it. However, I found it pretty useful, despite the fact that it had its own quirks. These quirks can (at least partially) be addressed by allowing unpacking syntax in the context manager. > > Consider the following snippet of code: > > ctxs = () > if args.profile: > ctxs += (ApplicationProfilerContext(),) > if args.logging: > ctxs += (ApplicationLoggingContext(),) > with *ctxs: > Application.run() > > As of now, without "nested" we have either the option of reimplementing it, or writing lots of ugly code with nested 'try..except's. So the feature was taken out, but nothing replaced it. > > What do you think, guys? > > Thanks, > Yury From yselivanov.ml at gmail.com Fri Feb 3 17:36:53 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 3 Feb 2012 11:36:53 -0500 Subject: [Python-ideas] unpacking context managers in WITH statement In-Reply-To: <4F2C0B94.6090005@gmx.net> References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> <4F2C0B94.6090005@gmx.net> Message-ID: This is not about explicitly writing a comma-separated list of context managers in the statement, but rather about the ability to compose this list dynamically. With unpacking you won't have the problem of an uncaught exception in __new__/__init__, because such an exception would propagate at the stage of constructing the list of managers. Exceptions occurring during the with statement execution, i.e. in the __enter__ and __exit__ methods, will work just fine, or am I missing something?
On 2012-02-03, at 11:30 AM, Mathias Panzenböck wrote: > Of course there is something to replace nested: > > >>> with open("egg.txt","w") as egg, open("spam.txt","w") as spam: > >>> egg.write("egg") > >>> spam.write("spam") > > The nested function was removed because it is broken. E.g. take this: > > >>> with nested(open("egg.txt","w"), open("spam.txt","w")) as (egg, spam): > >>> egg.write("egg") > >>> spam.write("spam") > > What if opening of spam.txt produces an exception? Then egg.txt will never be closed! The new with syntax takes care of this. It basically rewrites it as: > > >>> with open("egg.txt","w") as egg: > >>> with open("spam.txt","w") as spam: > >>> egg.write("egg") > >>> spam.write("spam") > > On 02/03/2012 04:09 PM, Yury Selivanov wrote: >> Hello, >> >> With the removal of "contextlib.nested" in python 3.2 nothing was introduced to replace it. However, I found it pretty useful, despite the fact that it had its own quirks. These quirks can (at least partially) be addressed by allowing unpacking syntax in the context manager. >> >> Consider the following snippet of code: >> >> ctxs = () >> if args.profile: >> ctxs += (ApplicationProfilerContext(),) >> if args.logging: >> ctxs += (ApplicationLoggingContext(),) >> with *ctxs: >> Application.run() >> >> As of now, without "nested" we have either the option of reimplementing it, or writing lots of ugly code with nested 'try..except's. So the feature was taken out, but nothing replaced it. >> >> What do you think, guys?
>> >> Thanks, >> Yury From yselivanov.ml at gmail.com Fri Feb 3 17:47:29 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 3 Feb 2012 11:47:29 -0500 Subject: [Python-ideas] unpacking context managers in WITH statement In-Reply-To: <4F2C0CBE.5010708@gmx.net> References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> <4F2C0CBE.5010708@gmx.net> Message-ID: Well, I bet most of the developers will continue using the explicit syntax as it is just more convenient. Unpacking is just a specific feature to address some specific needs, where the case about "not executing the constructor in case of a parent context fault" may not be applicable. The "with" statement is about far more than just opening files now, after all ;) On 2012-02-03, at 11:35 AM, Mathias Panzenböck wrote: > Oh, wait. You do something a bit different. Hm, yes, when you have a list of context managers it's something different. Still, I'm not sure if it is a good thing to do it like you've proposed. After all, usually the constructor of a nested context manager should only be called if the parent context could be entered. You would construct all context managers before you enter any. Maybe it's ok for your case, but it might send the wrong signal to the developers and might be used like nested was (see my other mail). > > On 02/03/2012 04:09 PM, Yury Selivanov wrote: >> Hello, >> >> With the removal of "contextlib.nested" in python 3.2 nothing was introduced to replace it. However, I found it pretty useful, despite the fact that it had its own quirks. These quirks can (at least partially) be addressed by allowing unpacking syntax in the context manager.
>> >> Consider the following snippet of code: >> >> ctxs = () >> if args.profile: >> ctxs += (ApplicationProfilerContext(),) >> if args.logging: >> ctxs += (ApplicationLoggingContext(),) >> with *ctxs: >> Application.run() >> >> As of now, without "nested" we have either the option of reimplementing it, or writing lots of ugly code with nested 'try..except's. So the feature was taken out, but nothing replaced it. >> >> What do you think, guys? >> >> Thanks, >> Yury From fuzzyman at gmail.com Fri Feb 3 18:50:06 2012 From: fuzzyman at gmail.com (Michael Foord) Date: Fri, 3 Feb 2012 17:50:06 +0000 Subject: [Python-ideas] unpacking context managers in WITH statement In-Reply-To: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> Message-ID: On 3 February 2012 15:09, Yury Selivanov wrote: > Hello, > > With the removal of "contextlib.nested" in python 3.2 nothing was > introduced to replace it. However, I found it pretty useful, despite the > fact that it had its own quirks. These quirks can (at least partially) be > addressed by allowing unpacking syntax in the context manager. > > Consider the following snippet of code: > > ctxs = () > if args.profile: > ctxs += (ApplicationProfilerContext(),) > if args.logging: > ctxs += (ApplicationLoggingContext(),) > with *ctxs: > Application.run() > Well, I quite like this syntax and it does allow you to do something not currently easily possible: with *ctxs as tuple_of_results: ... The use case is reasonably obscure however, and should this be possible: with *ctx, other as tuple_of_results, another: ...
Michael > > As of now, without "nested" we have either the option of reimplementing it, or > writing lots of ugly code with nested 'try..except's. So the feature was > taken out, but nothing replaced it. > > What do you think, guys? > > Thanks, > Yury -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From yselivanov.ml at gmail.com Fri Feb 3 19:06:56 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 3 Feb 2012 13:06:56 -0500 Subject: [Python-ideas] unpacking context managers in WITH statement In-Reply-To: References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> Message-ID: <9BC442E1-569C-43D5-8A70-07B72ECCE50B@gmail.com> On 2012-02-03, at 12:50 PM, Michael Foord wrote: > with *ctxs as tuple_of_results: This is not necessary, as 'ctxs' already holds all instances of all context managers; so the 'ctxs' would be equal to 'tuple_of_results' > with *ctx, other as tuple_of_results, another: > ... Looks useful to me.
- Yury From fuzzyman at gmail.com Fri Feb 3 19:09:56 2012 From: fuzzyman at gmail.com (Michael Foord) Date: Fri, 3 Feb 2012 18:09:56 +0000 Subject: [Python-ideas] unpacking context managers in WITH statement In-Reply-To: <9BC442E1-569C-43D5-8A70-07B72ECCE50B@gmail.com> References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> <9BC442E1-569C-43D5-8A70-07B72ECCE50B@gmail.com> Message-ID: On 3 February 2012 18:06, Yury Selivanov wrote: > On 2012-02-03, at 12:50 PM, Michael Foord wrote: > > > with *ctxs as tuple_of_results: > > This is not necessary, as 'ctxs' already holds all instances of > all context managers; so the 'ctxs' would be equal to 'tuple_of_results' > The results are whatever is returned by ctx.__enter__(), not the context manager itself. Michael > > > with *ctx, other as tuple_of_results, another: > > ... > > Looks useful to me. > > - > Yury -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html From yselivanov.ml at gmail.com Fri Feb 3 19:11:03 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 3 Feb 2012 13:11:03 -0500 Subject: [Python-ideas] unpacking context managers in WITH statement In-Reply-To: References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> <9BC442E1-569C-43D5-8A70-07B72ECCE50B@gmail.com> Message-ID: Ah, yes, my bad.
Then I'm +1 on that one ;) On 2012-02-03, at 1:09 PM, Michael Foord wrote: > > > On 3 February 2012 18:06, Yury Selivanov wrote: > On 2012-02-03, at 12:50 PM, Michael Foord wrote: > > > with *ctxs as tuple_of_results: > > This is not necessary, as 'ctxs' already holds all instances of > all context managers; so the 'ctxs' would be equal to 'tuple_of_results' > > The results are whatever is returned by ctx.__enter__(), not the context manager itself. > > Michael > > > > with *ctx, other as tuple_of_results, another: > > ... > > Looks useful to me. > > - > Yury > > > > -- > http://www.voidspace.org.uk/ > > May you do good and not evil > May you find forgiveness for yourself and forgive others > > May you share freely, never taking more than you give. > > -- the sqlite blessing http://www.sqlite.org/different.html > From ncoghlan at gmail.com Sat Feb 4 07:22:48 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 4 Feb 2012 16:22:48 +1000 Subject: [Python-ideas] unpacking context managers in WITH statement In-Reply-To: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> Message-ID: On Sat, Feb 4, 2012 at 1:09 AM, Yury Selivanov wrote: > As of now, without "nested" we have either the option of reimplementing it, or writing lots of ugly code with nested 'try..except's. So the feature was taken out, but nothing replaced it. > > What do you think, guys? I think you should try contextlib2 :) Specifically, ContextStack: http://contextlib2.readthedocs.org/en/latest/index.html#contextlib2.ContextStack Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From storchaka at gmail.com Sat Feb 4 22:17:49 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 04 Feb 2012 23:17:49 +0200 Subject: [Python-ideas] unpacking context managers in WITH statement In-Reply-To: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> Message-ID: 03.02.12 17:09, Yury Selivanov wrote: > With the removal of "contextlib.nested" in python 3.2 nothing was introduced to replace it. However, I found it pretty useful, despite the fact that it had its own quirks. These quirks can (at least partially) be addressed by allowing unpacking syntax in the context manager. > > Consider the following snippet of code: > > ctxs = () > if args.profile: > ctxs += (ApplicationProfilerContext(),) > if args.logging: > ctxs += (ApplicationLoggingContext(),) > with *ctxs: > Application.run() > > As of now, without "nested" we have either the option of reimplementing it, or writing lots of ugly code with nested 'try..except's. So the feature was taken out, but nothing replaced it. class EmptyContext: def __enter__(self): return self def __exit__(self, exc_type, exc_value, traceback): pass with ApplicationProfilerContext() if args.profile else EmptyContext(): with ApplicationLoggingContext() if args.logging else EmptyContext(): Application.run() Of course, it would be better to use some special singleton value (None, False or ellipsis) instead of EmptyContext(). If any false value meant an empty context, we would be able to use the "with args.profile and ApplicationProfilerContext()" idiom.
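The ContextStack that Nick points to a few messages up is what actually covers the dynamic-list use case from Yury's example. A sketch of the same composition, written here against contextlib.ExitStack (the stdlib descendant of contextlib2's ContextStack, added in Python 3.3), with tracked() standing in for the hypothetical ApplicationProfilerContext/ApplicationLoggingContext classes:

```python
# Sketch: composing a dynamically built list of context managers with
# contextlib.ExitStack instead of new `with *ctxs` syntax. tracked() is
# a stand-in for the application contexts in Yury's example.
from contextlib import ExitStack, contextmanager

events = []

@contextmanager
def tracked(name):
    events.append(("enter", name))
    try:
        yield name
    finally:
        events.append(("exit", name))

def run(ctxs):
    # Contexts are entered left to right and unwound right to left,
    # exactly as a series of nested with statements would behave.
    with ExitStack() as stack:
        results = tuple(stack.enter_context(c) for c in ctxs)
        events.append(("run", results))

run([tracked("profile"), tracked("logging")])
assert events == [
    ("enter", "profile"), ("enter", "logging"),
    ("run", ("profile", "logging")),
    ("exit", "logging"), ("exit", "profile"),
]
```

Notably, if entering the second context raised, ExitStack would still unwind the first one, which is exactly the failure mode that got contextlib.nested removed.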
From python at 2sn.net Sat Feb 4 23:12:53 2012 From: python at 2sn.net (Alexander Heger) Date: Sat, 04 Feb 2012 16:12:53 -0600 Subject: [Python-ideas] unpacking context managers in WITH statement In-Reply-To: References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> Message-ID: <4F2DAD65.1020708@2sn.net> > Well, I quite like this syntax and it does allow you to do something not > currently easily possible: > > with *ctxs as tuple_of_results: > ... > > The use case is reasonably obscure however, and should this be possible: > > with *ctx, other as tuple_of_results, another: wouldn't it be with *ctx, other as *tuple_of_results, another: to allow more general forms like with *ctx, other as first, *some_in_the_middle, last: -Alexander From python at 2sn.net Sat Feb 4 22:59:00 2012 From: python at 2sn.net (Alexander Heger) Date: Sat, 04 Feb 2012 15:59:00 -0600 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> Message-ID: <4F2DAA24.6000300@2sn.net> >> +1 for reconsidering the d.[name] / d.(name) / d!name syntax. > > d.[name] is too much like d[name] The . that modifies the meaning of > 'name' is too far away. I think this is the best choice. > d.(name) I think 'x' and ('x') should remain the same -Alexander From ncoghlan at gmail.com Sun Feb 5 01:20:30 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 5 Feb 2012 10:20:30 +1000 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) Message-ID: Rather than throwing out random ideas in a popularity contest, it's important to carefully review what it is that people don't like about the current state of affairs. 
Currently, when dealing with attributes that are statically determined, your code looks like this: x.attr # Reference x.attr = y # Bind del x.attr # Unbind Now, suppose for some reason we want to determine the attributes *dynamically*. The reason for doing this could be as simple as wanting to avoid code duplication when performing the same operation on multiple attributes (i.e. "for attr in 'attr1 attr2 attr3'.split(): ..."). At this point, dedicated syntactic support disappears and we're now using builtin functions instead: getattr(x, attr) # Reference setattr(x, attr, y) # Bind delattr(x, attr) # Unbind hasattr(x, attr) # Existence query (essentially a shorthand for getattr() in a try/except block) So, that's the status quo any proposals are competing against. It's easy enough to write, easy to read and easy to look up if you don't already know what it does (an often underestimated advantage of builtin operations over syntax is that the former are generally *much* easier to look up in the documentation). However, it can start to look rather clumsy when multiple dynamic attribute operations are chained together. Compare this static code: x.attr1 = y.attr1 x.attr2 = y.attr2 x.attr3 = y.attr3 With the following dynamic code: for attr in "attr1 attr2 attr3".split(): setattr(x, attr, getattr(y, attr)) The inner assignment in that loop is *very* noisy for a simple assignment. Splitting out a temporary variable cleans things up a bit, but it's still fairly untidy: for attr in "attr1 attr2 attr3".split(): val = getattr(y, attr) setattr(x, attr, val) It would be a *lot* cleaner if we could just use a normal assignment statement instead of builtin functions to perform the name binding. 
As it turns out, for ordinary instances, we can already do exactly that: for attr in "attr1 attr2 attr3".split(): vars(x)[attr] = vars(y)[attr] In short, I think proposals for dedicated syntax for dynamic attribute access are misguided - instead, such efforts should go into enhancing vars() to return objects that support *full* dict-style access to the underlying object's attribute namespace (with descriptor protocol support and all). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Feb 5 01:38:40 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 5 Feb 2012 10:38:40 +1000 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: On Sun, Feb 5, 2012 at 10:20 AM, Nick Coghlan wrote: > It would be a *lot* cleaner if we could just use a normal assignment > statement instead of builtin functions to perform the name binding. As > it turns out, for ordinary instances, we can already do exactly that: > > for attr in "attr1 attr2 attr3".split(): > vars(x)[attr] = vars(y)[attr] That can obviously also be written: xa, ya = vars(x), vars(y) for attr in "attr1 attr2 attr3".split(): xa[attr] = ya[attr] In other words, don't think about new syntax. Think about how to correctly implement a full object proxy that provides the MutableMapping interface, with get/set/delitem on the proxy corresponding with get/set/delattr on the underlying object. Then think about whether or not returning such an object from vars() would be backwards compatible, or whether a new API would be needed to create one (e.g. attrview(x)). Finally, such an object can be prototyped quite happily outside the standard library, so consider writing it and publishing it on PyPI as a standalone module. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From tjreedy at udel.edu Sun Feb 5 02:13:19 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 04 Feb 2012 20:13:19 -0500 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: On 2/4/2012 7:20 PM, Nick Coghlan wrote: > Rather than throwing out random ideas in a popularity contest, it's > important to carefully review what it is that people don't like about > the current state of affairs. > > Currently, when dealing with attributes that are statically > determined, your code looks like this: > > x.attr # Reference > x.attr = y # Bind > del x.attr # Unbind > > Now, suppose for some reason we want to determine the attributes > *dynamically*. The reason for doing this could be as simple as wanting > to avoid code duplication when performing the same operation on > multiple attributes (i.e. "for attr in 'attr1 attr2 attr3'.split(): > ..."). > > At this point, dedicated syntactic support disappears and we're now > using builtin functions instead: > > getattr(x, attr) # Reference > setattr(x, attr, y) # Bind > delattr(x, attr) # Unbind > hasattr(x, attr) # Existence query (essentially a shorthand for > getattr() in a try/except block) > > So, that's the status quo any proposals are competing against. It's > easy enough to write, easy to read and easy to look up if you don't > already know what it does (an often underestimated advantage of > builtin operations over syntax is that the former are generally *much* > easier to look up in the documentation). Also, functions can be passed as arguments, whereas syntax cannot, which is why we have the operator module. > However, it can start to look rather clumsy when multiple dynamic > attribute operations are chained together. 
> > Compare this static code: > > x.attr1 = y.attr1 > x.attr2 = y.attr2 > x.attr3 = y.attr3 > > With the following dynamic code: > > for attr in "attr1 attr2 attr3".split(): > setattr(x, attr, getattr(y, attr)) > > The inner assignment in that loop is *very* noisy for a simple > assignment. Splitting out a temporary variable cleans things up a bit, > but it's still fairly untidy: > > for attr in "attr1 attr2 attr3".split(): > val = getattr(y, attr) > setattr(x, attr, val) > > It would be a *lot* cleaner if we could just use a normal assignment > statement instead of builtin functions to perform the name binding. As > it turns out, for ordinary instances, we can already do exactly that: > > for attr in "attr1 attr2 attr3".split(): > vars(x)[attr] = vars(y)[attr] > > In short, I think proposals for dedicated syntax for dynamic attribute > access are misguided - instead, such efforts should go into enhancing > vars() to return objects that support *full* dict-style access to the > underlying object's attribute namespace (with descriptor protocol > support and all). > > Cheers, > Nick. > -- Terry Jan Reedy From nathan.alexander.rice at gmail.com Sun Feb 5 03:03:03 2012 From: nathan.alexander.rice at gmail.com (Nathan Rice) Date: Sat, 4 Feb 2012 21:03:03 -0500 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: <4F2DAA24.6000300@2sn.net> References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> <4F2DAA24.6000300@2sn.net> Message-ID: I think .() is the nicest of the suggestions thus far. I don't mind the <- and -> syntax so much either, I could live with obj<-foo. 
Nathan From cs at zip.com.au Sun Feb 5 03:17:44 2012 From: cs at zip.com.au (Cameron Simpson) Date: Sun, 5 Feb 2012 13:17:44 +1100 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: <20120205021744.GA8647@cskk.homeip.net> On 05Feb2012 10:20, Nick Coghlan wrote: [...] | In short, I think proposals for dedicated syntax for dynamic attribute | access are misguided - instead, such efforts should go into enhancing | vars() to return objects that support *full* dict-style access to the | underlying object's attribute namespace (with descriptor protocol | support and all). +10 _Where_ do you people find the time to write these well thought out posts? I'm very much for making vars() better supported (the docs have caveats about assigning to it). All the syntax suggestions I've seen look cumbersome or ugly and some are actively misleading to my eye (did I really see an "<-" in there?) I see my random sig quote picker has worked well again:-) Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ A strong conviction that something must be done is the parent of many bad measures. - Daniel Webster From ironfroggy at gmail.com Sun Feb 5 04:00:10 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sat, 4 Feb 2012 22:00:10 -0500 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: On Sat, Feb 4, 2012 at 7:20 PM, Nick Coghlan wrote: > It would be a *lot* cleaner if we could just use a normal assignment > statement instead of builtin functions to perform the name binding. As > it turns out, for ordinary instances, we can already do exactly that: > > ? ?for attr in "attr1 attr2 attr3".split(): > ? ? ? 
?vars(x)[attr] = vars(y)[attr] > > In short, I think proposals for dedicated syntax for dynamic attribute > access are misguided - instead, such efforts should go into enhancing > vars() to return objects that support *full* dict-style access to the > underlying object's attribute namespace (with descriptor protocol > support and all). I love the idea, and I think such a solution is much more straight forward than any syntax change. While it would be great to extend the functionality of vars(), it would be easier to add a new builtin that returns some kind of proxy. If vars() was changed to return this proxy, it could potentially break a lot of existing code mutating the dict returned by vars. The question is: Is yet another builtin or the messy compatibility change the worse option? -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From ncoghlan at gmail.com Sun Feb 5 07:23:50 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 5 Feb 2012 16:23:50 +1000 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: On Sun, Feb 5, 2012 at 1:00 PM, Calvin Spealman wrote: > The question is: Is yet another builtin or the messy compatibility change the > worse option? That's actually a question for (much) further down the road. The *current* question is whether anyone is interested enough in the concept to prototype it as a PyPI module. That's a lot more work than just posting suggestions here :) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From ericsnowcurrently at gmail.com Sun Feb 5 08:40:30 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sun, 5 Feb 2012 00:40:30 -0700 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: On Sat, Feb 4, 2012 at 5:20 PM, Nick Coghlan wrote: > Rather than throwing out random ideas in a popularity contest, it's > important to carefully review what it is that people don't like about > the current state of affairs. > [snipped] > > It would be a *lot* cleaner if we could just use a normal assignment > statement instead of builtin functions to perform the name binding. As > it turns out, for ordinary instances, we can already do exactly that: > > ? ?for attr in "attr1 attr2 attr3".split(): > ? ? ? ?vars(x)[attr] = vars(y)[attr] > > In short, I think proposals for dedicated syntax for dynamic attribute > access are misguided - instead, such efforts should go into enhancing > vars() to return objects that support *full* dict-style access to the > underlying object's attribute namespace (with descriptor protocol > support and all). Good call on this, Nick. :) -eric From storchaka at gmail.com Sun Feb 5 13:46:00 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 05 Feb 2012 14:46:00 +0200 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: 05.02.12 02:20, Nick Coghlan ???????(??): > It would be a *lot* cleaner if we could just use a normal assignment > statement instead of builtin functions to perform the name binding. 
As > it turns out, for ordinary instances, we can already do exactly that: > > for attr in "attr1 attr2 attr3".split(): > vars(x)[attr] = vars(y)[attr] > > In short, I think proposals for dedicated syntax for dynamic attribute > access are misguided - instead, such efforts should go into enhancing > vars() to return objects that support *full* dict-style access to the > underlying object's attribute namespace (with descriptor protocol > support and all). One-liner "def vars(v): return v.__dict__"? From yoavglazner at gmail.com Sun Feb 5 13:58:08 2012 From: yoavglazner at gmail.com (yoav glazner) Date: Sun, 5 Feb 2012 12:58:08 +0000 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: On Sun, Feb 5, 2012 at 12:46 PM, Serhiy Storchaka wrote: > 05.02.12 02:20, Nick Coghlan wrote: > >> It would be a *lot* cleaner if we could just use a normal assignment >> statement instead of builtin functions to perform the name binding. As >> it turns out, for ordinary instances, we can already do exactly that: >> >> for attr in "attr1 attr2 attr3".split(): >> vars(x)[attr] = vars(y)[attr] >> >> In short, I think proposals for dedicated syntax for dynamic attribute >> access are misguided - instead, such efforts should go into enhancing >> vars() to return objects that support *full* dict-style access to the >> underlying object's attribute namespace (with descriptor protocol >> support and all). >> > > One-liner "def vars(v): return v.__dict__"? > This doesn't work for properties: >>> class p: @property def pop(self): return 'corn' >>> def vars(x): return x.__dict__ >>> p().pop 'corn' >>> vars(p())['pop'] Traceback (most recent call last): File "", line 1, in vars(p())['pop'] KeyError: 'pop' >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmjohnson.mailinglist at gmail.com Sun Feb 5 14:25:04 2012 From: cmjohnson.mailinglist at gmail.com (Carl M. 
Johnson) Date: Sun, 5 Feb 2012 03:25:04 -1000 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: On Feb 5, 2012, at 2:58 AM, yoav glazner wrote: > > > On Sun, Feb 5, 2012 at 12:46 PM, Serhiy Storchaka wrote: > 05.02.12 02:20, Nick Coghlan wrote: > It would be a *lot* cleaner if we could just use a normal assignment > statement instead of builtin functions to perform the name binding. As > it turns out, for ordinary instances, we can already do exactly that: > > for attr in "attr1 attr2 attr3".split(): > vars(x)[attr] = vars(y)[attr] > > In short, I think proposals for dedicated syntax for dynamic attribute > access are misguided - instead, such efforts should go into enhancing > vars() to return objects that support *full* dict-style access to the > underlying object's attribute namespace (with descriptor protocol > support and all). > > One-liner "def vars(v): return v.__dict__"? > > This doesn't work for properties: It's not that hard to make something that basically works with properties: >>> class vars2: ... def __init__(self, obj): ... self.obj = obj ... ... def __getitem__(self, key): ... return getattr(self.obj, key) ... ... def __setitem__(self, key, value): ... setattr(self.obj, key, value) ... ... def __delitem__(self, key): ... delattr(self.obj, key) ... >>> class P: ... def __init__(self): ... self.value = 1 ... ... @property ... def pop(self): return 'corn' ... ... @property ... def double(self): ... return self.value * 2 ... ... @double.setter ... def double(self, value): ... self.value = value/2 ... >>> p = P() >>> p.pop 'corn' >>> v = vars2(p) >>> v['pop'] 'corn' >>> v['value'] 1 >>> v['double'] 2 >>> v['double'] = 4 >>> v['value'] 2.0 >>> v['double'] 4.0 In a real module, you'd probably want to be more thorough about emulating a __dict__ dictionary though by adding items() and keys() etc. 
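Carl's vars2 proxy, rounded out with the mapping helpers he mentions, might look like the sketch below. This restates his class in one runnable piece, converts AttributeError to KeyError as suggested later in the thread, and uses dir() for enumeration - which is only a best-effort listing, since dir() cannot see names served dynamically by __getattr__:

```python
# Sketch: Carl's vars2 proxy plus keys()/items() helpers.
# Enumeration via dir() is approximate by construction; the underscore
# filter used here is just one possible policy, not part of the proposal.

class vars2:
    def __init__(self, obj):
        self.obj = obj

    def __getitem__(self, key):
        try:
            return getattr(self.obj, key)
        except AttributeError:
            # Mapping convention: missing key -> KeyError
            raise KeyError(key)

    def __setitem__(self, key, value):
        setattr(self.obj, key, value)

    def __delitem__(self, key):
        try:
            delattr(self.obj, key)
        except AttributeError:
            raise KeyError(key)

    def __contains__(self, key):
        return hasattr(self.obj, key)

    def keys(self):
        # Best-effort: dir() output, minus dunder/private names.
        return [name for name in dir(self.obj) if not name.startswith('_')]

    def items(self):
        return [(name, getattr(self.obj, name)) for name in self.keys()]

class P:
    """Toy class mixing an instance attribute with a property."""
    def __init__(self):
        self.value = 1
    @property
    def double(self):
        return self.value * 2

v = vars2(P())
print(v['double'])       # -> 2
print('double' in v)     # -> True
print(sorted(v.keys()))  # -> ['double', 'value']
```

Note that v.keys() here reports both the instance attribute and the property, which is exactly the vars()-versus-dir() tension the following posts dig into.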
From p.f.moore at gmail.com Sun Feb 5 15:06:10 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 5 Feb 2012 14:06:10 +0000 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: 2012/2/5 Carl M. Johnson : > It's not that hard to make something that basically works with properties: Indeed... > In a real module, you'd probably want to be more thorough about emulating a __dict__ dictionary though by adding items() and keys() etc. ... and precisely! The discussions so far have concentrated on the "easy" side of things. Writing a working module would ensure that all the corner cases get covered. And as a benefit, it would provide an implementation that could be taken straight into the core/stdlib, hugely reducing the core developer effort that is otherwise needed to take even the best-thought-out proposal into reality. Paul. From storchaka at gmail.com Sun Feb 5 15:28:35 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 05 Feb 2012 16:28:35 +0200 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: 05.02.12 15:25, Carl M. Johnson wrote: > On Feb 5, 2012, at 2:58 AM, yoav glazner wrote: >> This doesn't work for properties: > It's not that hard to make something that basically works with properties: del v['pop'] AttributeError: P instance has no attribute 'pop' > In a real module, you'd probably want to be more thorough about emulating a __dict__ dictionary though by adding items() and keys() etc. It's impossible in general. 
class A: def __getattr__(self, name): return len(name) From p.f.moore at gmail.com Sun Feb 5 16:16:40 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 5 Feb 2012 15:16:40 +0000 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: 2012/2/5 Serhiy Storchaka : > 05.02.12 15:25, Carl M. Johnson ???????(??): >> On Feb 5, 2012, at 2:58 AM, yoav glazner wrote: >>> This does't work for properties: >> It's not that hard to make something that basically works with properties: > > del v['pop'] > AttributeError: P instance has no attribute 'pop' > > >> In a real module, you'd probably want to be more thorough about emulating a __dict__ dictionary though by adding item() and keys() etc. > > It's impossible in general. > > class A: > ? ?def __getattr__(self, name): > ? ? ? ?return len(name) >>> class proxy: ... def __init__(self, orig): ... self._orig = orig ... def __getitem__(self, attr): ... return getattr(self._orig,attr) ... >>> class A: ... def __getattr__(self, name): ... return len(name) ... >>> a = A() >>> proxy(a)['hello'] 5 >>> Extending the proxy class to include setting, deleting, and various corner cases, is left as an exercise for the reader :-) Paul From yoavglazner at gmail.com Sun Feb 5 16:33:48 2012 From: yoavglazner at gmail.com (yoav glazner) Date: Sun, 5 Feb 2012 15:33:48 +0000 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: > > >> In a real module, you'd probably want to be more thorough about > emulating a __dict__ dictionary though by adding item() and keys() etc. > > > > It's impossible in general. > > > > class A: > > def __getattr__(self, name): > > return len(name) > > > >>> class proxy: > ... def __init__(self, orig): > ... self._orig = orig > ... def __getitem__(self, attr): > ... return getattr(self._orig,attr) > ... > >>> class A: > ... 
def __getattr__(self, name): > ... return len(name) > ... > >>> a = A() > >>> proxy(a)['hello'] > 5 > >>> > > Extending the proxy class to include setting, deleting, and various > corner cases, is left as an exercise for the reader :-) >>> proxy(a).keys() ?!? -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sun Feb 5 17:02:54 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 5 Feb 2012 16:02:54 +0000 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: On 5 February 2012 15:33, yoav glazner wrote: >> Extending the proxy class to include setting, deleting, and various >> corner cases, is left as an exercise for the reader :-) > > >>>>?proxy(a).keys() > ?!? One of the exercises :-) Paul From tjreedy at udel.edu Sun Feb 5 19:05:42 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 05 Feb 2012 13:05:42 -0500 Subject: [Python-ideas] Dict-like object with property access In-Reply-To: References: <17E44F92-A978-4E93-9713-7D542E61ED10@masklinn.net> <3DFDD08E-D82B-4706-8DAB-9F06B9E1F403@gmail.com> <20120130200226.0a5ab1d8@pitrou.net> <4F26EE37.2040104@stoneleaf.us> <4F2DAA24.6000300@2sn.net> Message-ID: On 2/4/2012 9:03 PM, Nathan Rice wrote: > I think .() is the nicest of the suggestions thus far. I don't mind Except, as someone said, x.(n) looks like calling x. with arg n. So I am withdrawing my support for that and agree with what Nick wrote a day or two ago. -- Terry Jan Reedy From simon.sapin at kozea.fr Sun Feb 5 21:45:38 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Sun, 05 Feb 2012 21:45:38 +0100 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: <4F2EEA72.7080104@kozea.fr> Le 05/02/2012 16:33, yoav glazner a ?crit : > > > >>> class proxy: > ... def __init__(self, orig): > ... self._orig = orig > ... 
def __getitem__(self, attr): > ... return getattr(self._orig, attr) > ... > >>> class A: > ... def __getattr__(self, name): > ... return len(name) > ... > >>> a = A() > >>> proxy(a)['hello'] > 5 > >>> > > Extending the proxy class to include setting, deleting, and various > corner cases, is left as an exercise for the reader :-) > > > >>> proxy(a).keys() > ?!? Hi, +1 on extending vars(). I like this idea much more than adding syntax. In this case, proxy(a).keys() would be based on dir(a) (or something similar) and have the same (documented) limitations. I think this is acceptable, and the proxy object is still useful. Regards, -- Simon Sapin From p.f.moore at gmail.com Sun Feb 5 22:18:09 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 5 Feb 2012 21:18:09 +0000 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: <4F2EEA72.7080104@kozea.fr> References: <4F2EEA72.7080104@kozea.fr> Message-ID: On 5 February 2012 20:45, Simon Sapin wrote: > +1 on extending vars(). I like this idea much more than adding syntax. > > In this case, proxy(a).keys() would be based on dir(a) (or something similar) > and have the same (documented) limitations. I think this is acceptable, and > the proxy object is still useful. vars() and dir() do very different things: >>> class A: ... pass ... >>> a = A() >>> a.a = 1 >>> dir(a) ['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'a'] >>> vars(a) {'a': 1} >>> In my view, following the spec of vars() is far more useful and matches better the original requirement, which was to simulate javascript's index/attribute duality. 
Methods (and even more so special methods) don't really fit in here. I'd argue for the definition: proxy(obj)['a'] <=> obj.a proxy(obj)['a'] = val <=> obj.a = val del proxy(obj)['a'] <=> del obj.a 'a' in proxy(obj) <=> hasattr(obj, 'a') proxy(obj).keys() <=> vars(obj).keys() len(proxy(obj)) <=> len(vars(obj)) In other words, indexing defers to getattr/setattr/delattr, containment uses hasattr, but anything else goes via vars. In terms of ABCs, Sized/Iterable behaviour comes from vars(), Container/Mapping/MutableMapping behaviour comes from {has,get,set,del}attr. It's mildly inconsistent for objects which implement their own attribute access, but those aren't the key use case, and the behaviour is well defined even for those. Paul From simon.sapin at kozea.fr Sun Feb 5 22:30:38 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Sun, 05 Feb 2012 22:30:38 +0100 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: <4F2EEA72.7080104@kozea.fr> Message-ID: <4F2EF4FE.8030202@kozea.fr> Le 05/02/2012 22:18, Paul Moore a écrit : > In my view, following the spec of vars() is far more useful and > matches better the original requirement, which was to simulate > javascript's index/attribute duality. Methods (and even more so > special methods) don't really fit in here. > > I'd argue for the definition: > > proxy(obj)['a'] <=> obj.a > proxy(obj)['a'] = val <=> obj.a = val > del proxy(obj)['a'] <=> del obj.a > 'a' in proxy(obj) <=> hasattr(obj, 'a') > proxy(obj).keys() <=> vars(obj).keys() > len(proxy(obj)) <=> len(vars(obj)) I'm fine with that too and I agree it is probably better. My point was that not all keys that can be used in proxy(a)[key] without a KeyError will be in proxy(a).keys(), but that's okay because the same already happens with getattr() and dir(). By the way, the proxy should also turn AttributeError into KeyError, for consistency with other Mapping types. 
Regards, -- Simon Sapin From p.f.moore at gmail.com Mon Feb 6 00:47:15 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 5 Feb 2012 23:47:15 +0000 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: <4F2EF4FE.8030202@kozea.fr> References: <4F2EEA72.7080104@kozea.fr> <4F2EF4FE.8030202@kozea.fr> Message-ID: On 5 February 2012 21:30, Simon Sapin wrote: > By the way, the proxy should also turn AttributeError into KeyError, for > consistency with other Mapping types. Clearly. And arguably, this is a good case for the new "raise KeyError from None" form to suppress exception chaining... Paul. From grosser.meister.morti at gmx.net Mon Feb 6 02:07:55 2012 From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=) Date: Mon, 06 Feb 2012 02:07:55 +0100 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: References: Message-ID: <4F2F27EB.2070807@gmx.net> On 02/05/2012 04:16 PM, Paul Moore wrote: > 2012/2/5 Serhiy Storchaka: >> 05.02.12 15:25, Carl M. Johnson ???????(??): >>> On Feb 5, 2012, at 2:58 AM, yoav glazner wrote: >>>> This does't work for properties: >>> It's not that hard to make something that basically works with properties: >> >> del v['pop'] >> AttributeError: P instance has no attribute 'pop' >> >> >>> In a real module, you'd probably want to be more thorough about emulating a __dict__ dictionary though by adding item() and keys() etc. >> >> It's impossible in general. >> >> class A: >> def __getattr__(self, name): >> return len(name) > > >>>> class proxy: > ... def __init__(self, orig): > ... self._orig = orig > ... def __getitem__(self, attr): > ... return getattr(self._orig,attr) > ... >>>> class A: > ... def __getattr__(self, name): > ... return len(name) > ... 
>>>> a = A() >>>> proxy(a)['hello'] > 5 >>>> > > Extending the proxy class to include setting, deleting, and various > corner cases, is left as an exercise for the reader :-) > class attrs(object): __slots__ = 'obj', def __init__(self,obj): self.obj = obj def __getitem__(self, key): try: return getattr(self.obj, key) except AttributeError: raise KeyError(key) def __setitem__(self, key, value): try: setattr(self.obj, key, value) except AttributeError: raise KeyError(key) def __delitem__(self, key): try: delattr(self.obj, key) except AttributeError: raise KeyError(key) def __contains__(self, key): return hasattr(self.obj, key) def get(self, key, default=None): try: return getattr(self.obj, key, default) except AttributeError: raise KeyError(key) def keys(self): return iter(dir(self.obj)) def values(self): for key in dir(self.obj): yield getattr(self.obj, key) def items(self): for key in dir(self.obj): yield key, getattr(self.obj, key) def __len__(self): return len(dir(self.obj)) def __iter__(self): return iter(dir(self.obj)) def __repr__(self): return repr(dict(self)) From ncoghlan at gmail.com Mon Feb 6 02:41:38 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 6 Feb 2012 11:41:38 +1000 Subject: [Python-ideas] Shorthand syntax for get/set/delattr (was Re: Dict-like object with property access) In-Reply-To: <4F2F27EB.2070807@gmx.net> References: <4F2F27EB.2070807@gmx.net> Message-ID: On Mon, Feb 6, 2012 at 11:07 AM, Mathias Panzenb?ck wrote: This is a good start, but still has a few issues. > ? ? ? ?def get(self, key, default=None): > ? ? ? ? ? ? ? ?try: > ? ? ? ? ? ? ? ? ? ? ? ?return getattr(self.obj, key, default) > ? ? ? ? ? ? ? ?except AttributeError: > ? ? ? ? ? ? ? ? ? ? ? ?raise KeyError(key) This will never raise KeyError. It needs to use a dedicated sentinel object so it can tell the difference between "default=None" and "default not supplied" and invoke getattr() accordingly. > ? ? ? ?def keys(self): > ? ? ? ? ? ? ? 
return iter(dir(self.obj)) > > def values(self): > for key in dir(self.obj): > yield getattr(self.obj, key) > > def items(self): > for key in dir(self.obj): > yield key, getattr(self.obj, key) These 3 methods should return views with the appropriate APIs rather than iterators. > def __repr__(self): > return repr(dict(self)) The appropriate output for str() and repr() is definitely open to question. Interaction with serialisation APIs such as pickle and json will also need investigation. These kinds of questions are why I think it is well worth exploring this concept on PyPI. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From julien at tayon.net Mon Feb 6 21:01:29 2012 From: julien at tayon.net (julien tayon) Date: Mon, 6 Feb 2012 21:01:29 +0100 Subject: [Python-ideas] matrix operations on dict :) Message-ID: Hello, Proposing vector operations on dict, and acknowledging there was an homeomorphism from rooted n-ary trees to dict, was inducing the possibility of making matrix of dict / trees. Since linear algebra on dict was coldly welcomed, I waited until I had some code to back me up before pushing my reasoning further, and it happily worked the way the books predicted. This was the reasoning: - dict <=> vector - Vectors + linear algebra <=> matrix - Most of Rooted Trees <=> dict( dict( ... ) ) ** - Matrix * Vector = Vector2 <=> Matrix * tree1 = Tree2 ** see here for a coded explanation http://readthedocs.org/docs/vectordict/en/latest/intro.html#homeomorphism-between-dict-and-k-ary-rooted-tree for a sample of API, code, and results, see here: http://readthedocs.org/docs/vectordict/en/latest/matrix.html#api dict of dict might not be the best way to make trees, but having matrix operations on dict means being able to transform trees into trees natively. 
The module is still quite a proof of concept, and it is not the implementation I advocate so much as the idea. Because: isn't transforming trees into trees quite a recurrent task in modern computer science, with key-value databases? Plus, matrix * tree being side-effect free, it is a good candidate for a canonical, parallelisable way to transform trees. And by the way, I implemented the matrix as a VectorDict, so ... we have matrix operations on matrices. ^_^ (Brace yourselves, InceptionMatrix is coming.) For the "not faint of heart" who are able to read un-Perlish, un-PEP8 code: http://pypi.python.org/pypi/VectorDict/0.3.0 my 2 euro cents (which of course are worth more than 2 US cents <:o) ), Cheers, PS: I am not sure that using defaultdict as a backend was the best idea of the century, but keys appearing in a dict after an addition were not very much in my idea of how a normal Python dict should behave. PPS: I will - if I still have time - code set operations on dict: issubset, diff, union, intersection. These are quite easy, but so unfun. Since I am not very gifted at explaining, I prefer to code and show the result later. -- Julien Tayon From yselivanov.ml at gmail.com Mon Feb 6 21:08:50 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 6 Feb 2012 15:08:50 -0500 Subject: [Python-ideas] unpacking context managers in WITH statement In-Reply-To: References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com> Message-ID: Well, native syntax would be more useful, but ContextStack seems like a decent workaround. Will it be included in the stdlib (py3.3)? On 2012-02-04, at 1:22 AM, Nick Coghlan wrote: > On Sat, Feb 4, 2012 at 1:09 AM, Yury Selivanov wrote: >> As of now, without "nested" we have either the option of reimplementing it, or writing lots of ugly code with nested 'try..except's. So the feature was taken out, but nothing replaced it. >> >> What do you think guys? 
>
> I think you should try contextlib2 :)
>
> Specifically, ContextStack:
> http://contextlib2.readthedocs.org/en/latest/index.html#contextlib2.ContextStack
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Mon Feb  6 21:54:38 2012
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 7 Feb 2012 06:54:38 +1000
Subject: [Python-ideas] unpacking context managers in WITH statement
In-Reply-To:
References: <6BCA6FFD-7B32-4AA1-949C-B41CE932471F@gmail.com>
Message-ID:

On Tue, Feb 7, 2012 at 6:08 AM, Yury Selivanov wrote:
> Well, native syntax would be more useful, but ContextStack seems like
> a decent workaround. Will it be included in the stdlib (py3.3)?

Most likely (I'm the primary maintainer of contextlib, so it's
basically my call). Feedback on what it's like to use in practice would
definitely help with that - I put it up on PyPI as contextlib2 so
people could try it out and help me avoid repeating the mistakes we
made with nested() (which was an error-prone bug trap).

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From simon.sapin at kozea.fr  Mon Feb  6 23:18:30 2012
From: simon.sapin at kozea.fr (Simon Sapin)
Date: Mon, 06 Feb 2012 23:18:30 +0100
Subject: [Python-ideas] matrix operations on dict :)
In-Reply-To:
References:
Message-ID: <4F3051B6.5060801@kozea.fr>

Le 06/02/2012 21:01, julien tayon a écrit :
> Proposing vector operations on dict, and acknowledging that there is
> a homeomorphism from rooted n-ary trees to dicts, opens the
> possibility of making matrices of dicts / trees.

Hi,

I studied linear algebra and I think I understand it fairly well.
However, after reading your email and the linked documentation, I'm
just confused. I really don't know what this is about.

I *think* that you are defining something like a mathematical group[1]
or ring[2], but:

* Over what elements? (Any dict? dicts with some property?)
* How exactly are your "addition" and "multiplication" (if any)
defined?
* Why? I'm sure I could come up with a well-defined but absurd (and
useless) "group", but why is yours interesting?

[1] http://en.wikipedia.org/wiki/Group_%28mathematics%29
[2] http://en.wikipedia.org/wiki/Ring_%28mathematics%29

Regards,
--
Simon Sapin

From simon.sapin at kozea.fr  Mon Feb  6 23:21:13 2012
From: simon.sapin at kozea.fr (Simon Sapin)
Date: Mon, 06 Feb 2012 23:21:13 +0100
Subject: [Python-ideas] A key parameter for heapq.merge
In-Reply-To: <4F0AC6B6.2040403@trueblade.com>
References: <4F0A116D.3030202@kozea.fr>
        <73792DF0-F128-437E-8AC8-A9F34D042FF4@gmail.com>
        <4F0AC596.60906@kozea.fr> <4F0AC6B6.2040403@trueblade.com>
Message-ID: <4F305259.3060705@kozea.fr>

Le 09/01/2012 11:51, Eric V. Smith a écrit :
> On 1/9/2012 5:46 AM, Simon Sapin wrote:
>> I just opened http://bugs.python.org/issue13742 , but I can't assign
>> it. (New account on the tracker.)
> I assigned it to Raymond.

Hi,

I think my latest patch on #13742 looks good. Is something else
missing?

Thanks,
--
Simon Sapin

From julien at tayon.net  Tue Feb  7 02:00:24 2012
From: julien at tayon.net (julien tayon)
Date: Tue, 7 Feb 2012 02:00:24 +0100
Subject: [Python-ideas] matrix operations on dict :)
In-Reply-To: <4F3051B6.5060801@kozea.fr>
References: <4F3051B6.5060801@kozea.fr>
Message-ID:

2012/2/6 Simon Sapin :
> Le 06/02/2012 21:01, julien tayon a écrit :
>
>> Proposing vector operations on dict, and acknowledging that there is
>> a homeomorphism from rooted n-ary trees to dicts, opens the
>> possibility of making matrices of dicts / trees.
>
> Hi,
>
> I studied linear algebra and I think I understand it fairly well.
> However, after reading your email and the linked documentation, I'm
> just confused. I really don't know what this is about.

Because there is some dust under the carpet. Let's define the notion of
dict(dict()) (rooted k-ary trees) as a vector.
Imagine : tree A
{ a:
    { b : 1,
      c : 2
    },
  e : 3.0
}

This is the same as vector B
dict(
    tuple([ 'a', 'b' ]) = 1,
    tuple([ 'a', 'c' ]) = 2,
    tuple([ e ]) = 3.0
)

By collapsing the path through the nested dicts to a single key (made
of the ordered list of keys leading to the value), you always fall back
on a dict of depth 1. I can construct A from B and B from A without any
loss of properties. Thus it is equivalent.

(Sparse) matrices are therefore built this way:
dict( tuple( source_tuple, destination_tuple ) = function )
(I could not resolve myself to code stupid matrices that have only a
magnitude instead of a function; therefore my matrices are not the ones
of linear algebra.)

So any dict of dicts is the same as a one-depth dict. A path to a value
defines a dimension, and paths are considered orthogonal. So further
reasoning will be made on vectors / depth-1 dicts.

There are two problems :

1) We can generate an infinite number of keys, so these are implicitly
incomplete vectors on an infinite base. It is as if, when you define
dict( x = 1 ), you also mean dict( x = 1, y = <null element for
multiplication and neutral for addition>, ..., dn = <null element for
multiplication and neutral for addition> ). And I hear the tao of
Python saying explicit is better than implicit. But a dict is
explicitly an incomplete vector on an infinite base.

2) The problem is in the algebra of the values/leaves, because the
implementation is made by delegating addition all the way up to the
value. Normally, we think of the values of a vector as
scalars/magnitudes (float/int for instance), and addition should have
no more consistency than the consistency of all the additions of all
the values in the dict. Unless you need something else. I intended to
be a little more consistent and to enforce the fact that no value
should have addition properties different from those of a ring.
However, I found no easy way to do this.
I would need all the classes to tell which algebra they support, by
means of a property that would tell whether Class A + Class B is
commutative/associative/distributive. Still, I can do interesting
things, so I don't really want to castrate the beast. (Plus, it would
mean writing a PEP proposal, which is beyond my abilities.)

> I *think* that you are defining something like a mathematical
> group[1] or ring[2], but:

As far as I am concerned, I am pretty confident in having all the
properties of a ring with + & * on dict. And I think I do. Tell me if I
need other tests here :
https://github.com/jul/ADictAdd_iction/blob/master/vector_dict/ConsistentAlgebrae.py#L169
or if I misunderstood any properties. I may have a good intuition, I
may recognize things when I see them, but I am clumsy with words.

> * Over what elements? (Any dict? dicts with some property?)

Well, you achieve better algebraic properties if the leaves belong at
least to a ring (float, numpy.array). But I have funny results with
records too. Since dimensions can be considered independent, as long as
the values have the same type all goes well, as long as you do
operations supported by the leaves. And if you know what you are doing
when mixing up dimensions, all goes well too. For instance, if with a
matrix you multiply a numpy.array by a weight (float), it makes sense.
If you multiply a string by a string, well, you get into trouble, and I
cowardly let the exception raise.

For distances between vectors I also run into trouble when dealing with
record-like algebras for + and * (list and string). I see an elegant
workaround, which would be to know what distance(record1 - record2) is
even though I ignore what record1 - record2 is. For instance, with two
strings or ordered lists, the distance can be defined as the edit
distance even though string1 - string2 is nonsense. But I don't know
yet how to make it fit into the puzzle. Because sometimes
norm( A - B ) makes more sense than A - B, I may need to refactor the
norm method.
At this point I may also need cooperation from the other classes :)

I may admit I have not thought of everything, and there can be some
holes in the racket, but it is promising, and I have had much fun using
it so far.

> * How exactly are your "addition" and "multiplication" (if any)
> defined?

By the following rules :

Addition : given two vectors (as defined before), we make the
assumption that these are vectors on an infinite base (made of all the
possible paths), and that when two vectors are added there are two
possible cases :
* if a key exists in both dicts : add the leaves
* if a key is in one dict only : create a leaf in the resulting dict
  with the value (therefore assuming that undefined keys are neutral
  for addition)

For multiplication I just accept either dict multiplication (a
non-existent path's default value being the null element) or
scalar/magnitude multiplication :
* if you do 2 * dict( x = val, y = val2 ) it will do
  dict( x = 2 * val, y = 2 * val2 ); if you do
  dict( x = val, y = val2 ) * 2 it will do
  dict( x = val * 2, y = val2 * 2 ) (as with vectors)
* if you do dict1 * dict2, any non-common keys being implicitly the
  zero element of multiplication, then for each common key the
  resulting value is the product of the leaves for this key
* if a key is present in only one dict, the resulting dict gets pruned
  of the key (unexpressed paths are set to the zero of multiplication)

(It is achieved with a silly overloading of + - / * on defaultdict; no
magic is made here.)

> * Why? I'm sure I could come up with a well-defined but absurd (and
> useless) "group", but why is yours interesting?

* it is fun (not an argument, I do agree)

* Ruby and Perl don't have it :) and I am close to coming up with a
jQuery-ish grammar of manipulation on trees made of nested dicts
(well, a real tree implemented with a parent property for each node,
plus attributes and a value, might be better suited)

* it gives results in the key/value database and JSON-ish context, à la
MongoDB, if you consider what map/reduce in a key/value database is. It
boils down to retrieving a dict of dicts, emitting a document (a dict
of dicts) by means of a projection or a matrix, and aggregating results
(addition, for instance, is widely used) in a reduce operation. For
this it may be quite useful: it factorises code into matrices that can
be stored, combined, added ...

- in web/text indexing you can split a text into a series of invariant
forms of words with their associated frequencies, and measure how close
they are to a keyword by using either Jaccard or cosine similarity. You
can already do it, out of the box. I am now fighting my way to see if I
can easily build correlation matrices from two dicts.

- a matrix being a vector, and a matrix changing trees into trees, you
can set a matrix as a matrix value, making a transformation in a
subtree; this is equivalent to, and easier than, composing functions.

- if you have a database of graphs that have no loops and a root, you
can query for similar paths, or find paths in the path (as long as they
can be expressed in the form of a dict of dicts ..) (I had too much
time; I coded a method to do it)

* it has applications :

- with the find method + projection + match_subtree, we have static
code analysis. Imagine an AST. If I am not mistaken, it can fit in a
tree (as a dict of dicts). You can find any exactly matching tree (for
instance a bad design pattern) or a close enough tree (using Jaccard or
cos) and say there might be a problem in the resulting path of the
code. You can also transform ASTs into ASTs, thus doing funny things
such as on-the-fly transformation of ASTs.

* would you like a rotation matrix for dict( x = , y = , z = ) ? or a
polar transformation ?

* would you like a map/reduce of a tree of products that gives the
total, the average, and the deviance in one pass ?
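The addition and multiplication rules stated above boil down to a few lines of plain Python. This is a minimal sketch of the described semantics (absent keys neutral for +, absorbing for *), not the VectorDict code; the names `add`, `mul`, and `scale` are invented here:

```python
def add(d1, d2):
    """Pointwise addition: a key missing on one side is treated as neutral."""
    out = dict(d1)
    for key, value in d2.items():
        out[key] = out[key] + value if key in out else value
    return out

def mul(d1, d2):
    """Pointwise product: keys present on only one side are pruned (zero)."""
    return {key: d1[key] * d2[key] for key in d1.keys() & d2.keys()}

def scale(scalar, d):
    """Scalar multiplication, as with ordinary vectors."""
    return {key: scalar * value for key, value in d.items()}

u = {'x': 1, 'y': 2}
v = {'y': 3, 'z': 4}
print(add(u, v))    # {'x': 1, 'y': 5, 'z': 4}
print(mul(u, v))    # {'y': 6}
print(scale(2, u))  # {'x': 2, 'y': 4}
```

With leaves drawn from a ring (here, ints), `add` is commutative and associative and `mul` distributes over it on the common support, which is what the ring-property tests linked earlier check.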
I can see some quirks though (since dict are unordered, in a matrix if a destination is included in an already existing destination, it should be forbidden since we cannot ensure the order of the operations, and this dooms the concept of matrix in matrix). I may lack a little precision in the wording still. A langage is as strong as its base types. Giving testoterone to a type (or creating a new powerfull one) is de facto strengthening a langage. ;) Regards, -- Julien From steve at pearwood.info Tue Feb 7 02:12:39 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 7 Feb 2012 12:12:39 +1100 Subject: [Python-ideas] matrix operations on dict :) In-Reply-To: References: Message-ID: <20120207011239.GC28570@ando> On Mon, Feb 06, 2012 at 09:01:29PM +0100, julien tayon wrote: > Hello, > > Proposing vector operations on dict, and acknowledging there was an > homeomorphism from rooted n-ary trees to dict, was inducing the > possibility of making matrix of dict / trees. This seems interesting to me, but I don't see that they are important enough to be built-in to dicts. At most, this could be a module in the standard library, but before that happens, you would have to prove the usefulness of the module. I suggest polishing it to a fit state to use in production, including tests, and putting it on PyPI. Once you can demonstrate some interest for it, then you can propose it gets added to the std lib. Otherwise, this looks rather like a library of functions looking for a use. It might help if you demonstrate what concrete problems this helps you solve. 
--
Steven

From raymond.hettinger at gmail.com  Tue Feb  7 03:17:09 2012
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Mon, 6 Feb 2012 18:17:09 -0800
Subject: [Python-ideas] matrix operations on dict :)
In-Reply-To:
References:
Message-ID: <84655D8D-B4D6-40AE-AA10-07781006E8C9@gmail.com>

On Feb 6, 2012, at 12:01 PM, julien tayon wrote:
> Proposing vector operations on dict, and acknowledging that there is
> a homeomorphism from rooted n-ary trees to dicts, opens the
> possibility of making matrices of dicts / trees.

And if you add tensor operations, the implementation can remain
independent of the system of reference :-)

Contravariantly yours,

Raymond

From massimo.dipierro at gmail.com  Tue Feb  7 03:43:44 2012
From: massimo.dipierro at gmail.com (Massimo Di Pierro)
Date: Mon, 6 Feb 2012 20:43:44 -0600
Subject: [Python-ideas] matrix operations on dict :)
In-Reply-To: <84655D8D-B4D6-40AE-AA10-07781006E8C9@gmail.com>
References: <84655D8D-B4D6-40AE-AA10-07781006E8C9@gmail.com>
Message-ID: <42BF1377-E42F-416A-BFEA-6048F3632A81@gmail.com>

On Feb 6, 2012, at 8:17 PM, Raymond Hettinger wrote:
> On Feb 6, 2012, at 12:01 PM, julien tayon wrote:
>> Proposing vector operations on dict, and acknowledging that there is
>> a homeomorphism from rooted n-ary trees to dicts, opens the
>> possibility of making matrices of dicts / trees.
>
> And if you add tensor operations, the implementation can remain
> independent of the system of reference :-)
>
> Contravariantly yours,
>
> Raymond

+1
From tjreedy at udel.edu  Tue Feb  7 04:20:14 2012
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 06 Feb 2012 22:20:14 -0500
Subject: [Python-ideas] matrix operations on dict :)
In-Reply-To:
References: <4F3051B6.5060801@kozea.fr>
Message-ID:

On 2/6/2012 8:00 PM, julien tayon wrote:
> Let's define the notion of dict(dict()) (rooted k-ary trees) as a
> vector. Imagine :
> tree A
> { a:
>     { b : 1,
>       c : 2
>     },
>   e : 3.0
> }
>
> This is the same as
> vector B
> dict(
>     tuple([ 'a', 'b' ]) = 1,
>     tuple([ 'a', 'c' ]) = 2,
>     tuple([ e ]) = 3.0
> )

Adding quotes makes it different from tree A. The difference is
important because a dict simulates a (sparse) array only if the keys
are ordered (or are ordered sequences of ordered objects). Strings are
ordered, but not general objects.

There is no need to write tuples as tuple(somelist):

dict(
  ('a', 'b') = 1
  ('a', 'c') = 2
  ('e',) = 3.0
)

Since your definitions of + and * on dicts do not use order, using the
terms 'vector' and 'matrix' just seems distracting to me. The only
thing you are extracting from them is the idea of component-wise
operations on collections. What is important is whether the operations
apply to the *values*.

Whenever one has a dict whose values are lists, it is common to start
with empty lists and add items to the list for each key with
d[key].append(val). You could imagine this operation as performing your
dict addition d + {key:[val]} in place and then performing standard
list addition in place (.extend). But thinking this way has limited
use. In actual application, the code is likely to be something like:

  for key, val in source:
      d.setdefault(key, []).append(val)

There are three points here.
1. These patterns are specific to subcategories of dicts.
2. For dicts (and sets and lists), in-place modification is more common
than creating new dicts (or sets or lists). Python is not mathematics,
and it is not a functional, side-effect-free language.
3.
The source of modifiers is usually an iterator -- a category rather than a class. The iterator does not have to be based on a dict and typically is not. The same points apply to lists. list1 + list2 is rare compared to list1.append(item) and list1.extend(iterable_of_items). And of course, both apply to all lists and objects and iterables, rather than specialized subcategories. -- Terry Jan Reedy From julien at tayon.net Tue Feb 7 11:24:17 2012 From: julien at tayon.net (julien tayon) Date: Tue, 7 Feb 2012 11:24:17 +0100 Subject: [Python-ideas] matrix operations on dict :) In-Reply-To: <20120207011239.GC28570@ando> References: <20120207011239.GC28570@ando> Message-ID: 2012/2/7 Steven D'Aprano : > This seems interesting to me, but I don't see that they are important > enough to be built-in to dicts. > > At most, this could be a module in the standard library, but before that > happens, you would have to prove the usefulness of the module. I suggest > polishing it to a fit state to use in production, including tests, and > putting it on PyPI. Once you can demonstrate some interest for it, then > you can propose it gets added to the std lib. > Of course, it's already on pypi, the unittest are being buit up, I just coded way too much stuff, so code coverage is slowly increasing. Since it's 90% syntaxic sugar, it is just a commodity for syntax of tree manipulation. I can improve the readability though. But, > Otherwise, this looks rather like a library of functions looking for a > use. It might help if you demonstrate what concrete problems this helps > you solve. Since 95% of the functions are method of a dict, I guess, we may call it an object. 
Cheers,

From julien at tayon.net  Tue Feb  7 11:45:04 2012
From: julien at tayon.net (julien tayon)
Date: Tue, 7 Feb 2012 11:45:04 +0100
Subject: [Python-ideas] matrix operations on dict :)
In-Reply-To:
References: <4F3051B6.5060801@kozea.fr>
Message-ID:

2012/2/7 Terry Reedy :
>
> Since your definitions of + and * on dicts do not use order, using
> the terms 'vector' and 'matrix' just seems distracting to me. The
> only thing you are extracting from them is the idea of component-wise
> operations on collections. What is important is whether the
> operations apply to the *values*.

I have checked that I can go back and forth without problems, but okay,
I forgot the quotes in A. I know it may have been disturbing, and I
apologize.

Well, order in the usual notation is a commodity for not repeating the
dimensions' names: it's easier to write [ 1, 2, 3 ] than to always
repeat [ x = 1, y = 2, z = 3 ]. I am just going back to the basis. Our
disagreement mainly comes from me forgetting the quotes in A.

> In actual application, the code is likely to be something like:
>   for key, val in source:
>       d.setdefault(key, []).append(val)

It does not propagate recursively, though. So adding
d1 = { 'a' : { 'b' : { 'c' : 1 }, 'd' : 2 } }
with
d2 = { 'a' : { 'b' : { 'c' : 2 }, 'd' : 1 } }
won't work with your example, but will work with my definition of
d1 + d2.

> There are three points here.
> 1. These patterns are specific to subcategories of dicts.

It makes sense for non-scalar values too; it was just already tough
trying to explain with scalars, so I limited myself to the simple case.

> 2. For dicts (and sets and lists), in-place modification is more
> common than creating new dicts (or sets or lists). Python is not
> mathematics, and it is not a functional, side-effect-free language.

I just have to switch a flag in the matrix operator to make it operate
in place. I was not sure which option was best.

> 3. The source of modifiers is usually an iterator -- a category
> rather than a class.
The iterator does not have to be based on a dict and typically is not.

Well, you are definitely right on this point.

> The same points apply to lists. list1 + list2 is rare compared to
> list1.append(item) and list1.extend(iterable_of_items). And of
> course, both apply to all lists and objects and iterables, rather
> than to specialized subcategories.

I also provide an iterator in the form
[ ( (path_to_value), value ), ... ] ^_^
Since it makes recursive calls, I don't like it, and I try not to make
it too obvious.

Cheers,
Julien

From sturla at molden.no  Tue Feb  7 19:19:36 2012
From: sturla at molden.no (Sturla Molden)
Date: Tue, 07 Feb 2012 19:19:36 +0100
Subject: [Python-ideas] matrix operations on dict :)
In-Reply-To:
References:
Message-ID: <4F316B38.7020608@molden.no>

On 06.02.2012 21:01, julien tayon wrote:
> Hello,
>
> Proposing vector operations on dict, and acknowledging that there is
> a homeomorphism from rooted n-ary trees to dicts, opens the
> possibility of making matrices of dicts / trees.
>
> Since linear algebra on dicts was coldly welcomed, I waited until I
> had some code to back me up before pushing my reasoning further, and
> it happily worked the way the books predicted.

Why would you want to use a hash table (Python dict) for linear
algebra? Not sure I can think of a worse data structure for the
purpose.

There is NumPy... And in the standard library there is an array
module... For matrix multiplication you can use DGEMM from any LAPACK
library if you don't like NumPy (e.g. by means of ctypes).

What really should be discussed is inclusion of NumPy in the standard
library (that is NumPy, not SciPy).
Sturla

From ubershmekel at gmail.com  Wed Feb  8 12:27:48 2012
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Wed, 8 Feb 2012 13:27:48 +0200
Subject: [Python-ideas] Add a recursive function to the glob package
Message-ID:

Many times I've wanted glob to give me all the "*.zip" or "*.py" or
"*.h" files in a directory *and subdirectories*, ever since I started
using Python 7 years ago.

I don't know if I'm the only one or not, but here's a patch:
http://bugs.python.org/issue13968

I'd love to hear feedback on the notion and implementation,

Yuval Greenfield

From ncoghlan at gmail.com  Wed Feb  8 13:08:38 2012
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 8 Feb 2012 22:08:38 +1000
Subject: [Python-ideas] Add a recursive function to the glob package
In-Reply-To:
References:
Message-ID:

On Wed, Feb 8, 2012 at 9:27 PM, Yuval Greenfield wrote:
> Many times I've wanted glob to give me all the "*.zip" or "*.py" or
> "*.h" files in a directory and subdirectories, ever since I started
> using Python 7 years ago.
>
> I don't know if I'm the only one or not, but here's a patch:
> http://bugs.python.org/issue13968
>
> I'd love to hear feedback on the notion and implementation,

walkdir [1] is designed to handle that use case and more.
>>> from walkdir import file_paths, filtered_walk
>>> paths = file_paths(filtered_walk('.', included_files=['*.py']))
>>> print('\n'.join(sorted(paths)))
./dist/walkdir-0.2.1/build/lib.linux-x86_64-2.7/walkdir.py
./dist/walkdir-0.2.1/docs/conf.py
./dist/walkdir-0.2.1/setup.py
./dist/walkdir-0.2.1/test_walkdir.py
./dist/walkdir-0.2.1/walkdir.py
./docs/conf.py
./setup.py
./test_walkdir.py
./walkdir.py

It's not completely certain yet, but there's a fair chance I'll be
adding at least a subset of the walkdir API to shutil in 3.3 (the idea
actually started as just adding os.filtered_walk() to shutil, but I
moved it to PyPI to give people an opportunity to try out the API).

[1] http://walkdir.readthedocs.org

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From julien at tayon.net  Wed Feb  8 13:12:32 2012
From: julien at tayon.net (julien tayon)
Date: Wed, 8 Feb 2012 13:12:32 +0100
Subject: [Python-ideas] matrix operations on dict :)
In-Reply-To: <4F316B38.7020608@molden.no>
References: <4F316B38.7020608@molden.no>
Message-ID:

> Why would you want to use a hash table (Python dict) for linear
> algebra?

* Because it naturally provides matrices. And matrices are an easy way
to formalize and standardize tree manipulations, which are a growing
concern in real-life computer craft.

* Because actual CS is precise but not exact, and metrics on objects
enable more exact comparison: == is the actual way to compare, and it
is precise, but metrics (cos, norm, dot) enable
is_close_to( value, modulo error ). For instance, no actual language
can tell whether two floats are equal, because there are error margins:
pi != 3.14159, but pi is close to 3.1 [+-.05]. Exactitude and precision
are not the same.

> Not sure I can think of a worse data structure for the purpose.

Well, it is the other way round.
The least-surprise principle is that + - * / behave the way they
usually do 90% of the time. In my very personal opinion, mathematical
signs should have been reserved in all languages for operations
analogous to mathematics. And linear algebra is one of the most
accepted behaviours for these symbols.

Since there is more than one way to add / mul / div / sub, in my very
own opinion each and every class defining these signs *should* tell
which arithmetic it supports, so that we can predict the behaviour of
the composition of these operations, and the conflicts. We have the
same symbols with different meanings; it is a degeneracy that should be
disambiguated, for instance in order to raise inconsistency exceptions.

> There is NumPy...
> And in the standard library there is an array module...

Which, unlike list(), support + - * / in the algebraic sense.

> For matrix multiplication you can use DGEMM from any LAPACK library
> if you don't like NumPy (e.g. by means of ctypes).

It would be stupid to code matrices with a hash. I just say that, since
there is a strong analogy between dicts and vectors, matrices that
operate on dicts exist as a result, and I can give them the meaning of
transforming rooted trees into rooted trees.

> What really should be discussed is inclusion of NumPy in the standard
> library (that is NumPy, not SciPy).

+1 for the inclusion of NumPy in the stdlib :) Even though I think it
would need a little syntactic sugar to make it more pythonic.

Cheers,
Jul

From ncoghlan at gmail.com  Wed Feb  8 13:36:55 2012
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 8 Feb 2012 22:36:55 +1000
Subject: [Python-ideas] matrix operations on dict :)
In-Reply-To:
References: <4F316B38.7020608@molden.no>
Message-ID:

On Wed, Feb 8, 2012 at 10:12 PM, julien tayon wrote:
>> What really should be discussed is inclusion of NumPy in the
>> standard library (that is NumPy, not SciPy).
>>
> +1 for the inclusion of NumPy in the stdlib :)

There's more to stdlib inclusion than "hey, wouldn't it be nice if <X>
was part of the stdlib?". It needs to make sense to do so, usually by
providing a tangible benefit to the overall Python ecosystem.

For smaller projects (especially predominantly single-person projects),
stdlib adoption comes with a guarantee of some level of long-term
support (in particular, making sure the module continues to work with
newer versions of Python and on newer operating system releases).

That isn't really the case with NumPy - it has a sizable developer base
of its own, along with solid backing from Enthought. Incorporation into
the standard library would be a *lot* of pain for minimal gain.
If it helps, just consider SciPy Python's "stdlib++" if you're doing
any kind of heavy number crunching with Python. There's a reason the
PyPy folks were able to raise money to sponsor their NumPyPy
compatibility effort - it's because the SciPy ecosystem is centred
around NumPy, and NumPyPy promises to let developers enjoy the benefits
of PyPy without losing access to SciPy.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From sturla at molden.no  Wed Feb  8 17:08:29 2012
From: sturla at molden.no (Sturla Molden)
Date: Wed, 08 Feb 2012 17:08:29 +0100
Subject: [Python-ideas] matrix operations on dict :)
In-Reply-To:
References: <4F316B38.7020608@molden.no>
Message-ID: <4F329DFD.5000109@molden.no>

On 08.02.2012 13:36, Nick Coghlan wrote:
> That isn't really the case with NumPy - it has a sizable developer
> base of its own, along with solid backing from Enthought.

I think you got this the wrong way around. Inclusion in the stdlib
requires long-term support; it is not a way to ensure long-term support
for projects that don't have it. (And NumPy is likely to be supported
for a very long time.)

> Incorporation into the standard library would be a *lot* of pain for
> minimal gain. If it helps, just consider SciPy Python's "stdlib++" if
> you're doing any kind of heavy number crunching with Python. There's
> a reason the PyPy folks were able to raise money to sponsor their
> NumPyPy compatibility effort - it's because the SciPy ecosystem is
> centred around NumPy, and NumPyPy promises to let developers enjoy
> the benefits of PyPy without losing access to SciPy.

NumPy is not just for number-crunching. It is also a general memory
abstraction, a mutable container for any kind of binary data, for any
kind of bit and byte fiddling, reading and parsing binary data, memory
mapping binary files, etc. Other use-cases are computer graphics and
image processing.

Sturla

From sturla at molden.no  Wed Feb  8 17:21:12 2012
From: sturla at molden.no (Sturla Molden)
Date: Wed, 08 Feb 2012 17:21:12 +0100
Subject: [Python-ideas] matrix operations on dict :)
In-Reply-To:
References: <4F316B38.7020608@molden.no>
Message-ID: <4F32A0F8.5070408@molden.no>

On 08.02.2012 13:12, julien tayon wrote:
> * Because it naturally provides matrices. And matrices are an easy
> way to formalize and standardize tree manipulations, which are a
> growing concern in real-life computer craft.

No, it naturally provides a hash table, which is a simple in-memory
database, not a matrix.

> In my very personal opinion, mathematical signs should have been
> reserved in all languages for operations analogous to mathematics.
> And linear algebra is one of the most accepted behaviours for these
> symbols.

There is a world beyond linear algebra. Sometimes we need to do things
that cannot easily be fitted into the semantics of matrix operations.
And for those who can only think in terms of matrices there are
languages called Matlab, Scilab, and Octave.

> It would be stupid to code matrices with a hash. I just say that
> there is a strong analogy between dicts and vectors,

No, there is not. A vector is ordered; a hash table (dict) is
unordered.

- In a vectorlike structure, e.g.
a Python list, element i+1 is stored subsequently to element i.
- In a hash table, e.g. a Python dict, element hash(i+1) is not stored
subsequently to element hash(i).

Sturla

From masklinn at masklinn.net  Wed Feb  8 23:00:39 2012
From: masklinn at masklinn.net (Masklinn)
Date: Wed, 8 Feb 2012 23:00:39 +0100
Subject: [Python-ideas] Optional key to `bisect`'s functions?
Message-ID: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net>

The ``bisect`` stuff is pretty neat, although probably underused
(especially the insorts), but their usefulness is limited by the
requirement that the lists directly contain sortable items, as opposed
to ``sorted`` or ``list.sort``.

It's possible to "use" them by copy/pasting the (Python) functions into
the project/library code and adding either a custom key directly or a
key function, but while this can still yield an order-of-magnitude
speed gain over post-sorting sequences, it's cumbersome and it loses
the advantage of _bisect's accelerators.

Therefore, I believe it would be pretty neat to add an optional
``key=`` keyword (only?) argument, with the same semantics as in
``sorted``. It would make ``bisect`` much easier to use, especially
instead of append + sorted combinations. The key should work for both
the insertion functions and the bisection search ones.

Thoughts?

From amauryfa at gmail.com  Wed Feb  8 23:18:54 2012
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Wed, 8 Feb 2012 23:18:54 +0100
Subject: [Python-ideas] Optional key to `bisect`'s functions?
In-Reply-To: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net>
References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net>
Message-ID:

Hi,

2012/2/8 Masklinn
> The ``bisect`` stuff is pretty neat, although probably underused
> (especially the insorts), but their usefulness is limited by the
> requirement that the lists directly contain sortable items, as
> opposed to ``sorted`` or ``list.sort``.
> > It's possible to "use" them by copy/pasting the (Python) functions > into the project/library code and adding either a custom key directly > or a key function, but while this can still yield an > order-of-magnitude speed gain over post-sorting sequences, it's > cumbersome and it loses the advantage of _bisect's accelerators. > > Therefore, I believe it would be pretty neat to add an optional > ``key=`` keyword (only?) argument, with the same semantics as in > ``sorted``. It would make ``bisect`` much easier to use especially > in stead of append + sorted combinations. The key should work for > both insertion functions and bisection search ones. > bisect key This was proposed several times on the issue tracker (search for "bisect key"), and these proposals have always been rejected: http://bugs.python.org/issue4356 http://bugs.python.org/issue1451588 http://bugs.python.org/issue3374 The last one summarizes the reasons of the rejection. The documentation (http://docs.python.org/library/bisect.html, "see also") contains a link to a "SortedCollection" recipe. I haven't looked at the SortedCollection class in detail, but you could try to have it included in the stdlib... -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From dreamingforward at gmail.com Thu Feb 9 01:03:57 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Wed, 8 Feb 2012 17:03:57 -0700 Subject: [Python-ideas] [Python-Dev] matrix operations on dict :) In-Reply-To: References: Message-ID: On Wed, Feb 8, 2012 at 9:54 AM, julien tayon wrote: > 2012/2/7 Mark Janssen : > > On Mon, Feb 6, 2012 at 6:12 PM, Steven D'Aprano > wrote: > > > > I have the problem looking for this solution! > > > { "a" : 1 } + { "a" : { "b" : 1 } } == KABOOM. This a counter example > proving it does not handle all structures. > > Ah, but I already anticipated this. One just has to decide the relationship between the *group* and the *atomic*. 
(These are key words that you can find out about at pangaia.sf.net "grouping model"). Admittedly, this might be arbitrary, but once decided you get the full power of the recursive data structure. It's kind of like defining the base case of factorial. The math (in my world) simply decided that factorial(0)=1 as the convention of "an empty product" (Wikipedia::Factorial). But, in theory, it should work and provide considerable power. Since it's all arbitrary one shouldn't get hung up too much on which convention is adopted, even though it will have to be followed thereafter. But "practice beats purity", as they say... :) mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From dreamingforward at gmail.com Thu Feb 9 01:08:42 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Wed, 8 Feb 2012 17:08:42 -0700 Subject: [Python-ideas] [Python-Dev] matrix operations on dict :) In-Reply-To: References: Message-ID: I wrote: > But, in theory, it should work and provide considerable power. Since it's > all arbitrary one shouldn't get hung up too much on which convention is > adopted, even though it will have to be followed thereafter. But "practice > beats purity", as they say... :) > > Oh, I should give my suggestion: That when a "non-named" atomic constant is added to a grouping (i.e. dict), a special key called "anon" (or perhaps the built-in None as the special key, which would actually work without ambiguity to other parts of Python) is created that holds the constant. Cheers! mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Feb 9 01:25:27 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 8 Feb 2012 16:25:27 -0800 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: Hmm... I disagree with Raymond's rejection of the proposed feature.
I have come across use cases for this functionality multiple times in real life. Basically Raymond says "bisect can call the key() function many times, which leads to bad design". His alternative, to use a list of (key, value) tuples, is often a bit clumsy when passing the sorted list to another function (e.g. for printing); having to transform the list using e.g. [v for (k, v) in a] feels clumsy and suboptimal. So I'm not sure that refusing the key= option always leads to the best design (in the sense of the most readable code). Adding key= is particularly attractive since the current invariant is something like "if a == sorted(a) before the operation, then a == sorted(a) after the operation". Adding a key= option would simply change that to sorted(a, key=key) on both counts. Also note that "many times" is actually O(log N) per insertion, which isn't so bad. The main use case for bisect() is to manage a list that sees updates *and* iterations -- otherwise building the list unsorted and sorting it at the end would make more sense. The key= option provides a balance between the cost/elegance for updates and for iterations. --Guido On Wed, Feb 8, 2012 at 2:18 PM, Amaury Forgeot d'Arc wrote: > Hi, > > 2012/2/8 Masklinn > >> The ``bisect`` stuff is pretty neat, although probably underused >> (especially the insorts), but their usefulness is limited by the >> requirement that the lists directly contain sortable items, as opposed >> to ``sorted`` or ``list.sort``. >> >> It's possible to "use" them by copy/pasting the (Python) functions >> into the project/library code and adding either a custom key directly >> or a key function, but while this can still yield an >> order-of-magnitude speed gain over post-sorting sequences, it's >> cumbersome and it loses the advantage of _bisect's accelerators. >> >> Therefore, I believe it would be pretty neat to add an optional >> ``key=`` keyword (only?) argument, with the same semantics as in >> ``sorted``.
It would make ``bisect`` much easier to use especially >> in stead of append + sorted combinations. The key should work for >> both insertion functions and bisection search ones. >> bisect key > > > This was proposed several times on the issue tracker (search for "bisect > key"), > and these proposals have always been rejected: > http://bugs.python.org/issue4356 > http://bugs.python.org/issue1451588 > http://bugs.python.org/issue3374 > The last one summarizes the reasons of the rejection. > The documentation (http://docs.python.org/library/bisect.html, "see also") > contains a link to a "SortedCollection" recipe. > > I haven't looked at the SortedCollection class in detail, but you could > try to have it included in the stdlib... > > -- > Amaury Forgeot d'Arc > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Feb 9 01:51:17 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 09 Feb 2012 11:51:17 +1100 Subject: [Python-ideas] matrix operations on dict :) In-Reply-To: <4F32A0F8.5070408@molden.no> References: <4F316B38.7020608@molden.no> <4F32A0F8.5070408@molden.no> Message-ID: <4F331885.3000305@pearwood.info> Sturla Molden wrote: > On 08.02.2012 13:12, julien tayon wrote: >> It is stupid to code matrix with an hash, I just say as there is a >> strong analogy between dict and vectors, > > No there is not. A vector is ordered, a hash-table (dict) is unordered. > > - In a vectorlike structure, e.g. a Python list, element i+1 is stored > subsequently to element i. Not necessarily. There is nothing in the API for Python lists that *requires* that elements are stored in one continuous array. That's a side-effect of the implementation. > - In a hash-table, e.g. 
a Python dict, element hash(i+1) is not stored > subsequently to element hash(i). You are focusing too much on accidental implementation details and not enough on the fundamental concept of "vector" or "hash table". Fundamentally, a "dict" is a data structure that associates arbitrary keys to values, such that each key is unique but values may not be. Note that the use of a hash table for dicts (mappings) is just one possible implementation. Fundamentally a list is a mapping from sequential (and therefore unique) integer keys (the indexes) to values. Note that a linear array with the key (index) being implicit rather than explicit is just one possible implementation. I have no opinion on whether Julien's proposal is useful or not, but vectors (lists) can be implemented using mappings (dicts). Lua is proof of this: their table type operates as both list and dict. http://lua-users.org/wiki/TablesTutorial -- Steven From stephen at xemacs.org Thu Feb 9 03:13:40 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 09 Feb 2012 11:13:40 +0900 Subject: [Python-ideas] [Python-Dev] matrix operations on dict :) In-Reply-To: References: Message-ID: <87d39oycrv.fsf@uwakimon.sk.tsukuba.ac.jp> Mark Janssen writes: > The math (in my world) simply decided that factorial(0)=1 as the > convention of "an empty product" (Wikipedia::Factorial). In modern math (ie, post-Eilenberg-Mac Lane), it's not really a convention (unlike, say, Euclid's Parallel Postulate); it's the only way to go if you want the idea of product to generalize. If you don't understand that, I have serious doubts that you know what you're talking about. If you do understand that, please take care to be more precise. From tjreedy at udel.edu Thu Feb 9 03:39:49 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 08 Feb 2012 21:39:49 -0500 Subject: [Python-ideas] Optional key to `bisect`'s functions? 
In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: On 2/8/2012 7:25 PM, Guido van Rossum wrote: > Hmm... I disagree with Raymond's rejection of the proposed feature. I > have come across use cases for this functionality multiple time in real > life. > > Basically Raymond says "bisect can call the key() function many times, > which leads to bad design". His alternative, to use a list of (key, > value) tuples, is often a bit clumsy when passing the sorted list to > another function (e.g. for printing); having to transform the list using > e.g. [v for (k, v) in a] feels clumsy and suboptimal. So I'm not sure > that refusing the key= option always leads to the best design (in the > sense of the most readable code). An alternative to the n x 2 array is two n-arrays or a 2 x n array. Then there is no problem using either the keys or the values array. To use insort_right or insort_left for this, they would have to return the insertion position instead of None. Right now one must use bisect_right or _left and then .insert into both arrays instead of just the vals array. > Adding key= is particularly attractive since the current invariant is > something like "if a == sorted(a) before the operation, then a == > sorted(a) after the operation". Adding a key= option would simply change > that to sorted(a, key=key) on both counts. > > Also note that "many times" is actually O(log N) per insertion, which > isn't so bad. The main use case for bisect() is to manage a list that > sees updates *and* iterations -- otherwise building the list unsorted > and sorting it at the end would make more sense. The key= option > provides a balance between the cost/elegance for updates and for iterations. For *large enough* lists, the O(n*n) cost of insertions will dominate the O(n*logN) key() calls, so reducing the latter to O(n) key calls will not matter. 
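[Editor's note: the two-parallel-lists bookkeeping Terry describes above can be sketched as follows. This is an illustration only -- the class name and API are invented for the example -- showing why each insertion needs a bisect call plus two separate list inserts when insort cannot take a key and does not return the insertion position.]

```python
import bisect

class SortedByKey:
    """Keep `values` sorted by key(value), using two parallel lists."""

    def __init__(self, key):
        self._key = key
        self._keys = []   # sorted keys, used only for bisection
        self.values = []  # values, kept in the same order as _keys

    def insert(self, value):
        k = self._key(value)
        # bisect_right gives the insertion index; we must then insert
        # into *both* lists -- the three-statement clumsiness noted above.
        i = bisect.bisect_right(self._keys, k)
        self._keys.insert(i, k)
        self.values.insert(i, value)

s = SortedByKey(key=len)
for word in ["ccc", "a", "bb"]:
    s.insert(word)
print(s.values)  # ['a', 'bb', 'ccc']
```

The helper `_keys` list is exactly the "undecoration" burden discussed in this thread: it must be maintained alongside the real data and discarded before the values are handed to other code.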
-- Terry Jan Reedy From tjreedy at udel.edu Thu Feb 9 03:42:53 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 08 Feb 2012 21:42:53 -0500 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: On 2/8/2012 5:18 PM, Amaury Forgeot d'Arc wrote: > This was proposed several times on the issue tracker (search for "bisect > key"), > and these proposals have always been rejected: > http://bugs.python.org/issue4356 > http://bugs.python.org/issue1451588 > http://bugs.python.org/issue3374 Do these all suggest a specific api and if so, do they agree? > The last one summarizes the reasons of the rejection. > The documentation (http://docs.python.org/library/bisect.html, "see also") > contains a link to a "SortedCollection" recipe. > > I haven't looked at the SortedCollection class in detail, but you could > try to have it included in the stdlib... > -- Terry Jan Reedy From tjreedy at udel.edu Thu Feb 9 03:44:56 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 08 Feb 2012 21:44:56 -0500 Subject: [Python-ideas] matrix operations on dict :) In-Reply-To: <4F331885.3000305@pearwood.info> References: <4F316B38.7020608@molden.no> <4F32A0F8.5070408@molden.no> <4F331885.3000305@pearwood.info> Message-ID: On 2/8/2012 7:51 PM, Steven D'Aprano wrote: > Not necessarily. There is nothing in the API for Python lists that > *requires* that elements are stored in one continuous array. That's a > side-effect of the implementation. I believe NumPy uses multiple blocks, as does deque. -- Terry Jan Reedy From masklinn at masklinn.net Thu Feb 9 09:45:24 2012 From: masklinn at masklinn.net (Masklinn) Date: Thu, 9 Feb 2012 09:45:24 +0100 Subject: [Python-ideas] Optional key to `bisect`'s functions? 
In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: <490602D8-AF8E-420D-ADAD-A3B7A8171E75@masklinn.net> On 2012-02-09, at 01:25 , Guido van Rossum wrote: > Hmm... I disagree with Raymond's rejection of the proposed feature. I have > come across use cases for this functionality multiple time in real life. > > Basically Raymond says "bisect can call the key() function many times, > which leads to bad design". His alternative, to use a list of (key, value) > tuples, is often a bit clumsy when passing the sorted list to another > function (e.g. for printing); having to transform the list using e.g. [v > for (k, v) in a] feels clumsy and suboptimal. Yes, this is the kind of thing which prompted my original email. It is even clumsier when there are many (smaller) lists to manipulate and insert into in turn, and requires two verbose and potentially expensive phases of decoration and undecoration. Using two separate lists has similar (though simpler) issues, especially when producing API-related structures, as the "helper" collection must be cleaned up during an undecoration phase. And as Terry notes, this gets very clumsy as each candidate insertion now requires three statements (a call to bisect_right to get the insertion index, followed by an insertion into the helper list and another one for the actual list). From masklinn at masklinn.net Thu Feb 9 09:53:48 2012 From: masklinn at masklinn.net (Masklinn) Date: Thu, 9 Feb 2012 09:53:48 +0100 Subject: [Python-ideas] Optional key to `bisect`'s functions?
In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: On 2012-02-09, at 03:42 , Terry Reedy wrote: > On 2/8/2012 5:18 PM, Amaury Forgeot d'Arc wrote: >> This was proposed several times on the issue tracker (search for "bisect >> key"), >> and these proposals have always been rejected: >> http://bugs.python.org/issue4356 >> http://bugs.python.org/issue1451588 >> http://bugs.python.org/issue3374 > > Do these all suggest a specific api and if so, do they agree? http://bugs.python.org/issue4356 Suggests a ``key=`` argument behaving as with ``sorted`` and ``list.sort``: collection values are decorated with the key before comparisons. This is exactly my original email. http://bugs.python.org/issue1451588 Suggests a ``cmp=`` argument (the proposal precedes Python 3 and ``key=`` taking over) to use instead of the built-in comparison operator. http://bugs.python.org/issue3374 Suggests all of ``cmp=`` (again moot, since ``cmp=`` was dropped in Python 3), ``key=`` and ``reverse=``. In summary, all three suggest following the existing API of ``list.sort`` and ``sorted``, and at least implementing its ``key=`` argument (I am taking issue1451588 as doing so, since it suggests the mechanism and argument which preceded ``key``). From robert.kern at gmail.com Thu Feb 9 12:20:38 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 09 Feb 2012 11:20:38 +0000 Subject: [Python-ideas] matrix operations on dict :) In-Reply-To: References: <4F316B38.7020608@molden.no> <4F32A0F8.5070408@molden.no> <4F331885.3000305@pearwood.info> Message-ID: On 2/9/12 2:44 AM, Terry Reedy wrote: > On 2/8/2012 7:51 PM, Steven D'Aprano wrote: > >> Not necessarily. There is nothing in the API for Python lists that >> *requires* that elements are stored in one continuous array. That's a >> side-effect of the implementation. > > I believe NumPy uses multiple blocks, as does deque.
numpy uses uniformly strided memory starting from a single memory location, which is not quite the same as using multiple blocks. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From arnodel at gmail.com Thu Feb 9 15:27:20 2012 From: arnodel at gmail.com (Arnaud Delobelle) Date: Thu, 9 Feb 2012 14:27:20 +0000 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: On 9 February 2012 00:25, Guido van Rossum wrote: > Basically Raymond says "bisect can call the key() function many times, which > leads to bad design". His alternative, to use a list of (key, value) tuples, > is often a bit clumsy when passing the sorted list to another function (e.g. > for printing); having to transform the list using e.g. [v for (k, v) in a] > feels clumsy and suboptimal. Also, in Python 3 one can't assume that values will be comparable so the (key, value) tuple trick won't work: comparing the tuples may well throw a TypeError. Here's a simple example below. The class 'Person' has no natural order, but we may want to keep a list of people sorted by iq: >>> class Person: ... def __init__(self, height, iq): ... self.height = height ... self.iq = iq ... 
>>> arno = Person(184, 101) >>> guido = Person(179, 185) >>> steve = Person(168, 101) >>> key = lambda p: p.iq >>> people = [] >>> bisect.insort(people, (key(arno), arno)) >>> bisect.insort(people, (key(guido), guido)) >>> bisect.insort(people, (key(steve), steve)) Traceback (most recent call last): File "", line 1, in TypeError: unorderable types: Person() < Person() >>> -- Arnaud From sturla at molden.no Thu Feb 9 15:27:29 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 09 Feb 2012 15:27:29 +0100 Subject: [Python-ideas] matrix operations on dict :) In-Reply-To: References: <4F316B38.7020608@molden.no> <4F32A0F8.5070408@molden.no> <4F331885.3000305@pearwood.info> Message-ID: <4F33D7D1.5060800@molden.no> On 09.02.2012 03:44, Terry Reedy wrote: > I believe NumPy uses multiple blocks, as does deque. No it does not (see Robert Kern's reply). But a lot of numerical codes in C or Java do, using an array of pointers (C) or an array of arrays (Java) to emulate a two dimensional array, particularly numerical code written by amateurs. I've also seen this in Python, using (heaven forbid) lists of lists as a 2D array replacement. It is sad that Numerical Receipes encourages this coding style. (Actually the third edition does not, but it is not sufficient to remedy the damage.) Those who don't understand why "jagged arrays" can be a problem should stick to Matlab, Fortran or NumPy. Sturla From techtonik at gmail.com Thu Feb 9 15:36:40 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 9 Feb 2012 17:36:40 +0300 Subject: [Python-ideas] Python 3000 TIOBE -3% Message-ID: Hi, I didn't want to grow FUD on python-dev, but a FUD there seems to be a good topic for discussion here. http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html As you may see, Python is losing its positions. 
I blame Python 3 and that Python development is not concentrating on users enough [1], and that there is big resistance to getting things done (/moin/ prefix story) and the whole communication process is a bit discouraging. If it is not the cause, then the cause is the lack of visibility into the real problem, but what is the real problem? I guess the topic is for the upcoming language summit at PyCon, but it will be hard for me to get there this year from Belarus, so it would be nice to read some opinions here. 1. http://python-for-humans.heroku.com/ -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From masklinn at masklinn.net Thu Feb 9 16:05:09 2012 From: masklinn at masklinn.net (Masklinn) Date: Thu, 9 Feb 2012 16:05:09 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: <0E6116E4-A406-4C49-A8C1-C18D6228C141@masklinn.net> On 2012-02-09, at 15:36 , anatoly techtonik wrote: > Hi, > > I didn't want to grow FUD on python-dev, but a FUD there seems to be a good > topic for discussion here. > http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html 1.
Python-ideas is not the right place for this stuff (neither is > Python-dev, by the way) > 2. Why would anybody care exactly? > 1. Where would be the correct place to talk about a grand state of python affairs? 2. Like it or not, many use such ratings to decide which language to learn, which language to use for their next project and whether or not to be proud of their language of choice. I think it's important for python to be popular and good. One without the other isn't too useful. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Feb 9 16:19:23 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 9 Feb 2012 16:19:23 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% References: Message-ID: <20120209161923.4417cbae@pitrou.net> On Thu, 9 Feb 2012 17:36:40 +0300 anatoly techtonik wrote: > Hi, > > I didn't want to grow FUD on python-dev, but a FUD there seems to be a good > topic for discussion here. It isn't. From stefan_ml at behnel.de Thu Feb 9 16:24:53 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 09 Feb 2012 16:24:53 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <0E6116E4-A406-4C49-A8C1-C18D6228C141@masklinn.net> Message-ID: Yuval Greenfield, 09.02.2012 16:13: > On Thu, Feb 9, 2012 at 5:05 PM, Masklinn wrote: >> On 2012-02-09, at 15:36 , anatoly techtonik wrote: >>> I didn't want to grow FUD on python-dev, but a FUD there seems to be a >>> good topic for discussion here. >>> http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html >> >> 1. Python-ideas is not the right place for this stuff (neither is >> Python-dev, by the way) >> 2. Why would anybody care exactly? > > 1. Where would be the correct place to talk about a grand state of > python affairs? The right place to discuss "most things Python" is python-list, aka. comp.lang.python. 
Stefan From nathan.alexander.rice at gmail.com Thu Feb 9 16:31:51 2012 From: nathan.alexander.rice at gmail.com (Nathan Rice) Date: Thu, 9 Feb 2012 10:31:51 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <0E6116E4-A406-4C49-A8C1-C18D6228C141@masklinn.net> Message-ID: > Where would be the correct place to talk about a grand state of python > affairs? > Like it or not, many use such ratings to decide which language to learn, > which language to use for their next project and whether or not to be proud > of their language of choice. > > I think it's important for python to be popular and good. One without the > other isn't too useful. The reason python is slipping in the index is the same reason that its popularity doesn't matter (much). Wrapper-generating tools, cross-language interfaces and whatnot are making "polyglot" programming a pretty simple affair these days... The TIOBE index for the most part has two distinct groups: languages that people use at work, where risk aversion is a large driving force (see Java, C/C++, PHP), and languages people use personally because they enjoy programming in them. Because the library issue for a new or less popular language is not as big a deal as it once was, people have more freedom in their choice, and that is reflected in the diversification of "fun" languages. JavaScript is an outlier here: you don't have a choice if you target the browser. Nathan From solipsis at pitrou.net Thu Feb 9 16:47:17 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 9 Feb 2012 16:47:17 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% References: <20120209161923.4417cbae@pitrou.net> Message-ID: <20120209164717.237d2c5a@pitrou.net> On Thu, 9 Feb 2012 16:19:23 +0100 Antoine Pitrou wrote: > On Thu, 9 Feb 2012 17:36:40 +0300 > anatoly techtonik > wrote: > > Hi, > > > > I didn't want to grow FUD on python-dev, but a FUD there seems to be a good > > topic for discussion here. > > It isn't.
And to elaborate a bit, here's the description of the python-ideas list: "This list is to contain discussion of speculative language ideas for Python for possible inclusion into the language. If an idea gains traction it can then be discussed and honed to the point of becoming a solid proposal to put to either python-dev or python-3000 as appropriate." (*) python-ideas is not a catchall for random opinions about Python. (*) someone should really remove that python-3000 reference From benjamin at python.org Thu Feb 9 16:50:10 2012 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 9 Feb 2012 15:50:10 +0000 (UTC) Subject: [Python-ideas] Python 3000 TIOBE -3% References: Message-ID: anatoly techtonik writes: > As you may see, Python is losing its positions. I blame Python 3 and that Python development is not concentrating on users enough [1], and that there is a big resistance in getting the things done (/moin/ prefix story) and the whole communication process is a bit discouraging. Indeed. What would you suggest to alleviate that? From ehlesmes at gmail.com Thu Feb 9 17:40:11 2012 From: ehlesmes at gmail.com (Edward Lesmes) Date: Thu, 9 Feb 2012 11:40:11 -0500 Subject: [Python-ideas] map iterator Message-ID: An iterator version of map should be available for large sets of data. -- Edward Lesmes -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Feb 9 17:41:44 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 08:41:44 -0800 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: Bingo. That clinches it. We need to add key=. On Thu, Feb 9, 2012 at 6:27 AM, Arnaud Delobelle wrote: > On 9 February 2012 00:25, Guido van Rossum wrote: > > Basically Raymond says "bisect can call the key() function many times, > which > > leads to bad design".
His alternative, to use a list of (key, value) > tuples, > > is often a bit clumsy when passing the sorted list to another function > (e.g. > > for printing); having to transform the list using e.g. [v for (k, v) in > a] > > feels clumsy and suboptimal. > > Also, in Python 3 one can't assume that values will be comparable so > the (key, value) tuple trick won't work: comparing the tuples may well > throw a TypeError. Here's a simple example below. The class 'Person' > has no natural order, but we may want to keep a list of people sorted > by iq: > > >>> class Person: > ... def __init__(self, height, iq): > ... self.height = height > ... self.iq = iq > ... > >>> arno = Person(184, 101) > >>> guido = Person(179, 185) > >>> steve = Person(168, 101) > >>> key = lambda p: p.iq > >>> people = [] > >>> bisect.insort(people, (key(arno), arno)) > >>> bisect.insort(people, (key(guido), guido)) > >>> bisect.insort(people, (key(steve), steve)) > Traceback (most recent call last): > File "", line 1, in > TypeError: unorderable types: Person() < Person() > >>> > > -- > Arnaud > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Thu Feb 9 17:47:03 2012 From: phd at phdru.name (Oleg Broytman) Date: Thu, 9 Feb 2012 20:47:03 +0400 Subject: [Python-ideas] map iterator In-Reply-To: References: Message-ID: <20120209164703.GB15324@iskra.aviel.ru> On Thu, Feb 09, 2012 at 11:40:11AM -0500, Edward Lesmes wrote: > An iterator version of map should be available for large sets of data. itertools.imap Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
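[Editor's note: a pure-Python sketch of the proposed ``key=`` semantics, using Arnaud's Person example. `insort_right_key` is an invented name for illustration -- at the time of this thread the stdlib `bisect` functions had no such argument. Only key() results are ever compared, so the stored items themselves need not be orderable, and key() is called O(log n) times per insertion.]

```python
def insort_right_key(a, x, key):
    """Insert x into a, which is assumed sorted by key(item).

    Mirrors bisect.insort_right, but compares key() results instead of
    the items themselves (the ``key=`` semantics proposed in this thread).
    """
    lo, hi = 0, len(a)
    kx = key(x)
    while lo < hi:
        mid = (lo + hi) // 2
        if kx < key(a[mid]):  # only keys are compared, never the items
            hi = mid
        else:
            lo = mid + 1
    a.insert(lo, x)

class Person:
    def __init__(self, height, iq):
        self.height = height
        self.iq = iq

# Unlike the (key, value) tuple trick, this never tries Person < Person.
people = []
for p in (Person(184, 101), Person(179, 185), Person(168, 101)):
    insort_right_key(people, p, key=lambda p: p.iq)
print([p.iq for p in people])  # [101, 101, 185]
```

Note that equal keys keep right-bisection behaviour: a new item is inserted after existing items with the same key, matching `bisect.insort_right`.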
From malaclypse2 at gmail.com Thu Feb 9 17:48:27 2012 From: malaclypse2 at gmail.com (Jerry Hill) Date: Thu, 9 Feb 2012 11:48:27 -0500 Subject: [Python-ideas] map iterator In-Reply-To: References: Message-ID: On Thu, Feb 9, 2012 at 11:40 AM, Edward Lesmes wrote: > An iterator version of map should be available for large sets of data. > The python time machine strikes again. In python 2, this is available as itertools.imap. In python 3, this is the default behavior of the map() function. -- Jerry -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.dipierro at gmail.com Thu Feb 9 17:49:29 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Thu, 9 Feb 2012 10:49:29 -0600 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: Here is another data point: http://redmonk.com/sogrady/2012/02/08/language-rankings-2-2012/ Unfortunately the TIOBE index does matter. I can speak for python in education and trends I have seen. Python is and remains the easiest language to teach but it is no longer true that getting Python to run is easier than alternatives (not for the average undergrad student). It used to be you download python 2.5 and you were in business. Now you have to make a choice 2.x or 3.x. 20% of the students cannot tell one from the other (even after being told repeatedly which one to use). Three weeks into the class they complain that "the class code won't compile" (the same 20% cannot tell a compiler from an interpreter). 50+% of the students have a mac and an increasing number of packages depend on numpy. Installing numpy on mac is a lottery. Those who do not have a mac have windows and they expect an IDE like eclipse. I know you can use Python with eclipse but they do not. They download Python and complain that IDLE has no autocompletion, no line numbers, no collapsible functions/classes.
From the hard-core computer scientist's perspective there are usually three objections to using Python: - Most software engineers think we should only teach statically typed languages - Those who care about scalability complain about the GIL - The programming language purists complain about the use of reference counting instead of garbage collection The net result is that people cannot agree and it is getting increasingly difficult to make the case for the use of Python in intro CS courses. For some reason JavaScript seems to win these days. Massimo On Feb 9, 2012, at 8:36 AM, anatoly techtonik wrote: > Hi, > > I didn't want to grow FUD on python-dev, but a FUD there seems to be > a good topic for discussion here. > http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html > > As you may see, Python is losing its positions. I blame Python 3 and > that Python development is not concentrating on users enough [1], > and that there is a big resistance in getting the things done (/ > moin/ prefix story) and the whole communication process is a bit > discouraging. If it is not the cause, then the cause is the lack of > visibility into the real problem, but what the real problem is? > > I guess the topic is for upcoming language summit at PyCon, but it > will be hard for me to get there this year from Belarus, so it would > be nice to read some opinions here. > > > 1. http://python-for-humans.heroku.com/ > -- > anatoly t. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ehlesmes at gmail.com Thu Feb 9 18:12:01 2012 From: ehlesmes at gmail.com (Edward Lesmes) Date: Thu, 9 Feb 2012 12:12:01 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: Massimo Di Pierro writes: > 50+% of the students have a mac and an increasing number of packages depend on numpy. Installing numpy on mac is a lottery. About the numpy dependency, I think is a reason to integrate numpy in python. -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacrolix at gmail.com Thu Feb 9 18:21:26 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Fri, 10 Feb 2012 01:21:26 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: This On Feb 10, 2012 12:49 AM, "Massimo Di Pierro" wrote: > Here is another data point: > http://redmonk.com/sogrady/2012/02/08/language-rankings-2-2012/ > > Unfortunately the TIOBE index does matter. I can speak for python in > education and trends I seen. > > Python is and remains the easiest language to teach but it is no longer > true that getting Python to run is easer than alternatives (not for the > average undergrad student). It used to be you download python 2.5 and you > were in business. Now you have to make a choice 2.x or 3.x. 20% of the > students cannot tell one from the other (even after been told repeatedly > which one to use). Three weeks into the class they complain with "the class > code won't compile" (the same 20% cannot tell a compiler form an > interpreter). > > 50+% of the students have a mac and an increasing number of packages > depend on numpy. Installing numpy on mac is a lottery. > > Those who do not have a mac have windows and they expect an IDE like > eclipse. I know you can use Python with eclipse but they do not. They > download Python and complain that IDLE has no autocompletion, no line > numbers, no collapsible functions/classes. 
> > From the hard core computer scientists prospective there are usually three > objections to using Python: > - Most software engineers think we should only teach static type languages > - Those who care about scalability complain about the GIL > - The programming language purists complain about the use of reference > counting instead of garbage collection > > The net result is that people cannot agree and it is getting increasingly > difficult to make the case for the use of Python in intro CS courses. For > some reason javaScript seems to win these days. > > Massimo > > > On Feb 9, 2012, at 8:36 AM, anatoly techtonik wrote: > > Hi, > > I didn't want to grow FUD on python-dev, but a FUD there seems to be a > good topic for discussion here. > http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html > > As you may see, Python is losing its positions. I blame Python 3 and that > Python development is not concentrating on users enough [1], and that there > is a big resistance in getting the things done (/moin/ prefix story) and > the whole communication process is a bit discouraging. If it is not the > cause, then the cause is the lack of visibility into the real problem, but > what the real problem is? > > I guess the topic is for upcoming language summit at PyCon, but it will be > hard for me to get there this year from Belarus, so it would be nice to > read some opinions here. > > > 1. http://python-for-humans.heroku.com/ > -- > anatoly t. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From raymond.hettinger at gmail.com Thu Feb 9 18:27:25 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Thu, 9 Feb 2012 09:27:25 -0800 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: On Feb 8, 2012, at 4:25 PM, Guido van Rossum wrote: > Also note that "many times" is actually O(log N) per insertion, which isn't so bad. The main use case for bisect() is to manage a list that sees updates *and* iterations -- otherwise building the list unsorted and sorting it at the end would make more sense. The key= option provides a balance between the cost/elegance for updates and for iterations. Would you be open to introducing a SortedList class to encapsulate the data so that key functions get applied no more than once per record and the sort order is maintained as new items are inserted? ISTM, the whole problem with bisect is that the underlying list is naked, leaving no way to easily correlate the sort keys with the corresponding sorted records. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacrolix at gmail.com Thu Feb 9 18:35:17 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Fri, 10 Feb 2012 01:35:17 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: >From my own observations, the recent drop is sure to uncertainty with Python 3, and an increase of alternatives on server side, such as Node. The transition is only going to get more painful as system critical software lags on 2.x while users clamour for 3.x. I understand there are some fundamental problems in running both simultaneously which makes gradual integration not a possibility. Dynamic typing also doesn't help, making it very hard to automatically port, and update dependencies. 
Lesser reasons include an increasing gap in scalability to multicore compared with other languages (the GIL being the gorilla here; multiprocessing is unacceptable as long as native threading is the only supported concurrency mechanism), and a lack of enthusiasm from key technologies and vendors: GAE, gevent, matplotlib are a few encountered personally.

From tjreedy at udel.edu Thu Feb 9 18:43:49 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 09 Feb 2012 12:43:49 -0500 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: On 2/9/2012 11:41 AM, Guido van Rossum wrote: > Bingo. That clinches it. We need to add key=.

I reopened http://bugs.python.org/issue4356 with the above quoted. It has a patch with tests ready for review. -- Terry Jan Reedy

From massimo.dipierro at gmail.com Thu Feb 9 18:46:45 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Thu, 9 Feb 2012 11:46:45 -0600 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: <0B687CDC-6C26-4032-BFBB-CF562AF29767@gmail.com> I think if easy_install, gevent, numpy (*), and win32 extensions were included in 3.x, together with a slightly better IDLE (still based on Tkinter, with multiple pages, autocompletion, collapsible functions/classes, line numbers, better printing with syntax highlighting), and if easy_install were accessible via IDLE, this would be a killer version.

Longer term, removing the GIL and using garbage collection should be a priority. I am not sure what is involved and how difficult it is, but perhaps this is what PyCon money can be used for. If this cannot be done without breaking backward compatibility again, then 3.x should be considered an experimental branch, people should be advised to stay with 2.7 (2.8?), and then skip to 4.x directly when these problems are resolved.
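On the garbage-collection point, it is worth noting that CPython already supplements reference counting with a cycle collector, and the split between the two is observable directly. A CPython-specific sketch (the Node class is just an illustration):

```python
import gc
import weakref

class Node:
    pass

a, b = Node(), Node()
a.other, b.other = b, a   # reference cycle: refcounts never drop to zero
probe = weakref.ref(a)    # lets us observe when the object is actually freed

del a, b
print(probe() is None)    # False: refcounting alone cannot free the cycle
gc.collect()              # the cycle collector finds and reclaims the pair
print(probe() is None)    # True
```

On a tracing-GC implementation such as Jython or PyPy the first print may already be True, which is precisely the semantic difference being debated.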
Python should not make a habit of breaking backward compatibility. It would be really nice if it were to include an async web server (based on gevent for example) and better parser for HTTP headers and a python based template language (like mako or the web2py one) not just for the web but for document generation in general. Massimo On Feb 9, 2012, at 11:12 AM, Edward Lesmes wrote: > Massimo Di Pierro writes: > > 50+% of the students have a mac and an increasing number of > packages depend on numpy. Installing numpy on mac is a lottery. > > About the numpy dependency, I think is a reason to integrate numpy > in python. > > From guido at python.org Thu Feb 9 18:48:18 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 09:48:18 -0800 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: On Thu, Feb 9, 2012 at 9:27 AM, Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > > On Feb 8, 2012, at 4:25 PM, Guido van Rossum wrote: > > Also note that "many times" is actually O(log N) per insertion, which > isn't so bad. The main use case for bisect() is to manage a list that sees > updates *and* iterations -- otherwise building the list unsorted and > sorting it at the end would make more sense. The key= option provides a > balance between the cost/elegance for updates and for iterations. > > > Would you be open to introducing a SortedList class to encapsulate the > data so that key functions get applied no more than once per record and the > sort order is maintained as new items are inserted? > Hm. A good implementation of such a thing would probably require a B-tree implementation (or some other tree). That sounds like a good data type to have in the collections module, but doesn't really address the desire to use bisect on a list. 
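While bisect itself has no key= option today, a common workaround keeps a parallel list of precomputed keys, so each record's key function is applied exactly once and bisect searches the key list. A sketch (the insert helper and the len key are illustrative, not part of any proposal here):

```python
import bisect

records = []   # kept sorted by key(record)
keys = []      # keys[i] == key(records[i]), computed once per record
key = len      # example key function: order strings by length

def insert(record):
    k = key(record)
    i = bisect.bisect_right(keys, k)  # O(log n) search...
    keys.insert(i, k)                 # ...but list insertion is still O(n)
    records.insert(i, record)

for word in ["banana", "fig", "apple"]:
    insert(word)
print(records)  # ['fig', 'apple', 'banana']
```

The O(n) insertion is the cost of the "naked" list; a SortedList built on a tree structure would address that part as well.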
Also it further erodes my desire not to bother the programmer with subtle decisions about the choice of data type: the most basic types (string, number, list, dict) are easy enough to distinguish, and further choices between subclasses or alternatives are usually a distraction. > ISTM, the whole problem with bisect is that the underlying list is naked, > leaving no way to easily correlate the sort keys with the corresponding > sorted records. > That same "problem" would exist for sorted() and list.sort(), and the solution there (parameterize it with a key= option) easily generalizes to bisect (and to heapq, for that matter). The more fundamental "conflict" here seems to be between algorithms and classes. list.sort(), bisect and heapq focus on the algorithm. In some sense they reflect the state of the world before object-oriented programming was invented. Sometimes it is useful to encapsulate these in classes. Other times, the encapsulation doesn't add to the clarity of the program. One more thing: bisect.py doesn't only apply to insertions. It is also useful to find a "nearest" elements in a pre-sorted list. Probably that list was sorted using list.sort(), possibly using key=... -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stutzbach at google.com Thu Feb 9 18:50:19 2012 From: stutzbach at google.com (Daniel Stutzbach) Date: Thu, 9 Feb 2012 09:50:19 -0800 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: On Wed, Feb 8, 2012 at 4:25 PM, Guido van Rossum wrote: > Also note that "many times" is actually O(log N) per insertion, which > isn't so bad. The main use case for bisect() is to manage a list that sees > updates *and* iterations -- otherwise building the list unsorted and > sorting it at the end would make more sense. 
The key= option provides a > balance between the cost/elegance for updates and for iterations. > Maintaining a sorted list using Python's list type is a trap. The bisect is O(log n), but insertion and deletion are still O(n). A SortedList class that provides O(log n) insertions is useful from time to time. There are several existing implementations available (I wrote one of them, on top of my blist type), each with their pros and cons. -- Daniel Stutzbach -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Thu Feb 9 19:02:09 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 09 Feb 2012 13:02:09 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: On 2/9/2012 12:12 PM, Edward Lesmes wrote: > Massimo Di Pierro writes: > > 50+% of the students have a mac and an increasing number of packages > depend on numpy. Installing numpy on mac is a lottery. > > About the numpy dependency, I think is a reason to integrate numpy in > python. And make installing Python on the Mac a lottery? -- Terry Jan Reedy From steve at pearwood.info Thu Feb 9 19:03:45 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 10 Feb 2012 05:03:45 +1100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: <4F340A81.60300@pearwood.info> Massimo Di Pierro wrote: > Here is another data point: > http://redmonk.com/sogrady/2012/02/08/language-rankings-2-2012/ > > Unfortunately the TIOBE index does matter. I can speak for python in > education and trends I seen. > > Python is and remains the easiest language to teach but it is no longer > true that getting Python to run is easer than alternatives (not for the > average undergrad student). Is that a commentary on Python, or the average undergrad student? > It used to be you download python 2.5 and > you were in business. Now you have to make a choice 2.x or 3.x. 
20% of > the students cannot tell one from the other (even after been told > repeatedly which one to use). Three weeks into the class they complain > with "the class code won't compile" (the same 20% cannot tell a compiler > form an interpreter). Python has a compiler. The "c" in .pyc files stands for "compiled" and Python has a built-in function called "compile". It just happens to compile to byte code that runs on a virtual machine, not machine code running on physical hardware. PyPy takes it even further, with a JIT compiler that operates on the byte code. > 50+% of the students have a mac and an increasing number of packages > depend on numpy. Installing numpy on mac is a lottery. > > Those who do not have a mac have windows and they expect an IDE like > eclipse. I know you can use Python with eclipse but they do not. They > download Python and complain that IDLE has no autocompletion, no line > numbers, no collapsible functions/classes. > > From the hard core computer scientists prospective there are usually > three objections to using Python: > - Most software engineers think we should only teach static type languages > - Those who care about scalability complain about the GIL How is that relevant to a language being taught to undergrads? Sounds more like an excuse to justify dislike of teaching Python rather than an actual reason to dislike Python. > - The programming language purists complain about the use of reference > counting instead of garbage collection The programming language purists should know better than that. The choice of which garbage collection implementation (ref counting is garbage collection) is a quality of implementation detail, not a language feature. 
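The compiler is even exposed directly, via the compile() builtin and the dis module, so the pipeline is easy to show:

```python
import dis

# Python compiles source to a code object; the VM then executes its bytecode.
code = compile("x * 2 + 1", "<example>", "eval")
print(type(code).__name__)    # code
dis.dis(code)                 # dumps the bytecode instructions
print(eval(code, {"x": 20}))  # 41
```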
-- Steven From masklinn at masklinn.net Thu Feb 9 19:14:39 2012 From: masklinn at masklinn.net (Masklinn) Date: Thu, 9 Feb 2012 19:14:39 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F340A81.60300@pearwood.info> References: <4F340A81.60300@pearwood.info> Message-ID: <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> On 2012-02-09, at 19:03 , Steven D'Aprano wrote: > The choice of which garbage collection implementation (ref counting is garbage collection) is a quality of implementation detail, not a language feature. That's debatable, it's an implementation detail with very different semantics which tends to leak out into usage patterns of the language (as it did with CPython, which basically did not get fixed in the community until Pypy started ascending), especially when the language does not provide "better" ways to handle things (as Python finally did by adding context managers in 2.5). So theoretically, automatic refcounting is a detail, but practically it influences language usage differently than most other GC techniques (when it'd the only GC strategy in the language anyway) From guido at python.org Thu Feb 9 19:22:47 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 10:22:47 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F340A81.60300@pearwood.info> References: <4F340A81.60300@pearwood.info> Message-ID: On Thu, Feb 9, 2012 at 10:03 AM, Steven D'Aprano wrote: > Massimo Di Pierro wrote: > >> Here is another data point: >> http://redmonk.com/sogrady/**2012/02/08/language-rankings-**2-2012/ >> >> Unfortunately the TIOBE index does matter. I can speak for python in >> education and trends I seen. >> >> Python is and remains the easiest language to teach but it is no longer >> true that getting Python to run is easer than alternatives (not for the >> average undergrad student). >> > > Is that a commentary on Python, or the average undergrad student? Well either way it's depressing... 
> It used to be you download python 2.5 and you were in business. Now you >> have to make a choice 2.x or 3.x. 20% of the students cannot tell one from >> the other (even after been told repeatedly which one to use). Three weeks >> into the class they complain with "the class code won't compile" (the same >> 20% cannot tell a compiler form an interpreter). >> > > Python has a compiler. The "c" in .pyc files stands for "compiled" and > Python has a built-in function called "compile". It just happens to compile > to byte code that runs on a virtual machine, not machine code running on > physical hardware. PyPy takes it even further, with a JIT compiler that > operates on the byte code. Not sure how that's relevant. Massimo used "won't compile" as a shorthand for "has a syntax error". 50+% of the students have a mac and an increasing number of packages > depend on numpy. Installing numpy on mac is a lottery. > But that was the same in the 2.5 days. The problem is worse now because (a) numpy is going mainstream, and (b) Macs don't come with a C compiler any more. I think the answer will have to be in making an effort to produce robust and frequently updated downloads of numpy to match various popular Python versions and platforms. This is a major pain (packaging always is) so maybe some incentive is necessary (just like ActiveState has its Python distros). > Those who do not have a mac have windows and they expect an IDE like > eclipse. I know you can use Python with eclipse but they do not. They > download Python and complain that IDLE has no autocompletion, no line > numbers, no collapsible functions/classes. > Hm. I know a fair number of people who use Eclipse to edit Python (there's some plugin). This seems easy enough to address by just pointing people to the plugin, I don't think Python itself is to blame here. 
From the hard core computer scientists prospective there are usually > three objections to using Python: > - Most software engineers think we should only teach static type languages > - Those who care about scalability complain about the GIL > How is that relevant to a language being taught to undergrads? Sounds more > like an excuse to justify dislike of teaching Python rather than an actual > reason to dislike Python. I can see the discomfort if the other professors keep bringing this up. It is, sadly, a very effective troll. (Before it was widely know, the most common troll was the whitespace. People would declare it to be ridiculous without ever having tried it. Same with the GIL.) - The programming language purists complain about the use of reference > counting instead of garbage collection > The programming language purists should know better than that. The choice > of which garbage collection implementation (ref counting is garbage > collection) is a quality of implementation detail, not a language feature. > Yeah, trolls are a pain. We need to start spreading more effective counter-memes. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.dipierro at gmail.com Thu Feb 9 19:25:18 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Thu, 9 Feb 2012 12:25:18 -0600 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F340A81.60300@pearwood.info> References: <4F340A81.60300@pearwood.info> Message-ID: <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> On Feb 9, 2012, at 12:03 PM, Steven D'Aprano wrote: > Massimo Di Pierro wrote: >> Here is another data point: >> http://redmonk.com/sogrady/2012/02/08/language-rankings-2-2012/ >> Unfortunately the TIOBE index does matter. I can speak for python >> in education and trends I seen. 
>> Python is and remains the easiest language to teach but it is no >> longer true that getting Python to run is easer than alternatives >> (not for the average undergrad student). > > Is that a commentary on Python, or the average undergrad student? I teach so the average student is my benchmark. Please do not misunderstand. While some may be lazy, but the average CS undergrad is not stupid but quite intelligent. They just do not like wasting time with setups and I sympathize with that. Batteries included is the Python motto. >> It used to be you download python 2.5 and you were in business. Now >> you have to make a choice 2.x or 3.x. 20% of the students cannot >> tell one from the other (even after been told repeatedly which one >> to use). Three weeks into the class they complain with "the class >> code won't compile" (the same 20% cannot tell a compiler form an >> interpreter). > > Python has a compiler. The "c" in .pyc files stands for "compiled" > and Python has a built-in function called "compile". It just happens > to compile to byte code that runs on a virtual machine, not machine > code running on physical hardware. PyPy takes it even further, with > a JIT compiler that operates on the byte code. > > >> 50+% of the students have a mac and an increasing number of >> packages depend on numpy. Installing numpy on mac is a lottery. >> Those who do not have a mac have windows and they expect an IDE >> like eclipse. I know you can use Python with eclipse but they do >> not. They download Python and complain that IDLE has no >> autocompletion, no line numbers, no collapsible functions/classes. >> From the hard core computer scientists prospective there are >> usually three objections to using Python: >> - Most software engineers think we should only teach static type >> languages >> - Those who care about scalability complain about the GIL > > How is that relevant to a language being taught to undergrads? 
> Sounds more like an excuse to justify dislike of teaching Python > rather than an actual reason to dislike Python. > > >> - The programming language purists complain about the use of >> reference counting instead of garbage collection > > The programming language purists should know better than that. The > choice of which garbage collection implementation (ref counting is > garbage collection) is a quality of implementation detail, not a > language feature. Don't shoot the messenger please. You can dismiss or address the problem. Anyway... undergrads do care because they will take 4 years to grade and they do not want to come out with obsolete skills. Our undergrads learn Python, Ruby, Java, Javascript and C++. Many know other languages which they learn on their own (Scala and Clojure are popular). They all agree multi-core is the future and whichever language can deal with them better is the future too. As masklinn says, the difference between garbage collection and reference counting is more than an implementation issue. > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From guido at python.org Thu Feb 9 19:26:07 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 10:26:07 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> Message-ID: On Thu, Feb 9, 2012 at 10:14 AM, Masklinn wrote: > On 2012-02-09, at 19:03 , Steven D'Aprano wrote: > > The choice of which garbage collection implementation (ref counting is > garbage collection) is a quality of implementation detail, not a language > feature. 
> > That's debatable, it's an implementation detail with very different > semantics which tends to leak out into usage patterns of the language (as > it did with CPython, which basically did not get fixed in the community > until Pypy started ascending), I think it was actually Jython that first sensitized the community to this issue. > especially when the language does not provide "better" ways to handle > things (as Python finally did by adding context managers in 2.5). > > So theoretically, automatic refcounting is a detail, but practically it > influences language usage differently than most other GC techniques (when > it'd the only GC strategy in the language anyway) > Are there still Python idioms/patterns/recipes around that depend on refcounting? (There also used to be some well-known anti-patterns that were only bad because of the refcounting, mostly around saving exceptions. But those should all have melted away -- CPython has had auxiliary GC for over a decade.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Thu Feb 9 19:30:55 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 09 Feb 2012 19:30:55 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: <4F3410DF.30602@molden.no> On 09.02.2012 19:02, Terry Reedy wrote: > And make installing Python on the Mac a lottery? Or a subset of NumPy? The main offender is numpy.linalg, with needs a BLAS library that should be tuned to the hardware. (There is a reason NumPy and SciPy binary installers on Windows are bloated.) And from what I have seen on complaints building NumPy on Mav it tends to be the BLAS/LAPACK stuff that drives people crazy, particularly those who want to use ATLAS (Which is a bit stupid, as OpenBLAS/GotoBLAS2 is easier to build and much faster.) 
If Python comes with NumPy built against Netlib reference BLAS, there will be lots of complaints that "Matlab is so much faster then Python" when it is actually the BLAS libraries that are different. But I am not sure we want 50-100 MB of bloat in the Python binary installer just to cover all possible cases of CPU-tuned OpenBLAS/GotoBLAS2 or ATLAS libraries. Sturla From guido at python.org Thu Feb 9 19:34:50 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 10:34:50 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> Message-ID: On Thu, Feb 9, 2012 at 10:25 AM, Massimo Di Pierro < massimo.dipierro at gmail.com> wrote: > > On Feb 9, 2012, at 12:03 PM, Steven D'Aprano wrote: > > Massimo Di Pierro wrote: >> >>> Here is another data point: >>> http://redmonk.com/sogrady/**2012/02/08/language-rankings-**2-2012/ >>> Unfortunately the TIOBE index does matter. I can speak for python in >>> education and trends I seen. >>> Python is and remains the easiest language to teach but it is no longer >>> true that getting Python to run is easer than alternatives (not for the >>> average undergrad student). >>> >> >> Is that a commentary on Python, or the average undergrad student? >> > > I teach so the average student is my benchmark. Please do not > misunderstand. While some may be lazy, but the average CS undergrad is not > stupid but quite intelligent. They just do not like wasting time with > setups and I sympathize with that. Batteries included is the Python motto. > > > It used to be you download python 2.5 and you were in business. Now you >>> have to make a choice 2.x or 3.x. 20% of the students cannot tell one from >>> the other (even after been told repeatedly which one to use). 
Three weeks >>> into the class they complain with "the class code won't compile" (the same >>> 20% cannot tell a compiler form an interpreter). >>> >> >> Python has a compiler. The "c" in .pyc files stands for "compiled" and >> Python has a built-in function called "compile". It just happens to compile >> to byte code that runs on a virtual machine, not machine code running on >> physical hardware. PyPy takes it even further, with a JIT compiler that >> operates on the byte code. >> >> >> 50+% of the students have a mac and an increasing number of packages >>> depend on numpy. Installing numpy on mac is a lottery. >>> Those who do not have a mac have windows and they expect an IDE like >>> eclipse. I know you can use Python with eclipse but they do not. They >>> download Python and complain that IDLE has no autocompletion, no line >>> numbers, no collapsible functions/classes. >>> From the hard core computer scientists prospective there are usually >>> three objections to using Python: >>> - Most software engineers think we should only teach static type >>> languages >>> - Those who care about scalability complain about the GIL >>> >> >> How is that relevant to a language being taught to undergrads? Sounds >> more like an excuse to justify dislike of teaching Python rather than an >> actual reason to dislike Python. >> >> >> - The programming language purists complain about the use of reference >>> counting instead of garbage collection >>> >> >> The programming language purists should know better than that. The choice >> of which garbage collection implementation (ref counting is garbage >> collection) is a quality of implementation detail, not a language feature. >> > > Don't shoot the messenger please. > > You can dismiss or address the problem. Anyway... undergrads do care > because they will take 4 years to grade and they do not want to come out > with obsolete skills. Our undergrads learn Python, Ruby, Java, Javascript > and C++. 
Many know other languages which they learn on their own (Scala and > Clojure are popular). I'd give those students a bonus for being in touch with what's popular in academia. Point them to Haskell next. They may amount to something. > They all agree multi-core is the future and whichever language can deal > with them better is the future too. > Surely not JavaScript (which is single-threaded and AFAIK also uses refcounting :-). Also, AFAIK Ruby has a GIL much like Python. I think it's time to start a PR offensive explaining why these are not the problem the trolls make them out to be, and how you simply have to use different patterns for scaling in some languages than in others. And note that a single-threaded event-driven process can serve 100,000 open sockets -- while no JVM can create 100,000 threads. As masklinn says, the difference between garbage collection and reference > counting is more than an implementation issue. > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Feb 9 19:36:41 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 10:36:41 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3410DF.30602@molden.no> References: <4F3410DF.30602@molden.no> Message-ID: On Thu, Feb 9, 2012 at 10:30 AM, Sturla Molden wrote: > On 09.02.2012 19:02, Terry Reedy wrote: > > And make installing Python on the Mac a lottery? >> > > Or a subset of NumPy? > > The main offender is numpy.linalg, with needs a BLAS library that should > be tuned to the hardware. (There is a reason NumPy and SciPy binary > installers on Windows are bloated.) And from what I have seen on complaints > building NumPy on Mav it tends to be the BLAS/LAPACK stuff that drives > people crazy, particularly those who want to use ATLAS (Which is a bit > stupid, as OpenBLAS/GotoBLAS2 is easier to build and much faster.) 
If > Python comes with NumPy built against Netlib reference BLAS, there will be > lots of complaints that "Matlab is so much faster than Python" when it is > actually the BLAS libraries that are different. But I am not sure we want > 50-100 MB of bloat in the Python binary installer just to cover all > possible cases of CPU-tuned OpenBLAS/GotoBLAS2 or ATLAS libraries. > I don't know much of this area, but maybe this is something where a dynamic installer (along the lines of easy_install) might actually be handy? The funny thing is that most Java software is even more bloated and you rarely hear about that (at least not from Java users ;-). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From masklinn at masklinn.net Thu Feb 9 19:37:18 2012 From: masklinn at masklinn.net (Masklinn) Date: Thu, 9 Feb 2012 19:37:18 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> Message-ID: On 2012-02-09, at 19:26 , Guido van Rossum wrote: > On Thu, Feb 9, 2012 at 10:14 AM, Masklinn wrote: >> On 2012-02-09, at 19:03 , Steven D'Aprano wrote: >>> The choice of which garbage collection implementation (ref counting is >> garbage collection) is a quality of implementation detail, not a language >> feature. >> >> That's debatable, it's an implementation detail with very different >> semantics which tends to leak out into usage patterns of the language (as >> it did with CPython, which basically did not get fixed in the community >> until Pypy started ascending), > > I think it was actually Jython that first sensitized the community to this > issue. > The first one was Jython yes, of course, but I did not see the "movement" gain much prominence before Pypy started looking like a serious CPython alternative; before that there were a few voices lost in the desert.
>> especially when the language does not provide "better" ways to handle >> things (as Python finally did by adding context managers in 2.5). >> >> So theoretically, automatic refcounting is a detail, but practically it >> influences language usage differently than most other GC techniques (when >> it's the only GC strategy in the language anyway) > > Are there still Python idioms/patterns/recipes around that depend on > refcounting? There shouldn't be, but I'm not going to rule out reliance on automatic resource cleanup just yet; I'm sure there are still significant pieces of code using those in the wild. From mwm at mired.org Thu Feb 9 19:42:37 2012 From: mwm at mired.org (Mike Meyer) Date: Thu, 9 Feb 2012 10:42:37 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: <20120209104237.154be949@bhuda.mired.org> On Fri, 10 Feb 2012 01:35:17 +0800 Matt Joiner wrote: > the GIL being the gorilla here, multiprocessing is unacceptable as > long as native threading is the only supported concurrency mechanism If threading is the only acceptable concurrency mechanism, then Python is the wrong language to use. But you're also not building scalable systems, which is most of where it really matters. If you're willing to consider things other than threading - and you have to if you want to build scalable systems - then Python makes a good choice. Personally, I'd like to see a modern threading model in Python, especially if its tools can be extended to work with other concurrency mechanisms. But that's a *long* way into the future. As for "popular vs. good" - "good" is a subjective measure. So the two statements "anything popular is good" and "nothing popular was ever good unless it had no competition" can both be true. Personally, I lean toward the latter. I tend to find things that are popular to not be very good, which makes me distrust the taste of the populace.
The python core developers, on the other hand, have an excellent record when it comes to keeping the language good - and the failures tend to be concessions to popularity! So I'd rather the current system for adding features stay in place and *not* see the language add features just to gain popularity. We already have Perl if you want that kind of language. That said, it's perfectly reasonable to suggest changes you think will improve the popularity of the language. But be prepared to show that they're actually good, as opposed to merely possibly popular. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From sturla at molden.no Thu Feb 9 19:44:41 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 09 Feb 2012 19:44:41 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> Message-ID: <4F341419.6030808@molden.no> On 09.02.2012 19:25, Massimo Di Pierro wrote: > As masklinn says, the difference between garbage collection and > reference counting is more than an implementation issue. Actually it is not. The GIL is a problem for those who want to use threading.Thread and plain Python code for parallel processing. Those who think in those terms have typically prior experience with Java or .NET. Processes are excellent for concurrency, cf. multiprocessing, os.fork and MPI. They actually are more efficient than threads (due to avoidance of false sharing cache lines) and safer (deadlock and livelocks are more difficult to produce). And I assume students who learn to use such tools from the start are not annoyed by the GIL. The GIL annoys those who have learned to expect threading.Thread for CPU bound concurrency in advance -- which typically means prior experience with Java. 
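The process-based pattern Sturla describes can be sketched with the stdlib alone; the workload function below is purely illustrative:

```python
# Sketch: CPU-bound work farmed out to processes rather than threads.
# Each worker runs in its own interpreter with its own GIL, so the jobs
# below genuinely run in parallel on a multicore machine.
from multiprocessing import Pool

def sum_of_squares(n):
    # A deliberately CPU-bound job: this would serialize on the GIL
    # under threading.Thread, but not under multiprocessing.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    pool = Pool(processes=3)
    try:
        print(pool.map(sum_of_squares, [10**5, 2 * 10**5, 3 * 10**5]))
    finally:
        pool.close()
        pool.join()
```

The explicit close/join (rather than a `with` block) keeps the sketch valid on the 2.x and early 3.x interpreters under discussion here.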
Python threads are fine for their intended use -- e.g. I/O and background tasks in a GUI. Sturla From guido at python.org Thu Feb 9 19:44:42 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 10:44:42 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> Message-ID: On Thu, Feb 9, 2012 at 10:37 AM, Masklinn wrote: > On 2012-02-09, at 19:26 , Guido van Rossum wrote: > > On Thu, Feb 9, 2012 at 10:14 AM, Masklinn wrote: > >> On 2012-02-09, at 19:03 , Steven D'Aprano wrote: > >>> The choice of which garbage collection implementation (ref counting is > >> garbage collection) is a quality of implementation detail, not a > language > >> feature. > >> > >> That's debatable, it's an implementation detail with very different > >> semantics which tends to leak out into usage patterns of the language > (as > >> it did with CPython, which basically did not get fixed in the community > >> until Pypy started ascending), > > > > I think it was actually Jython that first sensitized the community to > this > > issue. > > > The first one was Jython yes, of course, but I did not see the "movement" > gain much prominence before Pypy started looking like a serious CPython > alternative, before that there were a few voices lost in the desert. > I guess everyone has a different perspective. >> especially when the language does not provide "better" ways to handle >> things (as Python finally did by adding context managers in 2.5). >> >> So theoretically, automatic refcounting is a detail, but practically it >> influences language usage differently than most other GC techniques (when >> it'd the only GC strategy in the language anyway) > > Are there still Python idioms/patterns/recipes around that depend on > refcounting? 
There shouldn't be, but I'm not going to rule out reliance on automatic > resource cleanup just yet, I'm sure there are still significant pieces > of code using those in the wild. > I am guessing in part that's a function of resistance to change, and in part it means PyPy hasn't gotten enough mindshare yet. (Raise your hand if you have PyPy installed on one of your systems. Raise your hand if you use it. Raise your hand if you are a PyPy contributor. :-) Anyway, the refcounting objection seems the least important one. The more important trolls to fight are "static typing is always better" and "the GIL makes Python multicore-unfriendly". TBH, I see some movement in the static typing discussion, evidence that the static typing zealots are considering a hybrid approach (e.g. C# dynamic, and the optional static type checks in Dart). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Thu Feb 9 19:46:55 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 09 Feb 2012 19:46:55 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F3410DF.30602@molden.no> Message-ID: <4F34149F.5020909@molden.no> On 09.02.2012 19:36, Guido van Rossum wrote: > I don't know much of this area, but maybe this is something where a > dynamic installer (along the lines of easy_install) might actually be handy? That is what NumPy and SciPy does on Windows. But it also means the "superpack" installer is a very big download. 
Sturla From steve at pearwood.info Thu Feb 9 19:50:09 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 10 Feb 2012 05:50:09 +1100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <0B687CDC-6C26-4032-BFBB-CF562AF29767@gmail.com> References: <0B687CDC-6C26-4032-BFBB-CF562AF29767@gmail.com> Message-ID: <4F341561.3050409@pearwood.info> Massimo Di Pierro wrote: > I think if easy_install, gevent, numpy (*), and win32 extensions where > included in 3.x, together with a slightly better Idle (still based on > Tkinter, with multiple pages, autocompletion, collapsible, line numbers, > better printing with syntax highlitghing), and if easy_install were > accessible via Idle, this would be a killer version. IDLE does look a little long in the tooth. > Longer term removing the GIL and using garbage collection should be a > priority. I am not sure what is involved and how difficult it is but > perhaps this is what PyCon money can be used for. It isn't difficult to find out about previous attempts to remove the GIL. Googling for "python removing the gil" brings up plenty of links, including: http://www.artima.com/weblogs/viewpost.jsp?thread=214235 http://dabeaz.blogspot.com.au/2011/08/inside-look-at-gil-removal-patch-of.html Or just use Jython or IronPython, neither of which have a GIL. And since neither of them support Python 3 yet, you have no confusing choice of version to make. I'm not sure if IronPython is suitable for teaching, if you have to support Macs as well as Windows, but as a counter-argument against GIL trolls, there are two successful implementations of Python without the GIL. (And neither is as popular as CPython, which I guess says something about where people's priorities lie. If the GIL was as serious a problem in practice as people claim, there would be far more interest in Jython and IronPython.) 
> If this cannot be done > without breaking backward compatibility again, then 3.x should be > considered an experimental branch, people should be advised to stay with > 2.7 (2.8?) and then skip to 4.x directly when these problems are > resolved. Python should not make a habit of breaking backward > compatibility. Python 4.x (Python 4000) is pure vapourware. It is irresponsible to tell people to stick to Python 2.7 (there will be no 2.8) in favour of something which may never exist. http://www.python.org/dev/peps/pep-0404/ -- Steven From masklinn at masklinn.net Thu Feb 9 19:50:20 2012 From: masklinn at masklinn.net (Masklinn) Date: Thu, 9 Feb 2012 19:50:20 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> Message-ID: On 2012-02-09, at 19:34 , Guido van Rossum wrote: >> They all agree multi-core is the future and whichever language can deal >> with them better is the future too. >> > > Surely not JavaScript (which is single-threaded and AFAIK also uses > refcounting :-). I don't think I've seen a serious refcounted JS implementation in the last decade, although it is possible that JS runtimes have localized usage of references and reference-counted resources. AFAIK all modern JS runtimes are JITed, which probably does not mesh well with refcounting. In any case, V8 (Chrome's runtime) uses a stop-the-world generational GC for sure[0], Mozilla's SpiderMonkey uses a GC as well[1] although I'm not sure which type (the reference to JS_MarkGCThing indicates it could be or at least use a mark-and-sweep amongst its strategies), Webkit/Safari's JavaScriptCore uses a GC as well[2] and MSIE's JScript used a mark-and-sweep GC back in 2003[3] (although the DOM itself was in COM, and reference-counted). > And note that a > single-threaded event-driven process can serve 100,000 open sockets -- > while no JVM can create 100,000 threads.
Only because it's OS threads of course, Erlang is not evented and has no problem spawning half a million (preempted) processes if there's RAM enough to store them. [0] http://code.google.com/apis/v8/design.html#garb_coll [1] https://developer.mozilla.org/en/SpiderMonkey/1.8.5#Garbage_collection [2] Since ~2009 http://www.masonchang.com/blog/2009/3/26/nitros-garbage-collector.html [3] http://blogs.msdn.com/b/ericlippert/archive/2003/09/17/53038.aspx From guido at python.org Thu Feb 9 19:54:27 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 10:54:27 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> Message-ID: On Thu, Feb 9, 2012 at 10:50 AM, Masklinn wrote: > On 2012-02-09, at 19:34 , Guido van Rossum wrote: > >> They all agree multi-core is the future and whichever language can deal > >> with them better is the future too. > >> > > > > Surely not JavaScript (which is single-threaded and AFAIK also uses > > refcounting :-). > > I don't think I've seen a serious refcounted JS implementation in the last > decade. , although it is possible that JS runtimes have localized usage > of references and reference-counted resources. AFAIK all modern JS > runtimes are JITed which probably does not mesh well with refcounting. > > In any case, V8 (Chrome's runtime) uses a stop-the-world generational > GC for sure[0], Mozilla's SpiderMonkey uses a GC as well[1] although > I'm not sure which type (the reference to JS_MarkGCThing indicates it > could be or at least use a mark-and-sweep amongst its strategies), > Webkit/Safari's JavaScriptCore uses a GC as well[2] and MSIE's JScript > used a mark-and-sweep GC back in 2003[3] (although the DOM itself was > in COM, and reference-counted). > I stand corrected (but I am right about the single-threadedness :-). 
> And note that a > single-threaded event-driven process can serve 100,000 open sockets -- > while no JVM can create 100,000 threads. > Only because it's OS threads of course, Erlang is not evented and has no > problem spawning half a million (preempted) processes if there's RAM > enough to store them. > Sure. But the people complaining about the GIL come from Java, not from Erlang. (Erlang users typically envy Python because of its superior standard library. :-) > > [0] http://code.google.com/apis/v8/design.html#garb_coll > [1] https://developer.mozilla.org/en/SpiderMonkey/1.8.5#Garbage_collection > [2] Since ~2009 > http://www.masonchang.com/blog/2009/3/26/nitros-garbage-collector.html > [3] http://blogs.msdn.com/b/ericlippert/archive/2003/09/17/53038.aspx -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From masklinn at masklinn.net Thu Feb 9 19:53:47 2012 From: masklinn at masklinn.net (Masklinn) Date: Thu, 9 Feb 2012 19:53:47 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> Message-ID: <974D4AD7-6F17-4CEE-BDAD-C0D5B83D0F34@masklinn.net> On 2012-02-09, at 19:44 , Guido van Rossum wrote: > TBH, I see some movement in the static typing discussion, evidence that the > static typing zealots are considering a hybrid approach (e.g. C# dynamic, > and the optional static type checks in Dart). These seem to be efforts of people trying for both sides (for various reasons) more than people firmly rooted in one camp or another. Dart was widely panned for its wonky approach to "static typing", which is generally considered a joke amongst people looking for actual static typing (in that Dart's type checks are about as useful as Python 3's type annotations).
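Masklinn's parenthetical rests on a real property of the language: Python 3 records function annotations but never checks them at runtime. A minimal illustration:

```python
# Python 3 function annotations are stored on the function object but
# are never enforced by the interpreter -- as a static-typing mechanism
# they are inert on their own.
def add(a: int, b: int) -> int:
    return a + b

print(add(1, 2))            # 3
print(add("py", "thon"))    # prints "python": no type error despite the annotations
print(add.__annotations__)  # the annotations are just a dict on the function object
```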
From sturla at molden.no Thu Feb 9 19:57:20 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 09 Feb 2012 19:57:20 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120209104237.154be949@bhuda.mired.org> References: <20120209104237.154be949@bhuda.mired.org> Message-ID: <4F341710.9030806@molden.no> On 09.02.2012 19:42, Mike Meyer wrote: > If threading is the only acceptable concurrency mechanism, then Python > is the wrong language to use. But you're also not building scalable > systems, which is most of where it really matters. If you're willing > to consider things other than threading - and you have to if you want > to build scalable systems - then Python makes a good choice. Yes or no... Python is used for parallel computing on the biggest supercomputers, monsters like Cray and IBM Blue Genes with tens of thousands of CPUs. But what really fails to scale is the Python module loader! For example it can take hours to "import numpy" for 30,000 Python processes on a Blue Gene. And yes, nobody would consider using Java for such systems, even though Java does not have a GIL (well, threads do not matter that much on a cluster with distributed memory anyway). It is Python, C and Fortran that are popular. But that really disproves the claim that Python sucks for big concurrency, except perhaps for the module loader. Sturla From amcnabb at mcnabbs.org Thu Feb 9 19:58:10 2012 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Thu, 9 Feb 2012 11:58:10 -0700 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> Message-ID: <20120209185810.GC20556@mcnabbs.org> On Thu, Feb 09, 2012 at 10:44:42AM -0800, Guido van Rossum wrote: > I am guessing in part that's a function of resistance to change, and in > part it means PyPy hasn't gotten enough mindshare yet. (Raise your hand if > you have PyPy installed on one of your systems. Raise your hand if you use > it.
Raise your hand if you are a PyPy contributor. :-) I don't know if you actually want replies, but I'll bite. I have pypy installed (from the standard Fedora pypy package), and for a particular project it provided a 20x speedup. I'm not a PyPy contributor, but I'm a believer. I would use PyPy everywhere if it worked with Python 3 and scipy. My apologies if this was just a rhetorical question. :) -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From masklinn at masklinn.net Thu Feb 9 20:03:28 2012 From: masklinn at masklinn.net (Masklinn) Date: Thu, 9 Feb 2012 20:03:28 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> Message-ID: <701473A6-1EF8-405D-80AB-6546774E03FE@masklinn.net> On 2012-02-09, at 19:54 , Guido van Rossum wrote: > > I stand corrected (but I am right about the single-threadedness :-). Absolutely (until WebWorkers anyway) >> And note that a >> single-threaded event-driven process can serve 100,000 open sockets -- >> while no JVM can create 100,000 threads. > > Only because it's OS threads of course, Erlang is not evented and has no >> problem spawning half a million (preempted) processes if there's RAM >> enough to store them. >> > > Sure. But the people complaining about the GIL come from Java, not from > Erlang. (Erlang users typically envy Python because of its superior > standard library. :-) True. 
Then they remember how good Python is with concurrency, distribution and distributed resilience :D (don't forget syntax, one of Erlang's biggest failures) (although it pleased cfbolz since he could get syntax coloration for his prolog) From guido at python.org Thu Feb 9 20:03:26 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 11:03:26 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <0B687CDC-6C26-4032-BFBB-CF562AF29767@gmail.com> References: <0B687CDC-6C26-4032-BFBB-CF562AF29767@gmail.com> Message-ID: On Thu, Feb 9, 2012 at 9:46 AM, Massimo Di Pierro <massimo.dipierro at gmail.com> wrote: > I think if easy_install, gevent, numpy (*), and win32 extensions were > included in 3.x, together with a slightly better Idle (still based on > Tkinter, with multiple pages, autocompletion, collapsible, line numbers, > better printing with syntax highlighting), and if easy_install were > accessible via Idle, this would be a killer version. > IIRC gevent still needs to be ported to 3.x (maybe someone with the necessary skills should apply to the PSF for funding). But the rest sounds like the domain of a superinstaller, not inclusion in the stdlib. IDLE will never be able to compete with Eclipse -- you can love one or the other but not both. Longer term removing the GIL and using garbage collection should be a > priority. I am not sure what is involved and how difficult it is but > perhaps this is what PyCon money can be used for. I think the best way to accomplish both is to focus on PyPy. It needs porting to 3.x; Google has already given them some money towards this goal. > If this cannot be done without breaking backward compatibility again, then > 3.x should be considered an experimental branch, people should be advised > to stay with 2.7 (2.8?) and then skip to 4.x directly when these problems > are resolved. That's really bad advice. 4.x will not be here for another decade.
> Python should not make a habit of breaking backward compatibility. > Agreed. 4.x should be fully backwards compatible -- with 3.x, not with 2.x. It would be really nice if it were to include an async web server (based on > gevent for example) and better parser for HTTP headers and a python based > template language (like mako or the web2py one) not just for the web but > for document generation in general. > Again, that's a bundling issue. With the infrequency of Python releases, anything still under development is much better off being distributed separately. Bundling into core Python requires a package to be essentially stable, i.e., dead. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Feb 9 20:05:15 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 11:05:15 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F341710.9030806@molden.no> References: <20120209104237.154be949@bhuda.mired.org> <4F341710.9030806@molden.no> Message-ID: On Thu, Feb 9, 2012 at 10:57 AM, Sturla Molden wrote: > On 09.02.2012 19:42, Mike Meyer wrote: > > If threading is the only acceptable concurrency mechanism, then Python >> is the wrong language to use. But you're also not building scaleable >> systems, which is most of where it really matters. If you're willing >> to consider things other than threading - and you have to if you want >> to build scaleable systems - then Python makes a good choice. >> > > Yes or no... Python is used for parallel computing on the biggest > supercomputers, monsters like Cray and IBM blue genes with tens of > thousands of CPUs. But what really fails to scale is the Python module > loader! For example it can take hours to "import numpy" for 30,000 Python > processes on a blue gene. 
And yes, nobody would consider to use Java for > such systems, even though Java does not have a GIL (well, theads do no > matter that much on a cluster with distributed memory anyway). It is > Python, C and Fortran that are popular. But that really disproves that > Python sucks for big concurrency, except perhaps for the module loader. > I'm curious about the module loader problem. Did someone ever analyze the cause and come up with a fix? Is it the import lock? Maybe it's something for the bug tracker. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Feb 9 20:06:35 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 11:06:35 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120209185810.GC20556@mcnabbs.org> References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> Message-ID: On Thu, Feb 9, 2012 at 10:58 AM, Andrew McNabb wrote: > On Thu, Feb 09, 2012 at 10:44:42AM -0800, Guido van Rossum wrote: > > I am guessing in part that's a function of resistance to change, and in > > part it means PyPy hasn't gotten enough mindshare yet. (Raise your hand > if > > you have PyPy installed on one of your systems. Raise your hand if you > use > > it. Raise your hand if you are a PyPy contributor. :-) > > I don't know if you actually want replies, but I'll bite. I have pypy > installed (from the standard Fedora pypy package), and for a particular > project it provided a 20x speedup. I'm not a PyPy contributor, but I'm > a believer. > > I would use PyPy everywhere if it worked with Python 3 and scipy. My > apologies if this was just a rhetorical question. :) Thanks for replying, it was not a rhetorical question. It's something I'm considering asking during my keynote at PyCon next month. 
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Thu Feb 9 20:08:36 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 09 Feb 2012 20:08:36 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> Message-ID: <4F3419B4.6010802@molden.no> On 09.02.2012 19:50, Masklinn wrote: > I don't think I've seen a serious refcounted JS implementation in the last > decade. , although it is possible that JS runtimes have localized usage > of references and reference-counted resources. AFAIK all modern JS > runtimes are JITed which probably does not mesh well with refcounting. > > In any case, V8 (Chrome's runtime) uses a stop-the-world generational > GC for sure[0], And Chrome uses one *process* for each tab, right? Is there a reason Chrome does not use one thread for each tab, such as security? > Only because it's OS threads of course, Erlang is not evented and has no > problem spawning half a million (preempted) processes if there's RAM > enough to store them. Actually, spawning half a million OS threads will burn the computer. *POFF* ... and it goes up in a ball of smoke. Spawning half a million threads is the Windows equivalent of a fork bomb. I think you confuse threads and fibers/coroutines. Sturla From ericsnowcurrently at gmail.com Thu Feb 9 20:12:46 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 9 Feb 2012 12:12:46 -0700 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <20120209104237.154be949@bhuda.mired.org> <4F341710.9030806@molden.no> Message-ID: On Thu, Feb 9, 2012 at 12:05 PM, Guido van Rossum wrote: > On Thu, Feb 9, 2012 at 10:57 AM, Sturla Molden wrote: >> Yes or no... Python is used for parallel computing on the biggest >> supercomputers, monsters like Cray and IBM blue genes with tens of thousands >> of CPUs. 
But what really fails to scale is the Python module loader! For >> example it can take hours to "import numpy" for 30,000 Python processes on a >> blue gene. And yes, nobody would consider using Java for such systems, even >> though Java does not have a GIL (well, threads do not matter that much on a >> cluster with distributed memory anyway). It is Python, C and Fortran that >> are popular. But that really disproves that Python sucks for big >> concurrency, except perhaps for the module loader. > > > I'm curious about the module loader problem. Did someone ever analyze the > cause and come up with a fix? Is it the import lock? Maybe it's something > for the bug tracker. +1 -eric From anacrolix at gmail.com Thu Feb 9 20:16:00 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Fri, 10 Feb 2012 03:16:00 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120209104237.154be949@bhuda.mired.org> References: <20120209104237.154be949@bhuda.mired.org> Message-ID: > If threading is the only acceptable concurrency mechanism, then Python > is the wrong language to use. But you're also not building scalable > systems, which is most of where it really matters. If you're willing > to consider things other than threading - and you have to if you want > to build scalable systems - then Python makes a good choice. Yes, but core Python doesn't have any true concurrency mechanism other than native threading, and threads are too heavyweight for this purpose alone. On top of this they're useless for Python-only parallelism. > Personally, I'd like to see a modern threading model in Python, > especially if its tools can be extended to work with other > concurrency mechanisms. But that's a *long* way into the future. Too far. It needs to be now. The downward spiral is already beginning. Mobile phones are going multicore. My next desktop will probably have 8 cores or more.
All the heavyweight languages are firing up thread/STM standardizations and implementations to make this stuff more performant and easier than it already is. > As for "popular vs. good" - "good" is a subjective measure. So the two > statements "anything popular is good" and "nothing popular was ever > good unless it had no competition" can both be true. > > Personally, I lean toward the latter. I tend to find things that are > popular to not be very good, which makes me distrust the taste of the > populace. The python core developers, on the other hand, have an > excellent record when it comes to keeping the language good - and the > failures tend to be concessions to popularity! So I'd rather the > current system for adding features stay in place and *not* see the > language add features just to gain popularity. We already have Perl if > you want that kind of language. > > That said, it's perfectly reasonable to suggest changes you think will > improve the popularity of the language. But be prepared to show that > they're actually good, as opposed to merely possibly popular. This doesn't apply to "enabling" features. Features that make it possible for popular stuff to happen. Concurrency isn't popular, but parallelism is. At least where the GIL is concerned, a good alternative concurrency mechanism doesn't exist. (The popular one is native threading). From g.rodola at gmail.com Thu Feb 9 20:16:00 2012 From: g.rodola at gmail.com (Giampaolo Rodolà) Date: Thu, 9 Feb 2012 20:16:00 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: On 9 February 2012 18:35, Matt Joiner wrote: > From my own observations, the recent drop is due to uncertainty with Python > 3, and an increase of alternatives on the server side, such as Node. > > The transition is only going to get more painful as system critical software > lags on 2.x while users clamour for 3.x.
I think it's not only a matter of third-party modules not being ported quickly enough or the amount of work involved when facing the 2->3 conversion. I bet a lot of people don't want to upgrade for another reason: unicode. The impression I got is that Python 3 forces the user to use and *understand* unicode, and a lot of people simply don't want to deal with that. In Python 2 there was no such strong imposition. The Python 2 string type acting both as bytes and as text was certainly ambiguous and "impure" on different levels, and changing that was definitely a win in terms of purity and correctness. I bet most advanced users are happy with this change. On the other hand, the average Python 2 user was free to ignore that distinction even if that meant having subtle bugs hidden somewhere in his/her code. I think this aspect shouldn't be underestimated. --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/ From mwm at mired.org Thu Feb 9 20:18:10 2012 From: mwm at mired.org (Mike Meyer) Date: Thu, 9 Feb 2012 11:18:10 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F341710.9030806@molden.no> References: <20120209104237.154be949@bhuda.mired.org> <4F341710.9030806@molden.no> Message-ID: <20120209111810.58e0cf42@bhuda.mired.org> On Thu, 09 Feb 2012 19:57:20 +0100 Sturla Molden wrote: > On 09.02.2012 19:42, Mike Meyer wrote: > > If threading is the only acceptable concurrency mechanism, then Python > > is the wrong language to use. But you're also not building scalable > > systems, which is most of where it really matters. If you're willing > > to consider things other than threading - and you have to if you want > > to build scalable systems - then Python makes a good choice. > Yes or no... Python is used for parallel computing on the biggest > supercomputers, monsters like Cray and IBM blue genes with tens of > thousands of CPUs. But what really fails to scale is the Python module > loader!
For example it can take hours to "import numpy" for 30,000 > Python processes on a blue gene. Whether or not hours of time to import is an issue depends on what you're doing. I typically build systems running on hundreds of CPUs for weeks on end, meaning you get years of CPU time per run. So if it took a few hours of CPU time to get started, it wouldn't be much of a problem. If it took a few hours of wall clock time - well, that would be more of a problem, mostly because that long of an outage would be unacceptable. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From anacrolix at gmail.com Thu Feb 9 20:19:36 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Fri, 10 Feb 2012 03:19:36 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F341419.6030808@molden.no> References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F341419.6030808@molden.no> Message-ID: > The GIL annoys those who have learned to expect threading.Thread for CPU > bound concurrency in advance -- which typically means prior experience with > Java. Python threads are fine for their intended use -- e.g. I/O and > background tasks in a GUI. Even for that purpose they're too heavy. The GIL conflicts, and boilerplate overhead spawning threads is obscene for more than trivial cases. From sturla at molden.no Thu Feb 9 20:23:14 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 09 Feb 2012 20:23:14 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <20120209104237.154be949@bhuda.mired.org> <4F341710.9030806@molden.no> Message-ID: <4F341D22.4020706@molden.no> On 09.02.2012 20:05, Guido van Rossum wrote: > I'm curious about the module loader problem. Did someone ever analyze > the cause and come up with a fix? Is it the import lock? Maybe it's > something for the bug tracker. 
See this: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html The offender is actually imp.find_module, which results in a huge number of failed open() calls when used concurrently from many processes. So a solution is to have one process locate the modules and then broadcast their location to the other processes. There is even a paper on the issue. Here they suggest that importing from a ramdisk might work on IBM blue gene, but not on Cray. http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf Another solution might be to use sys.meta_path to bypass imp.find_module: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059813.html The best solution would of course be to fix imp.find_module so it scales properly. Sturla From phd at phdru.name Thu Feb 9 20:23:44 2012 From: phd at phdru.name (Oleg Broytman) Date: Thu, 9 Feb 2012 23:23:44 +0400 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3419B4.6010802@molden.no> References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F3419B4.6010802@molden.no> Message-ID: <20120209192344.GA22166@iskra.aviel.ru> On Thu, Feb 09, 2012 at 08:08:36PM +0100, Sturla Molden wrote: > And Chrome uses one *process* for each tab, right? Is there a reason > Chrome does not use one thread for each tab, such as security? Safety, I dare say. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN.
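[Editor's note] Sturla's sys.meta_path workaround can be sketched in a few lines. This is a minimal illustration using today's importlib API, not the code from the linked numpy-discussion thread; the plain `cache` dict standing in for an MPI broadcast of module locations is an assumption for the example:

```python
import importlib.abc
import importlib.util
import sys

class CachedLocationFinder(importlib.abc.MetaPathFinder):
    """Serve module locations from a pre-computed cache.

    The idea from the thread: one process locates the modules once,
    then shares the name -> path mapping, so the other processes never
    issue the thousands of failed open()/stat() calls themselves.
    Here the mapping is a local dict; in the MPI setting it would be
    filled by rank 0 and broadcast to the other ranks (assumption).
    """

    def __init__(self, cache):
        self.cache = cache  # fully-qualified module name -> file path

    def find_spec(self, fullname, path=None, target=None):
        location = self.cache.get(fullname)
        if location is None:
            return None  # unknown module: fall through to the default finders
        return importlib.util.spec_from_file_location(fullname, location)

# Installed in front of the normal finders, e.g.:
# sys.meta_path.insert(0, CachedLocationFinder(shared_cache))
```

Because the finder answers from the cache before the path-based machinery runs, a known module is resolved without touching the filesystem search path at all.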
From pyideas at rebertia.com Thu Feb 9 20:23:56 2012 From: pyideas at rebertia.com (Chris Rebert) Date: Thu, 9 Feb 2012 11:23:56 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3419B4.6010802@molden.no> References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F3419B4.6010802@molden.no> Message-ID: On Thu, Feb 9, 2012 at 11:08 AM, Sturla Molden wrote: > On 09.02.2012 19:50, Masklinn wrote: > >> I don't think I've seen a serious refcounted JS implementation in the last >> decade, although it is possible that JS runtimes have localized usage >> of references and reference-counted resources. AFAIK all modern JS >> runtimes are JITed which probably does not mesh well with refcounting. >> >> In any case, V8 (Chrome's runtime) uses a stop-the-world generational >> GC for sure[0], > > And Chrome uses one *process* for each tab, right? Is there a reason Chrome > does not use one thread for each tab, such as security? Stability and security. If something goes wrong/rogue, the effects are reasonably isolated to the individual tab in question. And they can use OS resource/privilege limiting APIs to lock down these processes as much as possible. Cheers, Chris From ericsnowcurrently at gmail.com Thu Feb 9 20:25:45 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 9 Feb 2012 12:25:45 -0700 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: On Thu, Feb 9, 2012 at 12:16 PM, Giampaolo Rodolà wrote: > I bet a lot of people don't want to upgrade for another reason: unicode. > The impression I got is that python 3 forces the user to use and > *understand* unicode and a lot of people simply don't want to deal > with that. > In python 2 there was no such strong imposition. > Python 2's string type acting both as bytes and as text was certainly > ambiguous and "impure" on different levels, and changing that was > definitely a win in terms of purity and correctness.
> I bet most advanced users are happy with this change. > On the other hand, the average Python 2 user was free to ignore that > distinction even if that meant having subtle bugs hidden somewhere in > his/her code. > I think this aspect shouldn't be underestimated. Isn't that more accurate for framework writers, rather than for "average" users? How often do average users have to address encoding/decoding in Python 3? -eric From sturla at molden.no Thu Feb 9 20:25:48 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 09 Feb 2012 20:25:48 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <20120209104237.154be949@bhuda.mired.org> <4F341710.9030806@molden.no> Message-ID: <4F341DBC.5010609@molden.no> On 09.02.2012 20:05, Guido van Rossum wrote: > I'm curious about the module loader problem. Did someone ever analyze > the cause and come up with a fix? Is it the import lock? Maybe it's > something for the bug tracker. See this: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html The offender is actually imp.find_module, which results in a huge number of failed open() calls when used concurrently from many processes. So a solution is to have one process locate the modules and then broadcast their location to the other processes. There is even a paper on the issue. Here they suggest that importing from a ramdisk might work on IBM blue gene, but not on Cray. http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf Another solution might be to use sys.meta_path to bypass imp.find_module: http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059813.html The best solution would of course be to fix imp.find_module so it scales properly.
Sturla From masklinn at masklinn.net Thu Feb 9 20:27:22 2012 From: masklinn at masklinn.net (Masklinn) Date: Thu, 9 Feb 2012 20:27:22 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3419B4.6010802@molden.no> References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F3419B4.6010802@molden.no> Message-ID: On 2012-02-09, at 20:08 , Sturla Molden wrote: > On 09.02.2012 19:50, Masklinn wrote: >> I don't think I've seen a serious refcounted JS implementation in the last >> decade, although it is possible that JS runtimes have localized usage >> of references and reference-counted resources. AFAIK all modern JS >> runtimes are JITed which probably does not mesh well with refcounting. >> >> In any case, V8 (Chrome's runtime) uses a stop-the-world generational >> GC for sure[0], > > And Chrome uses one *process* for each tab, right? Is there a reason Chrome does not use one thread for each tab, such as security? I do not know the precise reasons, no, but it probably has to do with security and ensuring isolation, yes (webpage semantics mandate that each page gets its very own isolated javascript execution context) >> Only because it's OS threads of course, Erlang is not evented and has no >> problem spawning half a million (preempted) processes if there's RAM >> enough to store them. > > Actually, spawning half a million OS threads will burn the computer. > > *POFF* > > ... and it goes up in a ball of smoke. > > Spawning half a million threads is the Windows equivalent of a fork bomb. > > I think you confuse threads and fibers/coroutines. No. You probably misread my comment somehow.
From sturla at molden.no Thu Feb 9 20:30:09 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 09 Feb 2012 20:30:09 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F341419.6030808@molden.no> Message-ID: <4F341EC1.6060004@molden.no> On 09.02.2012 20:19, Matt Joiner wrote: >> The GIL annoys those who have learned to expect threading.Thread for CPU >> bound concurrency in advance -- which typically means prior experience with >> Java. Python threads are fine for their intended use -- e.g. I/O and >> background tasks in a GUI. > > Even for that purpose they're too heavy. The GIL conflicts, and > boilerplate overhead spawning threads is obscene for more than trivial > cases. In which case you want to use I/O completion ports on Windows. (And they scale equally well from Python.) Sturla From anacrolix at gmail.com Thu Feb 9 20:31:58 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Fri, 10 Feb 2012 03:31:58 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: > Isn't that more accurate for framework writers, rather than for > "average" users? How often do average users have to address > encoding/decoding in Python 3? Constantly. As a Python noob trying Python 3, it was the first wall I encountered. I had to learn Unicode right then and there. Fortunately, the Python docs HOWTO on Unicode is excellent.
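[Editor's note] The bytes/text split this sub-thread keeps circling can be shown in a few lines of Python 3; the Python 2 behaviour it replaced is only paraphrased in the comments:

```python
# Python 3 draws a hard line between text (str) and raw bytes.
text = "café"                  # str: a sequence of Unicode code points
raw = text.encode("utf-8")     # bytes: b'caf\xc3\xa9'
assert raw.decode("utf-8") == text

# Mixing the two without an explicit encode/decode is an immediate
# TypeError. Python 2 would instead attempt a silent ASCII conversion
# and only raise later, on the first non-ASCII input that slipped in.
try:
    _ = text + raw
except TypeError as exc:
    print("refused:", exc)
```

That early, explicit failure is the trade-off Guido describes below: more encode/decode calls up front, but errors that can actually be fixed where they occur.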
From stefan_ml at behnel.de Thu Feb 9 20:32:03 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 09 Feb 2012 20:32:03 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F34149F.5020909@molden.no> References: <4F3410DF.30602@molden.no> <4F34149F.5020909@molden.no> Message-ID: Sturla Molden, 09.02.2012 19:46: > On 09.02.2012 19:36, Guido van Rossum wrote: > >> I don't know much of this area, but maybe this is something where a >> dynamic installer (along the lines of easy_install) might actually be handy? > > That is what NumPy and SciPy do on Windows. But it also means the > "superpack" installer is a very big download. I think this is an area where distributors can best play their role. If you want Python to include SciPy, go and ask Enthought. If you also want an operating system with it, go and ask Debian or Canonical. Or macports, if you prefer paying for your apples instead. Stefan From pydanny at gmail.com Thu Feb 9 20:34:57 2012 From: pydanny at gmail.com (Daniel Greenfeld) Date: Thu, 9 Feb 2012 11:34:57 -0800 Subject: [Python-ideas] Python-ideas Digest, Vol 63, Issue 23 In-Reply-To: References: Message-ID: > 1. Re: Python 3000 TIOBE -3% (Massimo Di Pierro) > 2. Re: Python 3000 TIOBE -3% (Guido van Rossum) > 3. Re: Python 3000 TIOBE -3% (Sturla Molden) > 4. Re: Python 3000 TIOBE -3% (Guido van Rossum) > Date: Thu, 9 Feb 2012 12:25:18 -0600 > From: Massimo Di Pierro > To: Steven D'Aprano > Cc: python-ideas >> Massimo Di Pierro wrote: >>> Here is another data point: >>> http://redmonk.com/sogrady/2012/02/08/language-rankings-2-2012/ >>> Unfortunately the TIOBE index does matter. I can speak for python >>> in education and trends I've seen. >>> Python is and remains the easiest language to teach but it is no >>> longer true that getting Python to run is easier than alternatives >>> (not for the average undergrad student). >> >> Is that a commentary on Python, or the average undergrad student?
> I teach so the average student is my benchmark. Please do not > misunderstand. While some may be lazy, the average CS undergrad is > not stupid but quite intelligent. They just do not like wasting time > with setups and I sympathize with that. Batteries included is the > Python motto. I'm going to delurk from this list and really back up Massimo here. It's not precisely his issue, but it's close enough to count. While we love our Linux and BSD variants, and OS X usage is growing, the truth of the matter is that the clear majority of people learning Python at the entry level do so on Windows. And I can assure you, having attended many of the tutorials given by PyLadies and other groups, the part that took the most time was ensuring a correct installation on Windows. It's not just a matter of getting the installation onto the machine, it's a matter of making sure the paths are set correctly so they can follow code examples trivially. In fact, at PyLadies tutorial events they would literally give special party hats to teachers who could get Python running under ideal conditions under Windows. And still it ate a lot of time and caused frustration. Frustration that gets shared with management and other people. I'm well aware that this matter of installation has been 'addressed'. There is a complex PEP to handle different version installs on Windows. I can go and click on a small link on the home page of python.org and download a one-click installer that DOESN'T set up Windows paths. I can follow instructions 'somewhere' and get the paths set up, but I shouldn't have to. Students should be able to one-click Python and have it just work. Yet, for all the times I've been told it's fixed or we've complained about it and been told "It's getting fixed!", it is still an ongoing problem. If I were a Windows developer I would fix it today.
So perhaps this should become a GSOC project of high priority: a one-click install of a version of Python on Windows, hosted by python.org. Note: People suggest virtual machines or Vagrant. This works on new machines, but you try getting any of that working on an old Windows machine in a room of 40-100 students waiting on installation. Providing laptops is also completely out of budget for most of these events. In order to make this issue as clear as possible, I'm going to quote Audrey Roy: "The number one thing that Python educators struggle with on entry level tutorials is Windows installations of Python. Ask me, ask Zed Shaw, ask any of the PyLadies." -- 'Knowledge is Power' Daniel Greenfeld http://pydanny.blogspot.com From ctb at msu.edu Thu Feb 9 20:36:35 2012 From: ctb at msu.edu (C. Titus Brown) Date: Thu, 9 Feb 2012 11:36:35 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F341419.6030808@molden.no> Message-ID: <20120209193635.GE9836@idyll.org> On Fri, Feb 10, 2012 at 03:19:36AM +0800, Matt Joiner wrote: > > The GIL annoys those who have learned to expect threading.Thread for CPU > > bound concurrency in advance -- which typically means prior experience with > > Java. Python threads are fine for their intended use -- e.g. I/O and > > background tasks in a GUI. > > Even for that purpose they're too heavy. The GIL conflicts, and > boilerplate overhead spawning threads is obscene for more than trivial > cases. The GIL is almost entirely a PR issue. In actual practice, it is so great (simple, straightforward, functional) I believe that it is a sign of Guido's time machine-enabled foresight. --titus -- C. Titus Brown, ctb at msu.edu From ctb at msu.edu Thu Feb 9 20:37:43 2012 From: ctb at msu.edu (C.
Titus Brown) Date: Thu, 9 Feb 2012 11:37:43 -0800 Subject: [Python-ideas] Python-ideas Digest, Vol 63, Issue 23 In-Reply-To: References: Message-ID: <20120209193743.GB28383@idyll.org> On Thu, Feb 09, 2012 at 11:34:57AM -0800, Daniel Greenfeld wrote: > > 1. Re: Python 3000 TIOBE -3% (Massimo Di Pierro) > > 2. Re: Python 3000 TIOBE -3% (Guido van Rossum) > > 3. Re: Python 3000 TIOBE -3% (Sturla Molden) > > 4. Re: Python 3000 TIOBE -3% (Guido van Rossum) > > Date: Thu, 9 Feb 2012 12:25:18 -0600 > > From: Massimo Di Pierro > > To: Steven D'Aprano > > Cc: python-ideas > > >> Massimo Di Pierro wrote: > >>> Here is another data point: > >>> http://redmonk.com/sogrady/2012/02/08/language-rankings-2-2012/ > >>> Unfortunately the TIOBE index does matter. I can speak for python > >>> in education and trends I've seen. > >>> Python is and remains the easiest language to teach but it is no > >>> longer true that getting Python to run is easier than alternatives > >>> (not for the average undergrad student). > >> > >> Is that a commentary on Python, or the average undergrad student? > > > > I teach so the average student is my benchmark. Please do not > > misunderstand. While some may be lazy, the average CS undergrad is > > not stupid but quite intelligent. They just do not like wasting time > > with setups and I sympathize with that. Batteries included is the > > Python motto. > > I'm going to delurk from this list and really back up Massimo here. > It's not precisely his issue, but it's close enough to count. > > While we love our Linux and BSD variants, and OS X usage is growing, > the truth of the matter is that the clear majority of people learning > Python at the entry level do so on Windows. And I can assure you, > having attended many of the tutorials given by PyLadies and other > groups, the part that took the most time was ensuring a > correct installation on Windows.
It's not just a matter of getting the > installation onto the machine, it's a matter of making sure the paths > are set correctly so they can follow code examples trivially. +inf. --titus From jimjjewett at gmail.com Thu Feb 9 20:39:10 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 9 Feb 2012 14:39:10 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3419B4.6010802@molden.no> References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F3419B4.6010802@molden.no> Message-ID: On Thu, Feb 9, 2012 at 2:08 PM, Sturla Molden wrote: > And Chrome uses one *process* for each tab, right? Supposedly. If you click the wrench, then select Tools/Task Manager, it looks like there are actually several tabs/process (at least if you have enough tabs), but there can easily be several processes controlling separate tabs within the same window. > Is there a reason Chrome > does not use one thread for each tab, such as security? That too, but the reason they documented when introducing Chrome was for stability. I can say that Chrome often warns me that a selection of tabs[1] appears to be stopped, and asks if I want to kill them; it more often appears to freeze -- but switching to a different tab is usually effective in getting some response, while I wait the issue out. [1] Not sure if the selection is exactly equal to those handled by a single process, but it seems so. -jJ From guido at python.org Thu Feb 9 20:39:58 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 11:39:58 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: On Thu, Feb 9, 2012 at 11:31 AM, Matt Joiner wrote: > > Isn't that more accurate for framework writers, rather than for > > "average" users? How often do average users have to address > > encoding/decoding in Python 3? > > Constantly. As a Python noob I tried Python 3 it was the first wall I > encountered. 
I had to learn Unicode right then and there. Fortunately, > the Python docs HOWTO on Unicode is excellent. > The difference is that *if* you hit a Unicode error in 2.x, you're done for. Even understanding Unicode doesn't help. In 3.x, you will hit Unicode problems less frequently than in 2.x, and when you do, the problem can actually be overcome, and then your code is better. In 2.x, the typical solution, when there *is* a solution, involves making your code messier and sending up frequent prayers to the gods of Unicode. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Feb 9 20:40:40 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 11:40:40 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F341D22.4020706@molden.no> References: <20120209104237.154be949@bhuda.mired.org> <4F341710.9030806@molden.no> <4F341D22.4020706@molden.no> Message-ID: Please do file an upstream bug for this. On Thu, Feb 9, 2012 at 11:23 AM, Sturla Molden wrote: > On 09.02.2012 20:05, Guido van Rossum wrote: > > I'm curious about the module loader problem. Did someone ever analyze >> the cause and come up with a fix? Is it the import lock? Maybe it's >> something for the bug tracker. >> > > See this: > > http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html > > The offender is actually imp.find_module, which results in a huge number of > failed open() calls when used concurrently from many processes. > > So a solution is to have one process locate the modules and then broadcast > their location to the other processes. > > There is even a paper on the issue. Here they suggest importing from > ramdisk might work on IBM blue gene, but not on Cray.
> > http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf > > Another solution might be to use sys.meta_path to bypass imp.find_module: > > http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059813.html > > The best solution would of course be to fix imp.find_module so it scales > properly. > > > Sturla > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From masklinn at masklinn.net Thu Feb 9 20:42:40 2012 From: masklinn at masklinn.net (Masklinn) Date: Thu, 9 Feb 2012 20:42:40 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120209193635.GE9836@idyll.org> References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F341419.6030808@molden.no> <20120209193635.GE9836@idyll.org> Message-ID: On 2012-02-09, at 20:36 , C. Titus Brown wrote: > On Fri, Feb 10, 2012 at 03:19:36AM +0800, Matt Joiner wrote: >>> The GIL annoys those who have learned to expect threading.Thread for CPU >>> bound concurrency in advance -- which typically means prior experience with >>> Java. Python threads are fine for their intended use -- e.g. I/O and >>> background tasks in a GUI. >> >> Even for that purpose they're too heavy. The GIL conflicts, and >> boilerplate overhead spawning threads is obscene for more than trivial >> cases. > > The GIL is almost entirely a PR issue. In actual practice, it is so great > (simple, straightforward, functional) I believe that it is a sign of Guido's > time machine-enabled foresight. I'm not sure dabeaz would agree with you if he intervened in the discussion.
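[Editor's note] The dabeaz-style observation behind this exchange is easy to reproduce: two CPU-bound pure-Python threads take about as long as, and often longer than, doing the same work serially, because the GIL lets only one thread execute bytecode at a time. A rough self-contained timing sketch (the exact numbers vary by machine, and on a GIL-less free-threaded build the comparison would look different):

```python
import threading
import time

def countdown(n):
    # Pure-Python CPU work; under the GIL, two threads running this
    # cannot execute bytecode simultaneously.
    while n:
        n -= 1

N = 5_000_000

t0 = time.perf_counter()
countdown(N)
countdown(N)
serial = time.perf_counter() - t0

t0 = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - t0

print(f"serial: {serial:.2f}s  two threads: {threaded:.2f}s")
```

Replace `countdown` with a blocking I/O call and the threaded version wins handily, which is the "threads are fine for their intended use" point made above.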
From guido at python.org Thu Feb 9 20:42:57 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 11:42:57 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F341419.6030808@molden.no> Message-ID: On Thu, Feb 9, 2012 at 11:19 AM, Matt Joiner wrote: > > The GIL annoys those who have learned to expect threading.Thread for CPU > > bound concurrency in advance -- which typically means prior experience > with > > Java. Python threads are fine for their intended use -- e.g. I/O and > > background tasks in a GUI. > > Even for that purpose they're too heavy. The GIL conflicts, and > boilerplate overhead spawning threads is obscene for more than trivial > cases. I'd actually say that using OS threads is too heavy *specifically* for trivial cases. If you spawn a thread to add two numbers you'll have a huge overhead. If you spawn a thread to do something significant, the overhead doesn't matter much. Note that even in Java, everyone uses thread pools to reduce thread creation overhead. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwm at mired.org Thu Feb 9 20:43:27 2012 From: mwm at mired.org (Mike Meyer) Date: Thu, 9 Feb 2012 11:43:27 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <20120209104237.154be949@bhuda.mired.org> Message-ID: <20120209114327.0d262bd1@bhuda.mired.org> On Fri, 10 Feb 2012 03:16:00 +0800 Matt Joiner wrote: > > If threading is the only acceptable concurrency mechanism, then Python > > is the wrong language to use. But you're also not building scaleable > > systems, which is most of where it really matters. If you're willing > > to consider things other than threading - and you have to if you want > > to build scaleable systems - then Python makes a good choice. 
> Yes but core Python doesn't have any other true concurrency mechanisms > other than native threading, and they're too heavyweight for this > purpose alone. On top of this they're useless for Python-only > parallelism. Huh? Core Python has concurrency mechanisms other than native threading. I don't know what your purpose is, but for mine (building horizontally scaleable systems of various types), they work fine. They're much easier to design with and maintain than using threads as well. They also work well in Python-only systems. If you're using "true" to exclude anything but threading, then you're just playing word games. The reality is that most problems don't need threading. The only thing it buys you over the alternatives is easy shared memory. Very few problems actually require that. > > Personally, I'd like to see a modern threading model in Python, > > especially if its tools can be extended to work with other > > concurrency mechanisms. But that's a *long* way into the future. > Too far. It needs to be now. The downward spiral is already beginning. > Mobile phones are going multicore. My next desktop will probably have > 8 cores or more. All the heavyweight languages are firing up > thread/STM standardizations and implementations to make this stuff > more performant and easier than it already is. Yes, Python needs something like that. You can't have it without breaking backwards compatibility. It's not clear you can have it without serious performance hits in Python's primary use area, which is single-threaded scripts. Which means it's probably a Python 4K feature. There have been a number of discussions on python-ideas about this. I submitted a proto-pep that covers most of that to python-dev for further discussion and approval. I'd suggest you chase those things down. > > That said, it's perfectly reasonable to suggest changes you think will > > improve the popularity of the language.
But be prepared to show that > they're actually good, as opposed to merely possibly popular. > This doesn't apply to "enabling" features. Features that make it > possible for popular stuff to happen. Concurrency isn't popular, but > parallelism is. At least where the GIL is concerned, a good > alternative concurrency mechanism doesn't exist. (The popular one is > native threading). No, the process needs to apply to *all* changes. Even changes to implementation details - like removing the GIL. If your implementation that removes the GIL causes a 50% slowdown in single-threaded Python code, it ain't gonna happen. But until you actually propose a change, it won't matter. Nothing's going to happen until someone actually does something more than talk about it. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From guido at python.org Thu Feb 9 20:43:41 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 11:43:41 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F3419B4.6010802@molden.no> Message-ID: On Thu, Feb 9, 2012 at 11:39 AM, Jim Jewett wrote: > On Thu, Feb 9, 2012 at 2:08 PM, Sturla Molden wrote: > > > And Chrome uses one *process* for each tab, right? > Can we stop discussing Chrome here? It doesn't really matter. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From sturla at molden.no Thu Feb 9 20:44:23 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 09 Feb 2012 20:44:23 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F341419.6030808@molden.no> <4F341EC1.6060004@molden.no> Message-ID: <4F342217.4080109@molden.no> On 09.02.2012 20:34, Matt Joiner wrote: > Linux user here. I'm not sure that IOCP solves the I/O concurrency > issue anyway, it's just as convoluted as polling, from memory. On Linux, processes are so lightweight that you can fork (os.fork) instead of spawning threads. Threads are typically needed for Java, Solaris and Windows, where forking is either slow or not possible. But if you need really scalable I/O on Linux, consider select/poll or epoll. And on FreeBSD and Mac there is kqueue. Sturla From mwm at mired.org Thu Feb 9 20:48:08 2012 From: mwm at mired.org (Mike Meyer) Date: Thu, 9 Feb 2012 11:48:08 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F341DBC.5010609@molden.no> References: <20120209104237.154be949@bhuda.mired.org> <4F341710.9030806@molden.no> <4F341DBC.5010609@molden.no> Message-ID: <20120209114808.07220f5a@bhuda.mired.org> On Thu, 09 Feb 2012 20:25:48 +0100 Sturla Molden wrote: > The offender is actually imp.find_module, which results in a huge number > of failed open() calls when used concurrently from many processes. Ah, I see why I never ran into it. I build systems that start by loading all the modules they need, then fork()ing many processes from that parent. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From stefan_ml at behnel.de Thu Feb 9 20:53:55 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 09 Feb 2012 20:53:55 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120209185810.GC20556@mcnabbs.org> References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> Message-ID: Andrew McNabb, 09.02.2012 19:58: > On Thu, Feb 09, 2012 at 10:44:42AM -0800, Guido van Rossum wrote: >> I am guessing in part that's a function of resistance to change, and in >> part it means PyPy hasn't gotten enough mindshare yet. (Raise your hand if >> you have PyPy installed on one of your systems. Raise your hand if you use >> it. Raise your hand if you are a PyPy contributor. :-) > > I don't know if you actually want replies, but I'll bite. I have pypy > installed (from the standard Fedora pypy package), and for a particular > project it provided a 20x speedup. I'm not a PyPy contributor, but I'm > a believer. > > I would use PyPy everywhere if it worked with Python 3 and scipy. AFAIK, there is no concrete roadmap towards supporting SciPy on top of PyPy. Currently, PyPy is getting its own implementation of NumPy-like arrays, but there is no interaction with anything in the SciPy world outside of those. Given the sheer size of SciPy, reimplementing it on top of numpypy is unrealistic. That being said, it's quite possible to fire up CPython from PyPy (or vice versa) and interact with that, if you really need both PyPy and SciPy. It even seems to be supported through multiprocessing. I find that pretty cool. http://thread.gmane.org/gmane.comp.python.pypy/9159/focus=9161 Stefan From ctb at msu.edu Thu Feb 9 20:57:57 2012 From: ctb at msu.edu (C.
Titus Brown) Date: Thu, 9 Feb 2012 11:57:57 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F341419.6030808@molden.no> <20120209193635.GE9836@idyll.org> Message-ID: <20120209195757.GA30150@idyll.org> On Thu, Feb 09, 2012 at 08:42:40PM +0100, Masklinn wrote: > On 2012-02-09, at 20:36 , C. Titus Brown wrote: > > On Fri, Feb 10, 2012 at 03:19:36AM +0800, Matt Joiner wrote: > >>> The GIL annoys those who have learned to expect threading.Thread for CPU > >>> bound concurrency in advance -- which typically means prior experience with > >>> Java. Python threads are fine for their intended use -- e.g. I/O and > >>> background tasks in a GUI. > >> > >> Even for that purpose they're too heavy. The GIL conflicts, and > >> boilerplate overhead spawning threads is obscene for more than trivial > >> cases. > > > > The GIL is almost entirely a PR issue. In actual practice, it is so great > > (simple, straightforward, functional) I believe that it is a sign of Guido's > > time machine-enabled foresight. > > I'm not sure dabeaz would agree with you if he intervened in the discussion. Are we scheduling interventions for me now? 'cause there's a lot of people who want to jump in that queue :) dabeaz understands this stuff at a deeper level than me, which is often a handicap in these kinds of discussions, IMO. (He's also said that he prefers message passing to threading.) The point is that in terms of actually making my own libraries and parallelizing code, the GIL has been very straightforward, cross platform, and quite simple for understanding the consequences of a fairly wide range of multithreading models. Most people want to go do inappropriately complex things ("ooh! threads! shiny!") with threads and then fail to write robust code or understand the scaling of their code; I think the GIL does a fine job of blocking the simplest stupidities. 
Anyway, I love the GIL myself, although I think there is a great opportunity for a richer & more usable mid-level C API for both thread states and interpreters. cheers, --titus -- C. Titus Brown, ctb at msu.edu From sturla at molden.no Thu Feb 9 21:03:03 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 09 Feb 2012 21:03:03 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120209114808.07220f5a@bhuda.mired.org> References: <20120209104237.154be949@bhuda.mired.org> <4F341710.9030806@molden.no> <4F341DBC.5010609@molden.no> <20120209114808.07220f5a@bhuda.mired.org> Message-ID: <4F342677.2070005@molden.no> On 09.02.2012 20:48, Mike Meyer wrote: > Ah, I see why I never ran into it. I build systems that start by > loading all the modules they need, then fork()ing many processes from > that parent. Yes, but that would not work with MPI (e.g. mpi4py) where the MPI runtime (e.g. MPICH2) is starting the Python processes. Theoretically the issue should be present on Windows when using multiprocessing, but not on Linux as multiprocessing is using os.fork. Sturla From stefan_ml at behnel.de Thu Feb 9 21:05:14 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 09 Feb 2012 21:05:14 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F341419.6030808@molden.no> <20120209193635.GE9836@idyll.org> Message-ID: Masklinn, 09.02.2012 20:42: > On 2012-02-09, at 20:36 , C. Titus Brown wrote: >> On Fri, Feb 10, 2012 at 03:19:36AM +0800, Matt Joiner wrote: >>>> The GIL annoys those who have learned to expect threading.Thread for CPU >>>> bound concurrency in advance -- which typically means prior experience with >>>> Java. Python threads are fine for their intended use -- e.g. I/O and >>>> background tasks in a GUI. >>> >>> Even for that purpose they're too heavy.
The GIL conflicts, and >>> boilerplate overhead spawning threads is obscene for more than trivial >>> cases. >> >> The GIL is almost entirely a PR issue. In actual practice, it is so great >> (simple, straightforward, functional) I believe that it is a sign of Guido's >> time machine-enabled foresight. > > I'm not sure dabeaz would agree with you if he intervened in the discussion. That's an implementation detail, though. Stefan From stephen at xemacs.org Thu Feb 9 21:14:54 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 10 Feb 2012 05:14:54 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> Message-ID: <874nuzyda9.fsf@uwakimon.sk.tsukuba.ac.jp> Massimo Di Pierro writes: > > On Feb 9, 2012, at 12:03 PM, Steven D'Aprano wrote: > > > Massimo Di Pierro wrote: > >> Here is another data point: > >> http://redmonk.com/sogrady/2012/02/08/language-rankings-2-2012/ > >> Unfortunately the TIOBE index does matter. I can speak for python > >> in education and trends I seen. Well, maybe you should teach your students the rudiments of lying, erm, "statistics". That -3% on the TIOBE index is a steaming heap of FUD, as Anatoly himself admitted. Feb 2011 is clearly above trend, Feb 2012 below it. Variables vary, OK? So at the moment it is absolutely unclear whether Python's trend line has turned down or even decreased slope. And the RedMonk ranking shows Python at the very top. > Don't shoot the messenger please. > > You can dismiss or address the problem. Anyway... undergrads do care > because they will take 4 years to grade and they do not want to come > out with obsolete skills. Our undergrads learn Python, Ruby, Java, > Javascript and C++. Maybe they should learn something about reality of the IT industry, too. 
According to the TIOBE survey, COBOL and PL/1 are in the same class (rank 51-100, basically indistinguishable) with POSIX shell. Old programming languages never die ... and experts in them only become more valuable with time. Python skills will hardly become "obsolete" in the next decade, certainly not in the next 4 years. You say "dismiss or address the problem." Is there a problem? I dunno. Popularity is nice, but I really don't know if I would want to use a Python that spent the next five years (because that's what it will take) fixing what ain't broke to conform to undergraduate misconceptions. Sure, it would be nice to have more robust support for installing non-stdlib modules such as numpy. But guess what? That's a hard nut to crack, and more, people have been working quite hard on the issue for a while. The distutils folks seem to be about to release at this point -- I guess the Time Machine has struck again! And by the way, which of Ruby, Java, Javascript, and C++ provides something like numpy that's easier to install? Preferably part of their stdlib? In my experience on Linux and Mac, at least, numerical code has always been an issue, whether it's numpy (once that I can remember, and that was because of some dependency which wouldn't build, not numpy itself), Steel Bank Common Lisp, ATLAS, R, .... The one thing that bothers me about the picture at TIOBE is the Objective-C line. I assume that's being driven by iPhone and iPad apps, and I suppose Java is being driven in part by Android. It's too bad Python can't get a piece of that action! From raymond.hettinger at gmail.com Thu Feb 9 21:16:45 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Thu, 9 Feb 2012 12:16:45 -0800 Subject: [Python-ideas] Optional key to `bisect`'s functions?
In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: <0862C9C6-A2AD-4779-BFEC-69304ECFD5C1@rcn.com> On Feb 9, 2012, at 9:50 AM, Daniel Stutzbach wrote: > Maintaining a sorted list using Python's list type is a trap. The bisect is O(log n), but insertion and deletion are still O(n). > > A SortedList class that provides O(log n) insertions is useful from time to time. There are several existing implementations available (I wrote one of them, on top of my blist type), each with their pros and cons. I concur. People who want to maintain sorted collections (periodically adding and deleting items) are far better-off using your blist, a binary tree, or an in-memory sqlite database. Otherwise, we will have baited them into a non-scalable O(n) solution. Unfortunately, when people see the word "bisect", they will presume they've got O(log n) code. We've documented the issue with bisect.insort(), but the power of suggestion is very strong. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From ehlesmes at gmail.com Thu Feb 9 21:20:18 2012 From: ehlesmes at gmail.com (Edward Lesmes) Date: Thu, 9 Feb 2012 15:20:18 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <874nuzyda9.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <874nuzyda9.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: It's too bad Python can't get a piece of that action! Indeed. On Thu, Feb 9, 2012 at 3:14 PM, Stephen J. Turnbull wrote: > Massimo Di Pierro writes: > > > > On Feb 9, 2012, at 12:03 PM, Steven D'Aprano wrote: > > > > > Massimo Di Pierro wrote: > > >> Here is another data point: > > >> http://redmonk.com/sogrady/2012/02/08/language-rankings-2-2012/ > > >> Unfortunately the TIOBE index does matter. I can speak for python > > >> in education and trends I seen. 
> > Well, maybe you should teach your students the rudiments of lying, > erm, "statistics". That -3% on the TIOBE index is a steaming heap of > FUD, as Anatoly himself admitted. Feb 2011 is clearly above trend, > Feb 2012 below it. Variables vary, OK? So at the moment it is > absolutely unclear whether Python's trend line has turned down or even > decreased slope. > > And the RedMonk ranking shows Python at the very top. > > > Don't shoot the messenger please. > > > > You can dismiss or address the problem. Anyway... undergrads do care > > because they will take 4 years to grade and they do not want to come > > out with obsolete skills. Our undergrads learn Python, Ruby, Java, > > Javascript and C++. > > Maybe they should learn something about reality of the IT industry, > too. According to the TIOBE survey, COBOL and PL/1 are in the same > class (rank 51-100, basically indistinguishable) with POSIX shell. > Old programming languages never die ... and experts in them only > become more valuable with time. Python skills will hardly become > "obsolete" in the next decade, certainly not in the next 4 years. > > You say "dismiss or address the problem." Is there a problem? I > dunno. Popularity is nice, but I really don't know if I would want to > use a Python that spent the next five years (because that's what it > will take) fixing what ain't broke to conform to undergraduate > misconceptions. > > Sure, it would be nice have more robust support for installing > non-stdlib modules such as numpy. But guess what? That's a hard nut > to crack, and more, people have been working quite hard on the issue > for a while. The distutils folks seem to be about to release at this > point -- I guess the Time Machine has struck again! > > And by the way, which of Ruby, Java, Javascript, and C++ provides > something like numpy that's easier to install? Preferably part of > their stdlib? 
In my experience on Linux and Mac, at least, numerical > code has always been an issue, whether it's numpy (once that I can > remember, and that was because of some dependency which wouldn't > build, not numpy itself), Steel Bank Common Lisp, ATLAS, R, .... > > The one thing that bothers me about the picture at TIOBE is the > Objective-C line. I assume that's being driven by iPhone and iPad > apps, and I suppose Java is being driven in part by Android. It's too > bad Python can't get a piece of that action! -- Edward Lesmes From raymond.hettinger at gmail.com Thu Feb 9 21:34:47 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Thu, 9 Feb 2012 12:34:47 -0800 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> Message-ID: <2E5BF6F5-6A23-45D9-BF65-36D936925543@gmail.com> On Feb 9, 2012, at 9:48 AM, Guido van Rossum wrote: > The more fundamental "conflict" here seems to be between algorithms and classes. list.sort(), bisect and heapq focus on the algorithm. Bisect in particular had way too much focus on the algorithm. The API is awkward and error-prone for many common use cases. I've tried to remedy that through documenting how to implement the common use cases: http://docs.python.org/py3k/library/bisect.html#searching-sorted-lists The issue is that the current API focuses on "insertion points" rather than on finding values. Unfortunately, this API is very old, so the only way to fix it is to introduce a new class. If we introduced a class around a sorted sequence, then we could make a reasonable API that corresponds to what people usually want to do with sorted sequences.
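The recipes documented on that page wrap the raw insertion-point functions into value lookups; a small sketch along those lines:

```python
from bisect import bisect_left

def index(a, x):
    "Locate the leftmost value exactly equal to x."
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

def find_lt(a, x):
    "Find the rightmost value less than x."
    i = bisect_left(a, x)
    if i:
        return a[i - 1]
    raise ValueError

def find_ge(a, x):
    "Find the leftmost value greater than or equal to x."
    i = bisect_left(a, x)
    if i != len(a):
        return a[i]
    raise ValueError

grades = [33, 60, 75, 75, 90]
index(grades, 75)    # → 2 (leftmost position of 75)
find_lt(grades, 60)  # → 33
find_ge(grades, 76)  # → 90
```

Each lookup is O(log n); it is only insertion via insort that stays O(n), which is the separate scalability problem discussed in this thread.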
Of course, that still leaves the issue with an O(n) insort. As Daniel pointed-out, a list is not the correct underlying data structure if you want to do periodic insertions and deletions. Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothy.c.delaney at gmail.com Thu Feb 9 21:42:54 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Fri, 10 Feb 2012 07:42:54 +1100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> Message-ID: On 10 February 2012 06:06, Guido van Rossum wrote: > On Thu, Feb 9, 2012 at 10:58 AM, Andrew McNabb wrote: > >> On Thu, Feb 09, 2012 at 10:44:42AM -0800, Guido van Rossum wrote: >> > I am guessing in part that's a function of resistance to change, and in >> > part it means PyPy hasn't gotten enough mindshare yet. (Raise your hand >> if >> > you have PyPy installed on one of your systems. Raise your hand if you >> use >> > it. Raise your hand if you are a PyPy contributor. :-) >> >> I don't know if you actually want replies, but I'll bite. I have pypy >> installed (from the standard Fedora pypy package), and for a particular >> project it provided a 20x speedup. I'm not a PyPy contributor, but I'm >> a believer. >> >> I would use PyPy everywhere if it worked with Python 3 and scipy. My >> apologies if this was just a rhetorical question. :) > > > Thanks for replying, it was not a rhetorical question. It's something I'm > considering asking during my keynote at PyCon next month. > In that case ... - I have various versions of PyPy installed (regularly pull the latest working Windows build); - I use it occasionally, but most of my Python work ATM is Google App Engine-based, and the GAE SDK doesn't work with PyPy; - I'm not a PyPy contributor, but am also a believer - I definitely think that PyPy is the future and should be the base for Python4K. 
- I won't be at PyCon. Cheers, Tim Delaney From ncoghlan at gmail.com Thu Feb 9 22:05:32 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Feb 2012 07:05:32 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F341419.6030808@molden.no> Message-ID: On Fri, Feb 10, 2012 at 5:19 AM, Matt Joiner wrote: >> The GIL annoys those who have learned to expect threading.Thread for CPU >> bound concurrency in advance -- which typically means prior experience with >> Java. Python threads are fine for their intended use -- e.g. I/O and >> background tasks in a GUI. > > Even for that purpose they're too heavy. The GIL conflicts, and > boilerplate overhead spawning threads is obscene for more than trivial > cases. Have you even *tried* concurrent.futures (http://docs.python.org/py3k/library/concurrent.futures)? Or the 2.x backport on PyPI (http://pypi.python.org/pypi/futures)? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Thu Feb 9 22:20:48 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 13:20:48 -0800 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: <2E5BF6F5-6A23-45D9-BF65-36D936925543@gmail.com> References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> <2E5BF6F5-6A23-45D9-BF65-36D936925543@gmail.com> Message-ID: On Thu, Feb 9, 2012 at 12:34 PM, Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > Bisect in particular had way too much focus on the algorithm. The API is > awkward and error-prone for many common use cases.
> > I've tried to remedy that through documenting how to implement the common > use cases: > http://docs.python.org/py3k/library/bisect.html#searching-sorted-lists > > The issue is that the current API focuses on "insertion points" rather > than on finding values. Unfortunately, this API is very old, so the only > way to fix it is to introduce a new class. > > If we introduced a class around a sorted sequence, then we could make a > reasonable API that corresponds to what people usually want to do with > sorted sequences. > > Of course, that still leaves the issue with an O(n) insort. As Daniel > pointed-out, a list is not the correct underlying data structure if you > want to do periodic insertions and deletions. > Maybe you're overanalyzing the problem? It seems what you want would require a PEP and/or a reference implementation that is thoroughly tested as a 3rd party package before it warrants inclusion into the stdlib. In the meantime adding a key= option that echoes the API offered by list.sort() and sorted() is a no-brainer. -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Thu Feb 9 22:34:25 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Feb 2012 07:34:25 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: On Fri, Feb 10, 2012 at 5:25 AM, Eric Snow wrote: > On Thu, Feb 9, 2012 at 12:16 PM, Giampaolo Rodolà wrote: >> I bet a lot of people don't want to upgrade for another reason: unicode. >> The impression I got is that python 3 forces the user to use and >> *understand* unicode and a lot of people simply don't want to deal >> with that. >> In python 2 there was no such a strong imposition. >> Python 2 string type acting both as bytes and as text was certainly >> ambiguous and "impure" on different levels and changing that was >> definitively a win in terms of purity and correctness.
>> I bet most advanced users are happy with this change. >> On the other hand, Python 2 average user was free to ignore that >> distinction even if that meant having subtle bugs hidden somewhere in >> his/her code. >> I think this aspect shouldn't be underestimated. > > Isn't that more accurate for framework writers, rather than for > "average" users? How often do average users have to address > encoding/decoding in Python 3? The problem for average users *right now* is that many of the Unicode handling tools that were written for the blurry "is-it-bytes-or-is-it-text?" 2.x 8-bit str type haven't been ported to 3.x yet. That's currently happening, and the folks doing it are the ones who really have to make the adjustment, and figure out what they can deal with on behalf of their users and what they need to expose (if anything). The idea with Python 3 unicode is to have errors happen at (or at least close to) the point where the code is doing something wrong, unlike the Python 2 implicit conversion model, where either data gets silently corrupted, or you get a Unicode error far from the location that introduced the problem. I actually find it somewhat amusing when people say that python-dev isn't focusing on users enough because of the Python 3 transition or the Windows installer problems. What they *actually* seem to be complaining about is that python-dev isn't focused entirely on users that are native English speakers using an expensive proprietary OS. And that's a valid observation - most of us are here because we like Python and primarily want to make it better for the environments where *we* use it, which is mostly a combination of Linux and Mac users, a few other POSIX based platforms and a small minority of Windows developers.
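Nick's point about errors surfacing at the offending call is easy to demonstrate: Python 3 rejects the implicit str/bytes mixing that Python 2 silently allowed, so the traceback points at the exact boundary that needs an explicit encode or decode.

```python
caught = False
try:
    "café" + b" au lait"        # implicit str/bytes mixing: TypeError in Python 3
except TypeError:
    caught = True               # the error appears right here, not far downstream

data = "café".encode("utf-8")   # encode explicitly on the way out
text = data.decode("utf-8")     # decode explicitly on the way back in
```

In Python 2 the equivalent concatenation would have attempted an implicit ascii decode, either corrupting the data or raising a UnicodeDecodeError somewhere far from this line.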
Given the contrariness of Windows as a target platform, the time of those developers is mostly spent on making it keep working, and bringing it up to feature parity with the POSIX version, so cleaning up the installation process falls to the wayside. (And, for all the cries of, "Python should be better supported on Windows!", we just don't see many Windows devs signing up to help - since I consider developing for Windows its own special kind of hell that I'm happy to never have to do again, it doesn't actually surprise me there's a shortage of people willing to do it as a hobby) In terms of actually *fixing it*, the PSF doesn't generally solicit grant proposals, it reviews (and potentially accepts) them. If anyone is serious about getting something done for 3.3, then *write and submit a grant proposal* to the PSF board with the goal of either finalising the Python launcher for Windows, or else just closing out various improvements to the current installer that are already on the issue tracker (e.g. version numbers in the shortcut names, an option to modify the system PATH). Even without going all the way to a grant proposal, go find those tracker items I mentioned and see if there's anything you can do to help folks like Martin von Loewis, Brian Curtin and Terry Reedy close them out. In the meantime, if the python.org packages for Windows aren't up to scratch (and they aren't in many ways), *use the commercially backed ones* (or one of the other sumo distributions that are out there). Don't tell your students to grab the raw installers directly from python.org, redirect them to the free rebuilds from ActiveState or Enthought, or go all out and get them to install something like Python(X, Y). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From massimo.dipierro at gmail.com Thu Feb 9 22:41:22 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Thu, 9 Feb 2012 15:41:22 -0600 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: <90C8316C-1AB0-4759-B3DF-0FB07477FF08@gmail.com> First of all, all the Python developers are doing an amazing job, and none of the comments should be taken as a critique but only as a suggestion. On Feb 9, 2012, at 3:34 PM, Nick Coghlan wrote: [...] > In the meantime, if the python.org packages for Windows aren't up to > scratch (and they aren't in many ways), *use the commercially backed > ones* (or one of the other sumo distributions that are out there). > Don't tell your students to grab the raw installers directly from > python.org, redirect them to the free rebuilds from ActiveState or > Enthought, or go all out and get them to install something like > Python(X, Y). This is what I do now. I tell my students to go to Enthought if they have trouble. Yet there are issues with the license and with 32-bit (free) vs 64-bit (not free) builds. Long term I do not think this is what we should encourage. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From tjreedy at udel.edu Thu Feb 9 22:51:53 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 09 Feb 2012 16:51:53 -0500 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: <0862C9C6-A2AD-4779-BFEC-69304ECFD5C1@rcn.com> References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> <0862C9C6-A2AD-4779-BFEC-69304ECFD5C1@rcn.com> Message-ID: On 2/9/2012 3:16 PM, Raymond Hettinger wrote: > > On Feb 9, 2012, at 9:50 AM, Daniel Stutzbach wrote: > >> Maintaining a sorted list using Python's list type is a trap. The >> bisect is O(log n), but insertion and deletion are still O(n).
The omitted constants are such that the log n term dominates for 'small' n. list.sort internally uses binary insert sort for n up to 64. It only switches to mergesort for runs of at least 64. >> A SortedList class that provides O(log n) insertions is useful from >> time to time. There are several existing implementations available (I >> wrote one of them, on top of my blist type), each with their pros and >> cons. Are your blist leaves lists (or arrays) of some maximum size? > I concur. People who want to maintain sorted collections (periodically > adding and deleting items) are far better-off using your blist, a binary > tree, or an in-memory sqlite database. Otherwise, we will have baited > them into a non-scalable O(n) solution. Unfortunately, when people see > the word "bisect", they will presume they've got O(log n) code. We've > documented the issue with bisect.insort(), but the power of suggestion > is very strong. Using insort on a list of a million items is definitely not a good idea, but I can see how someone not so aware of scaling issues might be tempted, especially with no stdlib alternative. One could almost be tempted to issue a warning if 'hi' is 'too large'. -- Terry Jan Reedy From stutzbach at google.com Thu Feb 9 23:01:27 2012 From: stutzbach at google.com (Daniel Stutzbach) Date: Thu, 9 Feb 2012 14:01:27 -0800 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> <0862C9C6-A2AD-4779-BFEC-69304ECFD5C1@rcn.com> Message-ID: On Thu, Feb 9, 2012 at 1:51 PM, Terry Reedy wrote: > On Feb 9, 2012, at 9:50 AM, Daniel Stutzbach wrote: >> A SortedList class that provides O(log n) insertions is useful from > > time to time. There are several existing implementations available (I >>> wrote one of them, on top of my blist type), each with their pros and >>> cons. >>> >> > Are your blist leaves lists (or arrays) of some maximum size? Yes. Each leaf has at most 128 elements.
It's a compile-time constant. -- Daniel Stutzbach -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Feb 9 23:02:00 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 14:02:00 -0800 Subject: [Python-ideas] Optional key to `bisect`'s functions? In-Reply-To: References: <05E0F324-690E-45A4-8567-BB9BCD226B42@masklinn.net> <0862C9C6-A2AD-4779-BFEC-69304ECFD5C1@rcn.com> Message-ID: On Thu, Feb 9, 2012 at 1:51 PM, Terry Reedy wrote: > Using insort on a list of a millions items is definitely not a good idea, > but I can see how someone not so aware of scaling issues might be tempted, > especially with no stdlib alternative. One could almost be tempted to issue > a warning if 'hi' is 'too large'. > Put it in the docs. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ubershmekel at gmail.com Thu Feb 9 23:24:18 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Fri, 10 Feb 2012 00:24:18 +0200 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <90C8316C-1AB0-4759-B3DF-0FB07477FF08@gmail.com> References: <90C8316C-1AB0-4759-B3DF-0FB07477FF08@gmail.com> Message-ID: On Thu, Feb 9, 2012 at 11:41 PM, Massimo Di Pierro < massimo.dipierro at gmail.com> wrote: > First of all all the Python developers are doing an amazing job, and none > of the comments should be taken as a critique but only as a suggestion. > > On Feb 9, 2012, at 3:34 PM, Nick Coghlan wrote: > [...] > > In the meantime, if the python.org packages for Windows aren't up to >> scratch (and they aren't in many ways), *use the commercially backed >> ones* (or one of the other sumo distributions that are out there). >> Don't tell your students to grab the raw installers directly from >> python.org, redirect them to the free rebuilds from ActiveState or >> Enthought, or go all out and get them to install something like >> Python(X, Y). 
>> > > This is what I do now. I tell my students if they have trouble to > Enthought. Yet there are issues with license and 32 (free) vs 64 bits (not > free). Long term I do not think this what we should encourage. > > Concerning the eclipse and the plugin thing - "Aptana" is a nice bundle of pydev with eclipse so it's just one download and you get a nice python IDE with autocompletion etc. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From amcnabb at mcnabbs.org Thu Feb 9 23:25:40 2012 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Thu, 9 Feb 2012 15:25:40 -0700 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> Message-ID: <20120209222540.GD20556@mcnabbs.org> On Thu, Feb 09, 2012 at 08:53:55PM +0100, Stefan Behnel wrote: > > AFAIK, there is no concrete roadmap towards supporting SciPy on top of > PyPy. Currently, PyPy is getting its own implementation of NumPy-like > arrays, but there is currently no interaction with anything in the SciPy > world outside of those. Given the shear size of SciPy, reimplementing it on > top of numpypy is unrealistic. I understand that there is some hope in getting cython to support pure python and ctypes as a backend, and then to migrate scipy to use cython. This is definitely a long-term solution. Most people don't depend on all of scipy, and for some use cases, it's not too hard to find alternatives. Today I migrated a project from scipy to the GNU Scientific Library (with ctypes). It now works great with PyPy, and I saw a total speedup of 10.6. Dropping from 27 seconds to 2.55 seconds is huge. It's funny, but for a new project I would go to great lengths to try to use the GSL instead of scipy (though I'm sure for some use cases it wouldn't be possible). 
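The ctypes route Andrew describes works the same way against any shared C library. Here is a minimal sketch against libm (the standard C math library) rather than GSL, since GSL's actual symbol names and signatures would need to be checked against its headers:

```python
import ctypes
import ctypes.util

# Locate and load the C math library; fall back to the usual glibc
# soname if find_library comes up empty on this system.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# ctypes assumes int results by default, so declare the real
# C signature before calling: double cos(double).
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

result = libm.cos(0.0)   # 1.0
```

Because ctypes works on both CPython and PyPy, wrappers written this way are exactly the kind of code that benefits from PyPy's JIT, which is what made Andrew's GSL port worthwhile.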
> That being said, it's quite possible to fire up CPython from PyPy (or vice > versa) and interact with that, if you really need both PyPy and SciPy. It > even seems to be supported through multiprocessing. I find that pretty cool. > > http://thread.gmane.org/gmane.comp.python.pypy/9159/focus=9161 That's a fascinating idea that I had never considered. Thanks for sharing. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From tjreedy at udel.edu Thu Feb 9 23:46:33 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 09 Feb 2012 17:46:33 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <0B687CDC-6C26-4032-BFBB-CF562AF29767@gmail.com> References: <0B687CDC-6C26-4032-BFBB-CF562AF29767@gmail.com> Message-ID: On 2/9/2012 12:46 PM, Massimo Di Pierro wrote: > I think if easy_install, gevent, numpy (*), and win32 extensions where > included in 3.x, together with a slightly better Idle (still based on I am working on the patches already on the tracker, starting with bug fixes. > Tkinter, with multiple pages, If you mean multiple tabbed pages in one window, I believe there is a patch. autocompletion, IDLE already has 'auto-completion'. If you mean something else, please explain. > collapsible [blocks], line numbers, I have thought about those. > better printing with syntax highlighting), Better basic printing support is really needed. #1528593 Color printing if not possible now would be nice, as color printers are common now. I have no idea if tkinter print support makes either easier now. > and if easy_install were accessible via Idle, this would be a killer version. That should be possible with an extension. > Longer term removing the GIL and using garbage collection should be a > priority. I am not sure what is involved and how difficult it is but As has been discussed here and on pydev, the problems include things like making Python slower and disabling C extensions. 
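The PyPy-to-CPython bridge in the thread Stefan linked works because multiprocessing's connection machinery is essentially pickle over a socket, so the two endpoints need not be the same interpreter. A toy sketch of that plumbing, with both ends running in one interpreter (one in a thread) for brevity:

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"demo"                       # both ends must share this secret

# Port 0 lets the OS pick a free port; listener.address reports it.
listener = Listener(("localhost", 0), authkey=AUTHKEY)

def serve():
    # In the PyPy/CPython scenario this end would run in the other
    # interpreter; anything picklable can cross the connection.
    conn = listener.accept()
    conn.send(sum(conn.recv()))
    conn.close()

server = threading.Thread(target=serve)
server.start()

client = Client(listener.address, authkey=AUTHKEY)
client.send([1, 2, 3])
result = client.recv()                  # the server's reply

client.close()
server.join()
listener.close()
```

In the cross-interpreter case, the serve() side would simply be a small script launched under the other interpreter with the address and authkey passed to it.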
> perhaps this is what PyCon money can be used for. If this cannot be done > without breaking backward compatibility again, then 3.x should be > considered an experimental branch, people should be advised to stay with > 2.7 (2.8?) and then skip to 4.x directly when these problems are For non-Euro-Americans, a major problem with Python 1/2 was the use of ascii for identifiers. This was *fixed* by Python 3. When I went to Japan a couple of years ago and stopped in a general bookstore (like Borders), its computer language section had about 10 books on Python, most in Japanese as I remember. So it is apparently in use there. > resolved. Python should not make a habit of breaking backward > compatibility. I believe the main problem has been the unicode switch, which is critical to Python being a world language. Removal of old-style classes was mostly a non-issue, except for the very few who intentionally continued to use them. -- Terry Jan Reedy From pydanny at gmail.com Thu Feb 9 23:50:58 2012 From: pydanny at gmail.com (Daniel Greenfeld) Date: Thu, 9 Feb 2012 14:50:58 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% (Massimo Di Pierro) Message-ID: > Message: 1 > Date: Thu, 9 Feb 2012 15:41:22 -0600 > From: Massimo Di Pierro > To: Nick Coghlan > Cc: python-ideas > Subject: Re: [Python-ideas] Python 3000 TIOBE -3% > Message-ID: <90C8316C-1AB0-4759-B3DF-0FB07477FF08 at gmail.com> > Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes > > First of all all the Python developers are doing an amazing job, and > none of the comments should be taken as a critique but only as a > suggestion. I completely agree with Massimo again. :-) > > On Feb 9, 2012, at 3:34 PM, Nick Coghlan wrote: > [...] >> In the meantime, if the python.org packages for Windows aren't up to >> scratch (and they aren't in many ways), *use the commercially backed >> ones* (or one of the other sumo distributions that are out there). 
>> Don't tell your students to grab the raw installers directly from >> python.org, redirect them to the free rebuilds from ActiveState or >> Enthought, or go all out and get them to install something like >> Python(X, Y). > > This is what I do now. I tell my students if they have trouble to > Enthought. Yet there are issues with license and 32 (free) vs 64 bits > (not free). Long term I do not think this what we should encourage. I think it is odd to encourage users to go to use open source distros, but if they have installation problems (which is really common - Massimo/Titus/Audrey/Zed/etc seem to back me up here) to recommend 'somewhere' to go to commercial-but-free distros. If we should be pointing new users to ActiveState or Enthought, maybe we should just change the python.org default installers to what they provide. Tell you what, I'll take this matter off-list and bring it up with Jesse Noller and the rest of the board working on the python.org RFP. -- 'Knowledge is Power' Daniel Greenfeld http://pydanny.blogspot.com From guido at python.org Thu Feb 9 23:56:37 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 14:56:37 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120209222540.GD20556@mcnabbs.org> References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> Message-ID: On Thu, Feb 9, 2012 at 2:25 PM, Andrew McNabb wrote: > On Thu, Feb 09, 2012 at 08:53:55PM +0100, Stefan Behnel wrote: > > > > AFAIK, there is no concrete roadmap towards supporting SciPy on top of > > PyPy. Currently, PyPy is getting its own implementation of NumPy-like > > arrays, but there is currently no interaction with anything in the SciPy > > world outside of those. Given the shear size of SciPy, reimplementing it > on > > top of numpypy is unrealistic. 
> > I understand that there is some hope in getting cython to support pure > python and ctypes as a backend, and then to migrate scipy to use cython. > This is definitely a long-term solution. > > Most people don't depend on all of scipy, and for some use cases, it's > not too hard to find alternatives. Today I migrated a project from > scipy to the GNU Scientific Library (with ctypes). It now works great > with PyPy, and I saw a total speedup of 10.6. Dropping from 27 seconds > to 2.55 seconds is huge. It's funny, but for a new project I would go > to great lengths to try to use the GSL instead of scipy (though I'm sure > for some use cases it wouldn't be possible). > Hm... is there a reason GSL and SciPy need to compete? Can't SciPy incorporate GSL? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Feb 9 23:59:35 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 14:59:35 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% (Massimo Di Pierro) In-Reply-To: References: Message-ID: On Thu, Feb 9, 2012 at 2:50 PM, Daniel Greenfeld wrote: > > Message: 1 > > Date: Thu, 9 Feb 2012 15:41:22 -0600 > > From: Massimo Di Pierro > > To: Nick Coghlan > > Cc: python-ideas > > Subject: Re: [Python-ideas] Python 3000 TIOBE -3% > > Message-ID: <90C8316C-1AB0-4759-B3DF-0FB07477FF08 at gmail.com> > > Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes > > > > First of all all the Python developers are doing an amazing job, and > > none of the comments should be taken as a critique but only as a > > suggestion. > > I completely agree with Massimo again. :-) > > > > > On Feb 9, 2012, at 3:34 PM, Nick Coghlan wrote: > > [...] > >> In the meantime, if the python.org packages for Windows aren't up to > >> scratch (and they aren't in many ways), *use the commercially backed > >> ones* (or one of the other sumo distributions that are out there). 
> >> Don't tell your students to grab the raw installers directly from > >> python.org, redirect them to the free rebuilds from ActiveState or > >> Enthought, or go all out and get them to install something like > >> Python(X, Y). > > > > This is what I do now. I tell my students if they have trouble to > > Enthought. Yet there are issues with license and 32 (free) vs 64 bits > > (not free). Long term I do not think this what we should encourage. > > I think it is odd to encourage users to go to use open source distros, > but if they have installation problems (which is really common - > Massimo/Titus/Audrey/Zed/etc seem to back me up here) to recommend > 'somewhere' to go to commercial-but-free distros. > Why is that odd? Those distros are an integral part of the ecosystem that is enabled by open source. I see no philosophical problems (unless you are of the GNU religion of course -- but then you should have said FOSS instead of open source :-). > If we should be pointing new users to ActiveState or Enthought, maybe > we should just change the python.org default installers to what they > provide. > Again, why? The commercial distributors often lag way behind what python.org offers -- and for very good reasons. > Tell you what, I'll take this matter off-list and bring it up with > Jesse Noller and the rest of the board working on the python.org RFP. > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Fri Feb 10 00:06:09 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 09 Feb 2012 23:06:09 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> Message-ID: On 2/9/12 10:56 PM, Guido van Rossum wrote: > On Thu, Feb 9, 2012 at 2:25 PM, Andrew McNabb > wrote: > > On Thu, Feb 09, 2012 at 08:53:55PM +0100, Stefan Behnel wrote: > > > > AFAIK, there is no concrete roadmap towards supporting SciPy on top of > > PyPy. Currently, PyPy is getting its own implementation of NumPy-like > > arrays, but there is currently no interaction with anything in the SciPy > > world outside of those. Given the shear size of SciPy, reimplementing it on > > top of numpypy is unrealistic. > > I understand that there is some hope in getting cython to support pure > python and ctypes as a backend, and then to migrate scipy to use cython. > This is definitely a long-term solution. > > Most people don't depend on all of scipy, and for some use cases, it's > not too hard to find alternatives. Today I migrated a project from > scipy to the GNU Scientific Library (with ctypes). It now works great > with PyPy, and I saw a total speedup of 10.6. Dropping from 27 seconds > to 2.55 seconds is huge. It's funny, but for a new project I would go > to great lengths to try to use the GSL instead of scipy (though I'm sure > for some use cases it wouldn't be possible). > > > Hm... is there a reason GSL and SciPy need to compete? Can't SciPy incorporate GSL? GSL is GPLed. scipy is BSD. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From tjreedy at udel.edu Fri Feb 10 00:07:18 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 09 Feb 2012 18:07:18 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> Message-ID: On 2/9/2012 1:26 PM, Guido van Rossum wrote: > On Thu, Feb 9, 2012 at 10:14 AM, Masklinn > wrote: > > On 2012-02-09, at 19:03 , Steven D'Aprano wrote: > > The choice of which garbage collection implementation (ref > counting is garbage collection) is a quality of implementation > detail, not a language feature. > > That's debatable, it's an implementation detail with very different > semantics which tends to leak out into usage patterns of the > language (as it did with CPython, which basically did not get fixed > in the community until Pypy started ascending), > > > I think it was actually Jython that first sensitized the community to > this issue. Yes, it was. The first PyPy status blog in Oct 2007 http://morepypy.blogspot.com/2007/10/first-post.html long before any practical release, was a year after the 2.5 release. -- Terry Jan Reedy From pydanny at gmail.com Fri Feb 10 00:09:55 2012 From: pydanny at gmail.com (Daniel Greenfeld) Date: Thu, 9 Feb 2012 15:09:55 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% (Massimo Di Pierro) In-Reply-To: References: Message-ID: On Thu, Feb 9, 2012 at 2:59 PM, Guido van Rossum wrote: >> > On Feb 9, 2012, at 3:34 PM, Nick Coghlan wrote: >> > [...] >> >> In the meantime, if the python.org packages for Windows aren't up to >> >> scratch (and they aren't in many ways), *use the commercially backed >> >> ones* (or one of the other sumo distributions that are out there). >> >> Don't tell your students to grab the raw installers directly from >> >> python.org, redirect them to the free rebuilds from ActiveState or >> >> Enthought, or go all out and get them to install something like >> >> Python(X, Y). 
>> > >> > This is what I do now. I tell my students if they have trouble to >> > Enthought. Yet there are issues with license and 32 (free) vs 64 bits >> > (not free). Long term I do not think this what we should encourage. >> >> I think it is odd to encourage users to go to use open source distros, >> but if they have installation problems (which is really common - >> Massimo/Titus/Audrey/Zed/etc seem to back me up here) to recommend >> 'somewhere' to go to commercial-but-free distros. > > Why is that odd? > > Those distros are an integral part of the ecosystem that is enabled by open > source. I see no philosophical problems (unless you are of the GNU religion > of course -- but then you should have said FOSS instead of open source :-). I may have not said this as well as I thought. :P I don't follow the GNU religion but I do make Python a very good friend. :-) I think it's wonderful that ActiveState and Enthought are providing distributions for free. I got kickstarted on ActiveState back in 2005. However, for people coming into the language, they should be able to expect an easy installation from the core site regardless of their operating system. I'm wondering that rather than pointing all the new Windows users at ActiveState/Enthought sites, if their distros are easier to install, maybe the links should be to those distros. There are probably all sorts of really good reasons why this is not possible, but if you ever have to see 10 instructors at once waste a couple hours of installation on 75 students you won't care about those reasons anymore. >> >> If we should be pointing new users to ActiveState or Enthought, maybe >> we should just change the python.org default installers to what they >> provide. > > Again, why? The commercial distributors often lag way behind what python.org > offers -- and for very good reasons. Just trying to find an easier path for instructors get students kickstarted in our favorite programming language. 
-- 'Knowledge is Power' Daniel Greenfeld http://pydanny.blogspot.com From sturla at molden.no Fri Feb 10 00:52:32 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 10 Feb 2012 00:52:32 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> Message-ID: On 9 Feb 2012, at 23:56, Guido van Rossum wrote: > > Hm... is there a reason GSL and SciPy need to compete? Can't SciPy incorporate GSL? > > GPL vs BSD issue. Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From senthil at uthcode.com Fri Feb 10 01:00:32 2012 From: senthil at uthcode.com (Senthil Kumaran) Date: Fri, 10 Feb 2012 08:00:32 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <0B687CDC-6C26-4032-BFBB-CF562AF29767@gmail.com> References: <0B687CDC-6C26-4032-BFBB-CF562AF29767@gmail.com> Message-ID: <20120210000032.GA1855@mathmagic> On Thu, Feb 09, 2012 at 11:46:45AM -0600, Massimo Di Pierro wrote: > I think if easy_install, gevent, numpy (*), and win32 extensions > where included in 3.x, together with a slightly better Idle (still > based on Tkinter, with multiple pages, autocompletion, collapsible, > line numbers, better printing with syntax highlitghing), and if > easy_install were accessible via Idle, this would be a killer > version. > > Longer term removing the GIL and using garbage collection should be > a priority. I am not sure what is involved and how difficult it is I am not sure if popularity contests are just based on technical merits/demerits alone. I guess people here care less about popularity and more about good tools in Python land. So if there are things lacking in Python world, then those are good project opportunities. 
What I personally feel is, the various plug-and-play libraries are giving JavaScript a thumbs up and more is going on web world front-end than back-end. So, if there is a requirement for a Python programmer, there is an assumption that he should know web techs too. There are also PHP/Ruby/Java folks who also know web technologies. So, a web tech like JavaScript gets counted 4x. -- Senthil From guido at python.org Fri Feb 10 01:02:34 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 16:02:34 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% (Massimo Di Pierro) In-Reply-To: References: Message-ID: Sadly, it's quite frequent that what works really well in an educational setting shouldn't be recommended in a professional programming environment, and vice versa. I'm not sure how to answer this except by creating, maintaining and promoting some wiki pages aimed specifically at instructors. On Thu, Feb 9, 2012 at 3:09 PM, Daniel Greenfeld wrote: > On Thu, Feb 9, 2012 at 2:59 PM, Guido van Rossum wrote: > > >> > On Feb 9, 2012, at 3:34 PM, Nick Coghlan wrote: > >> > [...] > >> >> In the meantime, if the python.org packages for Windows aren't up to > >> >> scratch (and they aren't in many ways), *use the commercially backed > >> >> ones* (or one of the other sumo distributions that are out there). > >> >> Don't tell your students to grab the raw installers directly from > >> >> python.org, redirect them to the free rebuilds from ActiveState or > >> >> Enthought, or go all out and get them to install something like > >> >> Python(X, Y). > >> > > >> > This is what I do now. I tell my students if they have trouble to > >> > Enthought. Yet there are issues with license and 32 (free) vs 64 bits > >> > (not free). Long term I do not think this what we should encourage. 
> >> > >> I think it is odd to encourage users to go to use open source distros, > >> but if they have installation problems (which is really common - > >> Massimo/Titus/Audrey/Zed/etc seem to back me up here) to recommend > >> 'somewhere' to go to commercial-but-free distros. > > > > Why is that odd? > > > > Those distros are an integral part of the ecosystem that is enabled by > open > > source. I see no philosophical problems (unless you are of the GNU > religion > > of course -- but then you should have said FOSS instead of open source > :-). > > I may have not said this as well as I thought. :P > > I don't follow the GNU religion but I do make Python a very good friend. > :-) > > I think it's wonderful that ActiveState and Enthought are providing > distributions for free. I got kickstarted on ActiveState back in 2005. > However, for people coming into the language, they should be able to > expect an easy installation from the core site regardless of their > operating system. > > I'm wondering that rather than pointing all the new Windows users at > ActiveState/Enthought sites, if their distros are easier to install, > maybe the links should be to those distros. There are probably all > sorts of really good reasons why this is not possible, but if you ever > have to see 10 instructors at once waste a couple hours of > installation on 75 students you won't care about those reasons > anymore. > > >> > >> If we should be pointing new users to ActiveState or Enthought, maybe > >> we should just change the python.org default installers to what they > >> provide. > > > > Again, why? The commercial distributors often lag way behind what > python.org > > offers -- and for very good reasons. > > Just trying to find an easier path for instructors get students > kickstarted in our favorite programming language. 
> > -- > 'Knowledge is Power' > Daniel Greenfeld > http://pydanny.blogspot.com > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Feb 10 01:03:20 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 16:03:20 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> Message-ID: On Thu, Feb 9, 2012 at 3:52 PM, Sturla Molden wrote: > > > Den 9. feb. 2012 kl. 23:56 skrev Guido van Rossum : > ). > > > Hm... is there a reason GSL and SciPy need to compete? Can't SciPy > incorporate GSL? > > > > GPL vs BSD issue. > That's a bummer. Someone should open negotiations. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From dreamingforward at gmail.com Fri Feb 10 01:11:54 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Thu, 9 Feb 2012 17:11:54 -0700 Subject: [Python-ideas] [Python-Dev] matrix operations on dict :) In-Reply-To: References: Message-ID: On Wed, Feb 8, 2012 at 9:54 AM, julien tayon wrote: > 2012/2/7 Mark Janssen : > > On Mon, Feb 6, 2012 at 6:12 PM, Steven D'Aprano > wrote: > > > I have the problem looking for this solution! > > > > The application for this functionality is in coding a fractal graph (or > > "multigraph" in the literature). This is the most powerful structure > that > > Computer Science has ever conceived. If you look at the evolution of > data > > structures in compsci, the fractal graph is the ultimate. From lists to > > trees to graphs to multigraphs. The latter elements can always encompass > > the former with only O(1) extra cost. > > { "a" : 1 } + { "a" : { "b" : 1 } } == KABOOM. This a counter example > proving it does not handle all structures. 
> > Okay, I guess I did not make myself very clear. What I'm proposing probably will (eventually) require changes to the "object" model of Python and may require (or want) the addition of the "compound" data-type (as in python's predecessor ABC). The symbol that denotes a compound would be the colon (":") and associates a left hand side with right-hand side value, a NAME with a VALUE. A dictionary would (then) be a SET of these. (Voila! things have already gotten simplified.) Eventually, I also think this will segue and integrate nicely into Mark Shannon's "shared-key dict" proposal (PEP 412). The compound data-type would act as the articulation point, around which the recursive, fractal data structure would revolve: much like the decimal point forms in a (non-integer) number. (In theory, you could even do a reciprocal or __INVERT__ operation on this type.) OR perhaps a closer comparison is whatever separates the imaginary from the real part in a complex number on the complex plane -- by virtue of such, creates two orthogonal and independent spaces. The same is what we want to do with this new fractal dictionary type. While in the abstract one might think to allow any arbitrary data-type for right-hand-side values, in PRACTICE, integers are sufficient. The reason is thus. In the fractal data type, you simply need to define the bottom-most and top-most layers of the fractalset abstraction, which is the same as saying the relationship between the *atomic* and the *group* -- everything in-between will be taken care of by the power of the type itself. It makes sense to use a maximally atomic, INTEGER data type (starting with the UNIT 1) for the bottom-most level, and a maximally abstract top-most level -- this is simply an abstract grouping type (i.e. a collection). I'm going to suggest a SET is the most abstract (i.e. sufficient) because it does not impose an order and for reasons regarding the CONFLATION rule (RULE1). 
The CONFLATION rule is thus: items of the same name are combined ({'a':1, 'a':3} ==> {'a':4}), and non-named (atomic) items are summed. To simplify representation, values should be conflated as much as possible, the idea is maximizing reduction. This rule separates a set from a list, because non-unique items will be conflated into one. Such a set or grouping should be looked at as an arbitrary n-dimensional space. An interesting thing to think about is how this space can be mapped into a unique 1-dimensional, ordered list and vice versa. Reflectively, a list can be converted uniquely into this fractal set thusly: All non-integer, non-collection items will be considered NAMES and counted. If an item is another list, it will recurse and create another set. If a set it will simply add it, as is. These rules could be important in object serialization (we'll call this EXPANSION). In any case, for the sake of your example: In the above KABOOM example, unnamed, atomic elements can just be considered ANONYMOUS (using None as the key). In this case, the new dict becomes: { "a" : 1 } + { "a" : { "b" : 1 } } ==> { "a" : {None: 1, "b" : 1 } }, OR, if we have a compound data-type, we can remove the redundant pseudo-name: { "a" : { 1, "b" : 1 } }. Furthermore we can assume a default value of 1 for non-valued "names", so we could express this more simply: { 'a' } + { 'a' : { 'b' } } ==> { 'a': { 1, 'b' } } No ambiguity, as long as we determine a convention. As noted, one element is named, and the other is not. Consider unnamed values within a grouping like a GAS and *named* values as a SOLID. You're adding them into the same room where they can co-exist just fine. No confusion! To clarify the properties of this fractal data type more clearly: there is only 1 key in the second, inner set ('b'). We can remove the values() method as they will always be the atomic INTEGER type and conflate to a single number. 
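The merge behavior described above is concrete enough to prototype. A rough sketch, assuming int leaves and plain dicts as the groupings (an illustration of the stated rules, not a worked-out implementation):

```python
def conflate(x, y):
    """Merge two 'fractal dict' values: ints are atomic leaves,
    dicts are groupings, and None is the anonymous key."""
    if isinstance(x, int) and isinstance(y, int):
        return x + y                     # atomic + atomic: plain sum
    if isinstance(x, int):
        x, y = y, x                      # normalize: grouping on the left
    if isinstance(y, int):
        merged = dict(x)                 # an atomic joining a grouping
        merged[None] = merged.get(None, 0) + y   # lands under the None key
        return merged
    merged = dict(x)                     # grouping + grouping: conflate
    for key, value in y.items():         # like-named entries recursively
        merged[key] = conflate(merged[key], value) if key in merged else value
    return merged

print(conflate({"a": 1}, {"a": 3}))         # {'a': 4}
print(conflate({"a": 1}, {"a": {"b": 1}}))  # {'a': {'b': 1, None: 1}}
```

The second call reproduces the KABOOM resolution: the int 1 is folded into the inner grouping as an anonymous value rather than colliding with it.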
The use of physical analog is helpful and will inform the definition. (Could one represent a python CLASS heirarchy more simply with this fractalset object somehow....?) Further definitions: RULE2: When an atomic is added to a compound, a grouping must be created: 1 + "b" : 1 = { None : 1, "b" : 1 } RULE3: Preserve groupings where present: 'b' : 7 + { 'b' : 1 } = { 'b' : 8 } I think this might be sufficient. Darn, I hope it makes some sense.... mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Feb 10 01:18:22 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 16:18:22 -0800 Subject: [Python-ideas] [Python-Dev] matrix operations on dict :) In-Reply-To: References: Message-ID: On Thu, Feb 9, 2012 at 4:11 PM, Mark Janssen wrote: > On Wed, Feb 8, 2012 at 9:54 AM, julien tayon wrote: > >> 2012/2/7 Mark Janssen : >> >> > On Mon, Feb 6, 2012 at 6:12 PM, Steven D'Aprano > > wrote: >> > >> > > I have the problem looking for this solution! >> > >> > The application for this functionality is in coding a fractal graph (or >> > "multigraph" in the literature). This is the most powerful structure >> that >> > Computer Science has ever conceived. If you look at the evolution of >> data >> > structures in compsci, the fractal graph is the ultimate. From lists to >> > trees to graphs to multigraphs. The latter elements can always >> encompass >> > the former with only O(1) extra cost. >> >> { "a" : 1 } + { "a" : { "b" : 1 } } == KABOOM. This a counter example >> proving it does not handle all structures. >> >> Okay, I guess I did not make myself very clear. What I'm proposing > probably will (eventually) require changes to the "object" model of Python > and may require (or want) the addition of the "compound" data-type (as in > python's predecessor ABC). The symbol that denotes a compound would be the > colon (":") and associates a left hand side with right-hand side value, a > NAME with a VALUE. 
> That was not a user-visible data type in ABC. ABC had dictionaries (with somewhat different semantics due to the polymorphic static typing) and the ':' was part of the dictionary syntax, not of the type system. > A dictionary would (then) be a SET of these. (Voila! things have already > gotten simplified.) > Really? So {a:1, a:2} would be a dict of length 2? > Eventually, I also think this will seque and integrate nicely into Mark > Shannon's "shared-key dict" proposal (PEP 410). > > The compound data-type would act as the articulation point, around which > the recursive, fractal data structure would revolve: much like the decimal > point forms in (non-integer) number. (In theory, you could even do a > reciprocal or __INVERT__ operation on this type.) OR perhaps a closer > comparison is whatever separates the imaginary from the real part in a > complex number on the complex plane -- by virtue of such, creates two > orthogonal and independent spaces. The same we want to do with this new > fractal dictionary type. > > While in the abstract one might think to allow any arbitrary data-type for > right-hand-side values, in PRACTICE, integers are sufficient. The reason > is thus. In the fractal data type, you simply need to define the > bottom-most and top-most layers of the fractalset abstraction, which is the > same as saying the relationship between the *atomic* and the *group* -- > everything in-between will be taken care of by the power of the type > itself. It makes sense to use a maximally atomic, INTEGER data type > (starting with the UNIT 1) for the bottom most level., and a maximally > abstract top-most level -- this is simply an abstract grouping type (i.e. a > collection). I'm going to suggest a SET is the most abstract (i.e. > sufficient) because it does not impose an order and for reasons regarding > the CONFLATION rule (RULE1). 
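For context on Guido's length-2 question: in Python as it stands, a duplicate key in a dict display is silently collapsed rather than kept, with the last binding winning.

```python
# Current Python behavior behind Guido's question: duplicate keys in
# a dict display do not produce a two-element dict; the last one wins.
d = {"a": 1, "a": 2}
print(len(d))  # 1
print(d)       # {'a': 2}
```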
> > The CONFLATION rule is thus: items of the same name are combined ({'a':1, > 'a':3} ==> {'a':4}, and non-named (atomic) items are summed. To simplify > representation, values should be conflated as much as possible, the idea is > maximizing reduction. This rule separates a set from a list, because > non-unique items will be conflated into one. Such a set or grouping should > be looked at as an arbitrary n-dimensional space. An interesting thing to > think about is how this space can be mapped into a unique 1-dimensional, > ordered list and vice versa. Reflectively, a list can be converted > uniquely into this fractal set thusly: All non-integer, non-collection > items will be considered NAMES and counted. If an item is another list, it > will recurse and create another set. If a set it will simply add it, as > is. These rules could be important in object serialization (we'll call > this EXPANSION). > > In any case, for sake of your example. In the above KABOOM example, > unnamed, atomic elements can just be considered ANONYMOUS (using None as > the key). In this case, the new dict becomes: > > { "a" : 1 } + { "a" : { "b" : 1 } } ==> { "a" : {None: 1, "b" : 1 } } , > OR if have a compound data-type, we can remove the redundant pseudo-name: > { "a" : { 1, "b" : 1 } }. > Furthermore we can assume a default value of 1 for non-valued "names", so > we could express this more simply: > { 'a' } + { 'a" : { 'b' } } ==> { ''a': { 1, 'b' } } No ambiguity! as > long as we determine a convention. > > As noted, one element is named, and the other is not. Consider unnamed > values within a grouping like a GAS and *named* values as a SOLID. You're > adding them into the same room where they can co-exist just fine. No > confusion! > > To clarify the properties of this fractal data type more clearly: there > is only 1 key in the the second, inner set ('b'). We can remove the > values() method as they will always be the atomic INTEGER type and conflate > to a single number. 
We'll call this other thing, this property "mass"; in > this case = 2.) The use of physical analog is helpful and will inform the > definition. > > (Could one represent a python CLASS heirarchy more simply with this > fractalset object somehow....?) > > Further definitions: > > RULE2: When an atomic is added to a compound, a grouping must be created: > > 1 + "b" : 1 = { None : 1, "b" : 1 } > > RULE3: Preserve groupings where present: > > 'b' : 7 + { 'b' : 1 } = { 'b' : 8 } > > I think this might be sufficient. Darn, I hope it makes some sense.... > Maybe you should reduce your coffee intake. There's too much SHOUTING in your post... :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jnoller at gmail.com Fri Feb 10 01:26:59 2012 From: jnoller at gmail.com (Jesse Noller) Date: Thu, 9 Feb 2012 19:26:59 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <0B687CDC-6C26-4032-BFBB-CF562AF29767@gmail.com> References: <0B687CDC-6C26-4032-BFBB-CF562AF29767@gmail.com> Message-ID: <70FCC0D7-1A06-4686-ACB9-31E591A4FDAD@gmail.com> On Feb 9, 2012, at 12:46 PM, Massimo Di Pierro wrote: > I think if easy_install, gevent, numpy (*), and win32 extensions where included in 3.x, together with a slightly better Idle (still based on Tkinter, with multiple pages, autocompletion, collapsible, line numbers, better printing with syntax highlitghing), and if easy_install were accessible via Idle, this would be a killer version. > > Longer term removing the GIL and using garbage collection should be a priority. I am not sure what is involved and how difficult it is but perhaps this is what PyCon money can be used for. Please do not volunteer revenue that does not exist, or PSF funds for things without a grant proposal or working group. Especially PyCon revenue - which does not exist. 
Jesse From ncoghlan at gmail.com Fri Feb 10 02:16:46 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 10 Feb 2012 11:16:46 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% (Massimo Di Pierro) In-Reply-To: References: Message-ID: On Fri, Feb 10, 2012 at 8:50 AM, Daniel Greenfeld wrote: > I think it is odd to encourage users to go to use open source distros, > but if they have installation problems (which is really common - > Massimo/Titus/Audrey/Zed/etc seem to back me up here) to recommend > 'somewhere' to go to commercial-but-free distros. The reason I encourage budding developers to switch to an open source base as soon as they can is because it comes with an entire open source ecosystem around it. With open source code that needs to interact with OS level APIs, the development flow is often Linux first (it's completely open source, POSIX compatible and very popular), then OpenSolaris and the *BSD variants (also open source, POSIX compatible, but significantly less popular) then Mac OS X (at least it offers a decent POSIX layer), then Windows (from my perspective, the win32 API and NTFS stand tall as a couple of the worst cases of NIH syndrome in the history of computing). In other words, it's almost the exact reverse of the situation in the proprietary desktop software world (which usually goes Windows->Mac OS X->Linux based on desktop market share). With POSIX compatible code covering pretty much every platform other than Windows, and with win32 API programming being such an alien (and verbose) experience to anyone used to the file descriptor based POSIX world, volunteers that are willing to develop and maintain such code on their own time are pretty thin on the ground. As a result, it's frequently necessary to turn to proprietary vendors to get a smooth, Windows-appropriate user experience. 
Given that Windows itself is a proprietary OS, suggesting that people use a free-as-in-beer-but-not-as-in-speech package that lets them skip the boring bits and get straight to coding sounds quite reasonable to me. Sure it's not perfect, but unless you can wave your hand and create a larger pool of volunteer developers that decide to stick with Windows for their hobbyist development instead of embracing a completely open platform like Linux or a POSIX-compatible open core one like Mac OS X, Windows support is always going to lag (including in the installation-and-deployment space). My experience on Linux is that most things, up to and including pip installation of C extension modules, *just works* (the exception being that some C extensions have broken build processes and require a bit of cajoling - it would be nice if someone actually sat down and wrote a bdist_simple PEP instead of just talking about it on this list). Automating the setup of these platforms is fairly straightforward because they come with tools like Python and wget preinstalled, so you can just use them without needing to worry about giving the user instructions on obtaining them. In contrast, on Windows, you have to do a lot of work up front to be able to compile C extensions at all, and installing pip is a far cry from being able to just do "yum install python-pip". You don't even have access to "wget" to fetch a script that handles the setup for you. Getting set up to do software development on Windows is hard because Windows is built on the assumption that the world can be cleanly divided into "Developers" that build their own copies of software from source code (basically, people that are willing to pay for a copy of Visual Studio, or at least download and install one of the Express editions) and "Users" that only run software that someone else built (everyone else). 
The Linux distros (and other open source platforms), on the other hand, make the tools to *build* the software just as readily available as the software itself (although, these days, they also do their best to make sure you don't *need* to build stuff from source). Cross platform tools like Python can make an attempt to paper over those fundamental philosophical differences between the platforms, but really, there's only so much any given third party can do about it (and, for most people, trying to do so doesn't qualify as a fun hobby). Suppose Python core gets our packaging story on Windows fixed. What then? Well, NumPy still runs into problems due to BLAS. What's installing Postgres, MySQL or MongoDB on Windows like? (I genuinely don't know, I've never tried). Are there Windows installers for PyPy? These kinds of road blocks are endemic in Windows open source development, and they'll likely stay that way unless MS release a native Windows POSIX compatibility layer that isn't horrible (I personally expect that to happen somewhere around the time the Earth gets swallowed by the Sun). > If we should be pointing new users to ActiveState or Enthought, maybe > we should just change the python.org default installers to what they > provide. No, because the python.org installers are what the redistributor's use to create their own sumo packages. It may be reasonable for us to point new users that aren't already experienced Windows software developers directly to the sumo distributions, though. For example, as far I know, Python(X, Y) does a nice job of dumping a comprehensive Python environment on a Windows system without relying on a proprietary vendor. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From tjreedy at udel.edu Fri Feb 10 04:48:49 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 09 Feb 2012 22:48:49 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F341710.9030806@molden.no> References: <20120209104237.154be949@bhuda.mired.org> <4F341710.9030806@molden.no> Message-ID: On 2/9/2012 1:57 PM, Sturla Molden wrote: > On 09.02.2012 19:42, Mike Meyer wrote: > >> If threading is the only acceptable concurrency mechanism, then Python >> is the wrong language to use. But you're also not building scaleable >> systems, which is most of where it really matters. If you're willing >> to consider things other than threading - and you have to if you want >> to build scaleable systems - then Python makes a good choice. > > Yes or no... Python is used for parallel computing on the biggest > supercomputers, monsters like Cray and IBM blue genes with tens of > thousands of CPUs. But what really fails to scale is the Python module > loader! For example it can take hours to "import numpy" for 30,000 > Python processes on a blue gene. Mike Meyer posted that on pydev today http://mail.scipy.org/pipermail/numpy-discussion/2012-January/059801.html They determined that the time was gobbled by *finding* modules in each process, so they cut hours by finding them in 1 process and sending the locations to the other 29,999. We are already discussing how to use this lesson in core Python. The sub-thread is today's posts in "requirements for moving __import__ over to importlib?" -- Terry Jan Reedy From tjreedy at udel.edu Fri Feb 10 05:21:20 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 09 Feb 2012 23:21:20 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: On 2/9/2012 2:16 PM, Giampaolo Rodol? wrote: > I bet a lot of people don't want to upgrade for another reason: unicode. 
> The impression I got is that python 3 forces the user to use and > *understand* unicode and a lot of people simply don't want to deal > with that. Do *you* think that? Or are you reporting what others think? In either case, we have another communication problem. If one only uses the ascii subset, the usage of 3.x strings is transparent. As far as I can think, one does not need to know *anything* about unicode to use 3.x. In 3.3, there will not even be a memory hit. We should be saying that. Thanks for the head's up. 
It is hard to know what misconceptions people > have until someone reports them ;-). > > In python 2 there was no such a strong imposition. >> > > Nor is there in 3.x. We need to communicate that. I may give it a try on > python-list. If and when one does want to use more characters, it should be > *easier* in 3.x than in 2.x, especially for non-Latin1 Western European > chars . > > -- > Terry Jan Reedy > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri Feb 10 05:36:36 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 09 Feb 2012 23:36:36 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: On 2/9/2012 2:31 PM, Matt Joiner wrote: >> Isn't that more accurate for framework writers, rather than for >> "average" users? How often do average users have to address >> encoding/decoding in Python 3? > > Constantly. As a Python noob I tried Python 3 it was the first wall I > encountered. I am really puzzled what you mean. I have used Python 3 since 3.0 alpha and as long as I have used strictly ascii, I have encountered no such issues. >>> f = open('f:/python/mypy/test.txt', 'w') >>> f.write('test line 1\n') 12 >>> f.write('test line 2 and more\n') 21 >>> f.close() Now I can open in any other program, or open in Python. I have learned about unicode, but just so I could play around with other characters. > I had to learn Unicode right then and there. Fortunately, > the Python docs HOWTO on Unicode is excellent. Were you doing some non-ascii or non-average framework-like things? Would you really not have had to learn the same about unicode if you were using 2.x? -- Terry Jan Reedy From ctb at msu.edu Fri Feb 10 06:00:44 2012 From: ctb at msu.edu (C. 
Titus Brown) Date: Thu, 9 Feb 2012 21:00:44 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% (Massimo Di Pierro) In-Reply-To: References: Message-ID: <20120210050044.GP18049@idyll.org> On Thu, Feb 09, 2012 at 04:02:34PM -0800, Guido van Rossum wrote: > Sadly, it's quite frequent that works really well in an educational setting > shouldn't be recommended in a professional programming environment, and > vice versa. I'm not sure how to answer this except by creating, maintaining > and promoting some wiki pages aimed specifically at instructors. Perhaps I am brainfried ATM, but I cannot imagine what you are talking about here. Do you have any examples you can share that illustrate what you mean? thanks much, --titus From guido at python.org Fri Feb 10 06:18:59 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Feb 2012 21:18:59 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% (Massimo Di Pierro) In-Reply-To: <20120210050044.GP18049@idyll.org> References: <20120210050044.GP18049@idyll.org> Message-ID: On Thu, Feb 9, 2012 at 9:00 PM, C. Titus Brown wrote: > On Thu, Feb 09, 2012 at 04:02:34PM -0800, Guido van Rossum wrote: > > Sadly, it's quite frequent that works really well in an educational > setting > > shouldn't be recommended in a professional programming environment, and > > vice versa. I'm not sure how to answer this except by creating, > maintaining > > and promoting some wiki pages aimed specifically at instructors. > > Perhaps I am brainfried ATM, but I cannot imagine what you are talking > about > here. Do you have any examples you can share that illustrate what you > mean? > Simplest example: many educators seem delighted with Python 3 because it solves a bunch of beginner's pitfalls, and their students learn in a greenfield situation. (Though this is not the case for Massimo.) Professionals OTOH don't seem to like Python 3 because it means they have to change a pile of software that took them a decade (and an army of programmers) to create. 
Educators also often give their students a simple library of convenience functions and tell them to put the magic line "from blah import *" at the top of their module (or session). Again something that most professionals loathe, but it works well for the first steps in programming -- certainly better than the Java approach "copy these ten lines of gobbledygook [the minimal "hello world" in Java] into your file, don't ask what they mean, and above all be careful not to accidentally edit any of them". OTOH when educators want their students to install some 3rd party package it is often something hideously complex like pygame, rather than something simple and elegant like WebOb or flask. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri Feb 10 06:47:29 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 10 Feb 2012 00:47:29 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: On 2/9/2012 11:30 PM, Matt Joiner wrote: > Not true, it's necessary to understand that encodings translate to and > from bytes, Only if you use byte encodings for ascii text. I never have, and I would not know why you do unless you are using internet modules that do not sufficiently hide such details. Anyway... >>> b = b'abc' >>> u = str(b, 'ascii') >>> b = bytes(u, 'ascii') So one only needs to know one encoding name, which most should know anyway, and that it *is* an encoding name. 
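[The same round trip can be written with the encode/decode methods — a minimal sketch; 'ascii' is the only encoding name the example needs, and UTF-8 encodes pure ASCII text to identical bytes, which is the point made just below about utf-8 being a superset of ascii.]

```python
u = 'abc'
b = u.encode('ascii')          # text -> bytes
assert b == b'abc'
assert b.decode('ascii') == u  # bytes -> text, same single encoding name

# ASCII text is also valid UTF-8, byte for byte:
assert u.encode('utf-8') == b
```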
Since one will see 'utf-8' here and there, it is probably to know that the utf-8 encoding is a superset of the ascii encoding, so that ascii text *is* utf-8 text. -- Terry Jan Reedy From ericsnowcurrently at gmail.com Fri Feb 10 08:04:54 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 10 Feb 2012 00:04:54 -0700 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <874nuzyda9.fsf@uwakimon.sk.tsukuba.ac.jp> References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <874nuzyda9.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Feb 9, 2012 at 1:14 PM, Stephen J. Turnbull wrote: > It's too > bad Python can't get a piece of that action! Getting closer: http://morepypy.blogspot.com/2012/02/almost-there-pypys-arm-backend_01.html -eric From stephen at xemacs.org Fri Feb 10 09:41:20 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 10 Feb 2012 17:41:20 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > > In python 2 there was no such a strong imposition [of Unicode > > awareness on users]. > > Nor is there in 3.x. Sorry, Terry, but you're basically wrong here. True, if one sticks to pure ASCII, there's no difference to notice, but that's just not possible for people who live outside of the U.S., or who share text with people outside of the U.S. They need currency symbols, they have friends whose names have little dots on them. Every single one of those is a backtrace waiting to happen. A backtrace on f = open('text-file.txt') for line in f: pass is an imposition. That doesn't happen in 2.x (for the wrong reasons, but it's very convenient 95% of the time). This is what Victor's "locale" codec is all about. I think that's the wrong spelling for the feature, but there does need to be a way to express "don't bother me about Unicode" in most scripts for most people. 
We don't have a decent boilerplate for that yet. From masklinn at masklinn.net Fri Feb 10 09:49:40 2012 From: masklinn at masklinn.net (Masklinn) Date: Fri, 10 Feb 2012 09:49:40 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> Message-ID: <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> On 2012-02-10, at 01:03 , Guido van Rossum wrote: > On Thu, Feb 9, 2012 at 3:52 PM, Sturla Molden wrote: >> Den 9. feb. 2012 kl. 23:56 skrev Guido van Rossum : >> ). >> >> >> Hm... is there a reason GSL and SciPy need to compete? Can't SciPy >> incorporate GSL? >> >> GPL vs BSD issue. >> > > That's a bummer. Someone should open negotiations. I'm not sure what could be open to negotiate, being part of the GNU constellation I don't see GSL budging from the GPL, and SciPy is backed by industry members and used in "nonfree" products (notably the Enthought Python Distribution) so there's little room for it to use the GPL. Best thing that could happen (and I'm not even sure it's allowed by the GSL's license (which is under the GPL not the LGPL) would be for SciPy to grow some sort of GSL backend to delegate its operations to, when the GSL is installed. From dirkjan at ochtman.nl Fri Feb 10 10:16:28 2012 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Fri, 10 Feb 2012 10:16:28 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> Message-ID: On Thu, Feb 9, 2012 at 19:26, Guido van Rossum wrote: > Are there still Python idioms/patterns/recipes around that depend on > refcounting? (There also used to be some well-known anti-patterns that were > only bad because of the refcounting, mostly around saving exceptions. 
But > those should all have melted away -- CPython has had auxiliary GC for over a > decade.) There are some simple patterns that are great with refcounting and not so great with garbage collection. We encountered some of these with Mercurial. IIRC, the basic example is just open('foo').read() With refcounting, the file will be closed soon. With garbage collection, it won't. Being able to rely on cleanup per frame/function call is pretty useful. Cheers, Dirkjan From jeanpierreda at gmail.com Fri Feb 10 10:20:59 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 10 Feb 2012 04:20:59 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: On Thu, Feb 9, 2012 at 11:49 AM, Massimo Di Pierro wrote: > 50+% of the students have a mac and an increasing number of packages depend > on numpy. Installing numpy on mac is a lottery. > > Those who do not have a mac have windows and they expect an IDE like > eclipse. I know you can use Python with eclipse but they do not. They > download Python and complain that IDLE has no autocompletion, no line > numbers, no collapsible functions/classes. At the University of Toronto we tell students to use the Wing IDE (Wing 101 was developed specifically for our use in the classroom, in fact). All classroom examples are done either in the interactive interpreter, or in a session of Wing 101. All computer lab sessions are done using Wing 101, and the first lab is dedicated specifically for introducing how to edit files with it and use its debugging features. If students don't like IDLE, tell them to use a different editor instead, and pretend that Python doesn't include one with itself. (By default IDLE only shows an interactive session, so if they get curious and click-y they'll still be in the dark.) 
-- Devin From mark at hotpy.org Fri Feb 10 10:29:55 2012 From: mark at hotpy.org (Mark Shannon) Date: Fri, 10 Feb 2012 09:29:55 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: Message-ID: <4F34E393.9020105@hotpy.org> There are a lot of things covered in this thread. I want to address 2 of them. 1. Garbage Collection. Python has garbage collection. There is no free() function in Python, anyone who says that Python does not have GC is talking nonsense. CPython uses reference counting as its means of implementing GC. Ref counting has different performance characteristics from tracing GC, but it only makes sense to consider this in the context of overall Python performance. One key disadvantage of ref-counting is that it does not play well with threads, which leads on to... 2. Global Interpreter Lock and Threads. The GIL is so deeply embedded into CPython that I think it cannot be removed. There are too many subtle assumptions pervading both the VM and 3rd party code, to make truly concurrent threads possible. But are threads the way to go? Javascript does not have threads. Lua does not have threads. Erlang does not have threads; Erlang processes are implemented (in the BEAM engine) as coroutines. One of the Lua authors said this about threads: (I can't remember the quote so I will paraphrase) "How can you program in a language where 'a = a + 1' is not deterministic?" Indeed. What Python needs are better libraries for concurrent programming based on processes and coroutines. Cheers, Mark. 
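[The deterministic-cleanup question raised earlier in the thread — open('foo').read() relying on refcounting to close the file — already has a portable spelling. A minimal sketch using the with statement of PEP 343; the temporary-file plumbing is only there to keep the example self-contained:]

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)  # only the path is needed; close the raw descriptor

with open(path, 'w') as f:
    f.write('hello\n')
assert f.closed  # closed at block exit, no garbage collector involved

with open(path) as f:
    data = f.read()
assert data == 'hello\n'
os.remove(path)
```

This behaves identically on CPython's refcounting and on tracing-GC implementations such as PyPy or Jython, which is exactly the portability problem the open('foo').read() idiom runs into.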
From timothy.c.delaney at gmail.com Fri Feb 10 10:32:14 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Fri, 10 Feb 2012 20:32:14 +1100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> Message-ID: On 10 February 2012 20:16, Dirkjan Ochtman wrote: > There are some simple patterns that are great with refcounting and not > so great with garbage collection. We encountered some of these with > Mercurial. IIRC, the basic example is just > > open('foo').read() > > With refcounting, the file will be closed soon. With garbage > collection, it won't. Being able to rely on cleanup per frame/function > call is pretty useful. This is the #1 anti-pattern that shouldn't be encouraged. Using this idiom is just going to cause problems (mysterious exceptions while trying to open files due to running out of file handles for the process) for anyone trying to port your code to other implementations of Python. If you read PEP 343 (and the various discussions around that time) it's clear that the above anti-pattern is one of the major driving forces for the introduction of the 'with' statement. Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Fri Feb 10 10:51:05 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 10 Feb 2012 11:51:05 +0200 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F341710.9030806@molden.no> References: <20120209104237.154be949@bhuda.mired.org> <4F341710.9030806@molden.no> Message-ID: 09.02.12 20:57, Sturla Molden ???????(??): > Yes or no... Python is used for parallel computing on the biggest > supercomputers, monsters like Cray and IBM blue genes with tens of > thousands of CPUs. But what really fails to scale is the Python module > loader! For example it can take hours to "import numpy" for 30,000 > Python processes on a blue gene. 
And yes, nobody would consider to use > Java for such systems, even though Java does not have a GIL (well, > theads do no matter that much on a cluster with distributed memory > anyway). It is Python, C and Fortran that are popular. But that really > disproves that Python sucks for big concurrency, except perhaps for the > module loader. What about os.fork()? From jeanpierreda at gmail.com Fri Feb 10 10:54:13 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 10 Feb 2012 04:54:13 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> Message-ID: On Fri, Feb 10, 2012 at 4:32 AM, Tim Delaney wrote: > On 10 February 2012 20:16, Dirkjan Ochtman wrote: >> open('foo').read() >> >> With refcounting, the file will be closed soon. With garbage >> collection, it won't. Being able to rely on cleanup per frame/function >> call is pretty useful. > > > This is the #1 anti-pattern that shouldn't be encouraged. Using this idiom > is just going to cause problems (mysterious exceptions while trying to open > files due to running out of file handles for the process) for anyone trying > to port your code to other implementations of Python. It's not that open('foo').read() is "good". Clearly with the presence of nondeterministic garbage collection, it's bad. But it is convenient and compact. Refcounting GCs in general give very nice, predictable behavior, which lets us ignore a lot of the details of destroying things. Without something like this, we have to do some forms of resource management by hand that we could otherwise push to the garbage collector, and while sometimes this is as easy as a with statement, sometimes it isn't. For example, what do you do if multiple objects are meant to hold onto a file and take turns reading it? How do we close the file at the end when all the objects are done? Is the answer "manual refcounting"? 
Or is the answer "I don't care, let the GC handle it"? -- Devin From robert.kern at gmail.com Fri Feb 10 11:09:11 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 10 Feb 2012 10:09:11 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> Message-ID: On 2/10/12 8:49 AM, Masklinn wrote: > On 2012-02-10, at 01:03 , Guido van Rossum wrote: >> On Thu, Feb 9, 2012 at 3:52 PM, Sturla Molden wrote: >>> Den 9. feb. 2012 kl. 23:56 skrev Guido van Rossum: >>> ). >>> >>> >>> Hm... is there a reason GSL and SciPy need to compete? Can't SciPy >>> incorporate GSL? >>> >>> GPL vs BSD issue. >>> >> >> That's a bummer. Someone should open negotiations. > > I'm not sure what could be open to negotiate, being part of the GNU > constellation I don't see GSL budging from the GPL, and SciPy is backed > by industry members and used in "nonfree" products (notably the Enthought > Python Distribution) so there's little room for it to use the GPL. While I am an Enthought employee and really do want to keep scipy BSD so I can continue to use it in the proprietary software that I write for clients, I must also add that the most vociferous BSD advocates in our community are the academics. They have to wade through more weird licensing arrangements than I do, and the flexibility of the BSD license is quite important to let them get their jobs done. > Best thing that could happen (and I'm not even sure it's allowed by the > GSL's license (which is under the GPL not the LGPL) would be for SciPy to > grow some sort of GSL backend to delegate its operations to, when the GSL > is installed. 
We've done that kind of thing in the past for FFTW and other libraries but have since removed them for all of the installation and maintenance headaches it causes. In my mind (and others disagree), having scipy-the-package subsume every relevant library is not a worthwhile pursuit. The important thing is that these packages are available to the scientific Python community. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From masklinn at masklinn.net Fri Feb 10 11:41:58 2012 From: masklinn at masklinn.net (Masklinn) Date: Fri, 10 Feb 2012 11:41:58 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> Message-ID: On 2012-02-10, at 11:09 , Robert Kern wrote: > On 2/10/12 8:49 AM, Masklinn wrote: >> On 2012-02-10, at 01:03 , Guido van Rossum wrote: >>> On Thu, Feb 9, 2012 at 3:52 PM, Sturla Molden wrote: >>>> Den 9. feb. 2012 kl. 23:56 skrev Guido van Rossum: >>>> ). >>>> >>>> >>>> Hm... is there a reason GSL and SciPy need to compete? Can't SciPy >>>> incorporate GSL? >>>> >>>> GPL vs BSD issue. >>>> >>> >>> That's a bummer. Someone should open negotiations. >> >> I'm not sure what could be open to negotiate, being part of the GNU >> constellation I don't see GSL budging from the GPL, and SciPy is backed >> by industry members and used in "nonfree" products (notably the Enthought >> Python Distribution) so there's little room for it to use the GPL. 
> > While I am an Enthought employee and really do want to keep scipy BSD so I can continue to use it in the proprietary software that I write for clients, I must also add that the most vociferous BSD advocates in our community are the academics. They have to wade through more weird licensing arrangements than I do, and the flexibility of the BSD license is quite important to let them get their jobs done. Completely true, I'd thought about this case but completely forgot about it when I started actually writing my message, I'm very sorry. From cmjohnson.mailinglist at gmail.com Fri Feb 10 11:43:14 2012 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Fri, 10 Feb 2012 00:43:14 -1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> Message-ID: <28E8080F-75DF-455E-9C6A-BE3D4D603604@gmail.com> Can we please break this thread out into multiple subject headers? It's very difficult to follow the flow of conversation with some many different discussions all lumped under one name. Some proposed subjects: - Refcounting vs. 
Other GC - Numpy - Windows Installers - Unicode - Python in Education - Python's Popularity From robert.kern at gmail.com Fri Feb 10 11:49:29 2012 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 10 Feb 2012 10:49:29 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> Message-ID: On 2/10/12 10:41 AM, Masklinn wrote: > On 2012-02-10, at 11:09 , Robert Kern wrote: >> On 2/10/12 8:49 AM, Masklinn wrote: >>> On 2012-02-10, at 01:03 , Guido van Rossum wrote: >>>> On Thu, Feb 9, 2012 at 3:52 PM, Sturla Molden wrote: >>>>> Den 9. feb. 2012 kl. 23:56 skrev Guido van Rossum: >>>>> ). >>>>> >>>>> >>>>> Hm... is there a reason GSL and SciPy need to compete? Can't SciPy >>>>> incorporate GSL? >>>>> >>>>> GPL vs BSD issue. >>>>> >>>> >>>> That's a bummer. Someone should open negotiations. >>> >>> I'm not sure what could be open to negotiate, being part of the GNU >>> constellation I don't see GSL budging from the GPL, and SciPy is backed >>> by industry members and used in "nonfree" products (notably the Enthought >>> Python Distribution) so there's little room for it to use the GPL. >> >> While I am an Enthought employee and really do want to keep scipy BSD so I can continue to use it in the proprietary software that I write for clients, I must also add that the most vociferous BSD advocates in our community are the academics. They have to wade through more weird licensing arrangements than I do, and the flexibility of the BSD license is quite important to let them get their jobs done. > > Completely true, I'd thought about this case but completely forgot about it > when I started actually writing my message, I'm very sorry. No apologies necessary. I just wanted to be thorough. 
:-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From yoavglazner at gmail.com Fri Feb 10 11:51:03 2012 From: yoavglazner at gmail.com (yoav glazner) Date: Fri, 10 Feb 2012 12:51:03 +0200 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <28E8080F-75DF-455E-9C6A-BE3D4D603604@gmail.com> References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> <28E8080F-75DF-455E-9C6A-BE3D4D603604@gmail.com> Message-ID: On Fri, Feb 10, 2012 at 12:43 PM, Carl M. Johnson < cmjohnson.mailinglist at gmail.com> wrote: > Can we please break this thread out into multiple subject headers? It's > very difficult to follow the flow of conversation with some many different > discussions all lumped under one name. > > Some proposed subjects: > > - Refcounting vs. Other GC > - Numpy > - Windows Installers > - Unicode > - Python in Education > - Python's Popularity > No, The subject is correct, we have a -3% problem in the index. so the solution is to keep this thread long with many keywords like python pypy jython etc... and than the % will grow! (at least @TIOBE since it relies on google search ;) ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ubershmekel at gmail.com Fri Feb 10 11:52:09 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Fri, 10 Feb 2012 12:52:09 +0200 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> Message-ID: > > Also, AFAIK Ruby has a GIL much like Python. 
I think it's time to start a > PR offensive explaining why these are not the problem the trolls make them > out to be, and how you simply have to use different patterns for scaling in > some languages than in others. GIL + Threads = Fibers CPython doesn't have threads but it calls its fibers "threads" which causes confusion and disappointment. The underlying implementation is not important eg when you implement a "lock" using "events" does the lock become an event? No. This is a PR disaster. 100% agree we need a PR offensive but first we need a strategy. Erlang champions the actor/message paradigm so they dodge the threading bullet completely. What's the python championed parallelism paradigm? It should be on the front page of python.org and in the first paragraph of wikipedia on python. One of the Lua authors said this about threads: > (I can't remember the quote so I will paraphrase) > "How can you program in a language where 'a = a + 1' is not deterministic?" > Indeed. Anyone who cares enough about performance doesn't mind that 'a = a + 1' is only as deterministic as you design it to be with or without locks. Multiprocessing has this same problem btw. What Python needs are better libraries for concurrent programming based on > processes and coroutines. The killer feature for threads (vs multiprocessing) is access to shared state with nearly zero overhead. And note that a single-threaded event-driven process can serve 100,000 open > sockets -- while no JVM can create 100,000 threads. Graphics engines, simulations, games, etc don't want 100,000 threads, they just want true threads as many as there are CPU's. Yuval -------------- next part -------------- An HTML attachment was scrubbed... 
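The actor/message style Yuval points to can be sketched in pure Python today: an actor is just a thread (or process) that owns its state and is reachable only through a queue, so the state itself needs no locks. A minimal sketch (the counter actor and its message names are invented for illustration):

```python
import queue
import threading

def counter_actor(inbox, outbox):
    """Actor: owns its state; other threads reach it only via messages."""
    count = 0
    while True:
        msg = inbox.get()
        if msg == "incr":
            count += 1          # no lock needed: only this thread sees count
        elif msg == "get":
            outbox.put(count)
        elif msg == "stop":
            break

inbox, outbox = queue.Queue(), queue.Queue()
actor = threading.Thread(target=counter_actor, args=(inbox, outbox))
actor.start()

for _ in range(5):
    inbox.put("incr")
inbox.put("get")
result = outbox.get()           # messages are handled in FIFO order
inbox.put("stop")
actor.join()
print(result)                   # 5
```

This is of course only a convention, not a guarantee the way it is in Erlang; nothing in Python stops another thread from reaching into shared state directly.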
URL: From stefan_ml at behnel.de Fri Feb 10 12:23:04 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 10 Feb 2012 12:23:04 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> Message-ID: Yuval Greenfield, 10.02.2012 11:52: > GIL + Threads = Fibers No, that only applies (to a certain extent) to Python code being executed, not to the complete runtime which includes external libraries etc. Stefan From anacrolix at gmail.com Fri Feb 10 12:25:27 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Fri, 10 Feb 2012 19:25:27 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F34E393.9020105@hotpy.org> References: <4F34E393.9020105@hotpy.org> Message-ID: This. The process support is pretty good with multiprocessing but the coroutines are missing. On Feb 10, 2012 5:30 PM, "Mark Shannon" wrote: > > There are a lot of things covered in this thread. > I want to address 2 of them. > > 1. Garbage Collection. > > Python has garbage collection. There is no free() function in Python, > anyone who says that Python does not have GC is talking nonsense. > CPython using reference counting as its means of implementing GC. > > Ref counting has different performance characteristics from tracing GC, > but it only makes sense to consider this is the context of overall > Python performance. > One key disadvantage of ref-counting is that does not play well with > threads, which leads on to... > > 2. Global Interpreter Lock and Threads. > > The GIL is so deeply embedded into CPython that I think it cannot be > removed. There are too many subtle assumptions pervading both the VM and > 3rd party code, to make truly concurrent threads possible. > > But are threads the way to go? > Javascript does not have threads. 
Lua does not have threads. > Erlang does not have threads; Erlang processes are implemented (in the > BEAM engine) as coroutines. > > One of the Lua authors said this about threads: > (I can't remember the quote so I will paraphrase) > "How can you program in a language where 'a = a + 1' is not deterministic?" > Indeed. > > What Python needs are better libraries for concurrent programming based on > processes and coroutines. > > Cheers, > Mark. > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ubershmekel at gmail.com Fri Feb 10 14:57:56 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Fri, 10 Feb 2012 15:57:56 +0200 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> Message-ID: On Fri, Feb 10, 2012 at 1:23 PM, Stefan Behnel wrote: > Yuval Greenfield, 10.02.2012 11:52: > > GIL + Threads = Fibers > > No, that only applies (to a certain extent) to Python code being executed, > not to the complete runtime which includes external libraries etc. > Pure python code running in python "threads" on CPython behaves like fibers. I'd like to point out the word "external" in your statement. Yuval -------------- next part -------------- An HTML attachment was scrubbed... 
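The paraphrased Lua quote is easy to demonstrate in CPython: `a = a + 1` compiles to a separate load, add, and store, and the GIL can be handed to another thread between them. A small sketch; how many updates the unlocked counter actually loses depends on the interpreter's switch interval, so only the locked result is predicted here:

```python
import threading

N = 100_000
unsafe_total = 0
safe_total = 0
lock = threading.Lock()

def unsafe():
    global unsafe_total
    for _ in range(N):
        unsafe_total = unsafe_total + 1   # load/add/store: not atomic

def safe():
    global safe_total
    for _ in range(N):
        with lock:                        # serializes the read-modify-write
            safe_total = safe_total + 1

threads = [threading.Thread(target=f) for f in (unsafe, unsafe, safe, safe)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# unsafe_total can be anything up to 200000; safe_total is exactly 200000
print(unsafe_total, safe_total)
```

Lost updates can only make the unlocked counter smaller, never larger, which is what makes the bug so easy to miss on a lightly loaded machine.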
URL: From stefan_ml at behnel.de Fri Feb 10 15:01:54 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 10 Feb 2012 15:01:54 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> Message-ID: Yuval Greenfield, 10.02.2012 14:57: > On Fri, Feb 10, 2012 at 1:23 PM, Stefan Behnel wrote: >> Yuval Greenfield, 10.02.2012 11:52: >>> GIL + Threads = Fibers >> >> No, that only applies (to a certain extent) to Python code being executed, >> not to the complete runtime which includes external libraries etc. > > Pure python code running in python "threads" on CPython behaves like > fibers. I'd like to point out the word "external" in your statement. Yes, many people forget that existing/external/yourwordinghere code in non-Python languages is (and has always been) a substantial part of the Python platform. Stefan From anacrolix at gmail.com Fri Feb 10 15:48:07 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Fri, 10 Feb 2012 22:48:07 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <4606859F-1DCB-4B1C-8A6D-A875011B8128@masklinn.net> <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> Message-ID: > Pure python code running in python "threads" on CPython behaves like fibers. > I'd like to point out the word "external" in your statement. I don't believe this to be true. Fibers are not preempted. The GIL is released at regular intervals to allow the effect of preempted switching. Many other behaviours of Python threads are still native thread like, particularly in their interaction with other components and the OS. 
GIL + Threads = Simplified, non parallel interpreter From massimo.dipierro at gmail.com Fri Feb 10 15:52:16 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Fri, 10 Feb 2012 08:52:16 -0600 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F34E393.9020105@hotpy.org> References: <4F34E393.9020105@hotpy.org> Message-ID: <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> The way I see it, the issue is not whether Python has threads, fibers, coroutines, etc. The problem is that in 5 years we are going to have on the market CPUs with 100 cores (my phone has 2, my office computer has 8 not counting GPUs). The compiler/interpreters must be able to parallelize tasks using those cores without duplicating the memory space. Erlang may not have threads in the sense that it does not expose threads via an API but provides optional parallel schedulers where coroutines are distributed automatically over the available cores/CPUs (http://erlang.2086793.n4.nabble.com/Some-facts-about-Erlang-and-SMP-td2108770.html). Different languages have different mechanisms for taking advantage of multiple cores without forking. Python does not provide a mechanism and I do not know if anybody is working on one. In Python, currently, you can only do threading to parallelize your code without duplicating memory space, but performance decreases instead of increasing with the number of cores. This means threading is only good for concurrency, not for scalability. The GC vs reference counting (RC) question is the heart of the matter. With RC, every time a variable is allocated or deallocated you need to lock the counter because you do not know who else is accessing the same variable from another thread. This forces the interpreter to basically serialize the program even if you have threads, cores, coroutines, etc. Forking is a solution only for simple toy cases and in trivially parallel cases.
People use processes to parallelize web servers and task queues where the tasks do not need to talk to each other (except with the parent/master process). If you have 100 cores even with a small 50MB program, in order to parallelize it you go from 50MB to 5GB. Memory and memory access become a major bottleneck. Erlang Massimo On Feb 10, 2012, at 3:29 AM, Mark Shannon wrote: > > There are a lot of things covered in this thread. > I want to address 2 of them. > > 1. Garbage Collection. > > Python has garbage collection. There is no free() function in Python, > anyone who says that Python does not have GC is talking nonsense. > CPython uses reference counting as its means of implementing GC. > > Ref counting has different performance characteristics from tracing GC, > but it only makes sense to consider this in the context of overall > Python performance. > One key disadvantage of ref-counting is that it does not play well with threads, which leads on to... > > 2. Global Interpreter Lock and Threads. > > The GIL is so deeply embedded into CPython that I think it cannot be removed. There are too many subtle assumptions pervading both the VM and 3rd party code, to make truly concurrent threads possible. > > But are threads the way to go? > Javascript does not have threads. Lua does not have threads. > Erlang does not have threads; Erlang processes are implemented (in the BEAM engine) as coroutines. > > One of the Lua authors said this about threads: > (I can't remember the quote so I will paraphrase) > "How can you program in a language where 'a = a + 1' is not deterministic?" > Indeed. > > What Python needs are better libraries for concurrent programming based on processes and coroutines. > > Cheers, > Mark. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stefan_ml at behnel.de Fri Feb 10 15:59:45 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 10 Feb 2012 15:59:45 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> Message-ID: Matt Joiner, 10.02.2012 15:48: >> Pure python code running in python "threads" on CPython behaves like fibers. >> I'd like to point out the word "external" in your statement. > > I don't believe this to be true. Fibers are not preempted. The GIL is > released at regular intervals to allow the effect of preempted > switching. Many other behaviours of Python threads are still native > thread like, particularly in their interaction with other components > and the OS. Absolutely. Even C extensions cannot always prevent a thread switch from happening when they need to call back into CPython's C-API. > GIL + Threads = Simplified, non parallel interpreter Note that this also applies to PyPy, so even "interpreter" isn't enough of a generalisation. I think it's best to speak of the GIL as what it is: a lock that protects internal state of the CPython runtime (and also some external code, when used that way). Rather convenient, if you ask me. Stefan From masklinn at masklinn.net Fri Feb 10 16:11:54 2012 From: masklinn at masklinn.net (Masklinn) Date: Fri, 10 Feb 2012 16:11:54 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> Message-ID: On 2012-02-10, at 15:52 , Massimo Di Pierro wrote: > Erlang may not have threads in the sense that it does not expose threads via an API but provides optional parallel schedulers -smp has been enabled by default since R13 or R14, it's as optional as multithreading being optional because you can bind a process to a core. 
> In Python, currently, you can only do threading to parallelize your code without duplicating memory space, but performance decreases instead of increasing with number of cores. This means threading is only good for concurrency not for scalability. That's definitely not true, you can also fork and multiprocessing, while not ideal by a long shot, provides a number of tools for working building concurrent applications via multiple processes. From masklinn at masklinn.net Fri Feb 10 16:12:58 2012 From: masklinn at masklinn.net (Masklinn) Date: Fri, 10 Feb 2012 16:12:58 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> Message-ID: <3BB3585B-EE07-4931-99CD-0FCF79D0557C@masklinn.net> On 2012-02-10, at 15:59 , Stefan Behnel wrote: >> GIL + Threads = Simplified, non parallel interpreter > > Note that this also applies to PyPy, so even "interpreter" isn't enough of > a generalisation. > > I think it's best to speak of the GIL as what it is: a lock that protects > internal state of the CPython runtime (and also some external code, when > used that way). Rather convenient, if you ask me. It is very convenient from the viewpoint of implementing the interpreter, but you must acknowledge that it comes with quite severe limitations on the ability of user code to take advantage of computing resources. From stefan_ml at behnel.de Fri Feb 10 16:28:11 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 10 Feb 2012 16:28:11 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> Message-ID: Massimo Di Pierro, 10.02.2012 15:52: > Different languages have different mechanisms for taking advantages of > multiple cores without forking. 
Python does not provide a mechanism and > I do not know if anybody is working on one. Seriously - what's wrong with forking? multiprocessing is so incredibly easy to use that it's hard for me to understand why anyone would fight for getting threading to do essentially the same thing, just less safely. Threading is a seriously hard problem, very tricky to get right and full of land mines. Basically, you start from a field that's covered with one big mine, and start cutting it down until you can get yourself convinced that the remaining mines (if any, right?) are small enough to not hurt anyone. They usually do anyway, but at least not right away. This is generally worth a read (not necessarily for the conclusion, but definitely for the problem description): http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf > In Python, currently, you can only do threading to parallelize your code > without duplicating memory space, but performance decreases instead of > increasing with number of cores. Well, nothing keeps you from putting your data into shared memory if you use multiple processes. It's not that hard either, but it has the major advantage over threading that you can choose exactly what data should be shared, so that you can more easily avoid race conditions and unintended interdependencies. Basically, you start from a safe split and then add explicit data sharing and messaging until you have enough shared data and synchronisation points to make it work, while still ending up with a safe and efficient concurrent system. Note how this is the opposite of threading, where you start off from the maximum possible unsafety where all state is shared, and then wade through it with a machete trying to cut down unsafe interaction points. And if you miss any one spot, you have a problem. > This means threading is only good for > concurrency not for scalability. Yes, concurrency, or more specifically, I/O concurrency is still a valid use case for threading.
> The GC vs reference counting (RC) is the hearth of the matter. With RC > every time a variable is allocated or deallocated you need to lock the > counter because you do know who else is accessing the same variable from > another thread. This forces the interpreter to basically serialize the > program even if you have threads, cores, coroutines, etc. > > Forking is a solution only for simple toy cases and in trivially > parallel cases. People use processes to parallelize web serves and task > queues where the tasks do not need to talk to each other (except with > the parent/master process). If you have 100 cores even with a small 50MB > program, in order to parallelize it you go from 50MB to 5GB. Memory and > memory access become a major bottle neck. I think you should read up a bit on the various mechanisms for parallel processing. Stefan From stefan_ml at behnel.de Fri Feb 10 16:30:12 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 10 Feb 2012 16:30:12 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <3BB3585B-EE07-4931-99CD-0FCF79D0557C@masklinn.net> References: <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> <3BB3585B-EE07-4931-99CD-0FCF79D0557C@masklinn.net> Message-ID: Masklinn, 10.02.2012 16:12: > On 2012-02-10, at 15:59 , Stefan Behnel wrote: >>> GIL + Threads = Simplified, non parallel interpreter >> >> Note that this also applies to PyPy, so even "interpreter" isn't enough of >> a generalisation. >> >> I think it's best to speak of the GIL as what it is: a lock that protects >> internal state of the CPython runtime (and also some external code, when >> used that way). Rather convenient, if you ask me. > > It is very convenient from the viewpoint of implementing the interpreter, > but you must acknowledge that it comes with quite severe limitations on > the ability of user code to take advantage of computing resources. I don't think it does. 
See my other post just now in response to Massimo. Stefan From jimjjewett at gmail.com Fri Feb 10 16:38:02 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 10 Feb 2012 10:38:02 -0500 Subject: [Python-ideas] [Python-Dev] matrix operations on dict :) In-Reply-To: References: Message-ID: On Thu, Feb 9, 2012 at 7:11 PM, Mark Janssen wrote: > On Wed, Feb 8, 2012 at 9:54 AM, julien tayon wrote: >> >> 2012/2/7 Mark Janssen : >> > On Mon, Feb 6, 2012 at 6:12 PM, Steven >> > D'Aprano wrote: >> > I have the problem looking for this solution! >> > The application for this functionality is in coding a fractal graph (or >> > "multigraph" in the literature). I think that would be better represented using an object of some sort, such as a MultiGraphNode and/or MultiGraphEdge, instead of re-purposing dict. > Okay, I guess I did not make myself very clear. What I'm proposing probably > will (eventually) require changes to the "object" model of Python That means you're talking about Python 4, at a minimum, and you would need to show how valuable it is by building a workaround version and getting people to use that extensively in Python 3. And frankly, you should probably do that anyhow; this feels to me like a bad plan for language defaults, but it is still a valid use case -- and I don't think this sort of math exploration should (or will) wait for Python 4; people will model it somehow in an existing language. > The symbol that denotes a compound would be the colon > (":") and associates a left hand side with right-hand side value, > a NAME with a VALUE. A dictionary would (then) be a SET > of these. (Voila! things have already gotten simplified.) That sounds like an association list. I think you're dealing with sufficiently abstract problems that you don't want to restrict your keys to hashable things, and it is worth suffering a bit slower performance in return.
> Eventually, I also think this will segue and integrate > nicely into Mark Shannon's "shared-key dict" proposal (PEP 412). I'm pretty sure he doesn't intend to change the semantics of dict at all. He does want to make the implementation more efficient, at least in terms of space; any semantic differences are considered either bugs or costs worth paying for that efficiency. > While in the abstract one might think to allow any arbitrary data-type for > right-hand-side values, in PRACTICE, integers are sufficient. By integers, do you really mean pointers or (possibly abstract) references to other structures? Because if you do, then ordinary arithmetic isn't the right solution, but if you don't, then I don't see them as sufficient. > ... It makes sense to use ... a maximally abstract top-most level -- > this is simply an abstract grouping type (i.e. a collection). I'm going to > suggest a SET is the most abstract (i.e. sufficient) because it does > not impose an order I agree that it is the most abstract (at least of the well-known) type, and that all the other types can be represented in terms of sets. The catch is that these representations may be massively inefficient. If you're doing mathematical exploration, that may be a reasonable tradeoff, but Python also caters to other use cases. > (Could one represent a python CLASS hierarchy more simply with this > fractalset object somehow....?) Depending on what you want to represent, probably. But if you want to represent the ancestors of a given class for efficient method and attribute access, then no; it is hard to beat an array for efficiency of sequential access.
-jJ From arnodel at gmail.com Fri Feb 10 16:43:54 2012 From: arnodel at gmail.com (Arnaud Delobelle) Date: Fri, 10 Feb 2012 15:43:54 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> Message-ID: On 10 February 2012 14:52, Massimo Di Pierro wrote: > Forking is a solution only for simple toy cases and in trivially parallel > cases. People use processes to parallelize web serves and task queues where > the tasks do not need to talk to each other (except with the parent/master > process).?If you have 100 cores even with a small 50MB program, in order to > parallelize it you go from 50MB to 5GB. Memory and memory access become a > major bottle neck. I don't know much about forking, but I'm pretty sure that forking a process doesn't mean you double the amount of physical memory used. With copy-on-write, a lot of physical memory can be shared. -- Arnaud From ncoghlan at gmail.com Fri Feb 10 16:57:12 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Feb 2012 01:57:12 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <20120209185810.GC20556@mcnabbs.org> <20120209222540.GD20556@mcnabbs.org> <623AA797-6F68-4158-82D1-A21872B34782@masklinn.net> <3BB3585B-EE07-4931-99CD-0FCF79D0557C@masklinn.net> Message-ID: On Sat, Feb 11, 2012 at 1:30 AM, Stefan Behnel wrote: > Masklinn, 10.02.2012 16:12: >> On 2012-02-10, at 15:59 , Stefan Behnel wrote: >>>> GIL + Threads = Simplified, non parallel interpreter >>> >>> Note that this also applies to PyPy, so even "interpreter" isn't enough of >>> a generalisation. >>> >>> I think it's best to speak of the GIL as what it is: a lock that protects >>> internal state of the CPython runtime (and also some external code, when >>> used that way). Rather convenient, if you ask me. 
>> >> It is very convenient from the viewpoint of implementing the interpreter, >> but you must acknowledge that it comes with quite severe limitations on >> the ability of user code to take advantage of computing resources. > > I don't think it does. See my other post just now in response to Massimo. Armin Rigo's series on Software Transactional Memory on the PyPy blog is also required reading for anyone seriously interested in practical shared memory concurrency that doesn't impose a horrendous maintenance burden on developers that try to use it: http://morepypy.blogspot.com.au/2011/06/global-interpreter-lock-or-how-to-kill.html http://morepypy.blogspot.com.au/2011/08/we-need-software-transactional-memory.html http://morepypy.blogspot.com.au/2012/01/transactional-memory-ii.html And for those that may be inclined to dismiss STM as pie-in-the-sky stuff that is never going to be practical in the "real world", the best I can offer is Intel's plans to bake an initial attempt at it into a consumer grade chip within the next couple of years: http://arstechnica.com/business/news/2012/02/transactional-memory-going-mainstream-with-intel-haswell.ars? I do like Armin's analogy that free threading is to concurrency as malloc() and free() are to memory management :) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Fri Feb 10 16:58:33 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Feb 2012 01:58:33 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> Message-ID: On Sat, Feb 11, 2012 at 1:43 AM, Arnaud Delobelle wrote: > I don't know much about forking, but I'm pretty sure that forking a > process doesn't mean you double the amount of physical memory used. > With copy-on-write, a lot of physical memory can be shared. 
Unfortunately, CPython's use of refcounting plays merry hell with the effectiveness of copy-on-write memory saving techniques. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From massimo.dipierro at gmail.com Fri Feb 10 17:05:44 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Fri, 10 Feb 2012 10:05:44 -0600 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> Message-ID: <93FAD103-1A38-44FC-9A2B-91EC58FB7BE3@gmail.com> On Feb 10, 2012, at 9:28 AM, Stefan Behnel wrote: > Massimo Di Pierro, 10.02.2012 15:52: >> >> Forking is a solution only for simple toy cases and in trivially >> parallel cases. People use processes to parallelize web serves and task >> queues where the tasks do not need to talk to each other (except with >> the parent/master process). If you have 100 cores even with a small 50MB >> program, in order to parallelize it you go from 50MB to 5GB. Memory and >> memory access become a major bottle neck. > > I think you should read up a bit on the various mechanisms for parallel > processing. yes I should ;-) (Perhaps I should take this course http://www.cdm.depaul.edu/academics/pages/courseinfo.aspx?CrseId=001533) The fact is, in my experience, many modern applications where performance is important try to take advantage of all parallelization available. I have worked on many years in lattice QCD and I have written code that runs on various parallel machines. We used processes to parallelize across nodes, threads to parallelize on single node, and assembly vectorial instructions to parallelize within each core. This used to be a state of art way of programming but now I see these patters trickling down to many consumer applications, for example games. People do not like threads because of the need for locking but, as you increase the number of cores, the bottle neck becomes memory access. 
If you use processes, you don't just bloat ram usage killing cache performance but you need to use message passing for interprocess communication. Message passing requires a copy of the data, which is expensive (remember ram is the bottleneck). Even worse, sometimes message passing cannot be done using ram only and you need disk-buffered messages for interprocess communication. Some programs are parallelized ok with processes. Those I have experience with require both processes and threads. Again, this does not mean using threading APIs. The VM should use threads to parallelize tasks. How this is exposed to the developer is a different matter. Massimo -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.dipierro at gmail.com Fri Feb 10 17:07:09 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Fri, 10 Feb 2012 10:07:09 -0600 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> Message-ID: <0A002488-3C24-4C3B-B213-0F7CF4C3E65C@gmail.com> On Feb 10, 2012, at 9:43 AM, Arnaud Delobelle wrote: > On 10 February 2012 14:52, Massimo Di Pierro wrote: >> Forking is a solution only for simple toy cases and in trivially parallel >> cases. People use processes to parallelize web serves and task queues where >> the tasks do not need to talk to each other (except with the parent/master >> process). If you have 100 cores even with a small 50MB program, in order to >> parallelize it you go from 50MB to 5GB. Memory and memory access become a >> major bottle neck. > > I don't know much about forking, but I'm pretty sure that forking a > process doesn't mean you double the amount of physical memory used. > With copy-on-write, a lot of physical memory can be shared. Anyway, copy-on-write does not solve the problem.
The OS tries to save memory by not duplicating physical memory: it maps the different address spaces of the various forked processes onto the same physical memory. But as soon as one process writes into a shared page, that page is copied. It has to be: the processes' address spaces must remain independent. That is what fork does. Anyway, there are many applications that are parallelized well with processes (at least for a small number of cores/cpus). > > -- > Arnaud From stefan_ml at behnel.de Fri Feb 10 17:07:12 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 10 Feb 2012 17:07:12 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> Message-ID: Arnaud Delobelle, 10.02.2012 16:43: > On 10 February 2012 14:52, Massimo Di Pierro wrote: >> Forking is a solution only for simple toy cases and in trivially parallel >> cases. People use processes to parallelize web serves and task queues where >> the tasks do not need to talk to each other (except with the parent/master >> process). If you have 100 cores even with a small 50MB program, in order to >> parallelize it you go from 50MB to 5GB. Memory and memory access become a >> major bottle neck. > > I don't know much about forking, but I'm pretty sure that forking a > process doesn't mean you double the amount of physical memory used. > With copy-on-write, a lot of physical memory can be shared. That applies to systems that support both fork and copy-on-write. Not all systems are that lucky, although many major Unices have caught up in recent years. The Cygwin implementation of fork() is especially involved, for example, simply because Windows lacks this idiom completely (well, in its normal non-POSIX identity, that is, where basically all Windows programs run).
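The copy-on-write behavior being discussed can be seen with a short sketch (POSIX-only, since os.fork() does not exist on Windows; the data size is arbitrary):

```python
import os

# Parent builds a sizeable structure; after fork() the child sees it
# without any explicit copy -- the OS shares the pages copy-on-write.
data = list(range(1000000))

pid = os.fork()
if pid == 0:
    # Child process: it can read the parent's data directly.
    os._exit(0 if data[123456] == 123456 else 1)
else:
    _, status = os.waitpid(pid, 0)
    print("child saw parent data:", os.WEXITSTATUS(status) == 0)
```

Note that under CPython even this read-only access dirties pages, because reading an object touches its refcount, which is the refcounting/copy-on-write interaction Nick mentions at the top of this thread.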
http://seit.unsw.adfa.edu.au/staff/sites/hrp/webDesignHelp/cygwin-ug-net-nochunks.html#OV-HI-PROCESS Stefan From phd at phdru.name Thu Feb 9 16:28:42 2012 From: phd at phdru.name (Oleg Broytman) Date: Thu, 9 Feb 2012 19:28:42 +0400 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <0E6116E4-A406-4C49-A8C1-C18D6228C141@masklinn.net> Message-ID: <20120209152842.GA15149@iskra.aviel.ru> On Thu, Feb 09, 2012 at 05:13:03PM +0200, Yuval Greenfield wrote: > On Thu, Feb 9, 2012 at 5:05 PM, Masklinn wrote: > > On 2012-02-09, at 15:36 , anatoly techtonik wrote: > > > Hi, > > > > > > I didn't want to grow FUD on python-dev, but a FUD there seems to be a > > good > > > topic for discussion here. > > > http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html > > > > 1. Python-ideas is not the right place for this stuff (neither is > > Python-dev, by the way) > > 2. Why would anybody care exactly? > > 1. Where would be the correct place to talk about a grand state of > python affairs? Nowhere because: 1. Nobody cares. This is Free Software, and we are scratching our own itches. 2. Do you consider Python developers stupid? Do you think they don't have any idea how things are going on in the wild? > 2. Like it or not, many use such ratings to decide which language to > learn, which language to use for their next project and whether or not to > be proud of their language of choice. Java (or Perl, or whatever) has won, hands down. Congrats to them! Can we please return to our own development? We are not going to conquer the world, are we? Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
From breamoreboy at yahoo.co.uk Fri Feb 10 17:33:53 2012 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 10 Feb 2012 16:33:53 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> Message-ID: On 10/02/2012 16:07, Stefan Behnel wrote: > Arnaud Delobelle, 10.02.2012 16:43: >> On 10 February 2012 14:52, Massimo Di Pierro wrote: >>> Forking is a solution only for simple toy cases and in trivially parallel >>> cases. People use processes to parallelize web serves and task queues where >>> the tasks do not need to talk to each other (except with the parent/master >>> process). If you have 100 cores even with a small 50MB program, in order to >>> parallelize it you go from 50MB to 5GB. Memory and memory access become a >>> major bottle neck. >> >> I don't know much about forking, but I'm pretty sure that forking a >> process doesn't mean you double the amount of physical memory used. >> With copy-on-write, a lot of physical memory can be shared. > > That applies to systems that support both fork and copy-on-write. Not all > systems are that lucky, although many major Unices have caught up in recent > years. > > The Cygwin implementation of fork() is especially involved for example, > simple because Windows lacks this idiom completely (well, in it's normal > non-POSIX identity, that is, where basically all Windows programs run). > > http://seit.unsw.adfa.edu.au/staff/sites/hrp/webDesignHelp/cygwin-ug-net-nochunks.html#OV-HI-PROCESS > > Stefan For those who don't follow c.l.p a thread subject "Fabric Engine + Python bechmarks" turned up 30 minutes ago. Problem solved? :) -- Cheers. Mark Lawrence. 
From dreamingforward at gmail.com Fri Feb 10 17:56:02 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Fri, 10 Feb 2012 09:56:02 -0700 Subject: [Python-ideas] [Python-Dev] matrix operations on dict :) In-Reply-To: References: Message-ID: On Fri, Feb 10, 2012 at 8:38 AM, Jim Jewett wrote: > On Thu, Feb 9, 2012 at 7:11 PM, Mark Janssen > wrote: > > On Wed, Feb 8, 2012 at 9:54 AM, julien tayon wrote: > >> > >> 2012/2/7 Mark Janssen : > >> > On Mon, Feb 6, 2012 at 6:12 PM, Steven > >> > D'Aprano wrote: > > >> > I have the problem looking for this solution! > > >> > The application for this functionality is in coding a fractal graph > (or > >> > "multigraph" in the literature). > > I think that would be better represented using an object of some sort, > such as a MultiGraphNode and/or MultiGraphEdge, instead of > re-purposing dict. > Those would be good strategies in general, but the issue is how things hook together in the object model. These things are very abstract; it's exactly what made metaclasses difficult to "grok" at times. I'll probably just have to try to implement them in PyPy or abandon the idea. > > > Okay, I guess I did not make myself very clear. What I'm proposing > probably > > will (eventually) require changes to the "object" model of Python > > That means you're talking about Python 4, at a minimum, and you would > need to show how valuable it is by building a workaround version and > getting people to use that extensively in Python 3. > > I understand what you're saying, but for me, as for many of us, Python 3000 never really happened. So this is really still for what was dreamed to happen in version 3. > > > While in the abstract one might think to allow any arbitrary data-type > for > > right-hand-side values, in PRACTICE, integers are sufficient. > > By integers, do you really mean pointers or (possibly abstract) > references to other structures?
Because if you do, then ordinary > arithmetic isn't the right solution, but if you don't, then I don't > see them as sufficient. > > No, actually Python integers. My point is that within this unified information model, everything can be represented by atomic units (where integers come in) and groups (or a collection type). Compare with how all the complexity of the physical world is a product of small-massed electrons and protons. I'm arguing that all the uses of data can be represented in a similar way. Thanks for the reply, but I think I'll shelve the discussion for now.... mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Fri Feb 10 18:38:01 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 10 Feb 2012 18:38:01 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> Message-ID: <20120210183801.59921627@pitrou.net> On Fri, 10 Feb 2012 08:52:16 -0600 Massimo Di Pierro wrote: > The way I see it is not whether Python has threads, fibers, coroutines, etc. > The problem is that in 5 years we going to have on the market CPUs with > 100 cores This is definitely untrue. No CPU maker has plans for a general-purpose 100-core CPU. Regards Antoine. From mwm at mired.org Fri Feb 10 19:34:10 2012 From: mwm at mired.org (Mike Meyer) Date: Fri, 10 Feb 2012 10:34:10 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> Message-ID: <20120210103410.4a5d5841@bhuda.mired.org> On Fri, 10 Feb 2012 08:52:16 -0600 Massimo Di Pierro wrote: > Forking is a solution only for simple toy cases and in trivially parallel cases. But threading is only a solution for simple toy cases and trivial levels of scaling.
> People use processes to parallelize web serves and task queues where > the tasks do not need to talk to each other (except with the > parent/master process). Only if they haven't thought much about using processes to build parallel systems. They work quite well for data that can be handed off to the next process, and where the communications is a small enough part of the problem that serializing it for communications is reasonable, and for cases where the data that needs high-speed communications can be treated as a relocatable chunk of memory. And any combination of those three, of course. The real problem with using processes in python is that there's no way to share complex python objects between processes - you're restricted to ctypes values or arrays of those. For many applications, that's fine. If you need to share a large searchable structure, you're reduced to FORTRAN techniques. > If you have 100 cores even with a small 50MB program, in order to > parallelize it you go from 50MB to 5GB. Memory and memory access > become a major bottle neck. That should be fixed in the OS, not by making your problem 2**100 times as hard to analyze. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From mal at egenix.com Fri Feb 10 19:36:52 2012 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 10 Feb 2012 19:36:52 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> Message-ID: <4F3563C4.2050703@egenix.com> Massimo Di Pierro wrote: > Forking is a solution only for simple toy cases and in trivially parallel cases. People use processes to parallelize web serves and task queues where the tasks do not need to talk to each other (except with the parent/master process). 
If you have 100 cores even with a small 50MB program, in order to parallelize it you go from 50MB to 5GB. Memory and memory access become a major bottle neck. By the time we have 100-core CPUs, we'll be measuring RAM in TB, so that shouldn't be a problem ;-) Many Python use cases are indeed easy to scale using multiple processes which then each run on a separate core, so that approach is a very workable way forward. If you need to share data across processes, you can use a shared memory mechanism. In many cases, the data to be shared will already be stored in a database and those can easily be accessed from all processes (again using shared memory). I often find these GIL discussions a bit theoretical. In practice I've so far never run into any issues with Python scalability. It's other components that cause a lot more trouble, like e.g. database query scalability, network congestion or disk access being too slow. In cases where the GIL does cause problems, it's usually better to consider changing the application design and use asynchronous processing with a single threaded design or a multi-process design where each of the processes only uses a low number of threads (20-50 per process). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Feb 10 2012) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mwm at mired.org Fri Feb 10 19:52:08 2012 From: mwm at mired.org (Mike Meyer) Date: Fri, 10 Feb 2012 10:52:08 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3563C4.2050703@egenix.com> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> Message-ID: <20120210105208.6f133329@bhuda.mired.org> On Fri, 10 Feb 2012 19:36:52 +0100 "M.-A. Lemburg" wrote: > In cases where the GIL does cause problems, it's usually better to > consider changing the application design and use asynchronous processing > with a single threaded design or a multi-process design where each of > the processes only uses a low number of threads (20-50 per process). Just a warning: mixing threads and forks can be hazardous to your sanity. In particular, forking a process that has threads running has behaviors, problems and solutions that vary between Unix variants. Best to make sure you've done all your forks before you create a thread if you want your code to be portable. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From jnoller at gmail.com Fri Feb 10 19:55:54 2012 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 10 Feb 2012 13:55:54 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3563C4.2050703@egenix.com> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> Message-ID: <2B72BFF25CAD476C9785EC6960B43344@gmail.com> > > > By the time we 100 core CPUs, we'll be measuring RAM in TB, so that > shouldn't be a problem ;-) > > Many Python use cases are indeed easy to scale using multiple processes > which then each run on a separate core, so that approach is a very > workable way forward. 
> > If you need to share data across processes, you can use a shared > memory mechanism. In many cases, the data to be shared will already > be stored in a database and those can easily be accessed from all > processes (again using shared memory). > > I often find these GIL discussion a bit theoretical. In practice > I've so far never run into any issues with Python scalability. It's > other components that cause a lot more trouble, like e.g. database > query scalability, network congestion or disk access being too slow. > > In cases where the GIL does cause problems, it's usually better to > consider changing the application design and use asynchronous processing > with a single threaded design or a multi-process design where each of > the processes only uses a low number of threads (20-50 per process). I think the much, much better response to the questions and comments around Python, the GIL and parallel computing in general is this: Yes, let's have more of that! It's like asking if people like pie, or babies. 99% of people polled are going to say "Yes, let's have more of that!" - so it goes with Python, the GIL, STM, Multiprocessing, Threads, etc. Where all of these discussions break down - and they always do - is that we lack: 1> Someone with a working patch for Pie 2> Someone with a fleshed out proposal/PEP on how to get more Pie 3> A group of people with time to bake more Pies that could help be paid to make Pie Banging on the table and asking for more Pie won't get us more Pie - what we need are actual proposals, in the form of well thought out PEPs, the people to implement and maintain the thing (see: unladen swallow), or working implementations. No one in this thread is arguing that having more Pie, or babies, would be bad. No one is arguing that more/better concurrency constructs would be good. Tools like concurrent.futures in Python 3 would be a good example of something recently added. The problem is people, plans and time. 
If we can solve the People and Time problems instead of looking to already overworked volunteers, then I'm sure we can come up with a good Pie plan. I really like pie. Jesse From mal at egenix.com Fri Feb 10 20:15:21 2012 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 10 Feb 2012 20:15:21 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120210105208.6f133329@bhuda.mired.org> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> <20120210105208.6f133329@bhuda.mired.org> Message-ID: <4F356CC9.6070006@egenix.com> Mike Meyer wrote: > On Fri, 10 Feb 2012 19:36:52 +0100 > "M.-A. Lemburg" wrote: >> In cases where the GIL does cause problems, it's usually better to >> consider changing the application design and use asynchronous processing >> with a single threaded design or a multi-process design where each of >> the processes only uses a low number of threads (20-50 per process). > > Just a warning: mixing threads and forks can be hazardous to your > sanity. In particular, forking a process that has threads running has > behaviors, problems and solutions that vary between Unix > variants. Best to make sure you've done all your forks before you > create a thread if you want your code to be portable. Right. Applications using such strategies will usually have long-running processes, so it's often better to spawn new processes than to use fork. This also helps if you want to bind processes to cores.
:::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From sturla at molden.no Fri Feb 10 20:54:35 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 10 Feb 2012 20:54:35 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3563C4.2050703@egenix.com> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> Message-ID: <4F3575FB.60700@molden.no> On 10.02.2012 19:36, M.-A. Lemburg wrote: > By the time we 100 core CPUs, we'll be measuring RAM in TB, so that > shouldn't be a problem ;-) Actually, Python is already great for those. They are called GPUs, and OpenCL is all about text processing. > In cases where the GIL does cause problems, it's usually better to > consider changing the application design and use asynchronous processing > with a single threaded design or a multi-process design where each of > the processes only uses a low number of threads (20-50 per process). The "GIL problem" is much easier to analyze than most Python developers using Linux might think: - Windows has no fork system call. SunOS used to have a very slow fork system call. The majority of Java developers worked with Windows or Sun, and learned to work with threads. For which the current summary is: - The GIL sucks because Windows has no fork. Which some might say is the equivalent of: - Windows sucks. Sturla From sturla at molden.no Fri Feb 10 20:57:31 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 10 Feb 2012 20:57:31 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <20120209104237.154be949@bhuda.mired.org> <4F341710.9030806@molden.no> Message-ID: <4F3576AB.40202@molden.no> On 10.02.2012 10:51, Serhiy Storchaka wrote: > What about os.fork()? MPI starts by spawning a group of empty processes. 
If you use these massively parallel computers, you have to play by the MPI rules. Sturla From sturla at molden.no Fri Feb 10 21:01:07 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 10 Feb 2012 21:01:07 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F341419.6030808@molden.no> Message-ID: <4F357783.3050300@molden.no> On 09.02.2012 22:05, Nick Coghlan wrote: > Have you even *tried* concurrent.futures > (http://docs.python.org/py3k/library/concurrent.futures)? Or the 2.x > backport on PyPI (http://pypi.python.org/pypi/futures)? Multiprocessing is fine, but it uses pickle for IPC, and this is inefficient. We need unpickled, type-specialized queues. Or a queue that has the interface of a binary file. Sturla From solipsis at pitrou.net Fri Feb 10 21:02:09 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 10 Feb 2012 21:02:09 +0100 Subject: [Python-ideas] multiprocessing IPC References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F341419.6030808@molden.no> <4F357783.3050300@molden.no> Message-ID: <20120210210209.277a50da@pitrou.net> On Fri, 10 Feb 2012 21:01:07 +0100 Sturla Molden wrote: > On 09.02.2012 22:05, Nick Coghlan wrote: > > > Have you even *tried* concurrent.futures > > (http://docs.python.org/py3k/library/concurrent.futures)? Or the 2.x > > backport on PyPI (http://pypi.python.org/pypi/futures)? > > Multiprocessing is fine, but is uses pickle for IPC and this is > inefficient. We need unpickled, type-specialized queues. Or a queue that > has the interface of a binary file. If you have any concrete idea for that, don't hesitate to post it on the bug tracker, or here under a separate thread (this thread is a train wreck). Regards Antoine.
From mwm at mired.org Fri Feb 10 22:15:52 2012 From: mwm at mired.org (Mike Meyer) Date: Fri, 10 Feb 2012 13:15:52 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F357783.3050300@molden.no> References: <4F340A81.60300@pearwood.info> <10381712-394F-47F9-986D-8D4A7679CC69@gmail.com> <4F341419.6030808@molden.no> <4F357783.3050300@molden.no> Message-ID: <20120210131552.487b1f9d@bhuda.mired.org> On Fri, 10 Feb 2012 21:01:07 +0100 Sturla Molden wrote: > On 09.02.2012 22:05, Nick Coghlan wrote: > > > Have you even *tried* concurrent.futures > > (http://docs.python.org/py3k/library/concurrent.futures)? Or the 2.x > > backport on PyPI (http://pypi.python.org/pypi/futures)? > > Multiprocessing is fine, but is uses pickle for IPC and this is > inefficient. We need unpickled, type-specialized queues. Or a queue that > has the interface of a binary file. In what way does the mmap module fail to provide your binary file interface? http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From mal at egenix.com Fri Feb 10 23:31:59 2012 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 10 Feb 2012 23:31:59 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3575FB.60700@molden.no> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> Message-ID: <4F359ADF.3060202@egenix.com> Sturla Molden wrote: > On 10.02.2012 19:36, M.-A. Lemburg wrote: >> In cases where the GIL does cause problems, it's usually better to >> consider changing the application design and use asynchronous processing >> with a single threaded design or a multi-process design where each of >> the processes only uses a low number of threads (20-50 per process). 
> > The "GIL problem" is much easier to analyze than most Python developers using Linux might think: > > - Windows has no fork system call. SunOS used to have a very slow fork system call. The majority of > Java developers worked with Windows or Sun, and learned to work with threads. > > For which the current summary is: > > - The GIL sucks because Windows has no fork. > > Which some might say is the equivalent of: > > - Windows sucks. I'm not sure why you think you need os.fork() in order to work with multiple processes. Spawning processes works just as well and, often enough, is all you really need to get the second variant working. The first variant doesn't need threads at all, but can not always be used since it requires all application components to play along nicely with the async approach. I forgot to mention a third variant: use a multi-process design with single threaded asynchronous processing in each process. This third variant is becoming increasingly popular, esp. if you have to handle lots and lots of individual requests with relatively low need for data sharing between the requests. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Feb 10 2012) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From jimjjewett at gmail.com Sat Feb 11 00:33:42 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 10 Feb 2012 18:33:42 -0500 Subject: [Python-ideas] Py3 unicode impositions Message-ID: On Fri, Feb 10, 2012 at 3:41 AM, Stephen J. Turnbull wrote: > Terry Reedy writes: > > > > In python 2 there was no such a strong imposition [of Unicode > > > awareness on users]. > > Nor is there in 3.x. > Sorry, Terry, but you're basically wrong here. True, if one sticks to > pure ASCII, there's no difference to notice, but that's just not > possible for people who live outside of the U.S., or who share text > with people outside of the U.S. They need currency symbols, they > have friends whose names have little dots on them. Every single > one of those is a backtrace waiting to happen. A backtrace on > f = open('text-file.txt') > for line in f: pass > is an imposition. That doesn't happen in 2.x (for the wrong reasons, > but it's very convenient 95% of the time). I may be missing something, but as best I can tell (1) That uses an implicit encoding of None. (2) encoding=None is documented as being platform-dependent. Are you saying that some (many? all?) platforms make a bad choice there? Does that only happen when sys.getdefaultencoding() != sys.getfilesystemencoding(), or when one of them gives bad information? (FWIW, on a mostly ASCII windows machine, the default is utf-8 but the filesystem encoding is mbcs, so merely being different doesn't always provoke problems.) Would it cause problems to make the default be whatever locale returns, or whatever it returns the first time open is called?
-jJ From sturla at molden.no Sat Feb 11 00:36:15 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 11 Feb 2012 00:36:15 +0100 Subject: [Python-ideas] multiprocessing IPC Message-ID: <4F35A9EF.7030309@molden.no> On 10.02.2012 22:15, Mike Meyer wrote: > In what way does the mmap module fail to provide your binary file > interface? References: <4F35A9EF.7030309@molden.no> Message-ID: On Feb 10, 2012, at 6:36 PM, Sturla Molden wrote: > On 10.02.2012 22:15, Mike Meyer wrote: >> In what way does the mmap module fail to provide your binary file interface? > The short answer is that BSD mmap creates an anonymous kernel object. When working with multiprocessing for a while, one comes to the conclusion that we really need named kernel objects. > > Here are two simple fail cases for anonymous kernel objects: > > - Process A spawns/forks process B. > - Process B creates an object, one of the attributes is a lock. > - Fail: This object cannot be communicated back to process A. B inherits from A, A does not inherit from B. > > - Process A spawns/forks a process pool. > - Process A creates an object, one of the attributes is a lock. > - Fail: This object cannot be communicated to the pool. They do not inherit new handles from A after they are started. > > All of multiprocessing's IPC classes suffer from this! > > Solution: > > Use named kernel objects for IPC, pickle the name. > > I made a shared memory array for NumPy that works like this -- implemented by memory mapping from the paging file on Windows, System V IPC on Linux. Underneath is an extension class that allocates a shared memory buffer. When pickled it encodes the kernel name, not its content, and unpickling opens the object given its name. > > There is another drawback too: > > The speed of pickle. For example, sharing NumPy arrays is not faster with shared memory, because the overhead from pickle completely dominates the time needed for IPC. That is why I want a type-specialized or a binary channel.
Making this from the named shared memory class I already have is a no-brainer. > > So that is my other objection against multiprocessing. > > That is: > > 1. Object sharing by handle inheritance fails when kernel objects must be passed back to the parent process or to a process pool. We need IPC objects that have a name in the kernel, so they can be created and shared after the fact. > > 2. IPC with multiprocessing is too slow due to pickle. We need something that does not use pickle. (E.g. shared memory, but not by means of mmap.) It might be that the pipe or socket in multiprocessing will do this (I have not looked at it carefully enough), but they still don't have > > Proof of concept: > > http://dl.dropbox.com/u/12464039/sharedmem-feb12-2009.zip > > Dependency on Cython and NumPy should probably be removed, never mind that. Important part is this: > > sharedmemory_sysv.pyx (Linux) > sharedmemory_win.pyx and ntqueryobject.c (Windows) > > Finally, I'd like to say that I think Python's standard lib should support high-performance asynchronous I/O for concurrency. That is not poll/select (on Windows it does not even work properly). Rather, I want IOCP on Windows, epoll on Linux, and kqueue on Mac. (Yes I know about twisted.) There should also be a requirement that it works with multiprocessing. E.g. if we open a process pool, the processes should be able to use the same IOCP. In other words, some highly scalable asynchronous I/O that works with multiprocessing. > > So ... As far as I am concerned, the only thing worth keeping in multiprocessing is multiprocessing.Process and multiprocessing.Pool. The rest doesn't do what we want. > > > Sturla > Sturla, I think I've talked to you before - patches to improve multiprocessing from you are definitely welcome, and needed. I disagree with tossing as much out as you are suggesting - managers are pretty useful, for example, but the entire team and especially me would welcome patches to improve things.
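For reference, the "pickle the name, not the content" design Sturla sketches eventually appeared in the standard library as multiprocessing.shared_memory (Python 3.8); a single-process illustration of attaching by name:

```python
from multiprocessing import shared_memory

# Create a named block; the name (a short string) is all another
# process would need in order to attach.
shm = shared_memory.SharedMemory(create=True, size=1024)
shm.buf[:5] = b"hello"

# Attach by name -- this could just as well happen in a different
# process that received only shm.name.
other = shared_memory.SharedMemory(name=shm.name)
payload = bytes(other.buf[:5])
print(payload)  # -> b'hello'

other.close()
shm.close()
shm.unlink()
```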
Jesse From tjreedy at udel.edu Sat Feb 11 01:07:51 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 10 Feb 2012 19:07:51 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 2/10/2012 3:41 AM, Stephen J. Turnbull wrote: > Terry Reedy writes: > >>> In python 2 there was no such a strong imposition [of Unicode >>> awareness on users]. The claim is that Python3 imposes a large burden on users that Python2 does not. >> Nor is there in 3.x. I view that claim as FUD, at least for many users, and at least until the persons making the claim demonstrate it. In particular, I claim that people who use Python2 knowing nothing of unicode do not need to know much more to do the same things in Python3. And, if someone uses Python2 with full knowledge of Unicode, that Python3 cannot impose any extra burden. Since I am claiming negatives, the burden of proof is on those who claim otherwise. > Sorry, Terry, but you're basically wrong here. This is not a nice way to start a response, especially when you go on to admit that I was right as to the user case I discussed. Here is what you clipped. >> If one only uses the ascii subset, the usage of 3.x strings is transparent. > True, if one sticks to pure ASCII, there's no difference to notice, Which is a restatement of what you clipped. In another post I detailed the *small* amount (one paragraph) that I believe such people need to know to move to Python3. I have not seen this minimum laid out before and I think it would be useful to help such people move to Python3 without FUD fear. > but that's just not possible for people who live outside of the U.S., Who *already* have to know about more than ascii to use Python2. The question is whether they have to learn *substantially* more to use Python3. > or who share text with people outside of the U.S.
> They need currency symbols, they have friends whose names > have little dots on them. OK, real-life example. My wife has colleagues in China. They interchange emails (utf-8 encoded) with project budgets and some Chinese characters. Suppose she asks me to use Python to pick out ? renminbi/yuan figures and convert to dollars. What 'strong imposition' does Python3 make to learn things I would not have to know to do the same thing in Python2? > Every single one of those is a backtrace waiting to happen. > A backtrace on > f = open('text-file.txt') > for line in f: pass I do not consider that adding an encoding argument to make the same work in Python3 to be "a strong imposition of unicode awareness". Do you? In order to do much other than pass, I believe one typically needs to know the encoding of the file, even in Python2. And of course, knowing about and using the one unicode byte encoding is *much* easier than knowing about and using the 100 or so non-unicode (or unicode subset) encodings. To me, Python3's s = open('text.txt', encoding='utf-8').read() is easier and simpler than either Python2 version below (and please pardon any errors as I never actually did this) import codecs s = codecs.open('text.txt', encoding='utf-8').read() or f = open('text.txt') s = unicode(f.read(), 'utf-8') -- Terry Jan Reedy From anacrolix at gmail.com Sat Feb 11 03:24:30 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 11 Feb 2012 10:24:30 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Threading is a tool (the most popular, and most flexible tool) for concurrency and parallelism. Compared to forking, multiprocessing, shared memory, mmap, and dozens of other auxiliary OS concepts it's also the easiest. Not all problems are clearly chunkable or fit some alternative parallelism pattern. Threading is arguably the cheapest method for parallelism, as we've heard throughout this thread.
Just because it can be dangerous is no reason to discourage it. Many alternatives are equally as dangerous, more difficult and less portable. Python is a very popular language. Someone mentioned earlier that popularity shouldn't be an argument for features but here it's fair ground. If Python 3 had unrestrained threading, this transition plunge would not be happening. People would be flocking to it for their free lunches. The lack of unrestrained single-process parallelism is the #1 reason not to choose Python for a future project. Note that certain fields use alternative parallelism like MPI, and whoopee for them, these aren't applicable to general programming. Nor is the old argument "write a C extension". Except for old stooges who can't let go of curly braces, most people agree Python is the best mainstream language, but the implementation is holding it back. The GIL has to go if CPython is to remain viable in the future for non-trivial applications. The current transition is like VB when .NET came out: everyone switched to C# rather than upgrade to VB.NET, because it was wiser to switch to the better language than to pay the high upgrade cost. Unfortunately the Python 3 ship has sailed, and presumably the GIL has to remain until 4.x at the least. Given this, it seems there is some wisdom in the current head-in-the-sand advice: It's too hard to remove the GIL so just use some other mechanism if you want parallelism, but it's misleading to suggest they're superior as described above. So with that in mind, can the following changes occur in Python 3 without breaking spec? - Replace the ref-counting with another GC? - Remove the GIL? If not, should these be relegated to Python 4 and alternate implementation discussions? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tjreedy at udel.edu Sat Feb 11 04:20:15 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 10 Feb 2012 22:20:15 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Matt: directing a threading rant at me because I posted about unicode, a completely different subject, is bizarre. I have not said a word on this thread, and hardly ever on any other thread, about threading, concurrency, and the GIL. I have no direct interest in these subjects. But since you directed this at me, I will answer. On 2/10/2012 9:24 PM, Matt Joiner wrote: ... > So with that in mind, can the following changes occur in Python 3 > without breaking spec? > > - Replace the ref-counting with another GC? > - Remove the GIL? If you had paid attention to this thread and others, you would know 1. These are implementation details not in the spec. 2. There are other implementations without these. 3. People have attempted the changes you want for CPython. But so far, both would have substantial negative impacts on many CPython users, including me. 4. You are free to try to improve on previous work. As to the starting subject of this thread: I switched to Python 1.3, just before 1.4, when Python was an obscure language in the Tiobe 20s. I thought then and still do that it was best for *me*, regardless of what others decided for themselves. So while I am pleased that its usage has risen considerably, I do not mind that it has (relatively) plateaued over the last 5 years. And I am not panicked that an up wiggle was followed by a down wiggle.
Turnbull) Date: Sat, 11 Feb 2012 12:32:20 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87vcnevyd7.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > > Sorry, Terry, but you're basically wrong here. > > This is not a nice way to start a response, especially when you go on to > admit that I was right as the the user case I discussed. Here is what > you clipped. The point is that the user case you discuss is a toy case. Of course the problem goes away if you get to define the problem away. I don't know of any nice way to say that. > In another post I detailed the *small* amount (one paragraph) that > I believe such people need to know to move to Python3. I have not > seen this minimum laid out before and I think it would be useful to > help such people move to Python3 without FUD fear. I'll go back and take a look at it. It probably is useful. But I don't think it deals with the real issue. The problem is that without substantially more knowledge than what you describe as the minimum, the fear, uncertainty, and doubt is *real*. Anybody who follows Mailman, for example, is going to hear (even today, though much less frequently than 3 years ago, and only for installations with ancient Mailman from 2006 or so) of weird Unicode errors that cause messages to be "lost". Hearing that Python 3 requires everything be decoded to Unicode is not going to give innocent confidence. There's also a lot of FUD being created out of whole cloth, as well, such as the alleged inefficiency of recoding ASCII into Unicode, etc., which doesn't matter for most applications. The problem is that the FUD based on real issues that you don't understand gives credibility to the FUD that somebody made up. > OK, real-life example. My wife has colleagues in China. They interchange > emails (utf-8 encoded) with project budgets and some Chinese characters. > Suppose she asks me to use Python to pick out ? 
renminbi/yuan figures > and convert to dollars. What 'strong imposition' does Python3 make to > learn things I would not have to know to do the same thing in > Python2? None. The FUD is not about *processing* non-ASCII. It's about non-ASCII horking your process even though you have no intention of processing it. > I do not consider that adding an encoding argument to make the same work > in Python3 to be "a strong imposition of unicode awareness". Do > you? Yes, I do. If you get it wrong, you will still get a fatal UnicodeError. > In order to do much other than pass, I believe one typically needs > to know the encoding of the file, even in Python2. The gentleman once again seems to be suffering from a misconception. Quite often you need to know nothing about the encoding of a file, except that the parts you care about are ASCII-encoded. For example, in an American programming shop git log | ./count-files-touched-per-day.py will founder on 'Óscar Fuentes' as author, unless you know what coding system is used, or know enough to use latin-1 (because it's effectively binary, not because it's the actual encoding). > And of course, knowing about and using the one unicode byte > encoding is *much* easier than knowing about and using the 100 or > so non-unicode (or unicode subset) encodings. > > To me, Python3's > > s = open('text.txt', 'utf-8').read() > > is easier and simpler than either Python2 Indeed, it is. But we're not talking about dealing with Unicode; we're talking about why somebody who really only wants to deal with ASCII needs to know more about Unicode in Python 3 than in Python 2. > (and please pardon any errors as I never actually did this) > > import codecs > s = codecs.open('text.txt', 'utf-8').read() > > or > > f = open('text.txt') > s = unicode(f.read, 'utf-8') The reason why Unicode is part of the FUD is that in Python 2 you never needed to do that, unless you wanted to deal with a non-English language.
With Python 3 you need to deal with the codec, always, or risk a UnicodeError simply because some Spaniard's name gets mentioned by somebody who cares about orthography. From anacrolix at gmail.com Sat Feb 11 04:40:39 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 11 Feb 2012 11:40:39 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I'm asking if it'd actually be accepted in 3. I know well, and have seen how quickly things are blocked and rejected in core (dabeaz and shannon's attempts come to mind). I'm well familiar with previous attempts. As an example consider that replacing ref counting would probably change the API, but is a prerequisite for performant removal of the GIL. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat Feb 11 05:12:13 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 11 Feb 2012 13:12:13 +0900 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: Message-ID: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> Jim Jewett writes: > Are you saying that some (many? all?) platforms make a bad choice there? No. I'm saying that whatever choice is made (except for 'latin-1' because it accepts all bytes regardless of the actual encoding of the data, or PEP 383 "errors='surrogateescape'" for the same reason, both of which are unacceptable defaults for production code *for the same reason*), there is data that will cause that idiom to fail on Python 3 where it would not on Python 2. This is especially the case if you work with older text data on Mac or modern Linux where UTF-8 is used, because you're almost certain to run into Latin-1-encoded files. My favorite example is ChangeLogs, which broke my Gentoo package manager when I experimented with using Python 3 as the default Python. 
Most packages would work fine, but for some reason some Python program in the PMS was actually reading the ChangeLogs, and sometimes they'd be impure ASCII (I don't recall whether it was utf-8 or latin-1), giving a fatal UnicodeError and everything grinds to a halt. That is reason enough for the naive to embrace fear, uncertainty, and doubt about Python 3's use of Unicode. The fact is that with a little bit of knowledge, you can almost certainly get more reliable (and in case of failure, more debuggable) results from Python 3 than from Python 2. But people are happy to deal with the devil they know, even though it's more noxious than the devil they don't. Counteracting FUD with words generally doesn't work IME, unless the words are a "magic spell" that reduces the unknown to the known. From cmjohnson.mailinglist at gmail.com Sat Feb 11 06:04:07 2012 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Fri, 10 Feb 2012 19:04:07 -1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <87vcnevyd7.fsf@uwakimon.sk.tsukuba.ac.jp> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <87vcnevyd7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <50BA6538-76D0-4B1B-8C2A-6DBEB9B1B94B@gmail.com> On Feb 10, 2012, at 5:32 PM, Stephen J. Turnbull wrote: > will founder on '?scar Fuentes' as author, unless you know what coding > system is used, or know enough to use latin-1 (because it's > effectively binary, not because it's the actual encoding). Or just use errors="surrogateescape". I think we should tell people who are scared of unicode and refuse to learn how to use it to just add an errors="surrogateescape" keyword to their file open arguments. Obviously, it's the wrong thing to do, but it's wrong in the same way that Python 2 bytes are wrong, so if you're absolutely committed to remaining ignorant of encodings, you can continue to do that. 
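[A minimal sketch of the errors="surrogateescape" idiom Carl describes; the file name and contents are invented for the demo:]

```python
import os
import tempfile

# A changelog line whose author name is Latin-1 encoded, i.e. not
# valid UTF-8 (file name and contents are invented for the demo).
raw = b'* 2012-02-11  \xd3scar Fuentes\n'
path = os.path.join(tempfile.mkdtemp(), 'ChangeLog')
with open(path, 'wb') as f:
    f.write(raw)

# Strict UTF-8 would raise UnicodeDecodeError on the \xd3 byte;
# surrogateescape smuggles it through as a lone surrogate instead.
with open(path, encoding='utf-8', errors='surrogateescape') as f:
    line = f.read()

print(line.startswith('*'))  # the ASCII parts remain usable: True

# Encoding back with the same handler restores the original bytes.
assert line.encode('utf-8', 'surrogateescape') == raw
```

[As Carl says, this is "wrong" in the same way Python 2 bytes are wrong: the undecodable bytes survive untouched, but any attempt to print or re-encode them strictly will still blow up.]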
From ncoghlan at gmail.com Sat Feb 11 07:15:52 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 11 Feb 2012 16:15:52 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Feb 11, 2012 at 1:40 PM, Matt Joiner wrote: > I'm asking if it'd actually be accepted in 3. Why is that relevant? If free threading is the all-singing all dancing wonderment you believe: 1. Fork CPython 2. Make it free-threaded (while retaining backwards compatibility with all the C extensions out there!) 3. Watch the developer hordes flock to your door (after all, it's the lack of free-threading that has held Python's growth back for the last two decades, so everyone will switch in a heartbeat the second you, or anyone else, publishes a free-threaded alternative where all their C extensions work. Right?). > I know well, and have seen how > quickly things are blocked and rejected in core (dabeaz and shannon's > attempts come to mind). I'm well familiar with previous attempts. If that's what you think happened, then no, you're not familiar with them at all. python-dev has just a few simple rules for accepting a free-threading patch: 1. All current third party C extension modules must continue to work (ask the folks working on Ironclad for IronPython and cpyext for PyPy how much fun *that* requirement is) 2. Calls to builtin functions and methods should remain atomic (the Jython experience can help a lot here) 3. The performance impact on single threaded scripts must be minimal (which basically means eliding all the locks in single-threaded mode the way CPython currently does with the GIL, but then creating those elided locks in the correct state when Python's threading support gets initialised) That's it, that's basically all the criteria we have for accepting a free-threading patch. 
However, while most people are quite happy to say "Hey, someone should make CPython free-threaded!", they're suddenly far less interested when faced with the task of implementing it *themselves* while preserving backwards compatibility (and if you think the Python 2 -> Python 3 transition is rough going, you are definitely *not* prepared for the enormity of the task of trying to move the entire C extension ecosystem away from the refcounting APIs. The refcounting C API compatibility requirement is *not* optional if you want a free-threaded CPython to be the least bit interesting in real world terms). When we can't even get enough volunteers willing to contribute back their fixes and workarounds for the known flaws in multiprocessing, do people think there is some magical concurrency fairy that will sprinkle free threading pixie dust over CPython and the GIL will be gone? Removing the GIL *won't* be fun. Just like fixing multiprocessing, or making improvements to the GIL itself, it will be a long, tedious grind dealing with subtleties of the locking and threading implementations on Windows, Linux, Mac OS X, *BSD, Solaris and various other platforms where CPython is supported (or at least runs). For extra fun, try to avoid breaking the world for CPython variants on platforms that don't even *have* threading (e.g. PyMite). And your reward for all that effort? A CPython with better support for what is arguably one of the *worst* approaches to concurrency that computer science has ever invented. If a fraction of the energy that people put into asking for free threading was instead put into asking "how can we make inter-process communication better?", we'd no doubt have a good shared object implementation in the mmap module by now (and someone might have actually responded to Jesse's request for a new multiprocessing maintainer when circumstances forced him to step down). 
But no, this is the internet: it's much easier to incessantly berate python-dev for pointing out that free threading would be extraordinarily hard to implement correctly and isn't the panacea that many folks seem to think it is than it is to go *do* something that's more likely to offer a good return on the time investment required. My own personal wishlist for Python's concurrency support? * I want to see mmap2 up on PyPI, with someone working on fast shared object IPC that can then be incorporated into the stdlib's mmap module * I want to see multiprocessing2 on PyPI, with someone working on the long list of multiprocessing bugs on the python.org bug tracker (including adding support for Windows-style, non-fork based child processes on POSIX platforms) * I want to see progress on PEP 3153, so that some day we can have a "Python event loop" instead of a variety of framework specific event loops, as well as solid cross-platform async IO support in the stdlib. As Jesse said earlier, asking for free threading in CPython is like asking for free pie. Sure, free pie would be nice, but who's going to bake it? And what else could those people be doing with their time if they weren't busy baking pie? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Sat Feb 11 09:11:36 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 11 Feb 2012 09:11:36 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Matt Joiner, 11.02.2012 03:24: > Threading is a tool (the most popular, and most flexible tool) for > concurrency and parallelism. Compared to forking, multiprocessing, shared > memory, mmap, and dozens of other auxiliary OS concepts it's also the > easiest. Sure, "easy" as in "nothing is easier to get wrong". You did read my post on this matter, right?
I've yet to see a piece of non-trivially parallel code that uses threading and is known to be completely safe under all circumstances. And I've seen a lot. > Not all problems are clearly chunkable or fit some alternative > parallelism pattern. Threading is arguably the cheapest method for > parallelism, as we've heard throughout this thread. Wrong again. Threading can be pretty expensive in terms of unexpected data dependencies, and it certainly is in terms of debugging time. Debugging spurious threading issues is amongst the hardest problems for a programmer. > Just because it can be dangerous is no reason to discourage it. Many > alternatives are equally as dangerous, more difficult and less portable. Seriously - how is running separate processes less portable than threading? > Python is a very popular language. Someone mentioned earlier that popularity > shouldn't be an argument for features but here it's fair ground. If Python > 3 had unrestrained threading Note that this is not a "Python 2 vs. Python 3" issue. In fact, it has nothing to do with Python 3 in particular. [stripped some additional garbage] Stefan
Threading is the most common due to Windows issues (historically, Unix parallelism used multiple processes and the switch happened with the advent of multiplatform tools, which focused on threading due to Windows's poor performances and high overhead with processes), and it is also the easiest tool *to start using*, because you just say "start a thread". Which is equivalent to saying grenades are the easiest tool to handle conversations because you just pull the pin. Threads are by far the hardest concurrency tool to use because they throw out the window all determinism in the whole program, and that determinism then needs to be reclaimed through (very) careful analysis and the use of locks or other such sub-tools. And the flexibility claim is complete nonsense. Oh, and so are your comparisons, "shared memory" and "mmap" are not comparable to threading since they *are used* by and in threading. And forking and multiprocessing are the same thing, only the initialization call changes. Finally, multiprocessing has a far better upgrade path (as e.g. Erlang demonstrates): if your non-deterministic points are well delineated and your interfaces to other concurrent execution points are well defined, scaling from multiple cores to multiple machines becomes possible. > Not all problems are clearly chunkable or fit some alternative > parallelism pattern. Threading is arguably the cheapest method for > parallelism, as we've heard throughout this thread. > > Just because it can be dangerous is no reason to discourage it. Of course it is, just as manual memory management is "discouraged". > Many alternatives are equally as dangerous, more difficult and less portable. The main alternative to threading is multiprocessing (via fork or via starting new processes does not matter), it is significantly less dangerous, it is only more difficult in that you can't take extremely dangerous shortcuts and it is just as portable (if not more). 
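[The multiprocessing alternative described above can be sketched with the stdlib Pool; the work function is invented and CPU-bound only as a toy:]

```python
from multiprocessing import Pool

def work(n):
    # CPU-bound toy task (invented); each call runs in a separate worker
    # process, so no single interpreter's GIL serializes the whole job.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    pool = Pool(processes=4)  # roughly one worker per core
    try:
        print(pool.map(work, [10, 100, 1000]))  # [285, 328350, 332833500]
    finally:
        pool.close()
        pool.join()
```

[The interface between the concurrent parts is exactly the well-delineated kind Masklinn describes: picklable arguments in, picklable results out, no shared mutable state.]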
> Python is a very popular language. Someone mentioned earlier that popularity > shouldn't be an argument for features but here it's fair ground. If Python > 3 had unrestrained threading, this transition plunge would not be > happening. Threading is a red herring, nobody fundamentally cares about threading, what users want is a way to exploit their cores. If `multiprocessing` was rock-solid and easier to use `threading` could just be killed and nobody would care. And we'd probably find ourselves in a far better world. From p.f.moore at gmail.com Sat Feb 11 11:40:20 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 11 Feb 2012 10:40:20 +0000 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 February 2012 04:12, Stephen J. Turnbull wrote: > This is especially the case if you work with older text data on Mac or > modern Linux where UTF-8 is used, because you're almost certain to run > into Latin-1-encoded files. My favorite example is ChangeLogs, which > broke my Gentoo package manager when I experimented with using Python > 3 as the default Python. Most packages would work fine, but for some > reason some Python program in the PMS was actually reading the > ChangeLogs, and sometimes they'd be impure ASCII (I don't recall > whether it was utf-8 or latin-1), giving a fatal UnicodeError and > everything grinds to a halt. > > That is reason enough for the naive to embrace fear, uncertainty, and > doubt about Python 3's use of Unicode. My concern about Unicode in Python 3 is that the principle is, you specify the right encoding. But often, I don't *know* the encoding ;-( Text files, like changelogs as a good example, generally have no marker specifying the encoding, and they can have all sorts (depending on where the package came from).
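[One practical answer for a mostly-ASCII file of unknown encoding — decoding as ISO-8859-1, a trick that also comes up later in the thread — can be sketched like this; the file contents are invented for the demo:]

```python
import os
import tempfile

# A mostly-ASCII changelog in an unknown encoding (contents invented):
raw = b'* fixed crash\nauthor: \xd3scar\n* updated docs\n'
path = os.path.join(tempfile.mkdtemp(), 'ChangeLog')
with open(path, 'wb') as f:
    f.write(raw)

# ISO-8859-1 maps every byte 0x00-0xFF to a code point, so decoding can
# never fail; ASCII stays ASCII and other bytes round-trip untouched.
with open(path, encoding='iso-8859-1') as f:
    stars = [line for line in f if line.startswith('*')]

print(stars)  # ['* fixed crash\n', '* updated docs\n']
```

[The decoded non-ASCII characters may be mojibake if the real encoding wasn't Latin-1, but the ASCII parts are processed correctly and re-encoding as ISO-8859-1 reproduces the original bytes exactly.]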
Worse, I am on Windows and changelogs usually come from Unix developers - so I'm not familiar with the common conventions ("well, of course it's in UTF-8, that's what everyone uses"...) In Python 2, I can ignore the issue. Sure, I can end up with mojibake, but for my uses, that's not a disaster. Mostly-readable works. But in Python 3, I get an error and can't process the file. I can just use latin-1, or surrogateescape. But that doesn't come naturally to me yet. Maybe it will in time... Or maybe there's a better solution I don't know about yet. To be clear - I am fully in favour of the Python 3 approach, and I completely support the idea that people should know the encodings of the stuff they are working with (I've seen others naively make encoding mistakes often enough to know that when it matters, it really does matter). But having to worry, not so much about the encoding to use, but rather about the fact that Python is asking you a question you can't answer, is a genuine stumbling block. And from what I've seen, it's at the root of the problems many people have with Unicode in Python 3. I'm not arguing for changes to the default behaviour of Python 3. But if we had a good place to put it, a FAQ entry about "what to do if I need to process a file whose encoding I don't know" would be useful. And certainly having a standard answer that people could give when the question comes up (something practical, not a purist answer like "all files have an encoding, so you should find out") would help. Paul. From p.f.moore at gmail.com Sat Feb 11 11:47:44 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 11 Feb 2012 10:47:44 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 February 2012 00:07, Terry Reedy wrote: >>> Nor is there in 3.x. > I view that claim as FUD, at least for many users, and at least until the > persons making the claim demonstrate it.
In particular, I claim that people > who use Python2 knowing nothing of unicode do not need to know much more to > do the same things in Python3. Concrete example, then. I have a text file, in an unknown encoding (yes, it does happen to me!) but opening in an editor shows it's mainly-ASCII. I want to find all the lines starting with a '*'. The simple with open('myfile.txt') as f: for line in f: if line.startswith('*'): print(line) fails with encoding errors. What do I do? Short answer, grumble and go and use grep (or in more complex cases, awk) :-( Paul. From ubershmekel at gmail.com Sat Feb 11 11:51:52 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sat, 11 Feb 2012 12:51:52 +0200 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Feb 11, 2012 12:41 PM, "Paul Moore" wrote > if we had a good place to put it, a FAQ entry about "what to do if I > need to process a file whose encoding I don't know" would be useful. > And certainly having a standard answer that people could give when the > question comes up (something practical, not a purist answer like "all > files have an encoding, so you should find out") would help. I think if the bytes type behaved exactly like python2's string it would have been the best option. When you work with "wb" or "rb" you get quite a hint that you're doing it wrong. But devs would have a viable ambiguous *string* type (vs bytes and their integer cells). Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Sat Feb 11 13:33:44 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 11 Feb 2012 13:33:44 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Paul Moore, 11.02.2012 11:47: > On 11 February 2012 00:07, Terry Reedy wrote: >>>> Nor is there in 3.x. 
>> >> I view that claim as FUD, at least for many users, and at least until the >> persons making the claim demonstrate it. In particular, I claim that people >> who use Python2 knowing nothing of unicode do not need to know much more to >> do the same things in Python3. > > Concrete example, then. > > I have a text file, in an unknown encoding (yes, it does happen to > me!) but opening in an editor shows it's mainly-ASCII. I want to find > all the lines starting with a '*'. The simple > > with open('myfile.txt') as f: > for line in f: > if line.startswith('*'): > print(line) > > fails with encoding errors. What do I do? Short answer, grumble and go > and use grep (or in more complex cases, awk) :-( Or just use the ISO-8859-1 encoding. Stefan From masklinn at masklinn.net Sat Feb 11 13:41:19 2012 From: masklinn at masklinn.net (Masklinn) Date: Sat, 11 Feb 2012 13:41:19 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> On 2012-02-11, at 13:33 , Stefan Behnel wrote: > Paul Moore, 11.02.2012 11:47: >> On 11 February 2012 00:07, Terry Reedy wrote: >>>>> Nor is there in 3.x. >>> >>> I view that claim as FUD, at least for many users, and at least until the >>> persons making the claim demonstrate it. In particular, I claim that people >>> who use Python2 knowing nothing of unicode do not need to know much more to >>> do the same things in Python3. >> >> Concrete example, then. >> >> I have a text file, in an unknown encoding (yes, it does happen to >> me!) but opening in an editor shows it's mainly-ASCII. I want to find >> all the lines starting with a '*'. The simple >> >> with open('myfile.txt') as f: >> for line in f: >> if line.startswith('*'): >> print(line) >> >> fails with encoding errors. What do I do? Short answer, grumble and go >> and use grep (or in more complex cases, awk) :-( > > Or just use the ISO-8859-1 encoding. 
It's true that it requires handling encodings upfront where Python 2 allowed you to play fast-and-loose though. And using latin-1 in that context looks and feels weird/icky, the file is not encoded using latin-1, the encoding just happens to work to manipulate bytes as ascii text + non-ascii stuff. From stefan_ml at behnel.de Sat Feb 11 13:53:40 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 11 Feb 2012 13:53:40 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> Message-ID: Masklinn, 11.02.2012 13:41: > On 2012-02-11, at 13:33 , Stefan Behnel wrote: >> Paul Moore, 11.02.2012 11:47: >>> On 11 February 2012 00:07, Terry Reedy wrote: >>>>>> Nor is there in 3.x. >>>> >>>> I view that claim as FUD, at least for many users, and at least until the >>>> persons making the claim demonstrate it. In particular, I claim that people >>>> who use Python2 knowing nothing of unicode do not need to know much more to >>>> do the same things in Python3. >>> >>> Concrete example, then. >>> >>> I have a text file, in an unknown encoding (yes, it does happen to >>> me!) but opening in an editor shows it's mainly-ASCII. I want to find >>> all the lines starting with a '*'. The simple >>> >>> with open('myfile.txt') as f: >>> for line in f: >>> if line.startswith('*'): >>> print(line) >>> >>> fails with encoding errors. What do I do? Short answer, grumble and go >>> and use grep (or in more complex cases, awk) :-( >> >> Or just use the ISO-8859-1 encoding. > > It's true that it requires handling encodings upfront where Python 2 allowed you to play fast-and-loose though. Well, except for the cases where that didn't work. Remember that implicit encoding behaves in a platform dependent way in Python 2, so even if your code runs on your machine, that doesn't mean it will work for anyone else. > And using latin-1 in that context looks and feels weird/icky, the file is not > encoded using latin-1, the encoding just happens to work to manipulate bytes as > ascii text + non-ascii stuff. Correct. That's precisely the use case described above. Besides, it's perfectly possible to process bytes in Python 3. You just have to open the file in binary mode and do the processing at the byte string level. But if you don't care (and if most of the data is really ASCII-ish), using the ISO-8859-1 encoding in and out will work just fine for problems like the above. Stefan From sturla at molden.no Sat Feb 11 14:18:50 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 11 Feb 2012 14:18:50 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F366ABA.8090903@molden.no> On 11.02.2012 03:24, Matt Joiner wrote: > Threading is a tool (the most popular, and most flexible tool) for > concurrency and parallelism. Compared to forking, multiprocessing, shared > memory, mmap, and dozens of other auxiliary OS concepts it's also the > easiest. I see you really know your stuff. > Not all problems are clearly chunkable or fit some alternative > parallelism pattern. Then they don't fit threading either. Sturla From sturla at molden.no Sat Feb 11 15:01:47 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 11 Feb 2012 15:01:47 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F3674CB.6060201@molden.no> On 11.02.2012 07:15, Nick Coghlan wrote: > Why is that relevant? If free threading is the all-singing all dancing > wonderment you believe: > > 1. Fork CPython > 2. Make it free-threaded (while retaining backwards compatibility with > all the C extensions out there!) > 3.
Watch the developer hordes flock to your door (after all, it's the > lack of free-threading that has held Python's growth back for the last > two decades, so everyone will switch in a heartbeat the second you, or > anyone else, publishes a free-threaded alternative where all their C > extensions work. Right?). There are several solutions to this, I think. One is to use one interpreter per thread, and share no data between them, similar to tcl and PerlFork. The drawback is developers who forget to duplicate file handles, so one interpreter can close a handle used by another. Another solution is transactional memory. Consider a database with commit and rollback. Not sure how to fit this with C extensions though, but one could in theory build a multithreaded interpreter like that. > If a fraction of the energy that people put into asking for free > threading was instead put into asking "how can we make inter-process > communication better?", we'd no doubt have a good shared object > implementation in the mmap module by now (and someone might have > actually responded to Jesse's request for a new multiprocessing > maintainer when circumstances forced him to step down). I think I already explained why BSD mmap is a dead end. We need named kernel objects (System V IPC or Unix domain sockets) as their names can be communicated between processes. There are also reasons to prefer SysV message queues over shared memory (Sys V or BSD), such as thread safety, i.e. access is synchronized by the kernel. SysV message queues also have atomic read/write, unlike sockets, and they are generally faster than pipes. With sockets we have to ensure that the correct number of bytes were read or written, which is a PITA for any IPC use (or any other messaging for that matter). In the meantime, take a look at ZeroMQ (actually written ØMQ). ZeroMQ also has atomic read/write messages.
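To make the "correct number of bytes" point concrete: a stream socket's recv() may legitimately return fewer bytes than requested, so message boundaries have to be rebuilt by hand. A minimal length-prefix framing sketch (the helper names are illustrative, not any proposed stdlib API):

```python
import socket
import struct

def send_msg(sock, payload):
    # Prefix each message with a 4-byte big-endian length, then push it all out.
    sock.sendall(struct.pack('>I', len(payload)) + payload)

def recv_exact(sock, n):
    # recv() may return short reads; loop until exactly n bytes have arrived.
    chunks = []
    while n:
        chunk = sock.recv(n)
        if not chunk:
            raise EOFError('socket closed mid-message')
        chunks.append(chunk)
        n -= len(chunk)
    return b''.join(chunks)

def recv_msg(sock):
    (length,) = struct.unpack('>I', recv_exact(sock, 4))
    return recv_exact(sock, length)

a, b = socket.socketpair()
send_msg(a, b'one complete message, however recv() chunks it')
print(recv_msg(b))
```

This is exactly the bookkeeping that message-oriented transports such as SysV queues or ØMQ do for you.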
Sturla From p.f.moore at gmail.com Sat Feb 11 15:29:34 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 11 Feb 2012 14:29:34 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> Message-ID: On 11 February 2012 12:41, Masklinn wrote: >>> with open('myfile.txt') as f: >>> for line in f: >>> if line.startswith('*'): >>> print(line) >>> >>> fails with encoding errors. What do I do? Short answer, grumble and go >>> and use grep (or in more complex cases, awk) :-( >> >> Or just use the ISO-8859-1 encoding. > > It's true that requires to handle encodings upfront where Python 2 allowed you > to play fast-and-lose though. > > And using latin-1 in that context looks and feels weird/icky, the file is not > encoded using latin-1, the encoding just happens to work to manipulate bytes as > ascii text + non-ascii stuff. To be honest, I'm fine with the answer "use latin1" for this case. Practicality beats purity and all that. But as you say, it feels wrong somehow. I suspect that errors=surrogateescape is the better "I don't really care" option. And I still maintain it would be useful for combating FUD if there was a commonly-accepted idiom for this. Interestingly, on my Windows PC, if I open a file using no encoding in Python 3, I seem to get code page 1252: Python 3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> f = open("unicode.txt") >>> f.encoding 'cp1252' >>> So actually, on this PC, I can't really provoke these sorts of decoding error problems (CP1252 accepts all bytes, it's basically latin1).
Whether this is a good thing or a bad thing, I'm not sure :-) Paul From sturla at molden.no Sat Feb 11 16:10:03 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 11 Feb 2012 16:10:03 +0100 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: <4F35A9EF.7030309@molden.no> References: <4F35A9EF.7030309@molden.no> Message-ID: <4F3684CB.3090502@molden.no> Den 11.02.2012 00:36, skrev Sturla Molden: > > > Proof of concept: > > http://dl.dropbox.com/u/12464039/sharedmem-feb12-2009.zip > Sorry, wrong version. Use this instead: http://dl.dropbox.com/u/12464039/sharedmem.zip Sturla From solipsis at pitrou.net Sat Feb 11 16:27:08 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 11 Feb 2012 16:27:08 +0100 Subject: [Python-ideas] multiprocessing IPC References: <4F35A9EF.7030309@molden.no> Message-ID: <20120211162708.03111da7@pitrou.net> On Sat, 11 Feb 2012 00:36:15 +0100 Sturla Molden wrote: > > Finally, I'd like to say that I think Python's standard lib should > support high-performance asynchronous I/O for concurrency. That is not > poll/select (on Windows it does not even work properly). Rather, I want > IOCP on Windows, epoll on Linux, and kqueue on Mac. (Yes I know about > twisted.) This is not trivial (especially the IOCP part, if I consider the amount of code Twisted has for that). > There should also be a requirement that it works with > multiprocessing. E.g. if we open a process pool, the processes should be > able to use the same IOCP. In other words some highly scalable > asynchronous I/O that works with multiprocessing. Ouch. Regards Antoine. 
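Paul's cp1252 observation earlier in the thread is worth making concrete: open() without an encoding= argument consults locale.getpreferredencoding(), so the same script can behave differently from machine to machine, and passing an explicit encoding removes that platform dependence. A small sketch (the file name and contents are invented for illustration):

```python
import locale
import os
import tempfile

# The encoding open() falls back on when none is given -- 'cp1252' on many
# Windows boxes, usually 'UTF-8' elsewhere.
print(locale.getpreferredencoding(False))

# With an explicit encoding, the behaviour is the same on every platform.
path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with open(path, 'w', encoding='latin-1') as f:
    f.write('* starred line\nother line \xe9\n')

with open(path, encoding='latin-1') as f:
    stars = [line for line in f if line.startswith('*')]
print(stars)  # ['* starred line\n']
```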
From masklinn at masklinn.net Sat Feb 11 17:18:34 2012 From: masklinn at masklinn.net (Masklinn) Date: Sat, 11 Feb 2012 17:18:34 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> Message-ID: <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> On 2012-02-11, at 13:53 , Stefan Behnel wrote: > Well, except for the cases where that didn't work. Remember that implicit > encoding behaves in a platform dependent way in Python 2, so even if your > code runs on your machine doesn't mean it will work for anyone else. Sure, I said it allowed you, not that this allowance actually worked. >> And using latin-1 in that context looks and feels weird/icky, the file is not >> encoded using latin-1, the encoding just happens to work to manipulate bytes as >> ascii text + non-ascii stuff. > > Correct. That's precisely the use case described above. Yes, but now instead of just ignoring that stuff you have to actively and knowingly lie to Python to get it to shut up. > Besides, it's perfectly possible to process bytes in Python 3. You just > have to open the file in binary mode and do the processing at the byte > string level. I think that's the route which should be taken, but (and I'll readily admit not to have followed the current state of this story) I'd understood manipulations of bytes-as-ascii-characters-and-stuff to be far more annoying (in Python 3) than string manipulation even for simple use cases. From tjreedy at udel.edu Sat Feb 11 17:24:38 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 11 Feb 2012 11:24:38 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <87vcnevyd7.fsf@uwakimon.sk.tsukuba.ac.jp> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <87vcnevyd7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 2/10/2012 10:32 PM, Stephen J. 
Turnbull wrote: The issue is whether Python 3 has a "strong imposition of Unicode awareness" that Python 2 does not. If the OP only meant awareness of the fact that something called 'unicode' exists, then I suppose that could be argued. I interpreted the claim as being about some substantive knowledge of unicode. In any case, the claim that I disagree with is not about people's reactions to Python 3 or about human psychology and the propensity to stick with the known. In response to Jim Jewett, you wrote > The fact is that with a little bit of knowledge, you can almost > certainly get more reliable (and in case of failure, more debuggable) > results from Python 3 than from Python 2. That is pretty much my counterclaim, with the note that the 'little bit of knowledge' is mostly about non-unicode encodings and the change to some Python details. > The point is that the user case you discuss is a toy case. Thanks for dismissing me and perhaps a hundred thousand users as 'toy cases'. > the problem goes away if you get to define the problem away. Doing case analysis, starting with the easiest cases, is not defining the problem away. It is, rather, an attempt to find the 'little bit of knowledge' needed in various cases. In your response, you went on to write > Counteracting FUD with words generally doesn't work > unless the words are a "magic spell" that reduces the unknown to > the known. Exactly, and finding the Python 3 version of the magic spells needed in various cases, so they can be documented and publicized, is what I have been trying to do. For ascii-only use, the magic spell is 'ascii' in bytes() calls. For some other uses, it is 'encoding=latin-1' in open(), str(), and bytes() calls, and perhaps elsewhere. Neither of these constitutes substantial 'unicode awareness'. > I don't know of any nice way to say that. There was no need to say it.
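The latin-1 spell works because Latin-1 maps each of the 256 byte values to the Unicode code point with the same number, so decoding can never fail and re-encoding restores the original bytes exactly. A quick demonstration (the sample bytes are invented):

```python
# Mostly-ASCII data containing some bytes that are not even valid UTF-8.
raw = b'* first entry \xe9\xff\nindented detail\n* second entry\n'

text = raw.decode('latin-1')          # cannot fail: one byte -> one code point
assert text.encode('latin-1') == raw  # and the round-trip is lossless

# Ordinary string processing then works on the ASCII parts unchanged.
stars = [line for line in text.splitlines() if line.startswith('*')]
print(len(stars))  # 2
```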
-- Terry Jan Reedy From tjreedy at udel.edu Sat Feb 11 17:44:59 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 11 Feb 2012 11:44:59 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 2/11/2012 5:47 AM, Paul Moore wrote: > On 11 February 2012 00:07, Terry Reedy wrote: >>>> Nor is there in 3.x. >> >> I view that claim as FUD, at least for many users, and at least until the >> persons making the claim demonstrate it. In particular, I claim that people >> who use Python2 knowing nothing of unicode do not need to know much more to >> do the same things in Python3. > > Concrete example, then. > > I have a text file, in an unknown encoding (yes, it does happen to > me!) but opening in an editor shows it's mainly-ASCII. I want to find > all the lines starting with a '*'. The simple > > with open('myfile.txt') as f: > for line in f: > if line.startswith('*'): > print(line) > > fails with encoding errors. What do I do? Good example. I believe adding ", encoding='latin-1'" to open() is sufficient. (And from your response elsewhere to Stephen, you seem to know that.) This should be in the tutorial if not already. But in reference to what I wrote above, knowing that magic phrase is not 'knowledge of unicode'. And I include it in the 'not much more knowledge' needed for Python 3. -- Terry Jan Reedy From masklinn at masklinn.net Sat Feb 11 18:00:17 2012 From: masklinn at masklinn.net (Masklinn) Date: Sat, 11 Feb 2012 18:00:17 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 2012-02-11, at 17:44 , Terry Reedy wrote: > On 2/11/2012 5:47 AM, Paul Moore wrote: >> On 11 February 2012 00:07, Terry Reedy wrote: >>>>> Nor is there in 3.x. >>> >>> I view that claim as FUD, at least for many users, and at least until the >>> persons making the claim demonstrate it. 
In particular, I claim that people >>> who use Python2 knowing nothing of unicode do not need to know much more to >>> do the same things in Python3. >> >> Concrete example, then. >> >> I have a text file, in an unknown encoding (yes, it does happen to >> me!) but opening in an editor shows it's mainly-ASCII. I want to find >> all the lines starting with a '*'. The simple >> >> with open('myfile.txt') as f: >> for line in f: >> if line.startswith('*'): >> print(line) >> >> fails with encoding errors. What do I do? > > Good example. I believe adding ", encoding='latin-1'" to open() is sufficient. Why not open the file in binary mode in stead? (and replace `'*'` by `b'*'` in the startswith call) From tjreedy at udel.edu Sat Feb 11 18:25:49 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 11 Feb 2012 12:25:49 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> Message-ID: On 2/11/2012 7:53 AM, Stefan Behnel wrote: > Masklinn, 11.02.2012 13:41: >> On 2012-02-11, at 13:33 , Stefan Behnel wrote: >>> Paul Moore, 11.02.2012 11:47: >>>> On 11 February 2012 00:07, Terry Reedy wrote: >>>>>>> Nor is there in 3.x. >>>>> >>>>> I view that claim as FUD, at least for many users, and at least until the >>>>> persons making the claim demonstrate it. In particular, I claim that people >>>>> who use Python2 knowing nothing of unicode do not need to know much more to >>>>> do the same things in Python3. >>>> >>>> Concrete example, then. >>>> >>>> I have a text file, in an unknown encoding (yes, it does happen to >>>> me!) but opening in an editor shows it's mainly-ASCII. I want to find >>>> all the lines starting with a '*'. The simple >>>> >>>> with open('myfile.txt') as f: >>>> for line in f: >>>> if line.startswith('*'): >>>> print(line) >>>> >>>> fails with encoding errors. What do I do? 
Short answer, grumble and go >>>> and use grep (or in more complex cases, awk) :-( >>> >>> Or just use the ISO-8859-1 encoding. >> >> It's true that requires to handle encodings upfront where Python 2 allowed you >> to play fast-and-lose though. > > Well, except for the cases where that didn't work. Remember that implicit > encoding behaves in a platform dependent way in Python 2, so even if your > code runs on your machine doesn't mean it will work for anyone else. > > >> And using latin-1 in that context looks and feels weird/icky, the file is not >> encoded using latin-1, the encoding just happens to work to manipulate bytes as >> ascii text + non-ascii stuff. > > Correct. That's precisely the use case described above. > > Besides, it's perfectly possible to process bytes in Python 3. You just > have to open the file in binary mode and do the processing at the byte > string level. But if you don't care (and if most of the data is really > ASCII-ish), using the ISO-8859-1 encoding in and out will work just fine > for problems like the above. If one has ascii text + unspecified 'other stuff', one can either process as 'polluted text' or as 'bytes with some ascii character codes'. Since (as I just found out) one can iterate binary mode files by line just as with text mode, I am not sure what the tradeoffs are. I would guess it is mostly whether one wants to process a sequence of characters or a sequence of character codes (ints). -- Terry Jan Reedy From tjreedy at udel.edu Sat Feb 11 18:43:41 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 11 Feb 2012 12:43:41 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 2/11/2012 12:00 PM, Masklinn wrote: > > On 2012-02-11, at 17:44 , Terry Reedy wrote: > >> On 2/11/2012 5:47 AM, Paul Moore wrote: >>> I have a text file, in an unknown encoding (yes, it does happen >>> to me!) but opening in an editor shows it's mainly-ASCII. 
I want >>> to find all the lines starting with a '*'. The simple >>> >>> with open('myfile.txt') as f: >>> for line in f: >>> if line.startswith('*'): print(line) >>> >>> fails with encoding errors. What do I do? >> >> Good example. I believe adding ", encoding='latin-1'" to open() is >> sufficient. > > Why not open the file in binary mode in stead? (and replace `'*'` by > `b'*'` in the startswith call) When I wrote that response, I thought that 'for line in f' would not work for binary-mode files. I then opened IDLE, experimented with 'rb', and discovered otherwise. So the remaining issue is how one wants the unknown encoding bytes to appear when printed -- as hex escapes, or as arbitrary but more readable non-ascii latin-1 chars. -- Terry Jan Reedy From stefan_ml at behnel.de Sat Feb 11 20:35:28 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 11 Feb 2012 20:35:28 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> Message-ID: Masklinn, 11.02.2012 17:18: > On 2012-02-11, at 13:53 , Stefan Behnel wrote: >> Well, except for the cases where that didn't work. Remember that implicit >> encoding behaves in a platform dependent way in Python 2, so even if your >> code runs on your machine doesn't mean it will work for anyone else. > > Sure, I said it allowed you, not that this allowance actually worked. > >>> And using latin-1 in that context looks and feels weird/icky, the file is not >>> encoded using latin-1, the encoding just happens to work to manipulate bytes as >>> ascii text + non-ascii stuff. >> >> Correct. That's precisely the use case described above. > > Yes, but now instead of just ignoring that stuff you have to actively and > knowingly lie to Python to get it to shut up. 
The advantage is that it becomes explicit what you are doing. In Python 2, without any encoding, you are implicitly assuming that the encoding is Latin-1, because that's how you are processing it. You're just not spelling it out anywhere, thus leaving it to the innocent reader to guess what's happening. In Python 3, and in better Python 2 code (using codecs.open(), for example), you'd make it clear right in the open() call that Latin-1 is the way you are going to process the data. >> Besides, it's perfectly possible to process bytes in Python 3. You just >> have to open the file in binary mode and do the processing at the byte >> string level. > > I think that's the route which should be taken Oh, absolutely not. When it's text, it's best to process it as Unicode. Stefan From masklinn at masklinn.net Sat Feb 11 20:46:52 2012 From: masklinn at masklinn.net (Masklinn) Date: Sat, 11 Feb 2012 20:46:52 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> Message-ID: On 2012-02-11, at 20:35 , Stefan Behnel wrote: > >> Yes, but now instead of just ignoring that stuff you have to actively and >> knowingly lie to Python to get it to shut up. > > The advantage is that it becomes explicit what you are doing. In Python 2, > without any encoding, you are implicitly assuming that the encoding is > Latin-1, because that's how you are processing it. You're just not spelling > it out anywhere, thus leaving it to the innocent reader to guess what's > happening. In Python 3, and in better Python 2 code (using codecs.open(), > for example), you'd make it clear right in the open() call that Latin-1 is > the way you are going to process the data. I'm not sure going from "ignoring it" to "explicitly lying about it" is a great step forward. 
latin-1 is not "the way you are going to process the data" in this case, it's just the easiest way to get Python to shut up and open the damn thing. >>> Besides, it's perfectly possible to process bytes in Python 3. You just >>> have to open the file in binary mode and do the processing at the byte >>> string level. >> >> I think that's the route which should be taken > > Oh, absolutely not. When it's text, it's best to process it as Unicode. Except it's not processed as text, it's processed as "stuff with ascii characters in it". Might just as well be cp-1252, or UTF-8, or Shift JIS (which is kinda-sorta-extended-ascii but not exactly), and while using an ISO-8859 will yield unicode data that's about the only thing you can say about it and the actual result will probably be mojibake either way. By processing it as bytes, it's made explicit that this is not known and decoded text (which is what unicode strings imply) but that it's some semi-arbitrary ascii-compatible encoding and that's the extent of the developer's knowledge and interest in it. From stefan_ml at behnel.de Sat Feb 11 21:08:38 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 11 Feb 2012 21:08:38 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> Message-ID: Masklinn, 11.02.2012 20:46: > On 2012-02-11, at 20:35 , Stefan Behnel wrote: >> >>> Yes, but now instead of just ignoring that stuff you have to actively and >>> knowingly lie to Python to get it to shut up. >> >> The advantage is that it becomes explicit what you are doing. In Python 2, >> without any encoding, you are implicitly assuming that the encoding is >> Latin-1, because that's how you are processing it. You're just not spelling >> it out anywhere, thus leaving it to the innocent reader to guess what's >> happening. 
In Python 3, and in better Python 2 code (using codecs.open(), >> for example), you'd make it clear right in the open() call that Latin-1 is >> the way you are going to process the data. > > I'm not sure going from "ignoring it" to "explicitly lying about it" is a > great step forward. latin-1 is not "the way you are going to process the data" > in this case, it's just the easiest way to get Python to shut up and open the > damn thing. > >>>> Besides, it's perfectly possible to process bytes in Python 3. You just >>>> have to open the file in binary mode and do the processing at the byte >>>> string level. >>> >>> I think that's the route which should be taken >> >> Oh, absolutely not. When it's text, it's best to process it as Unicode. > > Except it's not processed as text, it's processed as "stuff with ascii > characters in it". Might just as well be cp-1252, or UTF-8, or Shift JIS Well, you are still processing it as text because you are (again, implicitly) assuming those ASCII characters to be just that: ASCII encoded characters. You couldn't apply the same byte processing algorithm to UCS2 encoded text or a compressed gzip file, for example, at least not with a useful outcome. Mind you, I'm not regarding any text semantics here. I'm not considering whether the thus decoded data results in French, Danish, German or other human words, or in completely incomprehensible garbage. That's not relevant. What is relevant is that the program assumes an identity mapping from 1 byte to 1 character to work correctly, which, speaking in Unicode terms, implies Latin-1 decoding. Therefore my advice to make that assumption explicit. Stefan From p.f.moore at gmail.com Sun Feb 12 00:14:23 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 11 Feb 2012 23:14:23 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 February 2012 17:00, Masklinn wrote: >> Good example. 
I believe adding ", encoding='latin-1'" to open() is sufficient. > > Why not open the file in binary mode in stead? (and replace `'*'` by `b'*'` in > the startswith call) In my view, that's less scalable to more complex cases. It's likely you'll hit things you need to do that don't translate easily to bytes sooner than if you stick in a string-only world. A simple example, check for a regex rather than a simple starting character. The problem I have with encoding="latin-1" is that in many cases I *know* that's a lie. From what's been said in this discussion so far, I think that the "better" way to say "I know this file contains mostly ASCII, but there's some other bits I'm not sure about but don't care too much as long as they round-trip cleanly" is encoding="ascii",errors="surrogateescape". But as we've seen here, that's not the idiom that gets recommended by everyone (the "One Obvious Way", if you like). I suspect that if the community did embrace a "one obvious way", that would reduce the "Python 3 makes me need to know Unicode" FUD that's around. But as long as people get 3 different answers when they ask the question, there's going to be uncertainty and doubt (and hence, probably, fear...) Paul. PS I'm pretty confident that I have *my* answer now (ascii/surrogateescape). So this thread was of benefit to me, if nothing else, and my thanks for that. From cs at zip.com.au Sun Feb 12 00:18:06 2012 From: cs at zip.com.au (Cameron Simpson) Date: Sun, 12 Feb 2012 10:18:06 +1100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F341561.3050409@pearwood.info> References: <4F341561.3050409@pearwood.info> Message-ID: <20120211231805.GA7853@cskk.homeip.net> On 10Feb2012 05:50, Steven D'Aprano wrote: | Python 4.x (Python 4000) is pure vapourware. It it irresponsible to tell | people to stick to Python 2.7 (there will be no 2.8) in favour of something | which may never exist. 
| | http://www.python.org/dev/peps/pep-0404/ Please tell me this PEP number is deliberate! -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ Once I reached adulthood, I never had enemies until I posted to Usenet. - Barry Schwartz From p.f.moore at gmail.com Sun Feb 12 00:24:04 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 11 Feb 2012 23:24:04 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> Message-ID: On 11 February 2012 19:46, Masklinn wrote: >>>> Besides, it's perfectly possible to process bytes in Python 3. You just >>>> have to open the file in binary mode and do the processing at the byte >>>> string level. >>> >>> I think that's the route which should be taken >> >> Oh, absolutely not. When it's text, it's best to process it as Unicode. > > Except it's not processed as text, it's processed as "stuff with ascii > characters in it". Might just as well be cp-1252, or UTF-8, or Shift JIS > (which is kinda-sorta-extended-ascii but not exactly), and while using > an ISO-8859 will yield unicode data that's about the only thing you can > say about it and the actual result will probably be mojibake either way. No, not at all. It *is* text. I *know* it's text. I know that it is encoded in an ASCII-superset (because I can read it in a text editor and *see* that it is). What I *don't* know is what those funny bits of mojibake I see in the text editor are. But I don't really care. Yes, I could do some analysis based on the surrounding text and confirm whether it's latin-1, utf-8, or something similar. But it honestly doesn't matter to me, as all I care about is parsing the file to find the change authors, and printing their names (to re-use the "manipulating a ChangeLog file" example). 
And even if it did matter, the next file might be in a different ASCII-superset encoding, but I *still* won't care because the parsing code will be exactly the same. Saying "it's bytes" is even more of a lie than "it's latin-1". The honest truth is "it's an ASCII superset", and that's all I need to know to do the job manually, so I'd like to write code to do the same job without needing to lie about what I know. I'm now 100% convinced that encoding="ascii",errors="surrogateescape" is the way to say this in code. Paul. From mwm at mired.org Sun Feb 12 02:52:00 2012 From: mwm at mired.org (Mike Meyer) Date: Sat, 11 Feb 2012 20:52:00 -0500 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: <4F35A9EF.7030309@molden.no> References: <4F35A9EF.7030309@molden.no> Message-ID: <20120211205200.2667c68f@bhuda.mired.org> On Sat, 11 Feb 2012 00:36:15 +0100 Sturla Molden wrote: > Den 10.02.2012 22:15, skrev Mike Meyer: > > In what way does the mmap module fail to provide your binary file > > interface? The short answer is that BSD mmap creates an anonymous kernel object. First, I didn't ask about "BSD mmap", I asked about the "mmap module". They aren't the same thing. > When working with multiprocessing for a while, one comes to the > conclusion that we really need named kernel objects. And both the BSD mmap (at least in recent systems) and the mmap module provide objects with names in the file system space. IIUC, while there are systems that won't let you create anonymous objects (like early versions of the mmap module), there aren't any - at least any longer - that won't let you create named objects. > Here are two simple fail cases for anonymous kernel objects: [elided, since the restriction doesn't exist] > All of multiprocessing's IPC classes suffer from this! Some of them may. The one I asked about doesn't. > Solution: > > Use named kernel objects for IPC, pickle the name. You don't need to pickle the name if you use mmap's native name system - it's just a string.
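To make the "name is just a string" point concrete, here is a sketch with two independent mappings of one temporary file standing in for two processes (shared access is the mmap module's default, so writes through one view are visible through the other):

```python
import mmap
import os
import tempfile

SIZE = 4096

# Create and size the named backing object; the "name" is just its path.
fd, path = tempfile.mkstemp()
os.write(fd, b'\x00' * SIZE)

# Two independent opens/maps of the same name, standing in for two processes.
f1 = os.open(path, os.O_RDWR)
f2 = os.open(path, os.O_RDWR)
m1 = mmap.mmap(f1, SIZE)
m2 = mmap.mmap(f2, SIZE)

m1[:5] = b'hello'     # write through one mapping...
seen = bytes(m2[:5])  # ...and read it back through the other
print(seen)           # b'hello'

for m in (m1, m2):
    m.close()
for f in (fd, f1, f2):
    os.close(f)
os.unlink(path)
```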
> There is another drawback too: > > The speed of pickle. For example, sharing NumPy arrays with pickle is > not faster with shared memory. The overhead from pickle completely > dominate the time needed for IPC . That is why I want a type specialized > or a binary channel. Making this from the named shared memory class I > already have is a no-brainer. > So that is my other objection against multiprocessing. > > 1. Object sharing by handle inheritance fails when kernel objects must > be passed back to the parent process or to a process pool. We need IPC > objects that have a name in the kernel, so they can be created and > shared in retrospect. We've already got that one. You just need to learn how to use it. > 2. IPC with multiprocessing is too slow due to pickle. We need something > that does not use pickle. (E.g. shared memory, but not by means of > mmap.) It might be that the pipe or socket in multiprocessing will do > this (I have not looked at it carefully enough), but they still don't have Since you can use pickle, you're only dealing with small amounts of data. There are better performing serialization tools available (or they can easily be created if you have to deal with large amounts of data), and those work fine for a large variety of problems. If they aren't fast enough, neither a socket nor a pipe will solve the basic issue of needing to serialize the data in order to communicate it. This isn't a problem with mmap per se, and it's not a problem that anything that can be accurately described as a "file" - as in your "binary file interface" - is going to solve. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From cmjohnson.mailinglist at gmail.com Sun Feb 12 03:27:27 2012 From: cmjohnson.mailinglist at gmail.com (Carl M.
Johnson) Date: Sat, 11 Feb 2012 16:27:27 -1000 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <9CBC5148-A454-4FE3-9F2E-18A7FCB27CE7@gmail.com> On Feb 11, 2012, at 12:40 AM, Paul Moore wrote: > In Python 2, I can ignore the issue. Sure, I can end up with mojibake, > but for my uses, that's not a disaster. Mostly-readable works. But in > Python 3, I get an error and can't process the file. > > I can just use latin-1, or surrogateescape. But that doesn't come > naturally to me yet. Maybe it will in time... Or maybe there's a > better solution I don't know about yet. I'm confused what you're asking for. Setting errors to surrogateescape or encoding to Latin-1 causes Python 3 to behave the exact same way as Python 2: it's doing the "wrong" thing and may result in mojibake, but at least it isn't screwing up anything new so long as the stuff you add to the file is in ASCII. The only way to make Python 3 slightly more like Python 2 would be to set errors="surrogateescape" by default instead of asking the programmer to know to use it. I think that would be going too far, but it could be done. I think it would be simpler though to just publicize errors="surrogateescape" more. "Dear people who don't care about encodings and don't want to take the time to get them right, just put errors='surrogateescape' into your open commands and Python 3 will behave almost exactly like Python 2. The end." Is that really so hard? I'm confused about what else people want. From greg at krypto.org Sun Feb 12 03:27:19 2012 From: greg at krypto.org (Gregory P. 
Smith) Date: Sat, 11 Feb 2012 18:27:19 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120210183801.59921627@pitrou.net> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <20120210183801.59921627@pitrou.net> Message-ID: On Fri, Feb 10, 2012 at 9:38 AM, Antoine Pitrou wrote: > On Fri, 10 Feb 2012 08:52:16 -0600 > Massimo Di Pierro > wrote: >> The way I see it is not whether Python has threads, fibers, coroutines, etc. >> The problem is that in 5 years we going to have on the market CPUs with >> 100 cores > > This is definitely untrue. No CPU maker has plans for a general-purpose > 100-core CPU. Intel already has immediate plans for 10 core cpus, those have well functioning HT so they should be considered 20 core. Two socket boards are quite common, there's 40 cores. 4+ socket boards exist bringing your total to 80+ cores connected to a bucket of dram on a single motherboard. These are the types of systems in data centers being made available to people to run their computationally intensive software on. That counts as general purpose in my book. -gps From anacrolix at gmail.com Sun Feb 12 03:33:10 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sun, 12 Feb 2012 10:33:10 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <20120210183801.59921627@pitrou.net> Message-ID: Damn straight. On Feb 12, 2012 10:29 AM, "Gregory P. Smith" wrote: > On Fri, Feb 10, 2012 at 9:38 AM, Antoine Pitrou > wrote: > > On Fri, 10 Feb 2012 08:52:16 -0600 > > Massimo Di Pierro > > wrote: > >> The way I see it is not whether Python has threads, fibers, coroutines, > etc. > >> The problem is that in 5 years we going to have on the market CPUs with > >> 100 cores > > > > This is definitely untrue. No CPU maker has plans for a general-purpose > > 100-core CPU. 
> > Intel already has immediate plans for 10 core cpus, those have well > functioning HT so they should be considered 20 core. Two socket > boards are quite common, there's 40 cores. 4+ socket boards exist > bringing your total to 80+ cores connected to a bucket of dram on a > single motherboard. These are the types of systems in data centers > being made available to people to run their computationally intensive > software on. That counts as general purpose in my book. > > -gps > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Sun Feb 12 03:33:02 2012 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 11 Feb 2012 18:33:02 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3575FB.60700@molden.no> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> Message-ID: On Fri, Feb 10, 2012 at 11:54 AM, Sturla Molden wrote: > > - Windows has no fork system call. SunOS used to have a very slow fork > system call. The majority of Java developers worked with Windows or Sun, and > learned to work with threads. > > For which the current summary is: > > - The GIL sucks because Windows has no fork. > > Which some might say is the equivalent of: > > - Windows sucks. Please do not claim that fork() semantics and copy-on-write are good things to build off of... They are not. fork() was designed in a world *before threads* existed. It simply can not be used reliably in a process that uses threads and tons of real world practical C and C++ software that Python programs need to interact with, be embedded in or use via extension modules these days uses threads quite effectively. 
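The fork-with-threads hazard described above is exactly what a "spawn a fresh interpreter" mode avoids, and as a historical note, multiprocessing did later grow selectable start methods (Python 3.4): the "spawn" method execs a new interpreter, as Windows' CreateProcess does, instead of fork()ing a possibly-threaded parent. A minimal sketch, assuming Python 3.4+:

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # "spawn" starts each worker as a freshly exec'd interpreter,
    # so no locks or other state held by parent threads is cloned.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

The price of spawn is that arguments and the target function must be picklable and module import side effects run again in each worker; that trade-off is the heart of this sub-thread.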
The multiprocessing module on posix would be better off if it offered a windows CreateProcess() work-a-like mode that spawns a *new* python interpreter process rather than depending on fork(). The fork() means multithreaded processes cannot reliably use the multiprocessing module (and those other threads could come from libraries or C/C++ extension modules that you cannot control within the scope of your own software that desires to use multiprocessing). This is likely not hard to implement, if nobody has done it already, as I believe the windows support already has to do much the same thing today. -gps From greg at krypto.org Sun Feb 12 03:39:41 2012 From: greg at krypto.org (Gregory P. Smith) Date: Sat, 11 Feb 2012 18:39:41 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <6142B265-EAE6-4D04-8A2D-8F289344A06E@masklinn.net> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <6142B265-EAE6-4D04-8A2D-8F289344A06E@masklinn.net> Message-ID: On Sat, Feb 11, 2012 at 12:26 AM, Masklinn wrote: > > Finally, multiprocessing has a far better upgrade path (as e.g. Erlang > demonstrates): if your non-deterministic points are well delineated and > your interfaces to other concurrent execution points are well defined, > scaling from multiple cores to multiple machines becomes possible. +10 :) From ericsnowcurrently at gmail.com Sun Feb 12 04:10:22 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 11 Feb 2012 20:10:22 -0700 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: <9CBC5148-A454-4FE3-9F2E-18A7FCB27CE7@gmail.com> References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <9CBC5148-A454-4FE3-9F2E-18A7FCB27CE7@gmail.com> Message-ID: On Sat, Feb 11, 2012 at 7:27 PM, Carl M. Johnson wrote: > > On Feb 11, 2012, at 12:40 AM, Paul Moore wrote: > >> In Python 2, I can ignore the issue. Sure, I can end up with mojibake, >> but for my uses, that's not a disaster. Mostly-readable works. 
But in >> Python 3, I get an error and can't process the file. >> >> I can just use latin-1, or surrogateescape. But that doesn't come >> naturally to me yet. Maybe it will in time... Or maybe there's a >> better solution I don't know about yet. > > I'm confused what you're asking for. Setting errors to surrogateescape or encoding to Latin-1 causes Python 3 to behave the exact same way as Python 2: it's doing the "wrong" thing and may result in mojibake, but at least it isn't screwing up anything new so long as the stuff you add to the file is in ASCII. The only way to make Python 3 slightly more like Python 2 would be to set errors="surrogateescape" by default instead of asking the programmer to know to use it. I think that would be going too far, but it could be done. I think it would be simpler though to just publicize errors="surrogateescape" more. > > "Dear people who don't care about encodings and don't want to take the time to get them right, just put errors='surrogateescape' into your open commands and Python 3 will behave almost exactly like Python 2. The end." So something like this: import functools, builtins open = builtins.open = functools.partial(open, encoding="ascii", errors="surrogateescape") -eric From gahtune at gmail.com Sun Feb 12 04:09:58 2012 From: gahtune at gmail.com (Gabriel AHTUNE) Date: Sun, 12 Feb 2012 11:09:58 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: 2012/2/11 Paul Moore > On 11 February 2012 00:07, Terry Reedy wrote: > >>> Nor is there in 3.x. > > > > I view that claim as FUD, at least for many users, and at least until the > > persons making the claim demonstrate it. In particular, I claim that > people > > who use Python2 knowing nothing of unicode do not need to know much more > to > > do the same things in Python3. > > Concrete example, then. > > I have a text file, in an unknown encoding (yes, it does happen to > me!) 
but opening in an editor shows it's mainly-ASCII. I want to find > all the lines starting with a '*'. The simple > > with open('myfile.txt') as f: > for line in f: > if line.startswith('*'): > print(line) > > fails with encoding errors. What do I do? Short answer, grumble and go > and use grep (or in more complex cases, awk) :-( > > Paul. I just looked at the Python 3 documentation ( http://docs.python.org/release/3.1.3/library/functions.html#open); there is an "errors" parameter to the open function. When set to "ignore" or "replace" it will solve your problem. Another way is to try to guess the encoding programmatically (I found the chardet module http://pypi.python.org/pypi/chardet) and use it to decode your file with an unknown encoding. Then why not make a value "auto" available for the "encoding" parameter, which makes "open" call a detector before opening and throw an error when the guess is below a certain confidence level. Gabriel AHTUNE -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmjohnson.mailinglist at gmail.com Sun Feb 12 04:19:41 2012 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Sat, 11 Feb 2012 17:19:41 -1000 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <9CBC5148-A454-4FE3-9F2E-18A7FCB27CE7@gmail.com> Message-ID: On Feb 11, 2012, at 5:10 PM, Eric Snow wrote: > So something like this: > > import functools, builtins > open = builtins.open = functools.partial(open, encoding="ascii", > errors="surrogateescape") We could pack it in and call it something like "python2open". 
:-) From merwok at netwok.org Sun Feb 12 04:30:50 2012 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Sun, 12 Feb 2012 04:30:50 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120211231805.GA7853@cskk.homeip.net> References: <4F341561.3050409@pearwood.info> <20120211231805.GA7853@cskk.homeip.net> Message-ID: <4F37326A.7010904@netwok.org> Le 12/02/2012 00:18, Cameron Simpson a écrit : > On 10Feb2012 05:50, Steven D'Aprano wrote: > | Python 4.x (Python 4000) is pure vapourware. It is irresponsible to tell > | people to stick to Python 2.7 (there will be no 2.8) in favour of something > | which may never exist. > | > | http://www.python.org/dev/peps/pep-0404/ > > Please tell me this PEP number is deliberate! It is, sir! At first the number was taken by the virtualenv PEP with no special meaning, just the next number in sequence, but when Barry wrote up the 2.8 Unrelease PEP and took the number 405, the occasion was too good to be missed and the numbers were swapped. Cheers From sturla at molden.no Sun Feb 12 04:46:00 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 12 Feb 2012 04:46:00 +0100 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: <20120211205200.2667c68f@bhuda.mired.org> References: <4F35A9EF.7030309@molden.no> <20120211205200.2667c68f@bhuda.mired.org> Message-ID: <4F3735F8.10607@molden.no> Den 12.02.2012 02:52, skrev Mike Meyer: > First, I didn't ask about "BSD mmap", I asked about the "mmap module". > They aren't the same thing. Take a look at the implementation. >> When working with multiprocessing for a while, one comes to the >> conclusion that we really need named kernel objects. > And both the BSD mmap (at least in recent systems) and the mmap module > provide objects with names in the file system space. 
IIUC, while there > are systems that won't let you create anonymous objects (like early > versions of the mmap module), there aren't any - at least any longer - > that won't let you create named objects. Sure, you can memory map named files. You can even memory map from /dev/shm on a system that supports it, if you are willing to reserve some RAM for ramdisk. But apart from that, show me how you would use the mmap module to make named shared memory on Linux or Windows. No, memory mapping file object -1 or 0 don't count, you get an anonymous memory mapping. Here is a task for you to try: 1. start a process 2. in the new process, create some shared memory (use the mmap module) 3. make the parent process get access to it (should be easy, right?) Can you do this? No? Then try the same thing with a lock (multiprocessing.Lock) or an event. Show me how you would code this. > > > Use named kernel objects for IPC, pickle the name. > You don't need to pickle the name if you use mmap's native name system > - it's just a string. Sure, multiprocessing does not pickle strings objects. Or whatever. Have you ever looked at the code? > Since can use pickle, you're only dealing with small amounts of > data. What on earth are you talking about? Every object passed in the "args" keyword argument to multiprocessing.Process is pickled. Same thing for any object you pass to multiprocessing.Queue. Look at the code. Sturla From pyideas at rebertia.com Sun Feb 12 05:17:31 2012 From: pyideas at rebertia.com (Chris Rebert) Date: Sat, 11 Feb 2012 20:17:31 -0800 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <9CBC5148-A454-4FE3-9F2E-18A7FCB27CE7@gmail.com> Message-ID: On Sat, Feb 11, 2012 at 7:19 PM, Carl M. Johnson wrote: > On Feb 11, 2012, at 5:10 PM, Eric Snow wrote: >> So something like this: >> >> ? ?import functools, builtins >> ? 
?open = builtins.open = functools.partial(open, encoding="ascii", >> errors="surrogateescape") > > We could pack it in and call it something like "python2open". :-) Or just add a keyword-only argument to open(): americentric=True :-P From ncoghlan at gmail.com Sun Feb 12 05:19:12 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Feb 2012 14:19:12 +1000 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <9CBC5148-A454-4FE3-9F2E-18A7FCB27CE7@gmail.com> Message-ID: On Sun, Feb 12, 2012 at 1:19 PM, Carl M. Johnson wrote: > > On Feb 11, 2012, at 5:10 PM, Eric Snow wrote: > >> So something like this: >> >> ? ?import functools, builtins >> ? ?open = builtins.open = functools.partial(open, encoding="ascii", >> errors="surrogateescape") > > > We could pack it in and call it something like "python2open". :-) An open_ascii() builtin isn't as crazy as it may initially sound - it's not at all uncommon to have a file that's almost certainly in some ASCII compatible encoding like utf-8, latin-1 or one of the other extended ASCII encodings, but you don't know which one specifically. By offering open_ascii(), we'd be making it trivial to process such files without blowing up (or having to figure out exactly *which* ASCII compatible encoding you have). When you wrote them back to disk, if you'd added any non-ASCII chars of your own, you'd get a UnicodeEncodeError, but any encoded data from the original would be reproduced in the original encoding. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From cs at zip.com.au Sun Feb 12 05:34:11 2012 From: cs at zip.com.au (Cameron Simpson) Date: Sun, 12 Feb 2012 15:34:11 +1100 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20120212043411.GA442@cskk.homeip.net> On 11Feb2012 13:12, Stephen J. 
Turnbull wrote: | Jim Jewett writes: | > Are you saying that some (many? all?) platforms make a bad choice there? | | No. I'm saying that whatever choice is made (except for 'latin-1' | because it accepts all bytes regardless of the actual encoding of the | data, or PEP 383 "errors='surrogateescape'" for the same reason, both | of which are unacceptable defaults for production code *for the same | reason*), there is data that will cause that idiom to fail on Python 3 | where it would not on Python 2. But... By your own argument here, the failing is on the part of Python 2 becuase it is passing when it should fail, because it is effectively using the equivalent of 'latin-1'. And you say right there that that is unacceptable. At least with Python 3 you find out early that you're doing something dodgy. Disclaimer: I may be talking our my arse here; my personal code is all Python 2 at present because I haven't found an idle weekend (or, more likely, week) to spend getting it python 3 ready (meaning parsing ok but probably failing a bunch of tests to start with). I do know that in Python 2 I've tripped over a heap of unicode versus latin-1/maybe-ascii text issues and python unicode-vs-str issues just recently in Python 2 and a lot of the ambiguity I've been juggling would be absent in Python 3 (because at least all the strings will be unicode and I can concentrate on the encoding/decode stuff instead). [...snip...] | The fact is that with a little bit of knowledge, you can almost | certainly get more reliable (and in case of failure, more debuggable) | results from Python 3 than from Python 2. That's my hope. | But people are happy to | deal with the devil they know, even though it's more noxious than the | devil they don't. 
Not me :-) I speak as one who once moved to MH mail folders and vi-with-a-few-macros as a mail reader just to break my use of the mail reader I had been using:-( Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ No system, regardless of how sophisticated, can repeal the laws of physics or overcome careless driving actions. - Mercedes Benz From ncoghlan at gmail.com Sun Feb 12 05:34:38 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Feb 2012 14:34:38 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> Message-ID: On Sun, Feb 12, 2012 at 9:24 AM, Paul Moore wrote: > Saying "it's bytes" is even more of a lie than "it's latin-1". The > honest truth is "it's an ASCII superset", and that's all I need to > know to do the job manually, so I'd like to write code to do the same > job without needing to lie about what I know. I'm now 100% convinced > that encoding="ascii",errors="surrogateescape" is the way to say this > in code. I created http://bugs.python.org/issue13997 to suggest codifying this explicitly as an open_ascii() builtin. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From steve at pearwood.info Sun Feb 12 06:03:01 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Feb 2012 16:03:01 +1100 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F374805.9000606@pearwood.info> Paul Moore wrote: > My concern about Unicode in Python 3 is that the principle is, you > specify the right encoding. But often, I don't *know* the encoding ;-( > Text files, like changelogs as a good example, generally have no > marker specifying the encoding, and they can have all sorts (depending > on where the package came from). 
Worse, I am on Windows and changelogs > usually come from Unix developers - so I'm not familiar with the > common conventions ("well, of course it's in UTF-8, that's what > everyone uses"...) But you obviously do know the convention -- use UTF-8. > In Python 2, I can ignore the issue. Sure, I can end up with mojibake, > but for my uses, that's not a disaster. Mostly-readable works. But in > Python 3, I get an error and can't process the file. > > I can just use latin-1, or surrogateescape. But that doesn't come > naturally to me yet. Maybe it will in time... Or maybe there's a > better solution I don't know about yet. So why don't you use UTF-8? As for those who actually don't know the convention, isn't it better to teach them the convention "use UTF-8, unless dealing with legacy data" rather than to avoid dealing with the issue by using errors='surrogateescape'? I'd hate for "surrogateescape" to become the One Obvious Way for dealing with unknown encodings, because this is 2012 and people should be more savvy about non-ASCII characters by now. I suppose it's marginally better than just throwing them away with errors='ignore', but still. I recently bought a book from Amazon UK. It was £12 not \udcc2\udca312. This isn't entirely a rhetorical question. I'm not on Windows, so perhaps there's a problem I'm unaware of. -- Steven From steve at pearwood.info Sun Feb 12 06:26:24 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 12 Feb 2012 16:26:24 +1100 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <9CBC5148-A454-4FE3-9F2E-18A7FCB27CE7@gmail.com> Message-ID: <4F374D80.7030309@pearwood.info> Nick Coghlan wrote: > On Sun, Feb 12, 2012 at 1:19 PM, Carl M. 
Johnson > wrote: >> On Feb 11, 2012, at 5:10 PM, Eric Snow wrote: >> >>> So something like this: >>> >>> import functools, builtins >>> open = builtins.open = functools.partial(open, encoding="ascii", >>> errors="surrogateescape") >> >> We could pack it in and call it something like "python2open". :-) > > An open_ascii() builtin isn't as crazy as it may initially sound - > it's not at all uncommon to have a file that's almost certainly in > some ASCII compatible encoding like utf-8, latin-1 or one of the other > extended ASCII encodings, but you don't know which one specifically. To me, "open_ascii" suggests either: - it opens ASCII files, and raises an error if they are not ASCII; or - it opens non-ASCII files, and magically translates their content to ASCII using some variant of "The Unicode Hammer" recipe: http://code.activestate.com/recipes/251871-latin1-to-ascii-the-unicode-hammer/ We should not be discouraging developers from learning even the most trivial basics of Unicode. I'm not suggesting that we try to force people to become Unicode experts (they wouldn't, even if we tried) but making this a built-in is dumbing things down too much. I don't believe that it is an imposition for people to explicitly use open(filename, 'ascii', 'surrogateescape') if that's what they want. If they want open_ascii, let them define this at the top of their modules: open_ascii = (lambda name: open(name, encoding='ascii', errors='surrogateescape')) A one liner, if you don't mind long lines. I'm not entirely happy with the surrogateescape solution, but I can see it's possibly the least worst *simple* solution for the case where you don't know the source encoding. (Encoding guessing heuristics are awesome but hardly simple.) So put the recipe in the FAQs, in the docs, and the docstring for open[1], and let people copy and paste the recipe. That's a pretty gentle introduction to Unicode. [1] Which is awfully big and complex in Python 3.1, but that's another story. 
-- Steven From ncoghlan at gmail.com Sun Feb 12 07:01:42 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Feb 2012 16:01:42 +1000 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: <4F374D80.7030309@pearwood.info> References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <9CBC5148-A454-4FE3-9F2E-18A7FCB27CE7@gmail.com> <4F374D80.7030309@pearwood.info> Message-ID: On Sun, Feb 12, 2012 at 3:26 PM, Steven D'Aprano wrote: > I'm not entirely happy with the surrogateescape solution, but I can see it's > possibly the least worst *simple* solution for the case where you don't know > the source encoding. (Encoding guessing heuristics are awesome but hardly > simple.) So put the recipe in the FAQs, in the docs, and the docstring for > open[1], and let people copy and paste the recipe. That's a pretty gentle > introduction to Unicode. Yeah, it didn't take long for me to come back around to that point of view, so I morphed http://bugs.python.org/issue13997 into a docs bug about clearly articulating the absolute bare minimum knowledge of Unicode needed to process text in a robust cross-platform manner in Python 3 instead. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From mwm at mired.org Sun Feb 12 09:02:07 2012 From: mwm at mired.org (Mike Meyer) Date: Sun, 12 Feb 2012 03:02:07 -0500 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: <4F3735F8.10607@molden.no> References: <4F35A9EF.7030309@molden.no> <20120211205200.2667c68f@bhuda.mired.org> <4F3735F8.10607@molden.no> Message-ID: <20120212030207.7cd2a5dc@bhuda.mired.org> On Sun, 12 Feb 2012 04:46:00 +0100 Sturla Molden wrote: > Den 12.02.2012 02:52, skrev Mike Meyer: > > First, I didn't ask about "BSD mmap", I asked about the "mmap module". > > They aren't the same thing. > Take a look at the implementation. True, but we're talking about an API, not a specific implementation. 
> >> When working with multiprocessing for a while, one comes to the > >> conclusion that we really need named kernel objects. > > And both the BSD mmap (at least in recent systems) and the mmap module > > provide objects with names in the file system space. IIUC, while there > > are systems that won't let you create anonymous objects (like early > > versions of the mmap module), there aren't any - at least any longer - > > that won't let you create named objects. > Sure, you can memory map named files. You can even memory map from > /dev/shm on a system that supports it, if you are willing to reserve > some RAM for ramdisk. And that's *not* the anonymous kernel object you complained about getting from mmap. > But apart from that, show me how you would use the mmap module to make > named shared memory on Linux or Windows. No, memory mapping file object > -1 or 0 don't count, you get an anonymous memory mapping. The linux mmap has the same arguments as the BSD one, so I'd expect it to work the same. I expect that the Python core will have made the semantics work properly on Windows, but don't really care, and don't have a Windows system to test it on. And that's why I'm talking about the API, not the implementation. > Here is a task for you to try: > > 1. start a process > 2. in the new process, create some shared memory (use the mmap module) > 3. make the parent process get access to it (should be easy, right?) > Can you do this? No? Works exactly like I'd expect it to. > Show me how you would code this. 
Here's the code that creates the shared file:

    share_name = '/tmp/xyzzy'
    with open(share_name, 'wb') as f:
        f.write(b'hello')

Here's the code for the child:

    with open(share_name, 'r+b') as f:
        share = mmap(f.fileno(), 0)
    share[:5] = b'gone\n'

Here's the code for the parent:

    child = Process(target=proc)
    child.start()
    with open(share_name, mode='r+b') as f:
        share = mmap(f.fileno(), 0)
    while share[0] == ord('h'):
        sleep(1)
    print('main:', share.readline())

> > > Use named kernel objects for IPC, pickle the name. > > You don't need to pickle the name if you use mmap's native name system > > - it's just a string. > Sure, multiprocessing does not pickle strings objects. Or whatever. Have > you ever looked at the code? I didn't say multiprocessing wouldn't pickle the name, *or* anything else about the multiprocessing module. I said *you* didn't need to pickle it. And I didn't. Did you read what I wrote? > Every object passed in the "args" keyword argument to > multiprocessing.Process is pickled. Same thing for any object you pass > to multiprocessing.Queue. Yes, but we're not talking about multiprocessing.Queue. We're talking about mmap. multiprocessing.Queue doesn't use mmap. For that, you want to use multiprocessing.Value and multiprocessing.Array. > Look at the code. Look at the text. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From p.f.moore at gmail.com Sun Feb 12 13:54:13 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 12 Feb 2012 12:54:13 +0000 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: <4F374805.9000606@pearwood.info> References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> Message-ID: On 12 February 2012 05:03, Steven D'Aprano wrote: > Paul Moore wrote: > >> My concern about Unicode in Python 3 is that the principle is, you >> specify the right encoding. 
But often, I don't *know* the encoding ;-( >> Text files, like changelogs as a good example, generally have no >> marker specifying the encoding, and they can have all sorts (depending >> on where the package came from). Worse, I am on Windows and changelogs >> usually come from Unix developers - so I'm not familiar with the >> common conventions ("well, of course it's in UTF-8, that's what >> everyone uses"...) > > > > > But you obviously do know the convention -- use UTF-8. No. I know that a lot of Unix people advocate UTF-8, and I gather it's rapidly becoming standard in the Unix world. But I work on Windows, and UTF-8 is not the standard there. I have no idea if UTF-8 is accepted cross-platform, or if it's just what has grown as most ChangeLog files are written on Unix and Unix users don't worry about what's convenient on Windows (no criticism there, just acknowledgement of a fact). And I have seen ChangeLog files with non-UTF-8 encodings of names in them. I have no idea if that's a bug or just a preference - and anyway, "be permissive in what you accept" applies... Get beyond ChangeLog files and it's anybody's guess. My PC has text files from many, many places (some created on my PC, some created by others on various flavours and ages of Unix , and some downloaded from who-knows-where on the internet). Not one of them comes with an encoding declaration. Of course every file is encoded in some way. But it's incredibly naive to assume the user knows that encoding. Hey, I still have to dump out the content of files to check the line ending convention when working in languages other than Python - universal newlines saves me needing to care about that, why is it so disastrous to consider having something similar for encodings? >> In Python 2, I can ignore the issue. Sure, I can end up with mojibake, >> but for my uses, that's not a disaster. Mostly-readable works. But in >> Python 3, I get an error and can't process the file. 
>> >> I can just use latin-1, or surrogateescape. But that doesn't come >> naturally to me yet. Maybe it will in time... Or maybe there's a >> better solution I don't know about yet. > > So why don't you use UTF-8? Decoding errors. > As far as those who actually don't know the convention, isn't it better to > teach them the convention "use UTF-8, unless dealing with legacy data" > rather than to avoid dealing with the issue by using > errors='surrogateescape'? Fair comment. My point here is that I *am* dealing with "legacy" data in your sense. And I do so on a day to day basis. UTF-8 is very, very rare in my world (Windows). Latin-1 (or something close) is common. There is no cross-platform standard yet. And probably won't be until Windows moves to UTF-8 as the standard encoding. Which ain't happening soon. > I'd hate for "surrogateescape" to become the One Obvious Way for dealing > with unknown encodings, because this is 2012 and people should be more savvy > about non-ASCII characters by now. I suppose it's marginally better than > just throwing them away with errors='ignore', but still. I think people are much more aware of the issues, but cross-platform handling remains a hard problem. I don't wish to make assumptions, but your insistence that UTF-8 is a viable solution suggests to me that you don't know much about the handling of Unicode on Windows. I wish I had that luxury... > I recently bought a book from Amazon UK. It was £12 not \udcc2\udca312. £12 in what encoding? :-) > This isn't entirely a rhetorical question. I'm not on Windows, so perhaps > there's a problem I'm unaware of. I think that's the key here. Even excluding places that don't use the Roman alphabet, Windows encoding handling is complex. CP1252, CP850, Latin-1, Latin-14 (Euro zone), UTF-16, BOMs. All are in use on my PC to some extent. And that's even without all this foreign UTF-8 I get from the Unix guys :-) Apart from the blasted UTF-16, all of it's "ASCII most of the time". Paul. 
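The encoding zoo Paul lists is easy to demonstrate: the same short string encodes to different bytes under each codec, which is why "mostly ASCII" files from different Windows tools agree on the ASCII range and disagree everywhere else. A small illustration (not from the thread):

```python
s = "£12"  # one non-ASCII character plus ASCII digits

for codec in ("cp1252", "cp850", "latin-1", "utf-8", "utf-16"):
    data = s.encode(codec)
    # every codec here encodes '1' and '2' identically,
    # but each has its own idea of where '£' lives
    print(f"{codec:8} {data!r}")
```

Running this shows, for example, that '£' is 0xA3 in cp1252/latin-1, 0x9C in the old DOS cp850, two bytes (0xC2 0xA3) in UTF-8, and that UTF-16 prefixes the whole string with a BOM — five mutually incompatible byte sequences for three characters.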
From stefan_ml at behnel.de Sun Feb 12 14:33:13 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 12 Feb 2012 14:33:13 +0100 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> Message-ID: Paul Moore, 12.02.2012 13:54: > Latin-1, Latin-14 (Euro zone) OT-remark: I assume you meant ISO8859-15 (aka. Latin-9) here. However, that's not for the "Euro zone", it's just Latin-1 with the Euro character wangled in and a couple of other changes. It still lacks characters that are commonly used by languages within the Euro zone, e.g. the Slovenian language (a Slavic descendant), but also Gaelic or Welsh. https://en.wikipedia.org/wiki/ISO/IEC_8859-15#Coverage https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Languages_commonly_supported_but_with_incomplete_coverage Stefan From ubershmekel at gmail.com Sun Feb 12 14:33:42 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sun, 12 Feb 2012 15:33:42 +0200 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> Message-ID: On Sun, Feb 12, 2012 at 2:54 PM, Paul Moore wrote: > On 12 February 2012 05:03, Steven D'Aprano wrote: > > Paul Moore wrote: > > > >> My concern about Unicode in Python 3 is that the principle is, you > >> specify the right encoding. But often, I don't *know* the encoding ;-( > >> Text files, like changelogs as a good example, generally have no > >> marker specifying the encoding, and they can have all sorts (depending > >> on where the package came from). Worse, I am on Windows and changelogs > >> usually come from Unix developers - so I'm not familiar with the > >> common conventions ("well, of course it's in UTF-8, that's what > >> everyone uses"...) > > > > > > > > > > But you obviously do know the convention -- use UTF-8. > > No. 
I know that a lot of Unix people advocate UTF-8, and I gather it's > rapidly becoming standard in the Unix world. But I work on Windows, > and UTF-8 is not the standard there. I have no idea if UTF-8 is > accepted cross-platform, or if it's just what has grown as most > ChangeLog files are written on Unix and Unix users don't worry about > what's convenient on Windows (no criticism there, just acknowledgement > of a fact). And I have seen ChangeLog files with non-UTF-8 encodings > of names in them. I have no idea if that's a bug or just a preference > - and anyway, "be permissive in what you accept" applies... > > Windows NT started with UCS-2 and from Windows 2000 it's UTF-16 internally. It was an uplifting thought that unicode is just 2 bytes per letter, so they did a huge refactoring of the entire windows API (ReadFileA/ReadFileW etc) thinking they won't have to worry about it again. Nowadays windows INTERNALS have the worst of all worlds - a variable char-length, uncommon unicode format, and twice the API to maintain. Notepad can open and save utf-8 files perfectly, much like most other windows programs. UTF-8 is the internet standard and I suggest we keep that fact crystal clear. UTF-8 is the go-to codec, it is the convention. It's ok to use other codecs for whatever reasons, constraints, use cases, etc. But these are all exceptions to the convention - UTF-8. Yuval (Also a windows dev) -------------- next part -------------- An HTML attachment was scrubbed... URL: From shibturn at gmail.com Sun Feb 12 14:52:20 2012 From: shibturn at gmail.com (shibturn) Date: Sun, 12 Feb 2012 13:52:20 +0000 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: <4F3735F8.10607@molden.no> References: <4F35A9EF.7030309@molden.no> <20120211205200.2667c68f@bhuda.mired.org> <4F3735F8.10607@molden.no> Message-ID: On 12/02/2012 3:46am, Sturla Molden wrote: > 1. start a process > 2. in the new process, create some shared memory (use the mmap module) > 3.
make the parent process get access to it (should be easy, right?) As Mike says, on Unix you can just create a file in /tmp to back an mmap. On Linux, posix mmaps created with shm_open() seem to be normal files on a tmpfs file system, usually /dev/shm. Since /tmp is also usually a tmpfs file system on Linux, I assume this would be equivalent in terms of overhead. On Windows you can use the tagname argument of mmap.mmap(). Maybe a BinaryBlob wrapper class could be created which lets an mmap be "pickled by reference". Managing lifetime and reliable cleanup might be awkward though. If the pickle overhead is the problem you could try Connection.send_bytes() and Connection.recv_bytes(). I suppose Queue objects could grow put_bytes() and get_bytes() methods too. Or a BytesQueue class could be created. > Can you do this? No? > > Then try the same thing with a lock (multiprocessing.Lock) or an event. I have a patch (http://bugs.python.org/issue8713) to make multiprocessing on Unix work with fork+exec which has to do this because semaphores cannot be inherited across exec. Making sure all the named semaphores get removed if the program terminates abnormally is a bit awkward though. It could be modified to make them picklable in general. On Windows dealing with "named objects" is easier since they are refcounted by the operating system and deleted when no more processes have handles for them. If you make a feature request at bugs.python.org I might work on a patch.
Cheers sbt From sturla at molden.no Sun Feb 12 15:15:46 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 12 Feb 2012 15:15:46 +0100 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: <20120212030207.7cd2a5dc@bhuda.mired.org> References: <4F35A9EF.7030309@molden.no> <20120211205200.2667c68f@bhuda.mired.org> <4F3735F8.10607@molden.no> <20120212030207.7cd2a5dc@bhuda.mired.org> Message-ID: <4F37C992.3090101@molden.no> Den 12.02.2012 09:02, skrev Mike Meyer: > True, but we're talking about an API, not a specific implementation. You have been complaining about the GIL which is a specific implementation. I am talking about how multiprocessing actually works, i.e. implementation. > >> But apart from that, show me how you would use the mmap module to make >> named shared memory on Linux or Windows. No, memory mapping file object >> -1 or 0 don't count, you get an anonymous memory mapping. > The linux mmap has the same arguments as the BSD one, so I'd expect it > to work the same. It calls BSD mmap in the implementation on Linux. It calls CreateFileMapping and MapViewOfFile on Windows. > Works exactly like I'd expect it to. > >> Show me how you would code this. > Here's the code that creates the shared file: > > share_name = '/tmp/xyzzy' > with open(share_name, 'wb') as f: > f.write(b'hello') > > Here's the code for the child: > > with open(share_name, 'r+b') as f: > share = mmap(f.fileno(), 0) > share[:5] = b'gone\n' > > Here's the code for the parent: > > child = Process(target=proc) > child.start() > with open(share_name, mode='r+b') as f: > share = mmap(f.fileno(), 0) > while share[0] == ord('h'): > sleep(1) > print('main:', share.readline()) Here you are memory mapping a temporary file, not shared memory. On Linux, shared memory with mmap does not have a share_name. It has fileno -1. So go ahead and replace f.fileno() with -1 and see if it still works for you. 
This is how mmap is used for shared memory on Linux: shm = mmap.mmap(-1, 4096) os.fork() See how the fork comes after the mmap. Which means it must always be allocated in the parent process. That is why we need an implementation with System V IPC instead of mmap. > Yes, but we're not talking about multiprocessing.Queue. We're talking > about mmap. multiprocessing.Queue doesn't use mmap. For that, you want > to us multiprocessing.Value and multiprocessing.Array. Pass multiprocessing.Value or multiprocessing.Array to multiprocessing.Queue and see what happens. And while you are at it, pass multiprocessing.Lock to multiprocessing.Queue and see what happens as well. Contemplate how we can pass an object with a lock as a message between two processes. Should we change the implementation? And then, look up the implementation for multiprocessing.Value and Array and see if (and how) they use mmap. Perhaps you just told me to use mmap instead of mmap. Sturla From sturla at molden.no Sun Feb 12 15:35:19 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 12 Feb 2012 15:35:19 +0100 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: References: <4F35A9EF.7030309@molden.no> <20120211205200.2667c68f@bhuda.mired.org> <4F3735F8.10607@molden.no> Message-ID: <4F37CE27.5070908@molden.no> Den 12.02.2012 14:52, skrev shibturn: > > As Mike says, on Unix you can just create a file in /tmp to back an > mmap. On Linux, posix mmaps created with shm_open() seem to be normal > files on a tmpfs file system, usually /dev/shm. Since /tmp is also > usually a tmpfs file system on Linux, I assume this whould be > equivalent in terms of overhead. Mark did not use shm_open, he memory mapped from disk. > I have a patch (http://bugs.python.org/issue8713) to make > multiprocessing on Unix work with fork+exec which has to do this > because semaphores cannot be inherited across exec. 
Making sure all > the named semaphores get removed if the program terminates abnormally > is a bit awkward though. It could be modified to make them picklable > in general. > > On Windows dealing with "named objects" is easier since they are > refcounted by the operating system and deleted when no more processes > have handles for them. > > If you make a feature request at bugs.python.org I might work on a patch. Cleaning up SysV ipc semaphores and shared memory is similar (semctl instead of shmctl to get reference count). And then we need a monkey patch for os._exit. Look at the Cython code here: http://dl.dropbox.com/u/12464039/sharedmem.zip Sturla From shibturn at gmail.com Sun Feb 12 16:20:11 2012 From: shibturn at gmail.com (shibturn) Date: Sun, 12 Feb 2012 15:20:11 +0000 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: <4F37CE27.5070908@molden.no> References: <4F35A9EF.7030309@molden.no> <20120211205200.2667c68f@bhuda.mired.org> <4F3735F8.10607@molden.no> <4F37CE27.5070908@molden.no> Message-ID: On 12/02/2012 2:35pm, Sturla Molden wrote: > Mark did not use shm_open, he memory mapped from disk. But if his /tmp is a tmpfs file system (which it usually is on Linux) then I think it is entirely equivalent. Or he could create the file in /dev/shm instead. Below is Blob class which seems to work. Note that the process which created the blob needs to wait for the other process to unpickle it before allowing it to be garbage collected.
import multiprocessing as mp from multiprocessing.util import Finalize, get_temp_dir import mmap, sys, os, itertools class Blob(object): _counter = itertools.count() def __init__(self, length, name=None): self.length = length if sys.platform == 'win32': if name is None: name = 'blob-%s-%d' % (os.getpid(), next(self._counter)) self.name = name self.mmap = mmap.mmap(-1, length, self.name) else: if name is None: self.name = '%s/blob-%s-%d' % (get_temp_dir(), os.getpid(), next(self._counter)) flags = os.O_RDWR | os.O_CREAT | os.O_EXCL else: self.name = name flags = os.O_RDWR fd = os.open(self.name, flags, 0o600) try: if name is None: os.ftruncate(fd, length) Finalize(self, os.unlink, (self.name,), exitpriority=0) self.mmap = mmap.mmap(fd, length) finally: os.close(fd) def __reduce__(self): return Blob, (self.length, self.name) def child(conn): b = Blob(20) b.mmap[:5] = "hello" conn.send(b) conn.recv() # wait for acknowledgement before # allowing garbage collection if __name__ == '__main__': conn, child_conn = mp.Pipe() p = mp.Process(target=child, args=(child_conn,)) p.start() b = conn.recv() conn.send(None) # acknowledge receipt print repr(b.mmap[:]) From p.f.moore at gmail.com Sun Feb 12 17:30:59 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 12 Feb 2012 16:30:59 +0000 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> Message-ID: On 12 February 2012 13:33, Stefan Behnel wrote: > Paul Moore, 12.02.2012 13:54: >> Latin-1, Latin-14 (Euro zone) > > OT-remark: I assume you meant ISO8859-15 (aka. Latin-9) here. However, > that's not for the "Euro zone", it's just Latin-1 with the Euro character > wangled in and a couple of other changes. It still lacks characters that > are commonly used by languages within the Euro zone, e.g. the Slovenian > language (a Slavic descendant), but also Gaelic or Welsh. 
> > https://en.wikipedia.org/wiki/ISO/IEC_8859-15#Coverage > > https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Languages_commonly_supported_but_with_incomplete_coverage Yes, sorry. I misremembered and was sloppy in my wording. My apologies, and thanks for the correction. Paul From sturla at molden.no Sun Feb 12 21:33:30 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 12 Feb 2012 21:33:30 +0100 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: References: <4F35A9EF.7030309@molden.no> <20120211205200.2667c68f@bhuda.mired.org> <4F3735F8.10607@molden.no> <4F37CE27.5070908@molden.no> Message-ID: <4F38221A.5050208@molden.no> Den 12.02.2012 16:20, skrev shibturn: > > But if his /tmp is a tmpfs file system (which it usually is on Linux) > then I think it is entirely equivalent. Or he could create the file > in /dev/shm instead. It seems that on Linux /tmp is backed by shared memory. Which sounds rather strange to a Windows user, as the raison d'etre for tempfiles is temporary storage space that goes beyond physical RAM. I've also read that the use of ftruncate in this context can result in SIGBUS. > > Below is Blob class which seems to work. Note that the process which > created the blob needs to wait for the other process to unpickle it > before allowing it to be garbage collected. > I would look at kernel refcounts before unlinking. (But I am not that familiar with Linux.) Sturla From mwm at mired.org Sun Feb 12 21:56:11 2012 From: mwm at mired.org (Mike Meyer) Date: Sun, 12 Feb 2012 15:56:11 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> Message-ID: Sorry for the late reply, but this itch finally got to me... > Please do not claim that fork() semantics and copy-on-write are good > things to build off of...
They work just fine for large classes of problems that require hundreds or thousands of cores. > They are not. fork() was designed in a > world *before threads* existed. This is wrong. While the "name" thread may not have existed when fork() was created, the *concept* of concurrent execution in a shared address space predates the creation of Unix by a good decade. Most notably, Multics - what the creators of Unix were working on before they did Unix - at least discussed the idea, though it may never have been implemented (a common fate of Multics features). Also notable is that Unix introduced the then ground-breaking idea of having the command processor create a new process to run user programs. Before Unix, user commands were run in the process (and hence address space) of the command processor. Running things in what is now called "the background" (which this architecture made a major PITA) gave you concurrent execution in a shared address space - what we today call threads. The reason those systems did this was because creating a process was *expensive*. That's also why the Multics folks looked at threads. The Unix fork/exec pair was cheap and flexible, allowing the creation of a command processor that supported easy backgrounding, pipes, and IO redirection. Fork has since gotten more expensive, in spite of the ongoing struggles to keep it cheap. > It simply can not be used reliably in > a process that uses threads and tons of real world practical C and C++ > software that Python programs need to interact with, be embedded in or > use via extension modules these days uses threads quite effectively. Personally, I find that the fact that threads can't be used reliably in a process that forks makes threads bad things to build off of. After all, there's tons of real world practical software in many languages that Python needs to interact with that uses fork effectively.
> The multiprocessing module on posix would be better off if it offered > a windows CreateProcess() work-a-like mode that spawns a *new* python > interpreter process rather than depending on fork(). While it's a throwback to the 60s, it would make using threads and processes more convenient, but I don't need it. Why don't you submit a patch? References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> Message-ID: On 11 February 2012 21:24, Paul Moore wrote: > What I *don't* know is what those funny bits of > mojibake I see in the text editor are. > So, do yourself and us, "the rest of the world", a favor, and open the file in binary mode. Also, I'd suggest you and anyone picky about encoding read http://www.joelonsoftware.com/articles/Unicode.html so you can finally have in your mind that *** ASCII is not text ***. It used to be text when to get to non-[A-Z|a-z] text you had to have someone recording a file on a tape, pack it in the luggage, and take a plane "overseas" to the U.S.A. That is not the case anymore, and that, as far as I understand, is the reasoning for Python 3 to default to unicode. Anyone can work "ignoring text" and treating bytes as bytes, opening a file in binary mode. You can use "os.linesep" instead of a hard-coded "\n" to overcome linebreaking. (Of course you might accidentally break a line inside a multi-byte character in some encoding, since you prefer to ignore them altogether, but it should be rare). js -><- -------------- next part -------------- An HTML attachment was scrubbed...
URL: From sturla at molden.no Sun Feb 12 23:14:51 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 12 Feb 2012 23:14:51 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> Message-ID: <4F3839DB.7080804@molden.no> Den 12.02.2012 21:56, skrev Mike Meyer: > > While it's a throwback to the 60s, it would make using threads and > processes more convenient, but I don't need it. Why don't you submit a > patch? I suppose the Windows implementation would do this on Linux as well? At least it uses the subprocess module to spawn a new process. Though I am not sure how subprocess interacts with threads in Linux. Sturla From mwm at mired.org Sun Feb 12 23:14:50 2012 From: mwm at mired.org (Mike Meyer) Date: Sun, 12 Feb 2012 17:14:50 -0500 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: <4F37C992.3090101@molden.no> References: <4F35A9EF.7030309@molden.no> <20120211205200.2667c68f@bhuda.mired.org> <4F3735F8.10607@molden.no> <20120212030207.7cd2a5dc@bhuda.mired.org> <4F37C992.3090101@molden.no> Message-ID: <20120212171450.20366678@bhuda.mired.org> On Sun, 12 Feb 2012 15:15:46 +0100 Sturla Molden wrote: > Den 12.02.2012 09:02, skrev Mike Meyer: > > True, but we're talking about an API, not a specific implementation. > You have been complaining about the GIL which is a specific implementation. No, I haven't. To me, the GIL is one of the minor reasons to avoid using threads in Python. I doubt that I've mentioned it at all. Given how much attention you pay to details, I no longer care about getting an answer to my question, as I suspect that it will have as much accuracy as that statement. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. 
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From sturla at molden.no Sun Feb 12 23:20:05 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 12 Feb 2012 23:20:05 +0100 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: <20120212171450.20366678@bhuda.mired.org> References: <4F35A9EF.7030309@molden.no> <20120211205200.2667c68f@bhuda.mired.org> <4F3735F8.10607@molden.no> <20120212030207.7cd2a5dc@bhuda.mired.org> <4F37C992.3090101@molden.no> <20120212171450.20366678@bhuda.mired.org> Message-ID: <4F383B15.2060305@molden.no> Den 12.02.2012 23:14, skrev Mike Meyer: > > True, but we're talking about an API, not a specific implementation. >> You have been complaining about the GIL which is a specific implementation. > No, I haven't. To me, the GIL is one of the minor reasons to avoid > using threads in Python. I doubt that I've mentioned it at all. > > Given how much attention you pay to details, I no longer care about > getting an answer to my question, as I suspect that it will have as > much accuracy as that statement. > > My apologies, I was confusing you with Matt Joiner. Sturla From mwm at mired.org Sun Feb 12 23:21:05 2012 From: mwm at mired.org (Mike Meyer) Date: Sun, 12 Feb 2012 17:21:05 -0500 Subject: [Python-ideas] The concurrency discussion is off-topic! Message-ID: <20120212172105.2c98f820@bhuda.mired.org> Please take the concurrency discussion to: http://mail.python.org/mailman/listinfo/concurrency-sig -- Mike Meyer http://www.mired.org/ Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From sturla at molden.no Sun Feb 12 23:24:32 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 12 Feb 2012 23:24:32 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> Message-ID: <4F383C20.1050205@molden.no> Den 12.02.2012 21:56, skrev Mike Meyer: > The reason those systems did this was because creating a process was > *expensive*. That's also why the Multics folks looked at threads. The > Unix fork/exec pair was cheap and flexible, allowing the creation of a > command processor that supported easy backgrounding, pipes, and IO > redirection. Fork has since gotten more expensive, in spite of the > ongoing struggles to keep it cheap. The "expensive" argument is also why the Windows API has no fork, although the Windows NT-kernel supports it. (There is even a COW fork in Windows' SUA.) I think fork() is the one function I have missed most when programming for Windows. It is the best reason to use SUA or Cygwin instead of the Windows API. Sturla From sturla at molden.no Sun Feb 12 23:31:10 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 12 Feb 2012 23:31:10 +0100 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: <20120212172105.2c98f820@bhuda.mired.org> References: <20120212172105.2c98f820@bhuda.mired.org> Message-ID: <4F383DAE.7000702@molden.no> Den 12.02.2012 23:21, skrev Mike Meyer: > Please take the concurrency discussion to: > > http://mail.python.org/mailman/listinfo/concurrency-sig > It might have diverged into something off-topic. But it started up as a response to Jesse Noller on improvement of multiprocessing's IPC objects. That is, e.g. being able to send an object with a mp.Lock across a mp.Queue. That is not off-topic AFAIK.
I think it is important with discussion and feedback on how these objects should work. Sturla From sturla at molden.no Sun Feb 12 23:33:00 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 12 Feb 2012 23:33:00 +0100 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: <20120212172105.2c98f820@bhuda.mired.org> References: <20120212172105.2c98f820@bhuda.mired.org> Message-ID: <4F383E1C.90905@molden.no> Den 12.02.2012 23:21, skrev Mike Meyer: > Please take the concurrency discussion to: > > http://mail.python.org/mailman/listinfo/concurrency-sig > It seems that list has nearly zero traffic. Why post to a list that nobody reads? Sturla From mwm at mired.org Sun Feb 12 23:42:53 2012 From: mwm at mired.org (Mike Meyer) Date: Sun, 12 Feb 2012 17:42:53 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3839DB.7080804@molden.no> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> <4F3839DB.7080804@molden.no> Message-ID: <20120212174253.156c3660@bhuda.mired.org> [Replies have been sent to concurrency-sig at python.org] On Sun, 12 Feb 2012 23:14:51 +0100 Sturla Molden wrote: > Den 12.02.2012 21:56, skrev Mike Meyer: > > While it's a throwback to the 60s, it would make using threads and > > processes more convenient, but I don't need it. Why don't you submit a > > patch? > I suppose the Windows implementation would do this on Linux as well? At > least it uses the subprocess module to spawn a new process. Though I am > not sure how subprocess interacts with threads in Linux. subprocess and threads interact *really* badly on Unix systems. Python is missing the tools needed to deal with this situation properly. See http://bugs.python.org/issue6923. Just another of the minor reasons not to use threads in Python. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. 
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From mwm at mired.org Sun Feb 12 23:47:16 2012 From: mwm at mired.org (Mike Meyer) Date: Sun, 12 Feb 2012 17:47:16 -0500 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: <4F383E1C.90905@molden.no> References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> Message-ID: <20120212174716.2dc90c95@bhuda.mired.org> On Sun, 12 Feb 2012 23:33:00 +0100 Sturla Molden wrote: Apologies for the blank response you got. > Den 12.02.2012 23:21, skrev Mike Meyer: > > Please take the concurrency discussion to: > > http://mail.python.org/mailman/listinfo/concurrency-sig > It seems that list has nearly zero traffic. Why post to a list that > nobody reads? Because that way, we won't be annoying the people who don't care about concurrency with an off-topic discussion. If you're interested in concurrency in Python, you should be reading that list. Given the amount of discussion here, I was surprised at how quite that list was. I suspect many of those here didn't know about it, and set about to correct that. Most of this discussion should be there, and then when that SIG has thrashed out a proposal for a change, it can be brought back here. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From anacrolix at gmail.com Mon Feb 13 01:06:03 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Mon, 13 Feb 2012 08:06:03 +0800 Subject: [Python-ideas] The concurrency discussion is off-topic! 
In-Reply-To: <4F383E1C.90905@molden.no> References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> Message-ID: +1, that list is dead On Feb 13, 2012 6:33 AM, "Sturla Molden" wrote: > Den 12.02.2012 23:21, skrev Mike Meyer: > >> Please take the concurrency discussion to: >> >> http://mail.python.org/**mailman/listinfo/concurrency-**sig >> >> > It seems that list has nearly zero traffic. Why post to a list that nobody > reads? > > Sturla > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacrolix at gmail.com Mon Feb 13 01:13:36 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Mon, 13 Feb 2012 08:13:36 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120212174253.156c3660@bhuda.mired.org> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> <4F3839DB.7080804@molden.no> <20120212174253.156c3660@bhuda.mired.org> Message-ID: This attitude is exemplary of the status quo in Python on threads: Pretend they don't exist or you'll get hurt. On Feb 13, 2012 6:45 AM, "Mike Meyer" wrote: > [Replies have been sent to concurrency-sig at python.org] > > On Sun, 12 Feb 2012 23:14:51 +0100 > Sturla Molden wrote: > > Den 12.02.2012 21:56, skrev Mike Meyer: > > > While it's a throwback to the 60s, it would make using threads and > > > processes more convenient, but I don't need it. Why don't you submit a > > > patch? > > I suppose the Windows implementation would do this on Linux as well? At > > least it uses the subprocess module to spawn a new process. Though I am > > not sure how subprocess interacts with threads in Linux. > > subprocess and threads interact *really* badly on Unix > systems. 
Python is missing the tools needed to deal with this > situation properly. See http://bugs.python.org/issue6923. > > Just another of the minor reasons not to use threads in Python. > > -- > Mike Meyer http://www.mired.org/ > Independent Software developer/SCM consultant, email for more information. > > O< ascii ribbon campaign - stop html mail - www.asciiribbon.org > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shibturn at gmail.com Mon Feb 13 01:31:42 2012 From: shibturn at gmail.com (shibturn) Date: Mon, 13 Feb 2012 00:31:42 +0000 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: <4F38221A.5050208@molden.no> References: <4F35A9EF.7030309@molden.no> <20120211205200.2667c68f@bhuda.mired.org> <4F3735F8.10607@molden.no> <4F37CE27.5070908@molden.no> <4F38221A.5050208@molden.no> Message-ID: On 12/02/2012 8:33pm, Sturla Molden wrote: > It seems that on Linux /tmp is backed by shared memory. > > Which sounds rather strange to a Windows user, as the raison d'etre for > tempfiles is temporary storage space that goes beyond physial RAM. In reality /tmp is backed by swap space, so physical RAM does not impose a limit. Anonymous mmaps are also backed by swap space. > I've also read that the use of ftruncate in this context can result in > SIGBUS. Isn't that if you truncate the file to a smaller size *after* it has been mapped. As far as I am aware, using ftruncate to set the length *before* it can be mapped for the first time is standard practice and harmless. >> Below is Blob class which seems to work. Note that the process which >> created the blob needs to wait for the other process to unpickle it >> before allowing it to be garbage collected. >> > > I would look at kernel refcounts before unlinking. (But I am not that > familiar with Linux.) 
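The create-then-size-then-map sequence shibturn describes can be shown in a few lines; a minimal sketch (assuming a platform where ftruncate() can extend a file, such as Linux — the temp-file name and sizes are invented for illustration):

```python
import mmap, os, tempfile

# The pattern described above: create the backing file, size it with
# ftruncate() *before* the first mmap() call, then map it.
fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, 4096)       # set the length before mapping
    m = mmap.mmap(fd, 4096)      # first mapping happens after sizing
    m[:5] = b"hello"             # write through the mapping
    snapshot = bytes(m[:5])
    m.close()
finally:
    os.close(fd)
    os.unlink(path)
```

Truncating the file *smaller* after mapping is what risks SIGBUS on a later access; sizing it up front avoids that case entirely.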
Even if you have automatic refcounting like on Windows, you still need to cope with lifetime management issues. If you put an object on a queue it may be a long time before the target process will unpickle the object and increase its refcount, and you must not decref the object until it has, or else it will disappear. I don't know how to get the ref count for a file descriptor on Unix. (And posix shared memory does not seem to get a refcount either, even though System V shared memory does.) sbt From mwm at mired.org Mon Feb 13 01:36:20 2012 From: mwm at mired.org (Mike Meyer) Date: Sun, 12 Feb 2012 19:36:20 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> <4F3839DB.7080804@molden.no> <20120212174253.156c3660@bhuda.mired.org> Message-ID: <20120212193620.65adad41@bhuda.mired.org> On Mon, 13 Feb 2012 08:13:36 +0800 Matt Joiner wrote: > This attitude is exemplary of the status quo in Python on threads: Pretend > they don't exist or you'll get hurt. Yup. After all, the answer to the question "Which modules in the standard library are thread-safe?" is "threading, queue, logging and functools" (at least, that's my best guess). Any effort to "fix" threading in Python is pretty much doomed until the authoritative answer to that question includes most of the standard library.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From sturla at molden.no Mon Feb 13 01:41:48 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 13 Feb 2012 01:41:48 +0100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> <4F3839DB.7080804@molden.no> <20120212174253.156c3660@bhuda.mired.org> Message-ID: <4F385C4C.2060600@molden.no> On 13.02.2012 01:13, Matt Joiner wrote: > This attitude is exemplary of the status quo in Python on threads: Pretend > they don't exist or you'll get hurt. It's more the status quo on threads anywhere. Sturla From shibturn at gmail.com Mon Feb 13 01:42:41 2012 From: shibturn at gmail.com (shibturn) Date: Mon, 13 Feb 2012 00:42:41 +0000 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: References: <4F35A9EF.7030309@molden.no> <20120211205200.2667c68f@bhuda.mired.org> <4F3735F8.10607@molden.no> <4F37CE27.5070908@molden.no> <4F38221A.5050208@molden.no> Message-ID: On 13/02/2012 12:31am, shibturn wrote: > Isn't that if you truncate the file to a smaller size *after* it has > been mapped. As far as I am aware, using ftruncate to set the length > *before* it can be mapped for the first time is standard practice and > harmless. Ah, on some Unixes ftruncate() limits the size of the file, but will not increase it. sbt From greg.ewing at canterbury.ac.nz Sun Feb 12 22:43:03 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 13 Feb 2012 10:43:03 +1300 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> Message-ID: <4F383267.2060600@canterbury.ac.nz> Paul Moore wrote: > I'd like to write code to do the same > job without needing to lie about what I know.
I'm now 100% convinced > that encoding="ascii",errors="surrogateescape" is the way to say this > in code. Perhaps there should be a more shortwinded way of spelling this? -- Greg From pyideas at rebertia.com Mon Feb 13 01:50:34 2012 From: pyideas at rebertia.com (Chris Rebert) Date: Sun, 12 Feb 2012 16:50:34 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F383267.2060600@canterbury.ac.nz> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <4F383267.2060600@canterbury.ac.nz> Message-ID: On Sun, Feb 12, 2012 at 1:43 PM, Greg Ewing wrote: > Paul Moore wrote: >> I'd like to write code to do the same >> job without needing to lie about what I know. I'm now 100% convinced >> that encoding="ascii",errors="surrogateescape" is the way to say this >> in code. > > Perhaps there should be a more shortwinded way of > spelling this? See http://bugs.python.org/issue13997 , mentioned earlier in the thread. Cheers, Chris From mwm at mired.org Mon Feb 13 01:53:16 2012 From: mwm at mired.org (Mike Meyer) Date: Sun, 12 Feb 2012 19:53:16 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F385C4C.2060600@molden.no> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> <4F3839DB.7080804@molden.no> <20120212174253.156c3660@bhuda.mired.org> <4F385C4C.2060600@molden.no> Message-ID: <20120212195316.66eab1a5@bhuda.mired.org> On Mon, 13 Feb 2012 01:41:48 +0100 Sturla Molden wrote: > On 13.02.2012 01:13, Matt Joiner wrote: > > This attitude is exemplary of the status quo in Python on threads: Pretend > > they don't exist or you'll get hurt. > It's more the status quo on threads anywhere. Not (quite) true. There are a few fringe languages that have embraced threading and been built (or worked over) from the ground up to work well with it.
I haven't seen any that let you mix multiprocessing and threading safely, though, so the attitude there is "pretend fork doesn't exist or you'll get hurt." These are the places where I've seen safe (as in, I trusted them as much as I'd have trusted a version written using processes) non-trivial (as in, they were complex enough that if they'd been written in a mainstream language like Python, I wouldn't have trusted them) threaded applications. I strongly believe we need better concurrency solutions in Python. I'm not convinced that threading is the best general solution, because threading is like the GIL: a kludge that solves the problem by fixing *everything*, whether it needs it or not, and at very high cost. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From ctb at msu.edu Mon Feb 13 03:35:12 2012 From: ctb at msu.edu (C. Titus Brown) Date: Sun, 12 Feb 2012 18:35:12 -0800 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: <4F383E1C.90905@molden.no> References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> Message-ID: <20120213023512.GE27683@idyll.org> On Sun, Feb 12, 2012 at 11:33:00PM +0100, Sturla Molden wrote: > On 12.02.2012 23:21, Mike Meyer wrote: >> Please take the concurrency discussion to: >> >> http://mail.python.org/mailman/listinfo/concurrency-sig > > It seems that list has nearly zero traffic. Why post to a list that > nobody reads? It's the right place to discuss these things: concurrency-sig: Discussion of concurrency issues in python. and presumably you won't be e-mailing as many people who *aren't* interested in concurrency. Python-ideas is rapidly becoming the *wrong* place for this discussion: Python-ideas: This list is to contain discussion of speculative language ideas for Python for possible inclusion into the language.
If an idea gains traction it can then be discussed and honed to the point of becoming a solid proposal to put to either python-dev or python-3000 as appropriate. So, whether or not it was the right place to begin with, could you please move it to concurrency-sig? thanks, --titus (moderator) -- C. Titus Brown, ctb at msu.edu From tjreedy at udel.edu Mon Feb 13 04:41:15 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 12 Feb 2012 22:41:15 -0500 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> Message-ID: On 2/12/2012 7:54 AM, Paul Moore wrote: > No. I know that a lot of Unix people advocate UTF-8, and I gather it's > rapidly becoming standard in the Unix world. But I work on Windows, Unicode and utf-8 is a standard for the world, not Unix. It surpassed us-ascii as the most used character encoding for the WWW about 4 years ago. https://en.wikipedia.org/wiki/ASCII XML is unicode based. I think it fair to say that UTF-8 (and UTF-16) are preferred encodings, as 'Encodings other than UTF-8 and UTF-16 will not necessarily be recognized by every XML parser' https://en.wikipedia.org/wiki/Xml#Encoding_detection OpenDocument is one of many xml-based formats. Any modern database program that intends to store arbitrary text must store unicode (or at least the BMP subset). So any text-oriented Windows program that gets input from the rest of the world has to handle unicode and at least the utf-8 encoding thereof. My impression is that Windows itself now uses unicode for text storage. It is a shame that it still somewhat hides that by using limited subset codepage facades. None of this minimizes the problem of dealing with text in the multiplicity of national and language encodings. But none of that is the fault of unicode, and unicode makes dealing with multiple encodings at the same time much easier.
It is too bad that unicode was only developed in the 1990s instead of the 1960s. -- Terry Jan Reedy From stephen at xemacs.org Mon Feb 13 04:55:37 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Feb 2012 12:55:37 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <50BA6538-76D0-4B1B-8C2A-6DBEB9B1B94B@gmail.com> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <87vcnevyd7.fsf@uwakimon.sk.tsukuba.ac.jp> <50BA6538-76D0-4B1B-8C2A-6DBEB9B1B94B@gmail.com> Message-ID: <87aa4nfkue.fsf@uwakimon.sk.tsukuba.ac.jp> Carl M. Johnson writes: > On Feb 10, 2012, at 5:32 PM, Stephen J. Turnbull wrote: > > > will founder on 'Óscar Fuentes' as author, unless you know what > > coding system is used, or know enough to use latin-1 (because > > it's effectively binary, not because it's the actual encoding). > > Or just use errors="surrogateescape". I think we should tell people > who are scared of unicode and refuse to learn how to use it to just > add an errors="surrogateescape" keyword to their file open > arguments. Obviously, it's the wrong thing to do, but it's wrong in > the same way that Python 2 bytes are wrong, so if you're absolutely > committed to remaining ignorant of encodings, you can continue to > do that. No, it's not the same as Python 2, and it's *subtly* the wrong thing to do, too. surrogateescape is intended to roundtrip on input from a specific API to unchanged output to that same API, and that's all it is guaranteed to do. Less pedantically, if you use latin-1, the internal representation is valid Unicode but (partially) incorrect content. No UnicodeErrors. If you use errors="surrogateescape", any code that insists on valid Unicode will crash. Here I'm talking about a use case where the user believes that as long as the ASCII content is correct they will get correct output. It's arguable that using errors="surrogateescape" is a better approach, *because* of the possibility of a validity check. I tend to think not.
But that's a different argument from "same as Python 2". From stephen at xemacs.org Mon Feb 13 05:03:29 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Feb 2012 13:03:29 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> Message-ID: <878vk7fkha.fsf@uwakimon.sk.tsukuba.ac.jp> Masklinn writes: > > Or just use the ISO-8859-1 encoding. > > It's true that it requires handling encodings upfront where Python 2 > allowed you to play fast-and-loose though. > > And using latin-1 in that context looks and feels weird/icky, the > file is not encoded using latin-1, the encoding just happens to > work to manipulate bytes as ascii text + non-ascii stuff. So give latin-1 an additional name. Emacsen use "raw-text" (there's also binary, but raw-text will do a loose equivalent of universal newlines for you, binary doesn't). You could also use a name more exact and less English-biased like "ascii-compatible-bytes". Same codec, name denotes different semantics. From stephen at xemacs.org Mon Feb 13 05:43:35 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Feb 2012 13:43:35 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <87vcnevyd7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <877gzrfimg.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > On 2/10/2012 10:32 PM, Stephen J. Turnbull wrote: > > The issue is whether Python 3 has a "strong imposition of Unicode > awareness" that Python 2 does not. If the OP only meant awareness of the > fact that something called 'unicode' exists, then I suppose that could > be argued. I interpreted the claim as being about some substantive > knowledge of unicode.
I interpreted the claim as being about changing their coding practice, including maintaining existing scripts and modules that deal with textual input that people may need/want to transition to Python 3. As Paul Moore pointed out, adding "encoding='latin-1'" to their scripts doesn't come naturally to everyone. I'm sure that at a higher level, that's the stance you intend to take, too. I think there's a disconnect between that high-level stance, and the interpretation that it's about "substantive knowledge of Unicode". > In any case, the claim that I disagree with is not about people's > reactions to Python 3 or about human psychology and the propensity > to stick with the known. OK. But then I think you are failing to deal with the problem, because I think *that* is the problem. Python 3 doesn't lack simple idioms for making (most naive, near-English) processing look like Python 2 to a greater or lesser extent. The question is which of those idioms we should teach, and AFAICS what's controversial about that depends on human psychology, not on the admitted facts about Python 3. > In response to Jim Jewett, you wrote > > The fact is that with a little bit of knowledge, you can almost > > certainly get more reliable (and in case of failure, more debuggable) > > results from Python 3 than from Python 2. > > That is pretty much my counterclaim, with the note that the 'little > bit of knowledge' is mostly about non-unicode encodings and the > change to some Python details. And my counterrebuttal is "true -- but that's not what these users want, and they probably don't need it." That is, they don't want to debug a crash when they don't care what happens to non-ASCII in their mostly-ASCII, nearly-readable-as-English byte streams.
I claim that the practical use case for these users is *not* 6-sigma-pure ASCII. You, too, will occasionally see Mr. Fuentes or even his Israeli sister-in-law show up in your "pure ASCII, or so I thought" texts. Better-than-Ivory-soap-pure *is* a "toy" case. Only in one's own sandbox can that be guaranteed. Otherwise, Python 3 needs to be instructed to prepare for (occasional) non-ASCII. > Exactly, and finding the Python 3 version of the magic spells > needed in various cases, so they can be documented and publicized, > is what I have been trying to do. For ascii-only use, the magic > spell is 'ascii' in bytes() calls. Except that AFAIK Python 3 already handles pure ASCII pretty much automatically. But pure ASCII doesn't exist for most people any more, even in Kansas; that magic spell will crash. 'latin-1' is a much better spell (except for people who want to crash in appropriate circumstances -- but AFAIK in the group whose needs this thread addresses, they are a tiny minority). > > I don't know of any nice way to say that. > > There was no need to say it. Maybe not, but I think there was. Some of your well-intended recommendations are unrealistic, and letting them pass would be a disservice to the users we are *both* trying to serve. From stephen at xemacs.org Mon Feb 13 05:50:04 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Feb 2012 13:50:04 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8762fbfibn.fsf@uwakimon.sk.tsukuba.ac.jp> Masklinn writes: > Why not open the file in binary mode instead? (and replace `'*'` > by `b'*'` in the startswith call) This will often work, but it's task-dependent. In particular, I believe not just `.startswith()`, but general regexps work with either bytes or str in Python 3. But other APIs may not, and you're going to need to prefix *all* literals (including those in modules your code imports!) with `b`.
So you import a module that does exactly what you want, and be stymied by a TypeError because the module wants Unicode. This would not happen with Python 2, and there's the rub. From stephen at xemacs.org Mon Feb 13 06:04:40 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Feb 2012 14:04:40 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> Message-ID: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> Masklinn writes: > Except it's not processed as text, it's processed as "stuff with ascii > characters in it". Might just as well be cp-1252, or UTF-8, or Shift JIS > (which is kinda-sorta-extended-ascii but not exactly), and while using > an ISO-8859 will yield unicode data that's about the only thing you can > say about it and the actual result will probably be mojibake either > way. That's the coding pedant's way to look at it. However, people who speak only ASCII or Latin 1 are in general not going to see it that way. The ASCII speakers are a pretty clear-cut case. Using 'latin-1' as the codec, almost all things they can do with a 100% ASCII program and a sanely-encoded text (which leaves out Shift JIS, Big 5, and maybe some obsolete Vietnamese encodings, but not much else AFAIK) will pass through the non-ASCII verbatim, or delete it. Latin 1 speakers are harder, because they might do things like convert accented characters to their base, which would break multibyte characters in Asian languages. Still, one suspects that they mostly won't care terribly much about that (if they did, they'd be interested in using Unicode properly, and it would be worth investing the small amount of time required to learn a couple of recipes). 
> By processing it as bytes, it's made explicit that this is not > known and decoded text (which is what unicode strings imply) but > that it's some semi-arbitrary ascii-compatible encoding and that's > the extent of the developer's knowledge and interest in it. No, decoding with 'latin-1' is a far better approach for this purpose. If the name bothers you, give it an alias like 'semi-arbitrary-ascii-compatible'. The problem is that for many operations, b'*' and 'trailing text' are incompatible. Try concatenating them, or testing one against the other with .startswith(), or whatever. Such literals are buried in many modules, and you will lose if you're using bytes because those modules generally assume you're working with str. From stephen at xemacs.org Mon Feb 13 06:12:56 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Feb 2012 14:12:56 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> Message-ID: <8739affh9j.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > I'm now 100% convinced that > encoding="ascii",errors="surrogateescape" is the way to say this in > code. It probably is, for you. If that ever gives you a UnicodeError, you know how to find out how to deal with it. And it probably won't. That may also be a good universal default for Python 3, as it will pass through non-ASCII text unchanged, while raising an error if the program tries to manipulate it (or hand it to a module that validates). (encoding='latin-1' definitely is not a good default.) But I'm not sure of that, and the current approach of using the preferred system encoding is probably better. I don't think either argument applies to everybody who needs such a recipe, though. Many will be best served with encoding='latin-1' by some name. 
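The "encoding='latin-1' by some name" idea can be implemented today with a small codecs search function; the alias name below is only the one floated in this thread, not an existing codec:

```python
import codecs

# Codec lookup lower-cases the requested name and may normalize
# hyphens to underscores, so accept both spellings of the alias.
ALIASES = {'ascii-compatible-bytes', 'ascii_compatible_bytes'}

def _search(name):
    if name in ALIASES:
        return codecs.lookup('latin-1')  # same codec, clearer intent
    return None  # defer to the other registered search functions

codecs.register(_search)

# Every byte value 0-255 now round-trips under the new name:
data = bytes(range(256))
text = data.decode('ascii-compatible-bytes')
assert text.encode('ascii-compatible-bytes') == data
```

Registration is process-global, so code reading `open(path, encoding='ascii-compatible-bytes')` then documents "bytes passed through untouched" without changing latin-1's behaviour at all.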
From ncoghlan at gmail.com Mon Feb 13 06:16:09 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 13 Feb 2012 15:16:09 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <8762fbfibn.fsf@uwakimon.sk.tsukuba.ac.jp> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <8762fbfibn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Feb 13, 2012 at 2:50 PM, Stephen J. Turnbull wrote: > Masklinn writes: > > > Why not open the file in binary mode instead? (and replace `'*'` > > by `b'*'` in the startswith call) > > This will often work, but it's task-dependent. In particular, I > believe not just `.startswith()`, but general regexps work with either > bytes or str in Python 3. But other APIs may not, and you're going to > need to prefix *all* literals (including those in modules your code > imports!) with `b`. So you import a module that does exactly what you > want, and be stymied by a TypeError because the module wants Unicode. > > This would not happen with Python 2, and there's the rub. The other trap is APIs like urllib.parse which explicitly refuse the temptation to guess when it comes to bytes data, and decode it as "ascii+strict". If you want it to do something else that's more permissive (e.g. "latin-1" or "ascii+surrogateescape") then you *have* to decode it to Unicode yourself before handing it over. Really, Python 3 forces programmers to learn enough about Unicode to be able to make the choice between the 4 possible options for processing ASCII-compatible encodings: 1. Process them as binary data. This is often *not* going to be what you want, since many text processing APIs will either only accept Unicode, or only pure ASCII, or require you to supply encoding+errors if you want them to process binary data. 2. Process them as "latin-1". This is the answer that completely bypasses all Unicode integrity checks. If you get fed non-ASCII data, you *will* silently produce gibberish as output. 3. Process them as "ascii+surrogateescape".
This is the *right* answer if you plan solely to manipulate the text and then write it back out in the same encoding as was originally received. You will get errors if you try to write a string with escaped characters out to a non-ascii channel or an ascii channel without surrogateescape enabled. To write such strings to non-ascii channels (e.g. sys.stdout), you need to remember to use something like "ascii+replace" to mask out the values with unknown encoding first. You may still get hard to debug UnicodeEncodeError exceptions when handed data in a non-ASCII compatible encoding (like UTF-16 or UTF-32), but your odds of silently corrupting data are fairly low. 4. Get a third party encoding guessing library and use that instead of waving away the problem of ASCII-incompatible encodings. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Mon Feb 13 06:24:54 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Feb 2012 14:24:54 +0900 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <9CBC5148-A454-4FE3-9F2E-18A7FCB27CE7@gmail.com> <4F374D80.7030309@pearwood.info> Message-ID: <871upzfgpl.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > Yeah, it didn't take long for me to come back around to that point of > view, so I morphed http://bugs.python.org/issue13997 into a docs bug > about clearly articulating the absolute bare minimum knowledge of > Unicode needed to process text in a robust cross-platform manner in > Python 3 instead. +1 I think (as I've said more verbosely elsewhere) that there are two common use cases, corresponding to two different definitions of "robust text processing". (1) Use cases where you would rather risk occasionally corrupting non-ASCII text than risk *any* UnicodeErrors at all *anywhere*. They use encoding='latin-1'.
(2) Use cases where you do not want to deal with encodings just to "pass through" non-ASCII text, but do want that text preserved enough to be willing to risk (rare) UnicodeErrors or validation errors from pedantic Unicode-oriented modules. They use encoding='ascii', errors='surrogateescape'. From stephen at xemacs.org Mon Feb 13 06:42:00 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Feb 2012 14:42:00 +0900 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> Message-ID: <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > > But you obviously do know the convention -- use UTF-8. > > No. I know that a lot of Unix people advocate UTF-8, and I gather it's > rapidly becoming standard in the Unix world. But I work on Windows, > and UTF-8 is not the standard there. I have no idea if UTF-8 is > accepted cross-platform, It is. All of Microsoft's programs (and I suppose most third-party software, too) that I know of will happily import UTF-8-encoded text, and produce it as well. Most Microsoft-specific file formats (eg, Word) use UTF-16 internally, but they can't be read by most text-oriented programs, so in practice they're app/octet-strm. The problem is the one you point out: files you receive from third parties are still fairly likely to be in a non-Unicode encoding. > Fair comment. My point here is that I *am* dealing with "legacy" data > in your sense. And I do so on a day to day basis. UTF-8 is very, very > rare in my world (Windows). Latin-1 (or something close) is common. > > There is no cross-platform standard yet. And probably won't be until > Windows moves to UTF-8 as the standard encoding. Which ain't happening > soon. True. But for personal use, and for communicating with people you have some influence over, you can use/recommend UTF-8 safely as far as I know.
I occasionally get asked by Japanese people why files I send in UTF-8 are broken; it invariably turns out that they sent me a file in Shift JIS that contained a non-JIS (!) character and my software translated it to REPLACEMENT CHARACTER before sending as UTF-8. > I think people are much more aware of the issues, but cross-platform > handling remains a hard problem. I don't wish to make assumptions, but > your insistence that UTF-8 is a viable solution suggests to me that > you don't know much about the handling of Unicode on Windows. I wish I > had that luxury... I don't understand what you mean by that. Windows doesn't make handling any non-Unicode encodings easy, in my experience, except for the local code page. So, OK, if you're in a monolingual Windows environment (eg, the typical Japanese office), everybody uses a common legacy encoding for file exchange (including URLs and MIME filename= :-(, in particular Shift JIS), and only that encoding works well (ie, without the assistance of senior tech support personnel). Handling Unicode, though, isn't really an issue; all of Microsoft's programs happily deal with UTF-8 and UTF-16 (in its several varieties). > And that's even without all this foreign UTF-8 I get from the Unix > guys :-) Apart from the blasted UTF-16, all of it's "ASCII most of > the time". Indeed. Do you really see UTF-16 in files that you process with Python? From stephen at xemacs.org Mon Feb 13 06:49:19 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Mon, 13 Feb 2012 14:49:19 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F383267.2060600@canterbury.ac.nz> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <4F383267.2060600@canterbury.ac.nz> Message-ID: <87y5s7e10g.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > Paul Moore wrote: > > I'd like to write code to do the same > > job without needing to lie about what I know. I'm now 100% convinced > > that encoding="ascii",errors="surrogateescape" is the way to say this > > in code. > > Perhaps there should be a more shortwinded way of > spelling this? Yes! However, I don't think this 1.5-liner needs to be a built-in. (The 1.5-liner for 'open_as_ascii_compatible' was posted elsewhere.) There's also the issue of people who strongly prefer sloppy encoding and Read My Lips: No UnicodeErrors. I disagree with them in all purity, but you know .... From ncoghlan at gmail.com Mon Feb 13 06:54:24 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 13 Feb 2012 15:54:24 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Feb 13, 2012 at 3:04 PM, Stephen J. Turnbull wrote: > The ASCII speakers are a pretty clear-cut case. Using 'latin-1' as > the codec, almost all things they can do with a 100% ASCII program and > a sanely-encoded text (which leaves out Shift JIS, Big 5, and maybe > some obsolete Vietnamese encodings, but not much else AFAIK) will pass > through the non-ASCII verbatim, or delete it. I'd hazard a guess that the non-ASCII compatible encoding most likely to be encountered outside Asia is UTF-16.
The choice is really between "never give me UnicodeErrors, but feel free to silently corrupt the data stream if I do the wrong thing with that data" (i.e. "latin-1") and "correctly handle any ASCII compatible encoding, but still throw UnicodeEncodeError if I'm about to emit corrupted data" ("ascii+surrogateescape"). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mwm at mired.org Mon Feb 13 06:57:40 2012 From: mwm at mired.org (Mike Meyer) Date: Mon, 13 Feb 2012 00:57:40 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120213023640.GF27683@idyll.org> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> <4F3839DB.7080804@molden.no> <20120212174253.156c3660@bhuda.mired.org> <20120212193620.65adad41@bhuda.mired.org> <20120213023640.GF27683@idyll.org> Message-ID: <20120213005740.0db1cb38@bhuda.mired.org> On Sun, 12 Feb 2012 18:36:40 -0800 "C. Titus Brown" wrote: > "All of them except subprocess, on some platforms" is the answer, AFAIK. Which > is kind of the point. Do you have any documentation to back this up? For instance, the collections and random modules are both known to have code in them that isn't thread safe. For the random module, you can check the docstring: Help on method gauss in module random: gauss(self, mu, sigma) method of random.Random instance Gaussian distribution. mu is the mean, and sigma is the standard deviation. This is slightly faster than the normalvariate() function. Not thread-safe without a lock around calls. For the collections module, I quote the functools module: lock = Lock() # needed because ordereddicts aren't threadsafe The argparse and pprint modules both use ordereddicts without either locking them or providing an explanation as to why they don't need to, which makes both of them suspect as well.
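A minimal sketch of the "lock around calls" the gauss docstring asks for (the seed, thread count, and sample sizes here are arbitrary):

```python
import random
import threading

rng = random.Random(42)
lock = threading.Lock()
results = []

def sample(n):
    for _ in range(n):
        # gauss() caches a second variate between calls, so
        # unsynchronized threads can corrupt its internal state;
        # serialize every call with the lock.
        with lock:
            results.append(rng.gauss(0.0, 1.0))

threads = [threading.Thread(target=sample, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(results) == 4000
```

normalvariate() keeps no state between calls, which is why the docstring singles out gauss() rather than the generator as a whole.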
Given those cases, I'm not willing to trust a simple assertion that a module is thread-safe, unless it's from the author or a primary maintainer of the module, or someone who's actually audited the module in question for thread safety. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From ctb at msu.edu Mon Feb 13 07:03:50 2012 From: ctb at msu.edu (C. Titus Brown) Date: Sun, 12 Feb 2012 22:03:50 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120213005740.0db1cb38@bhuda.mired.org> References: <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> <4F3839DB.7080804@molden.no> <20120212174253.156c3660@bhuda.mired.org> <20120212193620.65adad41@bhuda.mired.org> <20120213023640.GF27683@idyll.org> <20120213005740.0db1cb38@bhuda.mired.org> Message-ID: <20120213060350.GA11284@idyll.org> On Mon, Feb 13, 2012 at 12:57:40AM -0500, Mike Meyer wrote: > On Sun, 12 Feb 2012 18:36:40 -0800 > "C. Titus Brown" wrote: > > > "All of them except subprocess, on some platforms" is the answer, AFAIK. Which > > is kind of the point. > > Do you have any documentation to back this up? For instance, The > collections and random module are both known to have code in them that > isn't thread safe. For the random module, you can check the docstring: > > Help on method gauss in module random: > > gauss(self, mu, sigma) method of random.Random instance > Gaussian distribution. > > mu is the mean, and sigma is the standard deviation. This is > slightly faster than the normalvariate() function. > > Not thread-safe without a lock around calls. > > For the collections module, I quote the functools module: > > lock = Lock() # needed because ordereddicts aren't threadsafe > > The argparse and pprint modules both use ordereddicts without either > locking them providing an explanation as to why they don't need to, > which makes both of them suspect as well. 
> > Given those cases, I'm not willing to trust a simple assertion that a > module is thread-safe, unless it's from the author or a primary > maintainer of the module, or someone who's actually audited the module > in question for thread safety. Good points; I was equating thread safety with not crashing, when I should have been thinking about consistency in other ways. thanks, --titus p.s. Why did you take a private e-mail response and reply to it to the group? Bad netiquette & rather rude. (Private not because I object to being pointed out as being wrong, but because I'm tired of these long discussions being sent to python-ideas.) -- C. Titus Brown, ctb at msu.edu From stephen at xemacs.org Mon Feb 13 07:23:33 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Feb 2012 15:23:33 +0900 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: <20120212043411.GA442@cskk.homeip.net> References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <20120212043411.GA442@cskk.homeip.net> Message-ID: <87r4xzdzfe.fsf@uwakimon.sk.tsukuba.ac.jp> Cameron Simpson writes: > At least with Python 3 you find out early that you're doing something > dodgy. The point is that there is a use case for "doing something dodgy." See Paul Moore's subthread for an example and discussion. However, I think people who do something dodgy should be forced to make it explicit in their code. From stephen at xemacs.org Mon Feb 13 07:30:34 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 13 Feb 2012 15:30:34 +0900 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87pqdjdz3p.fsf@uwakimon.sk.tsukuba.ac.jp> Sorry for the self-reply, but this should be clarified. Stephen J. Turnbull writes: > know. 
I occasionally get asked by Japanese people why files I send in > UTF-8 are broken; it invariably turns out that they sent me a file in > Shift JIS that contained a non-JIS (!) character and my software > translated it to REPLACEMENT CHARACTER before sending as UTF-8. Ie, the breakage that you're likely to encounter in using UTF-8 wherever possible is *very* minor, and typically related to somebody else failing to conform to standards. From cs at zip.com.au Mon Feb 13 08:21:05 2012 From: cs at zip.com.au (Cameron Simpson) Date: Mon, 13 Feb 2012 18:21:05 +1100 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: <87r4xzdzfe.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87r4xzdzfe.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20120213072105.GA8419@cskk.homeip.net> On 13Feb2012 15:23, Stephen J. Turnbull wrote: | Cameron Simpson writes: | > At least with Python 3 you find out early that you're doing something | > dodgy. | | The point is that there is a use case for "doing something dodgy." | See Paul Moore's subthread for an example and discussion. Yes. | However, I think people who do something dodgy should be forced to | make it explicit in their code. I think I agree here, too. -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ There are old climbers, and there are bold climbers; but there are no old bold climbers. From p.f.moore at gmail.com Mon Feb 13 09:12:43 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 13 Feb 2012 08:12:43 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <8739affh9j.fsf@uwakimon.sk.tsukuba.ac.jp> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <8739affh9j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 13 February 2012 05:12, Stephen J. Turnbull wrote: > Paul Moore writes: > > > I'm now 100% convinced that > > encoding="ascii",errors="surrogateescape" is the way to say this in > > code.
> It probably is, for you. If that ever gives you a UnicodeError, you > know how to find out how to deal with it. And it probably won't. And yet, after your earlier posting on latin-1, and your comments here, I'm less certain. Thank you so much :-) Seriously, I find these discussions about Unicode immensely useful. I now have a much better feel for how to deal with (and think about) text in "unknown but mostly ASCII" format, which can only be a good thing. > I don't think either argument applies to everybody who needs such a > recipe, though. Many will be best served with encoding='latin-1' by > some name. Probably the key question is, how do we encapsulate this debate in a simple form suitable for people to find out about *without* feeling like they "have to learn all about Unicode"? A note in the Unicode HOWTO seems worthwhile, but how to get people to look there? Given that this is people who don't want to delve too deeply into Unicode issues. Just to be clear, my reluctance to "do the right thing" was *not* because I didn't want to understand Unicode - far from it, I'm interested in, and inclined towards, "doing Unicode right". The problem is that I know enough to realise that "proper" handling of files where I don't know the encoding, and it seems to be inconsistent sometimes (both between files, and even on occasion within a file), is a seriously hard issue. And I don't want to get into really hard Unicode issues for what, in practical terms, is a simple problem as it's one-off code and minor corruption isn't really an issue. Paul.
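The "ascii + surrogateescape" recipe under discussion can be exercised in a few lines. A sketch with invented byte values:

```python
# Decode unknown-but-mostly-ASCII bytes without ever raising, while
# preserving the odd bytes for a lossless round trip.
data = b"caf\xe9 au lait"  # one non-ASCII byte of unknown encoding

text = data.decode("ascii", errors="surrogateescape")
# The stray byte becomes a lone surrogate instead of an error:
assert "\udce9" in text

# Encoding back with the same handler restores the bytes exactly:
assert text.encode("ascii", errors="surrogateescape") == data

# But a strict encoder still refuses to emit the smuggled byte, which
# is the UnicodeEncodeError safety net Nick describes above:
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    print("refused")
```

This is exactly the trade-off in the thread: latin-1 would have decoded the same bytes silently and let them flow onward as ordinary (possibly wrong) characters.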
From ubershmekel at gmail.com Mon Feb 13 09:26:09 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Mon, 13 Feb 2012 10:26:09 +0200 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <8739affh9j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Feb 13, 2012 10:13 AM, "Paul Moore" wrote: > > On 13 February 2012 05:12, Stephen J. Turnbull wrote: > > Paul Moore writes: > > > > > I'm now 100% convinced that > > > encoding="ascii",errors="surrogateescape" is the way to say this in > > > code. > > > > It probably is, for you. If that ever gives you a UnicodeError, you > > know how to find out how to deal with it. And it probably won't. > > And yet, after your earlier posting on latin-1, and your comments > here, I'm less certain. Thank you so much :-) > > Seriously, I find these discussions about Unicode immensely useful. I > now have a much better feel for how to deal with (and think about) > text in "unknown but mostly ASCII" format, which can only be a good > thing. > > > I don't think either argument applies to everybody who needs such a > > recipe, though. Many will be best served with encoding='latin-1' by > > some name. > > Probably the key question is, how do we encapsulate this debate in a > simple form suitable for people to find out about *without* feeling > like they "have to learn all about Unicode"? A note in the Unicode > HOWTO seems worthwhile, but how to get people to look there? Given > that this is people who don't want to delve too deeply into Unicode > issues. > > Just to be clear, my reluctance to "do the right thing" was *not* > because I didn't want to understand Unicode - far from it, I'm > interested in, and inclined towards, "doing Unicode right". 
The > problem is that I know enough to realise that "proper" handling of > files where I don't know the encoding, and it seems to be inconsistent > sometimes (both between files, and even on occasion within a file), is > a seriously hard issue. And I don't want to get into really hard > Unicode issues for what, in practical terms, is a simple problem as > it's one-off code and minor corruption isn't really an issue. > > Paul. Adding a url for help in the exception string that points to a python unicode faq sounds like a good idea. -------------- next part -------------- An HTML attachment was scrubbed... URL: From christopherreay at gmail.com Mon Feb 13 09:50:03 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Mon, 13 Feb 2012 10:50:03 +0200 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <8739affh9j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: +1 for the URL in the exception. Well in all exceptions Bringing the language into the 21st century. Great entry points for learning about the language. Whilst Google provides an excellent service in finding documentation, it seems that a programming language has other methods of defining entry points for learning, being a complex but (mostly) deterministic thing. So exceptions with URLs. The URLs point to a kind of "knowledge base wiki" sorts of things where the "What is your intent/usecase" can be matched up with the deterministic state we know the interpreter is in. With something like encodings, which can be happily ignored by someone until poof, suddenly they just have mush, finding out things like "It's possible printing the string to the screen is giving the error", and "There are libraries which guess encodings" and "latin-1" is a magic bullet can take many many days of searching.
Also it may be possible, from this perspective, to show ways that the developer can gather more deterministic information about his interpreter's state to narrow down his intent for the Knowledge Base (e.g. if it's a print statement that throws the error, it's possible the program doesn't have any encoding issues, except debugging statements). The encoding issue here is a great example of this because of the complexity and mobility of encodings (i.e. they've changed a lot). There must be other good examples which can fire up equally strong and informative discussion on "options" and their limitations and benefits. I'd be very interested in formalising the idea of a "KnowledgeBase Wiki thing", maybe there already is one... -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmjohnson.mailinglist at gmail.com Mon Feb 13 10:19:15 2012 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Sun, 12 Feb 2012 23:19:15 -1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <8739affh9j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Feb 12, 2012, at 10:50 PM, Christopher Reay wrote: > +1 for the URL in the exception. Well in all exceptions > > Bringing the language into the 21st century. > Great entry points for learning about the language. That's not a bad idea. We might want to use some kind of URL shortener for length and future proofing though. If the site changes, we can have redirection of the short URLs updated.
Something like http://pyth.on/e1234 --> http://docs.python.org/library/exceptions.html From p.f.moore at gmail.com Mon Feb 13 12:14:42 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 13 Feb 2012 11:14:42 +0000 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 13 February 2012 05:42, Stephen J. Turnbull wrote: > Paul Moore writes: > > > > But you obviously do know the convention -- use UTF-8. > > > > No. I know that a lot of Unix people advocate UTF-8, and I gather it's > > rapidly becoming standard in the Unix world. But I work on Windows, > > and UTF-8 is not the standard there. I have no idea if UTF-8 is > > accepted cross-platform, > > It is. All of Microsoft's programs (and I suppose most third-party > software, too) that I know of will happily import UTF-8-encoded text, > and produce it as well. Most Microsoft-specific file formats (eg, > Word) use UTF-16 internally, but they can't be read by most > text-oriented programs, so in practice they're app/octet-strm. If I create a new text file in Notepad or Vim on my PC, it's not created in UTF-8 by default. Vim uses Latin-1, and Notepad uses "ANSI" (which I'm pretty sure translates to CP1252, but there are so few differences between this and latin-1 that I can't easily test this at the moment). If I do "chcp" on a console window, I get codepage 850, and in CMD, echo a?b >file.txt encodes the file in CP850. echo a?b >file.txt in Powershell creates little-endian UTF-16 with a BOM. The out-file cmdlet in Powershell (which lets me specify an encoding to override the UTF-16 of the standard redirection) says this about the encoding parameter: -Encoding Specifies the type of character encoding used in the file.
Valid values are "Unicode", "UTF7", "UTF8", "UTF32", "ASCII", "BigEndianUnicode", "Default", and "OEM". "Unicode" is the default. "Default" uses the encoding of the system's current ANSI code page. "OEM" uses the current original equipment manufacturer code page identifier for the operating system. With this I can at least get UTF-8 (with BOM). But it's a long way from simple to do so... Basically, in my experience, Windows users are not likely to produce UTF-8 formatted files unless they make specific efforts to do so. I have heard anecdotal evidence that attempts to set the configuration on Windows to produce UTF-8 by default hit significant issues. So don't expect to see Windows users producing UTF-8 by default anytime soon. > The problem is the one you point out: files you receive from third > parties are still fairly likely to be in a non-Unicode encoding. And, if I don't concentrate, I produce non-UTF8 files myself. The good news is that Python 3 generally works fine with files I produce myself, as it follows the system encoding. >python Python 3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import locale >>> locale.getpreferredencoding() 'cp1252' Near enough, as the only character I tend to use is ?, and latin-1 and cp1252 concur on that (and I know what CP850 ? signs look like in latin-1/cp1252, so I can spot that particular error). Of course, that means that processing UTF-8 always needs me to explicitly set the encoding. Which in turn means that (if I care - back to the original point) I need to go checking for non-ASCII characters, do a quick hex dump to check they look like utf-8 and set the encoding. Or go with the default and risk mojibake (cp1252 is not latin-1 AIUI, so won't roundtrip bytes). Or go the "don't care" route.
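The "quick hex dump" check described here can be automated crudely: try strict UTF-8 first and fall back to the Windows default. A sketch only, with an invented function name; real charset detection (as done by libraries such as chardet) is considerably harder:

```python
def guess_decode(raw: bytes) -> str:
    """Try strict UTF-8, then the Windows default cp1252, then latin-1
    (which maps every byte value, so it never fails).  A crude
    heuristic, nothing like a real charset detector."""
    for encoding in ("utf-8", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")

# UTF-8 input survives the strict first pass; cp1252 input fails it
# (a lone 0xEF byte is an incomplete UTF-8 sequence) and falls through.
print(guess_decode("na\u00efve".encode("utf-8")))
print(guess_decode("na\u00efve".encode("cp1252")))
```

This works because well-formed UTF-8 rarely occurs by accident: most cp1252 text with accented characters fails a strict UTF-8 decode, so order of trial matters.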
All of this simply because I feel that it's impolite to corrupt someone's name in my output just because they have an accented letter in their name :-) As I say: - I know what to do - It can be a lot of work - Frankly, the damage is minor (these are usually personal or low-risk scripts) - The temptation to say "stuff it" and get on with my life is high - It frustrates me that Python by default tempts me to *not* do the right thing Maybe the answer is to have some form of encoding-detection function in the standard library. It doesn't have to be 100% accurate, and it certainly shouldn't be used anywhere by default, but it would be available for people who want to do the right thing without over-engineering things totally. > True. But for personal use, and for communicating with people you > have some influence over, you can use/recommend UTF-8 safely as far I > know. I occasionally get asked by Japanese people why files I send in > UTF-8 are broken; it invariably turns out that they sent me a file in > Shift JIS that contained a non-JIS (!) character and my software > translated it to REPLACEMENT CHARACTER before sending as UTF-8. Maybe it's different in Japan, where character sets are more of a common knowledge issue? But if I tried to say to one of my colleagues that the spooled output of a SQL query they sent me (from a database with one encoding, through a client with no real encoding handling beyond global OS-level defaults) didn't use UTF-8, I'd get a blank look at best. I've had to debug encoding issues for database programmers only to find that they don't even know what encodings are about - and they are writing multilingual applications! (Before someone says, yes, of course this is terrible, and shouldn't happen - but it does, and these are the places I get weirdly-encoded text files from...) > > I think people are much more aware of the issues, but cross-platform > > handling remains a hard problem. I don't wish to make assumptions, but
> your insistence that UTF-8 is a viable solution suggests to me that > > you don't know much about the handling of Unicode on Windows. I wish I > > had that luxury... > > I don't understand what you mean by that. Windows doesn't make > handling any non-Unicode encodings easy, in my experience, except for > the local code page. So, OK, if you're in a monolingual Windows > environment (eg, the typical Japanese office), everybody uses a common > legacy encoding for file exchange (including URLs and MIME filename= > :-(, in particular Shift JIS), and only that encoding works well (ie, > without the assistance of senior tech support personnel). Handling > Unicode, though, isn't really an issue; all of Microsoft's programs > happily deal with UTF-8 and UTF-16 (in its several varieties). What I was trying to say was that typical Windows environments (where people don't interact often with Unix utilities, or if they do it's with ASCII characters almost exclusively) hide the details of Unicode from the end user to the extent that they don't know what's going on under the hood, and don't need to care. Much like Python 2, I guess :-) > Indeed. Do you really see UTF-16 in files that you process with > Python? Powershell generates it. See above. But no, not often, and it's easy to fix. Meh, for easy read cmd /c "iconv -f utf-16 -t utf-8 u1 >u2" or set-content u2 (get-content u1) -encoding utf8 if I don't mind a BOM. No, Unicode on Windows isn't easy :-( Paul From christopherreay at gmail.com Mon Feb 13 12:41:57 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Mon, 13 Feb 2012 13:41:57 +0200 Subject: [Python-ideas] The concurrency discussion is off-topic!
In-Reply-To: <20120213023512.GE27683@idyll.org> References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> <20120213023512.GE27683@idyll.org> Message-ID: Its not like there is a huge amount of traffic in python-ideas But if its annoying people ill sign up to concurrency sig How many mailing lists do I need to sign up to to make sure Im not missing something I might be intersted in. Email "Subject" fields, I find, are quite useful -------------- next part -------------- An HTML attachment was scrubbed... URL: From christopherreay at gmail.com Mon Feb 13 12:42:53 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Mon, 13 Feb 2012 13:42:53 +0200 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> <20120213023512.GE27683@idyll.org> Message-ID: Also, coming up with ideas for new ways of doing things should include significant discussions of what is already there On 13 February 2012 13:41, Christopher Reay wrote: > Its not like there is a huge amount of traffic in python-ideas > > But if its annoying people ill sign up to concurrency sig > > How many mailing lists do I need to sign up to to make sure Im not missing > something I might be intersted in. > > Email "Subject" fields, I find, are quite useful > -- Be prepared to have your predictions come true -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Feb 13 13:10:36 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Feb 2012 13:10:36 +0100 Subject: [Python-ideas] The concurrency discussion is off-topic! References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> <20120213023512.GE27683@idyll.org> Message-ID: <20120213131036.40aed9c2@pitrou.net> On Mon, 13 Feb 2012 13:41:57 +0200 Christopher Reay wrote: > Its not like there is a huge amount of traffic in python-ideas Are you kidding? 
The idiotic "TIOBE -3%" discussion thread is probably in the hundred of answers now. That's for a completely vacuous thread launched 4 days ago by a well-known troll. python-ideas is not a playground for people with an opinion. It's a communication tool for core development. Thanks Antoine. From christopherreay at gmail.com Mon Feb 13 13:21:17 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Mon, 13 Feb 2012 14:21:17 +0200 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: <20120213131036.40aed9c2@pitrou.net> References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> <20120213023512.GE27683@idyll.org> <20120213131036.40aed9c2@pitrou.net> Message-ID: Hmm. Stimulating people to express what they believe the major hurdles to uptake of use of the language are, discuss their solutions. 100 mails in 4 days is light traffic afaik. But im fairly new to this, so ill bow out. Still, I thought it was itneresting and informative discussion with lots of specific information. Where would you suggest discussion should have taken place? -- Be prepared to have your predictions come true -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Feb 13 13:26:28 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Feb 2012 13:26:28 +0100 Subject: [Python-ideas] The concurrency discussion is off-topic! References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> <20120213023512.GE27683@idyll.org> <20120213131036.40aed9c2@pitrou.net> Message-ID: <20120213132628.6658bfa7@pitrou.net> Hello, > Still, I thought it was itneresting and informative discussion with > lots of specific information. Not really. The subjects discussed there, e.g. the GIL and multithreading, have already been rehashed countless times. Perhaps you may find them interesting if you haven't really followed the mailing-lists in the past. 
> Where would you suggest discussion should have taken place? Well, in that case, it should really have been /dev/null :( Regards Antoine. From christopherreay at gmail.com Mon Feb 13 13:44:58 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Mon, 13 Feb 2012 14:44:58 +0200 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: <20120213132628.6658bfa7@pitrou.net> References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> <20120213023512.GE27683@idyll.org> <20120213131036.40aed9c2@pitrou.net> <20120213132628.6658bfa7@pitrou.net> Message-ID: lol why, then, are people with experience on the lists using this as a space to express themselves? -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Mon Feb 13 13:50:24 2012 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 13 Feb 2012 12:50:24 +0000 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> <20120213023512.GE27683@idyll.org> Message-ID: On 13/02/2012 11:41, Christopher Reay wrote: > Its not like there is a huge amount of traffic in python-ideas > > But if its annoying people ill sign up to concurrency sig > > How many mailing lists do I need to sign up to to make sure Im not missing > something I might be intersted in. There are 326 listed under gmane.comp.python, please enjoy them all :) > > Email "Subject" fields, I find, are quite useful > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Cheers. Mark Lawrence. From solipsis at pitrou.net Mon Feb 13 14:01:16 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 13 Feb 2012 14:01:16 +0100 Subject: [Python-ideas] The concurrency discussion is off-topic! 
References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> <20120213023512.GE27683@idyll.org> <20120213131036.40aed9c2@pitrou.net> <20120213132628.6658bfa7@pitrou.net> Message-ID: <20120213140116.46469912@pitrou.net> On Mon, 13 Feb 2012 14:44:58 +0200 Christopher Reay wrote: > lol > > why, then, are people with experience on the lists using this as a space to > express themselves? Ask them. From christopherreay at gmail.com Mon Feb 13 14:07:59 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Mon, 13 Feb 2012 15:07:59 +0200 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: <20120213140116.46469912@pitrou.net> References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> <20120213023512.GE27683@idyll.org> <20120213131036.40aed9c2@pitrou.net> <20120213132628.6658bfa7@pitrou.net> <20120213140116.46469912@pitrou.net> Message-ID: Hey, Sturla, Mike and other people who clearly know what they are talking about and this ecosystem, why are we using this space for this discussion? Felt all kind of warm and fuzzy and community like to me. On 13 February 2012 15:01, Antoine Pitrou wrote: > On Mon, 13 Feb 2012 14:44:58 +0200 > Christopher Reay > wrote: > > lol > > > > why, then, are people with experience on the lists using this as a space > to > > express themselves? > > Ask them. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Be prepared to have your predictions come true -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacrolix at gmail.com Mon Feb 13 14:10:28 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Mon, 13 Feb 2012 21:10:28 +0800 Subject: [Python-ideas] The concurrency discussion is off-topic! 
In-Reply-To: References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> <20120213023512.GE27683@idyll.org> Message-ID: Clearly more are required. On Feb 13, 2012 8:50 PM, "Mark Lawrence" wrote: > On 13/02/2012 11:41, Christopher Reay wrote: > >> Its not like there is a huge amount of traffic in python-ideas >> >> But if its annoying people ill sign up to concurrency sig >> >> How many mailing lists do I need to sign up to to make sure Im not missing >> something I might be intersted in. >> > > There are 326 listed under gmane.comp.python, please enjoy them all :) > > >> Email "Subject" fields, I find, are quite useful >> >> ______________________________**_________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/**mailman/listinfo/python-ideas >> > > -- > Cheers. > > Mark Lawrence. > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jnoller at gmail.com Mon Feb 13 14:36:58 2012 From: jnoller at gmail.com (Jesse Noller) Date: Mon, 13 Feb 2012 08:36:58 -0500 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> Message-ID: On Sunday, February 12, 2012 at 7:06 PM, Matt Joiner wrote: > +1, that list is dead > On Feb 13, 2012 6:33 AM, "Sturla Molden" wrote: > > Den 12.02.2012 23:21, skrev Mike Meyer: > > > Please take the concurrency discussion to: > > > > > > http://mail.python.org/mailman/listinfo/concurrency-sig > > > > It seems that list has nearly zero traffic. Why post to a list that nobody reads? > > > > Sturla That list is dead because no one posts to it with good ideas and brainstorming, or to discuss issues like this, not because it's dead. 
Do I need to put up a signpost that says "Oh god please come discuss patches and the like"? From christopherreay at gmail.com Mon Feb 13 14:50:34 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Mon, 13 Feb 2012 15:50:34 +0200 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> Message-ID: On 13 February 2012 15:36, Jesse Noller wrote: > > > On Sunday, February 12, 2012 at 7:06 PM, Matt Joiner wrote: > > > +1, that list is dead > > On Feb 13, 2012 6:33 AM, "Sturla Molden" sturla at molden.no)> wrote: > > > Den 12.02.2012 23:21, skrev Mike Meyer: > > > > Please take the concurrency discussion to: > > > > > > > > http://mail.python.org/mailman/listinfo/concurrency-sig > > > > > > It seems that list has nearly zero traffic. Why post to a list that > nobody reads? > > > > > > Sturla > That list is dead because no one posts to it with good ideas and > brainstorming, or to discuss issues like this, not because it's dead. Do I > need to put up a signpost that says "Oh god please come discuss patches and > the like"? > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Be prepared to have your predictions come true -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Feb 13 15:02:11 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 14 Feb 2012 01:02:11 +1100 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F3917E3.10004@pearwood.info> Paul Moore wrote: > Maybe the answer is to have some form of encoding-detection function > in the standard library. 
It doesn't have to be 100% accurate, and it > certainly shouldn't be used anywhere by default, but it would be > available for people who want to do the right thing without > over-engineering things totally. Encoding guessers have their place, but they should only be used by those who know what they're getting themselves into. http://blogs.msdn.com/b/oldnewthing/archive/2004/03/24/95235.aspx http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx Note that even Raymond Chen makes the classic error of conflating encodings (UTF-16) with Unicode. +0 on providing an encoding guesser, but -1 on making operate by default. -- Steven From techtonik at gmail.com Mon Feb 13 16:15:29 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 13 Feb 2012 17:15:29 +0200 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: <20120213131036.40aed9c2@pitrou.net> References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> <20120213023512.GE27683@idyll.org> <20120213131036.40aed9c2@pitrou.net> Message-ID: On Mon, Feb 13, 2012 at 3:10 PM, Antoine Pitrou wrote: > On Mon, 13 Feb 2012 13:41:57 +0200 > Christopher Reay > wrote: > > Its not like there is a huge amount of traffic in python-ideas > > Are you kidding? > > The idiotic "TIOBE -3%" discussion thread is probably in the hundred of > answers now. That's for a completely vacuous thread launched 4 days ago > by a well-known troll. > Well-known troll is +1 that proposal to write to an empty list sounds like "you're not welcome here with your multiprocessing". I guess you didn't mention that, so the problem probably that you can not handle the list traffic, and read all interesting Python ideas - not speaking about answering to all of them with your opinion. No problem with that either - nobody can. That's why there was proposal about Etherpad, which summaries would be as interesting for Python community as different blog posts from core devs. 
python-ideas is not a playground for people with an opinion. > It's a communication tool for core development. > Since many of (potential) core devs are not able to cope up with traffic in main lists, I'd propose to look for a better communication tool that at least allows easy selective subscription (like Google Groups) and makes sure interested parties have accessible instrument (for a reference to accessibility read Steve Yegge's rant at https://plus.google.com/112678702228711889851/posts/eVeouesvaVX) to subscribe and participate (search, tree with all mailing lists and one-button subscription). I've heard Pinax guys are rethinking their Tribes/Groups feature - https://groups.google.com/forum/?fromgroups#!topic/pinax-users/Wze7L2LlwjM- perhaps you should communicate with them. > Thanks P.S. I am not changing the subject of this thread to stop spawning another "completely vacuous thread". P.P.S. Too bad I can not be at PyCon this year. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ctb at msu.edu Mon Feb 13 16:18:54 2012 From: ctb at msu.edu (C. Titus Brown) Date: Mon, 13 Feb 2012 07:18:54 -0800 Subject: [Python-ideas] The concurrency discussion is off-topic! In-Reply-To: References: <20120212172105.2c98f820@bhuda.mired.org> <4F383E1C.90905@molden.no> <20120213023512.GE27683@idyll.org> <20120213131036.40aed9c2@pitrou.net> Message-ID: <20120213151854.GD15826@idyll.org> Please move any further discussion of both concurrency and how or where to discuss concurrency issues to concurrency-sig. Further discussion will be moderated. thanks all, --titus On Mon, Feb 13, 2012 at 05:15:29PM +0200, anatoly techtonik wrote: > On Mon, Feb 13, 2012 at 3:10 PM, Antoine Pitrou wrote: > > > On Mon, 13 Feb 2012 13:41:57 +0200 > > Christopher Reay > > wrote: > > > Its not like there is a huge amount of traffic in python-ideas > > > > Are you kidding? 
> > > > The idiotic "TIOBE -3%" discussion thread is probably in the hundred of > > answers now. That's for a completely vacuous thread launched 4 days ago > > by a well-known troll. > > > > Well-known troll is +1 that proposal to write to an empty list sounds like > "you're not welcome here with your multiprocessing". I guess you didn't > mention that, so the problem probably that you can not handle the list > traffic, and read all interesting Python ideas - not speaking about > answering to all of them with your opinion. No problem with that either - > nobody can. That's why there was proposal about Etherpad, which summaries > would be as interesting for Python community as different blog posts from > core devs. > > python-ideas is not a playground for people with an opinion. > > It's a communication tool for core development. > > > > Since many of (potential) core devs are not able to cope up with traffic in > main lists, I'd propose to look for a better communication tool that at > least allows easy selective subscription (like Google Groups) and makes > sure interested parties have accessible instrument (for a reference to > accessibility read Steve Yegge's rant at > https://plus.google.com/112678702228711889851/posts/eVeouesvaVX) to > subscribe and participate (search, tree with all mailing lists and > one-button subscription). I've heard Pinax guys are rethinking their > Tribes/Groups feature - > https://groups.google.com/forum/?fromgroups#!topic/pinax-users/Wze7L2LlwjM- > perhaps you should communicate with them. > > > > Thanks > > > P.S. I am not changing the subject of this thread to stop spawning another > "completely vacuous thread". > P.P.S. Too bad I can not be at PyCon this year. > -- > anatoly t. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- C. 
Titus Brown, ctb at msu.edu From mwm at mired.org Mon Feb 13 17:15:43 2012 From: mwm at mired.org (Mike Meyer) Date: Mon, 13 Feb 2012 11:15:43 -0500 Subject: [Python-ideas] multiprocessing IPC In-Reply-To: <4F38221A.5050208@molden.no> References: <4F35A9EF.7030309@molden.no> <20120211205200.2667c68f@bhuda.mired.org> <4F3735F8.10607@molden.no> <4F37CE27.5070908@molden.no> <4F38221A.5050208@molden.no> Message-ID: On Sun, Feb 12, 2012 at 3:33 PM, Sturla Molden wrote: > Den 12.02.2012 16:20, skrev shibturn: >> But if his /tmp is a tmpfs file system (which it usually is on Linux) then >> I think it is entirely equivalent. ?Or he could create the file in /dev/shm >> instead. > It seems that on Linux /tmp is backed by shared memory. > Which sounds rather strange to a Windows user, as the raison d'etre for > tempfiles is temporary storage space that goes beyond physial RAM. That's what /tmp was created for on Unix as well. But we've since added virtual memory for that same purpose. Modern kernel virtual address spaces are bigger than disks, and the IO and VM subsystem buffer caches have similar performance, and may even share buffers. So the major difference between memory-backed and fs-backed /tmp is that an fs-backed one survives a reboot, which creates security issues on multiuser systems. In theory, you could create a file on a memory-backed /tmp that's bigger than any data structure your process can hold. But modern software tends to use /tmp for things that need to be shared between processes (unix-domain sockets, lock files, etc), and legacy software is usually quite happy with a few tens of megabytes on /tmp. So it's rather common for a systems per-process virtual address limit to be bigger than /tmp. References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <8739affh9j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Feb 13, 2012 at 11:19 AM, Carl M. 
Johnson < cmjohnson.mailinglist at gmail.com> wrote: > > On Feb 12, 2012, at 10:50 PM, Christopher Reay wrote: > > > +1 for the URL in the exception. Well in all exceptions > > > > Bringing the language into the 21st century. > > Great entry points for learning about the language. > > That's not a bad idea. We might want to use some kind of URL shortener for > length and future proofing though. If the site changes, we can have > redirection of the short URLs updated. Something like http://pyth.on/e1234 --> > http://docs.python.org/library/exceptions.html > > I think we can use wiki.python.org/ for hosting exception specific content. E.g. http://wiki.python.org/moin/PrintFails needs a lot of love and care. Microsoft actually has documentation for every single compiler and linker error that ever existed. Not that we have the same amount of resources at our disposal, but it is a nice concept. Concerning the shortened url's - I'd go with trustworthiness over compactness - http://python.org/s1 or http://python.org/s/1 would be better than http://pyth.on/1 imo. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From christopherreay at gmail.com Mon Feb 13 19:08:18 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Mon, 13 Feb 2012 20:08:18 +0200 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <8739affh9j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Entry Points: Google: Natural Language user searches based on "intent of code" Module Name/Function names: user wants more details on something he already knows exists Exception Name: Great, finds you the exception definition just like any other Class name. 
Googling for "UnicodeEncodingError Python" gives me a link to the 2.7 documentation, which says at the top "this is not yet updated for python 3" - I don't know how important this is. Googling for "UnicodeEncodingError Python 3" gives http://docs.python.org/release/3.0.1/howto/unicode.html This is a great document. It explains encoding very well. The unicode tutorial doesn't mention anything about the terminal output encoding to STDOUT, and whilst this is obvious after a while, it is not always clear that printing to the terminal is the cause of the attempt to encode as ascii during a print statement. To some extent, the unicode tutorial doesn't have the practical specifics that are being discussed in this thread, which is targeted at the "learning curve into Python". I think the most important points here are: The exception knows what version of Python it's from (which allows the language to make changes). It would be nice to have a wiki-type document targeted by the exception/error, with sections like: - "Python Official Docs" - Murgh, Fix This NOW, Don't care how dirty - Contributed Docs we have none and loved/stack overflow etc... - Discussions from python-dev / python-ideas - PEPs that apply. The point is that Google can't be responsible for making sure all these sections are laid out, obviously correct, or constant
> > And that's even without all this foreign UTF-8 I get from the Unix > > guys :-) Apart from the blasted UTF-16, all of it's "ASCII most of > > the time". > > Indeed. Do you really see UTF-16 in files that you process with > Python? I've only had one real use-case (and it was Java, but could easily be Python). We wanted to be able to export settings as a CSV file to be opened in Excel, modified and then re-imported. Turns out that if you want to open non-ascii CSV files in Excel, they must be encoded as (IIRC) UTF-16LE (i.e. without a BOM). I think you can save as other encodings, but that's the only one you can reliably open. Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjjewett at gmail.com Tue Feb 14 01:39:40 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 13 Feb 2012 19:39:40 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> References: <4F34E393.9020105@hotpy.org> <6774A8D7-548B-4651-8879-1158621157E5@gmail.com> Message-ID: On Fri, Feb 10, 2012 at 9:52 AM, Massimo Di Pierro wrote: > The GC vs reference counting (RC) is the heart of the matter. > With RC every time a variable is allocated or deallocated you need > to lock the counter uh... if you need to lock it for allocation, that is an issue with the malloc, rather than refcounting. And if you need to lock it for deallocation, then your program already has a (possibly threading-race-condition-related) bug. The problem is that you need to lock the memory for writing every time you acquire or release a view of the object, even if you won't be modifying the object. (And this changing of the refcount makes copy-on-write copy too much.)
There are plenty of ways around that, mostly by using thread-local (or process-local or machine-local) proxies; the original object only gets one incref/decref from each remote thread; if sharable objects are delegated to a memory-controller thread, even better. Once you have the infrastructure for this, you could also more easily support "permanent" objects like None. The catch is that the overhead of having the refcount+pointer (even without the proxies) instead of just "refcount 4 bytes ahead" turns out to be pretty high, so those forks (and extensions, if I remember pyro http://irmen.home.xs4all.nl/pyro/ correctly) never really caught on. Maybe that will change when the number of cores that aren't already in use for other processes really does skyrocket. -jJ From ethan at stoneleaf.us Tue Feb 14 03:17:38 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 13 Feb 2012 18:17:38 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120213060350.GA11284@idyll.org> References: <4F3563C4.2050703@egenix.com> <4F3575FB.60700@molden.no> <4F3839DB.7080804@molden.no> <20120212174253.156c3660@bhuda.mired.org> <20120212193620.65adad41@bhuda.mired.org> <20120213023640.GF27683@idyll.org> <20120213005740.0db1cb38@bhuda.mired.org> <20120213060350.GA11284@idyll.org> Message-ID: <4F39C442.5030206@stoneleaf.us> C. Titus Brown wrote: >> p.s. Why did you take a private e-mail response and reply to it to the group? Perhaps he thought it was private by mistake. I'm glad he did, though, as I am learning quite a bit from this rather rambling thread. ~Ethan~ From stephen at xemacs.org Tue Feb 14 09:02:16 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 14 Feb 2012 17:02:16 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > I'd hazard a guess that the non-ASCII compatible encoding mostly > likely to be encountered outside Asia is UTF-16. In other words, only people who insist on messing with application/octet-stream files (like Word ;-). They don't deserve the pain, but they're gonna feel it anyway. > The choice is really between "never give me UnicodeErrors, but feel > free to silently corrupt the data stream if I do the wrong thing > with that data" (i.e. "latin-1") Yes. > and "correctly handle any ASCII compatible encoding, but still > throw UnicodeEncodeError if I'm about to emit corrupted data" > ("ascii+surrogateescape"). Not if I understand what ascii+surrogateescape would do correctly. Yes, you can pass through verbatim, but AFAICS you would have to work quite hard to do anything to that stream that would cause a UnicodeError in your program, even though you corrupt it. (Eg, delete half of a multibyte EUC character.) The question is what happens if you run into a validating processor internally -- then you'll see an error (even though you're just passing it through verbatim!) 
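[Editorial aside, not part of the original message: the pass-through-then-error behaviour described above can be sketched in a few lines.]

```python
# Bytes in some unknown ASCII-compatible encoding (0xE9 is not ASCII).
raw = b"label: caf\xe9\n"

# Decoding with surrogateescape never raises; the stray byte is smuggled
# through as the lone surrogate U+DCE9.
text = raw.decode("ascii", errors="surrogateescape")
assert text == "label: caf\udce9\n"

# ASCII-only edits leave the escaped byte alone, and re-encoding with the
# same error handler reproduces the original byte exactly.
edited = text.replace("label", "name")
assert edited.encode("ascii", errors="surrogateescape") == b"name: caf\xe9\n"

# But a validating step -- e.g. a strict UTF-8 encode, as sys.stdout
# would perform -- refuses the surrogate and raises UnicodeEncodeError.
try:
    edited.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)
```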
From ncoghlan at gmail.com Tue Feb 14 09:45:24 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 14 Feb 2012 18:45:24 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Feb 14, 2012 at 6:02 PM, Stephen J. Turnbull wrote: > ?> and "correctly handle any ASCII compatible encoding, but still > ?> throw UnicodeEncodeError if I'm about to emit corrupted data" > ?> ("ascii+surrogateescape"). > > Not if I understand what ascii+surrogateescape would do correctly. > Yes, you can pass through verbatim, but AFAICS you would have to work > quite hard to do anything to that stream that would cause a > UnicodeError in your program, even though you corrupt it. ?(Eg, delete > half of a multibyte EUC character.) > > The question is what happens if you run into a validating processor > internally -- then you'll see an error (even though you're just > passing it through verbatim!) If you're only round-tripping (i.e. writing back out as "ascii+surrogateescape") it's very hard to corrupt your data stream with processing that assumes an ASCII compatible encoding (as you point out, you'd have to be splitting on arbitrary codepoints instead of searching for ASCII first). However, it's trivial to get an error when you go to encode the data stream without one of the silencing error handlers set. In particular, sys.stdout has error handling set to strict, which I believe is likely to throw UnicodeEncodeError if you try to feed a string containing surrogate escaped bytes to an encoding that can't handle them. (Of course, if sys.stdout.encoding is "UTF-8", then you're right, those characters will just be displayed as gibberish, as they would in the latin-1 case. 
I guess its only on Windows and in any other locations with a more restrictive default stdout encoding that errors are particularly likely). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From stephen at xemacs.org Tue Feb 14 10:36:54 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 14 Feb 2012 18:36:54 +0900 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > Basically, In my experience, Windows users are not likely to produce > UTF-8 formatted files unless they make specific efforts to do so. Agreed. All I meant was that if you make the effort to do so, your Windows-based correspondents will be able to read it, and vice versa. > As I say: > - I know what to do > - It can be a lot of work > - Frankly, the damage is minor (these are usually personal or low-risk scripts) > - The temptation to say "stuff it" and get on with my life is high > - It frustrates me that Python by default tempts me to *not* do the right thing Please don't blame it on Python. Python tempts you because it offers the choice to do it right. There is no way that Python can do it right *for* you, not even all the resources Microsoft or Apple can bring to bear have managed to do it right (you can't get 100% even within an all-Windows or all-Mac shop, let alone cross-platform). Not yet; it requires your help. Thanks for caring! > Maybe it's different in Japan, where character sets are more of a > common knowledge issue? Mojibake is common knowledge in Japan; what to do about it requires a specialized technical background. 
> But if I tried to say to one of my colleagues that the spooled > output of a SQL query they sent me (from a database with one > encoding, through a client with no real encoding handling beyond > global OS-level defaults) didn't use UTF-8, I'd get a blank look at > best. Again, this is not the direction I have in mind (I'm thinking more in terms of the RightThinkingAmongUs using UTF-8 as much as possible, and whether the recipients will be able to read it -- AFAICT/IME they can), and you certainly shouldn't presume that your correspondents "should" "already" be using UTF-8. That would be seriously rude on Windows, where as you point out one has to do something rather contorted to produce UTF-8 in most applications. > What I was trying to say was that typical Windows environments (where > people don't interact often with Unix utilities, or if they do it's > with ASCII characters almost exclusively) hide the details of Unicode > from the end user to the extent that they don't know what's going on > under the hood, and don't need to care. Ah. If you're in a monolingual environment, yes, it works that way. But it works just well on Unix if you set LANG appropriately in your environment. > Much like Python 2, I guess :-) No, Python 2 is better and worse. Many protocols use magic numbers that look like ASCII-encoded English (eg, HTML tags). Python 2 is quite happy to process those magic numbers and the intervening content (as long as each stretch of non-ASCII is treated as an atomic unit), regardless of whether actual encoding matches local convention. (This is why the WSGI guys love Python 2 -- it can be multilingual without knowing the encoding!) On the other hand, the Windows environment will be more seamless (and allow useful processing of the "intervening content") as long as you stick to the local convention for encoding. From cmjohnson.mailinglist at gmail.com Tue Feb 14 12:39:42 2012 From: cmjohnson.mailinglist at gmail.com (Carl M. 
Johnson) Date: Tue, 14 Feb 2012 01:39:42 -1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> On Feb 13, 2012, at 10:45 PM, Nick Coghlan wrote: > (Of course, if sys.stdout.encoding is "UTF-8", then you're right, those > characters will just be displayed as gibberish, as they would in the > latin-1 case. I guess its only on Windows and in any other locations > with a more restrictive default stdout encoding that errors are > particularly likely). I don't think that's right. I think that by default Python refuses to turn surrogate characters into UTF-8: >>> bytes(range(256)).decode("ascii", errors="surrogateescape") '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\udc80\udc81\udc82\udc83\udc84\udc85\udc86\udc87\udc88\udc89\udc8a\udc8b\udc8c\udc8d\udc8e\udc8f\udc90\udc91\udc92\udc93\udc94\udc95\udc96\udc97\udc98\udc99\udc9a\udc9b\udc9c\udc9d\udc9e\udc9f\udca0\udca1\udca2\udca3\udca4\udca5\udca6\udca7\udca8\udca9\udcaa\udcab\udcac\udcad\udcae\udcaf\udcb0\udcb1\udcb2\udcb3\udcb4\udcb5\udcb6\udcb7\udcb8\udcb9\udcba\udcbb\udcbc\udcbd\udcbe\udcbf\udcc0\udcc1\udcc2\udcc3\udcc4\udcc5\udcc6\udcc7\udcc8\udcc9\udcca\udccb\udccc\udccd\udcce\udccf\udcd0\udcd1\udcd2\udcd3\udcd4\udcd5\udcd6\udcd7\udcd8\udcd9\udcda\udcdb\udcdc\udcdd\udcde\udcdf\udce0\udce1\udce2\udce3\udce4\udce5\udce6\udce7\udce8\udce9\udcea\udceb\udcec\udced\udcee\udcef\udcf0\udcf1\udcf2\udcf3\udcf4\udcf5\udcf6\udcf7\udcf8\udcf9\udcfa\udcfb\udcfc\udcfd\udcfe\udcff' >>> _.encode("utf-8") Traceback (most recent call last): File "", 
line 1, in UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 128: surrogates not allowed >>> _.encode("utf-8", errors="surrogateescape") b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff' OK, so concrete proposals: update the docs and maybe make a synonym for Latin-1 that makes it more semantically obvious that you're not really using it as Latin-1, just as a easy to pass through encoding. Anything else? Any bike shedding on the synonym? -- Carl Johnson From christopherreay at gmail.com Tue Feb 14 11:37:44 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Tue, 14 Feb 2012 12:37:44 +0200 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Web browsers can parse pages with multiple encodings seemingly perfectly into the correct display characters. A quick copy and paste produces UTF-8 encoded text in the clip board. (on linux) HOW DO THEY DO IT.. can we have their libraries? :) Some of the web pages I tried decoding made be pull my hair out. One I just cancelled the client. 
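[Editorial aside, not part of the original message: what browser detectors do can be crudely approximated in stdlib-only Python. Real detectors, such as Mozilla's (ported as the third-party "chardet" package), use byte-frequency statistics; simple trial decoding only gives the flavour.]

```python
def guess_encoding(data, candidates=("ascii", "utf-8", "latin-1")):
    """Return the first candidate encoding that decodes `data` cleanly.

    Crude: latin-1 accepts any byte sequence, so it acts as a fallback
    that never fails -- which is exactly why it can silently mislabel.
    """
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

print(guess_encoding(b"plain text"))                # ascii
print(guess_encoding("caf\u00e9".encode("utf-8")))  # utf-8
print(guess_encoding(b"\xe9t\xe9"))                 # latin-1 (a guess at best)
```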
-------------- next part -------------- An HTML attachment was scrubbed... URL: From pyideas at rebertia.com Tue Feb 14 15:38:04 2012 From: pyideas at rebertia.com (Chris Rebert) Date: Tue, 14 Feb 2012 06:38:04 -0800 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Feb 14, 2012 at 2:37 AM, Christopher Reay wrote: > Web browsers can parse pages with multiple encodings seemingly perfectly > into the correct display characters. A quick copy and paste produces UTF-8 > encoded text in the clip board. (on linux) > > HOW DO THEY DO IT.. can we have their libraries? :) The "chardet" package is in fact a port of Mozilla's encoding guessing code. Cheers, Chris From dirkjan at ochtman.nl Tue Feb 14 15:54:50 2012 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 14 Feb 2012 15:54:50 +0100 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Feb 14, 2012 at 15:38, Chris Rebert wrote: > The "chardet" package is in fact a port of Mozilla's encoding guessing code. I thought at some point that it would be useful to have in the stdlib (I still do). It's already fairly successful on PyPI, after all, and it's very helpful when dealing with text of unknown character encoding. However, there are licensing issues. At one point I asked Van Lindberg to look into that... He forwarded me some email between him and Mozilla guys about this, but it was not yet conclusive. 
Cheers, Dirkjan From jimjjewett at gmail.com Tue Feb 14 21:04:05 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 14 Feb 2012 15:04:05 -0500 Subject: [Python-ideas] Unicode surrogateescape [was: Re: Python 3000 TIOBE -3%] Message-ID: On Mon, Feb 13, 2012 at 12:12 AM, Stephen J. Turnbull wrote: > Paul Moore writes: > ?> I'm now 100% convinced that > ?> encoding="ascii",errors="surrogateescape" is the way to say this in > ?> code. > That may also be a good universal default for Python 3, as it will > pass through non-ASCII text unchanged, while raising an error if the > program tries to manipulate it (or hand it to a module that > validates). ?(encoding='latin-1' definitely is not a good default.) > But I'm not sure of that, and the current approach of using the > preferred system encoding is probably better. The preferred system encoding is indeed better than universal ASCII. But is there a good reason not to change the default errorhandler to errors="surrogateescape"? errors="strict" is already well-documented, and the sort of people most eager to reject (rather than ignore) bad data also tend to be explicit about their use of defaults. And if the barrier is only backwards-compatibility, is there any reason not to at least recommend a recipe of errors="surrogateescape" for cases where you expect ASCII, but want to round-trip other data just in case? -jJ From cmjohnson.mailinglist at gmail.com Tue Feb 14 21:17:06 2012 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Tue, 14 Feb 2012 10:17:06 -1000 Subject: [Python-ideas] Unicode surrogateescape [was: Re: Python 3000 TIOBE -3%] In-Reply-To: References: Message-ID: <04A64366-3F31-40AF-9E84-FFB3C3C1E690@gmail.com> On Feb 14, 2012, at 10:04 AM, Jim Jewett wrote: > But is there a good reason not to change the default errorhandler to > errors="surrogateescape"? It's a conflict in the Zen: > Errors should never pass silently. > Unless explicitly silenced. OK, so default to strict. 
But: > Although practicality beats purity. Hmm, so maybe do use surrogates. Then again: > In the face of ambiguity, refuse the temptation to guess. Grr, I'm not nearly Dutch enough to make sense of this logical conflict! From jimjjewett at gmail.com Tue Feb 14 21:20:23 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 14 Feb 2012 15:20:23 -0500 Subject: [Python-ideas] Unicode surrogateescape [was: Re: Python 3000 TIOBE -3%] Message-ID: On Mon, Feb 13, 2012 at 12:16 AM, Nick Coghlan wrote: > Really, Python 3 forces programmers ... > to make the choice between the 4 possible options for > processing ASCII-compatible encodings: > 1. Process them as binary data. [Code smell from lying; lots of pain from mismatch with external libraries.] > 2. Process them as "latin-1". [Code smell from lying; non-ASCII often turns to gibberish.] > 3. Process them as "ascii+surrogateescape". This is the *right* > answer if you plan solely to manipulate the text and then write it back > out in the same encoding as was originally received. [Note that the original "encoding" may well be internally inconsistent; I've often seen that in log files.] > You will get errors if you try to write a string with escaped > characters out to a non-ascii channel or an ascii channel > without surrogateescape enabled. ... (e.g. sys.stdout) Is there any reason not to enable surrogate escape by default? At least on the console/terminal? I can see an argument for replace or xmlcharreplace or something more complicated, but ... if I'm sending output to myself, I would rather see it (possibly with a mark indicating where it was corrupted) than to get my program aborted (strict) and *not* be told what data caused the problem. > 4. Get a third party encoding guessing library and use that instead of > waving away the problem of ASCII-incompatible encodings. And I do think this needs to stay 3rd-party; domain information matters, and n-gram guessing should not be subject to stability guarantees. 
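[Editorial aside, not part of the original message: option 3's write-back path can be sketched with an explicitly wrapped stream -- the way one might re-wrap sys.stdout.buffer -- so escaped bytes survive on output.]

```python
import io

# Stand-in for a binary output channel (e.g. sys.stdout.buffer).
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="ascii", errors="surrogateescape")

# A string carrying an escaped byte, as produced by surrogateescape decoding.
s = b"caf\xe9\n".decode("ascii", errors="surrogateescape")

out.write(s)  # would raise UnicodeEncodeError on a strict-errors stream
out.flush()
assert buf.getvalue() == b"caf\xe9\n"  # the original byte round-trips
```

On a stream left at the default errors="strict", the same write fails -- which is the error-on-corrupted-output behaviour described above.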
-jJ From p.f.moore at gmail.com Tue Feb 14 22:08:09 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 14 Feb 2012 21:08:09 +0000 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 14 February 2012 09:36, Stephen J. Turnbull wrote: > ?> As I say: > ?> - I know what to do > ?> - It can be a lot of work > ?> - Frankly, the damage is minor (these are usually personal or low-risk scripts) > ?> - The temptation to say "stuff it" and get on with my life is high > ?> - It frustrates me that Python by default tempts me to *not* do the right thing > > Please don't blame it on Python. ?Python tempts you because it offers > the choice to do it right. ?There is no way that Python can do it > right *for* you, not even all the resources Microsoft or Apple can > bring to bear have managed to do it right (you can't get 100% even > within an all-Windows or all-Mac shop, let alone cross-platform). ?Not > yet; it requires your help. Point taken. I think my point is that I wish there was a more obvious way for me to tell Python that I just want to do it nearly right on this occasion (like "everything else" does) because I really don't need to care for now. I'm getting a lot closer to knowing how to do that as this thread progresses, though, which is why I think of this as more of an educational issue than anything else. Thinking about how I'd code something like "cat" naively in C (while ((i = getchar()) != EOF) { putchar(i); }), I guess encoding=latin1 is the way for Python to "work like everything else" in this context. So I suppose there's a question. Do we really want to document how to "do it wrong"? At first glance, obviously not. 
But if we don't, it seems that the "Python 3 forces you to know Unicode" meme thrives, and we keep getting bad press. Maybe we could add a note to the open() documentation, something like the following: """To open a file, you need to know its encoding. This is not always obvious, depending on where the file came from, among other things. Other tools can process files without knowing the encoding by assuming the bytes of the file map 1-1 to the first 256 Unicode characters. This can cause issues such as mojibake or corrupted data, but for casual use is sometimes sufficient. To get this behaviour in Python (with all the same risks and problems) you can use the "latin1" encoding, which maps bytes to unicode as described above. It is far, far better to use the correct encoding declaration, if at all possible, however.""" I have no real opinion on whether this is the right thing to do. Unfortunately (in a sense :-)) it doesn't matter much to me any more, as I now have the benefit of learning from this thread, so I'm no longer in the target audience of the comment :-) > Thanks for caring! Thanks for helping me learn! Paul From p.f.moore at gmail.com Tue Feb 14 22:10:45 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 14 Feb 2012 21:10:45 +0000 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 14 February 2012 14:38, Chris Rebert wrote: > On Tue, Feb 14, 2012 at 2:37 AM, Christopher Reay > wrote: >> Web browsers can parse pages with multiple encodings seemingly perfectly >> into the correct display characters. A quick copy and paste produces UTF-8 >> encoded text in the clip board. (on linux) >> >> HOW DO THEY DO IT.. can we have their libraries? :) > > The "chardet" package is in fact a port of Mozilla's encoding guessing code. It seems to be Python 2 only. 
"Dive into Python 3" describes porting it to Python 3, but I don't know of an actual Python 3 version. Paul From dirkjan at ochtman.nl Tue Feb 14 22:34:31 2012 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 14 Feb 2012 22:34:31 +0100 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: (adding back the list...) On Tue, Feb 14, 2012 at 21:55, Jim Jewett wrote: > As useful as it could be, it needs to clearly be an "external > standard", and should probably stay external. I don't want a > situation where "Hey, this file was detected differently in Py3.4 and > Py3.5" is a regression bug, or there will never be room for > improvements. Well, there have been *very* limited functional changes in the Mozilla tree since about 2008, so I don't think there would be a great many changes. And there's a large test suite to make sure regressions are unlikely. I still think the benefit for the simple cases is tremendous: a simple one-liner that finds the correct encoding in many of the cases. There's also very little API surface. Also, there is an actual 3.x port, I have it installed... should be in the same tarball. Cheers, Dirkjan From jimjjewett at gmail.com Tue Feb 14 22:43:50 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 14 Feb 2012 16:43:50 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> Message-ID: On Tue, Feb 14, 2012 at 6:39 AM, Carl M.
Johnson wrote: > OK, so concrete proposals: update the docs and maybe make a > synonym for Latin-1 that makes it more semantically obvious that > you're not really using it as Latin-1, just as a easy to pass through > encoding. Anything else? Any bike shedding on the synonym? encoding="ascii-ish" # gets the sloppyness right encoding="passthrough" # I would like "ignore", if it wouldn't cause confusion with the errorhandler encoding="binpass" encoding="rawbytes" -jJ From tjreedy at udel.edu Tue Feb 14 23:29:24 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 14 Feb 2012 17:29:24 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> Message-ID: On 2/14/2012 6:39 AM, Carl M. Johnson wrote: > > On Feb 13, 2012, at 10:45 PM, Nick Coghlan wrote: > >> (Of course, if sys.stdout.encoding is "UTF-8", then you're right, those >> characters will just be displayed as gibberish, as they would in the >> latin-1 case. I guess its only on Windows and in any other locations >> with a more restrictive default stdout encoding that errors are >> particularly likely). > > I don't think that's right. 
I think that by default Python refuses to turn surrogate characters into UTF-8: > >>>> bytes(range(256)).decode("ascii", errors="surrogateescape") > '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\udc80\udc81\udc82\udc83\udc84\udc85\udc86\udc87\udc88\udc89\udc8a\udc8b\udc8c\udc8d\udc8e\udc8f\udc90\udc91\udc92\udc93\udc94\udc95\udc96\udc97\udc98\udc99\udc9a\udc9b\udc9c\udc9d\udc9e\udc9f\udca0\udca1\udca2\udca3\udca4\udca5\udca6\udca7\udca8\udca9\udcaa\udcab\udcac\udcad\udcae\udcaf\udcb0\udcb1\udcb2\udcb3\udcb4\udcb5\udcb6\udcb7\udcb8\udcb9\udcba\udcbb\udcbc\udcbd\udcbe\udcbf\udcc0\udcc1\udcc2\udcc3\udcc4\udcc5\udcc6\udcc7\udcc8\udcc9\udcca\udccb\udccc\udccd\udcce\udccf\udcd0\udcd1\udcd2\udcd3\udcd4\udcd5\udcd6\udcd7\udcd8\udcd9\udcda\udcdb\udcdc\udcdd\udcde\udcdf > \udce0\udce1\udce2\udce3\udce4\udce5\udce6\udce7\udce8\udce9\udcea\udceb\udcec\udced\udcee\udcef\udcf0\udcf1\udcf2\udcf3\udcf4\udcf5\udcf6\udcf7\udcf8\udcf9\udcfa\udcfb\udcfc\udcfd\udcfe\udcff' While this is a Py3 str object, it is not unicode. Unicode only allows proper surrogate codeunit pairs. Py2 allowed mal-formed unicode objects and that was not changed in Py3 -- or 3.3. It seems appropriate that bytes that are meaningless to ascii should be translated to codeunits that are meaningless (by themselves) to unicode. >>>> _.encode("utf-8") > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 128: surrogates not allowed utf-8 only encodes proper unicode.
>>>> _.encode("utf-8", errors="surrogateescape") > b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff' The result is not utf-8, so it would be better to use 'ascii' rather than 'utf-8' in the expression. The above encodes to ascii + uninterpreted high-bit-set bytes. >>> s=bytes(range(256)).decode("ascii", errors="surrogateescape") >>> u=s.encode("utf-8", errors="surrogateescape") >>> a=s.encode("ascii", errors="surrogateescape") >>> u == a True -- Terry Jan Reedy From python at mrabarnett.plus.com Tue Feb 14 23:55:49 2012 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 14 Feb 2012 22:55:49 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> Message-ID: <4F3AE675.6010907@mrabarnett.plus.com> On 14/02/2012 21:43, Jim Jewett wrote: > On Tue, Feb 14, 2012 at 6:39 AM, Carl M.
Johnson > wrote: > >> OK, so concrete proposals: update the docs and maybe make a >> synonym for Latin-1 that makes it more semantically obvious that >> you're not really using it as Latin-1, just as a easy to pass through >> encoding. Anything else? Any bike shedding on the synonym? > > encoding="ascii-ish" # gets the sloppyness right > > encoding="passthrough" # I would like "ignore", if it wouldn't cause > confusion with the errorhandler > > encoding="binpass" > encoding="rawbytes" > encoding="mojibake" # :-) From barry at python.org Wed Feb 15 00:32:43 2012 From: barry at python.org (Barry Warsaw) Date: Tue, 14 Feb 2012 18:32:43 -0500 Subject: [Python-ideas] Py3 unicode impositions References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20120214183243.3567f413@resist.wooz.org> On Feb 14, 2012, at 09:10 PM, Paul Moore wrote: >> The "chardet" package is in fact a port of Mozilla's encoding guessing code. > >It seems to be Python 2 only. "Dive into Python 3" describes porting >it to Python 3, but I don't know of an actual Python 3 version. We have a python3-chardet package in both Debian and Ubuntu, so the upstream does support Python 3 afaik. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From steve at pearwood.info Wed Feb 15 00:35:11 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 15 Feb 2012 10:35:11 +1100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3AE675.6010907@mrabarnett.plus.com> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> Message-ID: <4F3AEFAF.5060107@pearwood.info> MRAB wrote: > On 14/02/2012 21:43, Jim Jewett wrote: >> On Tue, Feb 14, 2012 at 6:39 AM, Carl M. Johnson >> wrote: >> >>> OK, so concrete proposals: update the docs and maybe make a >>> synonym for Latin-1 that makes it more semantically obvious that >>> you're not really using it as Latin-1, just as a easy to pass through >>> encoding. Anything else? Any bike shedding on the synonym? >> >> encoding="ascii-ish" # gets the sloppyness right >> encoding="passthrough" # I would like "ignore", if it wouldn't cause >> confusion with the errorhandler "Ignore" won't do. Ignore what? Everything? Don't actually run an encoder? That doesn't even make sense! "Passthrough" is bad too, because it perpetrates the idea that ASCII characters are "plain text" which are bytes. Unicode strings, even those that are purely ASCII, are not strings of bytes (except in the sense that every data structure is a string of bytes). You can't just "pass bytes through" to turn them into Unicode. >> encoding="binpass" >> encoding="rawbytes" >> > encoding="mojibake" # :-) You have a smiley, but I think that's the best name I've seen yet. It's explicit in what you get -- mojibake. The only downside is that it's a little obscure. 
Not everyone knows what mojibake is called, or calls it mojibake, although I suppose we could add aliases to other terms such as Buchstabensalat and Krähenfüße if German users complain. But remind me again, why are we doing this? If you have to teach people the recipe open(filename, encoding='mojibake') why not just teach them the very slightly more complex recipe open(filename, encoding='ascii', errors='surrogateescape') which captures the user's intent ("I want ASCII, with some way of escaping errors so I don't have to deal with them") much more accurately. Sometimes brevity is *not* a virtue. -- Steven From mwm at mired.org Wed Feb 15 00:50:44 2012 From: mwm at mired.org (Mike Meyer) Date: Tue, 14 Feb 2012 18:50:44 -0500 Subject: [Python-ideas] Adding shm_open to mmap? Message-ID: <20120214185044.4c5ee513@bhuda.mired.org> One of the issues that showed up during the overlong TIOBE- thread and spinoffs is that there's no portable way to get a named shared memory segment (as distinguished from a disk-backed file) using the mmap module. Most unix variants provide a memory-backed file system that works for this, but its name changes from distro to distro and even installation to installation. It's not clear to me that non-Unix platforms provide such a file system. The Posix solution is shm_open, which accepts a name for rendezvous and returns a file descriptor suitable for passing to mmap. Passing the file descriptor to anything but fstat, ftruncate, close and mmap is undefined. We'd also need to add shm_unlink to remove the shared segment, as the object created by shm_open isn't necessarily visible in the file system name space. shm_open has five values that can be used in its flags argument, but those are shared with open and already available in the os module. This seems like a slam-dunk to me, but... 1) Is there some reason not to just add these two functions? 2) Are there any supported platforms with mmap and without shm_open/unlink?
3) Is this simple enough that a PEP isn't needed, just a patch in an issue? Thanks, http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From ncoghlan at gmail.com Wed Feb 15 01:02:20 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Feb 2012 10:02:20 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> Message-ID: On Tue, Feb 14, 2012 at 9:39 PM, Carl M. Johnson wrote: > > On Feb 13, 2012, at 10:45 PM, Nick Coghlan wrote: > >> (Of course, if sys.stdout.encoding is "UTF-8", then you're right, those >> characters will just be displayed as gibberish, as they would in the >> latin-1 case. I guess its only on Windows and in any other locations >> with a more restrictive default stdout encoding that errors are >> particularly likely). > > I don't think that's right. I think that by default Python refuses to turn surrogate characters into UTF-8: Oops, that's what I get for posting without testing :) Still, your example clearly illustrates the point I was trying to make - that using "ascii+surrogateescape" is less likely to silently corrupt the data stream than using "latin-1", because attempts to encode it under the "strict" error handler will generally fail, even for an otherwise universal encoding like UTF-8. > OK, so concrete proposals: update the docs and maybe make a synonym for Latin-1 that makes it more semantically obvious that you're not really using it as Latin-1, just as a easy to pass through encoding. Anything else? Any bike shedding on the synonym? 
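The contrast Nick describes above — surrogate-escaped text round-trips unknown bytes but fails loudly on a strict re-encode, while latin-1 text silently transcodes — can be verified directly. A small illustration with made-up data:

```python
data = b"caf\xe9"  # latin-1 encoded text; not valid UTF-8 or ASCII

# ascii + surrogateescape: non-ASCII bytes become lone surrogates...
s = data.decode("ascii", errors="surrogateescape")   # 0xe9 -> '\udce9'
assert s.encode("ascii", errors="surrogateescape") == data  # round-trips

# ...and a strict encode fails loudly rather than corrupting the stream:
try:
    s.encode("utf-8")                # strict by default
    silently_passed = True
except UnicodeEncodeError:           # lone surrogates are rejected
    silently_passed = False
assert not silently_passed

# latin-1, by contrast, happily re-encodes the same data to *different* bytes:
assert data.decode("latin-1").encode("utf-8") != data
```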
I don't see any reason to obfuscate the use of "latin-1" as a workaround that maps 8-bit bytes directly to the corresponding Unicode code points. My proposal would be two-fold: Firstly, that we document three alternatives for working with arbitrary ASCII compatible encodings (from simplest to most flexible): 1. Use the "latin-1" encoding The latin-1 encoding accepts arbitrary binary data by mapping individual bytes directly to the first 256 Unicode code points. Thus, any sequence of bytes may be translated to a sequence of code points, effectively reproducing the behaviour of Python 2's 8-bit strings. If all data supplied is genuinely in an ASCII compatible encoding then this will work correctly. However, it fails badly if the supplied data is ever in an ASCII incompatible encoding, or if the decoded string is written back out using a different encoding. Using this option switches off *all* of Python 3's support for ensuring transcoding correctness - errors will frequently pass silently and result in corrupted output data rather than explicit exceptions. 2. Use the "ascii" encoding with the "surrogateescape" error handler This is the most correct approach that doesn't involve attempting to guess the string encoding. Behaviour if given data in an ASCII incompatible encoding is still unpredictable (and likely to result in data corruption). This approach retains most of Python 3's support for ensuring transcoding correctness, while still accepting any ASCII compatible encoding. If UnicodeEncodeErrors when displaying surrogate escaped strings are not desired, sys.stdout should also be updated to use the "backslashreplace" error handler. (see below) 3. Initially process the data as binary, using the "chardet" package from PyPI to guess the encoding This is the most correct option that can even cope with many ASCII incompatible encodings. Unfortunately, the chardet site is gone, since Mark Pilgrim took down his entire web presence. 
This (including the dead home page link from the PyPI entry) would need to be addressed before its use could be recommended in the official documentation (or, failing that, is there a properly documented alternative package available?) Secondly, that we make it easy to replace a TextIOWrapper with an equivalent wrapper that has only selected settings changed (e.g. encoding or errors). In 3.2, that is currently not possible, since the original "newline" argument is not made available as a public attribute. The closest we can get is to force universal newlines mode along with whatever other changes we want to make:

old = sys.stdout
sys.stdout = io.TextIOWrapper(old.buffer, old.encoding, "backslashreplace", None, old.line_buffering)

3.3 currently makes this even worse by accepting a "write_through" argument that isn't available for introspection. I propose that we make it possible to write the above as:

sys.stdout = sys.stdout.rewrap(errors="backslashreplace")

For the latter point, see http://bugs.python.org/issue14017 Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Wed Feb 15 01:06:12 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 15 Feb 2012 00:06:12 +0000 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: <20120214183243.3567f413@resist.wooz.org> References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> <20120214183243.3567f413@resist.wooz.org> Message-ID: On 14 February 2012 23:32, Barry Warsaw wrote: > On Feb 14, 2012, at 09:10 PM, Paul Moore wrote: > >>> The "chardet" package is in fact a port of Mozilla's encoding guessing code. >> >>It seems to be Python 2 only. "Dive into Python 3" describes porting >>it to Python 3, but I don't know of an actual Python 3 version.
> > We have a python3-chardet package in both Debian and Ubuntu, so the upstream > does support Python 3 afaik. Found it. There's a "chardet2" package. Paul From ncoghlan at gmail.com Wed Feb 15 01:07:23 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Feb 2012 10:07:23 +1000 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: <20120214185044.4c5ee513@bhuda.mired.org> References: <20120214185044.4c5ee513@bhuda.mired.org> Message-ID: On Wed, Feb 15, 2012 at 9:50 AM, Mike Meyer wrote: > This seems like a slam-dunk to me, but... > > 1) Is there some reason not to just add these two functions? Not that I can see. Make sure to add an "Availability: Unix" marker in the relevant docs, though. > 2) Are there any supported platforms with mmap and without > shm_open/unlink? The safest option is probably to add a configure check so we only expose these APIs when the underlying platform offers them. There's a *ton* of examples of such checks to copy from :) > 3) Is this simple enough that a PEP isn't needed, just a patch in an > issue? Just a tracker issue will be fine - we expose additional posix APIs all the time without a PEP. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Wed Feb 15 01:05:18 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 15 Feb 2012 01:05:18 +0100 Subject: [Python-ideas] Adding shm_open to mmap? References: <20120214185044.4c5ee513@bhuda.mired.org> Message-ID: <20120215010518.048e3da2@pitrou.net> On Tue, 14 Feb 2012 18:50:44 -0500 Mike Meyer wrote: > > This seems like a slam-dunk to me, but... > > 1) Is there some reason not to just add these two functions? > > 2) Are there any supported platforms with mmap and without > shm_open/unlink? > > 3) Is this simple enough that a PEP isn't needed, just a patch in an > issue? A patch is enough.
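Until such a patch lands, the POSIX calls Mike describes can already be reached from Python through ctypes. The sketch below is an illustration only — POSIX-only, error handling simplified, and the segment name is made up — not the mmap API being proposed:

```python
import ctypes
import ctypes.util
import mmap
import os

def shm_round_trip(name=b"/py_shm_sketch", size=4096):
    """Create a named POSIX shared memory segment, write to it, read back."""
    libname = ctypes.util.find_library("rt") or ctypes.util.find_library("c")
    if libname is None:
        return None  # no locatable libc/librt on this platform
    lib = ctypes.CDLL(libname, use_errno=True)
    fd = lib.shm_open(name, os.O_CREAT | os.O_RDWR, 0o600)
    if fd < 0:
        return None  # shm_open unavailable or refused
    try:
        os.ftruncate(fd, size)        # new segments start with zero length
        m = mmap.mmap(fd, size)       # the fd is suitable for mmap, as Mike notes
        try:
            m.write(b"some bytes")
            m.seek(0)
            return m.read(10)
        finally:
            m.close()
    finally:
        os.close(fd)
        lib.shm_unlink(name)          # remove the name, analogous to os.unlink()

print(shm_round_trip())
```

A stdlib version would of course wrap the error handling properly (raising OSError from errno) instead of returning None.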
Note that this functionality is already available under Windows (though not really advertised in our docs), through the `tagname` parameter to mmap.mmap(): >>> import mmap >>> f = mmap.mmap(-1, 4096, "mysharedmem") >>> f.write(b"some bytes") And in another session: >>> import mmap >>> f = mmap.mmap(-1, 4096, "mysharedmem") >>> f.read(10) b'some bytes' See http://docs.python.org/dev/library/mmap.html and http://msdn.microsoft.com/en-us/library/windows/desktop/aa366551%28v=vs.85%29.aspx Regards Antoine. From greg at krypto.org Wed Feb 15 01:10:29 2012 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 14 Feb 2012 16:10:29 -0800 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> <20120214183243.3567f413@resist.wooz.org> Message-ID: oh good, this long thread has already started talking about encoding detection packages. now I don't have to bring it up. :) I suggest we link to one or more of these from the Python docs to their pypi project pages as a suggestion for users that need to deal with the real world of legacy data files in a variety of undeclared format rather than the internet world of utf-8 or bust. At some point it might be interesting to have a library like this in the stdlib along with a common API for other compatible libraries but I'm not sure any are ready for such a consideration. Is their behavior stable or still learning based on new inputs? 
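For reference, the detection libraries under discussion are typically used along these lines. This assumes the chardet API (a third-party package that may not be installed, so the sketch falls back to the byte-transparent latin-1 approach from earlier in the thread):

```python
def read_text_guessing(raw):
    """Decode bytes using a guessed encoding; treat the guess as a hint."""
    try:
        import chardet  # third-party; detect() returns a guess + confidence
        guess = chardet.detect(raw)   # e.g. {'encoding': 'ascii', 'confidence': 1.0}
        encoding = guess["encoding"] or "latin-1"
    except ImportError:
        encoding = "latin-1"          # byte-transparent fallback
    return raw.decode(encoding, errors="replace")

print(read_text_guessing(b"plain ascii text"))
```

Because the result is a guess, errors="replace" (or similar) is still advisable so a wrong guess degrades gracefully instead of raising.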
-gps From ben+python at benfinney.id.au Wed Feb 15 01:15:36 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 15 Feb 2012 11:15:36 +1100 Subject: [Python-ideas] Python 3000 TIOBE -3% References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> Message-ID: <87haytyms7.fsf@benfinney.id.au> MRAB writes: > On 14/02/2012 21:43, Jim Jewett wrote: > > On Tue, Feb 14, 2012 at 6:39 AM, Carl M. Johnson > > wrote: > > > >> OK, so concrete proposals: update the docs and maybe make a > >> synonym for Latin-1 that makes it more semantically obvious that > >> you're not really using it as Latin-1, just as a easy to pass through > >> encoding. Anything else? Any bike shedding on the synonym? […] > encoding="mojibake" # :-) +1 If people want to remain wilfully ignorant of text encoding in the third millennium of our calendar, then a name like “mojibake” is clear about what they'll get, and will perhaps be publicly embarrassing enough that some proportion of programmers will decide to reduce their ignorance and use a specific encoding instead. -- \ “Science is a way of trying not to fool yourself. The first | `\ principle is that you must not fool yourself, and you are the | _o__) easiest person to fool.” —Richard P.
Feynman, 1964 | Ben Finney From python at mrabarnett.plus.com Wed Feb 15 01:44:19 2012 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 15 Feb 2012 00:44:19 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3AEFAF.5060107@pearwood.info> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> Message-ID: <4F3AFFE3.6070305@mrabarnett.plus.com> On 14/02/2012 23:35, Steven D'Aprano wrote: > MRAB wrote: >> On 14/02/2012 21:43, Jim Jewett wrote: >>> On Tue, Feb 14, 2012 at 6:39 AM, Carl M. Johnson >>> wrote: >>> >>>> OK, so concrete proposals: update the docs and maybe make a >>>> synonym for Latin-1 that makes it more semantically obvious that >>>> you're not really using it as Latin-1, just as a easy to pass through >>>> encoding. Anything else? Any bike shedding on the synonym? >>> >>> encoding="ascii-ish" # gets the sloppyness right >>> encoding="passthrough" # I would like "ignore", if it wouldn't cause >>> confusion with the errorhandler > > "Ignore" won't do. Ignore what? Everything? Don't actually run an encoder? > That doesn't even make sense! > > "Passthrough" is bad too, because it perpetrates the idea that ASCII > characters are "plain text" which are bytes. Unicode strings, even those that > are purely ASCII, are not strings of bytes (except in the sense that every > data structure is a string of bytes). You can't just "pass bytes through" to > turn them into Unicode. > > >>> encoding="binpass" >>> encoding="rawbytes" >>> >> encoding="mojibake" # :-) > > You have a smiley, but I think that's the best name I've seen yet. It's > explicit in what you get -- mojibake. > > The only downside is that it's a little obscure. 
Not everyone knows what > mojibake is called, or calls it mojibake, although I suppose we could add > aliases to other terms such as Buchstabensalat and Krähenfüße if German users > complain > Alternatively, "vreemdetekens" or "alfabetsoep"... > But remind me again, why are we doing this? If you have to teach people the > recipe > > open(filename, encoding='mojibake') > > why not just teach them the very slightly more complex recipe > > open(filename, encoding='ascii', errors='surrogateescape') > > which captures the user's intent ("I want ASCII, with some way of escaping > errors so I don't have to deal with them") much more accurately. Sometimes > brevity is *not* a virtue. > From grosser.meister.morti at gmx.net Wed Feb 15 01:46:58 2012 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Wed, 15 Feb 2012 01:46:58 +0100 Subject: [Python-ideas] map iterator In-Reply-To: References: Message-ID: <4F3B0082.8040403@gmx.net> On 02/09/2012 05:48 PM, Jerry Hill wrote: > On Thu, Feb 9, 2012 at 11:40 AM, Edward Lesmes > wrote: > > An iterator version of map should be available for large sets of data. > > > The python time machine strikes again. In python 2, this is available as itertools.imap. In python > 3, this is the default behavior of the map() function. > Same goes for zip, by the way. From shibturn at gmail.com Wed Feb 15 02:00:30 2012 From: shibturn at gmail.com (shibturn) Date: Wed, 15 Feb 2012 01:00:30 +0000 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: <20120215010518.048e3da2@pitrou.net> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120215010518.048e3da2@pitrou.net> Message-ID: On 15/02/2012 12:05am, Antoine Pitrou wrote: > A patch is enough.
> > Note that this functionality is already available under Windows > (though not really advertised in our docs), through the `tagname` > parameter to mmap.mmap(): > >>>> import mmap >>>> f = mmap.mmap(-1, 4096, "mysharedmem") >>>> f.write(b"some bytes") > > And in another session: > >>>> import mmap >>>> f = mmap.mmap(-1, 4096, "mysharedmem") >>>> f.read(10) > b'some bytes' It's not quite the same functionality since the lifetime of tagnamed mmaps is managed through handle refcounting. In some cases that is an advantage compared to open()/unlink(), and in others a disadvantage. Also, a problem with tagname is that there is no way to check whether the returned mmap was created by another process -- unless you resort to something undocumented like:

from _multiprocessing import win32
f = mmap.mmap(-1, 4096, "mysharedmem")
if win32.GetLastError() == win32.ERROR_ALREADY_EXISTS:
    raise ValueError('tagname already exists')

sbt From stephen at xemacs.org Wed Feb 15 02:27:48 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 15 Feb 2012 10:27:48 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87fwecopgr.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > If you're only round-tripping (i.e. writing back out as > "ascii+surrogateescape") This is the only case that makes sense in this thread. We're talking about people coming from Python 2 who want an encoding-agnostic way to script ASCII-oriented operations for an ASCII-compatible environment, and not to learn about encodings at all. While my opinions on this are (probably obviously) informed by the WSGI discussion, this is not about making life come up roses for the WSGI folks.
They work in a sewer; life stinks for them, and all they can do about it is to hold their noses. This thread is about people who are not trying to handle sewage in a sanitary fashion, rather just cook a meal and ignore the occasional hairs that inevitably fall in. > However, it's trivial to get an error when you go to encode the data > stream without one of the silencing error handlers set. Sure, but getting errors is for people who want to learn how to do it right, not for people who just need to get a job done. Cf. the fevered opposition to giving "import cElementTree" a DeprecationWarning. > In particular, sys.stdout has error handling set to strict, which I > believe is likely to throw UnicodeEncodeError if you try to feed a > string containing surrogate escaped bytes to an encoding that can't > handle them. No, it should *always* throw a UnicodeEncodeError, because there are *no* encodings that can handle them -- they're not characters, so they can't be encoded. > (Of course, if sys.stdout.encoding is "UTF-8", then you're right, > those characters will just be displayed as gibberish, No, they will raise UnicodeEncodeError; that's why surrogateescape was invented, to work around the problem of what to do with bytes that the programmer knows are meaningful to somebody, but do not represent characters as far as Python can know: wideload:~ 10:06$ python3.2 Python 3.2 (r32:88445, Mar 20 2011, 01:56:57) [GCC 4.0.1 (Apple Inc. build 5490)] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> s = b'\xff\xff'.decode('utf-8', errors='surrogateescape') >>> s.encode('utf-8',errors='strict') Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 0: surrogates not allowed >>> The reason I advocate 'latin-1' (preferably under an appropriate alias) is that you simply can't be sure that those surrogates won't be passed to some module that decides to emit information about them somewhere (eg, a warning or logging) -- without the protection of a "silencing error handler". Bang-bang! Python's silver hammer comes down upon your head! From ben+python at benfinney.id.au Wed Feb 15 02:39:10 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 15 Feb 2012 12:39:10 +1100 Subject: [Python-ideas] Python 3000 TIOBE -3% References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <87fwecopgr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <874nuszxhd.fsf@benfinney.id.au> "Stephen J. Turnbull" writes: > […] the WSGI folks. They work in a sewer; life stinks for them, and > all they can do about it is to hold their noses. This thread is about > people who are not trying to handle sewage in a sanitary fashion, > rather just cook a meal and ignore the occasional hairs that > inevitably fall in. […] > […] some module that decides to emit information about them somewhere > (eg, a warning or logging) -- without the protection of a "silencing > error handler". Bang-bang! Python's silver hammer comes down upon your > head! You have made me feel strange emotions with this message. I don't know what they are, but a combination of “sickened” and “admiring” and “nostalgia”, with a pinch of fear, seems close. Maybe this is what it's like to read poetry.
-- \ “[Entrenched media corporations will] maintain the status quo, | `\ or die trying. Either is better than actually WORKING for a | _o__) living.” -- ringsnake.livejournal.com, 2007-11-12 | Ben Finney From mwm at mired.org Wed Feb 15 03:25:39 2012 From: mwm at mired.org (Mike Meyer) Date: Tue, 14 Feb 2012 21:25:39 -0500 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: References: <20120214185044.4c5ee513@bhuda.mired.org> Message-ID: <20120214212539.7c5ffdef@bhuda.mired.org> On Wed, 15 Feb 2012 10:07:23 +1000 Nick Coghlan wrote: > On Wed, Feb 15, 2012 at 9:50 AM, Mike Meyer wrote: > > This seems like a slam-dunk to me, but... > > 1) Is there some reason not to just add these two functions? > Not that I can see. Make sure to add an "Availability: Unix" marker in > the relevant docs, though. I thought Windows was a Posix system? As such, it should have shm_open and shm_unlink, so the marker wouldn't be appropriate. Thanks, http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From stephen at xemacs.org Wed Feb 15 03:43:41 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 15 Feb 2012 11:43:41 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3AEFAF.5060107@pearwood.info> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> Message-ID: <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > MRAB wrote: > >> encoding="ascii-ish" # gets the sloppiness right +0.8 I'd prefer the more precise "ascii-compatible". Shift JIS is "ASCII-ish", but should not be decoded with this codec.
> > encoding="mojibake" # :-) > > You have a smiley, but I think that's the best name I've seen yet. It's > explicit in what you get -- mojibake. Explicit, but incorrect. Mojibake ("bake" means "change") is what you get when you use one encoding to encode characters, and another to decode them. Here, not only are we talking about using the same codec at both ends, but in fact it's inside out (we are decoding then encoding). This is GIGO, not mojibake. > why not just teach them the very slightly more complex recipe > > open(filename, encoding='ascii', errors='surrogateescape') > > which captures the user's intent ("I want ASCII, with some way of > escaping errors so I don't have to deal with them") much more > accurately. Why not? Because 'surrogateescape' does not express the user's intent. That user *will* have to deal with errors as soon as she invokes modules that validate their input, or include some portion of the text being treated in output of any kind, unless they use an error-suppressing handler themselves. Surrogates are errors in Unicode, and that's the way it should be. That's precisely why Martin felt it necessary to use this technique in PEP 383: to ensure that errors *will* occur unless you are very careful in handling strings produced with the surrogateescape handler active. It's arguable that most applications *should* want errors in these cases; I've made that argument myself. But it's quite clearly not the user's intent. From ncoghlan at gmail.com Wed Feb 15 04:07:54 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Feb 2012 13:07:54 +1000 Subject: [Python-ideas] Adding shm_open to mmap? 
In-Reply-To: <20120214212539.7c5ffdef@bhuda.mired.org> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> Message-ID: On Wed, Feb 15, 2012 at 12:25 PM, Mike Meyer wrote: > On Wed, 15 Feb 2012 10:07:23 +1000 > Nick Coghlan wrote: > >> On Wed, Feb 15, 2012 at 9:50 AM, Mike Meyer wrote: >> > This seems like a slam-dunk to me, but... >> > 1) Is there some reason not to just add these two functions? >> Not that I can see. Make sure to add an "Availability: Unix" marker in >> the relevant docs, though. > > I thought Windows was a Posix system? Not as far as I am aware - if it was, Cygwin wouldn't be needed as a compatibility layer to get POSIX software running. To get them to work properly on Windows, many modules that interface with the OS have to use the win32 API directly rather than relying on the native implementations of the POSIX APIs. > As such, it should have shm_open > and shm_unlink, so the marker wouldn't be appropriate. In this case, it sounds like Windows may already have a roughly equivalent mechanism in mmap, so cross-platform support may be feasible. If that's the case, a marker won't be needed. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Feb 15 04:22:02 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Feb 2012 13:22:02 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Feb 15, 2012 at 12:43 PM, Stephen J.
Turnbull wrote: > It's arguable that most applications *should* want errors in these > cases; I've made that argument myself. But it's quite clearly not the > user's intent. However, from a correctness point of view, it's a big step up from just saying "latin-1" (which effectively turns off *all* of the additional encoding related sanity checking Python 3 offers over Python 2). For many "I don't care about Unicode" use cases, using "ascii+surrogateescape" for your own I/O and setting "backslashreplace" on sys.stdout should cover you (and any exceptions you get will be warning you about cases where your original assumptions about not caring about Unicode validity have been proven wrong). If the logging module doesn't do it already, it should probably be defaulting to backslashreplace when encoding messages, too (for the same reason sys.stderr already defaults to that - you don't want your error reporting system failing to encode corrupted Unicode data). sys.stdin and sys.stdout are different due to the role they play in pipeline processing - for those, locale.getpreferredencoding()+"strict" is a more reasonable default (but we should make it easy to replace them with something more specific for a given application, hence http://bugs.python.org/issue14017) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mwm at mired.org Wed Feb 15 05:10:11 2012 From: mwm at mired.org (Mike Meyer) Date: Tue, 14 Feb 2012 23:10:11 -0500 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> Message-ID: <20120214231011.6fce4b3b@bhuda.mired.org> On Wed, 15 Feb 2012 13:07:54 +1000 Nick Coghlan wrote: > On Wed, Feb 15, 2012 at 12:25 PM, Mike Meyer wrote: > > On Wed, 15 Feb 2012 10:07:23 +1000 > > Nick Coghlan wrote: > > As such, it should have shm_open > > and shm_unlink, so the marker wouldn't be appropriate.
> In this case, it sounds like Windows may already have a roughly > equivalent mechanism in mmap, so cross-platform support may be > feasible. If that's the case, a marker won't be needed. The "tagname" feature in the windows version uses ref counting to free the shared segment when no one is using it. shm_open requires someone to call shm_unlink, but doesn't actually remove it until there are no more references to it. However you can't shm_open it again after shm_unlink'ing (expected on Unix, and verified on my FBSD box). We could sorta-kinda emulate the windows "tagname" behavior using shm_open. I'd prefer to provide shm_open on Windows if at all possible. The "sorta-kinda" bothers me. That would also allow for an application to exit and then resume work stored in a mapped segment (something I've done before). However, setting this up on Windows isn't something I can do. Thanks, http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From stephen at xemacs.org Wed Feb 15 05:12:58 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 15 Feb 2012 13:12:58 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > using "ascii+surrogateescape" for your own I/O and setting > "backslashreplace" on sys.stdout should cover you (and any > exceptions you get will be warning you about cases where your > original assumptions about not caring about Unicode validity have > been proven wrong). Are you saying you know more than the user about her application? > If the logging module doesn't do it already, it should probably be > defaulting to backslashreplace when encoding messages, too See, *you* don't know whether it will raise, either, and that about an important stdlib module. Why should somebody who is not already a Unicode geek and is just using a module they've downloaded off of PyPI be required to audit its IO foibles? Really, I think use of 'latin1' in this context is covered by "consenting adults." We *should* provide an alias that says "all we know about this string is that the ASCII codes represent ASCII characters," and document that even if your own code is ASCII compatible (ie, treats runs of non-ASCII as opaque, atomic blobs), third party modules may corrupt the text. And use the word "corrupt"; all UnicodelyRightThinking folks will run away screaming. That statement about corrupting text is true in Python 2, and pre-PEP-393 Python 3, anyway (on Windows and UCS-2 builds elsewhere), you know, since they can silently slice a surrogate pair in half. 
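The surrogateescape behaviour being argued over above can be checked directly; a minimal sketch of the standard Python 3 semantics (the byte values here are arbitrary examples):

```python
# Undecodable bytes become lone surrogates instead of raising.
data = b'abc\xff'
s = data.decode('utf-8', errors='surrogateescape')
assert s == 'abc\udcff'

# Lone surrogates are not valid characters, so strict encoding fails
# with *any* codec, exactly as described above.
try:
    s.encode('utf-8')
except UnicodeEncodeError:
    pass
else:
    raise AssertionError('expected UnicodeEncodeError')

# Round-tripping with the same error handler restores the bytes exactly.
assert s.encode('utf-8', errors='surrogateescape') == data
```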
From cmjohnson.mailinglist at gmail.com Wed Feb 15 06:03:10 2012 From: cmjohnson.mailinglist at gmail.com (Carl M. Johnson) Date: Tue, 14 Feb 2012 19:03:10 -1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <16F229EE-4018-4E81-962A-8D48036F194F@gmail.com> If I can I would like to offer one argument for surrogateescape over latin-1 as the newbie approach. Suppose I am naively processing text files to create a webpage and one of my filters is a "smart quotes" filter to change "" to “”. Of course, there's no way to smarten quotes up if you don't know the encoding of your input or output files; you'll just make a mess. In this situation, Latin-1 lets you mojibake it up. If your input turns out not to have been Latin-1, the final result will be corrupted by the quote smartener. On the other hand, if you use encoding="ascii", errors="surrogateescape" Python will complain, because the smart quotes being added aren't ascii. In other words, surrogate escape forces naive users to stick to ASCII unless they can determine what encoding they want to use for their input/output. It's not perfect, but I think it strikes a better balance than letting the users shoot themselves in the foot. From ncoghlan at gmail.com Wed Feb 15 07:58:55 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Feb 2012 16:58:55 +1000 Subject: [Python-ideas] Adding shm_open to mmap?
In-Reply-To: <20120214231011.6fce4b3b@bhuda.mired.org> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> Message-ID: On Wed, Feb 15, 2012 at 2:10 PM, Mike Meyer wrote: > I'd prefer to provide shm_open on Windows if at all possible. The > "sorta-kinda" bothers me. That would also allow for an application to > exit and then resume work stored in a mapped segment (something I've > done before). However, setting this up on Windows isn't something I > can do. That's the purpose of the "Availability" markers in the docs - to allow a POSIX implementation to be added directly, then, if it's confirmed to work on Windows, or someone implements the necessary additional parts to make it work, the Availability restriction can be dropped. The OS interface on Windows is just too different for us to gate all OS service additions on having a working Windows version of the feature. (It's not *ideal* when that happens, of course, but it's a practical concession to the fact that our pool of Windows developers is significantly smaller than our pool of *nix and OS X developers). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Wed Feb 15 08:46:18 2012 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Wed, 15 Feb 2012 16:46:18 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <16F229EE-4018-4E81-962A-8D48036F194F@gmail.com> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> <16F229EE-4018-4E81-962A-8D48036F194F@gmail.com> Message-ID: <87aa4ko7xx.fsf@uwakimon.sk.tsukuba.ac.jp> Carl M. Johnson writes: > If I can I would like to offer one argument for surrogateescape > over latin-1 as the newbie approach. This isn't the newbie approach. What should be recommended to newbies is to use the default (which is locale-dependent, and therefore "usually" "good enough"), and live with the risk of occasional exceptions. If they get exceptions, or must avoid exceptions, learn about encodings or consult with someone who already knows.[1] *Neither* of the approaches discussed here is reliable for tasks like automatically processing email or uploaded files on the web, and neither should be recommended to people who aren't already used to encoding-agnostic processing in the Python 2 "str" style. So, now that you mention "newbies", I don't know what other people are discussing, but what I've been discussing here is an approach for people who are comfortable working around (or never experience!) the defects of Python 2's ASCII-compatible approach to handling varied encodings in a single program, and want a workalike for Python 3. The choice between the two is task-dependent. The encoding='latin1' method is for tasks where a little mojibake can be tolerated, but an exception would stop the show. 
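A minimal sketch of why the 'latin-1' method is exception-proof (standard Python 3 behaviour; the byte values are arbitrary):

```python
data = b'caf\xe9 \x80\xff'   # arbitrary bytes in some unknown encoding

# 'latin-1' decoding never raises: all 256 byte values map one-to-one
# onto the first 256 Unicode code points.
text = data.decode('latin-1')

# Re-encoding with 'latin-1' is a lossless round-trip...
assert text.encode('latin-1') == data

# ...but transcoding anywhere else silently produces mojibake
# rather than an error.
assert text.encode('utf-8') != data
```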
The errors='surrogateescape' method is for tasks where any mojibake at all is a disaster, but occasional exceptions can be handled as they arise. Footnotes: [1] When this damned term is over in a few weeks, I'll take a look at the tutorial-level docs and see if I can come up with a gentle approach for those who are finding out for the first time that the locale-dependent default isn't good enough for them. From ncoghlan at gmail.com Wed Feb 15 09:03:03 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 15 Feb 2012 18:03:03 +1000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Feb 15, 2012 at 2:12 PM, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > using "ascii+surrogateescape" for your own I/O and setting > > "backslashreplace" on sys.stdout should cover you (and any > > exceptions you get will be warning you about cases where your > > original assumptions about not caring about Unicode validity have > > been proven wrong). > > Are you saying you know more than the user about her application? No, I'm merely saying that at least 3 options (latin-1, ascii+surrogateescape, chardet2) should be presented clearly to beginners and the trade-offs explained.
For example:

Task: Process data in any ASCII compatible encoding
Unicode Awareness Care Factor: None
Approach: Specify encoding="latin-1"
Bytes/bytearray: data.decode("latin-1")
Text files: open(fname, encoding="latin-1")
Stdin replacement: sys.stdin = io.TextIOWrapper(sys.stdin.buffer, "latin-1")
Stdout replacement (pipeline): sys.stdout = io.TextIOWrapper(sys.stdout.buffer, "latin-1", line_buffering=True)
Stdout replacement (terminal): Leave it alone

By decoding with latin-1, an application won't get *any* Unicode decoding errors, as that encoding maps byte values directly to the first 256 Unicode code points. However, any output data generated by that application *will* be corrupted if the assumption of ASCII compatibility is violated, or if implicit transcoding to any encoding other than "latin-1" occurs (e.g. when writing to sys.stdout or a log file, communicating over a network socket, or serialising the string with the json module). This is the closest Python 3 comes to emulating the permissive behaviour of Python 2's 8-bit strings (implicit interoperation with byte sequences is still disallowed).

Task: Process data in any ASCII compatible encoding
Unicode Awareness Care Factor: Minimal
Approach: Use encoding="ascii" and errors="surrogateescape" (or, alternatively, errors="backslashreplace" for sys.stdout)
Bytes/bytearray: data.decode("ascii", errors="surrogateescape")
Text files: open(fname, encoding="ascii", errors="surrogateescape")
Stdin replacement: sys.stdin = io.TextIOWrapper(sys.stdin.buffer, "ascii", "surrogateescape")
Stdout replacement (pipeline): sys.stdout = io.TextIOWrapper(sys.stdout.buffer, "ascii", "surrogateescape", line_buffering=True)
Stdout replacement (terminal): sys.stdout = io.TextIOWrapper(sys.stdout.buffer, sys.stdout.encoding, "backslashreplace", line_buffering=True)

Using "ascii+surrogateescape" instead of "latin-1" is a small initial step into the Unicode-aware world.
It still lets an application process any ASCII-compatible encoding *without* having to know the exact encoding of the source data, but will complain if there is an implicit attempt to transcode the data to another encoding, or if the application inserts non-ASCII data into the strings before writing them out. Whether non-ASCII compatible encodings trigger errors or get corrupted will depend on the specifics of the encoding and how the program manipulates the data. The "backslashreplace" error handler (enabled by default for sys.stderr, optionally enabled as shown above for sys.stdout) can be useful to help ensure that printing out strings will not trigger UnicodeEncodeErrors (note: the *repr* of strings already escapes non-ASCII characters internally, such that repr(x) == ascii(x). Thus, UnicodeEncodeErrors will occur only when encoding the string itself using the "strict" error handler, or when another library performs equivalent validation on the string).

Task: Process data in any ASCII compatible encoding
Unicode Awareness Care Factor: High
Approach: Use binary APIs and the "chardet2" module from PyPI to detect the character encoding
Bytes/bytearray: data.decode(detected_encoding)
Text files: open(fname, encoding=detected_encoding)

The *right* way to process text in an unknown encoding is to do your best to derive the encoding from the data stream. The "chardet2" module on PyPI allows this. Refer to that module's documentation (WHERE?) for details. With this approach, transcoding to the default sys.stdin and sys.stdout encodings should generally work (although the default restrictive character set on Windows and in some locales may cause problems). -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From niki.spahiev at gmail.com Wed Feb 15 09:52:45 2012 From: niki.spahiev at gmail.com (Niki Spahiev) Date: Wed, 15 Feb 2012 10:52:45 +0200 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 14.02.2012 23:08, Paul Moore wrote: > Maybe we could add a note to the open() > documentation, something like the following: > > """To open a file, you need to know its encoding. This is not always > obvious, depending on where the file came from, among other things. > Other tools can process files without knowing the encoding by assuming > the bytes of the file map 1-1 to the first 256 Unicode characters. > This can cause issues such as mojibake or corrupted data, but for > casual use is sometimes sufficient. To get this behaviour in Python > (with all the same risks and problems) you can use the "latin1" > encoding, which maps bytes to unicode as described above. It is far, > far better to use the correct encoding declaration, if at all > possible, however.""" IMHO it's better to make 'unknown' an encoding alias for 'latin1'. This way one can find and change it later. Niki From shibturn at gmail.com Wed Feb 15 12:16:46 2012 From: shibturn at gmail.com (shibturn) Date: Wed, 15 Feb 2012 11:16:46 +0000 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: <20120214231011.6fce4b3b@bhuda.mired.org> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> Message-ID: On 15/02/2012 4:10am, Mike Meyer wrote: > I'd prefer to provide shm_open on Windows if at all possible. The > "sorta-kinda" bothers me. That would also allow for an application to > exit and then resume work stored in a mapped segment (something I've > done before).
However, setting this up on Windows isn't something I > can do. Maybe creating a file using CreateFile and FILE_ATTRIBUTE_TEMPORARY would have a similar effect - it hints to the system to avoid flushing to the disk. (os.open and O_TEMPORARY would not work because that also causes the file to be removed when all handles are closed.) sbt From solipsis at pitrou.net Wed Feb 15 13:34:19 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 15 Feb 2012 13:34:19 +0100 Subject: [Python-ideas] Adding shm_open to mmap? References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> Message-ID: <20120215133419.230ea8e6@pitrou.net> On Tue, 14 Feb 2012 23:10:11 -0500 Mike Meyer wrote: > > I'd prefer to provide shm_open on Windows if at all possible. The > "sorta-kinda" bothers me. That would also allow for an application to > exit and then resume work stored in a mapped segment (something I've > done before). The original discussion was about shared memory with multiprocessing. In that context, automatic collection of shared memory areas shouldn't be a problem. Regards Antoine. From shibturn at gmail.com Wed Feb 15 14:25:14 2012 From: shibturn at gmail.com (shibturn) Date: Wed, 15 Feb 2012 13:25:14 +0000 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: <20120215133419.230ea8e6@pitrou.net> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> Message-ID: On 15/02/2012 12:34pm, Antoine Pitrou wrote: > The original discussion was about shared memory with multiprocessing. > In that context, automatic collection of shared memory areas shouldn't > be a problem. One problem with automatic collection is if you want to put a reference to an mmap on a queue. The mmap is likely to be disposed of before the target process can unpickle it.
sbt From phd at phdru.name Wed Feb 15 14:39:12 2012 From: phd at phdru.name (Oleg Broytman) Date: Wed, 15 Feb 2012 17:39:12 +0400 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <87haytyms7.fsf@benfinney.id.au> References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> Message-ID: <20120215133912.GA17040@iskra.aviel.ru> On Wed, Feb 15, 2012 at 11:15:36AM +1100, Ben Finney wrote: > If people want to remain wilfully ignorant of text encoding in the third > millennium This returns us to the very beginning of the thread. The original complaint was: Python3 requires users to learn too much about unicode, more than they really need. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From christopherreay at gmail.com Wed Feb 15 09:14:17 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Wed, 15 Feb 2012 10:14:17 +0200 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: +1000 Great, let's do that Will I be repetitive if I say "can we put a link in the "UnicodeDecodeError" docstring? At the top of that page have "FOR BEGINNERS" or "Mugh, just make this error go away, Now", and this info from Nick Also link to all the other tons and tons of stuff that exists on UnicodeDecoding...
Chardet does nothing like the complex character set decoding that any of the browsers accomplish. Also, it almost always calls "latin-1" encoded files "latin-2" and "latin-someOtherNumber", which actually doesn't work to decode the data. The browsers can translate seemingly untouchable mush of mixed char encodings into UTF-8 (on my linux box) without hiccupping. I tried to emulate their behaviour for almost a week before I gave up. To be fair, I was at that time a char set newbie, and I guess I still am, though my scraper works properly. Christopher -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon.sapin at kozea.fr Wed Feb 15 14:41:46 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Wed, 15 Feb 2012 14:41:46 +0100 Subject: [Python-ideas] Py3 unicode impositions In-Reply-To: References: <87ty2yvwiq.fsf@uwakimon.sk.tsukuba.ac.jp> <4F374805.9000606@pearwood.info> <87zkcne1cn.fsf@uwakimon.sk.tsukuba.ac.jp> <87k43poix5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F3BB61A.7020602@kozea.fr> Le 14/02/2012 22:08, Paul Moore a écrit : > Thinking about how I'd code something like "cat" naively in C (while > ((i = getchar()) != EOF) { putchar(i); }), I guess encoding=latin1 is > the way for Python to "work like everything else" in this context. Hi, The Python equivalent to your C program is to use bytes without decoding at all: open a file with 'rb' mode, use sys.stdin.buffer, ... I think this is the right thing to do if you want to pass through unmodified text without knowing the encoding. Regards, -- Simon Sapin From christopherreay at gmail.com Wed Feb 15 13:39:57 2012 From: christopherreay at gmail.com (Christopher Reay) Date: Wed, 15 Feb 2012 14:39:57 +0200 Subject: [Python-ideas] Adding shm_open to mmap?
In-Reply-To: <20120215133419.230ea8e6@pitrou.net> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> Message-ID: Do the people here want to shift over to the concurrency mailing list? Would be nicer in there with a few more people -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ronny.Pfannschmidt at gmx.de Wed Feb 15 15:32:10 2012 From: Ronny.Pfannschmidt at gmx.de (Ronny Pfannschmidt) Date: Wed, 15 Feb 2012 15:32:10 +0100 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases Message-ID: <4F3BC1EA.3030002@gmx.de> Hi, in my experience, for many cases __repr__ and __str__ can unconditionally be represented as a simple string formatting operation, so I would propose to add an extension to support simply declaring them in the form of new-style format strings. A basic implementation for __repr__ could look like:

class SelfFormatter(string.Formatter):

    def __init__(self, obj):
        self.__obj = obj
        string.Formatter.__init__(self)

    def get_value(self, key, args, kwargs):
        if isinstance(key, str) and hasattr(self.__obj, key):
            return getattr(self.__obj, key)
        return string.Formatter.get_value(self, key, args, kwargs)


class SimpleReprMixing(object):
    _repr_ = '<{__class__.__name__} at 0x{__id__:x}>'

    def __repr__(self):
        formatter = SelfFormatter(self)
        return formatter.vformat(self._repr_, (), {'__id__': id(self)})

-- Ronny Pfannschmidt From nathan.alexander.rice at gmail.com Wed Feb 15 16:34:45 2012 From: nathan.alexander.rice at gmail.com (Nathan Rice) Date: Wed, 15 Feb 2012 10:34:45 -0500 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases In-Reply-To: <4F3BC1EA.3030002@gmx.de> References: <4F3BC1EA.3030002@gmx.de> Message-ID: I think that a generic __repr__ has been reinvented more times than I can count.
I don't think a generic __str__ is a good thing, as it is supposed to be pretty and semantically meaningful. I don't really see anywhere in the standard library that such a feature would make sense though. I feel like Python's standard library bloat actually makes the good stuff harder to find, and a better approach would be to have a minimal "core" standard library with a few "official" battery pack style libs that are very prominently featured and available. Since you might find this useful, here is my old __repr__ recipe (which has several issues, but gets the job done for the most part):

def get_attributes(o):
    attributes = [(a, getattr(o, a)) for a in set(dir(o)).difference(dir(object)) if a[0] != "_"]
    return {a[0]: a[1] for a in attributes if not callable(a[1])}

class ReprMixin(object):

    def _format(self, v):
        if isinstance(v, (basestring, date, time, datetime)):
            v = "'%s'" % v
            return v.encode("utf-8", errors="ignore")
        else:
            return v

    def __repr__(self):
        attribute_string = ", ".join("%s=%s" % (k[0], self._format(k[1])) for k in get_attributes(self).items())
        return "%s(%s)" % (type(self).__name__, attribute_string)

There are similar recipes in SQLAlchemy, and I've seen them in a few other popular libs that I can't remember off the top of my head. Nathan From ubershmekel at gmail.com Wed Feb 15 17:20:41 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Wed, 15 Feb 2012 18:20:41 +0200 Subject: [Python-ideas] Generators' and iterators' __add__ method Message-ID: Wouldn't it be nice to add generators and iterators like we can do with lists?

    def f():
        yield 1
        yield 2
        yield 3

    def g():
        yield 4
        yield 5

    # today
    for item in itertools.chain(f(), g()):
        print(item)

    # proposal
    for item in f() + g():
        print(item)

What do you guys think? Yuval Greenfield -------------- next part -------------- An HTML attachment was scrubbed...
URL: From guido at python.org Wed Feb 15 17:41:04 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 15 Feb 2012 08:41:04 -0800 Subject: [Python-ideas] Generators' and iterators' __add__ method In-Reply-To: References: Message-ID: It's been proposed many times, but always stumbled on the fact that the iterator protocol doesn't have a standard implementation -- each object implementing __next__ would have to be modified separately to also support __add__. On Wed, Feb 15, 2012 at 8:20 AM, Yuval Greenfield wrote:
> Wouldn't it be nice to add generators and iterators like we can do with
> lists?
>
>     def f():
>         yield 1
>         yield 2
>         yield 3
>
>     def g():
>         yield 4
>         yield 5
>
>     # today
>     for item in itertools.chain(f(), g()):
>         print(item)
>
>     # proposal
>     for item in f() + g():
>         print(item)
>
> What do you guys think?
>
> Yuval Greenfield
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-- --Guido van Rossum (python.org/~guido) From ubershmekel at gmail.com Wed Feb 15 17:50:42 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Wed, 15 Feb 2012 18:50:42 +0200 Subject: [Python-ideas] Generators' and iterators' __add__ method In-Reply-To: References: Message-ID: On Wed, Feb 15, 2012 at 6:41 PM, Guido van Rossum wrote:
> It's been proposed many times, but always stumbled on the fact that
> the iterator protocol doesn't have a standard implementation -- each
> object implementing __next__ would have to be modified separately to
> also support __add__.

If it isn't a bad idea then we can at least do generators and whatever we find in the standard lib. I think I extrapolate from your response that it isn't a bad idea. I'll work on a patch. Cheers, Yuval -------------- next part -------------- An HTML attachment was scrubbed...
URL: From guido at python.org Wed Feb 15 17:54:52 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 15 Feb 2012 08:54:52 -0800 Subject: [Python-ideas] Generators' and iterators' __add__ method In-Reply-To: References: Message-ID: It IS a bad idea. --Guido van Rossum (sent from Android phone) On Feb 15, 2012 8:50 AM, "Yuval Greenfield" wrote: > On Wed, Feb 15, 2012 at 6:41 PM, Guido van Rossum wrote: > >> It's been proposed many times, but always stumbled on the fact that >> the iterator protocol doesn't have a standard implementation -- each >> object implementing __next__ would have to be modified separately to >> also support __add__. >> >> >> > If it isn't a bad idea then we can at least do generators and whatever we > find in the standard lib. > > I think I extrapolate from your response that it isn't a bad idea. I'll > work on a patch. > > Cheers, > > Yuval > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ehlesmes at gmail.com Wed Feb 15 18:02:57 2012 From: ehlesmes at gmail.com (Edward Lesmes) Date: Wed, 15 Feb 2012 12:02:57 -0500 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases In-Reply-To: References: <4F3BC1EA.3030002@gmx.de> Message-ID: On Wed, Feb 15, 2012 at 10:34 AM, Nathan Rice < nathan.alexander.rice at gmail.com> wrote: > I feel like Python's standard library bloat actually makes > the good stuff harder to find, and a better approach would be to have > a minimal "core" standard library with a few "official" battery pack > style libs that are very prominently featured and available. +1, but who decides what? -- Edward Lesmes -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Feb 15 18:02:50 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 15 Feb 2012 18:02:50 +0100 Subject: [Python-ideas] Adding shm_open to mmap? 
References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> Message-ID: <20120215180250.21a05ddf@pitrou.net> On Wed, 15 Feb 2012 13:25:14 +0000 shibturn wrote:
> On 15/02/2012 12:34pm, Antoine Pitrou wrote:
> > The original discussion was about shared memory with multiprocessing.
> > In that context, automatic collection of shared memory areas shouldn't
> > be a problem.
>
> One problem with automatic collection is if you want to put a reference
> to an mmap on a queue. The mmap is likely to be disposed of before the
> target process can unpickle it.

Can you elaborate? I would think the general use case is to keep an mmap alive as long as you need it, so I don't understand why someone would destroy an mmap just after sending it to another process. Regards Antoine. From shibturn at gmail.com Wed Feb 15 18:43:25 2012 From: shibturn at gmail.com (shibturn) Date: Wed, 15 Feb 2012 17:43:25 +0000 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: <20120215180250.21a05ddf@pitrou.net> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> <20120215180250.21a05ddf@pitrou.net> Message-ID: On 15/02/2012 5:02pm, Antoine Pitrou wrote:
> On Wed, 15 Feb 2012 13:25:14 +0000
> Can you elaborate? I would think the general use case is to keep an
> mmap alive as long as you need it, so I don't understand why someone
> would destroy an mmap just after sending it to another process.

A process which creates an mmap may want to transfer ownership of the mmap to another process along a pipeline. For example:

1) Process A creates an mmap
2) Process A does some work on mmap
3) Process A puts mmap on a queue.
4) mmap gets garbage collected in process A.
5) Process B gets mmap from queue.
...

With refcounting the mmap will be destroyed at step 4.
With shm_open/shm_unlink, it would be Process B's responsibility to unlink the file. This is the scenario which Sturla Molden was concerned with, although he hadn't thought through the premature disposal issue. sbt P.S. I have posted a possible implementation of shm_open/shm_unlink for Windows at http://mail.python.org/pipermail/concurrency-sig/2012-February/000058.html From ethan at stoneleaf.us Wed Feb 15 19:22:08 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 15 Feb 2012 10:22:08 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <87fwecopgr.fsf@uwakimon.sk.tsukuba.ac.jp> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <87fwecopgr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F3BF7D0.3030702@stoneleaf.us> Stephen J. Turnbull wrote: > While my opinions on this are (probably obviously) informed by the > WSGI discussion, this is not about making life come up roses for the > WSGI folks. They work in a sewer; life stinks for them, and all they > can do about it is to hold their noses. This thread is about people > who are not trying to handle sewage in a sanitary fashion, rather just > cook a meal and ignore the occasional hairs that inevitably fall in. +1 Picturesque QOTW From stephen at xemacs.org Wed Feb 15 19:40:25 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Thu, 16 Feb 2012 03:40:25 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <878vk4ndnq.fsf@uwakimon.sk.tsukuba.ac.jp> It seems we once again agree violently on the principles. I think our differences here are mostly due to me giving a lot of attention to audience and presentation, and you focusing on the content of what to say. Re: spin control: Nick Coghlan writes: > No, I'm merely saying that at least 3 options (latin-1, > ascii+surrogateescape, chardet2) should be presented clearly to > beginners and the trade-offs explained. Are you defining "beginner" as "Python 2 programmer experienced in a multilingual context but new to Python 3"? My point is that, by other definitions of "beginner", I don't think the tradeoffs can be usefully explained to beginners without substantial discussion of the issues involved in ASCII vs. the encoding Babel vs. Unicode. Only in extreme cases where the beginner only cares about *never* getting a Unicode error, or only cares about *never* getting mojibake, will they be able to get much out of this. Re: descriptions > Task: Process data in any ASCII compatible encoding > Unicode Awareness Care Factor: None I don't understand what "Unicode awareness" means here. The degree to which Python will raise Unicode errors? The awareness of the programmer? > Approach: Specify encoding="latin-1" [...] > first 256 Unicode code points. However, any output data generated by > that application *will* be corrupted As advice, I think this is mostly false. 
In particular, unless you do language-specific manipulations (transforming particular words and the like), the Latin-N family is going to be 6-sigma interoperable with Latin-1, and the rest of the ISO 8859 and Windows-125x family tolerably so. This is why it is so hard to root out the "Python 3 is just Unicode-me-harder by another name" meme. The most you should say here is that data *may* be corrupted and that, depending on the program, the risk *may* be non-negligible for non-Latin-1 data if you ever encounter it. > Using "ascii+surrogateescape" instead of "latin-1" is a small initial > step into the Unicode-aware world. It still lets an application > process any ASCII-compatible encoding *without* having to know the > exact encoding of the source data, but will complain if there is an > implicit attempt to transcode the data to another encoding, That last line would be better "attempt to validate the data, or output it without an error-suppressing handler (which may occur implicitly, in a module your program uses)." > or if the application inserts non-ASCII data into the strings > before writing them out. Whether non-ASCII compatible encodings > trigger errors or get corrupted will depend on the specifics of the > encoding and how the program manipulates the data. You can be a little more precise: Non-ASCII-compatible encodings will trigger errors in the same circumstances as ASCII-compatible encodings. They also likely to be corrupted, but depending on the specifics of the encoding and how the program manipulates the data. I don't know if it's worth the extra verbosity, though. 
> Task: Process data in any ASCII compatible encoding
> Unicode Awareness Care Factor: High
> Approach: Use binary APIs and the "chardet2" module from PyPI to
> detect the character encoding
> Bytes/bytearray: data.decode(detected_encoding)
> Text files: open(fname, encoding=detected_encoding)
>
> The *right* way to process text in an unknown encoding is to do your
> best to derive the encoding from the data stream.

The claim of "right" isn't good advice. The *right* way to process text is to insist on knowing the encoding in advance. If you have to process text in unknown encodings, then what is "right" will vary with the application. For one thing, accurate detection is generally impossible without advice from outside. Given the inaccuracy of automatic detection, I would often prefer to fall back to a generic ASCII-compatible algorithm that omits any processing that requires identifying non-ASCII characters or inserting non-ASCII characters into the text stream, rather than risk mojibake. In other cases, all of the significant processing is done on ASCII characters, and non-ASCII is simply passed through verbatim. Then if you need to process text in assorted encodings, the 'latin1' method is not merely acceptable, it is the obvious winning strategy. And to some extent the environment:

> [T]he default restrictive character set on Windows and in some
> locales may cause problems.

In sum, naive use of chardet is most likely most effective as a way to rule out non-ASCII-compatible encodings, which *can* be done rather accurately (Shift JIS, Big5, UTF-16, and UTF-32 all have characteristic patterns of use of non-ASCII octets).
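To make that "ruling out" concrete: BOMs and embedded NUL bytes give the wide encodings away almost immediately. A rough sketch (my own toy heuristic for illustration, not chardet's actual algorithm):

```python
import codecs

def obviously_not_ascii_compatible(data):
    """Return a rough label if `data` cannot be ASCII-compatible, else None.

    Toy heuristic for illustration only -- not a real detector.
    """
    # Check the 4-byte UTF-32 BOMs before UTF-16 (the LE forms share a prefix).
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # BOM-less UTF-16/32 text still scatters NUL bytes through ASCII runs.
    if 0 in data[:1024]:
        return "utf-16/utf-32?"
    return None
```

Anything that comes back None still needs one of the ASCII-compatible treatments discussed above.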
From ned at nedbatchelder.com Wed Feb 15 19:41:16 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Wed, 15 Feb 2012 13:41:16 -0500 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases In-Reply-To: References: <4F3BC1EA.3030002@gmx.de> Message-ID: <4F3BFC4C.5000202@nedbatchelder.com> On 2/15/2012 10:34 AM, Nathan Rice wrote: > I feel like Python's standard library bloat actually makes > the good stuff harder to find, and a better approach would be to have > a minimal "core" standard library with a few "official" battery pack > style libs that are very prominently featured and available. If the only problem is "hard to find," then you need a documentation re-organization, not a change to what is shipped where. --Ned. From p.f.moore at gmail.com Wed Feb 15 19:51:29 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 15 Feb 2012 18:51:29 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I really like a task-oriented approach like this. +1000 for this sort of thing in the docs. On 15 February 2012 08:03, Nick Coghlan wrote: > Task: Process data in any ASCII compatible encoding This is actually closest to how I think about what I'm doing, so thanks for spelling it out. > Unicode Awareness Care Factor: High I'm not entirely sure how to interpret this - "High level of interest in getting it right" or "High amount of investment in understanding Unicode needed"? Or something else? 
> Approach: Use binary APIs and the "chardet2" module from PyPI to
> detect the character encoding
>     Bytes/bytearray: data.decode(detected_encoding)
>     Text files: open(fname, encoding=detected_encoding)

If this is going into the Unicode FAQ or somewhere similar, it probably needs a more complete snippet of sample code. Without having looked for and read the chardet2 documentation, do I need to read the file once in binary mode (possibly only partially) to scan it for an encoding, and then start again "for real"? That's arguably a downside to this approach.

> The *right* way to process text in an unknown encoding is to do your
> best to derive the encoding from the data stream. The "chardet2"
> module on PyPI allows this. Refer to that module's documentation
> (WHERE?) for details.

There is arguably another, simpler approach, which is to pick a default encoding (probably what Python gives you by default) and add a command line argument to your program (or equivalent if your program isn't a command line app) to manually specify an alternative. That's probably more complicated than the naive user wanted to deal with when they started reading this summary, but may well not sound so bad by the time they get to this point :-)

> With this approach, transcoding to the default sys.stdin and
> sys.stdout encodings should generally work (although the default
> restrictive character set on Windows and in some locales may cause
> problems).

A couple of other tasks spring to mind:

Task: Process data in a file whose encoding I don't know
Unicode Understanding Needed: Medium-Low
Unicode Correctness: High
Approach: Use external tools to identify the encoding, then simply specify it when opening the file. On Unix, "file -i FILENAME" will attempt to detect the encoding, on Windows, XXX. If, and only if, this approach doesn't identify the encoding clearly, then the other options allow you to do the best you can.
(Needs a better description of what tools to use, and maybe a sample Python script using chardet2 as a fallback). This is actually the "right way", and should be highlighted as such. By describing it this way, it's also rather clear that it's *not hard*, once you get over the idea that you don't know how to get the encoding, because it's not specified in the file. Having read through and extended Nick's analysis to this point, I'm thinking that it actually fits my use cases fine (and correct Unicode handling no longer feels like such a hard problem to me :-)) Task: Process data in a file believed to have inconsistent encodings Unicode Understanding Needed: High Unicode Correctness: Low Approach: ??? Panic :-) This is the killer, but should be extremely rare. We don't need to explain what to do here, but maybe offer a simple strategy (1. Are you sure the file has mixed encodings? Have you checked twice? 2. If it's ASCII-compatible, can you work on a basis that you just pass the mixed-encoding bytes through unchanged? If so use one of the other recipes Nick explained. 3. Do you care about mojibake or corruption? Can you afford not to? 4. Are you a Unicode expert, or do you know one? :-)) I think something like this would be a huge benefit for the Unicode FAQ. I haven't got the time or expertise to write it, but I wish I did. If I get some spare time, I might well have a go anyway, but I can't promise. 
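Something along the lines of the following is what I have in mind -- treat detection as advisory, with an explicit user override winning (the function name and fallback behaviour here are my own invention; it assumes the chardet/chardet2 detect() interface):

```python
def choose_encoding(sample, override=None, default="utf-8"):
    """Pick an encoding for `sample` (bytes read in binary mode).

    An explicit user-supplied override always wins; detection is only a
    fallback, and `default` is the last resort if chardet isn't
    installed or comes up empty.
    """
    if override:
        return override
    try:
        import chardet  # the Python 3 fork was published as chardet2
    except ImportError:
        return default
    guess = chardet.detect(sample)
    return guess.get("encoding") or default
```

The caller would read a few KB in binary mode, call choose_encoding() on it, then reopen the file in text mode with the result -- which also answers my "read the file twice" question above.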
Paul From nathan.alexander.rice at gmail.com Wed Feb 15 19:54:15 2012 From: nathan.alexander.rice at gmail.com (Nathan Rice) Date: Wed, 15 Feb 2012 13:54:15 -0500 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases In-Reply-To: <4F3BFC4C.5000202@nedbatchelder.com> References: <4F3BC1EA.3030002@gmx.de> <4F3BFC4C.5000202@nedbatchelder.com> Message-ID: >> I feel like Python's standard library bloat actually makes >> the good stuff harder to find, and a better approach would be to have >> a minimal "core" standard library with a few "official" battery pack >> style libs that are very prominently featured and available. > > If the only problem is "hard to find," then you need a documentation > re-organization, not a change to what is shipped where. I think the documentation is pretty well organized overall. There is an issue of irreducible complexity though; someone that is searching for a specific thing wouldn't care, but a newer user trying to get their bearings on how this python thing works by browsing the standard lib probably would. Additionally, decoupling modules from the interpreter release schedule would probably be a good thing. 
Nathan From shibturn at gmail.com Wed Feb 15 20:53:16 2012 From: shibturn at gmail.com (shibturn) Date: Wed, 15 Feb 2012 19:53:16 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 15/02/2012 6:51pm, Paul Moore wrote: > Task: Process data in a file whose encoding I don't know > Unicode Understanding Needed: Medium-Low > Unicode Correctness: High > Approach: Use external tools to identify the encoding, then simply > specify it when opening the file. On Unix, "file -i FILENAME" will > attempt to detect the encoding, on Windows, XXX. If, and only if, this > approach doesn't identify the encoding clearly, then the other options > allow you to do the best you can. Don't recommend "file -i". I just tried it on the files in /usr/share/libtextcat/ShortTexts/. Basically, everything is identified as us-ascii, iso-8859-1 or unknown-8bit. 
Examples:

chinese-big5.txt: text/plain; charset=iso-8859-1
chinese-gb2312.txt: text/plain; charset=iso-8859-1
japanese-euc_jp.txt: text/plain; charset=iso-8859-1
korean.txt: text/plain; charset=iso-8859-1
arabic-windows1256.txt: text/plain; charset=iso-8859-1
georgian.txt: text/plain; charset=iso-8859-1
greek-iso8859-7.txt: text/plain; charset=iso-8859-1
hebrew-iso8859_8.txt: text/plain; charset=iso-8859-1
russian-windows1251.txt: text/plain; charset=iso-8859-1
ukrainian-koi8_r.txt: text/plain; charset=iso-8859-1

sbt From greg.ewing at canterbury.ac.nz Wed Feb 15 22:44:20 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Feb 2012 10:44:20 +1300 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3AE675.6010907@mrabarnett.plus.com> References: <871uq3xeq7.fsf@uwakimon.sk.tsukuba.ac.jp> <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> Message-ID: <4F3C2734.9060807@canterbury.ac.nz> MRAB wrote:
> encoding="mojibake" # :-)

+1 -- Greg From tjreedy at udel.edu Wed Feb 15 23:39:31 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 15 Feb 2012 17:39:31 -0500 Subject: [Python-ideas] Generators' and iterators' __add__ method In-Reply-To: References: Message-ID: On 2/15/2012 11:20 AM, Yuval Greenfield wrote:
> Wouldn't it be nice to add generators and iterators like we can do with
> lists?

That is simply not possible. list1+list2 is a list; tuple1+tuple2 is a tuple; list1+tuple2 is an error! What type would iterable1 + iterable2 be? Answer: a chain object!

> def f():
>     yield 1
>     yield 2
>     yield 3
>
> def g():
>     yield 4
>     yield 5
>
> # today
> for item in itertools.chain(f(), g()):
>     print(item)

The itertools module is the proper place for generic operations on iterables (and not just iterators!).
>>> import itertools as it
>>> list(it.chain([1,2,3], (4,5.6), range(7,10)))
[1, 2, 3, 4, 5.6, 7, 8, 9]

Chain 'adds' mixed types and is not limited to binary scope. If we were starting fresh today, we *might* consider making the functions that went into itertools into methods of an iterable ABC. But making every function a method is not Python's style. Indeed, exposing generic methods like __len__ as functions is more common. -- Terry Jan Reedy From anacrolix at gmail.com Wed Feb 15 23:52:55 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Thu, 16 Feb 2012 06:52:55 +0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120215133912.GA17040@iskra.aviel.ru> References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> Message-ID: The thread was about reasons for a possible drop in popularity. Somehow the other reasons have been sabotaged, leaving only the unicode discussion still alive. On Feb 15, 2012 9:39 PM, "Oleg Broytman" wrote:
> On Wed, Feb 15, 2012 at 11:15:36AM +1100, Ben Finney wrote:
> > If people want to remain wilfully ignorant of text encoding in the third
> > millennium
>
> This returns us to the very beginning of the thread. The original
> complaint was: Python3 requires users to learn too much about unicode,
> more than they really need.
>
> Oleg.
> --
> Oleg Broytman http://phdru.name/ phd at phdru.name
> Programmers don't die, they just GOSUB without RETURN.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From cs at zip.com.au Thu Feb 16 00:07:49 2012 From: cs at zip.com.au (Cameron Simpson) Date: Thu, 16 Feb 2012 10:07:49 +1100 Subject: [Python-ideas] Unicode surrogateescape [was: Re: Python 3000 TIOBE -3%] In-Reply-To: <04A64366-3F31-40AF-9E84-FFB3C3C1E690@gmail.com> References: <04A64366-3F31-40AF-9E84-FFB3C3C1E690@gmail.com> Message-ID: <20120215230749.GA7352@cskk.homeip.net> On 14Feb2012 10:17, Carl M. Johnson wrote: | On Feb 14, 2012, at 10:04 AM, Jim Jewett wrote: | > But is there a good reason not to change the default errorhandler to | > errors="surrogateescape"? | | It's a conflict in the Zen: | | > Errors should never pass silently. | > Unless explicitly silenced. | | OK, so default to strict. But: Yes. | > Although practicality beats purity. | | Hmm, so maybe do use surrogates. Then again: No. Adding errors="surrogateescape" when needed is easy enough not to be impractical. (Also, it clearly flags in the code that we won't always get what we expect/hope.) | > In the face of ambiguity, refuse the temptation to guess. | | Grr, I'm not nearly Dutch enough to make sense of this logical conflict! I'm not Dutch either (I can never remember which way P and V go in semaphore operations, for example). However, the logic I would use is very simple: I should know the encoding of these bytes. If I don't, and I merely have to suck them in and spit them back out again as bytes undamaged (such as when reading filesystem filenames, which can often be treated as opaque tokens), use errors="surrogateescape". Otherwise, arrange to know the encoding (or have enough fiat to declare one, preferably utf-8). errors="surrogateescape" is for lossless but usually "blind" decode/encode. The rest of the time it would be better to know what you're doing. 
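A tiny round trip shows the mechanics (the sample bytes are made up):

```python
raw = b"name=caf\xe9\n"   # bytes in some unknown ASCII-compatible encoding
text = raw.decode("utf-8", errors="surrogateescape")
# The undecodable 0xE9 byte is smuggled through as the lone surrogate
# U+DCE9, so ASCII-level processing still works...
key, value = text.rstrip("\n").split("=")
# ...and the same handler on the way out restores the bytes exactly.
assert text == "name=caf\udce9\n"
assert key == "name"
assert text.encode("utf-8", errors="surrogateescape") == raw
```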
Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ We don't just *borrow* words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary. - James D. Nicoli From p.f.moore at gmail.com Thu Feb 16 00:15:18 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 15 Feb 2012 23:15:18 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 15 February 2012 19:53, shibturn wrote: > Don't recommend "file -i". Fair enough - I have no experience to comment one way or another. it was just something I'd seen mentioned in the thread. If there isn't a good standard encoding detector, maybe a small Python script using chardet2 would be the best thing to recommend... Paul. From steve at pearwood.info Thu Feb 16 00:26:50 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 16 Feb 2012 10:26:50 +1100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> Message-ID: <4F3C3F3A.4090408@pearwood.info> Matt Joiner wrote: > The thread was reasons for a possible drop in popularity. Somehow the other > reasons have been sabotaged leaving only the unicode discussion still alive. Not so much sabotaged as ignored. 
Perhaps because we don't believe this alleged drop in popularity represents anything real, while the Unicode issue is a genuine problem that needs a solution. -- Steven From steve at pearwood.info Thu Feb 16 00:43:33 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 16 Feb 2012 10:43:33 +1100 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases In-Reply-To: <4F3BC1EA.3030002@gmx.de> References: <4F3BC1EA.3030002@gmx.de> Message-ID: <4F3C4325.2070000@pearwood.info> Ronny Pfannschmidt wrote:
> Hi,
>
> in my experience for many cases, __repr__ and __str__ can be
> unconditionally be represented as simple string formatting operation,

In my experience, not so much.

> so i would propose to add a extension to support simply declaring them
> in the form of newstyle format strings

Declare them how? What is your proposed API for using this new functionality? Before proposing an implementation, you should propose an interface.

> a basic implementation for __repr__ could look like:
>
> class SelfFormatter(string.Formatter):
>     def __init__(self, obj):
>         self.__obj = obj
>         string.Formatter.__init__(self)
>
>     def get_value(self, key, args, kwargs):
>         if isinstance(key, str) and hasattr(self.__obj, key):
>             return getattr(self.__obj, key)
>         return Formatter.get_value(self, key, args, kwargs)
>
> class SimpleReprMixing(object):
>     _repr_ = '<{__class__.__name__} at 0x{__id__!x}>'
>     def __repr__(self):
>         formatter = SelfFormatter(self)
>         return formatter.vformat(self._repr_, (), {'__id__':id(self)})

I don't think you need this just to get a generic "instance at id" string. If you inherit from object (and remember that all classes inherit from object in Python 3) you get this for free:

>>> class K(object):
...     pass
...
>>> k = K()
>>> str(k)
'<__main__.K object at 0xb746068c>'

-- Steven From ncoghlan at gmail.com Thu Feb 16 01:03:38 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Feb 2012 10:03:38 +1000 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases In-Reply-To: References: <4F3BC1EA.3030002@gmx.de> Message-ID: On Thu, Feb 16, 2012 at 1:34 AM, Nathan Rice wrote:
> I think that a generic __repr__ has been reinvented more times than I
> can count. I don't think a generic __str__ is a good thing, as it is
> supposed to be a pretty, semantically meaningful. I don't really see
> anywhere in the standard library that such a feature would make sense
> though.

Python 3's reprlib already provides some tools for writing well-behaved __repr__ implementations (specifically, the reprlib.recursive_repr decorator that handles cycles in container representations). I actually have a recipe for simple "cls(arg1, arg2, arg3)" style __repr__ output on Stack Overflow: http://stackoverflow.com/questions/7072938/including-a-formatted-iterable-as-part-of-a-larger-formatted-string

> There is a similar recipes in SQL Alchemy, and I've seen them in a few
> other popular libs that I can't remember off the top of my head.

The unfortunate part of dict-based __repr__ implementations is that the order of the parameter display is technically arbitrary. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | 
Brisbane, Australia From greg.ewing at canterbury.ac.nz Thu Feb 16 02:37:12 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Feb 2012 14:37:12 +1300 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120215133912.GA17040@iskra.aviel.ru> References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> Message-ID: <4F3C5DC8.707@canterbury.ac.nz> On 16/02/12 02:39, Oleg Broytman wrote: > On Wed, Feb 15, 2012 at 11:15:36AM +1100, Ben Finney wrote: >> If people want to remain wilfully ignorant of text encoding in the third >> millennium > > This returns us to the very beginning of the thread. The original > complain was: Python3 requires users to learn too much about unicode, > more than they really need. I don't think it's helpful to label everyone who wants to use the techniques being discussed here as lazy or ignorant. As we've seen, there are cases where you truly *can't* know the true encoding, and at the same time it *doesn't matter*, because all you want to do is treat the unknown bytes as opaque data. To tell someone in that position that they're being lazy is both wrong and insulting. It seems to me that what surrogateescape is effectively doing is creating a new data type that consists of a mixture of ASCII characters and raw bytes, and enables you to tell which is which. Maybe there should be a real data type like this, or a flag on the unicode type. The data would be stored in the same way as a latin1-decoded string, but anything with the high bit set would be regarded as a byte instead of a character. This might make it easier to interoperate with external libraries that expect well-formed unicode. 
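Even without a separate type, the two kinds of item are already distinguishable after a surrogateescape decode, since the escape mechanism parks undecodable bytes in a reserved range -- e.g. (sample bytes are mine):

```python
raw = b"ok:\xff\xfe"
s = raw.decode("ascii", errors="surrogateescape")

def is_raw_byte(ch):
    # surrogateescape maps bytes 0x80-0xFF to lone surrogates U+DC80-U+DCFF
    return 0xDC80 <= ord(ch) <= 0xDCFF

flags = [is_raw_byte(c) for c in s]
```

Here flags comes out [False, False, False, True, True]: the last two items are raw bytes, the rest characters. A real mixed type could make that distinction first-class instead of an ord() check.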
-- Greg From sturla at molden.no Thu Feb 16 02:40:43 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 16 Feb 2012 02:40:43 +0100 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> <20120215180250.21a05ddf@pitrou.net> Message-ID: <6E24498E-EB73-46EA-9508-BC4279762B74@molden.no> > > P.S. I have posted a possible implementation of shm_open/shm_unlink for Windows at > > http://mail.python.org/pipermail/concurrency-sig/2012-February/000058.html > > A temporary file is not backed shared memory on Windows, but is a persistent file on disk. You have to mmap from the OS' paging file to get shared memory. Sturla From greg.ewing at canterbury.ac.nz Thu Feb 16 02:46:23 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Feb 2012 14:46:23 +1300 Subject: [Python-ideas] Generators' and iterators' __add__ method In-Reply-To: References: Message-ID: <4F3C5FEF.5090508@canterbury.ac.nz> On 16/02/12 05:20, Yuval Greenfield wrote: > Wouldn't it be nice to add generators and iterators like we can do with lists? > > for item in f() + g(): > print(item) No. Then every iterator would be expected to implement __add__, including all the ones already written. It would also clash with existing meanings of __add__ on some types, such as NumPy arrays. -- Greg From greg.ewing at canterbury.ac.nz Thu Feb 16 02:48:51 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Feb 2012 14:48:51 +1300 Subject: [Python-ideas] Generators' and iterators' __add__ method In-Reply-To: References: Message-ID: <4F3C6083.9080004@canterbury.ac.nz> On 16/02/12 05:50, Yuval Greenfield wrote: > If it isn't a bad idea then we can at least do generators and whatever we find > in the standard lib. It's only a good idea if it applies universally, otherwise code that relies on it would be fragile. 
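The concatenation being requested already exists today as itertools.chain, which works uniformly without requiring any iterator type to grow an __add__ method:

```python
import itertools

def f():
    yield 1
    yield 2

def g():
    yield 3
    yield 4

# Equivalent in effect to the proposed `for item in f() + g()`, but
# explicit and universal: it accepts generators, lists, files, or any
# mix of iterables, including types whose __add__ already means
# something else (e.g. NumPy arrays).
for item in itertools.chain(f(), g()):
    print(item)
```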
-- Greg From greg.ewing at canterbury.ac.nz Thu Feb 16 02:56:22 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 16 Feb 2012 14:56:22 +1300 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> <20120215180250.21a05ddf@pitrou.net> Message-ID: <4F3C6246.7000509@canterbury.ac.nz> On 16/02/12 06:43, shibturn wrote: > A process which creates an mmap may want to transfer ownership of the mmap to > another process along a pipeline. For example: > > 1) Process A creates an mmap > 2) Process A does some work on mmap > 3) Process A puts mmap on a queue. > 4) mmap gets garbage collected in process A. > 5) Process B gets mmap from queue. I don't know about Windows, but in Unix it's possible to send a file descriptor from one process to another over a unix-domain socket connection. So a refcounted anonymous mmap handover could be achieved this way: 1. Process A creates a temp file, mmaps it and unlinks it. 2. Process A sends the file descriptor to process B over a unix-domain socket. 3. Process B mmaps it. Even if process A closes its version of the fd right after sending it, the OS should keep it alive while it's in transit, I think. 
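The handover Greg outlines in steps 1-3 maps onto SCM_RIGHTS ancillary data; a minimal sketch using the socket.sendmsg()/recvmsg() calls added for Python 3.3, demonstrated inside one process via a socketpair (across real processes the same calls work over any unix-domain connection):

```python
import array
import os
import socket
import tempfile

def send_fd(sock, fd):
    # SCM_RIGHTS ancillary data makes the kernel duplicate the
    # descriptor into the process on the other end of the socket.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                           array.array("i", [fd]))])

def recv_fd(sock):
    fds = array.array("i")
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    level, ctype, data = ancdata[0]
    fds.frombytes(data[:fds.itemsize])
    return fds[0]

left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
with tempfile.TemporaryFile() as f:
    f.write(b"shared")
    f.flush()
    send_fd(left, f.fileno())
    received = recv_fd(right)

# The original file object is closed now, but the duplicated
# descriptor keeps the (soon-to-be-unlinked) file alive.
os.lseek(received, 0, os.SEEK_SET)
print(os.read(received, 6))  # b'shared'
os.close(received)
left.close()
right.close()
```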
-- Greg From steve at pearwood.info Thu Feb 16 05:08:39 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 16 Feb 2012 15:08:39 +1100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3C5DC8.707@canterbury.ac.nz> References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> Message-ID: <20120216040839.GA3048@ando> On Thu, Feb 16, 2012 at 02:37:12PM +1300, Greg Ewing wrote: > On 16/02/12 02:39, Oleg Broytman wrote: > >On Wed, Feb 15, 2012 at 11:15:36AM +1100, Ben Finney wrote: > >>If people want to remain wilfully ignorant of text encoding in the third > >>millennium > > > > This returns us to the very beginning of the thread. The original > >complain was: Python3 requires users to learn too much about unicode, > >more than they really need. > > I don't think it's helpful to label everyone who wants to use the > techniques being discussed here as lazy or ignorant. As we've seen, > there are cases where you truly *can't* know the true encoding, > and at the same time it *doesn't matter*, because all you want to > do is treat the unknown bytes as opaque data. To tell someone in > that position that they're being lazy is both wrong and insulting. In fairness, this thread was originally started with the scenario "I'm reading files which are only mostly ASCII, but I don't want to learn about Unicode" rather than "I know about Unicode, but it doesn't help me in this situation because the encoding truly is unknown". So wilful ignorance does apply, at least in the use-case the thread started with. (If it helps, think of them as too busy to learn, not too lazy.) If you already know about Unicode, then you probably don't need to be given a simple recipe to follow, because you probably already have a solution that works for you. 
Which brings us back to the original use-case: "I have a file which is only mostly ASCII, and I don't care to learn about Unicode at this time to deal with it. I need a recipe I can follow that will do the right-thing so I can continue to ignore the issue for a little longer." I don't think that we should either insist that these people be forced to learn Unicode, nor expect to be able to solve every possible problem they might find. A couple of recipes in the FAQs, and discussion of why you might prefer one to the other, should be able to cover most simple cases: open(filename, encoding='ascii', errors='surrogateescape') open(filename, encoding='latin1') Both recipes hint at the wider world of encodings and error handlers, hence act as a non-threatening introduction to Unicode. -- Steven From storchaka at gmail.com Thu Feb 16 07:24:25 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 16 Feb 2012 08:24:25 +0200 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases In-Reply-To: References: <4F3BC1EA.3030002@gmx.de> Message-ID: 16.02.12 02:03, Nick Coghlan wrote: > The unfortunate part of dict-based __repr__ implementations is that > the order of the parameter display is technically arbitrary. Not for OrderedDict. From Ronny.Pfannschmidt at gmx.de Thu Feb 16 07:54:06 2012 From: Ronny.Pfannschmidt at gmx.de (Ronny Pfannschmidt) Date: Thu, 16 Feb 2012 07:54:06 +0100 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases In-Reply-To: <4F3C4325.2070000@pearwood.info> References: <4F3BC1EA.3030002@gmx.de> <4F3C4325.2070000@pearwood.info> Message-ID: <4F3CA80E.1060208@gmx.de> On 02/16/2012 12:43 AM, Steven D'Aprano wrote: > Ronny Pfannschmidt wrote: >> Hi, >> >> in my experience for many cases, __repr__ and __str__ can be >> unconditionally be represented as simple string formatting operation, > > In my experience, not so much.
> > >> so i would propose to add a extension to support simply declaring them >> in the form of newstyle format strings > > Declare them how? What is your proposed API for using this new > functionality? Before proposing an implementation, you should propose an > interface. > > >> a basic implementation for __repr__ could look like: >> >> class SelfFormatter(string.Formatter): >> def __init__(self, obj): >> self.__obj = obj >> string.Formatter.__init__(self) >> >> def get_value(self, key, args, kwargs): >> if isinstance(key, str) and hasattr(self.__obj, key): >> return getattr(self.__obj, key) >> return Formatter.get_value(self, key, args, kwargs) >> >> class SimpleReprMixing(object): >> _repr_ = '<{__class__.__name__} at 0x{__id__!x}>' >> def __repr__(self): >> formatter = SelfFormatter(self) >> return formatter.vformat(self._repr_, (), {'__id__':id(self)}) > > > I don't think you need this just to get a generic "instance at id" > string. If you inherit from object (and remember that all classes > inherit from object in Python 3) you get this for free: > > >>> class K(object): > ... pass > ... > >>> k = K() > >>> str(k) > '<__main__.K object at 0xb746068c>' > > > seems like you completely missed that class-level attributes can easily be redefined by subclasses like class User(SimpleReprMixin): _repr_ = "" ... the implementation and the interface is pretty simple and straightforward From ncoghlan at gmail.com Thu Feb 16 08:04:11 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Feb 2012 17:04:11 +1000 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases In-Reply-To: References: <4F3BC1EA.3030002@gmx.de> Message-ID: On Thu, Feb 16, 2012 at 4:24 PM, Serhiy Storchaka wrote: > 16.02.12 02:03, Nick Coghlan wrote: > >> The unfortunate part of dict-based __repr__ implementations is that >> the order of the parameter display is technically arbitrary. > > > Not for OrderedDict.
Yes, but relying on OrderedDict rules out using keyword arguments to make any API easier to use. At that point, it's generally simpler for people to write their own repr that does exactly what they want. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From techtonik at gmail.com Thu Feb 16 08:04:54 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 16 Feb 2012 10:04:54 +0300 Subject: [Python-ideas] Generators' and iterators' __add__ method In-Reply-To: <4F3C5FEF.5090508@canterbury.ac.nz> References: <4F3C5FEF.5090508@canterbury.ac.nz> Message-ID: On Thu, Feb 16, 2012 at 4:46 AM, Greg Ewing wrote: > On 16/02/12 05:20, Yuval Greenfield wrote: > >> Wouldn't it be nice to add generators and iterators like we can do with >> lists? >> > > > >> for item in f() + g(): >> print(item) >> > > No. Then every iterator would be expected to implement __add__, > including all the ones already written. > > It would also clash with existing meanings of __add__ on some > types, such as NumPy arrays. Good example for Python Ideas FAQ. -- anatoly t. From ncoghlan at gmail.com Thu Feb 16 08:13:47 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Feb 2012 17:13:47 +1000 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases In-Reply-To: <4F3CA80E.1060208@gmx.de> References: <4F3BC1EA.3030002@gmx.de> <4F3C4325.2070000@pearwood.info> <4F3CA80E.1060208@gmx.de> Message-ID: On Thu, Feb 16, 2012 at 4:54 PM, Ronny Pfannschmidt wrote: > the implementation and the interface is pretty simple and straightforward However, the question is whether it's simple and straightforward enough to be worth standardising. There are a few common patterns that recur in repr implementations: - based (the object.__repr__ default) - cls(arg1, arg2...) positional argument based - cls(kwd1=arg1, kwd2=arg2...)
keyword argument based - a mixture of the previous two options Adding some helpers along those lines to reprlib may make sense, but a meaningful ReprMixin places stronger constraints on the relationship between class attributes and the desired repr output than is appropriate for the standard library. It's way too idiosyncratic across developers to be worth promoting in the stdlib (a project or domain specific class hierarchy is a different story, but that's not relevant for a general purpose ReprMixin proposal). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From Ronny.Pfannschmidt at gmx.de Thu Feb 16 08:20:06 2012 From: Ronny.Pfannschmidt at gmx.de (Ronny Pfannschmidt) Date: Thu, 16 Feb 2012 08:20:06 +0100 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases In-Reply-To: References: <4F3BC1EA.3030002@gmx.de> <4F3C4325.2070000@pearwood.info> <4F3CA80E.1060208@gmx.de> Message-ID: <4F3CAE26.4020201@gmx.de> On 02/16/2012 08:13 AM, Nick Coghlan wrote: > On Thu, Feb 16, 2012 at 4:54 PM, Ronny Pfannschmidt > wrote: >> the implementation and the interface is pretty simple and straightforward > > However, the question is whether it's simple and straightforward > enough to be worth standardising. > > There are a few common patterns that recur in repr implementations: > > - based (the object.__repr__ default) > > - cls(arg1, arg2...) positional argument based > > - cls(kwd1=arg1, kwd2=arg2...) keyword argument based > > - a mixture of the previous two options > > Adding some helpers along those lines to reprlib may make sense, but a > meaningful ReprMixin places stronger constraints on the relationship > between class attributes and the desired repr output than is > appropriate for the standard library.
It's way too idiosyncratic > across developers to be worth promoting in the stdlib (a project or > domain specific class hierarchy is a different story, but that's not > relevant for a general purpose ReprMixin proposal). instead of the ReprMixin, how about a descriptor the Api would change to something like class User(object): __repr__ = FormatRepr(' its less concise, but actually more straightforward and better decoupled (thanks for hinting at the strong coupling in the class hierarchy) and ArgRepr and KwargRepr could be added in a similar fashion > Cheers, > Nick. > From stephen at xemacs.org Thu Feb 16 08:49:47 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 16 Feb 2012 16:49:47 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3C5DC8.707@canterbury.ac.nz> References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> Message-ID: <877gznnrok.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > Maybe there should be a real data type [parallel to str and bytes > that mixes str and bytes], or a flag on the unicode type. -1. This is yesterday's problem. It still hurts today; we need workarounds. But it's going to be less and less important as time goes on, because nobody can afford one-locale software anymore, and the cheapest way to be multilocale is to process in Unicode, and insist on Unicode on input and output. The unknown encoding problem is not one with a generally acceptable solution. That's why Unicode was invented. To "solve" the problem by ensuring it doesn't occur in the first place.
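A minimal sketch of what the FormatRepr descriptor proposed by Ronny a couple of messages up could look like; the class body and the template string are assumptions, since the original example string was lost in transit:

```python
class FormatRepr:
    """Hypothetical descriptor building __repr__ from a format string.

    Template fields are resolved against the instance; the field
    names used here are illustrative, not from the original post.
    """

    def __init__(self, template):
        self.template = template

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # repr() looks __repr__ up on the type, which triggers this
        # descriptor; return a zero-argument callable bound to obj.
        def bound_repr():
            return self.template.format(self=obj,
                                        cls=type(obj).__name__,
                                        id=id(obj))
        return bound_repr

class User:
    __repr__ = FormatRepr("<{cls} {self.name!r} at 0x{id:x}>")

    def __init__(self, name):
        self.name = name

print(repr(User("alice")))  # e.g. <User 'alice' at 0x7f3a...>
```

Because it is a plain descriptor rather than a mixin, it composes with any class hierarchy, which is the decoupling argument made above.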
From p.f.moore at gmail.com Thu Feb 16 13:59:26 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Feb 2012 12:59:26 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120216040839.GA3048@ando> References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> <20120216040839.GA3048@ando> Message-ID: On 16 February 2012 04:08, Steven D'Aprano wrote: > On 16/02/12 02:39, Oleg Broytman wrote: >> I don't think it's helpful to label everyone who wants to use the >> techniques being discussed here as lazy or ignorant. As we've seen, >> there are cases where you truly *can't* know the true encoding, >> and at the same time it *doesn't matter*, because all you want to >> do is treat the unknown bytes as opaque data. To tell someone in >> that position that they're being lazy is both wrong and insulting. > > In fairness, this thread was originally started with the scenario "I'm > reading files which are only mostly ASCII, but I don't want to learn > about Unicode" rather than "I know about Unicode, but it doesn't help me > in this situation because the encoding truly is unknown". So wilful > ignorance does apply, at least in the use-case the thread started with. > (If it helps, think of them as too busy to learn, not too lazy.) As the person who started the thread with this use case, I'd dispute that description of what I said. To restate it "I'm reading files which are mostly ASCII but not all. I know that I should identify the encoding, and what to do if I did know the encoding, but I'm not sure how to find out reliably what the encoding is. 
Also, the problem doesn't really warrant investing the time needed to research means of doing so - given that I don't need to process the non-ASCII, I just want to avoid decoding errors and not corrupt the data". I'm not lazy, I've just done a cost/benefit analysis and determined that my limited knowledge should be enough. Experience with other tools which aren't as strict as Python 3 on Unicode matters confirms that a "good enough" job does satisfy my needs. And I'm not willfully ignorant, I actually have a good feel for Unicode and the issues involved, and I certainly know what's right. I've just found that everything I've read assumes that "knowing the encoding" isn't hard - and my experience differs, so I don't know where to go for answers. Add to this the fact that I *know* I've seen supposed text files with mixed encoding content, and no-one has *ever* explained how to handle that (it's basically a damaged file, and so all the "right way to deal with Unicode" discussions ignore it) even though tools like grep and awk do a perfectly acceptable job to the level I care about. I'm very pleased with the way this thread has gone, because it has answered all of the questions I've had about "nearly-ASCII" text files. But there's no way I'd have expected to spend this much time, and involve this many other people with more knowledge than me, just to handle my original changelog-parsing problem that I could do in awk or Python 2 in about 5 minutes. Now, I could also do it in Python 3. But then, I couldn't. Hopefully the knowledge from this thread can be captured so that other people can avoid my dilemma. OK, so maybe I do feel somewhat insulted... Cheers, Paul. 
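For the record, the grep/awk-style job Paul describes does fit in a few lines of Python 3 using the errors='surrogateescape' recipe discussed in this thread; the changelog format and file contents below are invented for illustration:

```python
import os
import tempfile

def entries(path, tag="* "):
    # Decode as ASCII, but smuggle any undecodable bytes through as
    # surrogates so they can be re-encoded byte-for-byte later.
    with open(path, encoding="ascii", errors="surrogateescape") as f:
        for line in f:
            if line.startswith(tag):
                yield line[len(tag):].rstrip("\n")

# Hypothetical changelog containing one latin-1 author name.
tmp = tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False)
tmp.write(b"* fix parser (J\xf6rg)\n  details...\n* bump version\n")
tmp.close()

for entry in entries(tmp.name):
    print(entry.encode("ascii", "surrogateescape"))
# b'fix parser (J\xf6rg)'
# b'bump version'

os.unlink(tmp.name)
```

The non-ASCII bytes are never interpreted, only passed through, so nothing is corrupted and no encoding ever has to be guessed.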
From steve at pearwood.info Thu Feb 16 14:44:25 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 17 Feb 2012 00:44:25 +1100 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> <20120216040839.GA3048@ando> Message-ID: <4F3D0839.2070802@pearwood.info> Paul Moore wrote: > On 16 February 2012 04:08, Steven D'Aprano wrote: >> On 16/02/12 02:39, Oleg Broytman wrote: >>> I don't think it's helpful to label everyone who wants to use the >>> techniques being discussed here as lazy or ignorant. As we've seen, >>> there are cases where you truly *can't* know the true encoding, >>> and at the same time it *doesn't matter*, because all you want to >>> do is treat the unknown bytes as opaque data. To tell someone in >>> that position that they're being lazy is both wrong and insulting. >> In fairness, this thread was originally started with the scenario "I'm >> reading files which are only mostly ASCII, but I don't want to learn >> about Unicode" rather than "I know about Unicode, but it doesn't help me >> in this situation because the encoding truly is unknown". So wilful >> ignorance does apply, at least in the use-case the thread started with. >> (If it helps, think of them as too busy to learn, not too lazy.) > > As the person who started the thread with this use case, I'd dispute > that description of what I said. I am sorry, I spoke poorly. Apologies if you feel I misrepresented you. To be honest, this thread has been so large, and so rambling, and covering so much ground, I have no idea what the *actual* first mention of encoding related issues was. The oldest I can find was Giampaolo Rodolà
on 9 Feb 2012 20:16:00 +0100: I bet a lot of people don't want to upgrade for another reason: unicode. The impression I got is that python 3 forces the user to use and *understand* unicode and a lot of people simply don't want to deal with that. two days before the first post from you mentioning encoding issues that I can find. Another mention of a similar use-case was by Stephen J Turnbull on 10 Feb 2012 17:41:21 +0900: True, if one sticks to pure ASCII, there's no difference to notice, but that's just not possible for people who live outside of the U.S., or who share text with people outside of the U.S. They need currency symbols, they have friends whose names have little dots on them. Every single one of those is a backtrace waiting to happen. A backtrace on f = open('text-file.txt') for line in f: pass is an imposition. That doesn't happen in 2.x (for the wrong reasons, but it's very convenient 95% of the time). This is what Victor's "locale" codec is all about. I think that's the wrong spelling for the feature, but there does need to be a way to express "don't bother me about Unicode" in most scripts for most people. We don't have a decent boilerplate for that yet. which I *paraphrased* as "I have text files that are mostly ASCII and I don't want to deal with Unicode yadda yadda yadda". But in any case, I expressed myself poorly, and I'm sorry about that. Regardless of who made the very first mention of the encoding problem in this thread, I think we should all be able to agree that laziness is *not* the only reason for having encoding problems. I thought I made it clear that I did not subscribe to that opinion. 
-- Steven From p.f.moore at gmail.com Thu Feb 16 15:47:58 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Feb 2012 14:47:58 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3D0839.2070802@pearwood.info> References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> <20120216040839.GA3048@ando> <4F3D0839.2070802@pearwood.info> Message-ID: On 16 February 2012 13:44, Steven D'Aprano wrote: > But in any case, I expressed myself poorly, and I'm sorry about that. > > Regardless of who made the very first mention of the encoding problem in > this thread, I think we should all be able to agree that laziness is *not* > the only reason for having encoding problems. I thought I made it clear that > I did not subscribe to that opinion. Not a problem. Equally, my "I feel insulted" dig was uncalled for - it was the sort of semi-humorous comment that doesn't translate itself well in email. I think the debate here has been immensely useful, and I appreciate everyone's comments. Paul. 
From nathan.alexander.rice at gmail.com Thu Feb 16 15:55:27 2012 From: nathan.alexander.rice at gmail.com (Nathan Rice) Date: Thu, 16 Feb 2012 09:55:27 -0500 Subject: [Python-ideas] automation of __repr__/__str__ for all the common simple cases In-Reply-To: <4F3CAE26.4020201@gmx.de> References: <4F3BC1EA.3030002@gmx.de> <4F3C4325.2070000@pearwood.info> <4F3CA80E.1060208@gmx.de> <4F3CAE26.4020201@gmx.de> Message-ID: > instead of the ReprMixin, how about a descriptor > > the Api would change to something like > > class User(object): > __repr__ = FormatRepr(' > > its less concise, but actually more straightforward and better decoupled > (thanks for hinting at the strong coupling in the class hierarchy) > > and ArgRepr and KwargRepr could be added in a similar fashion +1 for not using inheritance (which honestly creates about as many problems as it solves)... Having some of the common use cases like this as descriptors that could be imported and used directly would be an improvement on the current interface of the lib IMHO. Nathan From stephen at xemacs.org Thu Feb 16 16:25:59 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 17 Feb 2012 00:25:59 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> <20120216040839.GA3048@ando> Message-ID: <874nuqol4o.fsf@uwakimon.sk.tsukuba.ac.jp> Paul Moore writes: > Add to this the fact that I *know* I've seen supposed text files with > mixed encoding content, Heck, I've seen *file names* with mixed encoding content.
> and no-one has *ever* explained how to handle that (it's basically > a damaged file, and so all the "right way to deal with Unicode" > discussions ignore it) The right way to handle such a file is ad hoc: operate on the features you can identify, and treat runs of bytes of unknown encoding as atomic blobs. In practice, there is a generic such feature that supports many applications: runs of ASCII text. Which is the intuition all the pragmatists start with -- it's correct. > OK, so maybe I do feel somewhat insulted... I'm sorry you feel that way. (I've sided with the pragmatists in this thread, but on this issue I'm a purist at heart.) From sturla at molden.no Thu Feb 16 16:27:24 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 16 Feb 2012 16:27:24 +0100 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: <6E24498E-EB73-46EA-9508-BC4279762B74@molden.no> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> <20120215180250.21a05ddf@pitrou.net> <6E24498E-EB73-46EA-9508-BC4279762B74@molden.no> Message-ID: <4F3D205C.5020406@molden.no> On 16.02.2012 02:40, Sturla Molden wrote: > >> >> P.S. I have posted a possible implementation of shm_open/shm_unlink for Windows at >> >> http://mail.python.org/pipermail/concurrency-sig/2012-February/000058.html >> >> > > A temporary file is not backed shared memory on Windows, but is a persistent file on disk. You have to mmap from the OS' paging file to get shared memory. Hmm... It seems files created with the flag FILE_ATTRIBUTE_TEMPORARY are backed by memory if possible. Though MSDN does not say if it is shared memory that can be used for IPC. A blog article on MSDN from 2004 indicates that the combination FILE_ATTRIBUTE_TEMPORARY|FILE_FLAG_DELETE_ON_CLOSE is needed. The Windows systems programming book from MS Press does not mention FILE_ATTRIBUTE_TEMPORARY for temporary files.
So it seems most Windows programmers are actually creating permanent files in the temp file folder, rather than creating temporary files. So the cause for build-up of temporary files on Windows is actually a wide-spread programming error, not the fault of the operating system. It seems tempfile.NamedTemporaryFile will use FILE_ATTRIBUTE_TEMPORARY on Windows if called with delete=True. But it does not use FILE_FLAG_DELETE_ON_CLOSE as well, which probably is an error (particularly if the "delete" keyword argument should make sense). Sturla From ethan at stoneleaf.us Thu Feb 16 16:21:23 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 16 Feb 2012 07:21:23 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3C5DC8.707@canterbury.ac.nz> References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> Message-ID: <4F3D1EF3.40203@stoneleaf.us> Greg Ewing wrote: > It seems to me that what surrogateescape is effectively doing is > creating a new data type that consists of a mixture of ASCII > characters and raw bytes, and enables you to tell which is which. How so? Sounds like this new data type assumes everything over 127 is a raw byte, but there are plenty of applications where values between 0 - 127 should be interpreted as raw bytes even when the majority are indeed just plain ascii. > Maybe there should be a real data type like this, or a flag on > the unicode type. The data would be stored in the same way as a > latin1-decoded string, but anything with the high bit set would > be regarded as a byte instead of a character. This might make it > easier to interoperate with external libraries that expect > well-formed unicode. I can see a data type that is easier to work with than bytes (ascii-string, anybody?
;) but I don't think we want to make it any kind of unicode -- once the text has been extracted from this ascii-string it should be converted to unicode for further processing, while any other non-convertible bytes should stay as bytes (or ascii-string, or whatever we call it). The above is not arguing with the 'latin-1' nor 'surrogateescape' techniques, but only commenting on a different data type with probably different uses. ~Ethan~ From p.f.moore at gmail.com Thu Feb 16 16:37:02 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 16 Feb 2012 15:37:02 +0000 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <874nuqol4o.fsf@uwakimon.sk.tsukuba.ac.jp> References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> <20120216040839.GA3048@ando> <874nuqol4o.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 16 February 2012 15:25, Stephen J. Turnbull wrote: >> OK, so maybe I do feel somewhat insulted... > > I'm sorry you feel that way. (I've sided with the pragmatists in this > thread, but on this issue I'm a purist at heart.) As I said elsewhere that was a lame attempt at a joke. My apologies. No-one has been anything but helpful in this thread, I was just reacting (a little) to the occasional characterisation I've noticed of people as "lazy" - your term "pragmatists" is much less emotive. (And it wasn't so much a personal reaction anyway, just an awareness that we need to be careful how we express things to people struggling with this) Paul.
From barry at python.org Thu Feb 16 17:07:11 2012 From: barry at python.org (Barry Warsaw) Date: Thu, 16 Feb 2012 11:07:11 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% References: <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> <16F229EE-4018-4E81-962A-8D48036F194F@gmail.com> <87aa4ko7xx.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20120216110711.284001db@resist.wooz.org> On Feb 15, 2012, at 04:46 PM, Stephen J. Turnbull wrote: >[1] When this damned term is over in a few weeks, I'll take a look at >the tutorial-level docs and see if I can come up with a gentle >approach for those who are finding out for the first time that the >locale-dependent default isn't good enough for them. I really hope you do this, but note that it would be very helpful to have guidelines and recommendations even for advanced, knowledgeable Python developers. I have participated in many discussions in various forums with other Python developers where genuine differences of opinion or experience lead to different solutions. It would be very helpful to point to a document and say "here are the best practices for your [application|library] as recommended by core Python experts in Unicode handling." Cheers, -Barry From stephen at xemacs.org Thu Feb 16 17:25:47 2012 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Fri, 17 Feb 2012 01:25:47 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3D1EF3.40203@stoneleaf.us> References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> <4F3D1EF3.40203@stoneleaf.us> Message-ID: <8739aaoid0.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > The above is not arguing with the 'latin-1' nor 'surrogateescape' > techniques, but only commenting on a different data type with probably > different uses. But there really aren't any uses that aren't equally well dealt with by 'surrogateescape' that I can see. You have to process it code unit by code unit (just like surrogateescape) and if you find a non- character code unit, you then have an ad hoc decision to make about what to do with it. surrogateescape makes one particular treatment blazingly efficient (namely, turning the surrogate back into a byte with no known meaning). What other treatment of a byte of by-definition unknown semantics deserves the blazing efficiency that a new (presumably builtin) type could give? From shibturn at gmail.com Thu Feb 16 17:51:58 2012 From: shibturn at gmail.com (shibturn) Date: Thu, 16 Feb 2012 16:51:58 +0000 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: <6E24498E-EB73-46EA-9508-BC4279762B74@molden.no> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> <20120215180250.21a05ddf@pitrou.net> <6E24498E-EB73-46EA-9508-BC4279762B74@molden.no> Message-ID: On 16/02/2012 1:40am, Sturla Molden wrote: > A temporary file is not backed shared memory on Windows, but is a > persistent file on disk. You have to mmap from the OS' paging file > to get shared memory. 
An mmap can certainly be used as shared memory when it is backed by a real file. Or are you saying that it would work but be much slower? Also, according to this msdn blog http://blogs.msdn.com/b/larryosterman/archive/2004/04/19/116084.aspx if you open a file using FILE_ATTRIBUTE_TEMPORARY and FILE_FLAG_DELETE_ON_CLOSE the file will not be flushed to the disk unless there is memory pressure. sbt From shibturn at gmail.com Thu Feb 16 17:56:46 2012 From: shibturn at gmail.com (shibturn) Date: Thu, 16 Feb 2012 16:56:46 +0000 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: <4F3D205C.5020406@molden.no> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> <20120215180250.21a05ddf@pitrou.net> <6E24498E-EB73-46EA-9508-BC4279762B74@molden.no> <4F3D205C.5020406@molden.no> Message-ID: On 16/02/2012 3:27pm, Sturla Molden wrote: > Hmm... > > It seems files created with the flag FILE_ATTRIBUTE_TEMPORARY is backed by memory if possible... I did not notice this message before I replied to your earlier one. sbt From julien at tayon.net Thu Feb 16 18:51:47 2012 From: julien at tayon.net (julien tayon) Date: Thu, 16 Feb 2012 18:51:47 +0100 Subject: [Python-ideas] Generators' and iterators' __add__ method In-Reply-To: References: <4F3C5FEF.5090508@canterbury.ac.nz> Message-ID: >> It would also clash with existing meanings of __add__ on some >> types, such as NumPy arrays. > > > Good example for Python Ideas FAQ. Well, it looks like a classical problem of disambiguation. We have more than one consistent behaviour for addition. (I know I am dense.) And these behaviours clash if not properly disambiguated. In a world where unicorns exist, we would select the behaviour of __add__ with a switch (let's say __add__ would be a dispatch table that would take the 'algebrae' as a parameter).
Still, in this world where unicorns exist and Perl6 would be in production (and acclaimed), this switch could be sensibly set according to the context. And everybody would be perfectly aware and taking no risks. In the actual world, it is pretty much a chimera or a dying foetus (musical joke) in the case of python since it would violate half the tao of python: ... Explicit is better than implicit. ... Complex is better than complicated. ... Special cases aren't special enough to break the rules. ... There should be one-- and preferably only one --obvious way to do it. As a result my suggestion for the FAQ should be to state that anyone wanting to submit an idea to the python community should 'import this' first and see it as deadly serious. I made the mistake twice :) PS I still see cases where having an __algebrae__ private member could be very helpful, because dynamic typing has its limits. hf, gl -- Jul From ethan at stoneleaf.us Thu Feb 16 20:34:16 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 16 Feb 2012 11:34:16 -0800 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <8739aaoid0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> <4F3D1EF3.40203@stoneleaf.us> <8739aaoid0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4F3D5A38.6070901@stoneleaf.us> Stephen J. Turnbull wrote: > Ethan Furman writes: >> The above is not arguing with the 'latin-1' nor 'surrogateescape' >> techniques, but only commenting on a different data type with probably >> different uses. > > But there really aren't any uses that aren't equally well dealt with > by 'surrogateescape' that I can see.
You have to process it code unit > by code unit (just like surrogateescape) and if you find a non- > character code unit, you then have an ad hoc decision to make about > what to do with it. > > surrogateescape makes one particular treatment blazingly efficient > (namely, turning the surrogate back into a byte with no known > meaning). What other treatment of a byte of by-definition unknown > semantics deserves the blazing efficiency that a new (presumably > builtin) type could give? It wasn't the 'unknown semantics' that I was responding to (latin-1 and surrogateescape deal with that just fine), but rather a new data type with a mixture of valid unicode (0-127) and raw bytes (128-255) -- I don't think that would be common enough to justify, and I can see confusion again creeping in when somebody (like myself ;) sees a datatype which seemingly supports a mixture of unicode and raw bytes only to find out that 'uni_raw(...)[5] != 32' because a u' ' was returned and an integer (or raw byte) was expected at that location. ~Ethan~ From sturla at molden.no Thu Feb 16 21:40:54 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 16 Feb 2012 21:40:54 +0100 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> <20120215180250.21a05ddf@pitrou.net> <6E24498E-EB73-46EA-9508-BC4279762B74@molden.no> Message-ID: <4F3D69D6.2000809@molden.no> On 16.02.2012 17:51, shibturn wrote: > An mmap can certainly be used as shared memory when it is backed by a > real file. Or are you saying that it would work but be much slower? For FILE_ATTRIBUTE_TEMPORARY, I am not sure if the memory is shared or private. (I.e. if using it for IPC will involve disk access.) mmap can certainly be used for shared memory. 
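[The file-backed sharing described above is easy to demonstrate from a single process: two mappings of the same file stand in for two cooperating processes. This is an illustrative sketch added by the editor, not code from the thread.]

```python
import mmap
import os
import tempfile

SIZE = 4096
fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, SIZE)
    # Two shared mappings of the same file; in real IPC these would
    # live in two different processes.
    writer = mmap.mmap(fd, SIZE)
    reader = mmap.mmap(fd, SIZE)

    writer[:5] = b"hello"   # write through one mapping...
    seen = reader[:5]       # ...and the other observes it immediately
    print(seen)             # b'hello'

    writer.close()
    reader.close()
finally:
    os.close(fd)
    os.unlink(path)
```

Whether the pages ever touch the disk is up to the OS page cache, which is exactly the FILE_ATTRIBUTE_TEMPORARY point being debated here.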
Sturla From shibturn at gmail.com Thu Feb 16 22:20:55 2012 From: shibturn at gmail.com (shibturn) Date: Thu, 16 Feb 2012 21:20:55 +0000 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: <4F3C6246.7000509@canterbury.ac.nz> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> <20120215180250.21a05ddf@pitrou.net> <4F3C6246.7000509@canterbury.ac.nz> Message-ID: On 16/02/2012 1:56am, Greg Ewing wrote:

> I don't know about Windows, but in Unix it's possible to send a
> file descriptor from one process to another over a unix-domain
> socket connection. So a refcounted anonymous mmap handover could
> be achieved this way:
>
> 1. Process A creates a temp file, mmaps it and unlinks it.
> 2. Process A sends the file descriptor to process B over a
>    unix-domain socket.
> 3. Process B mmaps it.
>
> Even if process A closes its version of the fd right after
> sending it, the OS should keep it alive while it's in transit,
> I think.

If the receiving process is expecting an fd then that certainly works. But making it work transparently with pickle is difficult. (multiprocessing.reduction tried making it transparent using a background thread to accept requests for fds from unpickling processes. But that functionality has been disabled.) On Windows one rather cleaner possibility is for the process pickling the handle to use DuplicateHandle() to copy the handle to the main process. Then the receiving process can copy the handle from the main process, removing it from the main process at the same time by using "dwOptions=DUPLICATE_CLOSE_SOURCE". Since the main process will not exit before its descendants, that will solve the keep-alive problem. (I have managed to produce a working example of this scheme for transferring a file handle.)
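[For reference, the Unix fd-passing step Greg describes can be sketched with `socket.sendmsg`/`recvmsg` and SCM_RIGHTS ancillary data. The demo below runs inside one process over a socketpair; helper names are invented for illustration, and modern Python (3.9+) also offers `socket.send_fds`/`recv_fds` for the same job.]

```python
import array
import os
import socket

def send_fd(sock, fd):
    # The descriptor rides along as SCM_RIGHTS ancillary data;
    # the single dummy byte just gives it a message to attach to.
    sock.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                           array.array("i", [fd]))])

def recv_fd(sock):
    fds = array.array("i")
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    level, ctype, data = ancdata[0]
    fds.frombytes(data[:fds.itemsize])
    return fds[0]

# Hand a pipe's read end across a unix-domain socketpair.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
r, w = os.pipe()
send_fd(left, r)
os.close(r)                 # the in-flight copy keeps the fd alive
received = recv_fd(right)   # a fresh descriptor for the same pipe
os.write(w, b"ping")
payload = os.read(received, 4)
print(payload)              # b'ping'
for f in (w, received):
    os.close(f)
left.close()
right.close()
```

Note the sender closing its copy right after sending, as Greg suggests: the kernel keeps the descriptor alive while it sits in the socket buffer.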
sbt From shibturn at gmail.com Thu Feb 16 22:31:49 2012 From: shibturn at gmail.com (shibturn) Date: Thu, 16 Feb 2012 21:31:49 +0000 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: <4F3D69D6.2000809@molden.no> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> <20120215180250.21a05ddf@pitrou.net> <6E24498E-EB73-46EA-9508-BC4279762B74@molden.no> <4F3D69D6.2000809@molden.no> Message-ID: On 16/02/2012 8:40pm, Sturla Molden wrote: > For FILE_ATTRIBUTE_TEMPORARY, I am not sure if the memory is shared or > private. (I.e. if using it for IPC will involve disk access.) Even if it is backed by a perfectly normal file, using an mmap for IPC does not require disk access if the relevant pages have not been evicted from memory. FILE_ATTRIBUTE_TEMPORARY only affects how eager the system is to flush modified data to the disk. sbt From mbarkhau at googlemail.com Fri Feb 17 01:06:52 2012 From: mbarkhau at googlemail.com (Manuel Barkhau) Date: Fri, 17 Feb 2012 01:06:52 +0100 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal Message-ID: Hi everybody, I'd like to suggest adopting something similar to the ScopeGuardStatement from the D programming language. A description of the D version can be found here: http://d.digitalmars.com/2.0/statement.html#ScopeGuardStatement It is also similar to golang's "defer" statement: http://golang.org/doc/go_spec.html#Defer_statements So these are roughly equivalent:

defer {block}        // in golang
scope(exit) {block}  // in D

I have written a context manager that approximates the behavior of the scope statement in D: http://ideone.com/vNmq8 The use of lambdas or nested functions doesn't look very nice however. So, on to the proposal. I think the "defer" keyword is more appropriate than "scope", and the function-like syntax of "scope(exit)" doesn't fit with the overall python syntax.
There are three ways to define a defer block.

- "defer: BLOCK", which is the same as "defer {BLOCK}" in golang or "scope(exit) {BLOCK}" in D.
- "defer EXPR as VAR: BLOCK", which is similar to "scope(failure)". It differs in that it specifies the exception that caused the failure and is only called for matching exceptions.
- "defer EXPR: BLOCK else: BLOCK", where the else BLOCK is executed when no exception occurs. This is similar to "scope(success)" and the existing except: else: construct in python.

As "defer:" is currently invalid syntax, there shouldn't be any code breakage from adding the new keyword.

Some rules:
- Deferred blocks are executed in the reverse lexical order in which they appear.
- If a function returns before reaching a defer statement, it will not be executed.
- If a defer block raises an error, a lexically earlier defer block may catch it.
- If multiple defer blocks raise errors or return results, the raise or return of the lexically earlier defer will mask the previous result or error.

Some example code:

>>> def ordering_example():
...     print(1)
...     defer: print(2)
...     defer: print(3)
...     print(4)
...
>>> ordering_example()
1
4
3
2

Handling exceptions:

>>> def defer_example():
...     # setup
...     defer:                 # always executed
...         # cleanup
...     defer Exception as e:  # executed if exception is raised
...         # handle exception
...     else:                  # executed if no exception is raised
...         # success code
...
...     # your usual code
...     # possibly raise exception

Equivalent using try/except/finally:

>>> def try_example():
...     # setup
...     try:
...         # your usual code
...         # possibly raise exception
...     except Exception as e:
...         # handle exception
...     else:
...         # success code
...     finally:
...         # cleanup

The nesting advantage becomes more apparent when more are required.
Here is an example from http://www.doughellmann.com/articles/how-tos/python-exception-handling/index.html

#!/usr/bin/env python

import sys
import traceback

def throws():
    raise RuntimeError('error from throws')

def cleanup():
    raise RuntimeError('error from cleanup')

def nested():
    try:
        throws()
    except Exception as original_error:
        try:
            raise
        finally:
            try:
                cleanup()
            except:
                pass  # ignore errors in cleanup

def main():
    try:
        nested()
        return 0
    except Exception as err:
        traceback.print_exc()
        return 1

if __name__ == '__main__':
    sys.exit(main())

Here are the equivalent of main and nested functions using defer:

def nested():
    defer RuntimeError: pass  # ignore errors in cleanup
    defer: cleanup()
    throws()

def main():
    defer Exception as err:
        traceback.print_exc()
        return 1
    else:
        return 0
    nested()

Notice that we don't even need "defer Exception as original_error: raise" after "defer: cleanup()" in order to preserve the stack trace. It will go up the call stack, so long as no defer handles it or masks it with another exception. This proposal would probably have had a better chance before the introduction of the "with" statement, but I still think it may be useful in cases where you don't want to write a context manager. Context managers may also not have access to the scope they are used in, which may be inconvenient in some cases. For code where try/except/finally would otherwise be required, I think the advantages make this proposal at least worth considering. You don't need to nest your normal code in a try block and you can place error handling code together with relevant sections, rather than further down in an except block. I'm sure there is much I have overlooked, possibly this is technically difficult and of course there is the minor task of implementation. But other than that what do you think?
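[The ordering behaviour proposed above can be reproduced without new syntax using `contextlib.ExitStack` (added in Python 3.3, after this thread); the sketch below is the editor's illustration of `ordering_example`, not code from the proposal.]

```python
from contextlib import ExitStack

def ordering_example():
    with ExitStack() as stack:
        print(1)
        stack.callback(print, 2)  # runs last: callbacks unwind LIFO
        stack.callback(print, 3)  # runs first
        print(4)

ordering_example()  # prints 1, 4, 3, 2, matching the proposed defer semantics
```

The LIFO unwinding of the stack gives exactly the "reverse lexical order" rule from the proposal.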
Manuel From tjreedy at udel.edu Fri Feb 17 02:18:07 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 16 Feb 2012 20:18:07 -0500 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> <20120216040839.GA3048@ando> Message-ID: On 2/16/2012 7:59 AM, Paul Moore wrote: > Add to this the fact that I *know* I've seen supposed text files with > mixed encoding content, and no-one has *ever* explained how to handle > that (it's basically a damaged file, Before unicode, mixed encodings was the only way to have multi-lingual digital text (with multiple symbol systems) in one file. I presume such texts used some sort of language markup like , (or ), and , along with software that understood the markup. Such files were not broken, just the pre-unicode system of different codes for each language or nation. To handle such a file, the program, whatever the language, has to understand the custom markup, segment the bytes, and handle each segment appropriately. Crazy text that switches among unknown encodings without notice is a possibly unsolvable decryption problem. Such problems have no guaranteed algorithms, only heuristics. -- Terry Jan Reedy From steve at pearwood.info Fri Feb 17 02:25:41 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 17 Feb 2012 12:25:41 +1100 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal In-Reply-To: References: Message-ID: <4F3DAC95.8050804@pearwood.info> Manuel Barkhau wrote: > As "defer:" is currently invalid syntax, there shouldn't be any code > breakage from adding the new keyword. Of course there will be.
Every new keyword will break code that uses that word as a regular name:

defer = True
instance.defer = None

Both of which will become a SyntaxError if defer becomes a keyword. It's not even like "defer" is an uncommon word unlikely to be used anywhere. (Although I can't find any examples of it in the standard library.)

> Some rules:
> - Deferred blocks are executed in the reverse lexical order in which
> they appear.

Why in reverse order? This is unintuitive. If you write:

def func():
    defer: print(1)
    defer: print(2)
    defer: print(3)
    do_stuff()
    return

the output will be

3
2
1

Is this a deliberate design choice, or an accident of implementation that D and Go have followed? If it is deliberate, what is the rationale for it?

[...]

> The nesting advantage becomes more apparent when more are required. Here
> is an example from

I disagree. Nesting is an advantage, and the use of defer which eliminates that nesting is a MAJOR disadvantage of the concept. You seem to believe that nesting is a problem to be worked around. I call it a feature to be encouraged. With try...except/finally, the structure of which blocks are called, and when, is directly reflected in the nesting and indentation. With defer, that structure is gone. The reader has to try to recreate the execution order in their head. That is an enormous negative.

> http://www.doughellmann.com/articles/how-tos/python-exception-handling/index.html
>
> #!/usr/bin/env python
>
> import sys
> import traceback
>
> def throws():
>     raise RuntimeError('error from throws')
>
> def cleanup():
>     raise RuntimeError('error from cleanup')
>
> def nested():
>     try:
>         throws()
>     except Exception as original_error:
>         try:
>             raise
>         finally:
>             try:
>                 cleanup()
>             except:
>                 pass  # ignore errors in cleanup

I don't understand the point of that example. Wouldn't it be better written as this?
def nested():
    try:
        throws()
    finally:
        try:
            cleanup()
        except:
            pass

As far as I can tell, my version gives the same behaviour as yours:

py> main()
Traceback (most recent call last):
  File "", line 3, in main
  File "", line 3, in nested
  File "", line 2, in throws
RuntimeError: error from throws
1

(Tested in Python 2.5 with the obvious syntax changes.)

[...]

> Here are the equivalent of main and nested functions using defer:
>
> def nested():
>     defer RuntimeError: pass  # ignore errors in cleanup
>     defer: cleanup()
>     throws()

How is the reader supposed to know that pass will ignore errors in cleanup, and nothing else, without the comment? Imagine that the first defer line and the second are separated by a bunch of code:

def nested():
    defer RuntimeError: pass
    do_this()
    do_that()
    do_something_else()
    if flag:
        return
    if condition:
        defer: something()
    defer: cleanup()
    throws()

What is there to connect the first defer to the cleanup now? It seems to me that defer would let you write spaghetti code in a way which is really difficult (if not impossible) with try blocks. When considering a proposal, we should consider how it will be abused as well as how it will be used. -- Steven From ncoghlan at gmail.com Fri Feb 17 02:33:59 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 17 Feb 2012 11:33:59 +1000 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal In-Reply-To: References: Message-ID: On Fri, Feb 17, 2012 at 10:06 AM, Manuel Barkhau wrote: > It is also similar to golangs "defer" statement: > http://golang.org/doc/go_spec.html#Defer_statements Since there have been a few proposals along these lines recently: Nothing is going to happen on the dedicated syntax front in the deferred execution space at least until I get contextlib.CallbackStack into Python 3.3 and we gather additional feedback on patterns of use (and, assuming that API addresses the relevant use cases the way I plan, these features will *never* need dedicated syntax).
A preliminary version of the API is available in the contextlib2 backport as ContextStack: http://contextlib2.readthedocs.org/en/latest/index.html#contextlib2.ContextStack

See the issue tracker for the changes that are planned in order to update that to the new CallbackStack API: https://bitbucket.org/ncoghlan/contextlib2/issue/8/rename-contextstack-to-callbackstack-and

> Some example code:
>
>>>> def ordering_example():
> ...     print(1)
> ...     defer: print(2)
> ...     defer: print(3)
> ...     print(4)

With ContextStack:

def ordering_example():
    with ContextStack() as stack:
        print(1)
        stack.register(print, 2)
        stack.register(print, 3)
        print(4)

With the planned CallbackStack API:

def ordering_example():
    with CallbackStack() as stack:
        print(1)
        stack.push(print, 2)
        stack.push(print, 3)
        print(4)

>>>> ordering_example()
> 1
> 4
> 3
> 2

Same output.

> The nesting advantage becomes more apparent when more are required. Here
> is an example from
> http://www.doughellmann.com/articles/how-tos/python-exception-handling/index.html
>
>     #!/usr/bin/env python
>
>     import sys
>     import traceback
>
>     def throws():
>         raise RuntimeError('error from throws')
>
>     def cleanup():
>         raise RuntimeError('error from cleanup')
>
>     def nested():
>         try:
>             throws()
>         except Exception as original_error:
>             try:
>                 raise
>             finally:
>                 try:
>                     cleanup()
>                 except:
>                     pass  # ignore errors in cleanup

Huh? That's a bizarre way to write it. A more sane equivalent would be

def nested():
    try:
        throws()
    except BaseException:
        try:
            cleanup()
        except:
            pass
        raise

>>> def throws():
...     1/0
...
>>> def nested():
...     try:
...         throws()
...     except BaseException:
...         try:
...             raise Exception
...         except:
...             pass
...         raise
...
>>> nested()
Traceback (most recent call last):
  File "", line 1, in
  File "", line 3, in nested
  File "", line 2, in throws
ZeroDivisionError: division by zero

However, this does raise a reasonable feature request for the planned contextlib2.CallbackStack API, so the above can be written as:

def _ignore_exception(*args):
    return True

def _cleanup_on_error(exc_type, exc_val, exc_tb):
    if exc_type is not None:
        cleanup()

def nested():
    with CallbackStack(callback_error=_ignore_exception) as stack:
        stack.push_exit(_cleanup_on_error)
        throws()

In Python 3 though, your better bet is often going to be just to let the cleanup exception fly - the __context__ attribute means the original exception and the full stack trace will be preserved automatically.

> I'm sure there is much I have overlooked, possibly this is technically
> difficult and of course there is the minor task of implementation. But
> other than that what do you think?

I think contextlib2 and PEP 3144 cover the use cases you have presented more cleanly and without drastic syntax changes. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Fri Feb 17 03:22:44 2012 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Fri, 17 Feb 2012 11:22:44 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <20120216110711.284001db@resist.wooz.org> References: <3A660961-784E-43BC-8EE5-EA5E71B44E5A@masklinn.net> <08E5748E-1A04-4986-A907-5D86B9C99711@masklinn.net> <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <4F3AEFAF.5060107@pearwood.info> <87ehtwolya.fsf@uwakimon.sk.tsukuba.ac.jp> <87bop0ohth.fsf@uwakimon.sk.tsukuba.ac.jp> <16F229EE-4018-4E81-962A-8D48036F194F@gmail.com> <87aa4ko7xx.fsf@uwakimon.sk.tsukuba.ac.jp> <20120216110711.284001db@resist.wooz.org> Message-ID: <871upunqq3.fsf@uwakimon.sk.tsukuba.ac.jp> Barry Warsaw writes: > I really hope you do this, but note that it would be very helpful to have > guidelines and recommendations even for advanced, knowledgeable Python > developers. > I have participated in many discussions in various forums with > other Python developers where genuine differences of opinion or > experience, leads to different solutions. It would be very helpful > to point to a document and say "here are the best practices for > your [application|library] as recommended by core Python experts in > Unicode handling." I'll see what I can do, but for *best practices* going beyond the level of Paul Moore's use case is difficult for the reasons elaborated elsewhere (by others as well as myself): basic Unicode handling is no harder than ASCII handling as long as everything is Unicode. So the real answer is to insist on valid Unicode for your text I/O, failing that, text labeled *as* text *with* an encoding[1], and failing that (or failing validation of the input), reject the input.[2] If that's not acceptable -- all too often it is not -- you're in a world of pain, and the solutions are going to be ad hoc. The WSGI folks will not find the solutions proposed for email acceptable, and vice versa. 
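[Both endpoints of the trade-off described above, "reject invalid input" versus the surrogateescape escape hatch discussed earlier in the thread, are visible in a few lines of Python 3. Editor's sketch, not code from the thread.]

```python
data = b"caf\xe9 ok"  # latin-1 bytes; not valid UTF-8

# The party line: validate, and reject what does not decode.
try:
    data.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print("rejected:", rejected)  # rejected: True

# The pragmatist escape hatch: unknown bytes become lone surrogates...
text = data.decode("utf-8", errors="surrogateescape")
print(ascii(text))  # 'caf\udce9 ok'

# ...and re-encoding with the same handler restores the bytes exactly.
assert text.encode("utf-8", errors="surrogateescape") == data
```

The round-trip property in the last line is the "blazing efficiency" point from earlier in the thread: the smuggled byte comes back out unchanged.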
Something like the format Nick proposed, where the tradeoffs are described, would be useful, I guess. But the tradeoffs have to be made ad hoc. Footnotes: [1] Of course it's OK if these are implicitly labeled by requirements or defaults of a higher-level protocol. [2] This is the Unicode party line, of course. But it's really the only generally applicable advice. From mbarkhau at googlemail.com Fri Feb 17 03:25:36 2012 From: mbarkhau at googlemail.com (Manuel Barkhau) Date: Fri, 17 Feb 2012 03:25:36 +0100 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal In-Reply-To: References: Message-ID: > Every new keyword will break code that uses that word as a regular > name: Ah, my bad. I had assumed that the addition of the with statement didn't break anything and thought the only case I needed to look at was "defer: ...". > It seems to me that defer would let you write spaghetti code in a way > which is really difficult (if not impossible) with try blocks. Sure people can write spaghetti code with this, but who ever said it was appropriate for everything in the world? I also wasn't aware there were people so fond of writing try blocks, because to me they look fugly. Rather than wrapping all my code in a try block, I would rather write the code that deals with peripheral cases in a block, and continue on with the main code. > You seem to believe that nesting is a problem to be worked around. I > call it a feature to be encouraged. Bingo, I don't like nesting too much. > Is this a deliberate design choice, or an accident of implementation > that D and Go have followed? If it is deliberate, what is the > rationale for it? Yes, it's because of how they chose to do it, and I kept it that way, if nothing else, for familiarity. But I'm sure there is some reasoning behind it.

> Huh? That's a bizarre way to write it. A more sane equivalent would be

The example given by Doug is intended to preserve the original stack trace of the exception that is thrown by throws.

>     def nested():
>         try:
>             throws()
>         except BaseException:
>             try:
>                 cleanup()
>             except:
>                 pass
>             raise

This raises the exception thrown by cleanup. If you use "raise original_exception", the stack trace isn't preserved, which is what the article is about. But now that you mention it, I'm not sure the defer example I gave actually would produce the same stack trace either. Oh well, context managers it is then I guess. Thanks for the references Nick. Manuel From stephen at xemacs.org Fri Feb 17 04:42:25 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 17 Feb 2012 12:42:25 +0900 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> <20120216040839.GA3048@ando> Message-ID: <87zkcim8gu.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > Before unicode, mixed encodings was the only way to have multi-lingual > digital text (with multiple symbol systems) in one file. There is a long-accepted standard for doing this, ISO 2022. IIRC it's available online from ISO now, and if not, ECMA 35 is the same. The X Compound Text standard (I think this is documented in the ICCCM) and the Motif Compound String are profiles of ISO 2022. If that is what Paul is seeing, then the iso-2022-jp codec might be good enough to decode the files he has, depending on which version of ISO-2022-JP is implemented. If not, iconv -f ISO-2022-JP-2 (or ISO-2022-JP-3) should work (at least for GNU's iconv implementation). > I presume such texts used some sort of language markup like > , (or ), and , along with > software that understood the markup. They would use encoding "markup" (specifically escape sequences).
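[Those escape sequences are directly visible from Python, which ships an iso2022_jp codec; an editor's illustration:]

```python
text = "ABC \u3042 XYZ"  # ASCII with one hiragana character mixed in
encoded = text.encode("iso2022_jp")
print(encoded)
# b'ABC \x1b$B$"\x1b(B XYZ' -- ESC $ B shifts into JIS X 0208,
# ESC ( B shifts back to ASCII
assert encoded.decode("iso2022_jp") == text
```

The codec inserts the designation escapes exactly where the character repertoire changes, which is the "switching among encodings with notice" case, the decodable one.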
Language is not enough, as all languages have had multiple encodings since the invention of ASCII (or EBCDIC, whichever came second ;-), and in many cases multilingual standards have evolved (Japanese, for example, includes Greek and Cyrillic alphabets in its JIS standard coded character set). More recently, many languages have several ISO 2022-based encodings (the ISO 8859 family is a conformant profile of ISO 2022, as are the EUC encodings for Asian languages; the Windows 125x code pages are non-conformant extensions of ASCII based on ISO 8859). > Crazy text that switches among unknown encodings without notice is a > possibly unsolvable decryption problem. True, and occasionally seen even today in Japan (cat(1) will produce such files easily, and any system for including files). From sven at marnach.net Fri Feb 17 18:14:01 2012 From: sven at marnach.net (Sven Marnach) Date: Fri, 17 Feb 2012 17:14:01 +0000 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal In-Reply-To: <4F3DAC95.8050804@pearwood.info> References: <4F3DAC95.8050804@pearwood.info> Message-ID: <20120217171401.GA3406@pantoffel-wg.de> Steven D'Aprano schrieb am Fr, 17. Feb 2012, um 12:25:41 +1100: > Why in reverse order? This is unintuitive. > [...] > Is this a deliberate design choice, or an accident of implementation > that D and Go have followed? If it is deliberate, what is the > rationale for it? Basically any cleanup mechanism I know of does the cleanups in the reverse order as the initialisations, be it destructor calls in C++, defer handlers in Go or nested 'with' statements in Python. Since the later initialised objects might depend on the previously defined objects, this is also the only sane choice. > With defer, that structure is gone. The reader has to try to > recreate the execution order in their head. That is an enormous > negative. I think "defer" has some definite advantages over try/except as far as readability is concerned. 
It places the cleanup code at the position the necessity for the cleanup occurs, and not way down in the code. Python's "with" statement does a similar thing, but it gets difficult to handle as soon as you try to *conditionally* add a cleanup handler -- we had this discussion before, and it lead to Nick's contextlib2. Cheers, Sven From dreamingforward at gmail.com Fri Feb 17 22:57:36 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Fri, 17 Feb 2012 14:57:36 -0700 Subject: [Python-ideas] doctest Message-ID: I find myself wanting to use doctest for some test-driven development, and find myself slightly frustrated and wonder if others would be interested in seeing the following additional functionality in doctest: 1. Execution context determined by outer-scope doctest defintions. 2. Smart Comparisons that will detect output of a non-ordered type (dict/set), lift and recast it and do a real comparison. Without #1, "literate testing" becomes awash with re-defining re-used variables which, generally, also detracts from exact purpose of the test -- this creates testdoc noise and the docs become less useful. Without #2, "readable docs" nicely co-aligning with "testable docs" tends towards divergence. Perhaps not enough developers use doctest to care, but I find it one of the more enjoyable ways to develop python code -- I don't have to remember test cases nor go through the trouble of setting up unittests. AND, it encourages agile development. Another user wrote a while back of even having a built-in test() method. Wouldn't that really encourage agile developement? And you wouldn't have to muddy up your code with "if __name__ == "__main__": import doctest, yadda yadda". Anyway... of course patches welcome, yes... 
;^)

mark

From ncoghlan at gmail.com  Fri Feb 17 23:12:15 2012
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 18 Feb 2012 08:12:15 +1000
Subject: [Python-ideas] doctest
In-Reply-To:
References:
Message-ID:

On Sat, Feb 18, 2012 at 7:57 AM, Mark Janssen wrote:
> Anyway... of course patches welcome, yes... ;^)

Not really. doctest is for *testing code examples in docs*. If you try to use it for more than that, it's likely to drive you up the wall, so proposals to make it more than it is usually don't get a great reception (docs patches to make its limitations clearer are generally welcome, though). The stdlib solution for test-driven development is unittest (the vast majority of our own regression suite is written that way - only a small proportion uses doctest).

An interesting third-party alternative that has been created recently is behave: http://crate.io/packages/behave/

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From nathan.alexander.rice at gmail.com  Fri Feb 17 23:16:47 2012
From: nathan.alexander.rice at gmail.com (Nathan Rice)
Date: Fri, 17 Feb 2012 17:16:47 -0500
Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal
In-Reply-To:
References:
Message-ID:

> Since there have been a few proposals along these lines recently:
>
> Nothing is going to happen on the dedicated syntax front in the
> deferred execution space at least until I get contextlib.CallbackStack
> into Python 3.3 and we gather additional feedback on patterns of use
> (and, assuming that API addresses the relevant use cases the way I
> plan, these features will *never* need dedicated syntax).
>
> A preliminary version of the API is available in the contextlib2
> backport as ContextStack:
> http://contextlib2.readthedocs.org/en/latest/index.html#contextlib2.ContextStack
>
> See the issue tracker for the changes that are planned in order to
> update that to the new CallbackStack API:
> https://bitbucket.org/ncoghlan/contextlib2/issue/8/rename-contextstack-to-callbackstack-and
>
>> ... snip ...
>
> With ContextStack:
>
> def ordering_example():
>     with ContextStack() as stack:
>         print(1)
>         stack.register(print, 2)
>         stack.register(print, 3)
>         print(4)
>
>
> With the planned CallbackStack API:
>
> def ordering_example():
>     with CallbackStack() as stack:
>         print(1)
>         stack.push(print, 2)
>         stack.push(print, 3)
>         print(4)

Hi Nick,

I just wanted to chime in on this, because I understand the use cases and benefits of this but the code is very semantically opaque and imperative. I also feel like a lot of C programming concepts and semantics have leaked into the design. Additionally, I feel that there are some benefits to taking a step back and looking at this problem as part of a bigger picture.

Fundamentally, context managers are ways to convert a block of code into an event, with __enter__ analogous to "before_block" and __exit__ analogous to an "after_block". There are a couple of problems with context managers that I feel an event system handles more elegantly:

1.) Context is ambiguous. Context could be interpreted to mean a thread, a scope, a point in time, etc. Context managers only deal with the narrow problem of a block of code being run. This is succinctly described as an event.

2.) The context manager API requires you to fire events before and after the code block (yes, you can pass) and does not provide other options, such as (in an ideal world of python with well behaved threads) an event that is fired/in the active state concurrent to the block of code's execution.
There are a few ways to hack this behavior, but they're all bad, and interoperability between libraries is unlikely.

3.) If you want to extend context management for a particular piece of code, you have to modify the code to add another context manager, or monkey-patch the existing context manager. Modifying the code has some thorny issues; for instance, if you need to modify the context handling in a third-party lib, all of a sudden you have to fork the lib and manually patch every time you upgrade or redeploy. Monkey-patching is easy, but from a conceptual/readability perspective it is horrible. If the lib fires events, you can just register an action on the event in your code and live happily ever after.

4.) The way context managers are defined only allows you to describe a linear chain of events, because they are associated with a block of code, and the act of association precludes other context managers from firing events for that same block of code. Because of this, you have things like register and preserve that exist to add support for (weak) non-linearity.

5.) Going back to event concurrency and touching on non-linearity again, if I have two functions that I've asked to fire when an event occurs, this provides a strong clue to the interpreter that the given functions could potentially run in parallel. Of course, there would need to be other cues, but I don't think people want to be in the business of explicitly writing parallel code forever.

6.) User interface coders going back 30 years understand events pretty well, but will probably give you a blank stare for a second or two if you mention context managers.

I feel that "with"/context managers are an elegant solution to the simple problem; however, it seems like the generalized solution based on context managers is pretty awkward.
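[Editor's note: Nathan's mapping of __enter__/__exit__ onto "before_block"/"after_block" events could be sketched roughly as follows. EventedBlock and its event names are hypothetical illustrations of the idea, not an existing or proposed API.]

```python
class EventedBlock:
    """Hypothetical sketch: a context manager whose entry and exit are
    events that any number of listeners can hook, instead of a single
    hard-wired __enter__/__exit__ pair."""

    def __init__(self):
        self.handlers = {"before_block": [], "after_block": []}

    def on(self, event, handler):
        self.handlers[event].append(handler)

    def __enter__(self):
        for handler in self.handlers["before_block"]:
            handler()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Fire after_block handlers in reverse registration order,
        # mirroring the usual LIFO cleanup convention.
        for handler in reversed(self.handlers["after_block"]):
            handler(exc)
        return False  # never suppress exceptions

log = []
block = EventedBlock()
block.on("before_block", lambda: log.append("setup"))
block.on("after_block", lambda exc: log.append("cleanup"))
with block:
    log.append("body")
# log is now ["setup", "body", "cleanup"]
```

Because the handlers are plain lists, third-party code could append its own listeners without subclassing or monkey-patching, which is the interoperability property point 3 above is asking for.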
The right thing to do, in my opinion, would be to go back to the drawing board and design an event subsystem that maps to something like pi-calculus/interval temporal logic in a human/pythonic way. This will avoid the immediate issues like the necessary goofiness of contextlib2, and lay the groundwork for nice things like automatic parallelization.

Of course, I'm anal about getting things 100% right, and context managers are a very nice, simple, elegant 80% solution. If 99% of people are happy with the 80% solution, it is probably the right thing to do, even if it forces an ugly hack on the remaining 1%.

Take care,

Nathan

From christopherreay at gmail.com  Fri Feb 17 23:31:38 2012
From: christopherreay at gmail.com (Christopher Reay)
Date: Sat, 18 Feb 2012 00:31:38 +0200
Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal
In-Reply-To:
References:
Message-ID:

+1
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jeanpierreda at gmail.com  Sat Feb 18 01:43:00 2012
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Fri, 17 Feb 2012 19:43:00 -0500
Subject: [Python-ideas] doctest
In-Reply-To:
References:
Message-ID:

On Fri, Feb 17, 2012 at 4:57 PM, Mark Janssen wrote:
> I find myself wanting to use doctest for some test-driven development,
> and find myself slightly frustrated and wonder if others would be
> interested in seeing the following additional functionality in
> doctest:
>
> 1. Execution context determined by outer-scope doctest definitions.

I'm not sure what you mean, but it might be relevant that Sphinx lets you define multiple scopes for doctests. I feel like its approach is the right one, but it isn't reusable in Python docstrings. That said, I think users of doctest have moved away from embedded doctests in docstrings -- it encourages doctests to have way too many "examples" (test cases), which reduces their usefulness as documentation.

> 2.
Smart Comparisons that will detect output of a non-ordered type
> (dict/set), lift and recast it and do a real comparison.

I think it's better to just always use ast.literal_eval on the output as another form of testing for equivalence. This could break code, but probably not any code worth caring about. (In particular,

    >>> print 'r""'
    ""

would pass in a literal_eval-ing system, but not in some other system.)

> Without #1, "literate testing" becomes awash with re-defining re-used
> variables which, generally, also detracts from the exact purpose of the
> test -- this creates testdoc noise and the docs become less useful.
> Without #2, "readable docs" nicely co-aligning with "testable docs"
> tends towards divergence.
>
> Perhaps not enough developers use doctest to care, but I find it one
> of the more enjoyable ways to develop python code -- I don't have to
> remember test cases nor go through the trouble of setting up
> unittests. AND, it encourages agile development. Another user wrote
> a while back of even having a built-in test() method. Wouldn't that
> really encourage agile development? And you wouldn't have to muddy
> up your code with "if __name__ == "__main__": import doctest, yadda
> yadda".
>
> Anyway... of course patches welcome, yes... ;^)

Not exactly... doctest has no maintainer, and so no patches ever get accepted. If you want to improve it, you'll have to fork it. I hope you're that sort of person, because doctest can totally be improved. It suffers a lot from people thinking of what it is rather than what it could be. :(

I've in the past worked a bit on improving doctest in a fork I started. Its primary purpose was originally to add Cram-like "shell doctests" to doctest (see http://pypi.python.org/pypi/cram ), but since then I started working on other bits here and there. The work I've done is available at https://bitbucket.org/devin.jeanpierre/doctest2 (please forgive the presumptuous name -- I'm considering a rename to "lembas".)
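[Editor's note: the ast.literal_eval equivalence check Devin describes might look something like this minimal sketch. outputs_match is a hypothetical helper for illustration, not part of doctest.]

```python
import ast

def outputs_match(expected, actual):
    """Compare two doctest output strings structurally when both parse
    as Python literals, so unordered reprs like {1, 2, 3} and {3, 1, 2}
    compare equal; otherwise fall back to exact text comparison."""
    try:
        return ast.literal_eval(expected) == ast.literal_eval(actual)
    except (ValueError, SyntaxError):
        return expected == actual

# Dict/set output matches regardless of the ordering in the repr:
print(outputs_match("{'a': 1, 'b': 2}", "{'b': 2, 'a': 1}"))  # True
# Non-literal output falls back to plain string comparison:
print(outputs_match("hello world", "hello world"))            # True
print(outputs_match("hello", "world"))                        # False
```

A real doctest integration would hook this into the output checker rather than comparing raw strings, but the core idea is just the try/except around literal_eval shown here.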
The reason I've not worked on it recently is that the problems have gotten harder and my time has run short. I would be very open to collaboration or forking, although I also understand that a largeish expansion with redesigned internals created by an overworked student is probably not the greatest place to start. This is all assuming your intentions are to contribute rather than only suggest. Not that suggestions aren't welcome, I suppose, but maybe not here. doctest is not actively developed or maintained anywhere, as far as I know. (I want to say "except by me", because that'd make me seem all special and so on, but I haven't committed a thing in months.) Mostly, I feel a bit like this thread could accidentally spawn parallel / duplicated work, so I figured I'd put what I have out here. Please don't take it for more than it is, doctest2 is still a work in progress (and, worse, its source code is in the middle of two feature additions!) I definitely hope you help to make the doctest world better. I think it fills a role that should be filled, and its neglect is unfortunate. -- Devin From ianb at colorstudy.com Sat Feb 18 05:24:10 2012 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 17 Feb 2012 22:24:10 -0600 Subject: [Python-ideas] doctest In-Reply-To: References: Message-ID: On Feb 17, 2012 4:12 PM, "Nick Coghlan" wrote: > > On Sat, Feb 18, 2012 at 7:57 AM, Mark Janssen wrote: > > Anyway... of course patches welcome, yes... ;^) > > Not really. doctest is for *testing code example in docs*. If you try > to use it for more than that, it's likely to drive you up the wall, so > proposals to make it more than it is usually don't get a great > reception (docs patches to make it's limitations clearer are generally > welcome, though). The stdib solution for test driven development is > unittest (the vast majority of our own regression suite is written > that way - only a small proportion uses doctest). 
This pessimistic attitude is why doctest is challenging to work with at times, not anything to do with doctest's actual model. The constant criticisms of doctest keep contributors away, and keep its many resolvable problems from being resolved.

> An interesting third party alternative that has been created recently
> is behave: http://crate.io/packages/behave/

This style of test is why it's so sad that doctest is ignored and unmaintained. It's based on testing patterns developed by people who care to promote what they are doing, but I'm of the strong opinion that they are inferior to doctest.

Ian

From ianb at colorstudy.com  Sat Feb 18 05:25:02 2012
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 17 Feb 2012 22:25:02 -0600
Subject: [Python-ideas] Fwd: Re: doctest
In-Reply-To:
References:
Message-ID:

On Feb 17, 2012 3:58 PM, "Mark Janssen" wrote:
> I find myself wanting to use doctest for some test-driven development,
> and find myself slightly frustrated and wonder if others would be
> interested in seeing the following additional functionality in
> doctest:
>
> 1. Execution context determined by outer-scope doctest definitions
> 2. Smart Comparisons that will detect output of a non-ordered type
> (dict/set), lift and recast it and do a real comparison.
>
> Without #1, "literate testing" becomes awash with re-defining re-used
> variables which, generally, also detracts from the exact purpose of the
> test -- this creates testdoc noise and the docs become less useful.

I dunno... I find the discipline of defining your prerequisites to be a helpful feature of doctest (I find TestCase.setUp to be smelly). You can include a namespace in doctest invocations, but I'm guessing the problem is that you aren't able to give these settings when using some kind of test collector/runner? More flexible ways of defining doctest options (e.g., ELLIPSIS) would be helpful.
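[Editor's note: supplying a namespace and option flags to a doctest run, as Ian mentions, can be sketched with the stdlib API. The mean function and the data binding are made-up examples; in a whole-module run the same effect comes from doctest.testmod(extraglobs=..., optionflags=...).]

```python
import doctest

def mean(values):
    """Average of a non-empty sequence.

    >>> mean(data)          # 'data' comes from the globs we pass in
    2.0
    >>> mean(range(1000))   # doctest: +ELLIPSIS
    499...
    """
    return sum(values) / len(values)

# run_docstring_examples takes the example namespace explicitly, so the
# docstring can use names (here 'data') that the module never defines;
# optionflags enables directives such as ELLIPSIS for every example.
doctest.run_docstring_examples(
    mean,
    {"mean": mean, "data": [1, 2, 3]},
    optionflags=doctest.ELLIPSIS,
)
```

On success this prints nothing; any failing example is reported to stdout, which is why a collector/runner that builds its own invocation makes these settings awkward to reach.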
> Without #2, "readable docs" nicely co-aligning with "testable docs" > tends towards divergence. IMHO this could be more easily solved by replacing the standard repr with one that is more predictable. At least that would handle dictionaries, it becomes a bit more difficult for custom types. Also it diverges from being exactly like the console, but eh, I don't think that's a big advantage. Unfortunately plugging in a custom repr is kind of hard; Python has a way to specifically compile expressions into "print repr(expr)" (more or less) but no general way to get the value of expressions (while also handling statements). But if you wanted to try it, I did figure out a terrible hack for it. Ian -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Feb 18 05:50:58 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 18 Feb 2012 15:50:58 +1100 Subject: [Python-ideas] doctest In-Reply-To: References: Message-ID: <4F3F2E32.7070907@pearwood.info> Nick Coghlan wrote: > On Sat, Feb 18, 2012 at 7:57 AM, Mark Janssen wrote: >> Anyway... of course patches welcome, yes... ;^) > > Not really. doctest is for *testing code example in docs*. If you try > to use it for more than that, it's likely to drive you up the wall, Really? Not in my experience, although I admit I haven't tried to push the envelope too far. But I haven't had any problem with a literate programming model: * Use short, self-contained but not necessarily exhaustive examples in the code's docstrings (I don't try to give examples of *every* combination of good and bad data, special cases, etc. in the docstring). * Write extensive (ideally exhaustive) examples with explanatory text, in a separate text file. I generally do this to describe, explain and test the interface, rather than the implementation, but I see no reason why it wouldn't work for the implementation as well. 
It would require writing for the next maintainer rather than for a user of the library. In the external test text file(s), examples don't necessarily need to be self-contained. I have an entire document to create a test environment, if necessary, and can include extra functions, stubs, mocks, etc. as needed, without clashing with the primary purpose of docstrings to be *documentation* first and tests a distant second. If need be, test infrastructure can go into an external module, to be imported, rather than in-place in the doctest file. In my experience, this works well for algorithmic code that doesn't rely on external resources. If my tests require setting up and tearing down resources, I stick to unittest which has better setup/teardown support. (It would be hard to have *less* support for setup and teardown than doctest.) But otherwise, I haven't run into any problems with doctest other than the perennial "oops, I forgot to escape my backslashes!". -- Steven From guido at python.org Sat Feb 18 05:55:48 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 17 Feb 2012 20:55:48 -0800 Subject: [Python-ideas] Fwd: Re: doctest In-Reply-To: References: Message-ID: On Fri, Feb 17, 2012 at 8:25 PM, Ian Bicking wrote: > On Feb 17, 2012 3:58 PM, "Mark Janssen" wrote: >> I find myself wanting to use doctest for some test-driven development, >> and find myself slightly frustrated and wonder if others would be >> interested in seeing the following additional functionality in >> doctest: >> >> 1. Execution context determined by outer-scope doctest defintions >> 2. Smart Comparisons that will detect output of a non-ordered type >> (dict/set), lift and recast it and do a real comparison. >> >> Without #1, "literate testing" becomes awash with re-defining re-used >> variables which, generally, also detracts from exact purpose of the >> test -- this creates testdoc noise and the docs become less useful. > > I dunno... 
I find the discipline of defining your prerequisites to be a
> helpful feature of doctest (I find TestCase.setUp to be smelly). You can
> include a namespace in doctest invocations, but I'm guessing the problem is
> that you aren't able to give these settings when using some kind of test
> collector/runner? More flexible ways of defining doctest options (e.g.,
> ELLIPSIS) would be helpful.
>
>> Without #2, "readable docs" nicely co-aligning with "testable docs"
>> tends towards divergence.
>
> IMHO this could be more easily solved by replacing the standard repr with
> one that is more predictable. At least that would handle dictionaries; it
> becomes a bit more difficult for custom types. Also it diverges from being
> exactly like the console, but eh, I don't think that's a big advantage.
>
> Unfortunately plugging in a custom repr is kind of hard; Python has a way to
> specifically compile expressions into "print repr(expr)" (more or less) but
> no general way to get the value of expressions (while also handling
> statements). But if you wanted to try it, I did figure out a terrible hack
> for it.

Isn't sys.displayhook() usable for this purpose?

--
--Guido van Rossum (python.org/~guido)

From steve at pearwood.info  Sat Feb 18 06:08:16 2012
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 18 Feb 2012 16:08:16 +1100
Subject: [Python-ideas] doctest
In-Reply-To:
References:
Message-ID: <4F3F3240.4090104@pearwood.info>

Mark Janssen wrote:
> I find myself wanting to use doctest for some test-driven development,
> and find myself slightly frustrated and wonder if others would be
> interested in seeing the following additional functionality in
> doctest:
>
> 1. Execution context determined by outer-scope doctest definitions.

Can you give an example of how you would like this to work?

> 2. Smart Comparisons that will detect output of a non-ordered type
> (dict/set), lift and recast it and do a real comparison.
I would love to see a doctest directive that accepted differences in output order, e.g. would match {1, 2, 3} and {3, 1, 2}. But I think that's a hard problem to solve in the general case. Should it match 123 and 312? I don't think so. Just coming up with a clear and detailed set of requirements for (e.g.) #doctest:+IGNORE_ORDER may be tricky. I'd like a #3 as well: an abbreviated way to spell doctest directives, because they invariably push my tests well past the 80 character mark. -- Steven From ncoghlan at gmail.com Sat Feb 18 16:30:58 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Feb 2012 01:30:58 +1000 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 8:16 AM, Nathan Rice wrote: > I just wanted to chime in on this, because I understand the use cases > and benefits of this but the code is very semantically opaque and > imperative. ?I also feel like a lot of C programming concepts and > semantics have leaked into the design. ?Additionally, I feel that > there are some benefits to taking a step back and looking at this > problem as part of a bigger picture. So... context managers are not a good fit for general event handling. Correct. Given that I agree with your basic point, I'm not sure what the rest of that had to do with anything, unless you heard the word "callback" and immediately assumed I was talking about general event handling rather than Go defer'ed style cleanup APIs (along with a replacement for the bug-prone, irredeemably flawed contextlib.nested API). I'm not - what I'm planning would be a terrible API for general event handling. Fortunately, it's just a replacement for contextlib.nested() as a tool for programmatic management of context managers. If you want nice clean callbacks for general event handling, Python doesn't currently provide that. 
(We certainly don't have anything that gets remotely close to the elegance of Ruby's blocks for that style of programming: http://www.boredomandlaziness.org/2011/10/correcting-ignorance-learning-bit-about.html) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From anacrolix at gmail.com Sat Feb 18 16:38:06 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sat, 18 Feb 2012 23:38:06 +0800 Subject: [Python-ideas] channel (synchronous queue) Message-ID: Recently (for some) the CSP style of channel has become quite popular in concurrency implementations. This kind of channel allows sends that do not complete until a receiver has actually taken the item. The existing queue.Queue would act like this if it didn't treat a queue size of 0 as infinite capacity. In particular, I find channels to have value when sending data between threads, where it doesn't make sense to proceed until some current item has been accepted. This is useful when items are not purely CPU bound, and so generators are not appropriate. I believe this rendezvous behaviour can be added to queue.Queue for the maxsize=0 case, with maxsize=None being the existing "infinite queue" behaviour. Additionally a close method, Closed exception and other usability features like an __iter__ for receiving until closed can be added. The stackless class linked below also has some other possible ideas for performance reasons that make a lot of sense. Existing code using queue.Queue would remain completely unaffected by such additions if the default maxsize value is changed to maxsize=None, and maxsize=0 is not being explicitly passed (it's currently the default). 
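[Editor's note: one way to approximate the rendezvous behaviour Matt describes with the existing stdlib is a maxsize-1 queue.Queue combined with join()/task_done(), so a send does not return until the receiver has taken and acknowledged the item. This Channel class is an illustrative sketch for the single-producer case, not a proposed API.]

```python
import queue
import threading

class Channel:
    """Sketch of a CSP-style synchronous channel: send() blocks until a
    receiver has actually taken and acknowledged the item."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)

    def send(self, item):
        self._q.put(item)
        self._q.join()  # returns once the receiver calls task_done()

    def receive(self):
        item = self._q.get()
        self._q.task_done()
        return item

ch = Channel()
received = []

def consumer():
    received.append(ch.receive())

t = threading.Thread(target=consumer)
t.start()
ch.send("hello")  # does not return until consumer() has taken the item
t.join()
```

With several concurrent senders join() waits for *all* outstanding items, so this is only an approximation of true per-item rendezvous; a real implementation would use a Condition per handoff, much as the stackless and gevent channels linked below do.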
Here are a few links for some background and ideas:

http://gevent.org/gevent.queue.html#gevent.queue.Queue
http://www.disinterest.org/resource/stackless/2.6-docs-html/library/stackless/channels.html#the-channel-class
http://en.wikipedia.org/wiki/Communicating_sequential_processes#Comparison_with_the_Actor_Model
http://golang.org/doc/go_spec.html#Channel_types

From arnodel at gmail.com  Sat Feb 18 19:57:55 2012
From: arnodel at gmail.com (Arnaud Delobelle)
Date: Sat, 18 Feb 2012 18:57:55 +0000
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To:
References:
Message-ID:

On 18 February 2012 15:38, Matt Joiner wrote:
> Recently (for some) the CSP style of channel has become quite popular
> in concurrency implementations. This kind of channel allows sends that
> do not complete until a receiver has actually taken the item. The
> existing queue.Queue would act like this if it didn't treat a queue
> size of 0 as infinite capacity.

I don't know if that's exactly what you have in mind, but you can implement a channel very simply with a threading.Barrier object (new in Python 3.2). I'm no specialist of concurrency at all, but it seems that this is what you are describing (what in the Go language is called a "synchronous channel", I think):

from itertools import count
from threading import Barrier

class Channel:
    def __init__(self):
        self._sync = Barrier(2)
        self._values = [None, None]
    def send(self, value=None):
        i = self._sync.wait()
        self._values[i] = value
        self._sync.wait()
        return self._values[1 - i]
    def get(self):
        return self.send()

Then with the following convenience function to start a function in a new thread:

from threading import Thread

def go(f, *args, **kwargs):
    thread = Thread(target=f, args=args, kwargs=kwargs)
    thread.start()
    return thread

You can have e.g. the scenario:

ch = Channel()

def produce(ch):
    for i in count():
        print("sending", i)
        ch.send(i)

def consume(ch, n):
    for i in range(n):
        print("getting", ch.get())

Giving you this:

>>> go(produce, ch)
sending 0
>>> go(consume, ch, 3)
getting 0
sending 1
getting 1
sending 2
getting 2
sending 3
>>> go(consume, ch, 5)
getting 3
sending 4
getting 4
sending 5
getting 5
sending 6
getting 6
sending 7
getting 7
sending 8
>>>

--
Arnaud

From solipsis at pitrou.net  Sat Feb 18 20:01:08 2012
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 18 Feb 2012 20:01:08 +0100
Subject: [Python-ideas] channel (synchronous queue)
References:
Message-ID: <20120218200108.2b72ab9f@pitrou.net>

On Sat, 18 Feb 2012 23:38:06 +0800 Matt Joiner wrote:
> Recently (for some) the CSP style of channel has become quite popular
> in concurrency implementations. This kind of channel allows sends that
> do not complete until a receiver has actually taken the item. The
> existing queue.Queue would act like this if it didn't treat a queue
> size of 0 as infinite capacity.
>
> In particular, I find channels to have value when sending data between
> threads, where it doesn't make sense to proceed until some current
> item has been accepted. This is useful when items are not purely CPU
> bound, and so generators are not appropriate.

What is the point of processing the data in another thread, if you are going to block on the result anyway?

Antoine.

From nathan.alexander.rice at gmail.com  Sat Feb 18 23:33:16 2012
From: nathan.alexander.rice at gmail.com (Nathan Rice)
Date: Sat, 18 Feb 2012 17:33:16 -0500
Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal
In-Reply-To:
References:
Message-ID:

On Sat, Feb 18, 2012 at 10:30 AM, Nick Coghlan wrote:
> On Sat, Feb 18, 2012 at 8:16 AM, Nathan Rice wrote:
>> I just wanted to chime in on this, because I understand the use cases
>> and benefits of this but the code is very semantically opaque and
>> imperative.
?I also feel like a lot of C programming concepts and >> semantics have leaked into the design. ?Additionally, I feel that >> there are some benefits to taking a step back and looking at this >> problem as part of a bigger picture. > > So... context managers are not a good fit for general event handling. Correct. > > Given that I agree with your basic point, I'm not sure what the rest > of that had to do with anything, unless you heard the word "callback" > and immediately assumed I was talking about general event handling > rather than Go defer'ed style cleanup APIs (along with a replacement > for the bug-prone, irredeemably flawed contextlib.nested API). My point was more that I feel like you're hitting a point where the context manager as a programming and semantic construct is starting to stretch pretty thin. My gut feeling is that it might be more productive to let context managers alone (I think they're in an okay place with multiple managers in a single with statement) and start to examine the larger class of problems of which the deferred cleanup is a member. Events can unify a lot of concepts in python, while providing a much more elegant handle into third party code than is currently possible. For example... Decorators, descriptors and exceptions can all be unified neatly as events, and events let you reach into 3rd party code in a robust manner. I can't tell you the number of times I have had to subclass multiple things from a third party library to fix a small, unnecessarily limiting design decision. I've even run into this with authors who make very elegant libraries like Armin; nobody can predict all the use cases for their code. The best thing we can do is make it easy to work around such problems. I like the with statement in general, but if python is ever going to embrace events, the farther you travel along this path the more painful switching over is going to be down the line. 
> I'm not - what I'm planning would be a terrible API for general event > handling. Fortunately, it's just a replacement for contextlib.nested() > as a tool for programmatic management of context managers. If you want > nice clean callbacks for general event handling, Python doesn't > currently provide that. (We certainly don't have anything that gets > remotely close to the elegance of Ruby's blocks for that style of > programming: http://www.boredomandlaziness.org/2011/10/correcting-ignorance-learning-bit-about.html) I like ruby's blocks a lot. I don't think they don't drink enough of the koolaid though. Blocks can be a gateway to powerful macros (if you have first class expressions) and a mechanism for very elegant currying and partial function evaluation. I think something that is missing for me is a clear picture of where Python is going. I imagine between you, Guido, Martin, Anton, Georg and Raymond (apologies to any of the primary group I'm forgetting) there is some degree of tacit understanding. My perspective on python was framed by Peter Norvig's description of it as aspiring to be a humane reexamination of lisp, but lately I get the feeling the target would better be described as a 21st century pascal. Nathan From guido at python.org Sat Feb 18 23:47:15 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 18 Feb 2012 14:47:15 -0800 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 2:33 PM, Nathan Rice wrote: > On Sat, Feb 18, 2012 at 10:30 AM, Nick Coghlan wrote: >> On Sat, Feb 18, 2012 at 8:16 AM, Nathan Rice >> wrote: >>> I just wanted to chime in on this, because I understand the use cases >>> and benefits of this but the code is very semantically opaque and >>> imperative. ?I also feel like a lot of C programming concepts and >>> semantics have leaked into the design. 
?Additionally, I feel that >>> there are some benefits to taking a step back and looking at this >>> problem as part of a bigger picture. >> >> So... context managers are not a good fit for general event handling. Correct. >> >> Given that I agree with your basic point, I'm not sure what the rest >> of that had to do with anything, unless you heard the word "callback" >> and immediately assumed I was talking about general event handling >> rather than Go defer'ed style cleanup APIs (along with a replacement >> for the bug-prone, irredeemably flawed contextlib.nested API). > > My point was more that I feel like you're hitting a point where the > context manager as a programming and semantic construct is starting to > stretch pretty thin. ?My gut feeling is that it might be more > productive to let context managers alone (I think they're in an okay > place with multiple managers in a single with statement) and start to > examine the larger class of problems of which the deferred cleanup is > a member. ?Events can unify a lot of concepts in python, while > providing a much more elegant handle into third party code than is > currently possible. ?For example... > > Decorators, descriptors and exceptions can all be unified neatly as > events, and events let you reach into 3rd party code in a robust > manner. ?I can't tell you the number of times I have had to subclass > multiple things from a third party library to fix a small, > unnecessarily limiting design decision. ?I've even run into this with > authors who make very elegant libraries like Armin; nobody can predict > all the use cases for their code. ?The best thing we can do is make it > easy to work around such problems. > > I like the with statement in general, but if python is ever going to > embrace events, the farther you travel along this path the more > painful switching over is going to be down the line. > >> I'm not - what I'm planning would be a terrible API for general event >> handling. 
Fortunately, it's just a replacement for contextlib.nested() >> as a tool for programmatic management of context managers. If you want >> nice clean callbacks for general event handling, Python doesn't >> currently provide that. (We certainly don't have anything that gets >> remotely close to the elegance of Ruby's blocks for that style of >> programming: http://www.boredomandlaziness.org/2011/10/correcting-ignorance-learning-bit-about.html) > > I like ruby's blocks a lot. I don't think they drink enough of > the koolaid though. Blocks can be a gateway to powerful macros (if > you have first class expressions) and a mechanism for very elegant > currying and partial function evaluation. > > I think something that is missing for me is a clear picture of where > Python is going. I imagine between you, Guido, Martin, Anton, Georg > and Raymond (apologies to any of the primary group I'm forgetting) > there is some degree of tacit understanding. My perspective on python > was framed by Peter Norvig's description of it as aspiring to be a > humane reexamination of lisp, but lately I get the feeling the target > would better be described as a 21st century pascal. Was that meant as an insult? Because it sounds to me like one. -- --Guido van Rossum (python.org/~guido) From p.f.moore at gmail.com Sat Feb 18 23:48:05 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 18 Feb 2012 22:48:05 +0000 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal In-Reply-To: References: Message-ID: On 18 February 2012 22:33, Nathan Rice wrote: > Events can unify a lot of concepts in python, while > providing a much more elegant handle into third party code than is > currently possible. You may have a point, but I find it hard to understand what you are getting at. Would you be able to propose a specific syntax/semantics to clarify what you're trying to express? (I think I get the general concept, but I can't see how you imagine it to work).
Thanks, Paul From cs at zip.com.au Sun Feb 19 00:05:25 2012 From: cs at zip.com.au (Cameron Simpson) Date: Sun, 19 Feb 2012 10:05:25 +1100 Subject: [Python-ideas] channel (synchronous queue) In-Reply-To: <20120218200108.2b72ab9f@pitrou.net> References: <20120218200108.2b72ab9f@pitrou.net> Message-ID: <20120218230525.GA14740@cskk.homeip.net> On 18Feb2012 20:01, Antoine Pitrou wrote: | On Sat, 18 Feb 2012 23:38:06 +0800 | Matt Joiner wrote: | > Recently (for some) the CSP style of channel has become quite popular | > in concurrency implementations. This kind of channel allows sends that | > do not complete until a receiver has actually taken the item. The | > existing queue.Queue would act like this if it didn't treat a queue | > size of 0 as infinite capacity. | > | > In particular, I find channels to have value when sending data between | > threads, where it doesn't make sense to proceed until some current | > item has been accepted. This is useful when items are not purely CPU | > bound, and so generators are not appropriate. | | What is the point to process the data in another thread, if you are | going to block on the result anyway? Synchronisation. Shrug. I use synchronous channels myself; they are a fine basic facility. The problem with Queues et al is that they are inherently _asynchronous_ and you have to work hard to wrap locking around it when you want interlocking cogs. Also, it is perfectly reasonable in many circumstances to use a thread for algorithmic clarity, just like you might use a generator or a coroutine in suitable circumstances. Here one does it not so that some work may process in parallel but to cleanly write two algorithms that pass information between each other but are otherwise as separate as another pair of functions might be. The alternative may be a complicated interwoven event loop. Cheers, -- Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/ C makes it easy for you to shoot yourself in the foot.
C++ makes that harder, but when you do, it blows away your whole leg. - Bjarne Stroustrup From greg.ewing at canterbury.ac.nz Sun Feb 19 00:37:19 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 19 Feb 2012 12:37:19 +1300 Subject: [Python-ideas] Python 3000 TIOBE -3% In-Reply-To: <4F3D5A38.6070901@stoneleaf.us> References: <874nuvfhnb.fsf@uwakimon.sk.tsukuba.ac.jp> <87lio5onav.fsf@uwakimon.sk.tsukuba.ac.jp> <70089F52-E9AB-4C3D-97BA-88A5BC11B976@gmail.com> <4F3AE675.6010907@mrabarnett.plus.com> <87haytyms7.fsf@benfinney.id.au> <20120215133912.GA17040@iskra.aviel.ru> <4F3C5DC8.707@canterbury.ac.nz> <4F3D1EF3.40203@stoneleaf.us> <8739aaoid0.fsf@uwakimon.sk.tsukuba.ac.jp> <4F3D5A38.6070901@stoneleaf.us> Message-ID: <4F40362F.6060400@canterbury.ac.nz> Ethan Furman wrote: > I can see > confusion again creeping in when somebody (like myself ;) sees a > datatype which seemingly supports a mixture of unicode and raw bytes > only to find out that 'uni_raw(...)[5] != 32' because a u' ' was > returned and an integer (or raw byte) was expected at that location. I wasn't intending that an int would be returned when you index a non-char position. Indexing and slicing would always return another mixed-string object. -- Greg From steve at pearwood.info Sun Feb 19 00:52:57 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 19 Feb 2012 10:52:57 +1100 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal In-Reply-To: References: Message-ID: <4F4039D9.2010909@pearwood.info> Guido van Rossum wrote: >> I think something that is missing for me is a clear picture of where >> Python is going. I imagine between you, Guido, Martin, Anton, Georg >> and Raymond (apologies to any of the primary group I'm forgetting) >> there is some degree of tacit understanding.
My perspective on python >> was framed by Peter Norvig's description of it as aspiring to be a >> humane reexamination of lisp, but lately I get the feeling the target >> would better be described as a 21st century pascal. > > Was that meant as an insult? Because it sounds to me like one. I hope not. I like Pascal. It has nice, clean syntax (if a tad verbose, with the BEGIN/END tags) and straight-forward, simple semantics. Standard Pascal is somewhat lacking (e.g. no strings) but who uses standard Pascal? Without wishing to deny the strengths of C, I think the computing world would be a lot better if C was closer to Pascal than if Pascal had been closer to C. -- Steven From greg.ewing at canterbury.ac.nz Sun Feb 19 00:56:35 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 19 Feb 2012 12:56:35 +1300 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> <20120215180250.21a05ddf@pitrou.net> <4F3C6246.7000509@canterbury.ac.nz> Message-ID: <4F403AB3.3030502@canterbury.ac.nz> shibturn wrote: > If the receiving process is expecting an fd then that certainly works. > But making it work transparently with pickle is difficult. Is making it work with pickle a requirement? The point of using shared memory is to avoid the need for serialising and deserialising. -- Greg From nathan.alexander.rice at gmail.com Sun Feb 19 00:57:52 2012 From: nathan.alexander.rice at gmail.com (Nathan Rice) Date: Sat, 18 Feb 2012 18:57:52 -0500 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal In-Reply-To: References: Message-ID: > Was that meant as an insult? Because it sounds to me like one. I'm sorry if my poor wording caused it to come across that way. 
Pascal was a very useful language, with a perspective that was different from its contemporaries because it was originally intended for educational purposes, rather than as an academic language like lisp or a hacker tool like c or fortran. I enjoy writing python a lot, and would prefer to use it rather than ruby/lisp/java/etc in most cases. My suggestions come from frustrations that occur when using python in areas where the right answer is probably just to use a different language. If I knew that what I wanted was at odds with the vision for python, I would have less of an issue just accepting circumstances, and would just get to work rather than sidetracking discussions on this list. Thanks, and again, sorry! Nathan From sturla at molden.no Sun Feb 19 01:19:15 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 19 Feb 2012 01:19:15 +0100 Subject: [Python-ideas] channel (synchronous queue) In-Reply-To: References: Message-ID: <4F404003.4060403@molden.no> On 18.02.2012 16:38, Matt Joiner wrote: > Recently (for some) the CSP style of channel has become quite popular > in concurrency implementations. This kind of channel allows sends that > do not complete until a receiver has actually taken the item. The > existing queue.Queue would act like this if it didn't treat a queue > size of 0 as infinite capacity. > > In particular, I find channels to have value when sending data between > threads, where it doesn't make sense to proceed until some current > item has been accepted. That is the most common cause of deadlock in number crunching code using MPI. Process A sends message to Process B, waits for B to receive Process B sends message to Process A, waits for A to receive ... and now we just wait ... I am really glad the queues on Python do not do this.
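The rendezvous behaviour discussed here — a send() that blocks until a receiver has taken the item — can be sketched with standard threading primitives. This is an editorial illustration rather than code from the thread; the Channel class and its attribute names are invented, and Sturla's deadlock warning still applies if two threads each send() to the other before receiving.

```python
import threading

class Channel:
    """CSP-style synchronous channel: send() blocks until a receiver
    takes the item. The locks serialize senders and receivers, so
    multiple threads may safely use either end."""
    def __init__(self):
        self._send_lock = threading.Lock()    # one rendezvous sender at a time
        self._recv_lock = threading.Lock()    # one receiver at a time
        self._ready = threading.Semaphore(0)  # signalled: an item is waiting
        self._taken = threading.Semaphore(0)  # signalled: the item was consumed
        self._item = None

    def send(self, item):
        with self._send_lock:
            self._item = item
            self._ready.release()   # offer the item to a receiver...
            self._taken.acquire()   # ...and block until it has been taken

    def get(self):
        with self._recv_lock:
            self._ready.acquire()   # wait for a sender's offer
            item = self._item
            self._taken.release()   # release the blocked sender
            return item

# One sender thread, one receiver: items are handed over strictly in order.
ch = Channel()
sender = threading.Thread(target=lambda: [ch.send(i) for i in range(3)])
sender.start()
received = [ch.get() for _ in range(3)]
sender.join()
assert received == [0, 1, 2]
```

Exactly as Sturla describes for synchronous MPI sends and receives, two threads that each call send() on a pair of such channels before calling get() will deadlock; a buffered queue.Queue sidesteps that particular trap.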
Sturla From ron3200 at gmail.com Sun Feb 19 01:27:23 2012 From: ron3200 at gmail.com (Ron Adam) Date: Sat, 18 Feb 2012 18:27:23 -0600 Subject: [Python-ideas] doctest In-Reply-To: References: Message-ID: <1329611243.27188.4.camel@Gutsy> On Sat, 2012-02-18 at 08:12 +1000, Nick Coghlan wrote: > On Sat, Feb 18, 2012 at 7:57 AM, Mark Janssen wrote: > > Anyway... of course patches welcome, yes... ;^) > > Not really. doctest is for *testing code example in docs*. If you try > to use it for more than that, it's likely to drive you up the wall, so > proposals to make it more than it is usually don't get a great > reception (docs patches to make it's limitations clearer are generally > welcome, though). The stdib solution for test driven development is > unittest (the vast majority of our own regression suite is written > that way - only a small proportion uses doctest). I love doctest for *testing while I develop code*. Cheers, Ron From anacrolix at gmail.com Sun Feb 19 01:30:10 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sun, 19 Feb 2012 08:30:10 +0800 Subject: [Python-ideas] channel (synchronous queue) In-Reply-To: References: Message-ID: I'm not sure that your example allows for multiple senders and receivers to block at the same time. I'm also not sure why the senders are receiving values. It's definitely an interesting approach to use a Barrier but incomplete. On Feb 19, 2012 2:57 AM, "Arnaud Delobelle" wrote: > On 18 February 2012 15:38, Matt Joiner wrote: > > Recently (for some) the CSP style of channel has become quite popular > > in concurrency implementations. This kind of channel allows sends that > > do not complete until a receiver has actually taken the item. The > > existing queue.Queue would act like this if it didn't treat a queue > > size of 0 as infinite capacity. > > I don't know if that's exactly what you have in mind, but you can > implement a channel very simply with a threading.Barrier object (new > in Python 3.2). 
I'm no specialist of concurrency at all, but it seems > that this is what you are describing (what in the go language is > called a "synchronous channel" I think): > > from threading import Barrier > > class Channel: > def __init__(self): > self._sync = Barrier(2) > self._values = [None, None] > def send(self, value=None): > i = self._sync.wait() > self._values[i] = value > self._sync.wait() > return self._values[1 - i] > def get(self): > return self.send() > > Then with the following convenience function to start a function in a > new thread: > > from threading import Thread > > def go(f, *args, **kwargs): > thread = Thread(target=f, args=args, kwargs=kwargs) > thread.start() > return thread > > You can have e.g. the scenario: > > from itertools import count > > ch = Channel() > > def produce(ch): > for i in count(): > print("sending", i) > ch.send(i) > > def consume(ch, n): > for i in range(n): > print("getting", ch.get()) > > Giving you this: > > >>> go(produce, ch) > sending 0 > > >>> go(consume, ch, 3) > > getting 0 > sending 1 > getting 1 > sending 2 > getting 2 > sending 3 > >>> go(consume, ch, 5) > > getting 3 > sending 4 > getting 4 > sending 5 > getting 5 > sending 6 > getting 6 > sending 7 > getting 7 > sending 8 > >>> > > -- > Arnaud > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacrolix at gmail.com Sun Feb 19 01:31:56 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sun, 19 Feb 2012 08:31:56 +0800 Subject: [Python-ideas] channel (synchronous queue) In-Reply-To: <20120218200108.2b72ab9f@pitrou.net> References: <20120218200108.2b72ab9f@pitrou.net> Message-ID: Cameron explained this better than I could. On Feb 19, 2012 3:05 AM, "Antoine Pitrou" wrote: > On Sat, 18 Feb 2012 23:38:06 +0800 > Matt Joiner wrote: > > Recently (for some) the CSP style of channel has become quite popular > > in concurrency implementations. This kind of channel allows sends that > > do not complete until a receiver has actually taken the item.
The > > existing queue.Queue would act like this if it didn't treat a queue > > size of 0 as infinite capacity. > > > > In particular, I find channels to have value when sending data between > > threads, where it doesn't make sense to proceed until some current > > item has been accepted. This is useful when items are not purely CPU > > bound, and so generators are not appropriate. > > What is the point to process the data in another thread, if you are > going to block on the result anyway? > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacrolix at gmail.com Sun Feb 19 01:39:16 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sun, 19 Feb 2012 08:39:16 +0800 Subject: [Python-ideas] channel (synchronous queue) In-Reply-To: <4F404003.4060403@molden.no> References: <4F404003.4060403@molden.no> Message-ID: Yes, channels can allow for this, but as with locks directionality and ordering matter. Typically messages will only run in a particular direction. Nor will all channels be synchronous (they're a tool, not a panacea), they might be intermixed with infinite asynchronous queues as is commonplace at the moment. On Feb 19, 2012 8:19 AM, "Sturla Molden" wrote: > Den 18.02.2012 16:38, skrev Matt Joiner: > >> Recently (for some) the CSP style of channel has become quite popular >> in concurrency implementations. This kind of channel allows sends that >> do not complete until a receiver has actually taken the item. The >> existing queue.Queue would act like this if it didn't treat a queue >> size of 0 as infinite capacity. >> >> In particular, I find channels to have value when sending data between >> threads, where it doesn't make sense to proceed until some current >> item has been accepted. 
>> > > That is the most common cause of deadlock in number crunching code using > MPI. > > Process A sends message to Process B, waits for B to receive > Process B sends message to Process A, waits for A to receive > > ... and now we just wait ... > > I am really glad the queues on Python do not do this. > > Sturla > > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Feb 19 01:48:16 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 18 Feb 2012 16:48:16 -0800 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 3:57 PM, Nathan Rice wrote: >> Was that meant as an insult? Because it sounds to me like one. > > I'm sorry if my poor wording caused it to come across that way. > Pascal was a very useful language, with a perspective that was > different from its contemporaries because it was originally intended > for educational purposes, rather than as an academic language like > lisp or a hacker tool like c or fortran. I have no idea how old you are, or what your background is, so I don't know if you have all that from personal experience or from hearsay. I do know that for me, when I first learned Pascal on the Control Data mainframe in 1974, it was the ultimate hacker tool. (Well, penultimate. Assembler was the ultimate. But even then it was a last resort.) Pascal was also developed by an academic. I never got much out of Lisp. So I guess it's a matter of perspective. > I enjoy writing python a lot, and would prefer to use it rather than > ruby/lisp/java/etc in most cases. My suggestions come from > frustrations that occur when using python in areas where the right > answer is probably just to use a different language.
If I knew that > what I wanted was at odds with the vision for python, I would have > less of an issue just accepting circumstances, and would just get to > work rather than sidetracking discussions on this list. > > Thanks, and again, sorry! I strongly recommend that you stick to describing your use cases and tentatively exploring possible solutions, instead of trying to spout sweeping controversial statements. Those just get in the way of getting an exchange of ideas going. -- --Guido van Rossum (python.org/~guido) From shibturn at gmail.com Sun Feb 19 01:50:07 2012 From: shibturn at gmail.com (shibturn) Date: Sun, 19 Feb 2012 00:50:07 +0000 Subject: [Python-ideas] Adding shm_open to mmap? In-Reply-To: <4F403AB3.3030502@canterbury.ac.nz> References: <20120214185044.4c5ee513@bhuda.mired.org> <20120214212539.7c5ffdef@bhuda.mired.org> <20120214231011.6fce4b3b@bhuda.mired.org> <20120215133419.230ea8e6@pitrou.net> <20120215180250.21a05ddf@pitrou.net> <4F3C6246.7000509@canterbury.ac.nz> <4F403AB3.3030502@canterbury.ac.nz> Message-ID: On 18/02/2012 11:56pm, Greg Ewing wrote: > shibturn wrote: > >> If the receiving process is expecting an fd then that certainly works. >> But making it work transparently with pickle is difficult. > > Is making it work with pickle a requirement? The point of using > shared memory is to avoid the need for serialising and deserialising. > The point is to avoid having to pickle/unpickle the *data*. Being able to pickle/unpickle a *reference* to the data would be rather convenient. Then, for instance, you can put references to blocks of raw data on a queue.
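The distinction shibturn draws — pickling a *reference* to shared memory rather than the data itself — can be sketched with the multiprocessing.shared_memory module that eventually appeared in Python 3.8, long after this thread; the SharedRef wrapper below is an invented illustration, not an API that existed at the time.

```python
import pickle
from multiprocessing import shared_memory

class SharedRef:
    """Picklable reference to a named shared-memory block: only the
    (name, size) pair travels through pickle, never the bytes."""
    def __init__(self, name, size):
        self.name, self.size = name, size

    def __reduce__(self):
        # Serialize just the coordinates of the block, not its contents.
        return (SharedRef, (self.name, self.size))

    def attach(self):
        # Re-open the existing block by name on the receiving side.
        return shared_memory.SharedMemory(name=self.name)

# Create a block, write into it, and ship only a reference through pickle.
block = shared_memory.SharedMemory(create=True, size=16)
block.buf[:5] = b"hello"
ref = pickle.loads(pickle.dumps(SharedRef(block.name, 16)))
view = ref.attach()
assert bytes(view.buf[:5]) == b"hello"  # same memory, no data copied
view.close()
block.close()
block.unlink()
```

In practice such a reference is exactly the kind of thing that could be put on a multiprocessing queue, as suggested above.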
sbt From sturla at molden.no Sun Feb 19 01:59:20 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 19 Feb 2012 01:59:20 +0100 Subject: [Python-ideas] channel (synchronous queue) In-Reply-To: References: <4F404003.4060403@molden.no> Message-ID: <4F404968.5070000@molden.no> On 19.02.2012 01:39, Matt Joiner wrote: > > Yes, channels can allow for this, but as with locks directionality and > ordering matter. Typically messages will only run in a particular > direction. > Actually, it was only a synchronous MPI_Recv that did this in MPI, a synchronous MPI_Send would have been even worse. Which is why MPI got the asynchronous method MPI_Irecv... Sounds like you just want a barrier or a condition primitive. E.g. have the sender call .wait() on a condition and let the receiver call .notify() on the condition. Sturla From ncoghlan at gmail.com Sun Feb 19 02:27:12 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Feb 2012 11:27:12 +1000 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 9:57 AM, Nathan Rice wrote: > I enjoy writing python a lot, and would prefer to use it rather than > ruby/lisp/java/etc in most cases. My suggestions come from > frustrations that occur when using python in areas where the right > answer is probably just to use a different language. If I knew that > what I wanted was at odds with the vision for python, I would have > less of an issue just accepting circumstances, and would just get to > work rather than sidetracking discussions on this list.
The core problem comes down to the differences between Guido's original PEP 340 idea (which was much closer in power to Ruby's blocks, since it was a new looping construct that allowed 0-or-more executions of the contained block) and the more constrained with statement that is defined in PEP 343 (which will either execute the body once or throw an exception, distinguishing it clearly from both the looping constructs and if statements). The principle Guido articulated when making that decision was: "different forms of flow control should look different at the point of invocation". So, where a language like Ruby just defines one protocol (callbacks, supplemented by anonymous blocks that run directly in the namespace of the containing function) and uses it for pretty much *all* flow control (including all their loop constructs), Python works the other way around, defining *different* protocols for different patterns of invocation. This provides a gain in readability on the Python side. When you see any of the following in Python:

    @whatever()
    def f():
        pass

    with whatever():
        # Do something!

    for x in whatever():
        # Do something!

It places a lot of constraints on the nature of the object returned by "whatever()" - even without knowing anything else about it, you know the first must return a decorator, the second a context manager, and the third an iterable. If that's all you need to know at this point in time, you don't need to worry about the details - the local syntax tells you the important things you need to know about the flow control. In Ruby, though, all of them (assuming it isn't actually important that the function name be bound locally) could be written like this:

    whatever() do:
        # Do something!
    end

Is it a context manager? An iterable? Some other kind of callback? There's nothing in the syntax to tell you that - you're relying on naming conventions to provide that information (like the ".foreach" convention for iteration methods).
That approach can obviously work (otherwise Ruby wouldn't be as popular as it is), but it *does* make it harder to pick up a piece of code and understand the possible control flows without looking elsewhere. However, this decision to be explicit about flow control for the benefit of the *reader* brings with it a high *cost* on the Python side for the code *author*: where Ruby works by defining a nice syntax and semantics for callback based programming and building other language constructs on top of that, Python *doesn't currently have* a particularly nice general purpose native syntax for callback based programming. Decorators do work in many cases (especially simple callback registration), but they sometimes feel wrong because they're mainly designed to modify how a function is defined, not implement key program flow control constructs. However, their flexibility shouldn't be underestimated, and the CallbackStack API is designed to help Python developers push decorators and context managers closer to those limits *without* needing new language constructs. By decoupling the callback stack from the code layout, it gives you full *programmatic* control of the kinds of things context managers can help with when you know in advance exactly what you want to do. *If* CallbackStack proves genuinely popular (and given the number of proposals I have seen along these lines, and the feedback I have received on ContextStack to date, I expect it will), and people start to develop interesting patterns for using it, *then* we can start looking at the possibility of dedicated syntax to streamline particular use cases (just as the with statement itself was designed to streamline various use cases of the more general try statement). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From ncoghlan at gmail.com Sun Feb 19 02:54:26 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 19 Feb 2012 11:54:26 +1000 Subject: [Python-ideas] doctest In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 10:43 AM, Devin Jeanpierre wrote: > I've in the past worked a bit on improving doctest in a fork I > started. Its primary purpose was originally to add Cram-like "shell > doctests" to doctest (see http://pypi.python.org/pypi/cram ), but > since then I started working on other bits here and there. The work > I've done is available at > https://bitbucket.org/devin.jeanpierre/doctest2 (please forgive the > presumptuous name -- I'm considering a rename to "lembas".) > > The reason I've not worked on it recently is that the problems have > gotten harder and my time has run short. I would be very open to > collaboration or forking, although I also understand that a largeish > expansion with redesigned internals created by an overworked student > is probably not the greatest place to start. > > This is all assuming your intentions are to contribute rather than > only suggest. Not that suggestions aren't welcome, I suppose, but > maybe not here. doctest is not actively developed or maintained > anywhere, as far as I know. (I want to say "except by me", because > that'd make me seem all special and so on, but I haven't committed a > thing in months.) > > Mostly, I feel a bit like this thread could accidentally spawn > parallel / duplicated work, so I figured I'd put what I have out here. > Please don't take it for more than it is, doctest2 is still a work in > progress (and, worse, its source code is in the middle of two feature > additions!) > > I definitely hope you help to make the doctest world better. I think > it fills a role that should be filled, and its neglect is unfortunate. Indeed, my apologies for my earlier crankiness (I should know by now to stay away from mailing lists at crazy hours of the morning). 
While it's obviously not the ideal, forking orphaned stdlib modules and publishing new versions on PyPI can be an *excellent* idea. The core development team is generally a fairly conservative bunch, so unless a module has a sufficiently active maintainer that feels entitled to make API design decisions, our default response to proposals is going to be "no". One of the *best* ways to change this is to develop a community around an enhanced version of the module - one of our reasons for switching to a DVCS for our development was to help make it easier for people to extract and merge stdlib updates while maintaining their own versions. Then, when you come to python-ideas to say "Hey, wouldn't this be a good idea?", it's possible to point to the PyPI version and say: - people have tried this and liked it - I've been maintaining this for a while now and would continue to do so for the standard library Some major (current or planned) updates to the Python 3.3 standard library occurred because folks decided the stdlib solutions were not in an acceptable state and set out to improve them (specifically, the packaging package came from the distutils2 fork, which continues as a backport to early Python versions, and MRAB's regex module has been approved for addition, although it hasn't actually been incorporated yet). In the past, other major additions like argparse came about that way. A few other stdlib modules have backports on PyPI by their respective stlib maintainers so we can try out new design concepts *before* committing to supporting them in the standard library. A published version of doctest2 that was designed to be suitable for eventual incorporation back into doctest itself (i.e. by maintaining backwards compatibility) sounds like it would be quite popular, and would route around the fact that enhancing it isn't high on the priority list for the current core development team. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From ericsnowcurrently at gmail.com Sun Feb 19 03:27:56 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 18 Feb 2012 19:27:56 -0700 Subject: [Python-ideas] doctest In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 6:54 PM, Nick Coghlan wrote: > While it's obviously not the ideal, forking orphaned stdlib modules > and publishing new versions on PyPI can be an *excellent* idea. The > core development team is generally a fairly conservative bunch, so > unless a module has a sufficiently active maintainer that feels > entitled to make API design decisions, our default response to > proposals is going to be "no". One of the *best* ways to change this > is to develop a community around an enhanced version of the module - > one of our reasons for switching to a DVCS for our development was to > help make it easier for people to extract and merge stdlib updates > while maintaining their own versions. Then, when you come to > python-ideas to say "Hey, wouldn't this be a good idea?", it's > possible to point to the PyPI version and say: > - people have tried this and liked it > - I've been maintaining this for a while now and would continue to do > so for the standard library > > Some major (current or planned) updates to the Python 3.3 standard > library occurred because folks decided the stdlib solutions were not > in an acceptable state and set out to improve them (specifically, the > packaging package came from the distutils2 fork, which continues as a > backport to early Python versions, and MRAB's regex module has been > approved for addition, although it hasn't actually been incorporated > yet). In the past, other major additions like argparse came about that > way. > > A few other stdlib modules have backports on PyPI by their respective > stdlib maintainers so we can try out new design concepts *before* > committing to supporting them in the standard library.
> > A published version of doctest2 that was designed to be suitable for > eventual incorporation back into doctest itself (i.e. by maintaining > backwards compatibility) sounds like it would be quite popular, and > would route around the fact that enhancing it isn't high on the > priority list for the current core development team. Well said, Nick. That's worth putting in the devguide. -eric From guido at python.org Sun Feb 19 05:06:08 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 18 Feb 2012 20:06:08 -0800 Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 5:27 PM, Nick Coghlan wrote: > On Sun, Feb 19, 2012 at 9:57 AM, Nathan Rice > wrote: >> I enjoy writing python a lot, and would prefer to use it rather than >> ruby/lisp/java/etc in most cases. My suggestions come from >> frustrations that occur when using python in areas where the right >> answer is probably just to use a different language. If I knew that >> what I wanted was at odds with the vision for python, I would have >> less of an issue just accepting circumstances, and would just get to >> work rather than sidetracking discussions on this list. > > The core problem comes down to the differences between Guido's > original PEP 340 idea (which was much closer in power to Ruby's > blocks, since it was a new looping construct that allowed 0-or-more > executions of the contained block) and the more constrained with > statement that is defined in PEP 343 (which will either execute the > body once or throw an exception, distinguishing it clearly from both > the looping constructs and if statements). > > The principle Guido articulated when making that decision was: > "different forms of flow control should look different at the point of > invocation".
> > So, where a language like Ruby just defines one protocol (callbacks, > supplemented by anonymous blocks that run directly in the namespace of > the containing function) and uses it for pretty much *all* flow > control (including all their loop constructs), Python works the other > way around, defining *different* protocols for different patterns of > invocation. > > This provides a gain in readability on the Python side. When you see > any of the following in Python: > > @whatever() > def f(): > pass > > with whatever(): > # Do something! > > for x in whatever(): > # Do something! > > It places a lot of constraints on the nature of the object returned by > "whatever()" - even without knowing anything else about it, you know > the first must return a decorator, the second a context manager, and > the third an iterable. If that's all you need to know at this point in > time, you don't need to worry about the details - the local syntax > tells you the important things you need to know about the flow > control. > > In Ruby, though, all of them (assuming it isn't actually important > that the function name be bound locally) could be written like this: > > whatever() do: > # Do something! > end > > Is it a context manager? An iterable? Some other kind of callback? > There's nothing in the syntax to tell you that - you're relying on > naming conventions to provide that information (like the ".foreach" > convention for iteration methods). That approach can obviously work > (otherwise Ruby wouldn't be as popular as it is), but it *does* make > it harder to pick up a piece of code and understand the possible > control flows without looking elsewhere.
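The constraint described in the passage above — that each construct demands a distinct protocol from the object returned by whatever() — can be made concrete with a small editorial sketch; the names logged, managed and countdown are invented for illustration and do not come from the thread.

```python
from contextlib import contextmanager

def logged(func):
    """A decorator: must accept a function and return a callable."""
    def wrapper(*args, **kwargs):
        print("calling", func.__name__)
        return func(*args, **kwargs)
    return wrapper

@contextmanager
def managed():
    """A context manager: must supply __enter__/__exit__ (via generator here)."""
    print("enter")
    yield "resource"
    print("exit")

def countdown(n):
    """An iterable: must supply __iter__ (generator functions do)."""
    while n:
        yield n
        n -= 1

@logged
def f():
    return 42

with managed() as resource:
    assert resource == "resource"

assert f() == 42
assert sum(countdown(3)) == 6  # 3 + 2 + 1
```

Swapping any of these objects into the wrong construct fails immediately, which is exactly the local-syntax guarantee being described.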
> However, this decision to be explicit about flow control for the benefit of the *reader* brings with it a high *cost* on the Python side for the code *author*: where Ruby works by defining a nice syntax and semantics for callback based programming and building other language constructs on top of that, Python *doesn't currently have* a particularly nice general purpose native syntax for callback based programming.
>
> Decorators do work in many cases (especially simple callback registration), but they sometimes feel wrong because they're mainly designed to modify how a function is defined, not implement key program flow control constructs. However, their flexibility shouldn't be underestimated, and the CallbackStack API is designed to help Python developers push decorators and context managers closer to those limits *without* needing new language constructs. By decoupling the callback stack from the code layout, it gives you full *programmatic* control of the kinds of things context managers can help with when you know in advance exactly what you want to do.
>
> *If* CallbackStack proves genuinely popular (and given the number of proposals I have seen along these lines, and the feedback I have received on ContextStack to date, I expect it will), and people start to develop interesting patterns for using it, *then* we can start looking at the possibility of dedicated syntax to streamline particular use cases (just as the with statement itself was designed to streamline various use cases of the more general try statement).

Very lucid explanation, Nick. (I also liked your blog post that you referenced in a previous message, which touches upon the same issues.)

Apparently I don't seem to like flow control constructs formed by "quoting" (in Lisp terms) a block of code and leaving its execution to some other party, with the exception of explicit function definitions.
Maybe a computer-literate psychoanalyst can do something with this...

To this day I am having trouble liking event-based architectures -- I do see a need for them, but I immediately want to hide their mechanisms and offer a *different* mechanism for most use cases. See e.g. the (non-thread-based) async functionality I added to the new App Engine datastore client, NDB: https://docs.google.com/document/pub?id=1LhgEnZXAI8xiEkFA4tta08Hyn5vo4T6HSGLFVrP0Jag . Deep down inside it has an event loop, but this is hidden by using Futures, which in turn are mostly wrapped in tasklets, i.e. yield-based coroutines. I expect that if I were to find a use for Twisted, I'd do most of my coding using its so-called inlineCallbacks mechanism (also yield-based coroutines). When I first saw Monocle, which offers a simplified coroutine-based API on top of (amongst others) Twisted, I thought it was a breath of fresh air (NDB is heavily influenced by it).

I've probably (implicitly) trained most key Python developers and users to think similarly, and Python isn't likely to morph into Ruby any time soon. It's easy enough to write an event-based architecture in Python (see Twisted and Tornado); but an event loop is never going to be the standard way to solve all your programming problems in Python.

I do kind of like the 'defer' idea that started this thread (even if I had syntactic quibbles with it that already came up before the thread was derailed), but I notice that it is a far cry from an event-driven architecture -- like the referenced counterparts in Go and D, 'defer' blocks are not anonymous functions that can be passed off to arbitrary other libraries for possibly later and/or repeated execution -- they are a way to specify out-of-order execution within the current scope, which "tames" them enough to be acceptable from my perspective.
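(To illustrate, with purely hypothetical code: a Go-style 'defer' can be approximated with a plain callback list and try/finally -- the cleanup is registered right next to the resource it guards, but runs at scope exit, in LIFO order.)

```python
def copy_first_line(src_path, dst_path):
    deferred = []            # callbacks to run when the scope exits, LIFO
    defer = deferred.append
    try:
        src = open(src_path)
        defer(src.close)     # registered right next to the open() it guards
        dst = open(dst_path, "w")
        defer(dst.close)
        dst.write(src.readline())
    finally:
        # run the deferred cleanups in reverse registration order
        for callback in reversed(deferred):
            callback()
```

This is exactly "rearranging the code somewhat and carefully using try/finally" -- the deferral stays within the current scope and is never handed to another library for later or repeated execution.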
Though they may also not be powerful enough to be convincing as a new feature, since you can do everything they can do by rearranging the code of your function somewhat and carefully using try/finally.

-- 
--Guido van Rossum (python.org/~guido)

From nathan.alexander.rice at gmail.com  Sun Feb 19 05:31:31 2012
From: nathan.alexander.rice at gmail.com (Nathan Rice)
Date: Sat, 18 Feb 2012 23:31:31 -0500
Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal
In-Reply-To: 
References: 
Message-ID: 

> The core problem comes down to the differences between Guido's original PEP 340 idea (which was much closer in power to Ruby's blocks, since it was a new looping construct that allowed 0-or-more executions of the contained block) and the more constrained with statement that is defined in PEP 343 (which will either execute the body once or throw an exception, distinguishing it clearly from both the looping constructs and if statements).
>
> The principle Guido articulated when making that decision was: "different forms of flow control should look different at the point of invocation".
>
> So, where a language like Ruby just defines one protocol (callbacks, supplemented by anonymous blocks that run directly in the namespace of the containing function) and uses it for pretty much *all* flow control (including all their loop constructs), Python works the other way around, defining *different* protocols for different patterns of invocation.
>
> This provides a gain in readability on the Python side. When you see any of the following in Python:
>
>    @whatever()
>    def f():
>        pass
>
>    with whatever():
>        # Do something!
>
>    for x in whatever():
>        # Do something!
>
> It places a lot of constraints on the nature of the object returned by "whatever()" - even without knowing anything else about it, you know the first must return a decorator, the second a context manager, and the third an iterable.
> If that's all you need to know at this point in time, you don't need to worry about the details - the local syntax tells you the important things you need to know about the flow control.

I can appreciate the intention there. That particular case isn't as big a deal from my perspective; my non-local code pain points tend to be centered around boneheaded uses of inheritance and dynamic modification of classes.

> In Ruby, though, all of them (assuming it isn't actually important that the function name be bound locally) could be written like this:
>
>    whatever() do:
>        # Do something!
>    end
>
> Is it a context manager? An iterable? Some other kind of callback? There's nothing in the syntax to tell you that - you're relying on naming conventions to provide that information (like the ".foreach" convention for iteration methods). That approach can obviously work (otherwise Ruby wouldn't be as popular as it is), but it *does* make it harder to pick up a piece of code and understand the possible control flows without looking elsewhere.

More often than not, when I am reading other people's code, I am debugging it (and thus have local/global context information) or just interested in nailing down a poorly documented corner of an API. I think if I were regularly in the habit of working with lots of undocumented code I would probably appreciate this more.

> However, this decision to be explicit about flow control for the benefit of the *reader* brings with it a high *cost* on the Python side for the code *author*: where Ruby works by defining a nice syntax and semantics for callback based programming and building other language constructs on top of that, Python *doesn't currently have* a particularly nice general purpose native syntax for callback based programming.
> Decorators do work in many cases (especially simple callback registration), but they sometimes feel wrong because they're mainly designed to modify how a function is defined, not implement key program flow control constructs. However, their flexibility shouldn't be underestimated, and the CallbackStack API is designed to help Python developers push decorators and context managers closer to those limits *without* needing new language constructs. By decoupling the callback stack from the code layout, it gives you full *programmatic* control of the kinds of things context managers can help with when you know in advance exactly what you want to do.
>
> *If* CallbackStack proves genuinely popular (and given the number of proposals I have seen along these lines, and the feedback I have received on ContextStack to date, I expect it will), and people start to develop interesting patterns for using it, *then* we can start looking at the possibility of dedicated syntax to streamline particular use cases (just as the with statement itself was designed to streamline various use cases of the more general try statement).

I wasn't suggesting syntax needs to change necessarily, all the pieces are already there. I see it more along the lines of function.Event, class.Event, module.Event, context_manager.Event, etc. It is a moot point though.

Thank you again for taking the time to clarify the rationale for me. It wasn't intuitive to me because it does not really address issues I have.

From nathan.alexander.rice at gmail.com  Sun Feb 19 06:14:37 2012
From: nathan.alexander.rice at gmail.com (Nathan Rice)
Date: Sun, 19 Feb 2012 00:14:37 -0500
Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal
In-Reply-To: 
References: 
Message-ID: 

> Apparently I don't seem to like flow control constructs formed by "quoting" (in Lisp terms) a block of code and leaving its execution to some other party, with the exception of explicit function definitions.
> Maybe a computer-literate psychoanalyst can do something with this...
>
> To this day I am having trouble liking event-based architectures -- I do see a need for them, but I immediately want to hide their mechanisms and offer a *different* mechanism for most use cases. See e.g. the (non-thread-based) async functionality I added to the new App Engine datastore client, NDB: https://docs.google.com/document/pub?id=1LhgEnZXAI8xiEkFA4tta08Hyn5vo4T6HSGLFVrP0Jag . Deep down inside it has an event loop, but this is hidden by using Futures, which in turn are mostly wrapped in tasklets, i.e. yield-based coroutines. I expect that if I were to find a use for Twisted, I'd do most of my coding using its so-called inlineCallbacks mechanism (also yield-based coroutines). When I first saw Monocle, which offers a simplified coroutine-based API on top of (amongst others) Twisted, I thought it was a breath of fresh air (NDB is heavily influenced by it).

The main attraction of events for me is that they are a decent model of computational flow that makes it easy to "reach into" other people's code. I won't argue against the statement that they can be less clear or convenient to work with in some cases than other mechanisms. My personal preference would be to have the more powerful mechanism as the underlying technology, and build simpler abstractions on top of that (kind of like @property vs manually creating a descriptor).

> I've probably (implicitly) trained most key Python developers and users to think similarly, and Python isn't likely to morph into Ruby any time soon. It's easy enough to write an event-based architecture in Python (see Twisted and Tornado); but an event loop is never going to be the standard way to solve all your programming problems in Python.

I agree that events can make code harder to follow in some cases. I feel the same way about message passing and channels versus method invocation.
In both cases I think there is an argument to be made for representing the simpler techniques as special cases which are emphasized for general use. I also understand not wanting to be stuck dealing with someone else's event or message passing fetish when it's not necessary (and they often aren't), and that is certainly a fair counterargument.

Thank you for clarifying your views somewhat, it was instructive. I enjoy writing python code in general, but I shouldn't let that lead me astray when it isn't the right tool for the job.

Take care,

Nathan

From anacrolix at gmail.com  Sun Feb 19 08:36:48 2012
From: anacrolix at gmail.com (Matt Joiner)
Date: Sun, 19 Feb 2012 15:36:48 +0800
Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal
In-Reply-To: 
References: 
Message-ID: 

Out of interest, do you see an alternative to events or message passing when they _are_ required? I'm in Guido's apparently minority camp in that I can't stand events. The only decent alternative I've seen is message passing.

On Feb 19, 2012 1:15 PM, "Nathan Rice" wrote:
> > Apparently I don't seem to like flow control constructs formed by "quoting" (in Lisp terms) a block of code and leaving its execution to some other party, with the exception of explicit function definitions. Maybe a computer-literate psychoanalyst can do something with this...
> >
> > To this day I am having trouble liking event-based architectures -- I do see a need for them, but I immediately want to hide their mechanisms and offer a *different* mechanism for most use cases. See e.g. the (non-thread-based) async functionality I added to the new App Engine datastore client, NDB: https://docs.google.com/document/pub?id=1LhgEnZXAI8xiEkFA4tta08Hyn5vo4T6HSGLFVrP0Jag . Deep down inside it has an event loop, but this is hidden by using Futures, which in turn are mostly wrapped in tasklets, i.e. yield-based coroutines.
> > I expect that if I were to find a use for Twisted, I'd do most of my coding using its so-called inlineCallbacks mechanism (also yield-based coroutines). When I first saw Monocle, which offers a simplified coroutine-based API on top of (amongst others) Twisted, I thought it was a breath of fresh air (NDB is heavily influenced by it).
>
> The main attraction of events for me is that they are a decent model of computational flow that makes it easy to "reach into" other people's code. I won't argue against the statement that they can be less clear or convenient to work with in some cases than other mechanisms. My personal preference would be to have the more powerful mechanism as the underlying technology, and build simpler abstractions on top of that (kind of like @property vs manually creating a descriptor).
>
> > I've probably (implicitly) trained most key Python developers and users to think similarly, and Python isn't likely to morph into Ruby any time soon. It's easy enough to write an event-based architecture in Python (see Twisted and Tornado); but an event loop is never going to be the standard way to solve all your programming problems in Python.
>
> I agree that events can make code harder to follow in some cases. I feel the same way about message passing and channels versus method invocation. In both cases I think there is an argument to be made for representing the simpler techniques as special cases which are emphasized for general use. I also understand not wanting to be stuck dealing with someone else's event or message passing fetish when it's not necessary (and they often aren't), and that is certainly a fair counterargument.
>
> Thank you for clarifying your views somewhat, it was instructive. I enjoy writing python code in general, but I shouldn't let that lead me astray when it isn't the right tool for the job.
>
> Take care,
>
> Nathan
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From storchaka at gmail.com  Sun Feb 19 10:25:39 2012
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sun, 19 Feb 2012 11:25:39 +0200
Subject: [Python-ideas] ScopeGuardStatement/Defer Proposal
In-Reply-To: <4F4039D9.2010909@pearwood.info>
References: <4F4039D9.2010909@pearwood.info>
Message-ID: 

19.02.12 01:52, Steven D'Aprano wrote:
> I hope not. I like Pascal. It has nice, clean syntax (if a tad verbose, with the BEGIN/END tags) and straight-forward, simple semantics. Standard Pascal is somewhat lacking (e.g. no strings) but who uses standard Pascal?

Python is not Pascal. For me it is the BASIC of nowadays. Really basic, simple and clear (even for non-specialists) language. Not old BASIC with line numbers, GOTO, GOSUB and 1- or 2-symbol identifiers, but a modern language with modules, structured programming, powerful basic data structures, OOP, first-class functions, automatic resource management etc.

From cs at zip.com.au  Sun Feb 19 11:05:12 2012
From: cs at zip.com.au (Cameron Simpson)
Date: Sun, 19 Feb 2012 21:05:12 +1100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <4F404968.5070000@molden.no>
References: <4F404968.5070000@molden.no>
Message-ID: <20120219100512.GA31747@cskk.homeip.net>

On 19Feb2012 01:59, Sturla Molden wrote:
| On 19.02.2012 01:39, Matt Joiner wrote:
| > Yes, channels can allow for this, but as with locks directionality and ordering matter. Typically messages will only run in a particular direction.
|
| Actually, it was only a synchronous MPI_Recv that did this in MPI, a synchronous MPI_Send would have been even worse. Which is why MPI got the asynchronous method MPI_Irecv...
|
| Sounds like you just want a barrier or a condition primitive. E.g. have the sender call .wait() on a condition and let the receiver call .notify() the condition.

A condition is essentially a boolean (with waiting).
A channel is a value passing mechanism.
Sometimes you really do want a zero-storage Queue i.e. a channel.

Saying "but you could put a value in a shared variable and just use a condition" removes the abstraction/metaphor. If I was thinking that way more than once in some code I'd write a small class to do that. And it would be a channel!

Seriously, a channel is semantically equivalent to a zero-storage Queue, which is a mode not provided by the current Queue implementation.
--
Cameron Simpson DoD#743
http://www.cskk.ezoshosting.com/cs/

No good deed shall go unpunished! - David Wood

From jeanpierreda at gmail.com  Sun Feb 19 16:18:58 2012
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Sun, 19 Feb 2012 10:18:58 -0500
Subject: [Python-ideas] doctest
In-Reply-To: 
References: 
Message-ID: 

On Sat, Feb 18, 2012 at 8:54 PM, Nick Coghlan wrote:
> A published version of doctest2 that was designed to be suitable for eventual incorporation back into doctest itself (i.e. by maintaining backwards compatibility) sounds like it would be quite popular, and would route around the fact that enhancing it isn't high on the priority list for the current core development team.

Heh, "quite popular". Whenever I mention doctest2, people think of doctest. And apparently people really dislike doctest. The way I try to address the immediate fear response is, "sure, doctest is terrible -- why do you think I'm forking it? ;)"; however, I think popularity would be difficult outside of the existing doctest user base.

P.S., some uninvited advice to would-be forkers:

- Make the starting commit of your repository identical to the original module that you're forking, to make tracking the original module easier.
- On that note, also write down the hg revision of the module that you're forking so that you can find later changes.
- Immediately change the name of your forked module so that unit tests only run against it rather than accidentally testing the original module. (Also, delete the original from your Python to be sure you edited the test cases right too. And, uh, don't forget the pyc.)

Maybe these are obvious to everyone else, but I'd never forked anything before, and so I made all those mistakes. The first dozen or two commits are full of sad things.

-- Devin

From guido at python.org  Sun Feb 19 16:53:14 2012
From: guido at python.org (Guido van Rossum)
Date: Sun, 19 Feb 2012 07:53:14 -0800
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <20120219100512.GA31747@cskk.homeip.net>
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net>
Message-ID: 

How hard would it be to add Channel to the stdlib? Perhaps even in the threading module, which already has a bunch of different primitives like Lock, RLock, Condition, Event, Semaphore, Barrier.

On Sun, Feb 19, 2012 at 2:05 AM, Cameron Simpson wrote:
> On 19Feb2012 01:59, Sturla Molden wrote:
> | On 19.02.2012 01:39, Matt Joiner wrote:
> | > Yes, channels can allow for this, but as with locks directionality and ordering matter. Typically messages will only run in a particular direction.
> |
> | Actually, it was only a synchronous MPI_Recv that did this in MPI, a synchronous MPI_Send would have been even worse. Which is why MPI got the asynchronous method MPI_Irecv...
> |
> | Sounds like you just want a barrier or a condition primitive. E.g. have the sender call .wait() on a condition and let the receiver call .notify() the condition.
>
> A condition is essentially a boolean (with waiting).
> A channel is a value passing mechanism.
> Sometimes you really do want a zero-storage Queue i.e. a channel.
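(An aside, in hypothetical code: today's stdlib can approximate the zero-storage handoff by pairing put() with join(), so the producer blocks until the consumer has called task_done() -- not a true channel type, but the closest existing mode.)

```python
import queue
import threading

def rendezvous_put(q, item):
    """Producer side of an approximate rendezvous: block until the
    consumer has acknowledged the item with task_done()."""
    q.put(item)
    q.join()           # returns once the unfinished-task count drops to zero

received = []

def consumer(q):
    item = q.get()
    received.append(item)
    q.task_done()      # releases the producer blocked in q.join()

q = queue.Queue()
t = threading.Thread(target=consumer, args=(q,))
t.start()
rendezvous_put(q, "hello")   # blocks until the consumer acknowledges
t.join()
```

Note the acknowledgement only says get()/task_done() happened, not that anything was done with the value -- which is part of why the Queue class never grew a true zero-storage mode.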
>
> Saying "but you could put a value in a shared variable and just use a condition" removes the abstraction/metaphor. If I was thinking that way more than once in some code I'd write a small class to do that. And it would be a channel!
>
> Seriously, a channel is semantically equivalent to a zero-storage Queue, which is a mode not provided by the current Queue implementation.
> --
> Cameron Simpson DoD#743
> http://www.cskk.ezoshosting.com/cs/
>
> No good deed shall go unpunished! - David Wood
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

--
--Guido van Rossum (python.org/~guido)

From sturla at molden.no  Sun Feb 19 16:58:07 2012
From: sturla at molden.no (Sturla Molden)
Date: Sun, 19 Feb 2012 16:58:07 +0100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: 
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net>
Message-ID: <4F411C0F.6030008@molden.no>

On 19.02.2012 16:53, Guido van Rossum wrote:
> How hard would it be to add Channel to the stdlib?

It might take 10 lines of code...

Sturla

From solipsis at pitrou.net  Sun Feb 19 17:01:03 2012
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 19 Feb 2012 17:01:03 +0100
Subject: [Python-ideas] channel (synchronous queue)
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no>
Message-ID: <20120219170103.17f5a8b9@pitrou.net>

On Sun, 19 Feb 2012 16:58:07 +0100 Sturla Molden wrote:
> On 19.02.2012 16:53, Guido van Rossum wrote:
> > How hard would it be to add Channel to the stdlib?
>
> It might take 10 lines of code...

Even for multiprocessing? (I realize we didn't implement a Barrier in multiprocessing; patches welcome :-))

Regards

Antoine.
From sturla at molden.no  Sun Feb 19 17:58:53 2012
From: sturla at molden.no (Sturla Molden)
Date: Sun, 19 Feb 2012 17:58:53 +0100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <20120219170103.17f5a8b9@pitrou.net>
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net>
Message-ID: <4F412A4D.4090102@molden.no>

On 19.02.2012 17:01, Antoine Pitrou wrote:
> Even for multiprocessing? (I realize we didn't implement a Barrier in multiprocessing; patches welcome :-)) Regards Antoine.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

Here is a skeleton (replace self._data with some IPC mechanism for multiprocessing).

I'll post a barrier for multiprocessing asap, I happen to have one ;-)

Sturla


from threading import Lock, Event

class Channel(object):

    def __init__(self):
        self._writelock = Lock()
        self._readlock = Lock()
        self._new_data = Event()
        self._recv_data = Event()
        self._data = None

    def put(self, msg):
        with self._writelock:
            self._data = msg
            self._new_data.set()
            self._recv_data.wait()
            self._recv_data.clear()

    def get(self):
        with self._readlock:
            self._new_data.wait()
            msg = self._data
            self._data = None
            self._new_data.clear()
            self._recv_data.set()
            return msg

if __name__ == "__main__":

    from threading import Thread
    from sys import stdout

    def thread2(channel):
        for i in range(1000):
            msg = channel.get()
            stdout.flush()
            print "Thread 2 received '%s'\n" % msg,
            stdout.flush()

    def thread1(channel):
        for i in range(1000):
            stdout.flush()
            print "Thread 1 preparing to send 'message %d'\n" % i,
            stdout.flush()
            msg = channel.put(("message %d" % i,))
            stdout.flush()
            print "Thread 1 finished sending 'message %d'\n" % i,
            stdout.flush()

    channel = Channel()
    t2 = Thread(target=thread2, args=(channel,))
    t2.start()
    thread1(channel)
    t2.join()

From sturla at molden.no  Sun Feb 19
18:05:57 2012
From: sturla at molden.no (Sturla Molden)
Date: Sun, 19 Feb 2012 18:05:57 +0100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <4F412A4D.4090102@molden.no>
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no>
Message-ID: <4F412BF5.90504@molden.no>

On 19.02.2012 17:58, Sturla Molden wrote:
> On 19.02.2012 17:01, Antoine Pitrou wrote:
>
>> Even for multiprocessing? (I realize we didn't implement a Barrier in multiprocessing; patches welcome :-)) Regards Antoine.
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
>
> Here is a skeleton (replace self._data with some IPC mechanism for multiprocessing).
>
> I'll post a barrier for multiprocessing asap, I happen to have one ;-)

import multiprocessing as mp
from math import ceil, log

class Barrier(object):

    def __init__(self, numproc):
        self._events = [mp.Event() for n in range(numproc**2)]
        self._numproc = numproc

    def wait(self, rank):
        # loop log2(numproc) times, rounding up
        for k in range(int(ceil(log(self._numproc)/log(2)))):
            # send event to process (rank + 2**k) % numproc
            receiver = (rank + 2**k) % self._numproc
            evt = self._events[rank * self._numproc + receiver]
            evt.set()
            # wait for event from process (rank - 2**k) % numproc
            sender = (rank - 2**k) % self._numproc
            evt = self._events[sender * self._numproc + rank]
            evt.wait()
            evt.clear()

From solipsis at pitrou.net  Sun Feb 19 18:18:52 2012
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 19 Feb 2012 18:18:52 +0100
Subject: [Python-ideas] channel (synchronous queue)
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no>
Message-ID:
<20120219181852.1bce007c@pitrou.net>

On Sun, 19 Feb 2012 17:58:53 +0100 Sturla Molden wrote:
>
> def put(self, msg):
>     with self._writelock:
>         self._data = msg
>         self._new_data.set()
>         self._recv_data.wait()
>         self._recv_data.clear()

This begs the question: what does it achieve? You know that the data has been "received" on the other side (i.e. get() has been called), but this doesn't tell you anything was done with the data, so: why is this a useful way to synchronize?

Regards

Antoine.

From sturla at molden.no  Sun Feb 19 18:27:23 2012
From: sturla at molden.no (Sturla Molden)
Date: Sun, 19 Feb 2012 18:27:23 +0100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <20120219181852.1bce007c@pitrou.net>
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net>
Message-ID: <4F4130FB.4040204@molden.no>

On 19.02.2012 18:18, Antoine Pitrou wrote:
> This begs the question: what does it achieve?
> You know that the data has been "received" on the other side (i.e. get() has been called), but this doesn't tell you anything was done with the data, so: why is this a useful way to synchronize?

I think it achieves nothing, except making deadlocks more likely.

Which is to say, I just wanted to prove how ridiculously simple Matt Joiner's complaint about a "channel" was.

The multiprocessing barrier on the other hand is quite useful. (Though the butterfly method is not the most efficient implementation of a barrier.)

Sturla

From sturla at molden.no  Sun Feb 19 18:43:58 2012
From: sturla at molden.no (Sturla Molden)
Date: Sun, 19 Feb 2012 18:43:58 +0100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <4F413322.2070703@molden.no>
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no>
Message-ID: <4F4134DE.5070609@molden.no>

On 19.02.2012 18:36, Sturla Molden wrote:
>
> The multiprocessing barrier on the other hand is quite useful. (Though the butterfly method is not the most efficient implementation of a barrier.)

Oops... it is the dissemination barrier, not the butterfly barrier. It scales better for non-power-of-two numbers of threads.
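(As a sanity check on the dissemination pattern, in hypothetical code: in round k each rank signals (rank + 2**k) % n, and after ceil(log2(n)) rounds every process has transitively heard from every other -- including when n is not a power of two.)

```python
from math import ceil, log2

def dissemination_rounds(n):
    # number of rounds a dissemination barrier needs for n processes
    return int(ceil(log2(n))) if n > 1 else 0

def informed_sets(n):
    # simulate which ranks each process has transitively heard from
    knows = [{rank} for rank in range(n)]
    for k in range(dissemination_rounds(n)):
        nxt = [set(s) for s in knows]
        for rank in range(n):
            receiver = (rank + 2 ** k) % n   # rank signals this process
            nxt[receiver] |= knows[rank]
        knows = nxt
    return knows
```

After the final round every set has size n, which is why no process can leave the barrier before every other process has entered it.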
Sturla

From guido at python.org  Sun Feb 19 18:44:00 2012
From: guido at python.org (Guido van Rossum)
Date: Sun, 19 Feb 2012 09:44:00 -0800
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <4F413322.2070703@molden.no>
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no>
Message-ID: 

On Sun, Feb 19, 2012 at 9:36 AM, Sturla Molden wrote:
> On 19.02.2012 18:27, Sturla Molden wrote:
>> On 19.02.2012 18:18, Antoine Pitrou wrote:
>>> This begs the question: what does it achieve? You know that the data has been "received" on the other side (i.e. get() has been called), but this doesn't tell you anything was done with the data, so: why is this a useful way to synchronize?
>>
>> I think it achieves nothing, except making deadlocks more likely.
>
> Which is to say, I just wanted to prove how ridiculously simple Matt Joiner's complaint about a "channel" was.

I may be taking this out of context, but I have a really hard time understanding what you were trying to say. What does it mean for a complaint to be simple? Did you leave out a word in haste? (I know that happens a lot to me. :-)

> The multiprocessing barrier on the other hand is quite useful. (Though the butterfly method is not the most efficient implementation of a barrier.)

Glad to see some real code. It's probably time to move the code samples to the bug tracker where they can be reviewed and have a chance of getting incorporated into the next release.

-- 
--Guido van Rossum (python.org/~guido)

From sturla at molden.no  Sun Feb 19 19:01:31 2012
From: sturla at molden.no (Sturla Molden)
Date: Sun, 19 Feb 2012 19:01:31 +0100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: 
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no>
Message-ID: <4F4138FB.8010103@molden.no>

On 19.02.2012 18:44, Guido van Rossum wrote:
> I may be taking this out of context, but I have a really hard time understanding what you were trying to say. What does it mean for a complaint to be simple? Did you leave out a word in haste? (I know that happens a lot to me. :-)

Sorry for the rude language. I meant I think it is a problem that does not belong in the standard library, but perhaps in a cookbook. It is ~20 lines of trivial code with objects already in the standard library.

Well, one could say the same thing about a queue too (it's just a deque and a lock), but it is very useful and commonly used, so there is a difference.

Sturla

From shibturn at gmail.com  Sun Feb 19 19:07:21 2012
From: shibturn at gmail.com (shibturn)
Date: Sun, 19 Feb 2012 18:07:21 +0000
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <4F412BF5.90504@molden.no>
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <4F412BF5.90504@molden.no>
Message-ID: 

On 19/02/2012 5:05pm, Sturla Molden wrote:
> from multiprocessing import Event
> from math import ceil, log
> ...

I presume rank is the index of the process? Sounds very MPIish.

One problem: multiprocessing's Event uses 5 semaphores. (Condition uses 4 and Lock, RLock, Semaphore use 1.) So your Barrier will use 5*numproc**2 semaphores.
--
--Guido van Rossum (python.org/~guido)

From sturla at molden.no  Sun Feb 19 19:01:31 2012
From: sturla at molden.no (Sturla Molden)
Date: Sun, 19 Feb 2012 19:01:31 +0100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To:
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no>
Message-ID: <4F4138FB.8010103@molden.no>

Den 19.02.2012 18:44, skrev Guido van Rossum:
> I may be taking this out of context, but I have a really hard time
> understanding what you were trying to say. What does it mean for a
> complaint to be simple? Did you leave out a word in haste? (I know
> that happens a lot to me. :-)

Sorry for the rude language. I meant I think it is a problem that does not belong in the standard library, but perhaps in a cookbook. It is ~20 lines of trivial code with objects already in the standard library. Well, one could say the same thing about a queue too (it's just deque and a lock), but it is very useful and commonly used, so there is a difference.

Sturla

From shibturn at gmail.com  Sun Feb 19 19:07:21 2012
From: shibturn at gmail.com (shibturn)
Date: Sun, 19 Feb 2012 18:07:21 +0000
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <4F412BF5.90504@molden.no>
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <4F412BF5.90504@molden.no>
Message-ID:

On 19/02/2012 5:05pm, Sturla Molden wrote:
> from multiprocessing import Event
> from math import ceil, log
> ...

I presume rank is the index of the process? Sounds very MPIish.

One problem is that multiprocessing's Event uses 5 semaphores. (Condition uses 4 and Lock, RLock, Semaphore use 1). So your Barrier will use 5*numproc semaphores. This is likely to be a problem for those Unixes (such as oldish versions of FreeBSD) which allow a very limited number of semaphores.

It would probably be better to use something which has an API which is a closer match to threading.Barrier. The code below gets closer in API but does not implement reset() (which I think is pretty pointless anyway), and wait() returns None instead of an index. It is not properly tested though.

import multiprocessing as mp

class BrokenBarrierError(Exception):
    pass

class Barrier(object):

    def __init__(self, size):
        assert size > 0
        self.size = size
        self._lock = mp.Lock()
        self._entry_sema = mp.Semaphore(size-1)
        self._exit_sema = mp.Semaphore(0)
        self._broken_sema = mp.BoundedSemaphore(1)

    def wait(self, timeout=None):
        if self.broken:
            raise BrokenBarrierError
        try:
            if self._entry_sema.acquire(timeout=0):
                if not self._exit_sema.acquire(timeout=timeout):
                    self.abort()
            else:
                for i in range(self.size-1):
                    self._exit_sema.release()
                for i in range(self.size-1):
                    self._entry_sema.release()
        except:
            self.abort()
            raise
        if self.broken:
            raise BrokenBarrierError

    def abort(self):
        with self._lock:
            self._broken_sema.acquire(timeout=5)
            for i in range(self.size):
                self._entry_sema.release()
                self._exit_sema.release()

    def reset(self):
        raise NotImplementedError

    @property
    def broken(self):
        with self._lock:
            if not self._broken_sema.acquire(timeout=0):
                return True
            self._broken_sema.release()
            return False

##

import time, random

def child(b, l):
    for i in range(5):
        time.sleep(random.random()*5)
        with l:
            print i, "entering barrier:", mp.current_process().name
        b.wait()
        with l:
            print '\t', i, "exiting barrier:", mp.current_process().name

if __name__ == '__main__':
    b = Barrier(5)
    l = mp.Lock()
    for i in range(5):
        mp.Process(target=child, args=(b,l)).start()
    time.sleep(10)
    print("ABORTING")
    b.abort()

From luoyonggang at gmail.com  Sun Feb 19 19:08:13 2012
From: luoyonggang at gmail.com (=?UTF-8?B?572X5YuH5YiaKFlvbmdnYW5nIEx1bykg?=)
Date: Mon, 20 Feb 2012 02:08:13 +0800
Subject:
[Python-ideas] Lack a API WCHAR* PyUnicode_WCHAR_DATA(PyObject *o), This is important for OS dependent feature port.
Message-ID:

Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o)
Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o)
Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o)

Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4 integer types for direct character access. No checks are performed if the canonical representation has the correct character size; use PyUnicode_KIND() to select the right macro. Make sure PyUnicode_READY() has been called before accessing this. New in version 3.3.

PyUnicode_WCHAR_KIND
PyUnicode_1BYTE_KIND
PyUnicode_2BYTE_KIND
PyUnicode_4BYTE_KIND

Return values of the PyUnicode_KIND() macro. New in version 3.3.

int PyUnicode_KIND(PyObject *o)

Return one of the PyUnicode kind constants (see above) that indicate how many bytes per character this Unicode object uses to store its data. o has to be a Unicode object in the 'canonical' representation (not checked). New in version 3.3.

--
Yours sincerely,
Yonggang Luo
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From p.f.moore at gmail.com  Sun Feb 19 19:08:41 2012
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 19 Feb 2012 18:08:41 +0000
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <4F4138FB.8010103@molden.no>
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> <4F4138FB.8010103@molden.no>
Message-ID:

On 19 February 2012 18:01, Sturla Molden wrote:
> It is ~20 lines
> of trivial code with objects already in the standard library. Well, one
> could say the same thing about a queue too (it's just deque and a lock), but
> it is very useful and commonly used, so there is a difference.
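Sturla's parenthetical that a queue is "just deque and a lock" corresponds to roughly the following minimal sketch (illustrative only; the stdlib Queue additionally provides maxsize, task tracking, and multiple condition variables):

```python
import collections
import threading

class MiniQueue(object):
    """A minimal blocking FIFO: a deque guarded by a condition variable."""

    def __init__(self):
        self._items = collections.deque()
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()          # wake one waiting consumer

    def get(self):
        with self._cond:
            while not self._items:       # guard against spurious wakeups
                self._cond.wait()
            return self._items.popleft()
```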
FWIW, I wouldn't have got this code right if I'd tried to write it. I'd have missed a lock or something. So it's possible that having it in the standard library avoids people like me writing buggy implementations. On the other hand, I can't imagine ever needing to use a channel object like this, so it would probably be worth having some real-world use cases to justify it.

Paul

From sturla at molden.no  Sun Feb 19 19:18:04 2012
From: sturla at molden.no (Sturla Molden)
Date: Sun, 19 Feb 2012 19:18:04 +0100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To:
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <4F412BF5.90504@molden.no>
Message-ID: <4F413CDC.30707@molden.no>

Den 19.02.2012 19:07, skrev shibturn:
>
> One problem is that multiprocessing's Event uses 5 semaphores.
> (Condition uses 4 and Lock, RLock, Semaphore use 1). So your Barrier
> will use 5*numproc semaphores. This is likely to be a problem for
> those Unixes (such as oldish versions of FreeBSD) which allow a very
> limited number of semaphores.

I actually overallocated the number of events; only O(n log n) should be needed. So a dict could have been used for sparse storage instead. Still that is a lot of semaphores.

Sturla

From sturla at molden.no  Sun Feb 19 19:29:00 2012
From: sturla at molden.no (Sturla Molden)
Date: Sun, 19 Feb 2012 19:29:00 +0100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To:
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <4F412BF5.90504@molden.no>
Message-ID: <4F413F6C.9000806@molden.no>

Den 19.02.2012 19:07, skrev shibturn:
>
> One problem is that multiprocessing's Event uses 5 semaphores.
> (Condition uses 4 and Lock, RLock, Semaphore use 1). So your Barrier
> will use 5*numproc semaphores.
It is of course trivial to implement a dissemination barrier in C, atomic read/write (and shared memory for multiprocessing). It would take O(n log2 n) amount of shared memory. One iteration of .wait() would take O(log2 n) time.

Sturla

From solipsis at pitrou.net  Sun Feb 19 19:29:21 2012
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 19 Feb 2012 19:29:21 +0100
Subject: [Python-ideas] Lack a API WCHAR* PyUnicode_WCHAR_DATA(PyObject *o), This is important for OS dependent feature port.
References:
Message-ID: <20120219192921.0e366a41@pitrou.net>

> Lack a API WCHAR* PyUnicode_WCHAR_DATA(PyObject *o), This is
> important for OS dependent feature port.

Why can't you use the existing wchar_t functions:
http://docs.python.org/dev/c-api/unicode.html#wchar-t-support
?

Regards

Antoine.

From shibturn at gmail.com  Sun Feb 19 19:46:22 2012
From: shibturn at gmail.com (shibturn)
Date: Sun, 19 Feb 2012 18:46:22 +0000
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To:
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <4F412BF5.90504@molden.no>
Message-ID:

On 19/02/2012 6:07pm, shibturn wrote:
> 5*numproc semaphores. This is likely to be a problem for those Unixes
  ^^^^^^^^^
5*numproc**2

sbt

From guido at python.org  Sun Feb 19 20:04:45 2012
From: guido at python.org (Guido van Rossum)
Date: Sun, 19 Feb 2012 11:04:45 -0800
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To:
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> <4F4138FB.8010103@molden.no>
Message-ID:

On Sun, Feb 19, 2012 at 10:08 AM, Paul Moore wrote:
> On 19 February 2012 18:01, Sturla Molden wrote:
>> It is ~20 lines
>> of trivial code with objects already in the standard library. Well, one
>> could say the same thing about a queue too (it's just deque and a lock), but
>> it is very useful and commonly used, so there is a difference.
>
> FWIW, I wouldn't have got this code right if I'd tried to write it.
> I'd have missed a lock or something. So it's possible that having it
> in the standard library avoids people like me writing buggy
> implementations.

It would also encourage using it as an interface between libraries with different authors, which would not happen if it was just a recipe -- every author would implement their own version of the recipe, and they would not be API-compatible even if they did the same thing. Many of the existing primitives in threading.py are very simple combinations of the basic Lock; but that doesn't make it less valuable to have them.

Also, writing a performant Channel implementation for multiprocessing would hardly be a trivial job; it seems primitives don't make it into multiprocessing without first existing in threading.py. So all this suggests to me that there is no great harm in adding threading.Channel and it might open up some interesting new approaches to synchronization. That said, it certainly isn't a panacea; e.g.
some Go examples written using Channels are better done with coroutines instead of threads in Python. (IIUC Go intentionally blurs the difference, but that's not given to Python.)

> On the other hand, I can't imagine ever needing to
> use a channel object like this, so it would probably be worth having
> some real-world use cases to justify it.

I think Matt Joiner's original post hinted at some. Matt, could you elaborate? We may be only an inch away from getting this into the stdlib...

--
--Guido van Rossum (python.org/~guido)

From sturla at molden.no  Sun Feb 19 21:23:03 2012
From: sturla at molden.no (Sturla Molden)
Date: Sun, 19 Feb 2012 21:23:03 +0100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To:
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> <4F4138FB.8010103@molden.no>
Message-ID: <4F415A27.5060605@molden.no>

Den 19.02.2012 20:04, skrev Guido van Rossum:
> Also, writing a performant Channel implementation for multiprocessing
> would hardly be a trivial job;

I think this should work :-)

Sturla

from multiprocessing import Lock, Event, Pipe

class Channel(object):

    def __init__(self):
        self._writelock = Lock()
        self._readlock = Lock()
        self._new_data = Event()
        self._recv_data = Event()
        self._conn1, self._conn2 = Pipe(False)

    def put(self, msg):
        with self._writelock:
            self._conn2.send(msg)
            self._new_data.set()
            self._recv_data.wait()
            self._recv_data.clear()

    def get(self):
        with self._readlock:
            self._new_data.wait()
            msg = self._conn1.recv()
            self._new_data.clear()
            self._recv_data.set()
            return msg

## -------------

def proc2(channel):
    from sys import stdout
    for i in range(1000):
        msg = channel.get()
        stdout.flush()
        print "Process 2 received '%s'\n" % msg,
        stdout.flush()

def proc1(channel):
    from sys import stdout
    for i in range(1000):
        stdout.flush()
        print "Process 1 preparing to send 'message %d'\n" % i,
        stdout.flush()
        msg = channel.put(("message %d" % i,))
        stdout.flush()
        print "Process 1 finished sending 'message %d'\n" % i,
        stdout.flush()

if __name__ == "__main__":
    from multiprocessing import Process
    channel = Channel()
    p2 = Process(target=proc2, args=(channel,))
    p2.start()
    proc1(channel)
    p2.join()

From sturla at molden.no  Sun Feb 19 21:51:03 2012
From: sturla at molden.no (Sturla Molden)
Date: Sun, 19 Feb 2012 21:51:03 +0100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <4F415A27.5060605@molden.no>
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> <4F4138FB.8010103@molden.no> <4F415A27.5060605@molden.no>
Message-ID: <4F4160B7.9030409@molden.no>

If someone wants to write a PEP for Go'ish "channels" (or whatever), put it on the bug tracker; here is the example implementation (with a lock around stdout in the example code, stupid me...) It would still need a timeout argument. A unittest could check the output of the test code as it is known in advance.

I don't really see the usefulness of a "channel" primitive; for those who do, here is my contribution. I have only tested on Win64.

Sturla
-------------- next part --------------
A non-text attachment was scrubbed...
Name: channels.zip
Type: application/x-zip-compressed
Size: 2407 bytes
Desc: not available
URL:

From sturla at molden.no  Sun Feb 19 22:29:23 2012
From: sturla at molden.no (Sturla Molden)
Date: Sun, 19 Feb 2012 22:29:23 +0100
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To:
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> <4F4138FB.8010103@molden.no>
Message-ID: <4F4169B3.6050101@molden.no>

Den 19.02.2012 20:04, skrev Guido van Rossum:
> I think Matt Joiner's original post hinted at some. Matt, could you
> elaborate? We may be only an inch away from getting this into the
> stdlib...

One thing I could think of, is "atomic messaging" with multiple producers or consumers talking on the same channel. E.g. while process A sends a message to process B, process C cannot write and process D cannot read. So you always get a 1 to 1 conversation. But I am not sure why (or if) Go has this mechanism.

On the other hand, if we put in N**2 pipes (or channels), we could achieve the same atomicity of transaction by having an index for sender and receiver of a message. This is what MPI does in the functions MPI_Send and MPI_Recv. But then I will be scolded for using too many semaphores on FreeBSD again :-(

But there are some other useful mechanisms from MPI (and ØMQ) to consider as well. For example message broadcasting, message scatter and gather, and reductions. The latter is a reduce operation (e.g. add or multiply) on messages coming in from multiple processes. OpenMP also has reductions in the API. So there is a lot to be considered in the area of concurrency if we want to put in more classes in threading and multiprocessing.
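The reduction described here, folding one partial result per worker into a single value with an operator, can be sketched with an ordinary queue; the helper names below are hypothetical and this is not the MPI or OpenMP API:

```python
import operator
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

def parallel_reduce(partials, nworkers, op=operator.add):
    # Fold one partial result per worker into a single value; this is
    # the queue-based analogue of what MPI_Reduce does at the root rank.
    total = partials.get()
    for _ in range(nworkers - 1):
        total = op(total, partials.get())
    return total

def worker(rank, nworkers, data, partials):
    # Each worker reduces its own strided slice locally, then ships one number.
    partials.put(sum(data[rank::nworkers]))

data = list(range(100))
nworkers = 4
partials = queue.Queue()
threads = [threading.Thread(target=worker, args=(r, nworkers, data, partials))
           for r in range(nworkers)]
for t in threads:
    t.start()
for t in threads:
    t.join()
result = parallel_reduce(partials, nworkers)   # sums to 4950
```

Because addition is commutative and associative, the order in which partial results arrive on the queue does not matter; a non-commutative op would need the rank attached to each partial.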
But now I'll stop before someone tells me to take this to the concurrency list :-) Sturla From solipsis at pitrou.net Mon Feb 20 00:26:15 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 20 Feb 2012 00:26:15 +0100 Subject: [Python-ideas] channel (synchronous queue) References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> <4F4138FB.8010103@molden.no> <4F4169B3.6050101@molden.no> Message-ID: <20120220002615.46c57b2a@pitrou.net> On Sun, 19 Feb 2012 22:29:23 +0100 Sturla Molden wrote: > Den 19.02.2012 20:04, skrev Guido van Rossum: > > I think Matt Joiner's original post hinted at some. Matt, could you > > elaborate? We may be only an inch away from getting this into the > > stdlib... > > One thing I could think of, is "atomic messaging" with multiple > producers or consumers talking on the same channel. E.g. while process A > sends a message to process B, process C cannot write and process D > cannot read. So you always get a 1 to 1 conversation. What would be the point exactly? Regards Antoine. 
From massimo.dipierro at gmail.com  Mon Feb 20 00:38:41 2012
From: massimo.dipierro at gmail.com (Massimo Di Pierro)
Date: Sun, 19 Feb 2012 17:38:41 -0600
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <4F4169B3.6050101@molden.no>
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> <4F4138FB.8010103@molden.no> <4F4169B3.6050101@molden.no>
Message-ID: <569B5ADF-FF08-4604-9A0B-60185F7543BA@gmail.com>

On Feb 19, 2012, at 3:29 PM, Sturla Molden wrote:
> On the other hand, if we put in N**2 pipes (or channels), we could achieve the same atomicity of transaction by having an index for sender and receiver of a message. This is what MPI does in the functions MPI_Send and MPI_Recv. But then I will be scolded for using too many semaphores on FreeBSD again :-(

I like this a lot. Below is some toy code I use in my parallel algorithms class (I removed the global communications broadcast, scatter, gather, reduce and I removed logging, network topology constraints, and checks).

import os
import string
import cPickle

class PSim(object):
    def __init__(self, p):
        """
        forks p-1 processes and creates p*p pipes
        """
        self.nprocs = p
        self.pipes = {}
        for i in range(p):
            for j in range(p):
                self.pipes[i,j] = os.pipe()
        self.rank = 0
        for i in range(1,p):
            if not os.fork():
                self.rank = i
                break

    def send(self, j, data):
        s = cPickle.dumps(data)
        os.write(self.pipes[self.rank,j][1], string.zfill(str(len(s)),10))
        os.write(self.pipes[self.rank,j][1], s)

    def recv(self, j):
        size = int(os.read(self.pipes[j,self.rank][0],10))
        s = os.read(self.pipes[j,self.rank][0],size)
        data = cPickle.loads(s)
        return data

if __name__ == '__main__':
    comm = PSim(2)
    if comm.rank == 0:
        comm.send(1, 'hello world')
    else:
        print comm.recv(0)

It would be very useful to have something like these channels built-in.
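The 10-digit length header in PSim's send()/recv() is a general framing technique; here is a standalone sketch with hypothetical helper names, written version-agnostically rather than in PSim's Python-2 style, and looping to handle the short reads and writes that os.read()/os.write() are allowed to return:

```python
import os
import pickle

def _write_all(fd, data):
    # os.write() may write fewer bytes than asked once the pipe buffer
    # fills up, so loop until everything has been flushed.
    while data:
        n = os.write(fd, data)
        data = data[n:]

def _read_exact(fd, n):
    # os.read() may likewise return short, so loop until n bytes arrive.
    chunks = []
    while n:
        chunk = os.read(fd, n)
        if not chunk:
            raise EOFError("pipe closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)

def send_msg(wfd, obj):
    payload = pickle.dumps(obj)
    _write_all(wfd, str(len(payload)).zfill(10).encode("ascii"))
    _write_all(wfd, payload)

def recv_msg(rfd):
    size = int(_read_exact(rfd, 10).decode("ascii"))
    return pickle.loads(_read_exact(rfd, size))
```

As the next paragraph points out, a sender can still block once the message exceeds the OS pipe buffer, so in a single process the receiver has to drain the pipe concurrently for large payloads.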
Notice that using OS pipes has the problem of an OS-dependent size: send is non-blocking for small data sizes but becomes blocking for large data sizes. Using OS mkfifo or a multiprocessing Queue is better, but the OS limits the number of files open by one program.

From anacrolix at gmail.com  Mon Feb 20 01:34:15 2012
From: anacrolix at gmail.com (Matt Joiner)
Date: Mon, 20 Feb 2012 08:34:15 +0800
Subject: [Python-ideas] channel (synchronous queue)
In-Reply-To: <4F4138FB.8010103@molden.no>
References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> <4F4138FB.8010103@molden.no>
Message-ID:

Your implementation is incomplete.

On Feb 20, 2012 2:01 AM, "Sturla Molden" wrote:
> Den 19.02.2012 18:44, skrev Guido van Rossum:
>> I may be taking this out of context, but I have a really hard time
>> understanding what you were trying to say. What does it mean for a
>> complaint to be simple? Did you leave out a word in haste? (I know that
>> happens a lot to me. :-)
>
> Sorry for the rude language. I meant I think it is a problem that does not
> belong in the standard library, but perhaps in a cookbook. It is ~20 lines
> of trivial code with objects already in the standard library. Well, one
> could say the same thing about a queue too (it's just deque and a lock),
> but it is very useful and commonly used, so there is a difference.
>
> Sturla
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From sturla at molden.no Mon Feb 20 01:40:28 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 01:40:28 +0100 Subject: [Python-ideas] channel (synchronous queue) In-Reply-To: <569B5ADF-FF08-4604-9A0B-60185F7543BA@gmail.com> References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> <4F4138FB.8010103@molden.no> <4F4169B3.6050101@molden.no> <569B5ADF-FF08-4604-9A0B-60185F7543BA@gmail.com> Message-ID: <4F41967C.90509@molden.no> Den 20.02.2012 00:38, skrev Massimo Di Pierro: > It would be very useful to have something like these channels > built-in. Notice that using OS pipes have the problem of a OS > dependent size. send is non-blocking for small data-size but becomes > blocking for large data sizes. Using OS mkfifo or multiprocessing > Queue is better but the OS limits the number of files open by one > program. Most MPI implementations use shared memory on localhost. In theory one could implement a queue (deque and lock) using a shared memory region (a file on /tmp or Windows equivalent). It would be extremely fast and could contain any number of "pipes" of arbitrary size. Sturla From sturla at molden.no Mon Feb 20 01:43:47 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 01:43:47 +0100 Subject: [Python-ideas] channel (synchronous queue) In-Reply-To: References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> <4F4138FB.8010103@molden.no> Message-ID: <4F419743.5040205@molden.no> Den 20.02.2012 01:34, skrev Matt Joiner: > > Your implementation is incomplete. > It does the basic communication you asked for. 
I know it is a featureless proof-of-concept, why don't you fill in the rest? (I don't really care.) Sturla From massimo.dipierro at gmail.com Mon Feb 20 01:44:38 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Sun, 19 Feb 2012 18:44:38 -0600 Subject: [Python-ideas] channel (synchronous queue) In-Reply-To: <4F41967C.90509@molden.no> References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> <4F4138FB.8010103@molden.no> <4F4169B3.6050101@molden.no> <569B5ADF-FF08-4604-9A0B-60185F7543BA@gmail.com> <4F41967C.90509@molden.no> Message-ID: +1 On Feb 19, 2012, at 6:40 PM, Sturla Molden wrote: > Den 20.02.2012 00:38, skrev Massimo Di Pierro: >> It would be very useful to have something like these channels built-in. Notice that using OS pipes have the problem of a OS dependent size. send is non-blocking for small data-size but becomes blocking for large data sizes. Using OS mkfifo or multiprocessing Queue is better but the OS limits the number of files open by one program. > > Most MPI implementations use shared memory on localhost. In theory one could implement a queue (deque and lock) using a shared memory region (a file on /tmp or Windows equivalent). It would be extremely fast and could contain any number of "pipes" of arbitrary size. 
> > Sturla > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From anacrolix at gmail.com Mon Feb 20 01:57:41 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Mon, 20 Feb 2012 08:57:41 +0800 Subject: [Python-ideas] channel (synchronous queue) In-Reply-To: References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> <4F4138FB.8010103@molden.no> <4F4169B3.6050101@molden.no> <569B5ADF-FF08-4604-9A0B-60185F7543BA@gmail.com> <4F41967C.90509@molden.no> Message-ID: I've created http://bugs.python.org/issue14059 for the multiprocessing.Barrier. I suggest a new thread be started to continue discussion on that. On Mon, Feb 20, 2012 at 8:44 AM, Massimo Di Pierro wrote: > +1 > > On Feb 19, 2012, at 6:40 PM, Sturla Molden wrote: > >> Den 20.02.2012 00:38, skrev Massimo Di Pierro: >>> It would be very useful to have something like these channels built-in. Notice that using OS pipes have the problem of a OS dependent size. send is non-blocking for small data-size but becomes blocking for large data sizes. Using OS mkfifo or multiprocessing Queue is better but the OS limits the number of files open by one program. >> >> Most MPI implementations use shared memory on localhost. In theory one could implement a queue (deque and lock) using a shared memory region (a file on /tmp or Windows equivalent). It would be extremely fast and could contain any number of "pipes" of arbitrary size. 
>> >> Sturla >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From anacrolix at gmail.com Mon Feb 20 01:59:26 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Mon, 20 Feb 2012 08:59:26 +0800 Subject: [Python-ideas] channel (synchronous queue) In-Reply-To: References: <4F404968.5070000@molden.no> <20120219100512.GA31747@cskk.homeip.net> <4F411C0F.6030008@molden.no> <20120219170103.17f5a8b9@pitrou.net> <4F412A4D.4090102@molden.no> <20120219181852.1bce007c@pitrou.net> <4F4130FB.4040204@molden.no> <4F413322.2070703@molden.no> Message-ID: I've created http://bugs.python.org/issue14060 for the possibility of a channel implementation. On Mon, Feb 20, 2012 at 1:44 AM, Guido van Rossum wrote: > On Sun, Feb 19, 2012 at 9:36 AM, Sturla Molden wrote: >> Den 19.02.2012 18:27, skrev Sturla Molden: >> >>> Den 19.02.2012 18:18, skrev Antoine Pitrou: >>>> >>>> This begs the question: what does it achieve? You know that the data has >>>> been "received" on the other side (i.e. get() has been called), but this >>>> doesn't tell you anything was done with the data, so: why is this an useful >>>> way to synchronize? >>> >>> >>> I think it achieves nothing, except making deadlocks more likely. >> >> >> Which is to say, I just wanted to prove how ridiculously simple Matt >> Joiner's complaint about a "channel" was. > > I may be taking this out of context, but I have a really hard time > understanding what you were trying to say. What does it mean for a > complaint to be simple? Did you leave out a word in haste? (I know > that happens a lot to me. :-) > >> The multiprocessing barrier on the other hand is quite useful. (Though the >> butterfly method is not the most efficient implementation of a barrier.) 
> > Glad to see some real code. It's probably time to move the code
> > samples to the bug tracker where they can be reviewed and have a
> > chance of getting incorporated into the next release.
>
> --
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

From techtonik at gmail.com  Mon Feb 20 10:47:17 2012
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 20 Feb 2012 12:47:17 +0300
Subject: [Python-ideas] sys.path is a hack - bringing it back under control
Message-ID:

Hi,

I often find this in my scripts/projects that I run directly from checkout:

DEVPATH = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, DEVPATH)

This seems like a hack to me, because the process of sys.path modification is completely out of control for a Python application developer, which means it is easy to break an application and get lost. I don't remember the exact user story for that bad association with sys.path (perhaps Django issue #1908), but something makes me feel that I am not alone:
http://stackoverflow.com/questions/5500736/troubleshooting-python-sys-path

What I'd like to propose is some control/info over what modified sys.path. The simplest case:

1. make sys.path a list of pairs (path, file-that-added-the-path)
2. make sys.path read-only
3. add sys.path.add() method for modification
4. logger for sys.path.add() events (or recipe how to implement it in documentation)

This will help a lot. Limiting sys.path may cause a loss of some functionality if you need to remove some or replace it completely, but I don't know where the ability to reset() sys.path can be useful.
--
anatoly t.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From simon.sapin at kozea.fr  Mon Feb 20 11:06:34 2012
From: simon.sapin at kozea.fr (Simon Sapin)
Date: Mon, 20 Feb 2012 11:06:34 +0100
Subject: [Python-ideas] sys.path is a hack - bringing it back under control
In-Reply-To:
References:
Message-ID: <4F421B2A.6090100@kozea.fr>

Le 20/02/2012 10:47, anatoly techtonik a écrit :
>
> I often find this in my scripts/projects that I run directly from
> checkout:
>
> DEVPATH = os.path.dirname(os.path.abspath(__file__))
> sys.path.insert(0, DEVPATH)

Hi,

You shouldn't have to do that if you're running 'python something.py'

> As initialized upon program startup, the first item of this list,
> path[0], is the directory containing the script that was used to
> invoke the Python interpreter. If the script directory is not
> available (e.g. if the interpreter is invoked interactively or if the
> script is read from standard input), path[0] is the empty string,
> which directs Python to search modules in the current directory
> first.

http://docs.python.org/py3k/library/sys.html#sys.path

The trick is to place the script in the directory that you want in the path, i.e. next to top-level packages. But from your code above this seems to be the case already...

Regards,
--
Simon Sapin

From techtonik at gmail.com  Mon Feb 20 11:31:21 2012
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 20 Feb 2012 13:31:21 +0300
Subject: [Python-ideas] sys.path is a hack - bringing it back under control
In-Reply-To: <4F421B2A.6090100@kozea.fr>
References: <4F421B2A.6090100@kozea.fr>
Message-ID:

On Mon, Feb 20, 2012 at 1:06 PM, Simon Sapin wrote:
>> I often find this in my scripts/projects that I run directly from
>> checkout:
>>
>> DEVPATH = os.path.dirname(os.path.abspath(__file__))
>> sys.path.insert(0, DEVPATH)
>
> You shouldn't have to do that if you're running 'python something.py'

But I did for some reason, and right now I can't even say if it was Windows, Linux, FreeBSD, PyPy, IPython, gdb or debugging from IDE.
As initialized upon program startup, the first item of this list, >> path[0], is the directory containing the script that was used to >> invoke the Python interpreter. If the script directory is not >> available (e.g. if the interpreter is invoked interactively or if the >> script is read from standard input), path[0] is the empty string, >> which directs Python to search modules in the current directory >> first. >> > > http://docs.python.org/py3k/**library/sys.html#sys.path > > The trick is to place the script in the directory that you want in the > path, ie. next to top-level packages. But from your code above this seems > to be the case already... > s/trick/hack/ and it will be just what I am saying. Not many Python projects use this structure. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Feb 20 11:38:58 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 20 Feb 2012 20:38:58 +1000 Subject: [Python-ideas] sys.path is a hack - bringing it back under control In-Reply-To: References: Message-ID: On Mon, Feb 20, 2012 at 7:47 PM, anatoly techtonik wrote: > Hi, > > I often find this in my scripts/projects, that I run directly from checkout: > > DEVPATH = os.path.dirname(os.path.abspath(__file__)) > sys.path.insert(0, DEVPATH) PEP 395 describes my current plan to fix sys.path initialisation (however, I can't yet promise that it will make it into 3.3, since it doesn't even have a reference implementation yet, and I have several other things I want to get done first). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From luoyonggang at gmail.com Mon Feb 20 13:19:33 2012 From: luoyonggang at gmail.com (=?UTF-8?B?572X5YuH5YiaKFlvbmdnYW5nIEx1bykg?=) Date: Mon, 20 Feb 2012 20:19:33 +0800 Subject: [Python-ideas] Lack a API WCHAR* PyUnicode_WCHAR_DATA(PyObject *o), This is important for OS dependent feature port. 
In-Reply-To: <20120219192921.0e366a41@pitrou.net> References: <20120219192921.0e366a41@pitrou.net> Message-ID: 2012/2/20 Antoine Pitrou > > > Lack a API WCHAR* PyUnicode_WCHAR_DATA(PyObject *o), This is > > important for OS dependent feature port. > > Why can't you use the existing wchar_t functions: > http://docs.python.org/dev/c-api/unicode.html#wchar-t-support > ? > > Thanks, got it:). > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Yours sincerely, Yonggang Luo -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Mon Feb 20 14:11:20 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 20 Feb 2012 16:11:20 +0300 Subject: [Python-ideas] Personal Project Roadmap (Was: sys.path is a hack - bringing it back under control) Message-ID: On Mon, Feb 20, 2012 at 1:38 PM, Nick Coghlan wrote: > (however, I can't yet promise that it will make it into 3.3, since it > doesn't even have a reference implementation yet, and I have several > other things I want to get done first). > I think that the idea of a personal project roadmap would rock. If I'd like something to be done faster, I could look at these "other things" to see if I can help with some of them. In addition I could copy some stuff to my own list to say that I am also interested. Once the item reaches the top in somebody's list (or a critical mass is reached there), he opens a hangout with other people or schedules a time for discussion. The login method is a Python account. Items are either bugs from trackers or short inline notes in a tree-like structure. Will it improve the Python development process? -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From techtonik at gmail.com Mon Feb 20 14:18:22 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 20 Feb 2012 16:18:22 +0300 Subject: [Python-ideas] sys.path is a hack - bringing it back under control In-Reply-To: References: Message-ID: On Mon, Feb 20, 2012 at 1:38 PM, Nick Coghlan wrote: > On Mon, Feb 20, 2012 at 7:47 PM, anatoly techtonik > wrote: > > Hi, > > > > I often find this in my scripts/projects, that I run directly from > checkout: > > > > DEVPATH = os.path.dirname(os.path.abspath(__file__)) > > sys.path.insert(0, DEVPATH) > > PEP 395 describes my current plan to fix sys.path initialisation > (however, I can't yet promise that it will make it into 3.3, since it > doesn't even have a reference implementation yet, and I have several > other things I want to get done first). > tl;dr :( The abstract doesn't give any valuable info. "proposes new mechanisms to eliminate some longstanding traps" doesn't say anything. Which mechanisms? What traps? I see there a mention of my problem with Django. How can it help to debug other sys.path problems? Do I really have to read a 15-page document to understand that? -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Feb 20 14:58:04 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 20 Feb 2012 23:58:04 +1000 Subject: [Python-ideas] Personal Project Roadmap (Was: sys.path is a hack - bringing it back under control) In-Reply-To: References: Message-ID: On Mon, Feb 20, 2012 at 11:11 PM, anatoly techtonik wrote: > I think that the idea of a personal project roadmap would rock. > If I'd like something to be done faster, I could look at these "other > things" to see if I can help with some of them. In addition I could copy > some stuff to my own list to say that I am also interested.
Once the item > reaches the top in somebody's list (or there a critical mass is reached), he > opens a hangout with other people or schedules a time for discussion. > > The login method is Python account. Items are either bugs from trackers or > short inline notes in a tree-like structure. > > Will it improve the Python development process? My own time spent on Python things certainly isn't that organised. I'll have a couple of items on the "do this next" list (e.g. PEP 394 was at the top of my list recently, and getting PEP 409 finalised now occupies that spot). However, I may switch to other things based on external events (e.g. the email I just sent proposing acceptance of PEP 3144 was based on Georg posting Peter's latest draft, the PEP 408 discussions a short while back that were prompted by Eli following up on article I'd written some time ago with a full PEP), or because I want to get them done while they're clear in my mind (e.g. the time I spent last weekend writing up my summary of the text file processing in Python 3 Unicode discussion was time that I had previously planned to spend working on either PEP 394, which had already been resolved by then, or on PEP 409). There's a few other things that I'd like to get up on PyPI soon (especially contextlib2.CallbackStack) so people can tinker with them for a few months before the first 3.3 beta, which means setting up CI for contextlib2 before I cut a new release. I also had an illuminating off-list discussion with the PEP 407 authors and the 3.4 RM that I want to write up as a new PEP before the language summit in a few week's time (even though I won't be there in person). Other things (like revamping the sequence docs to bring them into the modern Python era or fixing CPython's longstanding operand precedence bug for sequences implemented in C) have been postponed until after more of the API related changes are done. 
Then there's a whole cloud of "other things to do" (such as all the bugs I'm nosy on on the tracker, all the issues I created because I wanted to remember them but didn't have time to address immediately myself, my perennial efforts to try to make callback-based programming in Python feel less forced and awkward) that may attract my interest at any given point in time. A reference implementation for PEP 395 is definitely in the mix of things I want to get done, but I'm happy to postpone even thinking particularly hard about it until after the importlib bootstrapping effort (which appears to be progressing well) is complete. And all that's without even considering that I'm doing almost everything Python related in personal time rather than work time, so there's plenty of scope for life to intervene with higher priority interrupts :) My impression is that the other core devs work in a similar fashion - our personal Python to-do lists are vague, nebulous things, not well-formed long-term plans (except in particular cases, like specific PEPs we're working on). In important ways, Greg Kroah-Hartman's recent description of Linux kernel development applies to CPython, too: "We always say that Linux kernel development is 'evolution, not intelligent design,' in that solutions are found to problems as they come up, so making forecasts as to what is going to happen in the future is always quite difficult,". In the CPython case, it's a matter of solutions generally being achievable with the language *as it already exists* - the proposed changes are mostly just ways of reducing external dependencies, or allowing developers to achieve the same results while writing less code of their own. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From ncoghlan at gmail.com Mon Feb 20 15:16:30 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Feb 2012 00:16:30 +1000 Subject: [Python-ideas] sys.path is a hack - bringing it back under control In-Reply-To: References: Message-ID: On Mon, Feb 20, 2012 at 11:18 PM, anatoly techtonik wrote: > The abstract doesn't give any valuable info. "proposes new mechanisms to > eliminate some longstanding traps" doesn't say anything. Which mechanssms? > What traps? I see there a mention of my problem with Django. How can it help > to debug other sys.path problems? Do I really have to read 15 page document > to understand that? Perhaps you could try reading the Table of Contents, too. (Hopefully you don't find it too long - it's 20+ lines) Or else you may want to refrain from participating in language discussions if you aren't interested in understanding the topic in depth. No serious design discussion can possibly be held amongst people that are only willing to read a PEP abstract rather than the full PEP. (But then, it's been suggested many times in the past that you may get better responses if you don't make a habit of effectively calling the current core developers a bunch of incompetent idiots, and that doesn't appear to have had the slightest effect on your style of communication. Why should this be any different?). Regards, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From julien at tayon.net Mon Feb 20 15:28:57 2012 From: julien at tayon.net (julien tayon) Date: Mon, 20 Feb 2012 15:28:57 +0100 Subject: [Python-ideas] sys.path is a hack - bringing it back under control In-Reply-To: References: Message-ID: Hello, I am a bit confused. Tests are best not located amongst code, but in a sub directory. I was strongly stated on #python to use unittest(2) or nose in order not to use the path hacks. 
So did Stack Overflow, when I googled it: http://stackoverflow.com/questions/61151/where-do-the-python-unit-tests-go Okay, unittest tries relative import by adding a dot in front of the name, but in the fallback, in the end, does it not use the sys.path hack? (my eyes may be old, my brain damaged by alcohol, but it looks very much this way). It looks like hiding dust under the carpet, and stating that by tabooing sys.path hack use and locating it in a very savant module, the problem gets solved. I am, honestly, just very candid on this one, and pretty puzzled. The sys.path hack looks to me a lot like coupling between classes, global variables, gotos, and other beasts. They may be needed and yet powerful; therefore great wisdom is needed to handle them carefully. PS (not sure I am 100% serious after this point) Why not create a module like __use_wisefully__ then? People would be warned by being compelled to write: from __use_wisefully__ import sys.path ? Cheers, -- Jul From techtonik at gmail.com Mon Feb 20 16:14:25 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 20 Feb 2012 18:14:25 +0300 Subject: [Python-ideas] Personal Project Roadmap (Was: sys.path is a hack - bringing it back under control) In-Reply-To: References: Message-ID: On Mon, Feb 20, 2012 at 4:58 PM, Nick Coghlan wrote: > On Mon, Feb 20, 2012 at 11:11 PM, anatoly techtonik > wrote: > > I think that the idea of a personal project roadmap would rock. > > If I'd like something to be done faster, I could look at these "other > > things" to see if I can help with some of them. In addition I could copy > > some stuff to my own list to say that I am also interested. Once the item > > reaches the top in somebody's list (or a critical mass is > reached there), he > > opens a hangout with other people or schedules a time for discussion. > > > > The login method is a Python account. Items are either bugs from trackers > or > > short inline notes in a tree-like structure.
> > > > Will it improve the Python development process? > > My own time spent on Python things certainly isn't that organised. > I'll have a couple of items on the "do this next" list (e.g. PEP 394 > was at the top of my list recently, and getting PEP 409 finalised now > occupies that spot). However, I may switch to other things based on > external events (e.g. the email I just sent proposing acceptance of > PEP 3144 was based on Georg posting Peter's latest draft, the PEP 408 > discussions a short while back that were prompted by Eli following up > on article I'd written some time ago with a full PEP), or because I > want to get them done while they're clear in my mind (e.g. the time I > spent last weekend writing up my summary of the text file processing > in Python 3 Unicode discussion was time that I had previously planned > to spend working on either PEP 394, which had already been resolved by > then, or on PEP 409). > > There's a few other things that I'd like to get up on PyPI soon > (especially contextlib2.CallbackStack) so people can tinker with them > for a few months before the first 3.3 beta, which means setting up CI > for contextlib2 before I cut a new release. I also had an illuminating > off-list discussion with the PEP 407 authors and the 3.4 RM that I > want to write up as a new PEP before the language summit in a few > week's time (even though I won't be there in person). Other things > (like revamping the sequence docs to bring them into the modern Python > era or fixing CPython's longstanding operand precedence bug for > sequences implemented in C) have been postponed until after more of > the API related changes are done. 
> > Then there's a whole cloud of "other things to do" (such as all the > bugs I'm nosy on on the tracker, all the issues I created because I > wanted to remember them but didn't have time to address immediately > myself, my perennial efforts to try to make callback-based programming > in Python feel less forced and awkward) that may attract my interest > at any given point in time. > > A reference implementation for PEP 395 is definitely in the mix of > things I want to get done, but I'm happy to postpone even thinking > particularly hard about it until after the importlib bootstrapping > effort (which appears to be progressing well) is complete. > > And all that's without even considering that I'm doing almost > everything Python related in personal time rather than work time, so > there's plenty of scope for life to intervene with higher priority > interrupts :) > > My impression is that the other core devs work in a similar fashion - > our personal Python to-do lists are vague, nebulous things, not > well-formed long-term plans (except in particular cases, like specific > PEPs we're working on). > > In important ways, Greg Kroah-Hartman's recent description of Linux > kernel development applies to CPython, too: "We always say that Linux > kernel development is 'evolution, not intelligent design,' in that > solutions are found to problems as they come up, so making forecasts > as to what is going to happen in the future is always quite > difficult,". In the CPython case, it's a matter of solutions generally > being achievable with the language *as it already exists* - the > proposed changes are mostly just ways of reducing external > dependencies, or allowing developers to achieve the same results while > writing less code of their own. > That's a good insight. That is an interesting info for new people, because if gives a picture which ideas can be interesting to tackle, because they have more contributors to help. -- anatoly t. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Feb 20 16:29:04 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 20 Feb 2012 16:29:04 +0100 Subject: [Python-ideas] Personal Project Roadmap (Was: sys.path is a hack - bringing it back under control) References: Message-ID: <20120220162904.57eb2dfb@pitrou.net> On Mon, 20 Feb 2012 23:58:04 +1000 Nick Coghlan wrote: > > My impression is that the other core devs work in a similar fashion - > our personal Python to-do lists are vague, nebulous things, not > well-formed long-term plans (except in particular cases, like specific > PEPs we're working on). Agreed. cheers Antoine. From techtonik at gmail.com Mon Feb 20 18:39:48 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 20 Feb 2012 20:39:48 +0300 Subject: [Python-ideas] sys.path is a hack - bringing it back under control In-Reply-To: References: Message-ID: On Mon, Feb 20, 2012 at 5:16 PM, Nick Coghlan wrote: > On Mon, Feb 20, 2012 at 11:18 PM, anatoly techtonik > wrote: > > The abstract doesn't give any valuable info. "proposes new mechanisms to > > eliminate some longstanding traps" doesn't say anything. Which > mechanssms? > > What traps? I see there a mention of my problem with Django. How can it > help > > to debug other sys.path problems? Do I really have to read 15 page > document > > to understand that? > > Perhaps you could try reading the Table of Contents, too. (Hopefully > you don't find it too long - it's 20+ lines) > I've read the ToC, but which of these parts answers the question: "How to make debugging sys.path problems easier?" Abstract Relationship with Other PEPs What's in a __name__? Traps for the Unwary Why are my imports broken? Importing the main module twice In a bit of a pickle Where's the source? 
Forkless Windows Qualified Names for Modules Alternative Names Eliminating the Traps Fixing main module imports inside packages Optional addition: command line relative imports Compatibility with PEP 382 Incompatibility with PEP 402 Potential incompatibilities with scripts stored in packages Fixing dual imports of the main module Fixing pickling without breaking introspection Fixing multiprocessing on Windows Explicit relative imports > Or else you may want to refrain from participating in language > discussions if you aren't interested in understanding the topic in > depth. No serious design discussion can possibly be held amongst > people that are only willing to read a PEP abstract rather than the > full PEP. I didn't want to offend anybody by giving an impression that what you're doing is not important. I realize that there are papers that people need to read, especially who are willing to participate in ideas discussion, but the point is that I'd like to have a simple answer for a simple proposal. I read the proposal. In the following order: PEP-0395: Abstract PEP-3155: Rationale (skimmed) PEP-3155: Proposal (reread several times, a lot of questions) PEP-3155: Discussion (skim, got a feeling that there should be a link to the actual discussion) PEP-3155: Naming choice PEP-3155: References (is still not clear what is `qualified name`) http://en.wikipedia.org/wiki/QName http://translate.google.com/#auto|ru|qualified%20name (got translation that it is 'full name' - that makes sense) PEP-3155: Naming choice (all right, the more intuitive 'full name' and 'path' are not really 'full name' and filesystem path, so the name is different) PEP-0395: Contents PEP-0395: Qualifed Names for Modules (started - "To make it feasible to fix these problems once and for all, it is proposed to add a new module level attribute: __qualname__" - which problems?) 
PEP-0395: Traps for the Unwary ("The overloading of the semantics of __name__, along with some historically associated behaviour in the initialisation of sys.path[0], has resulted in several traps for the unwary" - damn, how is this gonna help to debug sys.path problems? gave up, wrote a sad tl;dr smile) Now I hope it gives an overview what difficulties a person who is out-of-context has while trying to solve one tiny user story of debugging sys.path. I just want everything to be as much simplified as possible, possibly killing the fun for prose readers. Maybe I don't really want to think about complex PEP matters, because the idea is just an episode in the daily workflow. I'd also really prefer to keep complicated matters (e.g. discussions) around tiny user stories, that don't require much time to load into the brain and you can only concentrate on two or three of them that are conflicting. Proposal to read 15 page technical paper doesn't work well with this scenario, so if you just said - "Yes. You have to read that.", I'd reply "Well, ok. Next time then.". (But then, it's been suggested many times in the past that > you may get better responses if you don't make a habit of effectively > calling the current core developers a bunch of incompetent idiots, and > that doesn't appear to have had the slightest effect on your style of > communication. Why should this be any different?). > I am not an English writer, but I am interested to know where did this impression of me calling core developers a bunch of incompetent idiots is coming from. If anybody can quote concrete example and explain in private - I may have a chance to change something. My English is a result of learning legal and technical English texts, not love letters, and I may not possess the communication skills required to write proper letters in informal language (which also I prefer more than business stuff). 
I can write in third person without *you* or *I* other personal pronounce, but it takes more time to compose the proper form, so the note like this one can take an hour or more (it already took more), and time is that I really lack. Not me alone, though, but I may be too obsessed with saving someone else's time by placing too much attention to it, indeed. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Mon Feb 20 19:28:32 2012 From: barry at python.org (Barry Warsaw) Date: Mon, 20 Feb 2012 13:28:32 -0500 Subject: [Python-ideas] doctest References: Message-ID: <20120220132832.76b772da@resist.wooz.org> On Feb 17, 2012, at 02:57 PM, Mark Janssen wrote: >I find myself wanting to use doctest for some test-driven development, >and find myself slightly frustrated and wonder if others would be >interested in seeing the following additional functionality in >doctest: FWIW, I think doctests are fantastic and I use them all the time. There are IMO a couple of things to keep in mind: - doctests are documentation first. Specifically, they are testable documentation. What better way to ensure that your documentation is accurate and up-to-date? (And no, I do not generally find skew between the code and the separate-file documentation.) - I personally dislike docstring doctests, and much prefer separate reST documents. These have several advantages, such as the ability to inject names into doctests globals (use with care though), and the ability to set up the execution context for doctests (see below). The fact that it's so easy to turn these into documentation with Sphinx is a huge win. Since so many people point this out, let me say that I completely agree that doctests are not a *replacement* for unittests, but they are a fantastic *complement* to unittests. 
When I TDD, I always start writing the (testable) documentation first, because if I cannot explain the component under test in clearly intelligible English, then I probably don't really understand what it is I'm trying to write. My doctests usually describe mostly the good path through the API. Occasionally I'll describe error modes if I think those are important for understanding how to use the code. However, for all those fuzzy corner cases, weird behaviors, bug fixes, etc., unittests are much better suited because ensuring you've fixed these problems and don't regress in the future doesn't help the narrative very much. >1. Execution context determined by outer-scope doctest defintions. Can you explain this one? For the separate-reST-document style I use, these are almost always driven by a test_documentation.py which ostensibly fits into the unittest framework. It searches for .rst files and builds up DocFileSuites around them. Using this style it is very easy to clean up resources, reset persistent state (e.g. reset the database after every doctest), call setUp and tearDown methods, and even correctly fiddle the __future__ state expected by doctests. I usually put all this in an additional_tests() method, such as: http://bazaar.launchpad.net/~barry/flufl.enum/trunk/view/head:/flufl/enum/tests/test_documentation.py So setting up context is as easy as writing a setUp() method and passing that to DocFileSuite. One thing that bums me out about this is that I haven't really made the bulk of additional_tests() very generic. I usually cargo cult most of this code into every package I write. :( >2. Smart Comparisons that will detect output of a non-ordered type >(dict/set), lift and recast it and do a real comparison. I'm of mixed mind with these. Yes, you must be careful with ordering, but I find it less readable to just sort() some dictionary output for example. 
What I've found much more useful is to iterate over the sorted keys of a dictionary and print the key/values pairs. This general pattern has a few advantages, such as the ability to add some filtering to the output if you don't care about everything, and more importantly, the ability to print most string values without their u'' prefix (for better py2/py3 compatibility from the same code base without the use of 2to3). Nested structures can be more problematic, but I've often found that as the output gets uglier, the narrative suffers, so that's a good time to re-evaluate your documentation! >Without #1, "literate testing" becomes awash with re-defining re-used >variables which, generally, also detracts from exact purpose of the >test -- this creates testdoc noise and the docs become less useful. >Without #2, "readable docs" nicely co-aligning with "testable docs" >tends towards divergence. > >Perhaps not enough developers use doctest to care, but I find it one >of the more enjoyable ways to develop python code -- I don't have to >remember test cases nor go through the trouble of setting up >unittests. AND, it encourages agile development. Another user wrote >a while back of even having a built-in test() method. Wouldn't that >really encourage agile developement? And you wouldn't have to muddy >up your code with "if __name__ == "__main__": import doctest, yadda >yadda". > >Anyway... of course patches welcome, yes... ;^) I've no doubt that doctests could be improved, but I actually find them quite usable as is, with just a little bit of glue code to get it all hooked up. As I say though, I'm biased against docstring doctests. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
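The sorted-keys pattern Barry describes — iterating over sorted keys and printing key/value pairs instead of echoing the dict itself — is what keeps doctest output deterministic across dict orderings. A minimal sketch (the `show` helper is hypothetical, just for illustration):

```python
import doctest


def show(mapping):
    """Print mapping items in sorted key order for stable doctest output.

    >>> show({'b': 2, 'a': 1, 'c': 3})
    a = 1
    b = 2
    c = 3
    """
    # sorted() removes the ordering dependency, so the expected
    # output above matches regardless of the dict's internal order.
    for key in sorted(mapping):
        print('%s = %s' % (key, mapping[key]))


if __name__ == '__main__':
    doctest.testmod()
```

As Barry notes, the same loop is also a convenient place to filter uninteresting keys or strip representation noise before it reaches the narrative.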
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From tjreedy at udel.edu Mon Feb 20 21:58:35 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 20 Feb 2012 15:58:35 -0500 Subject: [Python-ideas] sys.path is a hack - bringing it back under control In-Reply-To: References: Message-ID: On 2/20/2012 12:39 PM, anatoly techtonik wrote: > I am not an English writer, I am a native English (American) speaker/writer who can barely write in one other natural language (Spanish). So I have great sympathy for and appreciation for those who struggle with English. And I am willing to help those who wish to improve. > but I am interested to know where did this > impression of me calling core developers a bunch of incompetent idiots > is coming from. From the way you have written in the past. Nick may have been exaggerating a bit, but I have gotten similar impressions, though you have been writing better and more effectively recently. I think it was just a month ago that you were persuasive enough to get something added for 3.3. > If anybody can quote concrete example and explain in > private - I may have a chance to change something. Now that I know that it is not your intention to come across as antagonistic, I will try to do the above if I see bad examples in the future. > My English is a result of learning legal and technical English texts, The legal part in interesting. Legal English in adversarial situations is used to metaphorically club people -- or to confuse people. In reading the Argentina Python users list, I have noticed that conversational Spanish is not exactly the same as the formal Spanish I learned in classes some decades ago. But it is sometimes hard to know what is a sloppy error and what is an accepted idiom. 
-- Terry Jan Reedy From ncoghlan at gmail.com Mon Feb 20 23:44:11 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Feb 2012 08:44:11 +1000 Subject: [Python-ideas] sys.path is a hack - bringing it back under control In-Reply-To: References: Message-ID: On Tue, Feb 21, 2012 at 3:39 AM, anatoly techtonik wrote: > I've read the ToC, but which of these parts answers the question: "How to > make debugging sys.path problems easier?" Ah, my apologies. I must confess to having misread your original email (I paid too much attention to the first half, not enough to the latter). PEP 395 aims to avoid people feeling the need to mess with sys.path in the first place, thus reducing the likelihood of problems occurring at all. For *debugging* sys.path, as you say, the problem is figuring out who is messing it up after problems have already occurred and you've found strange entries in there. There's definitely a case to be made that sys.path should be a smarter kind of object by default, one that accepts callbacks to be triggered when modifications occur. (Such behaviour would be useful for updating namespace package __path__ attributes, for instance). However, for purely debugging purposes, it should be sufficient to monkeypatch sys.path with an object that writes any changes through to the original path, while overriding the various mutation methods to report the source of the modification (via stack introspection). It's the same kind of technique you can use to investigate faults in *any* kind of container. (Versions based on UserDict and UserList might be interesting Python Cookbook recipes). > Now I hope it gives an overview what difficulties a person who is
Maybe I don't really want to > think about complex PEP matters, because the idea is just an episode in the > daily workflow. I'd also really prefer to keep complicated matters (e.g. > discussions) around tiny user stories, that don't require much time to load > into the brain and you can only concentrate on two or three of them that are > conflicting. Proposal to read 15 page technical paper doesn't work well with > this scenario, so if you just said - "Yes. You have to read that.", I'd > reply "Well, ok. Next time then.". The fault was mine - I didn't understand your suggestion correctly, so I didn't realise that PEP 395 doesn't actually address it. >> (But then, it's been suggested many times in the past that >> you may get better responses if you don't make a habit of effectively >> calling the current core developers a bunch of incompetent idiots, and >> that doesn't appear to have had the slightest effect on your style of >> communication. Why should this be any different?). > > > I am not an English writer, but I am interested to know where did this > impression of me calling core developers a bunch of incompetent idiots is > coming from. If anybody can quote concrete example and explain in private - > I may have a chance to change something. My English is a result of > learning?legal and technical English texts, not love letters, and I may not > possess the communication skills required to write proper letters in > informal language (which also I prefer more than business stuff). I can > write in third person without *you* or *I* other personal pronounce, but it > takes more time to compose the proper form, so the note like this one can > take an hour or more (it already took more), and time is that I really lack. > Not me alone, though, but I may be too obsessed with saving someone else's > time by placing too much attention to it, indeed. Anatoly, thanks for taking the time to explain that. 
The impression comes from the fact that many of the things you object to within Python are largely a result of limited availability of development resources, so even things that are at least arguably good ideas simply don't get investigated. The universe of good ideas is vast, the universe of bad ideas is even larger, but the space we have the capacity to explore is actually relatively tiny. Since there's such an enormous number of things that *could* be done, the answer to "why aren't they done?" is almost always going to be "because people don't think they're important enough to do them instead of all the other things they're doing". Deciding how to spend our time on Python-related efforts is a matter of perceived priorities and potential payoffs and those are always going to vary substantially across individuals. Being more willing to accept that as a rationale for not doing things would go a long way towards reducing the negative reactions - I know mine arise not so much from your initial suggestions (which are often, although not always, quite reasonable ideas in a world where we had unlimited development resources), but from subsequently continuing to push them in the face of "because we're simply not interested in doing that" and "the status quo may not be perfect, but it's good enough that it isn't worth the hassle of changing" responses. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From victor.stinner at haypocalc.com Tue Feb 21 00:34:59 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Tue, 21 Feb 2012 00:34:59 +0100 Subject: [Python-ideas] Lack a API WCHAR* PyUnicode_WCHAR_DATA(PyObject *o), This is important for OS dependent feature port. In-Reply-To: References: <20120219192921.0e366a41@pitrou.net> Message-ID: It's not a lack, it's a design choice. Unicode strings are no longer stored as wchar_t* in Python 3.3, but in a compact representation (1, 2 or 4 bytes per character).
The conversion to wchar_t* requires a copy in most cases (no conversion is needed if the string already uses sizeof(wchar_t) bytes per character). From steve at pearwood.info Tue Feb 21 00:40:10 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 21 Feb 2012 10:40:10 +1100 Subject: [Python-ideas] sys.path is a hack - bringing it back under control In-Reply-To: References: <4F421B2A.6090100@kozea.fr> Message-ID: <4F42D9DA.9040502@pearwood.info> anatoly techtonik wrote: > On Mon, Feb 20, 2012 at 1:06 PM, Simon Sapin wrote: > >> I often find this in my scripts/projects, that I run directly from >>> checkout: >>> >>> DEVPATH = os.path.dirname(os.path.abspath(__file__)) >>> sys.path.insert(0, DEVPATH) >>> >>> >> You shouldn't have to do that if you're running 'python something.py' >> > > But I did for some reason, and right now I can't even say if it was > Windows, Linux, FreeBSD, PyPy, IPython, gdb or debugging from IDE. If you can't say why you did it, how can we judge whether you did it for a good reason or a bad reason? Having a user-accessible search path is not a hack, or if it is, it is a hack in the positive sense: a feature, not a design bug. The same concept is used by Unix tools, via the PATH environment variable. It is "the simplest thing that could possibly work" for solving the problem of configurable search paths. Personally, I don't believe sys.path needs to be brought back under control, because I don't believe it is out of control. Most code doesn't need to mess with the path; of that which does, most does not lead to problems. The only time I have seen path problems is when I have accidentally shadowed standard library modules, and they are simple to solve. Perhaps others have experienced harder problems, and if so, they have my sympathy, but I don't believe this is a problem so great that it needs to break backward compatibility.
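As an aside, the write-through wrapper suggested earlier in the thread for debugging sys.path changes can be sketched roughly as follows. This is an illustration only; the AuditedPath name and the report format are invented for the example and are not part of any real API:

```python
import sys
import traceback

class AuditedPath(list):
    """Write-through sys.path replacement that reports each mutation.

    Illustrative only: this class is a sketch, not a stdlib facility.
    """

    def _report(self, action, value):
        # With limit=3 the oldest of the captured frames is the code
        # that actually called the mutation method on the path.
        caller = traceback.extract_stack(limit=3)[0]
        print("sys.path.%s(%r) called from %s:%s"
              % (action, value, caller.filename, caller.lineno))

    def append(self, item):
        self._report("append", item)
        super().append(item)

    def insert(self, index, item):
        self._report("insert", item)
        super().insert(index, item)

# To investigate who is touching the path, install the wrapper early:
#     sys.path = AuditedPath(sys.path)
```

Remaining mutation methods (extend, __setitem__, remove, and so on) would be overridden the same way if a full audit were needed.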
I would say, though, that nearly every time I have changed sys.path, I would have been satisfied with some way of importing directly from a known location. import spam from 'this/is/a/relative/path' from spam import ham from '/and/this/is/an/absolute/path' sort of thing, although I can see that import...from and from...import are too similar for comfort. -- Steven From wuwei23 at gmail.com Tue Feb 21 04:37:55 2012 From: wuwei23 at gmail.com (alex23) Date: Mon, 20 Feb 2012 19:37:55 -0800 (PST) Subject: [Python-ideas] sys.path is a hack - bringing it back under control In-Reply-To: References: Message-ID: <32d96ac7-487c-4428-8aa7-7c2d7f43b296@i10g2000pbc.googlegroups.com> On Feb 20, 11:18 pm, anatoly techtonik wrote: > tl;dr :( You're _constantly_ bemoaning the "obvious" lack of clear communication paths in the Python community, and yet when you're pointed to an _explicit piece of documentation that answers your concerns_ you can't even be bothered to read it. It's pretty damn "obvious" that your only real issue with communication is when it isn't being spoon fed to you. From ncoghlan at gmail.com Tue Feb 21 05:09:00 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 21 Feb 2012 14:09:00 +1000 Subject: [Python-ideas] sys.path is a hack - bringing it back under control In-Reply-To: <32d96ac7-487c-4428-8aa7-7c2d7f43b296@i10g2000pbc.googlegroups.com> References: <32d96ac7-487c-4428-8aa7-7c2d7f43b296@i10g2000pbc.googlegroups.com> Message-ID: On Tue, Feb 21, 2012 at 1:37 PM, alex23 wrote: > On Feb 20, 11:18 pm, anatoly techtonik wrote: >> tl;dr :( > > You're _constantly_ bemoaning the "obvious" lack of clear > communication paths in the Python community, and yet when you're > pointed to an _explicit piece of documentation that answers your > concerns_ you can't even be bothered to read it. > > It's pretty damn "obvious" that your only real issue with > communication is when it isn't being spoon fed to you.
In Anatoly's defence (and as he clarified in a later message), PEP 395 really *didn't* answer his question, and he had made his way through quite a bit of it (and PEP 3155, which it references) before giving up on trying to figure out how it was relevant - I had simply misunderstood the original email. After I *did* understand it, I pointed out that investigating unexpected or undesirable modifications to mutable containers, when the data changes alone aren't enough to pinpoint the culprit, is actually one of the valid use cases for monkeypatching rather than a reason to change the language behaviour. (That said, there are other, more valid, arguments in favour of providing a notification mechanism for sys.path changes, mainly relating to namespace packages. That's a different discussion, though, and one more appropriate for import-sig). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From dcolish at gmail.com Wed Feb 22 16:49:14 2012 From: dcolish at gmail.com (Dan Colish) Date: Wed, 22 Feb 2012 07:49:14 -0800 Subject: [Python-ideas] Make Difflib example callable as module __main__ Message-ID: <4F450E7A.4060804@gmail.com> Hey, I was reading over the difflib docs this morning and when I got to the bottom, I expected, probably due to lack of coffee, that the example would be callable as the module from the command line. There are already a number of modules which export command line functionality, ie. unittest, and I thought it would be great if difflib module offered the same. The code is pretty much there in the example from the documentation. It would just need to be included in the module itself.
--Dan From dreamingforward at gmail.com Wed Feb 22 20:47:57 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Wed, 22 Feb 2012 12:47:57 -0700 Subject: [Python-ideas] doctest (re-send to list) Message-ID: On Mon, Feb 20, 2012 at 11:28 AM, Barry Warsaw wrote: > On Feb 17, 2012, at 02:57 PM, Mark Janssen wrote: > FWIW, I think doctests are fantastic and I use them all the time. There are > IMO a couple of things to keep in mind: > > - doctests are documentation first. Specifically, they are testable > documentation. What better way to ensure that your documentation is > accurate and up-to-date? (And no, I do not generally find skew between the > code and the separate-file documentation.) > > - I personally dislike docstring doctests, and much prefer separate reST > documents. These have several advantages, such as the ability to inject > names into doctests globals (use with care though), and the ability to set > up the execution context for doctests (see below). The fact that it's so > easy to turn these into documentation with Sphinx is a huge win. > > Since so many people point this out, let me say that I completely agree that > doctests are not a *replacement* for unittests, but they are a fantastic > *complement* to unittests. When I TDD, I always start writing the > (testable) documentation first, because if I cannot explain the component > under test in clearly intelligible English, then I probably don't really > understand what it is I'm trying to write. > > My doctests usually describe mostly the good path through the API. > Occasionally I'll describe error modes if I think those are important for > understanding how to use the code. However, for all those fuzzy corner cases, > weird behaviors, bug fixes, etc., unittests are much better suited because > ensuring you've fixed these problems and don't regress in the future doesn't > help the narrative very much.
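The sorted-keys technique that comes up in this exchange (iterate over the sorted keys of a dictionary so the printed output is deterministic) can be sketched as a small doctest. The show helper is purely illustrative:

```python
def show(mapping):
    """Print a dict deterministically so doctest output is stable.

    >>> show({"banana": 2, "apple": 1})
    apple 1
    banana 2
    """
    # Dict iteration order is not something a doctest should rely on;
    # sorting the keys makes the printed output reproducible.
    for key in sorted(mapping):
        print(key, mapping[key])

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```

Running the module directly checks the docstring example against the actual output, which is the pattern most stdlib modules historically used for self-testing.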
I think this is an example of (mal)adapting to an incomplete module, rather than fixing it. I think doctest can handle all the points you're making. See clarification pointers below... >>1. Execution context determined by outer-scope doctest definitions. > Can you explain this one? I gave an example in a prior message on this thread, dated Feb 17. I think it's clear there but let me know. Basically, the idea is that since the class def can also have a docstring, where better would setup and teardown code go to provide the execution context of the inner method docstrings? Now the question: is it useful or appropriate to put setup and teardown code in a classdef docstring? Well, I think this requires a commitment on the part of the coder/documenter to concoct a useful (didactic) example that could go there. For example, (as in the prior-referenced message) I imagine putting an example of defining a variable of the class's type (">>> g = Graph({some complex, interesting initialization})"), which might return a (testable) value upon creation. Now this could, logically, be put in the class's __init__ method, but that doesn't make sense for defining an execution context, and *in addition*, that can be saved for those complex corner cases you mentioned earlier. > I usually put all this in an additional_tests() method, such as: Yes, I do the same for my modules with doctests. A dummy function which can catch all the non-interesting tests. This is still superior, in my opinion, to unittest. It is easier syntactically, as well as for casual users of your code (it has no learning curve like understanding unittest). This superiority to unittest, by the way, is only realized if the second suggestion (smart comparisons) is implemented into doctest. >>2. Smart Comparisons that will detect output of a non-ordered type >>(dict/set), lift and recast it and do a real comparison. > I'm of mixed mind with these.
Yes, you must be careful with ordering, but I > find it less readable to just sort() some dictionary output for example. What > I've found much more useful is to iterate over the sorted keys of a dictionary > and print the key/value pairs. Yes, but you see you're destroying the very intent and spirit of doctest. The point is to make literate documentation. If you adapt to its incompleteness, you reduce the power of it. >>Without #1, "literate testing" becomes awash with re-defining re-used >>variables which, generally, also detracts from exact purpose of the >>test -- this creates testdoc noise and the docs become less useful. >>Without #2, "readable docs" nicely co-aligning with "testable docs" >>tends towards divergence. > I've no doubt that doctests could be improved, but I actually find them quite > usable as is, with just a little bit of glue code to get it all hooked up. As > I say though, I'm biased against docstring doctests. Well, hopefully, I've convinced you a little that the limitations in doctests over unittests are due almost, if not entirely, to the incompleteness of the module. If the two items I mentioned were implemented I think it would be far superior to unittest. (Corner cases, etc. can all find a place, because every corner case should be documented somewhere anyway!!) Cheers!! mark santa fe, nm From tjreedy at udel.edu Wed Feb 22 22:40:32 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 22 Feb 2012 16:40:32 -0500 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: <4F450E7A.4060804@gmail.com> References: <4F450E7A.4060804@gmail.com> Message-ID: On 2/22/2012 10:49 AM, Dan Colish wrote: > I was reading over the difflib docs this morning and when I got to the > bottom, I expected, probably due to lack of coffee, that the example > would be callable as the module from the command line. This is slightly garbled, but after looking, I see what you mean.
As the doc says, the 'example' is available as Tools/Scripts/diff. Tools/Scripts/ndiff is another command-line front end for difflib. I believe difflib was extracted from the original version of ndiff. > There are already > a number of modules which export command line functionality, ie. > unittest, and I thought it would be great if difflib module offered the > same. If you run difflib directly, it runs difflib._test, which runs a doctest on difflib. Most modules do something similar. Having a real command-line interface in the module itself is unusual. > The code is pretty much there in the example from the > documentation. It would just need to be included in the module itself. I don't immediately see it as worth the trouble. I bet someone somewhere has a script that uses the interface in its current location. -- Terry Jan Reedy From dcolish at gmail.com Wed Feb 22 22:47:59 2012 From: dcolish at gmail.com (Dan Colish) Date: Wed, 22 Feb 2012 13:47:59 -0800 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: References: <4F450E7A.4060804@gmail.com> Message-ID: <4F45628F.6060400@gmail.com> On 2/22/12 1:40 PM, Terry Reedy wrote: > On 2/22/2012 10:49 AM, Dan Colish wrote: > >> I was reading over the difflib docs this morning and when I got to the >> bottom, I expected, probably due to lack of coffee, that the example >> would be callable as the module from the command line. > > This is slightly garbled, but after looking, I see what you mean. > As the doc says, the 'example' is available as Tools/Scripts/diff. > Tools/Scripts/ndiff is another command-line front end for difflib. > I believe difflib was extracted from the original version of ndiff. > Yes, I realized shortly after sending how unintelligible that sounded. Yes, even though those tools exist, they are not installed as part of the Python build. > > There are already
>> unittest, and I thought it would be great if difflib module offered the >> same. > > If you run difflib directly, it runs difflib._test. which runs a > doctest on difflib. Most modules do something similar. Having a real > command-line interface in the module itself is unusual. > Oh, I was unaware of that behavior. That's really good to know. Is this behavior documented? > > The code is pretty much there in the example from the >> documentation. It would just need to be included in the module itself. > > I don't immediately see it as worth the trouble. I bet someone > somewhere has a script that uses the interface in its current location. > I didn't think it would be that much trouble. It would be simple to install the scripts from Tools/Scripts. Either way I liked the idea of providing a cli frontend to difflib as part of the python install. --Dan From ncoghlan at gmail.com Wed Feb 22 23:08:08 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 23 Feb 2012 08:08:08 +1000 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: References: <4F450E7A.4060804@gmail.com> Message-ID: On Thu, Feb 23, 2012 at 7:40 AM, Terry Reedy wrote: > If you run difflib directly, it runs difflib._test. which runs a doctest on > difflib. Most modules do something similar. Having a real command-line > interface in the module itself is unusual. That's largely a historical artifact though - prior to -m direct execution was a pain, so the only time it really happened was in a source checkout during development. (plus I don't believe regrtest always had selective test execution, so run the library directly was a good way to only run some of the tests). If there's useful functionality that can be provided via -m, I'm a fan of moving tests out of the way to make room for it (it's also a good opportunity to make sure regrtest is covering whatever __main__ execution tests). 
I think there's also an open tracker issue suggesting the creation of a dedicated section in the standard library docs that summarises all the modules that offer useful -m functionality. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From masklinn at masklinn.net Wed Feb 22 23:24:59 2012 From: masklinn at masklinn.net (Masklinn) Date: Wed, 22 Feb 2012 23:24:59 +0100 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: References: <4F450E7A.4060804@gmail.com> Message-ID: <99329814-E535-4C2E-9762-6D53BBE6D65B@masklinn.net> On 2012-02-22, at 23:08 , Nick Coghlan wrote: > On Thu, Feb 23, 2012 at 7:40 AM, Terry Reedy wrote: >> If you run difflib directly, it runs difflib._test, which runs a doctest on >> difflib. Most modules do something similar. Having a real command-line >> interface in the module itself is unusual. > > That's largely a historical artifact though - prior to -m direct > execution was a pain, so the only time it really happened was in a > source checkout during development. (plus I don't believe regrtest > always had selective test execution, so running the library directly was a > good way to only run some of the tests). > > If there's useful functionality that can be provided via -m, I'm a fan > of moving tests out of the way to make room for it (it's also a good > opportunity to make sure regrtest is covering whatever __main__ > execution tests). > > I think there's also an open tracker issue suggesting the creation of > a dedicated section in the standard library docs that summarises all > the modules that offer useful -m functionality. Last time this popped up, Raymond Hettinger noted undocumented command-line interfaces to stdlib modules are mostly intentional: http://mail.python.org/pipermail/docs/2011-February/003171.html Maybe things have changed since; at the time, the sentiment Raymond expressed was pretty much "not going to happen".
But if you want a list, there's one at http://www.reddit.com/r/Python/comments/fofan/suggestion_for_a_python_blogger_figure_out_what/ Though things may have changed since and it's for Python 2, it's a starting point. From ncoghlan at gmail.com Thu Feb 23 01:27:45 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 23 Feb 2012 10:27:45 +1000 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: <99329814-E535-4C2E-9762-6D53BBE6D65B@masklinn.net> References: <4F450E7A.4060804@gmail.com> <99329814-E535-4C2E-9762-6D53BBE6D65B@masklinn.net> Message-ID: On Thu, Feb 23, 2012 at 8:24 AM, Masklinn wrote: > Last time this popped up, Raymond Hettinger noted undocumented > command-line interfaces to stdlib modules are mostly intentional: > http://mail.python.org/pipermail/docs/2011-February/003171.html In my view, the most important points in Raymond's email are the first and the last: * Many of the undocumented command-line interfaces are intentionally undocumented -- they were there for the convenience of the developer for exercising the module as it was being developed and are not part of the official API. Most are not production quality and would have been done much differently if that had been the intent. * All that being said, there are some exceptions and it make may sense to document the interface in some where we really do want a command-line app. I'll look at any patches you want to submit, but try to not go wild turning the library into a suite of applications. For the most part, that is not what the standard library is about. What I'm envisioning is a dedicated section along the lines of X. Command Line Functionality in the Standard Library X.1 Supported Command Line Interfaces This section would list modules that provide a command line interface as detailed in the module documentation. A brief description would be given here, along with a link to the relevant section of the module docs. 
It would mainly consist of Python specific utilities for dumping diagnostic information about the interpreter's own state or analysing Python programs. Any CLIs in this section should also have associated unittests in their regression test suites. Interpreter Diagnostics - site - platform - locale Execution and Analysis of Python Code - runpy - unittest - doctest - pydoc - timeit - dis - tokenize - pdb - profile - pstats - modulefinder X.2 Unsupported Command Line Interfaces This section would list modules that offer command line functionality that is *not* designed to be production quality, but rather exists primarily as an interactive testing tool for sanity checking when working on the modules themselves. The only documentation of the functionality would be the brief descriptions here and the module's own interactive help (if any). It should be made clear that these interfaces are *not* covered by the regression test suite and they may break without warning. All the simple cross-platform file processing, networking and protocol handling utilities would be listed here. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From dcolish at gmail.com Thu Feb 23 05:56:31 2012 From: dcolish at gmail.com (Dan Colish) Date: Wed, 22 Feb 2012 20:56:31 -0800 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: References: <4F450E7A.4060804@gmail.com> <99329814-E535-4C2E-9762-6D53BBE6D65B@masklinn.net> Message-ID: <4F45C6FF.9060402@gmail.com> On 2/22/12 4:27 PM, Nick Coghlan wrote: > On Thu, Feb 23, 2012 at 8:24 AM, Masklinn wrote: >> Last time this popped up, Raymond Hettinger noted undocumented >> command-line interfaces to stdlib modules are mostly intentional: >> http://mail.python.org/pipermail/docs/2011-February/003171.html > In my view, the most important points in Raymond's email are the first > and the last: > > * Many of the undocumented command-line interfaces are > intentionally undocumented -- they were there for the > convenience of the developer for exercising the module > as it was being developed and are not part of the official API. > Most are not production quality and would have been done > much differently if that had been the intent. This makes perfect sense. If they are going to be documented then they need to work well. Just going over a few of the ones listed on the reddit list, I ran into a number of issues with their behavior. Dis was one example of a very useful module with a cli interface that could use some improvement. > > What I'm envisioning is a dedicated section along the lines of > > X. Command Line Functionality in the Standard Library > X.1 Supported Command Line Interfaces > This section would list modules that provide a command line interface > as detailed in the module documentation. A brief description would be > given here, along with a link to the relevant section of the module > docs. It would mainly consist of Python specific utilities for dumping > diagnostic information about the interpreter's own state or analysing > Python programs. 
Any CLIs in this section should also have associated > unittests in their regression test suites. > > Interpreter Diagnostics > - site > - platform > - locale > > Execution and Analysis of Python Code > - runpy > - unittest > - doctest > - pydoc > - timeit > - dis > - tokenize > - pdb > - profile > - pstats > - modulefinder > > X.2 Unsupported Command Line Interfaces > > This section would list modules that offer command line functionality > that is *not* designed to be production quality, but rather exists > primarily as an interactive testing tool for sanity checking when > working on the modules themselves. The only documentation of the > functionality would be the brief descriptions here and the module's > own interactive help (if any). It should be made clear that these > interfaces are *not* covered by the regression test suite and they may > break without warning. > > All the simple cross-platform file processing, networking and protocol > handling utilities would be listed here. That sounds like a good guide to getting started. I like the idea of only supporting modules which help with python development. I am also wondering if libraries which are not going to be supported should have their cli removed? I've come around to see difflib is probably not that critical for that since we're all using hg these days. Finally, I tried a number of searches in the bug tracker to see if a ticket for something like this existed and I found nothing. Nick had mentioned that a ticket might already exist? 
--Dan From techtonik at gmail.com Thu Feb 23 11:56:37 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 23 Feb 2012 12:56:37 +0200 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: <4F450E7A.4060804@gmail.com> References: <4F450E7A.4060804@gmail.com> Message-ID: On Wed, Feb 22, 2012 at 6:49 PM, Dan Colish wrote: > Hey, > > I was reading over the difflib docs this morning and when I got to the > bottom, I expected, probably due to lack of coffee, that the example > would be callable as the module from the command line. There are already > a number of modules which export command line functionality, ie. > unittest, and I thought it would be great if difflib module offered the > same. The code is pretty much there in the example from the > documentation. It would just need to be included in the module itself. +1 if it will produce git-style unified patches by default It seems that every single VCS in Python reinvents own differ. -m option will help it become more polished/useful -- anatoly t. From rob.cliffe at btinternet.com Thu Feb 23 12:01:53 2012 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Thu, 23 Feb 2012 11:01:53 +0000 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: References: <4F450E7A.4060804@gmail.com> Message-ID: <4F461CA1.3020503@btinternet.com> Can I put in a plea that postings to this list try to minimise the use of acronyms and jargon that may not be universally intelligible? This list is often read with interest by non-specialists such as myself. I have no idea for example what "VCS" means. Thanks Rob Cliffe On 23/02/2012 10:56, anatoly techtonik wrote: > On Wed, Feb 22, 2012 at 6:49 PM, Dan Colish wrote: >> Hey, >> >> I was reading over the difflib docs this morning and when I got to the >> bottom, I expected, probably due to lack of coffee, that the example >> would be callable as the module from the command line. 
There are already >> a number of modules which export command line functionality, ie. >> unittest, and I thought it would be great if difflib module offered the >> same. The code is pretty much there in the example from the >> documentation. It would just need to be included in the module itself. > +1 if it will produce git-style unified patches by default > It seems that every single VCS in Python reinvents its own differ. > -m option will help it become more polished/useful > -- > anatoly t. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From p.f.moore at gmail.com Thu Feb 23 12:22:48 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 23 Feb 2012 11:22:48 +0000 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: <4F461CA1.3020503@btinternet.com> References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> Message-ID: On 23 February 2012 11:01, Rob Cliffe wrote: > I have no idea for example what "VCS" means. Version Control System (things like Subversion, Mercurial, or Git) Paul. From anacrolix at gmail.com Thu Feb 23 12:32:30 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Thu, 23 Feb 2012 19:32:30 +0800 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> Message-ID: I don't think it was an actual question, and clearly for Rob it's not a sustainable approach to be expanding acronyms on request. I'd suggest an acronym FAQ but that also isn't sustainable, and Google won't always help. Status: Won't fix, maintain status quo. On Feb 23, 2012 7:25 PM, "Paul Moore" wrote: > On 23 February 2012 11:01, Rob Cliffe wrote: > > I have no idea for example what "VCS" means. > > Version Control System (things like Subversion, Mercurial, or Git) > > Paul.
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From ncoghlan at gmail.com Thu Feb 23 13:15:40 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 23 Feb 2012 22:15:40 +1000 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> Message-ID: On Thu, Feb 23, 2012 at 9:32 PM, Matt Joiner wrote: > Status: Won't fix, maintain status quo. But also, since language-related discussions *will* occasionally encounter domain-specific discussions, people shouldn't be afraid to ask that such jargon be clarified. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Thu Feb 23 13:31:21 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 23 Feb 2012 23:31:21 +1100 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: <4F461CA1.3020503@btinternet.com> References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> Message-ID: <4F463199.9090504@pearwood.info> Rob Cliffe wrote: > Can I put in a plea that postings to this list try to minimise the use > of acronyms and jargon that may not be universally intelligible? "Universally intelligible" is an awfully big request. There are English speakers who don't know what you mean by either "postings" or "list", since both of those are themselves jargon. (My parents, for two.) To say nothing of children or non-English speakers who may not know what "acronym" means. > This list is often read with interest by non-specialists such as myself. > I have no idea for example what "VCS" means. While I sympathise, this is a list aimed at programmers, and while non-specialists are welcome, they are not the primary audience.
I think you will be better off trying to learn programmer's jargon than asking programmers not to use common, if specialised, words in their technical conversations. You wouldn't expect (say) car enthusiasts to stop using the word "torque", or doctors not to use "dialysis", just because a non-specialist might wander by and be listening in. -- Steven From rob.cliffe at btinternet.com Thu Feb 23 13:42:55 2012 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Thu, 23 Feb 2012 12:42:55 +0000 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: <4F463199.9090504@pearwood.info> References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> <4F463199.9090504@pearwood.info> Message-ID: <4F46344F.2040503@btinternet.com> I am a programmer, of some 30-odd years full-time. But that doesn't mean I understand every acronym of every specialised field under the sun. "Version Control System" instead of "VCS" is perfectly comprehensible and only takes a little longer to type. "VCS" meant nothing to me. I follow the postings on python-dev and python-ideas with keen interest. On 23/02/2012 12:31, Steven D'Aprano wrote: > Rob Cliffe wrote: >> Can I put in a plea that postings to this list try to minimise the >> use of acronyms and jargon that may not be universally intelligible? > > "Universally intelligible" is an awfully big request. There are > English speakers who don't know what you mean by either "postings" or > "list", since both of those are themselves jargon. (My parents, for > two.) To say nothing of children or non-English speakers who may not > know what "acronym" means. > > >> This list is often read with interest by non-specialists such as myself. >> I have no idea for example what "VCS" means. > > While I sympathise, this is a list aimed at programmers, and while > non-specialists are welcome, they are not the primary audience. 
> > I think you will be better off trying to learn programmer's jargon > than asking programmers not to use common, if specialised, words in > their technical conversations. You wouldn't expect (say) car > enthusiasts to stop using the word "torque", or doctors not to use > "dialysis", just because a non-specialist might wander by and be > listening in. > > From breamoreboy at yahoo.co.uk Thu Feb 23 14:36:29 2012 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Thu, 23 Feb 2012 13:36:29 +0000 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: <4F46344F.2040503@btinternet.com> References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> <4F463199.9090504@pearwood.info> <4F46344F.2040503@btinternet.com> Message-ID: On 23/02/2012 12:42, Rob Cliffe wrote: > I am a programmer, of some 30-odd years full-time. > But that doesn't mean I understand every acronym of every specialised > field under the sun. > "Version Control System" instead of "VCS" is perfectly comprehensible > and only takes a little longer to type. "VCS" meant nothing to me. > I follow the postings on python-dev and python-ideas with keen interest. > > On 23/02/2012 12:31, Steven D'Aprano wrote: >> Rob Cliffe wrote: >>> Can I put in a plea that postings to this list try to minimise the >>> use of acronyms and jargon that may not be universally intelligible? >> >> "Universally intelligible" is an awfully big request. There are >> English speakers who don't know what you mean by either "postings" or >> "list", since both of those are themselves jargon. (My parents, for >> two.) To say nothing of children or non-English speakers who may not >> know what "acronym" means. >> >> >>> This list is often read with interest by non-specialists such as myself. >>> I have no idea for example what "VCS" means. >> >> While I sympathise, this is a list aimed at programmers, and while >> non-specialists are welcome, they are not the primary audience. 
>> >> I think you will be better off trying to learn programmer's jargon >> than asking programmers not to use common, if specialised, words in >> their technical conversations. You wouldn't expect (say) car >> enthusiasts to stop using the word "torque", or doctors not to use >> "dialysis", just because a non-specialist might wander by and be >> listening in. >> >> Including the postings that repeatedly ask people not to top post? -- Cheers. Mark Lawrence. From ned at nedbatchelder.com Thu Feb 23 15:35:31 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Thu, 23 Feb 2012 09:35:31 -0500 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: <4F461CA1.3020503@btinternet.com> References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> Message-ID: <4F464EB3.8020300@nedbatchelder.com> On 2/23/2012 6:01 AM, Rob Cliffe wrote: > Can I put in a plea that postings to this list try to minimise the use > of acronyms and jargon that may not be universally intelligible? > This list is often read with interest by non-specialists such as myself. > I have no idea for example what "VCS" means. > Thanks > Rob Cliffe > Googling either "vcs git" or "vcs python" shows "Version Control System" clearly highlighted right on the search results page. --Ned. From ethan at stoneleaf.us Thu Feb 23 15:00:48 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 23 Feb 2012 06:00:48 -0800 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: <4F46344F.2040503@btinternet.com> References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> <4F463199.9090504@pearwood.info> <4F46344F.2040503@btinternet.com> Message-ID: <4F464690.3000906@stoneleaf.us> Rob Cliffe wrote: > I am a programmer, of some 30-odd years full-time. > But that doesn't mean I understand every acronym of every specialised > field under the sun. 
> "Version Control System" instead of "VCS" is perfectly comprehensible > and only takes a little longer to type. "VCS" meant nothing to me. I also sympathize, but the reality is it's not going to happen. If the search engines don't help then post the question. ~Ethan~ From guido at python.org Thu Feb 23 19:00:21 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 23 Feb 2012 10:00:21 -0800 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: <4F464690.3000906@stoneleaf.us> References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> <4F463199.9090504@pearwood.info> <4F46344F.2040503@btinternet.com> <4F464690.3000906@stoneleaf.us> Message-ID: On Thu, Feb 23, 2012 at 6:00 AM, Ethan Furman wrote: > Rob Cliffe wrote: >> >> I am a programmer, of some 30-odd years full-time. >> But that doesn't mean I understand every acronym of every specialised >> field under the sun. >> "Version Control System" instead of "VCS" is perfectly comprehensible and >> only takes a little longer to type. "VCS" meant nothing to me. > > > I also sympathize, but the reality is it's not going to happen. If the > search engines don't help then post the question. +1 to this advice. I don't even sympathize. I have to look up the new jargon invented by the youngsters *all the time*. But using a search engine to educate myself is much more effective than asking around. And yes, if the search engine somehow doesn't help, just ask for an explanation of a specific term. Not every problem can be fixed by asking everyone else to change their behavior. This is a technical list and technical jargon will be flaunted. Deal with it. 
-- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Thu Feb 23 19:21:33 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 23 Feb 2012 19:21:33 +0100 Subject: [Python-ideas] Make Difflib example callable as module __main__ References: <4F450E7A.4060804@gmail.com> Message-ID: <20120223192133.7fdfc60f@pitrou.net> On Thu, 23 Feb 2012 08:08:08 +1000 Nick Coghlan wrote: > On Thu, Feb 23, 2012 at 7:40 AM, Terry Reedy wrote: > > If you run difflib directly, it runs difflib._test. which runs a doctest on > > difflib. Most modules do something similar. Having a real command-line > > interface in the module itself is unusual. > > That's largely a historical artifact though - prior to -m direct > execution was a pain, so the only time it really happened was in a > source checkout during development. (plus I don't believe regrtest > always had selective test execution, so run the library directly was a > good way to only run some of the tests). > > If there's useful functionality that can be provided via -m, I'm a fan > of moving tests out of the way to make room for it (it's also a good > opportunity to make sure regrtest is covering whatever __main__ > execution tests). +1 for moving self-tests to the regular test suite. Nobody, and especially not the buildbots, runs self-tests included in __main__ sections. (and, as a matter of fact, many of those may be broken without anyone noticing) Regards Antoine. 
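[Editor's note: the kind of "python -m difflib" command line discussed in this thread can be sketched with the documented difflib.unified_diff API. The function name and argument handling below are illustrative only, not the actual patch under discussion.]

```python
# Sketch of a minimal "python -m difflib"-style interface built on the
# documented difflib.unified_diff API; names here are illustrative only.
import difflib
import sys

def unified(a_lines, b_lines, fromfile='a', tofile='b'):
    # unified_diff yields diff(1)-compatible ---/+++/@@ hunks
    return ''.join(difflib.unified_diff(a_lines, b_lines, fromfile, tofile))

diff_text = unified(['one\n', 'two\n'], ['one\n', 'three\n'])
print(diff_text)

if __name__ == '__main__' and len(sys.argv) == 3:
    # usage: python thisfile.py old.txt new.txt
    with open(sys.argv[1]) as f, open(sys.argv[2]) as g:
        sys.stdout.writelines(
            difflib.unified_diff(f.readlines(), g.readlines(),
                                 sys.argv[1], sys.argv[2]))
```

This also illustrates Antoine's point: the useful behaviour lives in an importable function that a test suite can exercise, while the `__main__` block stays a thin wrapper.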
From tjreedy at udel.edu Thu Feb 23 20:24:35 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 23 Feb 2012 14:24:35 -0500 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: <4F464EB3.8020300@nedbatchelder.com> References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> <4F464EB3.8020300@nedbatchelder.com> Message-ID: On 2/23/2012 9:35 AM, Ned Batchelder wrote: > On 2/23/2012 6:01 AM, Rob Cliffe wrote: >> Can I put in a plea that postings to this list try to minimise the use >> of acronyms and jargon that may not be universally intelligible? >> This list is often read with interest by non-specialists such as myself. >> I have no idea for example what "VCS" means. > Googling either "vcs git" or "vcs python" shows "Version Control System" > clearly highlighted right on the search results page. Googling just vcs returns as third hit "Version Control System" and a Wikipedia link. Alternatives like Verified Carbon Standard and Veterans Canteen Service are easily rejected in the context of this list ;-). -- Terry Jan Reedy From phd at phdru.name Thu Feb 23 20:56:27 2012 From: phd at phdru.name (Oleg Broytman) Date: Thu, 23 Feb 2012 23:56:27 +0400 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> <4F464EB3.8020300@nedbatchelder.com> Message-ID: <20120223195627.GD8946@iskra.aviel.ru> On Thu, Feb 23, 2012 at 02:24:35PM -0500, Terry Reedy wrote: > On 2/23/2012 9:35 AM, Ned Batchelder wrote: > >On 2/23/2012 6:01 AM, Rob Cliffe wrote: > >>Can I put in a plea that postings to this list try to minimise the use > >>of acronyms and jargon that may not be universally intelligible? > >>This list is often read with interest by non-specialists such as myself. > >>I have no idea for example what "VCS" means. 
> > >Googling either "vcs git" or "vcs python" shows "Version Control System" > >clearly highlighted right on the search results page. > > Googling just vcs returns as third hit "Version Control System" and > a Wikipedia link. Alternatives like Verified Carbon Standard and > Veterans Canteen Service are easily rejected in the context of this > list ;-). http://www.acronymfinder.com/VCS.html lists VCS at the second place. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From techtonik at gmail.com Thu Feb 23 21:19:16 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 23 Feb 2012 23:19:16 +0300 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> Message-ID: On Thu, Feb 23, 2012 at 2:22 PM, Paul Moore wrote: > On 23 February 2012 11:01, Rob Cliffe wrote: >> I have no idea for example what "VCS" means. > > Version Control System (things like Subversion, Mercurial, or Git) Bazaar and Mercurial in this case. Mercurial's differ: http://selenic.com/hg/file/816211dfa3a5/mercurial/pure/bdiff.py Bazaar's: http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/view/head:/bzrlib/diff.py -- anatoly t. From techtonik at gmail.com Thu Feb 23 21:22:09 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Thu, 23 Feb 2012 23:22:09 +0300 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: <4F46344F.2040503@btinternet.com> References: <4F450E7A.4060804@gmail.com> <4F461CA1.3020503@btinternet.com> <4F463199.9090504@pearwood.info> <4F46344F.2040503@btinternet.com> Message-ID: On Thu, Feb 23, 2012 at 3:42 PM, Rob Cliffe wrote: > I am a programmer, of some 30-odd years full-time. > But that doesn't mean I understand every acronym of every specialised field > under the sun. 
> "Version Control System" instead of "VCS" is perfectly comprehensible and > only takes a little longer to type. "VCS" meant nothing to me. > I follow the postings on python-dev and python-ideas with keen interest. VCS is a good new word to know in difflib context. Thanks for asking, -- anatoly t. From g.brandl at gmx.net Thu Feb 23 22:43:51 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 23 Feb 2012 22:43:51 +0100 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: References: <4F450E7A.4060804@gmail.com> Message-ID: Am 23.02.2012 11:56, schrieb anatoly techtonik: > On Wed, Feb 22, 2012 at 6:49 PM, Dan Colish wrote: >> Hey, >> >> I was reading over the difflib docs this morning and when I got to the >> bottom, I expected, probably due to lack of coffee, that the example >> would be callable as the module from the command line. There are already >> a number of modules which export command line functionality, ie. >> unittest, and I thought it would be great if difflib module offered the >> same. The code is pretty much there in the example from the >> documentation. It would just need to be included in the module itself. > > +1 if it will produce git-style unified patches by default > It seems that every single VCS in Python reinvents own differ. "Every single" makes it sound like there are dozens... Apart from that: a diff/patch algorithm is such an integral part of version control that I would *not* expect them to use difflib, but something more sophisticated/optimized/etc. Georg From victor.stinner at haypocalc.com Fri Feb 24 00:34:49 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 24 Feb 2012 00:34:49 +0100 Subject: [Python-ideas] Support other dict types for type.__dict__ Message-ID: Hi, I'm trying to create read-only objects using a "frozendict" class. frozendict is a read-only dict. 
I would like to use frozendict for the class dict using a metaclass, but type.__new__() expects a dict and creates a copy of the input dict. It would be nice to support custom dict types: OrderedDict and frozendict for example. It looks possible to patch CPython to implement this feature, but first I would like to know your opinion about this idea :-) Victor From pyideas at rebertia.com Fri Feb 24 00:51:22 2012 From: pyideas at rebertia.com (Chris Rebert) Date: Thu, 23 Feb 2012 15:51:22 -0800 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 3:34 PM, Victor Stinner wrote: > Hi, > > I'm trying to create read-only objects using a "frozendict" class. > frozendict is a read-only dict. I would like to use frozendict for the > class dict using a metaclass, but type.__new__() expects a dict and > creates a copy of the input dict. And you can't use __slots__ because...? Cheers, Chris From victor.stinner at haypocalc.com Fri Feb 24 01:27:37 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 24 Feb 2012 01:27:37 +0100 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: > And you can't use __slots__ because...? Hum, here is an example:

---
def Enum(**kw):
    class _Enum(object):
        __slots__ = list(kw.keys())
        def __new__(cls, **kw):
            inst = object.__new__(cls)
            for key, value in kw.items():
                setattr(inst, key, value)
            return inst
    return _Enum(**kw)

components = Enum(red=0, green=1, blue=2)
print(components.red)
components.red=2
print(components.red)
components.unknown=10
---

components.unknown=10 raises an error, but not components.red=2. __slots__ denies adding new attributes, but not modifying existing attributes. The idea of using a frozendict is to deny the modification of an attribute value after the creation of the object. I don't see how to use __slots__ to implement such constraints. 
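[Editor's note: for readers who want to try the "frozendict" idea, a minimal read-only dict of the kind Victor describes can be built by blocking dict's mutating methods. This is an illustrative sketch only, not the actual frozendict class from the thread; the `_readonly` helper name is mine.]

```python
# Minimal read-only dict sketch (not the thread's actual implementation):
# block every mutating dict method so the mapping cannot change after creation.
class frozendict(dict):
    def _readonly(self, *args, **kwargs):
        raise TypeError('frozendict is read-only')
    __setitem__ = _readonly
    __delitem__ = _readonly
    clear = _readonly
    pop = _readonly
    popitem = _readonly
    setdefault = _readonly
    update = _readonly

d = frozendict(red=0, green=1, blue=2)
print(d['red'])  # lookups still work as on a plain dict
```

Any attempt at `d['red'] = 2` or `d.update(...)` then raises TypeError, which is exactly the behaviour __slots__ alone cannot provide.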
Victor From pyideas at rebertia.com Fri Feb 24 01:56:02 2012 From: pyideas at rebertia.com (Chris Rebert) Date: Thu, 23 Feb 2012 16:56:02 -0800 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 4:27 PM, Victor Stinner wrote: >> And you can't use __slots__ because...? > components.unknown=10 raises an error, but not components.red=2. > __slots__ denies to add new attributes, but not to modify existing > attributes. > > The idea of using a frozendict is to deny the modification of an > attribute value after the creation of the object. I don't see how to > use __slots__ to implement such constraints. Right, stupid question; didn't think that one all the way through. - Chris From ncoghlan at gmail.com Fri Feb 24 02:08:05 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 24 Feb 2012 11:08:05 +1000 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 9:34 AM, Victor Stinner wrote: > Hi, > > I'm trying to create read-only objects using a "frozendict" class. > frozendict is a read-only dict. I would like to use frozendict for the > class dict using a metaclass, but type.__new__() expects a dict and > creates a copy of the input dict. Do you have a particular reason for doing it that way rather than just overriding __setattr__ and __delattr__ to raise TypeError? Or overriding the __dict__ descriptor to return a read-only proxy? There are a *lot* of direct calls to the PyDict APIs in the object machinery. Without benchmark results clearly showing a negligible speed impact, I'd be -1 on increasing the complexity of all that code (and making it slower) to support niche use cases that can already be handled a couple of other ways. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From ncoghlan at gmail.com Fri Feb 24 04:48:32 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 24 Feb 2012 13:48:32 +1000 Subject: [Python-ideas] Current status of PEP 403 (was Re: peps: Switch back to named functions, since the Ellipsis version degenerated badly) Message-ID: (switching lists to python-ideas, dropping python-dev and python-checkins) Context for anyone not following python-checkins: I recently moved PEP 3150 (statement local namespaces) to Withdrawn, rewrote PEP 403 with a different syntax proposal, retitled it to "Statement local class and function definitions" and moved it to Deferred. On Fri, Feb 24, 2012 at 2:37 AM, Jim Jewett wrote: > I understand that adding a colon and indent has its own problems, but > ... I'm not certain this is better, and I am certain that the desire > for indentation is strong enough to at least justify discussion in the > PEP. Fair point. The reason for the flat structure is that allowing a full suite screws with the scoping rules and gets us into the land of insane complexity that was PEP 3150. A decorator-inspired syntax makes it very clear (both to the reader and to the compiler) that there's only *one* name being forward referenced, rather than potentially hiding declarations of forward references an arbitrary distance from the statement that uses them. This doesn't actually lose any flexibility, since you can just make a forward reference to a class instead and use that as your local namespace (with ordinary attribute access semantics), rather than the brain-bender that was the proposed scoping rules for PEP 3150's given clause. 
The only other alternative syntax would be to use a custom suite definition that allowed only a single class or function definition statement, but I think having something that looks like a suite, but isn't one would be significantly worse than the current proposed syntax that merely allows a function (or class) definition's implied local name binding to be overridden with a custom statement. To (almost*) recreate the effect of an ordinary function definition with the in statement, you could write: in f = f def f(): pass And a decorated definition like: @deco1 @deco2 @deco3 def f(): pass Could (almost*) be expressed as: in f = deco1(deco2(deco3(f))) def f(): pass * The reason for the "almost" caveat is that, given the current PEP 403 semantics, recursive references to f() will resolve differently for the "in" statement cases - for the in statement, they will resolve directly to the innermost function definition, while for ordinary definitions they will be resolved according to the scoping rules for any name lookup. This could be an argument in favour of allowing *decorated* function and class definitions, rather than requiring that they be undecorated - if decorators are allowed, then recursive references would resolve directly to the post-decorated version. Alternatively, people could adopt a convention of prepending an underscore to the actual function name in cases where it mattered, meaning they would have easy access to *both* forms of the function (decorated and undecorated): in f = deco1(deco2(deco3(_f))) def _f(): return f, _f # decorated, undecorated In either case, whereas an ordinary recursive function definition can get confused by reassignments in the outer scope, an in-statement based definition would be truly recursive (via a cell reference) and hence ignore any subsequent changes in the outer namespace. 
Something else the PEP should mention explicitly is that, like __class__, a class object obviously won't be available while the class body is being executed. Only methods will be able to refer to the class by name, just as only methods can use __class__. Updates to PEP 403 are going to be pretty sporadic until some time after 3.3 release though - it's still very much in "this is a problem I am thinking about" territory rather than "this is a language addition I am proposing" (the latest round of updates were just to make sure I recorded my latest idea before I forgot about the details). Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From stephen at xemacs.org Fri Feb 24 05:55:10 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 24 Feb 2012 13:55:10 +0900 Subject: [Python-ideas] Make Difflib example callable as module __main__ In-Reply-To: References: <4F450E7A.4060804@gmail.com> Message-ID: <8762ewn841.fsf@uwakimon.sk.tsukuba.ac.jp> Georg Brandl writes: > > +1 if it will produce git-style unified patches by default > > It seems that every single VCS in Python reinvents own differ. > > "Every single" makes it sounds like there are dozens... > > Apart from that: a diff/patch algorithm is such an integral part of > version control that I would *not* expect them to use difflib, but > something more sophisticated/optimized/etc. But Anatoly isn't talking about the algorithm. He's talking about the output, and actually, I would expect them to use something diff(1) and diff3(1) compatible for hunk-oriented changes.[1] My experience with home-grown diff functions suggests that very few produce output as good as that of diff(1), and only git seems to be an improvement (but it's not backward compatible, as the tracker/review tool maintainers regularly mention). 
It's true that there are better algorithms than the one used by diff(1) (such as the "patience diff" Bazaar uses, and git offers as an option), but there's no need to change the hunk format as far as I have seen, and the file headers could easily be standardized I would think. Footnotes: [1] Darcs for one allows non-hunk-based changes, specifically a token-replace patch. And there are binary diffs such as xdelta, and word diffs like wdiff, which necessarily use a different format since they are not line-oriented. From aquavitae69 at gmail.com Fri Feb 24 06:33:48 2012 From: aquavitae69 at gmail.com (David Townshend) Date: Fri, 24 Feb 2012 07:33:48 +0200 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 3:08 AM, Nick Coghlan wrote: > On Fri, Feb 24, 2012 at 9:34 AM, Victor Stinner > wrote: > > Hi, > > > > I'm trying to create read-only objects using a "frozendict" class. > > frozendict is a read-only dict. I would like to use frozendict for the > > class dict using a metaclass, but type.__new__() expects a dict and > > creates a copy of the input dict. > > Do you have a particular reason for doing it that way rather than just > overriding __setattr__ and __delattr__ to raise TypeError? > > Or overriding the __dict__ descriptor to return a read-only proxy? > > There are a *lot* of direct calls to the PyDict APIs in the object > machinery. Without benchmark results clearly showing a negligible > speed impact, I'd be -1 on increasing the complexity of all that code > (and making it slower) to support niche use cases that can already be > handled a couple of other ways. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > Can't this also be done using metaclasses? 
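[Editor's note: the metaclass hook relevant to David's question is __prepare__ (Python 3), which lets the class body execute in a custom mapping; type.__new__ still copies it into a plain dict, so any extra information must be captured before that copy. A sketch - the class names and the `_member_order` attribute are mine, not from the thread.]

```python
# Sketch: run a class body in an OrderedDict via __prepare__, then record
# the definition order before type.__new__ copies it to a plain dict.
# Names (OrderedMeta, Point, _member_order) are invented for illustration.
from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        return OrderedDict()  # the class body executes in this mapping

    def __new__(mcls, name, bases, ns, **kwds):
        cls = super().__new__(mcls, name, bases, dict(ns))
        cls._member_order = tuple(ns)  # capture order before it is lost
        return cls

class Point(metaclass=OrderedMeta):
    x = 1
    y = 2
```

`Point._member_order` then records the order of names in the class body (including implicit entries such as `__module__`), with `x` appearing before `y`.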
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Ronny.Pfannschmidt at gmx.de Fri Feb 24 09:07:31 2012 From: Ronny.Pfannschmidt at gmx.de (Ronny Pfannschmidt) Date: Fri, 24 Feb 2012 09:07:31 +0100 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: <4F474543.7090008@gmx.de> On 02/24/2012 01:27 AM, Victor Stinner wrote: >> And you can't use __slots__ because...? > > Hum, here is an example: > --- note untested, since written in mail client:

class Enum(object):
    __slots__ = ("_data",)
    _data = WriteOnceDescr('_data')  # left as exercise

    def __init__(self, **kw):
        self._data = frozendict(kw)

    def __getattr__(self, key):
        try:
            return self._data[key]
        except KeyError:
            raise AttributeError(key)

> def Enum(**kw):
>     class _Enum(object):
>         __slots__ = list(kw.keys())
>         def __new__(cls, **kw):
>             inst = object.__new__(cls)
>             for key, value in kw.items():
>                 setattr(inst, key, value)
>             return inst
>     return _Enum(**kw)
>
> components = Enum(red=0, green=1, blue=2)
> print(components.red)
> components.red=2
> print(components.red)
> components.unknown=10
> ---
>
> components.unknown=10 raises an error, but not components.red=2. > __slots__ denies to add new attributes, but not to modify existing > attributes. > > The idea of using a frozendict is to deny the modification of an > attribute value after the creation of the object. I don't see how to > use __slots__ to implement such constraints. 
> > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From simon.sapin at kozea.fr Fri Feb 24 10:22:32 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Fri, 24 Feb 2012 10:22:32 +0100 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: <4F4756D8.6000109@kozea.fr> Le 24/02/2012 06:33, David Townshend a écrit : > Can't this also be done using metaclasses? Hi, Are you thinking of __prepare__? I did too, but I read the details of this: http://docs.python.org/py3k/reference/datamodel.html#customizing-class-creation The class body can be executed "in" any mapping. Then I'm not sure, but it looks like type.__new__ only takes a real dict. You have to do something in your overridden __new__ to e.g. keep the OrderedDict's order. Regards, -- Simon Sapin From techtonik at gmail.com Fri Feb 24 11:36:46 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 24 Feb 2012 12:36:46 +0200 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 4:08 AM, Nick Coghlan wrote: > On Fri, Feb 24, 2012 at 9:34 AM, Victor Stinner > wrote: >> Hi, >> >> I'm trying to create read-only objects using a "frozendict" class. >> frozendict is a read-only dict. I would like to use frozendict for the >> class dict using a metaclass, but type.__new__() expects a dict and >> creates a copy of the input dict. > > Do you have a particular reason for doing it that way rather than just > overriding __setattr__ and __delattr__ to raise TypeError? > > Or overriding the __dict__ descriptor to return a read-only proxy? > > There are a *lot* of direct calls to the PyDict APIs in the object > machinery. 
Without benchmark results clearly showing a negligible > speed impact, I'd be -1 on increasing the complexity of all that code > (and making it slower) to support niche use cases that can already be > handled a couple of other ways. I also think about the reverse process of removing things that were proved to be underused. That probably requires an AST spider that crawls existing Python projects to see how various constructs are used. -- anatoly t. From techtonik at gmail.com Fri Feb 24 11:52:10 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 24 Feb 2012 12:52:10 +0200 Subject: [Python-ideas] shutil.runret and shutil.runout Message-ID: Hello, subprocess is low level, cryptic, does too much, with poor usability, i.e. "don't make me think" is not about it. I don't know about you, but I can hardly write any subprocess call without spending at least 5-10 minutes meditating over the documentation. So, I propose two high level KISS functions for the shell utils (shutil) module:

runret(command) - run command through shell, return ret code
runout(command) - run command through shell, return output

To avoid the subprocess story (that makes Python too complicated) I deliberately limit the scope to:

- executing from shell only
- return one thing at a time

I hope that this covers 80% of what _users_ need to execute commands from Python. If somebody needs more - there is `subprocess`. But if your own scripts are mostly outside these 80% - feel free to provide your user story and arguments, why this should be done in shutil and not in subprocess. Open questions:

- security quoting for 'command'

-- anatoly t. From dirkjan at ochtman.nl Fri Feb 24 11:58:21 2012 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Fri, 24 Feb 2012 11:58:21 +0100 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 11:52, anatoly techtonik wrote: > subprocess is low level, cryptic, does too much, with poor usability, > i.e. 
"don't make me think" is not about it. I don't know about you, > but I can hardly write any subprocess call without spending at least > 5-10 meditating over the documentation. So, I propose two high level > KISS functions for shell utils (shutil) module: > > runret(command) - run command through shell, return ret code > runout(command) - run command through shell, return output Have you seen subprocess.check_call() and subprocess.check_output()? I don't think your proposed functions add much benefit over these two. Cheers, Dirkjan From ncoghlan at gmail.com Fri Feb 24 11:59:42 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 24 Feb 2012 20:59:42 +1000 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 8:52 PM, anatoly techtonik wrote: > Hello, > > subprocess is low level, cryptic, does too much, with poor usability, > i.e. "don't make me think" is not about it. I don't know about you, > but I can hardly write any subprocess call without spending at least > 5-10 meditating over the documentation. Hi Anatoly, I believe you'll find the simple convenience methods you are requesting already exist, in the form of subprocess.call(), subprocess.check_call() and subprocess.check_output(). The documentation has also been updated to emphasise these convenience functions over the Popen swiss army knife. If you do "pip install shell-command" you can also access the shell_call(), shell_check_call() and shell_output() functions I currently plan to include in subprocess for 3.3. (I'm not sure which versions of Python that module currently supports though - 2.7 and 3.2, IIRC). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From ziade.tarek at gmail.com Fri Feb 24 12:10:08 2012 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 24 Feb 2012 12:10:08 +0100 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 11:52 AM, anatoly techtonik wrote: > Hello, > > subprocess is low level, cryptic, does too much, with poor usability, > i.e. "don't make me think" is not about it. I don't know about you, > but I can hardly write any subprocess call without spending at least > 5-10 meditating over the documentation. So, I propose two high level > KISS functions for shell utils (shutil) module: > > runret(command) - run command through shell, return ret code > mmm you are describing subprocess.call() here... I don't see how this new command makes things better, besides shell=True. > runout(command) - run command through shell, return output > what is 'output' ? the stderr ? the stdout ? a merge of both ? what about subprocess.check_output() ? > To avoid subprocess story (that makes Python too complicated) > It seems to me that the only complication here is shell=True, which seems ok to me to have it at False for security reasons. Cheers Tarek -- Tarek Ziadé | http://ziade.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Fri Feb 24 12:12:29 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 24 Feb 2012 13:12:29 +0200 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 1:59 PM, Nick Coghlan wrote: > On Fri, Feb 24, 2012 at 8:52 PM, anatoly techtonik wrote: >> Hello, >> >> subprocess is low level, cryptic, does too much, with poor usability, >> i.e. "don't make me think" is not about it. I don't know about you, >> but I can hardly write any subprocess call without spending at least >> 5-10 meditating over the documentation. 
> > I believe you'll find the simple convenience methods you are > requesting already exist, in the form of subprocess.call(), > subprocess.check_call() and subprocess.check_output(). The > documentation has also been updated to emphasise these convenience > functions over the Popen swiss army knife. I don't find the names of these functions more intuitive than Popen(). I also think they far from being simple, because (in the order of appearance): 1. they require try/catch 2. docs still refer Popen, which IS complicated 3. contain shell FUD 4. completely confuse users with stdout=PIPE or stderr=PIPE stuff http://docs.python.org/library/subprocess.html#subprocess.check_call My verdict - these fail to be simple, and require the same low-level system knowledge as Popen() for confident use. > If you do "pip install shell-command" you can also access the > shell_call(), shell_check_call() and shell_output() functions I > currently plan to include in subprocess for 3.3. (I'm not sure which > versions of Python that module currently supports though - 2.7 and > 3.2, IIRC). Don't you find strange that shell utils module don't have any functions for the main shell function - command execution? In game development current state of subprocess bloat is called "featurecrepping" and the "scope definition" is a method to cope with this disease. -- anatoly t. From techtonik at gmail.com Fri Feb 24 12:23:57 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 24 Feb 2012 13:23:57 +0200 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 2:10 PM, Tarek Ziad? wrote: > On Fri, Feb 24, 2012 at 11:52 AM, anatoly techtonik > wrote: >> >> subprocess is low level, cryptic, does too much, with poor usability, >> i.e. "don't make me think" is not about it. I don't know about you, >> but I can hardly write any subprocess call without spending at least >> 5-10 meditating over the documentation. 
So, I propose two high level >> KISS functions for shell utils (shutil) module: >> >> runret(command) ? - run command through shell, return ret code > > > mmm you are describing subprocess.call()? here... I don't see how this new > command makes thing better, besides shell=True. shutil.runret() - by definition has shell=True >> >> runout(command) ?- run command through shell, return output > > > what is 'output' ? the stderr ? the stdout ? a merge of both ? That's a high-level _user_ function. When user runs command in shell he sees both. So, this 'shell util' is an analogue. If you have you own user scripts that require stdout or stderr separately, I am free to discuss the cases. The main purpose of this function is to be useful from Python console, so the interface should be very simple to remember from the first try. Like runout(command, ret='stdout|stderr|both'). No universal PIPEs. > what about subprocess.check_output() ? See my reply above. >> To avoid subprocess story (that makes Python too complicated) > > > I seems to me that the only complication here is shell=True, which seems ok > to me to have it at False for security reasons. It won't be 'shell util' function anymore. If you're using shell execution functions, you already realize that will happen if your input parameters are not validated properly. Isolating calls that require shell execution in shutil module will also simplify security analysis for 3rd party libraries. -- anatoly t. From mwm at mired.org Fri Feb 24 12:25:25 2012 From: mwm at mired.org (Mike Meyer) Date: Fri, 24 Feb 2012 06:25:25 -0500 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: <20120224062525.0e168a39@bhuda.mired.org> On Fri, 24 Feb 2012 12:10:08 +0100 Tarek Ziad? wrote: > On Fri, Feb 24, 2012 at 11:52 AM, anatoly techtonik wrote: > > Hello, > > subprocess is low level, cryptic, does too much, with poor usability, > > i.e. "don't make me think" is not about it. 
I don't know about you, > > but I can hardly write any subprocess call without spending at least > > 5-10 meditating over the documentation. So, I propose two high level > > KISS functions for shell utils (shutil) module: > > runret(command) - run command through shell, return ret code > mmm you are describing subprocess.call() here... I don't see how this new > command makes thing better, besides shell=True. The stated purpose of the new functions is to allow people to run shell commands without thinking about them. That's a bad idea (isn't most programming without thinking about it?). The first problem is that it's a great way to add data injection vulnerabilities to your application. It's also a good way to introduce bugs in your application when asked to (for instance) process user-provided file names. -1 http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From ncoghlan at gmail.com Fri Feb 24 12:31:19 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 24 Feb 2012 21:31:19 +1000 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 9:12 PM, anatoly techtonik wrote: > Don't you find strange that shell utils module don't have any > functions for the main shell function - command execution? > In game development current state of subprocess bloat is called > "featurecrepping" and the "scope definition" is a method to cope with > this disease. They may still end up in shutil. I haven't really decided which location I like better. However, if you (or anyone else) wants to see Python's innate capabilities improve in this area (and they really are subpar compared to Perl 5, for example), your best bet is to download my Shell Command module and give me feedback on any problems you find with it via the BitBucket issue tracker. http://shell-command.readthedocs.org Cheers, Nick. -- Nick Coghlan?? 
| ncoghlan at gmail.com | Brisbane, Australia From techtonik at gmail.com Fri Feb 24 12:46:12 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 24 Feb 2012 13:46:12 +0200 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: <20120224062525.0e168a39@bhuda.mired.org> References: <20120224062525.0e168a39@bhuda.mired.org> Message-ID: On Fri, Feb 24, 2012 at 2:25 PM, Mike Meyer wrote: > On Fri, 24 Feb 2012 12:10:08 +0100 > Tarek Ziadé wrote: >> On Fri, Feb 24, 2012 at 11:52 AM, anatoly techtonik wrote: >> > Hello, >> > subprocess is low level, cryptic, does too much, with poor usability, >> > i.e. "don't make me think" is not about it. I don't know about you, >> > but I can hardly write any subprocess call without spending at least >> > 5-10 meditating over the documentation. So, I propose two high level >> > KISS functions for shell utils (shutil) module: >> > runret(command) - run command through shell, return ret code >> mmm you are describing subprocess.call() here... I don't see how this new >> command makes thing better, besides shell=True. > > The stated purpose of the new functions is to allow people to run > shell commands without thinking about them. That's a bad idea (isn't > most programming without thinking about it?). The first problem is > that it's a great way to add data injection vulnerabilities to your > application. It's also a good way to introduce bugs in your > application when asked to (for instance) process user-provided file > names. > > -1 The proposal doesn't took into account security implications, so your -1 is premature. I agree with your point that users should think about *security* when they run commands. But they should not think about how tons of different ways to execute their command and different combinations on different operating systems, *and* security implications about this. This is *the main point* that make subprocess module a failure, and a basis (main reason) of this proposal.
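The "security implications" both sides keep circling is shell injection. A minimal sketch (an editor's illustration, not code from the thread; the filename stands in for hypothetical untrusted input) of the difference between `shell=True` and an argument list:

```python
import subprocess

# Editor's sketch, not code from the thread. "filename" stands in for
# hypothetical untrusted input.
filename = "harmless.txt; echo INJECTED"

# shell=True hands the whole string to /bin/sh, so the text after ";"
# runs as a second command.
merged = subprocess.check_output("echo " + filename, shell=True).decode()

# An argument list with the default shell=False passes the string to
# echo as a single literal argument; nothing extra is executed.
literal = subprocess.check_output(["echo", filename]).decode()

print("INJECTED" in merged.splitlines())   # True: the injected command ran
print(literal.strip() == filename)         # True: just a literal string
```

(Assumes a POSIX shell; the quoting rules of the Windows shell differ.)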
If users choose to trade security over simplicity, they should know what the risks are, and what to do if they want to avoid them. So I completely support the idea of shutil docs containing a user friendly explanation of how to exploit and how to protect (i.e. use subprocess) from the flaws provided by this method of execution - if they need to protect. Python is not a Java - it should give users a choice of simple API when they don't need security, and let this choice of shooting themselves in the foot be explicit.. and simple. -- anatoly t. From masklinn at masklinn.net Fri Feb 24 12:50:35 2012 From: masklinn at masklinn.net (Masklinn) Date: Fri, 24 Feb 2012 12:50:35 +0100 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: On 2012-02-24, at 12:12 , anatoly techtonik wrote: > > 1. they require try/catch No. > 2. docs still refer Popen, which IS complicated True. > 3. contain shell FUD No, they contain warnings, against shell injection security risks. Warnings are not FUD, it's not trying to sell some sort of alternative it's just warning that `shell=True` is dangerous on untrusted input. > 4. completely confuse users with stdout=PIPE or stderr=PIPE stuff > > http://docs.python.org/library/subprocess.html#subprocess.check_call On the one hand, these notes are a bit clumsy. On the other hand, piping is a pretty fundamental concept of shell execution, I see nothing wrong about saying that these functions *can't* be involved in pipes. In fact stating it upfront looks sensible. >> If you do "pip install shell-command" you can also access the >> shell_call(), shell_check_call() and shell_output() functions I >> currently plan to include in subprocess for 3.3. (I'm not sure which >> versions of Python that module currently supports though - 2.7 and >> 3.2, IIRC). > > Don't you find strange that shell utils module don't have any > functions for the main shell function - command execution? What "shell utils" module? 
Subprocess has exactly that in `call` and its variants. And "shutil" does not bill itself as a "shell utils" module right now, its description is "High-level file operations". > shutil.runret() - by definition has shell=True Great, so your recommendation is to be completely insecure by default? > That's a high-level _user_ function. When user runs command in shell > he sees both. So, this 'shell util' is an analogue. That makes no sense, when users invoke shell commands programmatically (which is what these APIs are about), they expect two semantically different reporting streams to be split, not to be merged, indistinguishable and unusable as a default. Dropping stderr on the ground may be an acceptable default but munging stdout and stderr is not. > The main purpose of this function is to be useful from Python console Then I'm not sure it belongs in subprocess or shutil, and users with that need should probably be driven towards iPython which provides extensive means of calling into the system shell in interactive sessions[0]. bpython may also provide such facilities. It *may* belong in the interactive interpreter's own namespace. > The main purpose of this function is to be > useful from Python console, so the interface should be very simple to > remember from the first try. Like runout(command, > ret='stdout|stderr|both'). As opposed to `check_output(command)`? > It won't be 'shell util' function anymore. If you're using shell > execution functions, you already realize that will happen if your > input parameters are not validated properly. This assertion demonstrably does not match reality, shell injections (the very reason for this warning) would not exist if this were the case. 
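The try/catch point above is easy to verify. A short sketch (editor's illustration): `subprocess.call()` reports a non-zero exit status through its return value and never raises for it; only the `check_*` variants convert the same status into `CalledProcessError`:

```python
import subprocess
import sys

# A command that exits with status 3, using the current interpreter.
failing = [sys.executable, "-c", "raise SystemExit(3)"]

# call() just returns the status code -- no try/except needed.
rc = subprocess.call(failing)

# check_call() raises CalledProcessError for the same command.
caught = None
try:
    subprocess.check_call(failing)
except subprocess.CalledProcessError as exc:
    caught = exc

print(rc, caught.returncode)  # 3 3
```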
[0] http://ipython.org/ipython-doc/rel-0.12/interactive/reference.html#system-shell-access From mwm at mired.org Fri Feb 24 13:13:25 2012 From: mwm at mired.org (Mike Meyer) Date: Fri, 24 Feb 2012 07:13:25 -0500 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> Message-ID: <20120224071325.08f07d32@bhuda.mired.org> On Fri, 24 Feb 2012 13:46:12 +0200 anatoly techtonik wrote: > On Fri, Feb 24, 2012 at 2:25 PM, Mike Meyer wrote: > > On Fri, 24 Feb 2012 12:10:08 +0100 > > Tarek Ziad? wrote: > >> On Fri, Feb 24, 2012 at 11:52 AM, anatoly techtonik wrote: > >> > Hello, > >> > subprocess is low level, cryptic, does too much, with poor usability, > >> > i.e. "don't make me think" is not about it. I don't know about you, > >> > but I can hardly write any subprocess call without spending at least > >> > 5-10 meditating over the documentation. So, I propose two high level > >> > KISS functions for shell utils (shutil) module: > >> > runret(command) ? - run command through shell, return ret code > >> mmm you are describing subprocess.call() ?here... I don't see how this new > >> command makes thing better, besides shell=True. > > The stated purpose of the new functions is to allow people to run > > shell commands without thinking about them. That's a bad idea (isn't > > most programming without thinking about it?). The first problem is > > that it's a great way to add data injection vulnerabilities to your > > application. It's also a good way to introduce bugs in your > > application when asked to (for instance) process user-provided file > > names. > > -1 > The proposal doesn't took into account security implications, so your > -1 is premature. Failing to take into account security implications means the -1 isn't premature, it's mandatory! > I agree with your point that users should think about *security* when > they run commands. 
But they should not think about how tons of > different ways to execute their command and different combinations on > different operating systems, *and* security implications about this. This sounds like a documentation issue, not a code issue. In fact, checking the shutil docs (via pydoc) turns up: shutil - Utility functions for copying and archiving files and directory trees. Clearly, running commands is *not* part of this functionality, so these new functions don't belong there. > If users choose to trade security over simplicity, they should know > what the risks are, and what to do if they want to avoid them. So I > completely support the idea of shutil docs containing a user friendly > explanation of how to exploit and how to protect (i.e. use subprocess) > from the flaws provided by this method of execution - if they need to > protect. Python is not a Java - it should give users a choice of > simple API when they don't need security, and let this choice of > shooting themselves in the foot be explicit.. and simple. So now look at use cases. The "simple" method you propose is *only* safe to use on a very small set of constant strings. If any of the values in the string are supplied by the user in any way, you can't use it. If any of the arguments contain shell meta-characters, you either have to quote them or not use your method. Since you're explicitly proposing passing the command to the shell, the programmer doesn't even know which characters are meta-characters when they write the code. This means these functions - as proposed - are more attractive nuisances than useful utilities. Oddly enough, I read the Julia docs on external commands between my first answer and your reply, and their solution is both as simple as what you want, and safe. This inspired a counter proposal: How about adding your new function to subprocess, except instead of passing them to the shell, they use shlex to parse them, then call Popen with the appropriate arguments? 
shlex might need some work for this. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From ncoghlan at gmail.com Fri Feb 24 13:14:42 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 24 Feb 2012 22:14:42 +1000 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> Message-ID: On Fri, Feb 24, 2012 at 9:46 PM, anatoly techtonik wrote: > This is *the main point* that make subprocess module a failure, and a > basis (main reason) of this proposal. Anatoly, this is the exact kind of blanket statement that pisses people off and makes them stop listening to you. The subprocess module is not a failure by any means. Safely invoking subprocesses is a *hard problem*. Other languages make the choice "guarding against shell injections is a problem for the user to deal with" and allow them by default in their subprocess invocation interfaces. They also make the choice that the risk of data leakage through user provided format strings is something for the developer to worry about and allow implicit string interpolation. Python doesn't allow either of those as a *deliberate design choice*. The current behaviour isn't an accident, or due to neglect, or because we're stupid. Instead, we default to the more secure, less convenient options, and allow people to explicitly request the insecure behaviour if they either: 1. don't care; or 2. do care, but also know it isn't actually a problem for their use case. This is a *good thing* if you're an application programmer - secure defaults lets you conduct security audits by looking specifically for cases where the safety checks have been bypassed. 
However, it mostly sucks if you're wanting to use Python for system administration (or similar) tasks where the shell is an essential tool rather than a security risk and there's no untrusted data that comes anywhere near your script. I'll repeat my suggestion: if you want to do something *constructive* about this, get Shell Command from PyPI and start using it, as it aims to address both the shell invocation and the string interpolation aspects of this issue. If you find problems, report them on the module's issue tracker (although I'll point out in advance that STDERR being separate from STDOUT by default is *deliberate*. If people want them merged they can include a redirection in their shell command. Otherwise STDERR needs to remain mapped to the same stream as it is in the parent process so that tools like getpass() will still work in an invoked shell command). Regards, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From ncoghlan at gmail.com Fri Feb 24 13:19:07 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 24 Feb 2012 22:19:07 +1000 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: <20120224071325.08f07d32@bhuda.mired.org> References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> Message-ID: On Fri, Feb 24, 2012 at 10:13 PM, Mike Meyer wrote: > How about adding your new function to subprocess, except instead of > passing them to the shell, they use shlex to parse them, then call > Popen with the appropriate arguments? shlex might need some work for > this. 
http://shell-command.readthedocs.org

>>> from shell_command import shell_call
>>> shell_call("ls *.py")
setup.py shell_command.py test_shell_command.py
0
>>> shell_call("ls {}", "*.py")
ls: cannot access *.py: No such file or directory
2
>>> shell_call("ls {!u}", "*.py")
setup.py shell_command.py test_shell_command.py
0

Unless someone uncovers a major design flaw in the next few months, at least ShellCommand, shell_call, shell_check_call and shell_output are likely to make an appearance in subprocess for 3.3. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Feb 24 13:41:50 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 24 Feb 2012 22:41:50 +1000 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: <20120224071325.08f07d32@bhuda.mired.org> References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> Message-ID: On Fri, Feb 24, 2012 at 10:13 PM, Mike Meyer wrote: > Oddly enough, I read the Julia docs on external commands between my > first answer and your reply, and their solution is both as simple as > what you want, and safe. That *is* rather nice, although they never get around to actually explaining *how* to capture the output from the child processes (http://julialang.org/manual/running-external-programs/, for anyone else that's interested). It should definitely be possible to implement something along those lines as a third party library on top of subprocess (although it would be a lot more complicated than Shell Command is). Kenneth Reitz (author of "requests") has also spent some time tinkering with subprocess invocation API design concepts: https://github.com/kennethreitz/envoy Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From mwm at mired.org Fri Feb 24 13:59:51 2012 From: mwm at mired.org (Mike Meyer) Date: Fri, 24 Feb 2012 07:59:51 -0500 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> Message-ID: <20120224075951.0ec1076d@bhuda.mired.org> On Fri, 24 Feb 2012 22:19:07 +1000 Nick Coghlan wrote: > On Fri, Feb 24, 2012 at 10:13 PM, Mike Meyer wrote: > > How about adding your new function to subprocess, except instead of > > passing them to the shell, they use shlex to parse them, then call > > Popen with the appropriate arguments? shlex might need some work for > > this. > > http://shell-command.readthedocs.org That says: This module aims to take over where subprocess leaves off, providing convenient, low-level access to the system shell, that automatically handles filenames and paths containing whitespace, as well as protecting naive code from shell injection vulnerabilities. That's a backwards approach to security. Rather than allowing anything and turning off what you know isn't safe, you should disallow everything and turn on what you know is safe. So rather than trying to make the strings you pass to the shell safe, you should parse them yourself and avoid calling the shell at all. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From techtonik at gmail.com Fri Feb 24 14:00:25 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 24 Feb 2012 15:00:25 +0200 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 2:50 PM, Masklinn wrote: > On 2012-02-24, at 12:12 , anatoly techtonik wrote: >> >> 1. they require try/catch > > No. Quote from the docs: "Run command with arguments. Wait for command to complete. 
If the return code was zero then return, otherwise raise CalledProcessError." http://docs.python.org/library/subprocess.html#subprocess.check_call >> 2. docs still refer Popen, which IS complicated > > True. > >> 3. contain shell FUD > > No, they contain warnings, against shell injection security > risks. Warnings are not FUD, it's not trying to sell some sort > of alternative it's just warning that `shell=True` is dangerous > on untrusted input. Warnings would be o.k. if they provided at least some guidelines where shell=True can be useful and where do you need to use Popen (or escaping). Without positive examples, and a little research to show attack vectors (so that users can analyse if they are applicable in their specific case) it is FUD IMO. >> 4. completely confuse users with stdout=PIPE or stderr=PIPE stuff >> >> http://docs.python.org/library/subprocess.html#subprocess.check_call > > On the one hand, these notes are a bit clumsy. On the other hand, > piping is a pretty fundamental concept of shell execution, I see > nothing wrong about saying that these functions *can't* be involved > in pipes. In fact stating it upfront looks sensible. The point is that it makes things more complicated than necessary. As a system programmer I feel confident about all this stuff, but users struggle to get it and they blame Python for complexity, and I have to agree. We can change that with high level API. The API that will automatically provide a rolling buffer for output if required to avoid locks (for the missing info as a drawback), and remove headache about "what to do about that?". >>> If you do "pip install shell-command" you can also access the >>> shell_call(), shell_check_call() and shell_output() functions I >>> currently plan to include in subprocess for 3.3. (I'm not sure which >>> versions of Python that module currently supports though - 2.7 and >>> 3.2, IIRC). 
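For concreteness, here is a minimal sketch of the runret()/runout() helpers being proposed (an editor's reconstruction under the shell=True, merged-output semantics described in the thread; these function names come from the proposal, not from any existing API):

```python
import subprocess

def runout(command):
    # Hypothetical helper sketching the proposal: run *command* through
    # the shell and return its combined stdout/stderr text, like the
    # console view a user sees; the exit status is ignored.
    proc = subprocess.Popen(command, shell=True,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return out.decode()

def runret(command):
    # Hypothetical helper: run *command* through the shell and return
    # only its exit status -- which is what subprocess.call() already
    # does once shell=True is passed.
    return subprocess.call(command, shell=True)

print(runout("echo out; echo err 1>&2"))  # both streams, merged
print(runret("exit 7"))                   # 7
```

(Assumes a POSIX shell for the redirection syntax.)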
>> >> Don't you find strange that shell utils module don't have any >> functions for the main shell function - command execution? > > What "shell utils" module? Subprocess has exactly that in `call` > and its variants. And "shutil" does not bill itself as a > "shell utils" module right now, its description is > "High-level file operations". > >> shutil.runret() ?- by definition has shell=True > > Great, so your recommendation is to be completely insecure by default? Not "by default" - only if it is impossible to make shutil.run*() functions more secure. They only make sense with shell=True, so my recommendation is to analyse security implications and *let* users make their grounded choice. Not frighten them, but making them think about security. The difference. User friendly docs for shutil.run*() docs should be structured as following: 1. you are free to use these functions 2. but know that they are insecure 3. in these cases: 3.1 3.2 3.3 4. if you think these cases won't apply to your project, then feel free to use, otherwise look at subprocess Of course, if some cases 3.1-3.3 have workarounds, they should be mentioned. >> That's a high-level _user_ function. When user runs command in shell >> he sees both. So, this 'shell util' is an analogue. > > That makes no sense, when users invoke shell commands programmatically > (which is what these APIs are about), they expect two semantically > different reporting streams to be split, not to be merged, > indistinguishable and unusable as a default. Dropping stderr on the > ground may be an acceptable default but munging stdout and stderr is not. Conflict point: Do users care about stdout/stderr when they invoke shell commands? Do users care about stdout/stderr when they use Python syntax for invoking shell commands? These functions is no a syntax sugar for developers (as the aforementioned "alternatives" from subprocess modules are). They are helper for users. 
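The stdout/stderr "conflict point" above can be shown concretely (editor's sketch): `check_output()` captures only stdout by default, and passing `stderr=subprocess.STDOUT` opts into the merged, console-like view:

```python
import subprocess
import sys

# A child process that writes one line to each stream.
child = [sys.executable, "-c",
         "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]

# Default: only stdout is captured (stderr is discarded here just to
# keep the demo quiet; normally it would pass through to the parent).
only_out = subprocess.check_output(child, stderr=subprocess.DEVNULL).decode()

# Merged: stderr is redirected into the captured stdout stream,
# matching what a user sees on a console.
merged = subprocess.check_output(child, stderr=subprocess.STDOUT).decode()

print(only_out.splitlines())        # ['to stdout']
print(sorted(merged.splitlines()))  # ['to stderr', 'to stdout']
```

Sorting the merged lines sidesteps the fact that stdout is block-buffered when piped, so the two lines may arrive in either order.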
If you're a developer, who cares about pipes and needs programmatic acces - there is already a low level subprocess API with developer's defaults. If we speak about users: The standard shell console behaviour is to output both streams to the screen. That means that if I want to process this output, I don't know if it comes from stderr or stdout. So, if I want to process the output - I use Python to do this. If I know what I need the output from stderr only, I specify this explicitly. That's my default user story. >> The main purpose of this function is to be useful from Python console > > Then I'm not sure it belongs in subprocess or shutil, and users with that > need should probably be driven towards iPython which provides extensive > means of calling into the system shell in interactive sessions[0]. > bpython may also provide such facilities. I think it is a good idea to unify interface across interactive mode in Python. Hopefully shutil.copy and friends are already good enough so that they don't have reasons to reimplement them (and users to learn new commands). >> The main purpose of this function is to be >> useful from Python console, so the interface should be very simple to >> remember from the first try. Like runout(command, >> ret='stdout|stderr|both'). > > As opposed to `check_output(command)`? As opposed to check_output(command, *, stdin=None, stdout=None, stderr=None, shell=True) >> It won't be 'shell util' function anymore. If you're using shell >> execution functions, you already realize that will happen if your >> input parameters are not validated properly. > > This assertion demonstrably does not match reality, shell injections > (the very reason for this warning) would not exist if this were the > case. 
It is not assertion, it is a wannabe for shutil documentation to clarify shell injections problems to the level that allow users to make a reasonable choice, so if the user is "using shell execution functions he already realizes that will happen if his input parameters are not validated properly". -- anatoly t. From mwm at mired.org Fri Feb 24 14:09:31 2012 From: mwm at mired.org (Mike Meyer) Date: Fri, 24 Feb 2012 08:09:31 -0500 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: <20120224080931.6cad78db@bhuda.mired.org> On Fri, 24 Feb 2012 15:00:25 +0200 anatoly techtonik wrote: > On Fri, Feb 24, 2012 at 2:50 PM, Masklinn wrote: > > On 2012-02-24, at 12:12 , anatoly techtonik wrote: > >> 1. they require try/catch > > No. > Quote from the docs: > "Run command with arguments. Wait for command to complete. If the > return code was zero then return, otherwise raise CalledProcessError." > http://docs.python.org/library/subprocess.html#subprocess.check_call Quote from the docs: subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False) Run the command described by args. Wait for command to complete, then return the returncode attribute. No documented exceptions raised, so no need for try/catch. > >> 2. docs still refer Popen, which IS complicated > > True. > >> 3. contain shell FUD > > No, they contain warnings, against shell injection security > > risks. Warnings are not FUD, it's not trying to sell some sort > > of alternative it's just warning that `shell=True` is dangerous > > on untrusted input. > Warnings would be o.k. if they provided at least some guidelines where > shell=True can be useful and where do you need to use Popen (or > escaping). Without positive examples, and a little research to show > attack vectors (so that users can analyse if they are applicable in > their specific case) it is FUD IMO. 
You mean something like (quoting from the docs): Warning Executing shell commands that incorporate unsanitized input from an untrusted source makes a program vulnerable to shell injection, a serious security flaw which can result in arbitrary command execution. For this reason, the use of shell=True is strongly discouraged in cases where the command string is constructed from external input: http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From tshepang at gmail.com Fri Feb 24 14:11:06 2012 From: tshepang at gmail.com (Tshepang Lekhonkhobe) Date: Fri, 24 Feb 2012 15:11:06 +0200 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 13:31, Nick Coghlan wrote: > However, if you (or anyone else) wants to see Python's innate > capabilities improve in this area (and they really are subpar compared > to Perl 5, for example), your best bet is to download my Shell Command > module and give me feedback on any problems you find with it via the > BitBucket issue tracker. Just curious: If put in the stdlib, will the above-mentioned module bring CPython shell handling to Perl 5 level? From ncoghlan at gmail.com Fri Feb 24 15:11:57 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Feb 2012 00:11:57 +1000 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: <20120224075951.0ec1076d@bhuda.mired.org> References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> <20120224075951.0ec1076d@bhuda.mired.org> Message-ID: On Fri, Feb 24, 2012 at 10:59 PM, Mike Meyer wrote: > That's a backwards approach to security. Rather than allowing anything > and turning off what you know isn't safe, you should disallow > everything and turn on what you know is safe. 
So rather than trying to > make the strings you pass to the shell safe, you should parse them > yourself and avoid calling the shell at all. Yes, that's why these are *separate functions* (each with "shell" in the name to make the shell's involvement rather hard to miss). Any application (rather than system administration script) that calls them with user provided data should immediately fail a security audit. The new APIs are intended specifically for system administrators that want the *system shell*, not a language level "cross platform" reinvention of it (and when it comes to shells, "cross platform" generally means, "POSIX even if you're on Windows, because we're not interesting in trying to reproduce Microsoft's idiosyncratic way of doing things"). The automatic quoting feature is mainly there to handle spaces in filenames - providing poorly structured programs with some minimal defence against shell injections is really just a bonus (although I admit I wasn't thinking about it that way when I wrote the current docs). As things stand, Python is a lousy language for system administration tasks - the standard APIs are either *very* low level (os.system()) or they're written almost entirely from the point of view of an application programmer (subprocess). Even when I *am* the administrator writing automation scripts for my own use, the subprocess library still keeps getting in the way, telling me it isn't safe to access my own shell. Normally, Python is pretty good about striking a sensible balance between "safe defaults" and "consenting adults", but it currently fails badly on this particular point. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From masklinn at masklinn.net Fri Feb 24 15:12:30 2012 From: masklinn at masklinn.net (Masklinn) Date: Fri, 24 Feb 2012 15:12:30 +0100 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: On 2012-02-24, at 14:00 , anatoly techtonik wrote: > On Fri, Feb 24, 2012 at 2:50 PM, Masklinn wrote: >> On 2012-02-24, at 12:12 , anatoly techtonik wrote: >>> >>> 1. they require try/catch >> >> No. > > Quote from the docs: > "Run command with arguments. Wait for command to complete. If the > return code was zero then return, otherwise raise CalledProcessError." > http://docs.python.org/library/subprocess.html#subprocess.check_call Yes. If you want to run commands you just do. try/except are only needed if you call commands which may fail and want to handle them without quitting the whole interpreter. And for your stated use case of interactively calling those functions, there is no need whatsoever for try/catch. And `subprocess.call` returns the status code, no exception ever thrown. >>> 3. contain shell FUD >> >> No, they contain warnings, against shell injection security >> risks. Warnings are not FUD, it's not trying to sell some sort >> of alternative it's just warning that `shell=True` is dangerous >> on untrusted input. > > Warnings would be o.k. if they provided at least some guidelines where > shell=True can be useful and where do you need to use Popen (or > escaping). Without positive examples, and a little research to show > attack vectors (so that users can analyse if they are applicable in > their specific case) it is FUD IMO. http://docs.python.org/library/subprocess.html#frequently-used-arguments >>> 4. completely confuse users with stdout=PIPE or stderr=PIPE stuff >>> >>> http://docs.python.org/library/subprocess.html#subprocess.check_call >> >> On the one hand, these notes are a bit clumsy. 
On the other hand, >> piping is a pretty fundamental concept of shell execution, I see >> nothing wrong about saying that these functions *can't* be involved >> in pipes. In fact stating it upfront looks sensible. > > The point is that it makes things more complicated than necessary. How? > As > a system programmer I feel confident about all this stuff You feel confident about something which does not work, without warning? >>>> If you do "pip install shell-command" you can also access the >>>> shell_call(), shell_check_call() and shell_output() functions I >>>> currently plan to include in subprocess for 3.3. (I'm not sure which >>>> versions of Python that module currently supports though - 2.7 and >>>> 3.2, IIRC). >>> >>> Don't you find strange that shell utils module don't have any >>> functions for the main shell function - command execution? >> >> What "shell utils" module? Subprocess has exactly that in `call` >> and its variants. And "shutil" does not bill itself as a >> "shell utils" module right now, its description is >> "High-level file operations". >> >>> shutil.runret() - by definition has shell=True >> >> Great, so your recommendation is to be completely insecure by default? > > Not "by default" Oh? Because this: > - only if it is impossible to make shutil.run*() > functions more secure. They only make sense with shell=True, so my > recommendation is to analyse security implications and *let* users > make their grounded choice. Not frighten them, but making them think > about security. > > The difference. User friendly docs for shutil.run*() docs should be > structured as following: > 1. you are free to use these functions > 2. but know that they are insecure > 3. in these cases: > 3.1 > 3.2 > 3.3 > 4. if you think these cases won't apply to your project, then feel > free to use, otherwise look at subprocess > > Of course, if some cases 3.1-3.3 have workarounds, they should be mentioned. 
states precisely that the function would be insecure by default, and would have caveat warnings in the docs. Which is the correct approach to security? never as far as I know. >>> The main purpose of this function is to be useful from Python console >> >> Then I'm not sure it belongs in subprocess or shutil, and users with that >> need should probably be driven towards iPython which provides extensive >> means of calling into the system shell in interactive sessions[0]. >> bpython may also provide such facilities. > > I think it is a good idea to unify interface across interactive mode > in Python. Considering IPython uses syntactic extentions (a "!" prefix) and does not require any importing effort currently, I doubt that's going to happen. >>> It won't be 'shell util' function anymore. If you're using shell >>> execution functions, you already realize that will happen if your >>> input parameters are not validated properly. >> >> This assertion demonstrably does not match reality, shell injections >> (the very reason for this warning) would not exist if this were the >> case. > > It is not assertion, You may want to look up the definition of that word, I did not remove any context, you asserted people using shell-exec functions are aware of the risks. Which is, as, factually wrong. > it is a wannabe for shutil documentation to > clarify shell injections problems to the level that allow users to > make a reasonable choice, so if the user is "using shell execution > functions he already realizes that will happen if his input parameters > are not validated properly". Not sufficient when the default behavior is unsafe (and broken), as numerous users *will* discover the function through third parties and may never come close to the caveats they *should* know for the default usage of the function. 
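[To make the dispute above concrete: the difference between the argument-list form and `shell=True` can be shown in a few lines. The hostile string is invented for illustration; `shlex.quote` is the Python 3.3 spelling — before that the same helper lived at `pipes.quote`.]

```python
import shlex
import subprocess

hostile = "innocent.txt; echo PWNED"  # attacker-controlled input

# Argument-list form: no shell is involved, so ';' is just data
# inside a single argument.
safe = subprocess.check_output(["echo", hostile])
# b'innocent.txt; echo PWNED\n'

# shell=True with naive interpolation: the shell runs a second command.
unsafe = subprocess.check_output("echo %s" % hostile, shell=True)
# b'innocent.txt\nPWNED\n'

# shell=True with quoting: the injection is neutralised.
quoted = subprocess.check_output("echo %s" % shlex.quote(hostile),
                                 shell=True)
# b'innocent.txt; echo PWNED\n'
```

[Nothing here removes the need for the warning in the docs; it just shows why the list form is the safe default and where quoting fits when the shell is genuinely wanted.]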
From ncoghlan at gmail.com Fri Feb 24 15:16:37 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Feb 2012 00:16:37 +1000 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 11:11 PM, Tshepang Lekhonkhobe wrote: > Just curious: If put in the stdlib, will the above-mentioned module > bring CPython shell handling to Perl 5 level? Closer, but it's hard to match backticks and implicit interpolation for convenience (neither of which is going to happen in Python). However, the trade-off is that you get things like the ability to create pre-defined commands and easier invocation of shlex.quote when appropriate, along with exceptions for some errors that would otherwise pass silently. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From steve at pearwood.info Fri Feb 24 15:23:25 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 25 Feb 2012 01:23:25 +1100 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: <4F479D5D.6080102@pearwood.info> Nick Coghlan wrote: > On Fri, Feb 24, 2012 at 11:11 PM, Tshepang Lekhonkhobe > wrote: >> Just curious: If put in the stdlib, will the above-mentioned module >> bring CPython shell handling to Perl 5 level? > > Closer, but it's hard to match backticks and implicit interpolation > for convenience (neither of which is going to happen in Python). Anyone wanting to use Python as a system shell should look at IPython rather than the standard Python interactive interpreter. 
http://ipython.org/ipython-doc/dev/interactive/shell.html -- Steven From p.f.moore at gmail.com Fri Feb 24 15:32:05 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 24 Feb 2012 14:32:05 +0000 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> Message-ID: On 24 February 2012 12:41, Nick Coghlan wrote: > Kenneth Reitz (author of "requests") has also spent some time > tinkering with subprocess invocation API design concepts: > https://github.com/kennethreitz/envoy Vinay Sanjip extended this with "sarge" (available on PyPI, IIRC). One key advantage of sarge for me is that it handles piping and redirection in a cross-platfom manner, rather than just deferring to the shell. (I think envoy does this too, but it's not very reliable on WIndows from what I recall of my brief experiments). Paul. From ncoghlan at gmail.com Fri Feb 24 15:32:23 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Feb 2012 00:32:23 +1000 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: <4F479D5D.6080102@pearwood.info> References: <4F479D5D.6080102@pearwood.info> Message-ID: On Sat, Feb 25, 2012 at 12:23 AM, Steven D'Aprano wrote: > Anyone wanting to use Python as a system shell should look at IPython rather > than the standard Python interactive interpreter. > > http://ipython.org/ipython-doc/dev/interactive/shell.html Sure, but unless we add ! statements to Python itself, that doesn't help with shell *scripting*. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From ncoghlan at gmail.com Fri Feb 24 15:54:19 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Feb 2012 00:54:19 +1000 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> Message-ID: On Sat, Feb 25, 2012 at 12:32 AM, Paul Moore wrote: > On 24 February 2012 12:41, Nick Coghlan wrote: >> Kenneth Reitz (author of "requests") has also spent some time >> tinkering with subprocess invocation API design concepts: >> https://github.com/kennethreitz/envoy > > Vinay Sanjip extended this with "sarge" (available on PyPI, IIRC). One > key advantage of sarge for me is that it handles piping and > redirection in a cross-platfom manner, rather than just deferring to > the shell. (I think envoy does this too, but it's not very reliable on > WIndows from what I recall of my brief experiments). Ah, I knew I'd seen a more polished version of that somewhere - Vinay posted about it a while back. As I see it, the two complement each other fairly nicely: shell_command is for direct access to the system shell. Appropriate when you're writing platform specific administration scripts. sarge is for cross platform scripting support. I'm actually not sure what this is useful for (since the default Windows shell has different spellings for so many basic commands and different syntax for environment variable expansion, it seems easier to just use the *actual* cross platform abstractions in the os module instead), but apparently it's good for something (or Vinay wouldn't have taken the time to write it). Of course, since it's just a convenience wrapper around Popen, ShellCommand does let you get pretty cute: >>> import sys >>> from functools import partial >>> from shell_command import ShellCommand >>> code = """ ... def f(): ... print("Python in a subprocess, easy as!") ... f() ... 
""" >>> PyCmd = partial(ShellCommand, executable=sys.executable) >>> PyCmd(code).shell_call() Python in a subprocess, easy as! 0 >>> x = PyCmd("print('Reporting for duty!')").shell_output() >>> x 'Reporting for duty!' (I didn't actually do a great deal in ShellCommand to enable that - it's just a matter of passing all the keyword args through to subprocess.Popen) Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From simon.sapin at kozea.fr Fri Feb 24 16:04:06 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Fri, 24 Feb 2012 16:04:06 +0100 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: <4F47A6E6.5030409@kozea.fr> Le 24/02/2012 11:52, anatoly techtonik a ?crit : > runret(command) - run command through shell, return ret code > runout(command) - run command through shell, return output Hi, Brevity is nice, but I had no idea what either of these functions is supposed to do before reading these descriptions. The names could be more explicit. (By the way, I agree with other issues raised in this thread. This was only my first impression.) Regards, -- Simon Sapin From amcnabb at mcnabbs.org Fri Feb 24 18:24:54 2012 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Fri, 24 Feb 2012 10:24:54 -0700 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> <20120224075951.0ec1076d@bhuda.mired.org> Message-ID: <20120224172454.GA3795@mcnabbs.org> On Sat, Feb 25, 2012 at 12:11:57AM +1000, Nick Coghlan wrote: > > As things stand, Python is a lousy language for system administration > tasks - the standard APIs are either *very* low level (os.system()) or > they're written almost entirely from the point of view of an > application programmer (subprocess). 
Even when I *am* the > administrator writing automation scripts for my own use, the > subprocess library still keeps getting in the way, telling me it isn't > safe to access my own shell. > > Normally, Python is pretty good about striking a sensible balance > between "safe defaults" and "consenting adults", but it currently > fails badly on this particular point. I disagree with this analysis. Python, with its fantastic subprocess module, is the only language I really trust for system administration tasks. Most languages provide "shell=True" as the default, making them extremely frustrating for system administration. Every time I choose to write a shell script instead of using Python, the lack of robustness makes me eventually regret it (and then rewrite in Python with subprocess). Setting "shell=True" (or equivalent) seems really convenient in the short term, but in the long term, scripts behave erratically and are vulnerable to attacks. The subprocess module (with "shell=False") is a wonderful balance between "safe defaults" and "consenting adults". -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From g.brandl at gmx.net Fri Feb 24 19:17:08 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 24 Feb 2012 19:17:08 +0100 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: Message-ID: Am 24.02.2012 12:12, schrieb anatoly techtonik: > On Fri, Feb 24, 2012 at 1:59 PM, Nick Coghlan wrote: >> On Fri, Feb 24, 2012 at 8:52 PM, anatoly techtonik wrote: >>> Hello, >>> >>> subprocess is low level, cryptic, does too much, with poor usability, >>> i.e. "don't make me think" is not about it. I don't know about you, >>> but I can hardly write any subprocess call without spending at least >>> 5-10 meditating over the documentation. 
>> >> I believe you'll find the simple convenience methods you are >> requesting already exist, in the form of subprocess.call(), >> subprocess.check_call() and subprocess.check_output(). The >> documentation has also been updated to emphasise these convenience >> functions over the Popen swiss army knife. > > I don't find the names of these functions more intuitive than Popen(). > I also think they far from being simple, because (in the order of appearance): > > 1. they require try/catch > 2. docs still refer Popen, which IS complicated > 3. contain shell FUD > 4. completely confuse users with stdout=PIPE or stderr=PIPE stuff > > http://docs.python.org/library/subprocess.html#subprocess.check_call > > My verdict - these fail to be simple, and require the same low-level > system knowledge as Popen() for confident use. And therefore they need to be completely replaced by something incompatible and in another module? Sorry, Anatoly, this is not how Python development happens. We usually work incrementally, improving on what we have rather than throwing all out the door. I think this is what rubs most people wrong about your posts: you invariably propose radical changes that invalidate all previous work in the related area. That's something apart from your style of expression, which was discussed recently. So here's some constructive advice: your point 1 was shown invalid. The points 2-4 are "merely" documentation related: how about you think about how to improve these docs to be less confusing? Georg From victor.stinner at haypocalc.com Sat Feb 25 03:19:19 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sat, 25 Feb 2012 03:19:19 +0100 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: <4F4756D8.6000109@kozea.fr> References: <4F4756D8.6000109@kozea.fr> Message-ID: >> Can't this also be done using metaclasses? Yes, my current proof-of-concept (PoC) uses a metadata with __prepare__. 
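[For readers who haven't used the hook: a toy `__prepare__` metaclass — not the actual PoC, all names invented — that records definition order, and then has to copy that information out, because `type.__new__` copies the namespace into a plain dict, which is exactly the limitation under discussion.]

```python
from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        # The class body is executed inside this mapping.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # type.__new__ has copied the namespace into a plain dict, so
        # any extra information (here: order) must be saved now or lost.
        cls._order = tuple(k for k in namespace
                           if not k.startswith('__'))
        return cls

class Point(metaclass=OrderedMeta):
    x = 1
    y = 2

print(Point._order)   # ('x', 'y')
```

[The resulting `Point.__dict__` is the copy, not the `OrderedDict` the body ran in — hence the request to support other dict types.]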
> The class body can be executed "in" any mapping. Then I?m not sure but it > looks like type.__new__ only takes a real dict. You have to do something in > your overridden __new__ to eg. keep the OrderedDict?s order. type.__new__ accepts any class inheriting from dict. My frozendict PoC inherits from dict, so it just works. But the point is that type.__new__ makes a copy of the dict and later it is no more possible to replace the dict. I would like to be able to choose the type of the __dict__ of my class. Victor From victor.stinner at haypocalc.com Sat Feb 25 03:29:12 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sat, 25 Feb 2012 03:29:12 +0100 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: Hum, after thinking twice, using a "frozendict" for type.__dict__ is maybe overkill for my needs (and intrused as noticed Nick). Attached patch for Python 3.3 is a simpler approach: add __final__ special value for class. If this variable is present, the type is constant. Example: --- class Test: __final__=True x = 1 Test.x = 2 # raise a TypeError Test.new_attr = 1 # raise a TypeError del Test.x # raise a TypeError --- There are various ways to deny the modification of a class attribute, but I don't know how to block the removal of an attribute of the addition of a new attribute without my patch. -- My patch is just a proof-of-concept. For example, it doesn't ensure that values are read-only too. By the way, how can I check that "a value is constant"? Except builtin immutable types, I suppose that the only way is to call hash(obj) and excepts an expect a TypeError. Victor -------------- next part -------------- A non-text attachment was scrubbed... 
Name: type_final.patch Type: text/x-patch Size: 2345 bytes Desc: not available URL: From yselivanov.ml at gmail.com Sat Feb 25 05:58:25 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 24 Feb 2012 23:58:25 -0500 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: <7A1BA916-26A6-4FE5-9508-03C5D5F58AF0@gmail.com> On 2012-02-24, at 9:29 PM, Victor Stinner wrote: > Hum, after thinking twice, using a "frozendict" for type.__dict__ is > maybe overkill for my needs (and intrused as noticed Nick). Attached > patch for Python 3.3 is a simpler approach: add __final__ special > value for class. If this variable is present, the type is constant. > Example: > --- > class Test: > __final__=True > x = 1 -1 on this. The next move would be adding friend classes and protected methods ;) __setattr__ works perfectly for those purposes. Moreover, you can emulate your idea on unpatched python by using metaclasses. - Yury From ned at nedbatchelder.com Sat Feb 25 14:17:58 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Sat, 25 Feb 2012 08:17:58 -0500 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: <4F48DF86.7060600@nedbatchelder.com> On 2/24/2012 9:29 PM, Victor Stinner wrote: > Hum, after thinking twice, using a "frozendict" for type.__dict__ is > maybe overkill for my needs (and intrused as noticed Nick). Attached > patch for Python 3.3 is a simpler approach: add __final__ special > value for class. If this variable is present, the type is constant. The Python answer for people who want read-only data structures has always been, "Don't modify them if you don't want to, and write docs that tell other people not to as well." What are you building that this answer isn't good enough? --Ned. From stephen at xemacs.org Sat Feb 25 15:03:11 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 25 Feb 2012 23:03:11 +0900 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> <20120224075951.0ec1076d@bhuda.mired.org> Message-ID: <87wr7bko2o.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > As things stand, Python is a lousy language for system administration > tasks Yeah, the worst possible sysadmin language except for all the others. AFAICT it more than holds its own with distro maintainers, no? From steve at pearwood.info Sun Feb 26 00:05:56 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 26 Feb 2012 10:05:56 +1100 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: <4F48DF86.7060600@nedbatchelder.com> References: <4F48DF86.7060600@nedbatchelder.com> Message-ID: <4F496954.30101@pearwood.info> Ned Batchelder wrote: > The Python answer for people who want read-only data structures has > always been, "Don't modify them if you don't want to, and write docs > that tell other people not to as well." What are you building that this > answer isn't good enough? That is silly. That alleged "Python answer" is like telling people that they don't need test frameworks or debuggers because the "Python answer" for people wanting to debug their code is not to write buggy code in the first place. Python has read-only data structures: tuple, frozenset, str, etc. If you ask yourself why Python has immutable types, it might give you a clue why Victor wants the ability to create other immutable types like frozendict, and why "don't modify them" is not a good enough answer: - Immutable types can be used as keys in dicts. - Immutable types protect you from errors. While you might intend not to modify a data structure, bugs do happen. 
Immutability gives you an immediate exception at the exact time and place you attempt to modify the data structure instead of at some arbitrary time later far from the actual bug. Python has excellent support for read-only data structures, so long as you write them in C. -- Steven From masklinn at masklinn.net Sun Feb 26 00:32:52 2012 From: masklinn at masklinn.net (Masklinn) Date: Sun, 26 Feb 2012 00:32:52 +0100 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: <4F496954.30101@pearwood.info> References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> Message-ID: On 2012-02-26, at 00:05 , Steven D'Aprano wrote: > - Immutable types can be used as keys in dicts. *technically*, you can use mutable types as dict keys if you define their __hash__ no? That is of course a bad idea when the instances are *expected* to be modified, but it should "work". > - Immutable types protect you from errors. While you might intend not > to modify a data structure, bugs do happen. Immutables are also inherently thread-safe (since thread safety is about shared state, and shared immutables are not state). Which is a nice guarantee. > Python has excellent support for read-only data structures, so long as you write them in C. There's also good support of the "consenting adults" variety (use _-prefixed attributes for the actual state and expose what needs to be exposed via properties and methods). That can be simplified with a custom descriptor type which can only be set once (similar to java's `final`), it would be set in the type's constructor and never re-set from this. 
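[The set-once descriptor mentioned above can be sketched in a few lines — a toy version, with all names invented.]

```python
class Final:
    """Data descriptor allowing exactly one assignment per instance."""

    def __init__(self, name):
        self._slot = '_final_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self._slot)

    def __set__(self, obj, value):
        if hasattr(obj, self._slot):
            raise AttributeError("attribute is final")
        # Store in the instance dict under a private name.
        object.__setattr__(obj, self._slot, value)

class Point:
    x = Final('x')

    def __init__(self, x):
        self.x = x          # first (and only) assignment succeeds

p = Point(3)
print(p.x)                  # 3
try:
    p.x = 4
except AttributeError:
    print("re-assignment rejected")
```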
From ncoghlan at gmail.com Sun Feb 26 07:27:19 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 26 Feb 2012 16:27:19 +1000 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: <87wr7bko2o.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> <20120224075951.0ec1076d@bhuda.mired.org> <87wr7bko2o.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Feb 26, 2012 at 12:03 AM, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > ?> As things stand, Python is a lousy language for system administration > ?> tasks > > Yeah, the worst possible sysadmin language except for all the others. > AFAICT it more than holds its own with distro maintainers, no? For applications where correctness in all circumstances is the dominant criterion? Sure. For throwaway scripts, though, most of the Linux sysadmins I know just use shell scripts or Perl. For the devops (and deployment automation in general) crowd, there's no real Python-based competitor to Chef and Puppet (both Ruby based) (my understanding is that the Python-based Fabric doesn't play in *quite* the same space as the other two). As things currently stand, Python deliberately makes it hard to say "I want my individual commands to be shell commands, but I also want Python's superior flow control constructs to decide which shell commands to run". For an application, that's a good thing. For personal automation, it's not. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? 
Brisbane, Australia From aquavitae69 at gmail.com Sun Feb 26 09:07:39 2012 From: aquavitae69 at gmail.com (David Townshend) Date: Sun, 26 Feb 2012 10:07:39 +0200 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> Message-ID: On Feb 26, 2012 1:35 AM, "Masklinn" wrote: > > On 2012-02-26, at 00:05 , Steven D'Aprano wrote: > > - Immutable types can be used as keys in dicts. > > *technically*, you can use mutable types as dict keys if you define > their __hash__ no? That is of course a bad idea when the instances > are *expected* to be modified, but it should "work". I wouldn't say this is necessarily a bad thing at all. It just depends what defines the object. If an instance represent a specific object (e.g. a database record) you wouldn't expect the hash to change if you modified an attribute of it, since the instance still represents the same object. > > > - Immutable types protect you from errors. While you might intend not > > to modify a data structure, bugs do happen. > > Immutables are also inherently thread-safe (since thread safety is about > shared state, and shared immutables are not state). Which is a nice > guarantee. > > > Python has excellent support for read-only data structures, so long as you write them in C. > > There's also good support of the "consenting adults" variety (use > _-prefixed attributes for the actual state and expose what needs to be > exposed via properties and methods). That can be simplified with a > custom descriptor type which can only be set once (similar to java's > `final`), it would be set in the type's constructor and never re-set > from this. 
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas Maybe I'm missing something here, but what's wrong with just using __getattr__, __setattr__ and __delattr__ to restrict access? -------------- next part -------------- An HTML attachment was scrubbed... URL: From simon.sapin at kozea.fr Sun Feb 26 09:26:50 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Sun, 26 Feb 2012 09:26:50 +0100 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: <4F49ECCA.7060802@kozea.fr> Le 24/02/2012 00:34, Victor Stinner a ?crit : > I'm trying to create read-only objects using a "frozendict" class. > frozendict is a read-only dict. I would like to use frozendict for the > class dict using a metaclass, but type.__new__() expects a dict and > creates a copy of the input dict. > > I would be nice to support custom dict type: OrderedDict and > frozendict for example. It looks possible to patch CPython to > implement this feature, but first I would like first to know your > opinion about this idea:-) Hi, Combining ideas from other messages in this thread: would this work? 1. Inherit from frozendict 2. Define a __getattr__ that defers to frozendict.__getitem__ 3. Use an empty __slots__ so that there is no "normal" instance attribute. Thinking about it a bit more, it?s probably the same as having a normal __dict__ and raising in __setattr__ and __delattr__. Isn?t this how you implement frozendict? (Raise in __setitem__, __delitem__, update, etc.) 
Regards, -- Simon Sapin From eliben at gmail.com Sun Feb 26 09:53:28 2012 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 26 Feb 2012 10:53:28 +0200 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> <20120224075951.0ec1076d@bhuda.mired.org> <87wr7bko2o.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Feb 26, 2012 at 08:27, Nick Coghlan wrote: > On Sun, Feb 26, 2012 at 12:03 AM, Stephen J. Turnbull > wrote: > > Nick Coghlan writes: > > > > > As things stand, Python is a lousy language for system administration > > > tasks > > > > Yeah, the worst possible sysadmin language except for all the others. > > AFAICT it more than holds its own with distro maintainers, no? > > For applications where correctness in all circumstances is the > dominant criterion? Sure. > > For throwaway scripts, though, most of the Linux sysadmins I know just > use shell scripts or Perl. For the devops (and deployment automation > in general) crowd, there's no real Python-based competitor to Chef and > Puppet (both Ruby based) (my understanding is that the Python-based > Fabric doesn't play in *quite* the same space as the other two). > > As things currently stand, Python deliberately makes it hard to say "I > want my individual commands to be shell commands, but I also want > Python's superior flow control constructs to decide which shell > commands to run". For an application, that's a good thing. For > personal automation, it's not. > Personally I find Python just find for all kinds of automation, including bash/Perl replacement. Yes, some things may be a few characters more to type than in Perl, but I'm happy to have all the other Python features and libraries in my arsenal. Sysadmins use what they learned, and it also depends on culture. Some places do use Python for sysadmin stuff too. 
The Chef/Puppet/Fabric example is a good one to support this point - Ruby, like Python, is also more a dev language than a sysadmin language, and yet Chef & Puppet are written in Ruby and not Perl. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From aquavitae69 at gmail.com Sun Feb 26 10:42:54 2012 From: aquavitae69 at gmail.com (David Townshend) Date: Sun, 26 Feb 2012 11:42:54 +0200 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: <4F49ECCA.7060802@kozea.fr> References: <4F49ECCA.7060802@kozea.fr> Message-ID: On Feb 26, 2012 10:27 AM, "Simon Sapin" wrote: > Le 24/02/2012 00:34, Victor Stinner a écrit : > >> I'm trying to create read-only objects using a "frozendict" class. >> frozendict is a read-only dict. I would like to use frozendict for the >> class dict using a metaclass, but type.__new__() expects a dict and >> creates a copy of the input dict. >> >> It would be nice to support custom dict types: OrderedDict and >> frozendict for example. It looks possible to patch CPython to >> implement this feature, but first I would like to know your >> opinion about this idea :-) >> > > Hi, > > Combining ideas from other messages in this thread: would this work? > > 1. Inherit from frozendict > 2. Define a __getattr__ that defers to frozendict.__getitem__ > 3. Use an empty __slots__ so that there is no "normal" instance attribute. > > Thinking about it a bit more, it's probably the same as having a normal > __dict__ and raising in __setattr__ and __delattr__. Isn't this how you > implement frozendict? (Raise in __setitem__, __delitem__, update, etc.) > > Regards, > -- > Simon Sapin > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas Using frozendict, and especially inheriting from it, sounds unnecessarily complicated to me.
A simple class which doesn't allow changes to instance attributes could be implemented something like this:

    class Three_inst:

        @property
        def value(self):
            return 3

        def __setattr__(self, attr, value):
            raise AttributeError

        def __delattr__(self, attr):
            raise AttributeError

Or, if you're worried about changes to the class attributes, you could do basically the same thing using a metaclass (python 2.7 syntax):

    class FinalMeta(type):

        def __setattr__(cls, attr, value):
            if attr in cls.__dict__ or '__done__' in cls.__dict__:
                raise AttributeError
            else:
                type.__setattr__(cls, attr, value)

        def __delattr__(cls, attr):
            raise AttributeError

    class Three:
        __metaclass__ = FinalMeta
        value = 3
        __done__ = True  # There may be a neater way to do this...

Each of the following examples will fail:

    >>> Three.another_value = 4
    >>> Three.value = 4
    >>> del Three.value
    >>> three = Three(); three.value = 4

Actually, I think this is quite a nice illustration of what can be done with metaclasses! David -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Feb 26 12:46:33 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 26 Feb 2012 21:46:33 +1000 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> <20120224075951.0ec1076d@bhuda.mired.org> <87wr7bko2o.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Feb 26, 2012 at 6:53 PM, Eli Bendersky wrote: > The Chef/Puppet/Fabric example is a good one to support this point - Ruby, > like Python, is also more a dev language than a sysadmin language, and yet > Chef & Puppet are written in Ruby and not Perl. For the key operation I'm talking about here, though, Ruby works the same way Perl does: it supports shell command execution via backtick quoted strings with implicit string interpolation.
Is it really that hard to admit that there are some tasks that other languages are currently just plain better for than Python, and perhaps we can learn something useful from that? (And no, I'm not suggesting we adopt backtick command execution or implicit string interpolation. A convenience API that combines shell invocation, explicit string interpolation and whitespace and shell metacharacter quoting, though, *that* I support). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Sun Feb 26 12:49:00 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 26 Feb 2012 13:49:00 +0200 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> Message-ID: 24.02.12 14:41, Nick Coghlan wrote: > On Fri, Feb 24, 2012 at 10:13 PM, Mike Meyer wrote: >> Oddly enough, I read the Julia docs on external commands between my >> first answer and your reply, and their solution is both as simple as >> what you want, and safe. Yes, I want this in Python: readall(cmd('cut -d: -f3 $file', file='/etc/passwd') | cmd('sort -n') | cmd('tail -n5')) or cmd('cut', '-d:', '-f3', '/etc/passwd').pipe('sort', '-n').pipe('tail', '-n5').readlines() or something similar. > That *is* rather nice, although they never get around to actually > explaining *how* to capture the output from the child processes > (http://julialang.org/manual/running-external-programs/, for anyone > else that's interested). https://github.com/JuliaLang/julia/blob/10aabddc3834223568a87721149d05765e7e9997/j/process.j See readall and each_line.
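For concreteness, the kind of convenience API Nick describes — shell invocation plus explicit interpolation plus automatic metacharacter quoting — can be sketched with nothing but the stdlib. The sh() helper below is invented for illustration, not an existing API (shlex.quote is Python 3.3+; on the Pythons of 2012 it lived at pipes.quote, and subprocess.run arrived in 3.5):

```python
import shlex
import subprocess

def sh(template, **kwargs):
    """Hypothetical helper: explicit string interpolation where every
    substituted value is shell-quoted before the command runs."""
    quoted = {k: shlex.quote(str(v)) for k, v in kwargs.items()}
    command = template.format(**quoted)
    return subprocess.run(command, shell=True, check=True,
                          capture_output=True, text=True).stdout

# A hostile value is passed as data, not interpreted as shell syntax:
out = sh('printf %s {name}', name='; rm -rf /')
print(out)  # ; rm -rf /
```

Flow control stays in Python, the individual commands stay shell commands, and injection through interpolated values is off the table because every value is quoted before the shell ever sees it.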
From anacrolix at gmail.com Sun Feb 26 12:59:17 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sun, 26 Feb 2012 19:59:17 +0800 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> Message-ID: I strongly suspect such 3rd party library exists. On Feb 26, 2012 7:49 PM, "Serhiy Storchaka" wrote: > 24.02.12 14:41, Nick Coghlan wrote: > > On Fri, Feb 24, 2012 at 10:13 PM, Mike Meyer wrote: > >> Oddly enough, I read the Julia docs on external commands between my > >> first answer and your reply, and their solution is both as simple as > >> what you want, and safe. > > Yes, I want this in Python: > > readall(cmd('cut -d: -f3 $file', file='/etc/passwd') | cmd('sort -n') | > cmd('tail -n5')) > > or > > cmd('cut', '-d:', '-f3', '/etc/passwd').pipe('sort', '-n').pipe('tail', > '-n5').readlines() > > or something similar. > > > That *is* rather nice, although they never get around to actually > > explaining *how* to capture the output from the child processes > > (http://julialang.org/manual/running-external-programs/, for anyone > > else that's interested). > > > https://github.com/JuliaLang/julia/blob/10aabddc3834223568a87721149d05765e7e9997/j/process.j > See readall and each_line. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sun Feb 26 13:58:01 2012 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sun, 26 Feb 2012 21:58:01 +0900 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> <20120224075951.0ec1076d@bhuda.mired.org> <87wr7bko2o.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87r4xhlpk6.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > For throwaway scripts, though, most of the Linux sysadmins I know just > use shell scripts Sure, but it's really hard to beat *sh plus GNU readline for brevity in using recent history to create a script. At some point, we "just don't want to go there." As for the Perl arm of your disjunction, do those sysadmins use Python for anything? There's a lot of history in the Linux sysadmin community favoring Perl. (Although the l33t Perlmonger I know is a Ruby hacker now....) > For the devops (and deployment automation in general) crowd, > there's no real Python-based competitor to Chef and Puppet (both > Ruby based) (my understanding is that the Python-based Fabric > doesn't play in *quite* the same space as the other two). No, there isn't, but creating one could be rather hard, as Puppet and Chef both make heavy use of Ruby features conducive to writing DSLs. Note that although Fabric plays in a distinct space, its implementation looks like Chef, far more so than Puppet (ie, you write Fabric configs in Python, and Chef configs in a (domain-specific extension of) Ruby, while Puppet is a restricted DSL). One of the Puppet rationales for using Puppet rather than Chef is telling here: 3. Choice of configuration languages The language which Puppet uses to configure servers is designed specifically for the task: it is a domain language optimised for the task of describing and linking resources such as users and files. Chef uses an extension of the Ruby language. 
Ruby is a good general-purpose programming language, but it is not designed for configuration management - and learning Ruby is a lot harder than learning Puppet's language. Some people think that Chef's lack of a special-purpose language is an advantage. "You get the power of Ruby for free," they argue. Unfortunately, there are many things about Ruby which aren't so intuitive, especially for beginners, and there is a large and complex syntax that has to be mastered. -- http://bitfieldconsulting.com/puppet-vs-chef That applies equally well to "DSL"s that are extensions of (function calls in) Python. Making it easier to write DSLs in Python has come up many times, and so far the answer has always been "if you want to write a DSL in Python, write a DSL in Python; but you can't, and won't soon be able to, run it directly in the Python interpreter." DSLs have been done; there's configparser for one, argparse and ancestors, and things like gitosis. But it's hard to see Python beating Ruby at that game. > As things currently stand, Python deliberately makes it hard to say > "I want my individual commands to be shell commands, but I also > want Python's superior flow control constructs to decide which > shell commands to run". I don't think that's ever been my motivation for writing a script in Python. Really, is Python's for loop so much better than bash's? For me, it's data structures: something where my sed fu isn't enough, or the content has to persist longer than into the next pipe. And quoting. Shell quoting is such a pain, especially if there's an ssh remote command in there somewhere. This is not to say I'm opposed to making it easier to use Python as a command shell in principle, but I have to wonder whether it can be done as easily as all that, and without sacrificing some of the things we've insisted on in past discussions.
On the other hand, for things where avoiding shell makes sense, Python is one of my tools of choice (the other being Emacs Lisp, where I want integration with my editor and don't much care about performance). From storchaka at gmail.com Sun Feb 26 14:07:35 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 26 Feb 2012 15:07:35 +0200 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> Message-ID: 26.02.12 13:59, Matt Joiner wrote: > I strongly suspect such 3rd party library exists. I also hope this. And such a library would be a better candidate for inclusion in the stdlib. From victor.stinner at haypocalc.com Sun Feb 26 14:56:08 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Sun, 26 Feb 2012 14:56:08 +0100 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F49ECCA.7060802@kozea.fr> Message-ID: type.__setattr__(Three, 'value', 4) changes the value. Victor

> class FinalMeta(type):
>
>     def __setattr__(cls, attr, value):
>         if attr in cls.__dict__ or '__done__' in cls.__dict__:
>             raise AttributeError
>         else:
>             type.__setattr__(cls, attr, value)
>
>     def __delattr__(cls, attr):
>         raise AttributeError
>
>
> class Three:
>     __metaclass__ = FinalMeta
>     value = 3
>     __done__ = True  # There may be a neater way to do this...
>
> Each of the following examples will fail:
>
>>>> Three.another_value = 4
>>>> Three.value = 4
>>>> del Three.value
>>>> three = Three(); three.value = 4

From stephen at xemacs.org Sun Feb 26 15:02:42 2012 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sun, 26 Feb 2012 23:02:42 +0900 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> Message-ID: <87pqd1lmkd.fsf@uwakimon.sk.tsukuba.ac.jp> Serhiy Storchaka writes: > Yes, I want this in Python: > > readall(cmd('cut -d: -f3 $file', file='/etc/passwd') | cmd('sort -n') | cmd('tail -n5')) > > or > > cmd('cut', '-d:', '-f3', '/etc/passwd').pipe('sort', '-n').pipe('tail', '-n5').readlines() > > or something similar. But you can already do sorted([l.split(":")[2] for l in open('/etc/passwd')])[-5:] (and I don't really care whether you were being ironic or not; either way that one-liner is an answer). Actually, I wrote that off the top of my head and it almost worked. The problem I ran into is that I'm on a Mac, and there was a bunch of cruft comments (which don't contain any colons) in the beginning of the file. So I got a list index out of range when accessing the split line. In this case, cut | sort | tail would produce the expected output. But cut | sort | head would just produce garbage (the leading comments in sorted order). So the failure modes differ. It might be useful for people used to shell failure modes. From aquavitae69 at gmail.com Sun Feb 26 15:30:09 2012 From: aquavitae69 at gmail.com (David Townshend) Date: Sun, 26 Feb 2012 16:30:09 +0200 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F49ECCA.7060802@kozea.fr> Message-ID: Ah, I think I misunderstood exactly what you were trying to achieve. To me, that is essentially immutable - if I ever found myself using type.__setattr__ to change a variable I'd have to seriously question what I was doing! But that would be a way around it, and I don't think it would be possible to implement it fully in python. 
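The bypass Victor pointed out and David is replying to can be shown directly. This uses Python 3 spelling of the FinalMeta example from earlier in the thread, trimmed to the relevant part:

```python
class FinalMeta(type):
    def __setattr__(cls, attr, value):
        raise AttributeError("class is read-only")

class Three(metaclass=FinalMeta):
    value = 3

try:
    Three.value = 4            # routed through FinalMeta.__setattr__
except AttributeError:
    print("blocked")           # blocked

type.__setattr__(Three, 'value', 4)   # calls the base implementation
print(Three.value)                    # 4 -- the metaclass guard is skipped
```

Normal attribute assignment on the class goes through the metaclass, but calling type.__setattr__ directly dispatches to type's own slot and never consults FinalMeta — which is exactly the Class._Class__var-style loophole David compares it to.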
On the other hand, the same argument could be made for the introduction of private variables; Class.__var is not private because it can be changed through Class._Class__var. I'd also consider having to do this to be indicative of a design flaw in my code. On Sun, Feb 26, 2012 at 3:56 PM, Victor Stinner < victor.stinner at haypocalc.com> wrote: > type.__setattr__(Three, 'value', 4) changes the value. > > Victor > > > class FinalMeta(type): > > > > def __setattr__(cls, attr, value): > > if attr in cls.__dict__ or '__done__' in cls.__dict__: > > raise AttributeError > > else: > > type.__setattr__(cls, attr, value) > > > > def __delattr__(cls, attr): > > raise AttributeError > > > > > > class Three: > > __metaclass__ = FinalMeta > > value = 3 > > __done__ = True # There may be a neater way to do this... > > > > Each of the following examples will fail: > > > >>>> Three.another_value = 4 > >>>> Three.value = 4 > >>>> del Three.value > >>>> three = Three(); three.value = 4 > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From anacrolix at gmail.com Sun Feb 26 16:10:00 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Sun, 26 Feb 2012 23:10:00 +0800 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: <87pqd1lmkd.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> <87pqd1lmkd.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I did recently see "pyp" touted as a Python-like sed/awk. I guess this stuff always comes down to what you're used to. To me it is insane to be still using Perl yet I prefer perl regex over posix anyday :) On Feb 26, 2012 10:03 PM, "Stephen J. 
Turnbull" wrote: > Serhiy Storchaka writes: > > > Yes, I want this in Python: > > > > readall(cmd('cut -d: -f3 $file', file='/etc/passwd') | cmd('sort -n') | > cmd('tail -n5')) > > > > or > > > > cmd('cut', '-d:', '-f3', '/etc/passwd').pipe('sort', '-n').pipe('tail', > '-n5').readlines() > > > > or something similar. > > But you can already do > > sorted([l.split(":")[2] for l in open('/etc/passwd')])[-5:] > > (and I don't really care whether you were being ironic or not; either > way that one-liner is an answer). > > Actually, I wrote that off the top of my head and it almost worked. > The problem I ran into is that I'm on a Mac, and there was a bunch of > cruft comments (which don't contain any colons) in the beginning of > the file. So I got a list index out of range when accessing the split > line. In this case, cut | sort | tail would produce the expected > output. But cut | sort | head would just produce garbage (the leading > comments in sorted order). So the failure modes differ. It might be > useful for people used to shell failure modes. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mwm at mired.org Sun Feb 26 16:26:27 2012 From: mwm at mired.org (Mike Meyer) Date: Sun, 26 Feb 2012 10:26:27 -0500 Subject: [Python-ideas] shutil.runret and shutil.runout In-Reply-To: References: <20120224062525.0e168a39@bhuda.mired.org> <20120224071325.08f07d32@bhuda.mired.org> <20120224075951.0ec1076d@bhuda.mired.org> <87wr7bko2o.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20120226102627.56a86232@bhuda.mired.org> On Sun, 26 Feb 2012 21:46:33 +1000 Nick Coghlan wrote: > On Sun, Feb 26, 2012 at 6:53 PM, Eli Bendersky wrote: > > The Chef/Puppet/Fabric example is a good one to support this point - Ruby, > > like Python, is also more a dev language than a sysadmin language, and yet > > Chef & Puppet are written in Ruby and not Perl. > For the key operation I'm talking about here, though, Ruby works the > same way Perl does: it supports shell command execution via backtick > quoted strings with implicit string interpolation. Does Ruby also have something like Perl's -t/-T options and supporting functions? > Is it really that hard to admit that there are some tasks that other > languages are currently just plain better for than Python, and perhaps > we can learn something useful from that? The key word is "perhaps". There are some things other languages are better at than Python, and Python is the better off for it. I think that "supporting code injection attacks" is one such feature. > (And no, I'm not suggesting > we adopt backtick command execution or implicit string interpolation. > A convenience API that combines shell invocation, explicit string > interpolation and whitespace and shell metacharacter quoting, though, > *that* I support). I'm only willing to support it if it's at least as safe as Perl. Meaning that either 1) It doesn't really invoke the shell, but provides those features explicitly, or 2) it throws errors if passed tainted strings. On the other hand, my support (or lack of it) isn't worth very much.
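Mike's option 1 is already expressible with the stdlib: subprocess never touches a shell when given argument lists, so every argument is passed as data. Here is Serhiy's cut | sort | tail pipeline from earlier in the thread written that way — a sketch assuming a Unix system with /etc/passwd:

```python
import subprocess

# cut -d: -f3 /etc/passwd | sort -n | tail -n5, with no shell involved
cut = subprocess.Popen(['cut', '-d:', '-f3', '/etc/passwd'],
                       stdout=subprocess.PIPE)
sort = subprocess.Popen(['sort', '-n'], stdin=cut.stdout,
                        stdout=subprocess.PIPE)
tail = subprocess.Popen(['tail', '-n5'], stdin=sort.stdout,
                        stdout=subprocess.PIPE, text=True)
cut.stdout.close()    # so cut sees SIGPIPE if a later stage exits early
sort.stdout.close()
output = tail.communicate()[0]
print(output)
```

Since no string is ever parsed by /bin/sh there is nothing to quote and nothing to taint-check; the cost is exactly the verbosity that the convenience-API proposals in this thread are trying to remove.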
http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From simon.sapin at kozea.fr Sun Feb 26 17:51:08 2012 From: simon.sapin at kozea.fr (Simon Sapin) Date: Sun, 26 Feb 2012 17:51:08 +0100 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F49ECCA.7060802@kozea.fr> Message-ID: <4F4A62FC.6080403@kozea.fr> Le 26/02/2012 14:56, Victor Stinner a écrit : > type.__setattr__(Three, 'value', 4) changes the value. Then there is the question of how much craziness you want to protect from. Nothing is ever truly private or immutable in CPython, given enough motivation and ctypes. See for example Armin Ronacher's "Bad Ideas" presentation, especially the "Interpreter Warfare" part near the end: https://ep2012.europython.eu/media/conference/slides/5-years-of-bad-ideas.pdf I think that the code patching tracebacks is in production in Jinja2. I'm sure frozensets could be modified in a similar way. The point is: immutable data types protect against mistakes more than someone truly determined to break the rules. With that in mind, I think that having to go through __setattr__ is good enough to make sure it's not accidental. Regards, -- Simon Sapin From aquavitae69 at gmail.com Sun Feb 26 20:02:21 2012 From: aquavitae69 at gmail.com (David Townshend) Date: Sun, 26 Feb 2012 21:02:21 +0200 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: <4F4A62FC.6080403@kozea.fr> References: <4F49ECCA.7060802@kozea.fr> <4F4A62FC.6080403@kozea.fr> Message-ID: My point exactly! On Feb 26, 2012 6:51 PM, "Simon Sapin" wrote: > Le 26/02/2012 14:56, Victor Stinner a écrit : >> type.__setattr__(Three, 'value', 4) changes the value. >> > > Then there is the question of how much craziness you want to protect from. > Nothing is ever truly private or immutable in CPython, given enough > motivation and ctypes.
> > See for example Armin Ronacher's "Bad Ideas" presentation, especially the > "Interpreter Warfare" part near the end: > > https://ep2012.europython.eu/**media/conference/slides/5-** > years-of-bad-ideas.pdf > > I think that the code patching tracebacks is in production in Jinja2. I'm > sure frozensets could be modified in a similar way. > > The point is: immutable data types protect against mistakes more than > someone truly determined to break the rules. With that in mind, I think > that having to go through __setattr__ is good enough to make sure it's not > accidental. > > Regards, > -- > Simon Sapin > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at haypocalc.com Mon Feb 27 10:54:45 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 27 Feb 2012 10:54:45 +0100 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: <4F4A62FC.6080403@kozea.fr> References: <4F49ECCA.7060802@kozea.fr> <4F4A62FC.6080403@kozea.fr> Message-ID: >> type.__setattr__(Three, 'value', 4) changes the value. > > Then there is the question of how much craziness you want to protect from. > Nothing is ever truly private or immutable in CPython, given enough > motivation and ctypes. My pysandbox project uses various hacks to secure Python. The attacker doesn't care about writing Pythonic code; (s)he just wants to break the sandbox :-) See my pysandbox project for more information: https://github.com/haypo/pysandbox/ See sandbox/test/ if you like weird code :-) Tests ensure that the sandbox is safe. Constant types would also help optimization, especially PyPy JIT.
Victor From dreamingforward at gmail.com Mon Feb 27 19:32:05 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 11:32:05 -0700 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> Message-ID: On Sat, Feb 25, 2012 at 4:05 PM, Steven D'Aprano wrote: > Ned Batchelder wrote: > >> The Python answer for people who want read-only data structures has always >> been, "Don't modify them if you don't want to, and write docs that tell >> other people not to as well." What are you building that this answer isn't >> good enough? > > That is silly. That alleged "Python answer" is like telling people that they > don't need test frameworks or debuggers because the "Python answer" for > people wanting to debug their code is not to write buggy code in the first > place. Perhaps a good middle ground for this is to NOT tie it to particular data structures (like tuples vs lists), but abstract it by making an "immutable bit" that is part of the basic Object type. This doesn't give complete security, but does *force* a choice by a human agent to deliberately modify data. (This was actually going to be implemented in a sort of python fork several years ago.) There could be a "mutable?" check that returns True or False.
mark Santa Fe, NM From rob.cliffe at btinternet.com Mon Feb 27 19:35:57 2012 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Mon, 27 Feb 2012 18:35:57 +0000 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> Message-ID: <4F4BCD0D.8040706@btinternet.com> On 27/02/2012 18:32, Mark Janssen wrote: > On Sat, Feb 25, 2012 at 4:05 PM, Steven D'Aprano wrote: >> Ned Batchelder wrote: >> >>> The Python answer for people who want read-only data structures has always >>> been, "Don't modify them if you don't want to, and write docs that tell >>> other people not to as well." What are you building that this answer isn't >>> good enough? >> That is silly. That alleged "Python answer" is like telling people that they >> don't need test frameworks or debuggers because the "Python answer" for >> people wanting to debug their code is not to write buggy code in the first >> place. > Perhaps a good middle ground for this is to NOT tie it to particular > data structures (like tuples vs lists), but abstract it by making an > "immutable bit" that is part of the basic Object type. This doesn't > give complete security, but does *force* a choice by a human agent to > deliberately modify data. (This was actually going to be implemented > in a sort of python fork several years ago.) There could be a > "mutable?" check that returns True or False. > > mark > Santa Fe, NM > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > I suggested a "mutable" attribute some time ago. This could lead to finally doing away with one of Python's FAQs: Why does python have lists AND tuples? They could be unified into a single type. Rob Cliffe. 
From ethan at stoneleaf.us Mon Feb 27 19:46:49 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 27 Feb 2012 10:46:49 -0800 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: <4F4BCD0D.8040706@btinternet.com> References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> <4F4BCD0D.8040706@btinternet.com> Message-ID: <4F4BCF99.6030508@stoneleaf.us> > I suggested a "mutable" attribute some time ago. > This could lead to finally doing away with one of Python's FAQs: Why > does python have lists AND tuples? They could be unified into a single > type. If a tuple is just an immutable list it will become worse with regards to performance and memory space. ~Ethan~ From dreamingforward at gmail.com Mon Feb 27 19:45:45 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 11:45:45 -0700 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: <4F4BCD0D.8040706@btinternet.com> References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> <4F4BCD0D.8040706@btinternet.com> Message-ID: On Mon, Feb 27, 2012 at 11:35 AM, Rob Cliffe wrote: > I suggested a "mutable" attribute some time ago. > This could lead to finally doing away with one of Python's FAQs: Why does > python have lists AND tuples? They could be unified into a single type. > Rob Cliffe. Yeah, that would be cool. It would force (ok, *allow*) the documenting of any non-mutable attributes (i.e. when they're mutable, and why they're being set immutable, etc.). There's an interesting question, then: should the mutable bit be on the Object itself (the whole type) or in each instance....? There's probably no "provable" or abstract answer to this, but rather just an organization principle to the language....
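What Mark's per-instance variant of the "immutable bit" might look like as plain Python — a sketch only, with invented names (freeze, is_mutable), not a proposal-level design:

```python
class Freezable:
    """Instances start mutable; freeze() flips a one-way immutable bit."""
    _frozen = False          # class-level default, shadowed per instance

    def freeze(self):
        object.__setattr__(self, '_frozen', True)

    def is_mutable(self):    # the "mutable?" check from the thread
        return not self._frozen

    def __setattr__(self, attr, value):
        if self._frozen:
            raise AttributeError("instance is frozen")
        object.__setattr__(self, attr, value)

    def __delattr__(self, attr):
        if self._frozen:
            raise AttributeError("instance is frozen")
        object.__delattr__(self, attr)


p = Freezable()
p.x = 1                # fine while mutable
p.freeze()
print(p.is_mutable())  # False
```

Putting the bit on the type instead would freeze every instance at once; Ethan's performance point applies either way, since the per-write check is pure overhead for objects that never freeze.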
m From dreamingforward at gmail.com Mon Feb 27 19:47:29 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 11:47:29 -0700 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: <4F4BCF99.6030508@stoneleaf.us> References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> <4F4BCD0D.8040706@btinternet.com> <4F4BCF99.6030508@stoneleaf.us> Message-ID: On Mon, Feb 27, 2012 at 11:46 AM, Ethan Furman wrote: > If a tuple is just an immutable list it will become worse with regards to > performance and memory space. That's a good point also.... m From phd at phdru.name Mon Feb 27 19:49:42 2012 From: phd at phdru.name (Oleg Broytman) Date: Mon, 27 Feb 2012 22:49:42 +0400 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: <4F4BCD0D.8040706@btinternet.com> References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> <4F4BCD0D.8040706@btinternet.com> Message-ID: <20120227184942.GA12927@iskra.aviel.ru> On Mon, Feb 27, 2012 at 06:35:57PM +0000, Rob Cliffe wrote: > I suggested a "mutable" attribute some time ago. > This could lead to finally doing away with one of Python's FAQs: Why > does python have lists AND tuples? They could be unified into a > single type. The main difference between lists and tuples is not mutability but usage: lists are for a (unknown) number of similar items (a list of messages, e.g.), tuples are for a (known) number of different items at fixed positions (an address is a tuple of (country, city, street address), for example). Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
From ethan at stoneleaf.us Mon Feb 27 20:02:27 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 27 Feb 2012 11:02:27 -0800 Subject: [Python-ideas] [Fwd: Re: Support other dict types for type.__dict__] Message-ID: <4F4BD343.7040900@stoneleaf.us> [forwarding on to list] On 27/02/2012 18:46, Ethan Furman wrote: >> I suggested a "mutable" attribute some time ago. >> This could lead to finally doing away with one of Python's FAQs: Why >> does python have lists AND tuples? They could be unified into a >> single type. > > If a tuple is just an immutable list it will become worse with regards > to performance and memory space. > > ~Ethan~ > Doesn't that depend on how smart the implementation is? (Of course, toggling the mutable flag could cause performance penalties, but that's something you can't do at all at the moment.) Rob From victor.stinner at haypocalc.com Mon Feb 27 19:55:50 2012 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 27 Feb 2012 19:55:50 +0100 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: <20120227184942.GA12927@iskra.aviel.ru> References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> <4F4BCD0D.8040706@btinternet.com> <20120227184942.GA12927@iskra.aviel.ru> Message-ID: 2012/2/27 Oleg Broytman : > On Mon, Feb 27, 2012 at 06:35:57PM +0000, Rob Cliffe wrote: >> I suggested a "mutable" attribute some time ago. >> This could lead to finally doing away with one of Python's FAQs: Why >> does python have lists AND tuples? They could be unified into a >> single type. > > The main difference between lists and tuples is not mutability but > usage: lists are for a (unknown) number of similar items (a list of > messages, e.g.), tuples are for a (known) number of different items at > fixed positions (an address is a tuple of (country, city, street > address), for example). And tuple doesn't have append, extend, remove, ... methods.
Victor From mwm at mired.org Mon Feb 27 20:12:23 2012 From: mwm at mired.org (Mike Meyer) Date: Mon, 27 Feb 2012 14:12:23 -0500 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> <4F4BCD0D.8040706@btinternet.com> Message-ID: <20120227141223.6329ab8f@bhuda.mired.org> On Mon, 27 Feb 2012 11:45:45 -0700 Mark Janssen wrote: > On Mon, Feb 27, 2012 at 11:35 AM, Rob Cliffe wrote: > > I suggested a "mutable" attribute some time ago. > > This could lead to finally doing away with one of Python's FAQs: Why does > > python have lists AND tuples? They could be unified into a single type. > > Rob Cliffe. > Yeah, that would be cool. It would force (ok, *allow*) the > documenting of any non-mutable attributes (i.e. when they're mutable, > and why they're being set immutable, etc.). This also has implications for people working on making python friendlier for concurrent and parallel programming. > There's an interesting question, then: should the mutable bit be on the > Object itself (the whole type) or in each instance....? There's > probably no "provable" or abstract answer to this, but rather just an > organization principle to the language.... Ok, you said "non-mutable attributes" in the first paragraph. That to me implies that the object bound to that attribute can't be changed. This is different from the attribute being bound to an immutable object, which this paragraph implies. Which do you want here? http://www.mired.org/ Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From ericsnowcurrently at gmail.com Mon Feb 27 20:18:05 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 27 Feb 2012 12:18:05 -0700 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> <4F4BCD0D.8040706@btinternet.com> Message-ID: On Mon, Feb 27, 2012 at 11:45 AM, Mark Janssen wrote: > On Mon, Feb 27, 2012 at 11:35 AM, Rob Cliffe wrote: >> I suggested a "mutable" attribute some time ago. >> This could lead to finally doing away with one of Python's FAQs: Why does >> python have lists AND tuples? ?They could be unified into a single type. >> Rob Cliffe. > > Yeah, that would be cool. ?It would force (ok, *allow*) the > documenting of any non-mutable attributes (i.e. when they're mutable, > and why they're being set immutable, etc.). > > There an interesting question, then, should the mutable bit be on the > Object itself (the whole type) or in each instance....? ?There's > probably no "provable" or abstract answer to this, but rather just an > organization principle to the language.... In contrast to a flag on objects, one alternative is to have a __mutable__() method for immutable types and __immutable__() for mutable types. I'd be nervous about being able to make an immutable object mutable at an arbitrary moment with the associated effect on the hash of the object. -eric From dreamingforward at gmail.com Mon Feb 27 20:23:15 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 12:23:15 -0700 Subject: [Python-ideas] Fwd: doctest In-Reply-To: References: Message-ID: I just realized I've been replying personally to these replies instead of the whole list (damn I hate that!). So resending a bunch of messages that went to individuals. [Mark] On Fri, Feb 17, 2012 at 3:12 PM, Nick Coghlan wrote: > On Sat, Feb 18, 2012 at 7:57 AM, Mark Janssen wrote: >> Anyway... 
of course patches welcome, yes... ;^) > Not really. doctest is for *testing code example in docs*. I understand. This is exactly what I was wanting to use it for. As Tim says, "literate testing" or "executable documentation". The suggestions I made are for enhancing those two. Personally, I don't find unittest very suitable for test-driven *development*, although it *is* obviously well-suited for writing assurance tests otherwise. The key difference, to me, is that doctest promotes tests being written in order to have the *additional functionality* of documentation. That makes it fun since you're getting "twice the value for the cost of one", and that alone is the major item which drives test-driven development (IMHO) within the spirit of python; otherwise unittest is rather bulky to write in and of itself. Does anyone really use unittest outside the context of shop policy? mark From dreamingforward at gmail.com Mon Feb 27 20:24:31 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 12:24:31 -0700 Subject: [Python-ideas] Fwd: doctest In-Reply-To: References: Message-ID: On Fri, Feb 17, 2012 at 5:43 PM, Devin Jeanpierre wrote... I, firstly, thank you for your thoughtful reply. I myself am rather busy, but totally think it's worth the effort. > On Fri, Feb 17, 2012 at 4:57 PM, Mark Janssen wrote: >> 1. Execution context determined by outer-scope doctest definitions. > > I'm not sure what you mean, but it might be relevant that Sphinx lets > you define multiple scopes for doctests. Something like this: In a class definition doctest, I may put various definitions ("SetUp" constructs) of interesting and useful class initializations. If my class is a Graph, say, I might define several Graphs (which might produce testable output, or in any case should not throw an error), which could then be used (as "globals") in the inner doctests of the various class methods. It makes no sense to define them again in each doctest.
The hitch I suppose would be defining the possible "TearDown" code which would have to be done after the inner-scope doctests all run. This would require some syntactical feature either in doctest or python itself (this latter would be something like an extra docstring at the end of a class definition, but such a construct would really only be interesting if test()ing was built in to the python interpreter itself (if even then)). This really is the only way doctest scoping should make sense. Any other way is probably not organizing codetest to docs well. (In other words, it would enforce a certain testing standard of good practices.) > I feel like its approach is > the right one, but it isn't reusable in Python docstrings. That said, > I think users of doctest have moved away from embedded doctests in > docstrings -- it encourages doctests to have way too many "examples" > (test cases), which reduces their usefulness as documentation. Again, I think this is an example of python not really having test-driven development built-in. Complicated doctests are a result of too coarse a grain in the method definitions, usually (or probably) a result of other called methods not having their own doctests, so the slack is being picked up in an ad-hoc way. This is just mostly speculation, I haven't actually gone through any examples. I'd be interested in viewing some though if you have them. >> 2. Smart Comparisons that will detect output of a non-ordered type >> (dict/set), lift and recast it and do a real comparison. > I think it's better to just always use ast.literal_eval on the output > as another form of testing for equivalence. This could break code, but > probably not any code worth caring about. > > (in particular, > >>> print 'r""' > "" Hmm, I think that would pass in doctest's current framework, which just tests syntactic characters without regard to semantics.
However, if one were to fix the dict ordering issue, it would have to gain a minimum semantic knowledge (like an unordered grouping starts and terminates with the characters "{" and "}"). >> Anyway... of course patches welcome, yes... ;^) > Not exactly... doctest has no maintainer, and so no patches ever get > accepted. If you want to improve it, you'll have to fork it. I hope > you're that sort of person, because doctest can totally be improved. > It suffers a lot from people thinking of what it is rather than what > it could be. :( I agree! I'm a bit like yourself though, swamped with other priorities. But I'm glad to know about your fork, although it looks like our efforts are a bit orthogonal to each other.... > This is all assuming your intentions are to contribute rather than > only suggest. Not that suggestions aren't welcome, I suppose, but > maybe not here. doctest is not actively developed or maintained > anywhere, as far as I know. (I want to say "except by me", because > that'd make me seem all special and so on, but I haven't committed a > thing in months.) Well, I appreciate your taking the time. I will take another look at the code and see what it would take. > I definitely hope you help to make the doctest world better. I think > it fills a role that should be filled, and its neglect is unfortunate. I'm glad someone appreciates it. I really think the idea should be integrated more deeply so that it becomes a natural habit for python programmers. The test() as a built-in idea came from another doctest fan. It would sit right alongside the help() built-in. Maybe the idea will gain traction...
mark From dreamingforward at gmail.com Mon Feb 27 20:25:53 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 12:25:53 -0700 Subject: [Python-ideas] doctest In-Reply-To: References: Message-ID: On Fri, Feb 17, 2012 at 9:23 PM, Ian Bicking wrote: > On Feb 17, 2012 3:58 PM, "Mark Janssen" wrote: >> Without #1, "literate testing" becomes awash with re-defining re-used >> variables which, generally, also detracts from exact purpose of the >> test -- this creates testdoc noise and the docs become less useful. > > I dunno... I find the discipline of defining your prerequesites to be a > helpful feature of doctest (I find TestCase.setUp to be smelly). Yeah, I kinda agree, but in this case the doctests are always confined to the same module (or class) and have a standardized location, so always near at hand (at least if you're using them well) if you want to see what a variable used in a sub-test has been defined. >? You can > include a namespace in doctest invocations, but I'm guessing the problem is > that you aren't able to give these settings when using some kind of test > collector/runner?? More flexible ways of defining doctest options (e.g., > ELLIPSIS) would be helpful. Yeah, doctests has these (globs and M.__test__), it's just that it takes you out of the mode of "executable documentation" and becomes less fun. >> Without #2, "readable docs" nicely co-aligning with "testable docs" >> tends towards divergence. > > IMHO this could be more easily solved by replacing the standard repr with > one that is more predictable. Yes, but then this gets more into "my" idea of making a test() builtin, like help(). ?In that case, you could do fancy stuff where you wouldn't even have to test string output. Cheers, mark PS. Darn, I hate when I forget to reply-all... 
From dreamingforward at gmail.com Mon Feb 27 20:26:23 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 12:26:23 -0700 Subject: [Python-ideas] doctest In-Reply-To: References: <4F3F2E32.7070907@pearwood.info> Message-ID: On Fri, Feb 17, 2012 at 9:50 PM, Steven D'Aprano wrote: > Really? Not in my experience, although I admit I haven't tried to push the > envelope too far. > > But I haven't had any problem with a literate programming model: > > * Use short, self-contained but not necessarily exhaustive examples in the > code's docstrings (I don't try to give examples of *every* combination of > good and bad data, special cases, etc. in the docstring). > > * Write extensive (ideally exhaustive) examples with explanatory text, in a > separate text file. Hmmm, interesting. ?I generally like to keep it all in one file and define a dummy "test" function that just contains doctest code so that it can be all kept in one file and in-sync. > If my tests require setting up and tearing down > resources, I stick to unittest which has better setup/teardown support. (It > would be hard to have *less* support for setup and teardown than doctest.) If doctest had context-scoping, I think it would be superior to unittest. ?SetUp functionality would be contained in the class definition's __doc__, or out in the module's own __doc__. ?If any teardown functionality was necessary in the class's code a dummy teardown method could be defined at the end of the class definition. (Not as ideal as a more integrated test-driven development approach, but likely acceptable...) mark From dreamingforward at gmail.com Mon Feb 27 20:27:23 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 12:27:23 -0700 Subject: [Python-ideas] doctest In-Reply-To: References: <4F3F3240.4090104@pearwood.info> Message-ID: On Fri, Feb 17, 2012 at 10:08 PM, Steven D'Aprano wrote: > Mark Janssen wrote: >> 1. 
Execution context determined by outer-scope doctest definitions. > Can you give an example of how you would like this to work? > Sure, I wish I had a good example off the top of my head, but perhaps this will convey the idea:

class MyClass():
    """Yadda Yadda: foo's bars.

    >>> m = MyClass({some, sufficiently, interesting, initialization})  #POINT1: this variable (m) now accessible by all methods.
    "foobar check"  #POINT2: possible output here is a useful test case not well-definable elsewhere without losing context.
    """
    def method1(self, other):
        """Method method method method.

        >>> m.method("foo")  #Now we see m is already defined and usable.
        "bar"
        """
    def meth2(self, other):
        """Method to foo all bars.

        >>> m.method("bar")  #would have to decide whether a fresh m is redefined with each inner-scope doctest (if we want side-effects to carry across inner doctests).
        """

(END) This is a basic example, sorry it's rather crude. There's probably a better example. (Think establishing a network socket connection or something in the class' doc which is then used by all the methods, for example.) >> 2. Smart Comparisons that will detect output of a non-ordered type >> (dict/set), lift and recast it and do a real comparison. > > I would love to see a doctest directive that accepted differences in output > order, e.g. would match {1, 2, 3} and {3, 1, 2}. But I think that's a hard > problem to solve in the general case. I think this would be as simple as lifting the (string) output and doing an eval("{1,2,3}")=={3,2,1}, or (for security) using ast.literal_eval like Devin suggested. > I'd like a #3 as well: an abbreviated way to spell doctest directives, > because they invariably push my tests well past the 80 character mark. Hmm, seems like an alias could be defined easily enough, but I'll try to think about this when I have more time.
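The "smart comparison" idea can be prototyped against doctest's real extension point, OutputChecker.check_output(), using ast.literal_eval as Devin suggested; the subclass below is a sketch with an invented name, not anything in the stdlib:

```python
import ast
import doctest

class LiteralEvalChecker(doctest.OutputChecker):
    """Fall back to comparing parsed values when the literal text of
    want/got differs, so set and dict ordering stops mattering."""
    def check_output(self, want, got, optionflags):
        if doctest.OutputChecker.check_output(self, want, got, optionflags):
            return True
        try:
            # e.g. want="{3, 2, 1}\n", got="{1, 2, 3}\n" compare equal
            return ast.literal_eval(want) == ast.literal_eval(got)
        except (SyntaxError, ValueError):
            return False  # output isn't a Python literal; keep the failure

checker = LiteralEvalChecker()
print(checker.check_output("{3, 2, 1}\n", "{1, 2, 3}\n", 0))  # True
print(checker.check_output("{1, 2}\n", "{1, 3}\n", 0))        # False
```

A DocTestRunner accepts such a checker via its ``checker`` argument, so this slots into an ordinary doctest run without patching the module.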
mark From dreamingforward at gmail.com Mon Feb 27 20:28:17 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 12:28:17 -0700 Subject: [Python-ideas] Fwd: doctest In-Reply-To: References: <20120220132832.76b772da@resist.wooz.org> Message-ID: On Mon, Feb 20, 2012 at 11:28 AM, Barry Warsaw wrote: > On Feb 17, 2012, at 02:57 PM, Mark Janssen wrote: > FWIW, I think doctests are fantastic and I use them all the time. There are > IMO a couple of things to keep in mind: > > - doctests are documentation first. Specifically, they are testable > documentation. What better way to ensure that your documentation is > accurate and up-to-date? (And no, I do not generally find skew between the > code and the separate-file documentation.) > > - I personally dislike docstring doctests, and much prefer separate reST > documents. These have several advantages, such as the ability to inject > names into doctests globals (use with care though), and the ability to set > up the execution context for doctests (see below). The fact that it's so > easy to turn these into documentation with Sphinx is a huge win. > > Since so many people point this out, let me say that I completely agree that > doctests are not a *replacement* for unittests, but they are a fantastic > *complement* to unittests. When I TDD, I always start writing the > (testable) documentation first, because if I cannot explain the component > under test in clearly intelligible English, then I probably don't really > understand what it is I'm trying to write. > > My doctests usually describe mostly the good path through the API. > Occasionally I'll describe error modes if I think those are important for > understanding how to use the code. However, for all those fuzzy corner cases, > weird behaviors, bug fixes, etc., unittests are much better suited because > ensuring you've fixed these problems and don't regress in the future doesn't > help the narrative very much.
I think this is an example of (mal)adapting to an incomplete module, rather than fixing it. I think doctest can handle all the points you're making. See clarification pointers below... >> 1. Execution context determined by outer-scope doctest definitions. > > Can you explain this one? I gave an example in a prior message on this thread, dated Feb 17. I think it's clear there but let me know. Basically, the idea is that since the class def can also have a docstring, where better would setup and teardown code go to provide the execution context of the inner method docstrings? Now the question: is it useful or appropriate to put setup and teardown code in a classdef docstring? Well, I think this requires a commitment on the behalf of the coder/documenter to concoct useful (didactic) examples that could go there. For example, (as in the prior-referenced message) I imagine putting an example of defining a variable of the class's type (">>> g = Graph({some complex, interesting initialization})"), which might return a (testable) value upon creation. Now this could, logically, be put in the class's __init__ method, but that doesn't make sense for defining an execution context, and *in addition*, that can be saved for those complex corner cases you mentioned earlier. > I usually put all this in an additional_tests() method, such as: Yes, I do the same for my modules with doctests. A dummy function which can catch all the non-interesting tests. This is still superior, in my opinion, to unittest. It is easier syntactically, as well as for casual users of your code (it has no learning curve like understanding unittest). This superiority to unittest, by the way, is only realized if the second suggestion (smart comparisons) is implemented into doctest. >> 2. Smart Comparisons that will detect output of a non-ordered type >> (dict/set), lift and recast it and do a real comparison. > I'm of mixed mind with these.
Yes, you must be careful with ordering, but I > find it less readable to just sort() some dictionary output for example. What > I've found much more useful is to iterate over the sorted keys of a dictionary > and print the key/value pairs. Yes, but you see you're destroying the very intent and spirit of doctest. The point is to make literate documentation. If you adapt to its incompleteness, you reduce the power of it. >> Without #1, "literate testing" becomes awash with re-defining re-used >> variables which, generally, also detracts from exact purpose of the >> test -- this creates testdoc noise and the docs become less useful. >> Without #2, "readable docs" nicely co-aligning with "testable docs" >> tends towards divergence. > I've no doubt that doctests could be improved, but I actually find them quite > usable as is, with just a little bit of glue code to get it all hooked up. As > I say though, I'm biased against docstring doctests. Well, hopefully, I've convinced you a little that the limitations of doctests versus unittests are almost, if not entirely, due to the incompleteness of the module. If the two items I mentioned were implemented I think it would be far superior to unittest. (Corner cases, etc. can all find a place, because every corner case should be documented somewhere anyway!!) Cheers!! mark santa fe, nm From dreamingforward at gmail.com Mon Feb 27 21:01:49 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 13:01:49 -0700 Subject: [Python-ideas] Fwd: doctest In-Reply-To: References: <4F4BE07C.1000505@stoneleaf.us> Message-ID: On Mon, Feb 27, 2012 at 12:58 PM, Ethan Furman wrote: > Mark Janssen wrote: >> Personally, I don't find unittest very suitable for test-driven >> *development*, although it *is* obviously well-suited for writing >> assurance tests otherwise. > > I like unittest for TDD. I should probably correct myself. It is suitable, just not enjoyable.
?But now I know you are someone who likes all that arcana of unittest module. > unittest can be a bit bulky, but definitely worth it IMO, especially when > covering the corner cases. Corner cases are generally useful for the developer to know about, so its worth it to mention (==> test) in the documentation. > I have not used doctest, but I can say that I strongly dislike having more > than one or two examples in a docstring. This is often just a failure to separate tests property among different methods. > The other gripe I have (possibly easily fixed): my python prompt is '-->' > (makes email posting easier) -- should my doctests still use '>>>'? ?Will > doctest fail on my machine? As written, yes, but easily changeable in the module code for your unique case.... mark From ethan at stoneleaf.us Mon Feb 27 21:22:10 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 27 Feb 2012 12:22:10 -0800 Subject: [Python-ideas] Fwd: doctest In-Reply-To: References: <4F4BE07C.1000505@stoneleaf.us> Message-ID: <4F4BE5F2.0@stoneleaf.us> Mark Janssen wrote: > On Mon, Feb 27, 2012 at 12:58 PM, Ethan Furman wrote: >> Mark Janssen wrote: >>> Personally, I don't find unittest very suitable for test-driven >>> *development*, although it *is* obviously well-suited for writing >>> assurance tests otherwise. >> I like unittest for TDD. > > I should probably correct myself. It is suiltable, just not > enjoyable. But now I know you are someone who likes all that arcana > of unittest module. I'm not sure about *that* -- having to exactly reproduce the output of the interpreter seems kind of arcane to me. ;) >> unittest can be a bit bulky, but definitely worth it IMO, especially when >> covering the corner cases. > > Corner cases are generally useful for the developer to know about, so > its worth it to mention (==> test) in the documentation. Absolutely. 
I can see great value to using doctest on documentation, and even on code itself -- as I mentioned already, I just hate having code cluttered with lots of non-code. The other thing I like about unittest as opposed to doctest is the ability to be exhaustive. For an example, take a look at the tests I have for my dbf module on PyPI -- not even sure how I could convert that into a doctest format. >> I have not used doctest, but I can say that I strongly dislike having more >> than one or two examples in a docstring. > > This is often just a failure to separate tests property among different methods. > >> The other gripe I have (possibly easily fixed): my python prompt is '-->' >> (makes email posting easier) -- should my doctests still use '>>>'? Will >> doctest fail on my machine? > > As written, yes, but easily changeable in the module code for your > unique case.... Go with the Source, eh? I can live with that. :) ~Ethan~ From ned at nedbatchelder.com Mon Feb 27 21:14:55 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 27 Feb 2012 15:14:55 -0500 Subject: [Python-ideas] Fwd: doctest In-Reply-To: References: Message-ID: <4F4BE43F.6090003@nedbatchelder.com> On 2/27/2012 2:23 PM, Mark Janssen wrote: > I just realized I've been replying personally to these replies instead > of the whole list (damn I hate that!). So resending a bunch of > messages that went to individuals. [Mark] > On Fri, Feb 17, 2012 at 3:12 PM, Nick Coghlan wrote: >> On Sat, Feb 18, 2012 at 7:57 AM, Mark Janssen wrote: >>> Anyway... of course patches welcome, yes... ;^) >> Not really. doctest is for *testing code example in docs*. > I understand. This is exactly what I was wanting to use it for. As > Tim says "literate testing" or "executable documentation". I think you misunderstand: Nick meant, "doctest is only useful for testing the snippets of code that naturally appear in documentation meant for people to read." 
Many people agree with this sentiment, and find doctest unsuitable for writing comprehensive tests. > The suggestions I made are for enhancing those two. > > Personally, I don't find unittest very suitable for test-driven > *development*, although it *is* obviously well-suited for writing > assurance tests otherwise. > > The key difference, to me, is in that doctest promotes tests being > written in order to have the *additional functionality* of > documentation. That makes it fun since your getting "twice the > value for the cost of one", and that alone is the major item which > drives test-driven development (IMHO) within the spirit of python, > otherwise unittest is rather bulky to write in and of itself. > > Does anyone really use unittest outside the context of shop policy? Many, many people use unittest, namely, all of us that think doctest is a cute idea, but its many limitations hobble it for serious work. > > mark > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From fuzzyman at gmail.com Mon Feb 27 21:35:08 2012 From: fuzzyman at gmail.com (Michael Foord) Date: Mon, 27 Feb 2012 20:35:08 +0000 Subject: [Python-ideas] doctest In-Reply-To: References: Message-ID: On 18 February 2012 04:24, Ian Bicking wrote: > On Feb 17, 2012 4:12 PM, "Nick Coghlan" wrote: > > > > On Sat, Feb 18, 2012 at 7:57 AM, Mark Janssen > wrote: > > > Anyway... of course patches welcome, yes... ;^) > > > > Not really. doctest is for *testing code example in docs*. If you try > > to use it for more than that, it's likely to drive you up the wall, so > > proposals to make it more than it is usually don't get a great > > reception (docs patches to make it's limitations clearer are generally > > welcome, though). 
The stdib solution for test driven development is > > unittest (the vast majority of our own regression suite is written > > that way - only a small proportion uses doctest). > > This pessimistic attitude is why doctest is challenging to work with at > times, not anything to do with doctest's actual model. The constant > criticisms of doctest keep contributors away, and keep its many resolvable > problems from being resolved. > Personally I think there are several fundamental problems with doctest *as a unit testing tool*. doctest is *awesome* for testing documentation examples but in particular this one: * Every line becomes an assertion - in a unit test you typically follow the arrange -> act -> assert pattern. Only the results of the *assertion* are relevant to the test. (Obviously unexpected exceptions at any stage are relevant....). With doctest you have to take care to ensure that the exact output of *every line* of your arrange and act steps also match, even if they are irrelevant to your assertion. (The arrange and act steps will often include lines where you are creating state, and their output is irrelevant so long as they put the right things in place.) The particular implementation of doctest means that there are additional, potentially resolvable problems that are also a damn nuisance in a unit testing fail: Execution of an individual testing section continues after a failure. So a single failure results in the *reporting* of potentially many failures. The problem of being dependent on order of unorderable types (actually very difficult to solve). Things like shared fixtures and mocking become *harder* (although by no means impossible) in a doctest environment. Another thing I dislike is that it encourages a "test last" approach, as by far the easiest way of generating doctests is to copy and paste from the interactive interpreter. 
The alternative is lots of annoying typing of '>>>' and '...', and as you're editing text and not code IDE support tends to be worse (although this is a tooling issue and not a problem with doctest itself). So whilst I'm not against improving doctest, I don't promote it as a unit testing tool and disagree that it is suited to that task. All the best, Michael Foord > > An interesting third party alternative that has been created recently > > is behave: http://crate.io/packages/behave/ > > This style of test is why it's so sad that doctest is ignored and > unmaintained. It's based on testing patterns developed by people who care > to promote what they are doing, but I'm of the strong opinion that they are > inferior to doctest. > > Ian > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjjewett at gmail.com Mon Feb 27 21:39:37 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 27 Feb 2012 15:39:37 -0500 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> Message-ID: On Sat, Feb 25, 2012 at 6:32 PM, Masklinn wrote: > On 2012-02-26, at 00:05 , Steven D'Aprano wrote: >> - Immutable types can be used as keys in dicts. Not always; for example, you can't use a tuple of lists, even though the tuple itself is immutable. > *technically*, you can use mutable types as dict keys if you define > their __hash__ no? That is of course a bad idea when the instances > are *expected* to be modified, but it should "work". 
Not even a bad idea, if you define the hash carefully. (Similar to java final.) Once hash(obj) returns something other than -1, it should return that same value forever. Attributes which do not contribute to the hash can certainly still change. That said, I would be nervous about changes to attributes that contribute to __eq__, just because third party code may be so surprised. >>> class Str(str): pass >>> a=Str("a") >>> a.x=5 >>> a == "a" True >>> "x" in dir("a") False >>> "x" in dir(a) True -jJ From dreamingforward at gmail.com Mon Feb 27 21:43:14 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 13:43:14 -0700 Subject: [Python-ideas] Fwd: doctest (and.... python3000) Message-ID: On Mon, Feb 27, 2012 at 1:22 PM, Ethan Furman wrote: >> I should probably correct myself. It is suiltable, just not >> enjoyable. But now I know you are someone who likes all that arcana >> of unittest module. > > I'm not sure about *that* -- having to exactly reproduce the output of the > interpreter seems kind of arcane to me. ;) Well, you're an interesting test case for a theory -- some people shouldn't be coding in python... Python, as I see, is "the coder's language". It's meant for a programmers who want to write code for the sake of their art -- coding for him/herself firstly (and their community) and secondly for "industrial productions" -- shops that just churn out working apps without a consideration for the art. In the latter case, tests won't be for future coders in your community, but for maintaining "*la machine*" -- the simple, logical machine in your code. This, to me, is the primary split between those of us who still have high hopes for a true Python3000 (now evolved into python4000 because of release v3) and the rest.... Accurate in your case? mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Mon Feb 27 20:58:52 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 27 Feb 2012 11:58:52 -0800 Subject: [Python-ideas] Fwd: doctest In-Reply-To: References: Message-ID: <4F4BE07C.1000505@stoneleaf.us> Mark Janssen wrote: > On Fri, Feb 17, 2012 at 3:12 PM, Nick Coghlan wrote: >> On Sat, Feb 18, 2012 at 7:57 AM, Mark Janssen wrote: >>> Anyway... of course patches welcome, yes... ;^) >> Not really. doctest is for *testing code examples in docs*. > > I understand. This is exactly what I was wanting to use it for. As > Tim says "literate testing" or "executable documentation". > > The suggestions I made are for enhancing those two. > > Personally, I don't find unittest very suitable for test-driven > *development*, although it *is* obviously well-suited for writing > assurance tests otherwise. I like unittest for TDD. > The key difference, to me, is that doctest promotes tests being > written in order to have the *additional functionality* of > documentation. That makes it fun since you're getting "twice the > value for the cost of one", and that alone is the major item which > drives test-driven development (IMHO) within the spirit of python, > otherwise unittest is rather bulky to write in and of itself. unittest can be a bit bulky, but definitely worth it IMO, especially when covering the corner cases. I have not used doctest, but I can say that I strongly dislike having more than one or two examples in a docstring. Having all possibilities (including corner cases) in a separate file I am okay with (as that would be documentation -- when I'm reading code I want to see code, and I'll look up the docs if I have a question). The other gripe I have (possibly easily fixed): my python prompt is '-->' (makes email posting easier) -- should my doctests still use '>>>'? Will doctest fail on my machine? > Does anyone really use unittest outside the context of shop policy? Yup.
From ben+python at benfinney.id.au Mon Feb 27 22:35:37 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 28 Feb 2012 08:35:37 +1100 Subject: [Python-ideas] Fwd: doctest References: Message-ID: <87k438q7rq.fsf@benfinney.id.au> Mark Janssen writes: > The key difference, to me, is that doctest promotes tests being > written in order to have the *additional functionality* of > documentation. I think that doctest promotes docstrings being written with the additional functionality of tests. To that extent, it is very good. -- \ “The fact that I have no remedy for all the sorrows of the | `\ world is no reason for my accepting yours. It simply supports | _o__) the strong probability that yours is a fake.” —Henry L. Mencken | Ben Finney From phd at phdru.name Mon Feb 27 22:53:08 2012 From: phd at phdru.name (Oleg Broytman) Date: Tue, 28 Feb 2012 01:53:08 +0400 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> <4F4BCD0D.8040706@btinternet.com> <20120227184942.GA12927@iskra.aviel.ru> Message-ID: <20120227215307.GA20426@iskra.aviel.ru> On Mon, Feb 27, 2012 at 07:55:50PM +0100, Victor Stinner wrote: > 2012/2/27 Oleg Broytman : > > On Mon, Feb 27, 2012 at 06:35:57PM +0000, Rob Cliffe wrote: > >> I suggested a "mutable" attribute some time ago. > >> This could lead to finally doing away with one of Python's FAQs: Why > >> does python have lists AND tuples? They could be unified into a > >> single type. > > > > The main difference between lists and tuples is not mutability but > > usage: lists are for an (unknown) number of similar items (a list of > > messages, e.g.), tuples are for a (known) number of different items at > > fixed positions (an address is a tuple of (country, city, street > > address), for example). > > And tuple doesn't have append, extend, remove, ... methods.
Tuples are *also* read only, but being read only lists is not their main purpose. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From dreamingforward at gmail.com Mon Feb 27 23:25:14 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 15:25:14 -0700 Subject: [Python-ideas] Fwd: doctest (and.... python3000) In-Reply-To: References: <4F4C00B6.9020406@stoneleaf.us> Message-ID: On Mon, Feb 27, 2012 at 3:16 PM, Ethan Furman wrote: > Mark Janssen wrote: > > On Mon, Feb 27, 2012 at 1:22 PM, Ethan Furman wrote: >> >>> I should probably correct myself. It is suiltable, just not >>>> enjoyable. But now I know you are someone who likes all that arcana >>>> of unittest module. >>>> >>> >>> I'm not sure about *that* -- having to exactly reproduce the output of >>> the interpreter seems kind of arcane to me. ;) >>> >> >> Well, you're an interesting test case for a theory -- some people >> shouldn't be coding in python... >> > > Wow. Talk about mixed emotions -- on the one hand I totally agree with > you, on the other I haven't been that offended in quite some time. ;) > > Haha, okay. Sorry, I was a bit blunt there. > > Python, as I see, is "the coder's language". It's meant for a >> programmers who want to write code for the sake of their art -- coding for >> him/herself firstly (and their community) and secondly for "industrial >> productions" -- shops that just churn out working apps without a >> consideration for the art. >> > > While Python is the most enjoyable language I have ever used, I strive for > mastery and beauty in all the languages I work with. One of Python's big > strengths is it's simplicity, while still allowing for great power (with > it's data structures, exception handling, metaclasses (okay, not so simple > there ;)). > Have you seen Ada, Oberon? For some reason I couldn't begin to describe, I think you might actually like them better. 
But, hey, I'm happy there're people who enjoy python. > In the latter case, tests won't be for future coders in your community, >> but for maintaining "/la machine/" -- the simple, logical machine in your >> code. >> > > I fail to see your point here with regards to doctest versus unittest. > When I actually write the docs for my dbf module (simple Sphinx generated > at the moment), I will have examples in it and run it through doctest. > However, I will still have the unit tests as the primary test bench for it. > Hmm, I guess you're kind of a hybrid, then... > As an example, for the dBase III table type there are five field types. > There is a test for a table with each possible combination (not > permutation) of one to five of those field types (okay, so I'm slightly > paranoid, too ;) -- would you really want to see that in your documentation? No, you are right there. But this looks like a case of hybridized code -- you aren't able to make doctests at a fine-enough granularity in order to ensure your code, so they go up a level of abstraction where it gets bulky and no longer self-documenting. If Python 3 is so hope-dashing, perhaps you should fork your own version? > > Well, I still have hopes for it, it's just still in progress... I appreciate your reply, mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Feb 27 23:16:22 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 27 Feb 2012 14:16:22 -0800 Subject: [Python-ideas] Fwd: doctest (and.... python3000) In-Reply-To: References: Message-ID: <4F4C00B6.9020406@stoneleaf.us> Mark Janssen wrote: > On Mon, Feb 27, 2012 at 1:22 PM, Ethan Furman wrote: >>> I should probably correct myself. It is suitable, just not >>> enjoyable. But now I know you are someone who likes all that arcana >>> of the unittest module. >> >> I'm not sure about *that* -- having to exactly reproduce the output >> of the interpreter seems kind of arcane to me.
;) > > Well, you're an interesting test case for a theory -- some people > shouldn't be coding in python... Wow. Talk about mixed emotions -- on the one hand I totally agree with you, on the other I haven't been that offended in quite some time. ;) > Python, as I see it, is "the coder's language". It's meant for > programmers who want to write code for the sake of their art -- coding > for him/herself firstly (and their community) and secondly for > "industrial productions" -- shops that just churn out working apps > without a consideration for the art. While Python is the most enjoyable language I have ever used, I strive for mastery and beauty in all the languages I work with. One of Python's big strengths is its simplicity, while still allowing for great power (with its data structures, exception handling, metaclasses (okay, not so simple there ;)). > In the latter case, tests won't be for future coders in your community, > but for maintaining "/la machine/" -- the simple, logical machine in > your code. I fail to see your point here with regards to doctest versus unittest. When I actually write the docs for my dbf module (simple Sphinx generated at the moment), I will have examples in it and run it through doctest. However, I will still have the unit tests as the primary test bench for it. As an example, for the dBase III table type there are five field types. There is a test for a table with each possible combination (not permutation) of one to five of those field types (okay, so I'm slightly paranoid, too ;) -- would you really want to see that in your documentation? > This, to me, is the primary split between those of us who still have > high hopes for a true Python3000 (now evolved into python4000 because of > release v3) and the rest.... Overall I am quite happy with Py3k. I seriously doubt that I would be 100% satisfied with somebody else's language simply because we are not the same individual and so have different preferences.
I can say I am at least 95% happy with Python, which is the best approval rating I have been able to give since Assembly. If Python 3 is so hope-dashing, perhaps you should fork your own version? > Accurate in your case? That I shouldn't be using Python? No, inaccurate. That I am part of the bunch so disappointed with Py3k that I am yearning for Py4k? No, inaccurate. ~Ethan~ From fuzzyman at gmail.com Mon Feb 27 23:59:17 2012 From: fuzzyman at gmail.com (Michael Foord) Date: Mon, 27 Feb 2012 22:59:17 +0000 Subject: [Python-ideas] doctest In-Reply-To: References: Message-ID: On 27 February 2012 20:35, Michael Foord wrote: > > > On 18 February 2012 04:24, Ian Bicking wrote: > >> On Feb 17, 2012 4:12 PM, "Nick Coghlan" wrote: >> > >> > On Sat, Feb 18, 2012 at 7:57 AM, Mark Janssen < >> dreamingforward at gmail.com> wrote: >> > > Anyway... of course patches welcome, yes... ;^) >> > >> > Not really. doctest is for *testing code example in docs*. If you try >> > to use it for more than that, it's likely to drive you up the wall, so >> > proposals to make it more than it is usually don't get a great >> > reception (docs patches to make it's limitations clearer are generally >> > welcome, though). The stdib solution for test driven development is >> > unittest (the vast majority of our own regression suite is written >> > that way - only a small proportion uses doctest). >> >> This pessimistic attitude is why doctest is challenging to work with at >> times, not anything to do with doctest's actual model. The constant >> criticisms of doctest keep contributors away, and keep its many resolvable >> problems from being resolved. >> > > Personally I think there are several fundamental problems with doctest *as > a unit testing tool*. doctest is *awesome* for testing documentation > examples but in particular this one: > > * Every line becomes an assertion - in a unit test you typically follow > the arrange -> act -> assert pattern. 
Only the results of the *assertion* > are relevant to the test. (Obviously unexpected exceptions at any stage are > relevant....). With doctest you have to take care to ensure that the exact > output of *every line* of your arrange and act steps also match, even if > they are irrelevant to your assertion. (The arrange and act steps will > often include lines where you are creating state, and their output is > irrelevant so long as they put the right things in place.) > > The particular implementation of doctest means that there are additional, > potentially resolvable problems that are also a damn nuisance in a unit > testing fail: > Jeepers, I changed direction mid-sentence there. It should have read something along the lines of: As well as fundamental problems, the particular implementation of doctest suffers from these potentially resolvable problems: > > Execution of an individual testing section continues after a failure. So a > single failure results in the *reporting* of potentially many failures. > > The problem of being dependent on order of unorderable types (actually > very difficult to solve). > > Things like shared fixtures and mocking become *harder* (although by no > means impossible) in a doctest environment. > > Another thing I dislike is that it encourages a "test last" approach, as > by far the easiest way of generating doctests is to copy and paste from the > interactive interpreter. The alternative is lots of annoying typing of > '>>>' and '...', and as you're editing text and not code IDE support tends > to be worse (although this is a tooling issue and not a problem with > doctest itself). > > More fundamental-ish problems: Putting debugging prints into a function can break a myriad of tests (because they're output based). With multiple doctest blocks in a test file running an individual test can be difficult (impossible?). I may be misremembering, but I think debugging support is also problematic because of the stdout redirection. So yeah. 
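Michael's arrange → act → assert point can be made concrete. In the unittest style below only the final assertion is checked; a doctest version of the same steps would also have to match the echoed output of every arrange and act line. The `Basket` class is a hypothetical stand-in, invented so the sketch runs:

```python
import io
import unittest

class Basket:
    """Minimal stand-in class, assumed for illustration."""
    def __init__(self):
        self.items = []
    def add(self, item):
        self.items.append(item)
        return self    # in a doctest, this returned object's repr would print
    def total(self):
        return len(self.items)

class TestBasket(unittest.TestCase):
    def test_total(self):
        basket = Basket()                    # arrange: output irrelevant
        basket.add("egg").add("spam")        # act: output irrelevant
        self.assertEqual(basket.total(), 2)  # assert: the only thing checked

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBasket)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
assert result.wasSuccessful()
```

In a doctest, the `basket.add(...)` line would echo something like `<__main__.Basket object at 0x...>`, and that incidental repr would become one more thing the expected output has to match.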
Not a huge fan. All the best, Michael > So whilst I'm not against improving doctest, I don't promote it as a unit > testing tool and disagree that it is suited to that task. > > All the best, > > Michael Foord > > > > >> > An interesting third party alternative that has been created recently >> > is behave: http://crate.io/packages/behave/ >> >> This style of test is why it's so sad that doctest is ignored and >> unmaintained. It's based on testing patterns developed by people who care >> to promote what they are doing, but I'm of the strong opinion that they are >> inferior to doctest. >> >> Ian >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> > > > -- > > http://www.voidspace.org.uk/ > > May you do good and not evil > May you find forgiveness for yourself and forgive others > > May you share freely, never taking more than you give. > -- the sqlite blessing http://www.sqlite.org/different.html > > > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From fuzzyman at gmail.com Tue Feb 28 00:20:44 2012 From: fuzzyman at gmail.com (Michael Foord) Date: Mon, 27 Feb 2012 23:20:44 +0000 Subject: [Python-ideas] doctest In-Reply-To: References: <4F3F3240.4090104@pearwood.info> Message-ID: On 27 February 2012 19:27, Mark Janssen wrote: > On Fri, Feb 17, 2012 at 10:08 PM, Steven D'Aprano > wrote: > > Mark Janssen wrote: > >> 1. Execution context determined by outer-scope doctest defintions. > > > > Can you give an example of how you would like this to work? 
> > Sure, I wish I had a good example off the top of my head, but perhaps > this will convey the idea: > > class MyClass(): > """Yadda Yadda: foo's bars. > > >>> m = MyClass({some, sufficiently, interesting, initialization}) > #POINT1: this variable (m) now accessible by all methods. > "foobar check" #POINT2: possible output here is a useful test case > not well-definable elsewhere without losing context. > """ > > def method1(self, other): > """Method method method method. > > >>> m.method("foo") #Now we see m is already defined and usable. > "bar" > """ > > def meth2(self, other): > """Method to foo all bars > > >>> m.method("bar") #would have to decide whether a fresh m is > redefined with each innerscope doctest (if we want side-effects to > carry across inner doctests). > > (END) > > This is a basic example, sorry it's rather crude. There's probably a > better example. (Think establishing a network socket connection or > something in the class' doc which is then used by all the methods, for > example.) > > >> 2. Smart Comparisons that will detect output of a non-ordered type > >> (dict/set), lift and recast it and do a real comparison.> > > > > I would love to see a doctest directive that accepted differences in > output > > order, e.g. would match {1, 2, 3} and {3, 1, 2}. But I think that's a > hard > > problem to solve in the general case. > > I think this would be as simple as lifting the (string) output and > doing an eval("{1,2,3}")=={3,2,1}, or (for security) using > ast.literal_eval like Devin suggested. > > How will that handle not-particularly-obscure code like this: >>> class Foo(object): ... def __init__(self, a): ... self.a = a ... def __repr__(self): ... return '<Foo %s>' % self.a ... >>> a = {Foo(1), Foo(2), Foo(3)} >>> b = {Foo(4), Foo(5), Foo(6)} >>> {'first': a, 'second': b} {'second': set([<Foo 4>, <Foo 5>, <Foo 6>]), 'first': set([<Foo 1>, <Foo 2>, <Foo 3>])} I don't think a *general* solution for unordered types is even possible because you can't parse arbitrary reprs.
All the best, Michael > > I'd like a #3 as well: an abbreviated way to spell doctest directives, > > because they invariably push my tests well past the 80 character mark. > > Hmm, seem like an alias could be defined easily enough, but I'll try > to think about this when I have more time. > > mark > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From rob.cliffe at btinternet.com Tue Feb 28 00:18:57 2012 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Mon, 27 Feb 2012 23:18:57 +0000 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: <20120227215307.GA20426@iskra.aviel.ru> References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> <4F4BCD0D.8040706@btinternet.com> <20120227184942.GA12927@iskra.aviel.ru> <20120227215307.GA20426@iskra.aviel.ru> Message-ID: <4F4C0F61.5020405@btinternet.com> On 27/02/2012 21:53, Oleg Broytman wrote: > On Mon, Feb 27, 2012 at 07:55:50PM +0100, Victor Stinner wrote: >> 2012/2/27 Oleg Broytman: >>> On Mon, Feb 27, 2012 at 06:35:57PM +0000, Rob Cliffe wrote: >>>> I suggested a "mutable" attribute some time ago. >>>> This could lead to finally doing away with one of Python's FAQs: Why >>>> does python have lists AND tuples? They could be unified into a >>>> single type. 
>>> The main difference between lists and tuples is not mutability but >>> usage: lists are for a (unknown) number of similar items (a list of >>> messages, e.g.), tuples are for a (known) number of different items at >>> fixed positions (an address is a tuple of (country, city, street >>> address), for example). >> And tuple doesn't have append, extend, remove, ... methods. > Tuples are *also* read only, but being read only lists is not their > main purpose. > > Oleg. With respect, I think you are thinking too narrowly, conditioned by familiar usage. Items of a list do not have to be similar (there is nothing in the language that implies that). And tuples are often - conceptually - extended, even though it actually has to be done by building a new tuple - Python even allows you to write tuple1 += tuple2 A unified type would have "mutating" methods such as append - it's just that they would raise an error if the object's flag (however it was implemented) defined it as immutable. I visualised an actual object attribute, e.g. __mutable__, that could be set to a boolean value. But having __mutable__() and __immutable__() methods as suggested by Eric is an alternative. And there may well be others. Rob Cliffe From fuzzyman at gmail.com Tue Feb 28 00:31:17 2012 From: fuzzyman at gmail.com (Michael Foord) Date: Mon, 27 Feb 2012 23:31:17 +0000 Subject: [Python-ideas] doctest In-Reply-To: References: Message-ID: On 27 February 2012 23:23, Mark Janssen wrote: > On Mon, Feb 27, 2012 at 3:59 PM, Michael Foord wrote: >> >> As well as fundamental problems, the particular implementation of doctest >> suffers from these potentially resolvable problems: >> >> >>> Execution of an individual testing section continues after a failure. So >>> a single failure results in the *reporting* of potentially many failures. >>> >>> Hmm, perhaps I don't understand you. doctest reports how many failures > occur, without blocking on any single failure. > Right. 
But you typically group a bunch of actions into a single "test". If a doctest fails in an early action then every line after that will probably fail - a single test failure will cause multiple *reported* failures. > > >> The problem of being dependent on order of unorderable types (actually >>> very difficult to solve). >>> >> > Well, a crude solution is just to lift any output text that denotes a > non-ordered type and pass it through an "eval" operation. > Not a general solution - not all reprs are reversible (in fact very few > are as a proportion of all objects). > > >> Things like shared fixtures and mocking become *harder* (although by no >>> means impossible) in a doctest environment. >>> >>> > This, I think, is what I was suggesting with doctest "scoping" where the > execution environment is a matter of how nested the docstring is in > relation to the "python semantic environment", with a final scope of > "globs" that can be passed into the test environment, for anything with > global scope. > > >> Another thing I dislike is that it encourages a "test last" approach, as >>> by far the easiest way of generating doctests is to copy and paste from the >>> interactive interpreter. The alternative is lots of annoying typing of >>> '>>>' and '...', and as you're editing text and not code IDE support tends >>> to be worse (although this is a tooling issue and not a problem with >>> doctest itself). >>> >> > This is where I think the idea of having a test() built-in, like help(), > would really be nice. One could run test(myClass.mymethod) iteratively while > one codes, encouraging TDD and writing tests *along with* your code. My > TDD sense says it couldn't get any better. > > >> More fundamental-ish problems: >> >> Putting debugging prints into a function can break a myriad of tests >> (because they're output based). >> > > That's a good point.
But then it's a fairly simple matter of adding the > output device: 'print >> stderr, 'here I am'", another possibility, if TDD > were to become more a part of the language, is a special debug exception: > "raise Debug("Am at the test point, ", x)" Such special exceptions could > be caught and ignored by doctest. > > >> With multiple doctest blocks in a test file running an individual >> test can be difficult (impossible?). >> >> This again is solved with the test() built-in and making TDD something that > is a feature of the language itself. > I don't fully follow you, but it shouldn't be hard to add this to doctest and see if it is really useful. > > >> I may be misremembering, but I think debugging support is also >> problematic because of the stdout redirection >> > > Interesting, I try to pre-conceive tests well enough so I never need to > invoke the debugger. > Heh. When I'm adding new features to existing code it is very common for me to write a test that drops into the debugger after setting up some state - and potentially using the test infrastructure (fixtures, django test client perhaps, etc). So not being able to run a single test or drop into a debugger puts the kybosh on that. Michael > > >> So yeah. Not a huge fan. >> >> That's good feedback. Thanks. > > Mark -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed...
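On the "running an individual test" complaint above: the stdlib does offer `doctest.run_docstring_examples()`, which executes only the examples in one object's docstring rather than a whole module or file — not a `test()` built-in, but close in spirit. A sketch with hypothetical `add`/`mul` functions:

```python
import doctest

def add(a, b):
    """
    >>> add(2, 3)
    5
    """
    return a + b

def mul(a, b):
    """
    >>> mul(2, 3)
    7
    """
    return a * b  # the docstring above is deliberately wrong

# Run only add()'s examples; mul()'s failing example is never executed.
doctest.run_docstring_examples(add, {'add': add}, verbose=False)
```

Failures, if any, are printed to stdout; here nothing is printed because `add`'s single example passes, while the broken `mul` docstring is simply never touched.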
URL: From ianb at colorstudy.com Tue Feb 28 00:44:02 2012 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 27 Feb 2012 17:44:02 -0600 Subject: [Python-ideas] doctest In-Reply-To: References: Message-ID: On Mon, Feb 27, 2012 at 5:31 PM, Michael Foord wrote: > > > On 27 February 2012 23:23, Mark Janssen wrote: > >> On Mon, Feb 27, 2012 at 3:59 PM, Michael Foord wrote: >>> >>> As well as fundamental problems, the particular implementation of >>> doctest suffers from these potentially resolvable problems: >>> >>> >>>> Execution of an individual testing section continues after a failure. >>>> So a single failure results in the *reporting* of potentially many failures. >>>> >>>> Hmm, perhaps I don't understand you. doctest reports how many failures >> occur, without blocking on any single failure. >> > > > Right. But you typically group a bunch of actions into a single "test". > If a doctest fails in an early action then every line after that will > probably fail - a single test failure will cause multiple *reported* > failures. > > >> >> >>> The problem of being dependent on order of unorderable types (actually >>>> very difficult to solve). >>>> >>> >> Well, a crude solution is just to lift any output text that denotes an >> non-ordered type and pass it through an "eval" operation. >> > > > Not a general solution - not all reprs are reversible (in fact very few > are as a proportion of all objects). > Just an implementation suggestion - Guido's suggestion of using sys.displayhook will work to change the repr of objects (I had never heard of it until then, and had to test to convince myself). Doctest needs reliable repr's more than reversable repr's, and you can create them using that. You'll still get a lot of strings, which suck... but if you are committed to doctest then maybe better to provide good __repr__ methods on your custom objects! 
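A minimal sketch of the `sys.displayhook` idea Ian describes — normalising set reprs so that interactive (and therefore doctest-style) output becomes deterministic. Sorting the element reprs is just one possible normalisation strategy, assumed here for illustration:

```python
import builtins
import sys

def stable_displayhook(value):
    """Like the default display hook, but prints sets with sorted elements."""
    if value is None:
        return                      # the default hook also ignores None
    builtins._ = value              # the default hook also stores the result
    if isinstance(value, (set, frozenset)):
        print("{%s}" % ", ".join(sorted(repr(v) for v in value)))
    else:
        print(repr(value))

sys.displayhook = stable_displayhook
# At the interactive prompt, evaluating {3, 1, 2} would now always
# display as {1, 2, 3} regardless of hash ordering.
```

This only controls what the *interpreter* echoes, so it helps when generating examples by copy-and-paste; doctest's own comparison of expected vs. actual output still happens on the text level.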
For doctest.js (where I implemented a number of changes I would have wanted for doctest in Python) I have found this sort of thing sufficient, but Javascript objects tend to be a little more bare and there aren't existing conventions for repr/print/etc, so I have some more flexibility in my implementation. Ian -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Tue Feb 28 00:48:29 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 27 Feb 2012 16:48:29 -0700 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> <4F4BCD0D.8040706@btinternet.com> Message-ID: On Mon, Feb 27, 2012 at 12:18 PM, Eric Snow wrote: > On Mon, Feb 27, 2012 at 11:45 AM, Mark Janssen > wrote: >> On Mon, Feb 27, 2012 at 11:35 AM, Rob Cliffe wrote: >>> I suggested a "mutable" attribute some time ago. >>> This could lead to finally doing away with one of Python's FAQs: Why does >>> python have lists AND tuples? ?They could be unified into a single type. >>> Rob Cliffe. >> >> Yeah, that would be cool. ?It would force (ok, *allow*) the >> documenting of any non-mutable attributes (i.e. when they're mutable, >> and why they're being set immutable, etc.). >> >> There an interesting question, then, should the mutable bit be on the >> Object itself (the whole type) or in each instance....? ?There's >> probably no "provable" or abstract answer to this, but rather just an >> organization principle to the language.... > > In contrast to a flag on objects, one alternative is to have a > __mutable__() method for immutable types and __immutable__() for > mutable types. ?I'd be nervous about being able to make an immutable > object mutable at an arbitrary moment with the associated effect on > the hash of the object. 
Just to be clear, I meant that __mutable__() would return a mutable version of the object, of a distinct mutable type, if the object supported one. So for a tuple, it would return the corresponding list. These would be distinct objects. Likewise obj.__immutable__() would return a separate, immutable version of obj. Such an approach could be applied to lists/tuples, sets/frozensets, strings/bytearrays, bytes/bytearrays, and any other pairings we already have. Unless a frozendict were added as a standard type, dict would not have a match so an __immutable__() method would not be added. In that case, trying to call dict.__immutable__() would be an AttributeError, as happens now. -eric From ncoghlan at gmail.com Tue Feb 28 01:15:19 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 28 Feb 2012 10:15:19 +1000 Subject: [Python-ideas] doctest In-Reply-To: References: Message-ID: On Tue, Feb 28, 2012 at 9:44 AM, Ian Bicking wrote: > Just an implementation suggestion - Guido's suggestion of using > sys.displayhook will work to change the repr of objects (I had never heard > of it until then, and had to test to convince myself).? Doctest needs > reliable repr's more than reversable repr's, and you can create them using > that.? You'll still get a lot of > strings, which suck... but if you are committed to doctest then maybe better > to provide good __repr__ methods on your custom objects!? For doctest.js > (where I implemented a number of changes I would have wanted for doctest in > Python) I have found this sort of thing sufficient, but Javascript objects > tend to be a little more bare and there aren't existing conventions for > repr/print/etc, so I have some more flexibility in my implementation. You can actually do some pretty cool doctest hacks via displayhook and excepthook. 
I created a hacked together doctest variant [1] years ago that could run doctests from ODT files and also pay attention to sys.excepthook/displayhook before deciding that the test had failed. [1] http://svn.python.org/view/sandbox/trunk/userref/ Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Feb 28 01:26:38 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 28 Feb 2012 10:26:38 +1000 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> <4F4BCD0D.8040706@btinternet.com> Message-ID: On Tue, Feb 28, 2012 at 9:48 AM, Eric Snow wrote: > Such an approach could be applied to lists/tuples, sets/frozensets, > strings/bytearrays, bytes/bytearrays, and any other pairings we > already have. Unless a frozendict were added as a standard type, dict > would not have a match so an __immutable__() method would not be > added. In that case, trying to call dict.__immutable__() would be an > AttributeError, as happens now. Folks, before retreading this ground, please make sure to review the relevant past history and decide what (if anything) has changed since Barry proposed the freeze protocol 5 years ago and the PEP was rejected: http://www.python.org/dev/peps/pep-0351/ While hypergeneralisation of this behaviour is tempting, it really isn't a solid abstraction. It's better to make use case specific design decisions that handle all the corner cases relating to mutable vs immutable variants of *particular* container types. The issues you have to consider when converting a list to a tuple are not the same as those that exist when converting bytearray to bytes or a set to a frozenset. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com |
Brisbane, Australia From sven at marnach.net Tue Feb 28 00:39:06 2012 From: sven at marnach.net (Sven Marnach) Date: Mon, 27 Feb 2012 23:39:06 +0000 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: Message-ID: <20120227233906.GB3406@pantoffel-wg.de> An easy way to create immutable instances is 'collections.namedtuple':

    X = namedtuple("X", "a b")
    x = X(a=4, b=2)
    x.a + x.b   # fine
    x.a = 5     # AttributeError: can't set attribute
    x.c = 5     # AttributeError: 'X' object has no attribute 'c'

Tricks using 'object.__setattr__()' etc. will fail since the instance doesn't have a '__dict__'. The only data in the instance is stored in a tuple, so it's as immutable as a tuple. You can also derive from 'X' to add further methods. Remember to set '__slots__' to an empty iterable to maintain immutability. Cheers, Sven From ericsnowcurrently at gmail.com Tue Feb 28 01:51:17 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 27 Feb 2012 17:51:17 -0700 Subject: [Python-ideas] Support other dict types for type.__dict__ In-Reply-To: References: <4F48DF86.7060600@nedbatchelder.com> <4F496954.30101@pearwood.info> <4F4BCD0D.8040706@btinternet.com> Message-ID: On Mon, Feb 27, 2012 at 5:26 PM, Nick Coghlan wrote: > On Tue, Feb 28, 2012 at 9:48 AM, Eric Snow wrote: >> Such an approach could be applied to lists/tuples, sets/frozensets, >> strings/bytearrays, bytes/bytearrays, and any other pairings we >> already have. Unless a frozendict were added as a standard type, dict >> would not have a match so an __immutable__() method would not be >> added. In that case, trying to call dict.__immutable__() would be an >> AttributeError, as happens now.
> > Folks, before retreading this ground, please make sure to review the > relevant past history and decide what (if anything) has changed since > Barry proposed the freeze protocol 5 years ago and the PEP was > rejected: http://www.python.org/dev/peps/pep-0351/ > > While hypergeneralisation of this behaviour is tempting, it really > isn't a solid abstraction. It's better to make use case specific > design decisions that handle all the corner cases relating to mutable > vs immutable variants of *particular* container types. The issues you > have to consider when converting a list to a tuple are not the same as > those that exist when converting bytearray to bytes or a set to a > frozenset. Point taken. :) I knew I'd heard the idea somewhere. I appreciate how Raymond reacts here: http://mail.python.org/pipermail/python-dev/2006-February/060802.html and how Greg Ewing responds here: http://mail.python.org/pipermail/python-dev/2006-February/060822.html My point was that an __immutable__ flag was not a good idea. However, I agree that the generic protocol is likewise inadvisable because it fosters a generic design approach where a generic one is not appropriate. -eric From ethan at stoneleaf.us Tue Feb 28 01:00:02 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 27 Feb 2012 16:00:02 -0800 Subject: [Python-ideas] Fwd: doctest (and.... python3000) In-Reply-To: References: <4F4C00B6.9020406@stoneleaf.us> <4F4C0A18.3060802@stoneleaf.us> Message-ID: <4F4C1902.4040802@stoneleaf.us> Mark Janssen wrote: > On Mon, Feb 27, 2012 at 3:56 PM, Ethan Furman > wrote: > > As probably the easiest example, what is gained by having regression > tests as a document? With unittest you write a test with the > expected output and your done. I would imagine a doctest being > something like > > """This bug introduced in version 2.7.1, fixed in 2.7.2 > >>> this = quibble('that') > >>> this.attr > 'correct value' > """ > > > Huh? 
Perhaps I'm being dumb, but this is generally done outside of > unittest and within the code itself, something like:

> if sys.version > 2.4:
>     this = quibbleV4
> else:
>     this = quibbleV3

The choice of version numbers similar to Python's was probably a mistake. The point is quibbleV3 has a bug in it, and I want to make sure that bug doesn't come back in later versions -- so I add a test in my unit tests to make sure that it doesn't. ~Ethan~ From dreamingforward at gmail.com Tue Feb 28 05:34:02 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 21:34:02 -0700 Subject: [Python-ideas] adding a Debug exception? Message-ID: Had an idea on another thread (doctest) about a special exception called "Debug" that could be raised to generate arbitrary output to stderr. This would be used instead of spurious print statements in code to inform developers during debugging (which might throw off doctest, for example). It could also replace "assert" (and improve upon it), which seems to be deprecated. Also, the __debug__ global could actually gain some functionality... Its "argument" could be an `eval`uatable string (checked at compile time) and its output, this very string *plus* the output of it (if it evaluates to something different than itself). Just an idea.... mark santa fe -------------- next part -------------- An HTML attachment was scrubbed... URL: From dreamingforward at gmail.com Tue Feb 28 05:52:10 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 21:52:10 -0700 Subject: [Python-ideas] Fwd: [Python-Dev] matrix operations on dict :) In-Reply-To: References: <87d39oycrv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: More messages I didn't realize weren't being sent to the group....[mark] On Wed, Feb 8, 2012 at 7:13 PM, Stephen J. Turnbull wrote: > Mark Janssen writes: > > > The math (in my world) simply decided that factorial(0)=1 as the > > convention of "an empty product" (Wikipedia::Factorial).
> > In modern math (ie, post-Eilenberg-Mac Lane), it's not really a > convention (unlike, say, Euclid's Parallel Postulate); it's the only > way to go if you want the idea of product to generalize. If you don't > understand that, I have serious doubts that you know what you're > talking about. If you do understand that, please take care to be more > precise. > Awesome. I didn't know anyone else really understood this kind of issue. Yes, I want the idea to generalize. In this case, not of "product" and arithmetic (in a mathematical space), but of "object model" and the notion of "grouping" (in a set-theoretical space). So a formalization must be made, and perhaps this arena will be the place to do that. I have to say that I'm approaching this from in the domain of computer science, so in some ways creating a definition in a new "space", or at least a space separate from the Platonian "Abstract" of mathematics. Love it! cheers! Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From dreamingforward at gmail.com Tue Feb 28 05:53:41 2012 From: dreamingforward at gmail.com (Mark Janssen) Date: Mon, 27 Feb 2012 21:53:41 -0700 Subject: [Python-ideas] [Python-Dev] matrix operations on dict :) In-Reply-To: References: <87d39oycrv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Perhaps more specifically, I want to define a "grouping" (as encapsulated semantically and syntactically in a dict), at the place where the transition from atomic element into a group occurs. Syntactically this will be *denoted* by the curly brackets {}, operationally this will be defined in the CPython code itself. But the semantics in the middle must be "hashed out" for either to occur (pardon the pun). The question is whether it's premature to attempt such a task or not.... mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Tue Feb 28 07:39:50 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Feb 2012 17:39:50 +1100 Subject: [Python-ideas] adding a Debug exception? In-Reply-To: References: Message-ID: <20120228063949.GA22075@ando> On Mon, Feb 27, 2012 at 09:34:02PM -0700, Mark Janssen wrote: > Had an idea on another thread (doctest) about a special exception called > "Debug" that could be raised to generate arbitrary output to stderr. > This would be used instead of spurious print statements in code to inform > developers during debugging (which might throw off doctest, for example). Printing to sys.stderr does not throw off doctest. If you use

    print(something, file=sys.stderr)  # Python 3
    print >>sys.stderr, something      # Python 2

the output is invisible to doctest. > It could also replace "assert" (and improve upon it) which seems to be > deprecated. What makes you think assert is deprecated? Informational messages printed to stderr and assertions are completely different functions. You can't replace one with the other. > Also, the __debug__ global could actually gain some > functionality... What makes you think it doesn't? __debug__ is very useful for conditional compilation of debugging code that is safe to optimise away when running under -O. I use it in most of my projects. > Its "argument" could be an `eval`uatable string (checked at compile time) > and its output, this very string *plus* the output of it (if it evaluates > to something different than itself). So you mean, anything except a quine would be printed? I don't get what you mean, or how you intend for this to be used.
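Steven's two points above can be combined in one small runnable sketch (the function and its debug message are invented for illustration): output sent to sys.stderr never appears in a doctest's expected output, and the `if __debug__:` guard lets -O strip the debugging code entirely.

```python
import sys

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    if __debug__:
        # Sent to stderr, so doctest (which compares only stdout) ignores it;
        # the whole branch is optimised away under python -O.
        print(f"add called with {a!r}, {b!r}", file=sys.stderr)
    return a + b
```

Running `python -m doctest` on a module containing this function passes even though the debug line fires on every call.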
-- Steven From steve at pearwood.info Tue Feb 28 08:59:56 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Feb 2012 18:59:56 +1100 Subject: [Python-ideas] Fwd: doctest In-Reply-To: References: <20120220132832.76b772da@resist.wooz.org> Message-ID: <20120228075956.GB22075@ando> On Mon, Feb 27, 2012 at 12:28:17PM -0700, Mark Janssen wrote: > On Mon, Feb 20, 2012 at 11:28 AM, Barry Warsaw wrote: > > On Feb 17, 2012, at 02:57 PM, Mark Janssen wrote: > > FWIW, I think doctests are fantastic and I use them all the time. There are > > IMO a couple of things to keep in mind: > > > > - doctests are documentation first. Specifically, they are testable > > documentation. What better way to ensure that your documentation is > > accurate and up-to-date? (And no, I do not generally find skew between the > > code and the separate-file documentation.) I second what Barry says here. Doctests are for documentation, or at least, doctests in docstrings are for documentation, which means they should be simple and minimal, and only cover the most important parts of your function. Certainly they should only cover the interface, and never the implementation, so not used for regression testing or coverage of odd corner cases. I have no problem with extensive doctests if they are put in an external document. I do this myself. But when I call help(func), I want to learn how to use func, not see seven pages of tests that don't help me understand how to use the function. > > My doctests usually describe mostly the good path through the API. > > Occasionally I'll describe error modes if I think those are important for > > understanding how to use the code. However, for all those fuzzy corner cases, > > weird behaviors, bug fixes, etc., unittests are much better suited because > > ensuring you've fixed these problems and don't regress in the future doesn't > > help the narrative very much.
And again, +1 with what Barry says here, which means I disagree with your response: > I think this is an example of (mal)adapting to an incomplete module, rather > than fixing it. I think doctest can handle all the points you're > making. See clarification pointers below... I don't accept this argument. Doctest is designed for including example code in documentation, and ensuring that the examples are correct. For that, it does a very good job. It makes a great hammer. Don't use it when you need a spanner. It's not that doctest can't handle regression tests, but that regression tests shouldn't be put inside the function docstring. Why should people see a test case for some bug that occurred three versions back in the documentation? Put it in a separate test suite, either unit tests, or a literate programming doc using doctest. Don't pollute the docstring with tests that aren't useful documentation. When people read your docstring, you have to expect that they are reading it in isolation. They want to know "How do I use function spam?", and any example code should show them how to use function spam:

    >>> data = (23, 42, 'foo', 9)
    >>> collector = [5]
    >>> spam(data, collector)
    >>> collector
    [5, 28, 70, None, 79]

That works as documentation first, and as a test second. This does not:

    >>> spam(data, collector)  # data and collector defined elsewhere
    >>> collector
    [5, 28, 70, None, 79]

The fact that each docstring sees a fresh execution context is a good thing, not a bug. > >>1. Execution context determined by outer-scope doctest definitions. > > > > Can you explain this one? > > I gave an example in a prior message on this thread, dated Feb 17. I > think it's clear there but let me know. > > Basically, the idea is that since the class def can also have a > docstring, where better would setup and teardown code go to provide > the execution context of the inner method docstrings? Is that a trick question?
I don't want docstrings to have automatic setup and teardown code at all, and if they did, I certainly don't want them to be in some other object's docstring (related or not). > Now the question: is it useful or appropriate to put setup and > teardown code in a classdef docstring? In my opinion, no, neither useful nor appropriate. It would be counter-productive, by reducing the value of documentation as documentation, while still being insufficiently powerful to replace unit tests. A big -1 on this. [...] > Well, hopefully, I've convinced you a little that the limitations in > doctests over unittests are almost, if not entirely due, to the > incompleteness of the module. If the two items I mentioned were > implemented I think it would be far superior to unittest. I already think that doctest is far superior to unittest, for testing executable examples in documentation. I don't think it is superior to unittest for unit testing, or regression testing. Nor is it inferior -- it's just different. > (Corner > cases, etc. can all find a place, because every corner case should be > documented somewhere anyway!!) I think you have a different idea of "corner case" than I do. Corner cases, in my experience, refer to the implementation: does the function work correctly when the input is in the corner? Since this is testing the implementation, it shouldn't be in the documentation. The classic example is, does this list-function work when the list is empty? So I would expect that the unit tests for, say, the sorted() built-in will include a test case for sorted([]). (This applies regardless of whether you use unittest, doctest, nose, or some other testing framework.) But the documentation for sorted() doesn't need to explicitly state that sorting an empty list returns an empty list. That's a given from the accepted meaning of sorting -- if there's nothing to sort, you get nothing. Nor does it need to explicitly state that sorting a list with one item returns a list with one item.
A single example of sorting a list of (say) four items is sufficient to document the purpose of sorted(), but it would be completely insufficient for unit testing purposes. -- Steven From ben+python at benfinney.id.au Tue Feb 28 11:24:46 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 28 Feb 2012 21:24:46 +1100 Subject: [Python-ideas] Fwd: doctest References: <20120220132832.76b772da@resist.wooz.org> <20120228075956.GB22075@ando> Message-ID: <87r4xfp85t.fsf@benfinney.id.au> Steven D'Aprano writes: > On Mon, Feb 27, 2012 at 12:28:17PM -0700, Mark Janssen wrote: > > Well, hopefully, I've convinced you a little that the limitations in > > doctests over unittests are almost, if not entirely due, to the > > incompleteness of the module. I don't think 'doctest' is incomplete. It comprehensively covers the use case for which it is designed. The 'unittest' module is limited in usefulness at testing code examples in documentation. That doesn't make it incomplete, either. > > If the two items I mentioned were implemented I think it would be > > far superior to unittest. > > I already think that doctest is far superior to unittest, for testing > executable examples in documentation. I don't think it is superior to > unittest for unit testing, or regression testing. Nor is it inferior > -- it's just different. +1 QotW. -- \ "The fact of your own existence is the most astonishing fact | `\ you'll ever have to confront. Don't dare ever see your life as | _o__) boring, monotonous, or joyless." 
--Richard Dawkins, 2010-03-10 | Ben Finney From ben+python at benfinney.id.au Tue Feb 28 12:40:37 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 28 Feb 2012 22:40:37 +1100 Subject: [Python-ideas] doctest References: Message-ID: <87ipirp4ne.fsf@benfinney.id.au> Ian Bicking writes: > On Feb 17, 2012 4:12 PM, "Nick Coghlan" wrote: > > An interesting third party alternative that has been created > > recently is behave: http://crate.io/packages/behave/ > > This style of test is why it's so sad that doctest is ignored and > unmaintained. I don't see why you draw a connection. There doesn't, to me, seem any need to expand the capabilities of 'doctest': it does what it says on the tin, and does it well. Other tasks require other tools. > [the 'behave' library is] based on testing patterns developed by > people who care to promote what they are doing, but I'm of the strong > opinion that they are inferior to doctest. I think the code-examples-in-documentation approach is a good thing to have, and it's what 'doctest' excels at. I don't think distorting behaviour-driven specifications, of the kind 'behave' is designed to read, to fit the doctest model would be a good thing. Can you present an argument why you think it would? -- \ "Now Maggie, I'll be watching you too, in case God is busy | `\ creating tornadoes or not existing." --Homer, _The Simpsons_ | _o__) | Ben Finney From ned at nedbatchelder.com Tue Feb 28 13:17:52 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Tue, 28 Feb 2012 07:17:52 -0500 Subject: [Python-ideas] adding a Debug exception? In-Reply-To: References: Message-ID: <4F4CC5F0.1070004@nedbatchelder.com> On 2/27/2012 11:34 PM, Mark Janssen wrote: > Had an idea on another thread (doctest) about a special exception > called "Debug" that could be raised to generate arbitrary output > to stderr. 
This would be used instead of spurious print statements in > code to inform developers during debugging (which might throw off > doctest, for example). It could also replace "assert" (and improve > upon it) which seems to be deprecated. Also, the __debug__ global > could actually gain some functionality... > Raising an exception to generate a log message? You'd never execute the statement after the raise, completely destroying the flow of the code. Perhaps this idea needs a little more thought... Have you looked into the logging module? --Ned. > > mark > santa fe > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Feb 28 21:59:53 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 28 Feb 2012 15:59:53 -0500 Subject: [Python-ideas] Fwd: doctest In-Reply-To: <20120228075956.GB22075@ando> References: <20120220132832.76b772da@resist.wooz.org> <20120228075956.GB22075@ando> Message-ID: On 2/28/2012 2:59 AM, Steven D'Aprano wrote: > Corner cases, in my experience, refer to the implementation: does the > function work correctly when the input is in the corner? Since this is > testing the implementation, it shouldn't be in the documentation. It depends on the function. fact(0) = 1 could go in a doc string. So, I think, could combo(0,0) = 1. But sometimes implementations introduce a special case that is an artifact of the implementation and definitely does not belong in a doc string. A real example is approximating the normal integral (cumulative normal distribution) with different approximations for [0,a] and (a,infinity). Unit tests should test f(a) and f(a+epsilon) and the difference (should be >= 0) to make sure the two approximations 'join' properly so as to be transparent to the user. If the join point changes, so does the special unit test.
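Terry's join-point check might look like the following unit-test sketch, with a toy piecewise function standing in for the two normal-integral approximations (the join point, formulas, and tolerances here are invented for illustration):

```python
JOIN = 1.0   # hypothetical point where the two approximations meet
EPS = 1e-9

def approx(x):
    # Toy piecewise approximation: one formula on [0, JOIN],
    # a different expression on (JOIN, infinity) that continues it.
    if x <= JOIN:
        return 0.5 * x
    return 0.5 * JOIN + 0.5 * (x - JOIN)

def test_join_is_transparent():
    lo = approx(JOIN)
    hi = approx(JOIN + EPS)
    diff = hi - lo
    # The approximations must meet monotonically at the seam,
    # with no visible jump for the user.
    assert diff >= 0
    assert diff < 1e-6

test_join_is_transparent()
```

If the implementation moved the join point, only JOIN (and this test) would change, while the documented behaviour stays the same, which is exactly why such a test belongs in the unit tests rather than the docstring.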
-- Terry Jan Reedy From barry at python.org Tue Feb 28 23:14:14 2012 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Feb 2012 17:14:14 -0500 Subject: [Python-ideas] doctest References: Message-ID: <20120228171414.3cc7e38a@limelight.wooz.org> On Feb 27, 2012, at 08:35 PM, Michael Foord wrote: >The problem of being dependent on order of unorderable types (actually very >difficult to solve). Actually, not so much, only because IME, I find that I rarely want to just dump the repr of such objects. That's usually going to be hard to read even if the output were sorted. Instead, I very often iterate over the items (in sorted order of course), and use ellipses to ignore the lines (i.e. items) I don't care about. In practice, I haven't found this one to be so bad. >Things like shared fixtures and mocking become *harder* (although by no >means impossible) in a doctest environment. Not if you use separate DocFileSuites. >Another thing I dislike is that it encourages a "test last" approach, as by >far the easiest way of generating doctests is to copy and paste from the >interactive interpreter. The alternative is lots of annoying typing of >'>>>' and '...', and as you're editing text and not code IDE support tends >to be worse (although this is a tooling issue and not a problem with >doctest itself). Actually, Emacs users should use rst-mode, which has not so bad support for separate file doctests. Of course, the mode is useful for reST documentation even if your documentation is untested. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Tue Feb 28 23:19:18 2012 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Feb 2012 17:19:18 -0500 Subject: [Python-ideas] doctest References: Message-ID: <20120228171918.3db69bd2@limelight.wooz.org> On Feb 27, 2012, at 05:44 PM, Ian Bicking wrote: >Doctest needs reliable reprs more than reversible reprs, and you can create >them using that. You'll still get a lot of <... object at 0x391a9df> strings, which suck... but if you are committed to doctest then >maybe better to provide good __repr__ methods on your custom objects! +1 even if you don't use doctests! I can't tell you how many times adding a useful repr has vastly improved debugging. I urge everyone to flesh out your reprs with a little bit of useful information so you can quickly identify your instances at a pdb prompt. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From ethan at stoneleaf.us Tue Feb 28 23:16:21 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 28 Feb 2012 14:16:21 -0800 Subject: [Python-ideas] Fwd: doctest In-Reply-To: References: <4F4BE07C.1000505@stoneleaf.us> Message-ID: <4F4D5235.5000601@stoneleaf.us> Mark Janssen wrote: > On Mon, Feb 27, 2012 at 12:58 PM, Ethan Furman wrote: >> The other gripe I have (possibly easily fixed): my python prompt is '-->' >> (makes email posting easier) -- should my doctests still use '>>>'? Will >> doctest fail on my machine? > > As written, yes, but easily changeable in the module code for your > unique case.... Which means my doctests will then fail on other's machines unless they also change their local module *and* their python prompt. Not good. ~Ethan~ From ncoghlan at gmail.com Wed Feb 29 00:07:13 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 29 Feb 2012 09:07:13 +1000 Subject: [Python-ideas] More helpers in reprlib (was Re: doctest) Message-ID: On Wed, Feb 29, 2012 at 8:19 AM, Barry Warsaw wrote: > On Feb 27, 2012, at 05:44 PM, Ian Bicking wrote: > >>Doctest needs reliable repr's more than reversable repr's, and you can create >>them using that. ?You'll still get a lot of >0x391a9df> strings, which suck... but if you are committed to doctest then >>maybe better to provide good __repr__ methods on your custom objects! > > +1 even if you don't use doctests! ?I can't tell you how many times adding a > useful repr has vastly improved debugging. ?I urge everyone to flesh out your > reprs with a little bit of useful information so you can quickly identify your > instances at a pdb prompt. Since this question came up recently, what do you think of adding some more helpers to reprlib to make this even easier to do? I know I just added some utility functions to PulpDist [1] to avoid reinventing that particular wheel for each of my class definitions. Cheers, Nick. 
[1] http://git.fedorahosted.org/git/?p=pulpdist.git;a=blob;f=src/pulpdist/core/util.py Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia From steve at pearwood.info Wed Feb 29 00:49:32 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 29 Feb 2012 10:49:32 +1100 Subject: [Python-ideas] More helpers in reprlib (was Re: doctest) In-Reply-To: References: Message-ID: <4F4D680C.9000603@pearwood.info> Nick Coghlan wrote: > On Wed, Feb 29, 2012 at 8:19 AM, Barry Warsaw wrote: >> On Feb 27, 2012, at 05:44 PM, Ian Bicking wrote: >> >>> Doctest needs reliable repr's more than reversable repr's, and you can create >>> them using that. You'll still get a lot of >> 0x391a9df> strings, which suck... but if you are committed to doctest then >>> maybe better to provide good __repr__ methods on your custom objects! >> +1 even if you don't use doctests! I can't tell you how many times adding a >> useful repr has vastly improved debugging. I urge everyone to flesh out your >> reprs with a little bit of useful information so you can quickly identify your >> instances at a pdb prompt. > > Since this question came up recently, what do you think of adding some > more helpers to reprlib to make this even easier to do? Your question is too general. Of course people should be in favour of helpers to simplify making good reprs, but that's like asking if people are in favour of solving world hunger. Who wouldn't be? But the answer should depend on what the helpers do, how well they do them, and whether or not they actually help. I fear getting carried away with enthusiasm for repr helpers and dumping a lot of unnecessary, trivial or sub-optimal helpers in reprlib, where they will be enshrined as "the one obvious way to do it" when perhaps they shouldn't be. Since you are the author of them, I'm sure that they scratch your itches, but will they scratch other people's? 
I suggest publishing them as recipes on ActiveState first, and see what feedback you get. http://code.activestate.com/recipes/langs/python/top/ -- Steven From ben+python at benfinney.id.au Wed Feb 29 02:23:30 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 29 Feb 2012 12:23:30 +1100 Subject: [Python-ideas] [OT] rst-mode (was: doctest) References: <20120228171414.3cc7e38a@limelight.wooz.org> Message-ID: <87aa42ph4d.fsf_-_@benfinney.id.au> Barry Warsaw writes: > Actually, Emacs users should use rst-mode, which has no so bad support > for separate file doctests. Of course, the mode is useful for reST > documentation even if your documentation is untested . Any idea where I should send bug reports for ?rst-mode?? It's not clear to me who develops it. -- \ ?Welchen Teil von ?Gestalt? verstehen Sie nicht? [What part of | `\ ?gestalt? don't you understand?]? ?Karsten M. Self | _o__) | Ben Finney -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 835 bytes Desc: not available URL: From craigyk at me.com Wed Feb 29 08:02:51 2012 From: craigyk at me.com (Craig Yoshioka) Date: Tue, 28 Feb 2012 23:02:51 -0800 Subject: [Python-ideas] revisit pep 377: good use case? Message-ID: So I've recently been trying to implement something for which I had hoped the 'with' statement would be perfect, but it turns out, won't work because Python provides no mechanism by which to skip the block of code in a with statement. I want to create some functionality to make it easy too wrap command line programs in a caching architecture. 
To do this there are some things that need to happen before and after the wrapped CLI program is called, a try,except,finally version might look like this: def cachedcli(*args): try: hashedoutput = hashon(args) if iscached(): return hashedoutput acquirelock() cli(*args,hashedoutput) iscached(True) return hashedoutput except AlreadyLocked: while locked: wait() return example(*args) finally: releaselock() the 'with' version would look like def cachedcli(*args) hashedpath = hashon(args) with cacheon(hashedpath): cli(hashedpath,*args) return hashedpath So obviously the 'with' statement would be a good fit, especially since non-python programmers might be wrapping their CLI programs... unfortunately I can't use 'with' because I can't find a clean way to make the with block code conditional. PEP377 suggested some mechanics that seemed a bit complicated for getting the desired effect, but I think, and correct me if I'm wrong, that the same effect could be achieved by having the __enter__ function raise a StopIteration that would be caught by the context and skip directly to the __exit__ function. The semantics of this even make some sense too me, since the closest I've been able to get to what I had hoped for was using an iterator to execute the appropriate code before and after the loop block: def cachedcli(*args) hashedpath = hashon(args) for _ in cacheon(hashedpath): cli(hashedpath,*args) return hashedpath this still seems non-ideal to me... From ncoghlan at gmail.com Wed Feb 29 09:23:39 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 29 Feb 2012 18:23:39 +1000 Subject: [Python-ideas] revisit pep 377: good use case? 
In-Reply-To: References: Message-ID: On Wed, Feb 29, 2012 at 5:02 PM, Craig Yoshioka wrote: > PEP 377 suggested some mechanics that seemed a bit complicated for getting the desired effect, but I think, and correct me if I'm wrong, that the same effect could be achieved by having the __enter__ function raise a StopIteration that would be caught by the context and skip directly to the __exit__ function. It was the overhead of doing exception handling around the __enter__ call that got PEP 377 rejected. One way to handle this case is to use a separate if statement to make the flow control clear:

    with cm() as run_body:
        if run_body:
            # Do stuff

Depending on the use case, the return value from __enter__ may be a simple flag as shown, or it may be a more complex object. Alternatively, you may want to investigate contextlib2, which aims to provide improved support for conditional cleanup in with statements. (In the current version, this is provided by contextlib2.ContextStack, but the next version will offer an improved API as contextlib2.CallbackStack. No current ETA on the next update, though.) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From taleinat at gmail.com Wed Feb 29 15:07:19 2012 From: taleinat at gmail.com (Tal Einat) Date: Wed, 29 Feb 2012 16:07:19 +0200 Subject: [Python-ideas] revisit pep 377: good use case? In-Reply-To: References: Message-ID: On Wed, Feb 29, 2012 at 09:02, Craig Yoshioka wrote: > So I've recently been trying to implement something for which I had hoped the 'with' statement would be perfect, but it turns out, won't work because Python provides no mechanism by which to skip the block of code in a with statement. > > I want to create some functionality to make it easy to wrap command line programs in a caching architecture. To do this there are some things that need to happen before and after the wrapped CLI program is called; a try/except/finally version might look like this: > > def cachedcli(*args):
?try: > ? ? ? ?hashedoutput = hashon(args) > ? ? ? ?if iscached(): > ? ? ? ? ? ?return hashedoutput > ? ? ? ?acquirelock() > ? ? ? ?cli(*args,hashedoutput) > ? ? ? ?iscached(True) > ? ? ? ?return hashedoutput > ? ?except AlreadyLocked: > ? ? ? ?while locked: > ? ? ? ? ? ?wait() > ? ? ? ?return example(*args) > ? ?finally: > ? ? ? ?releaselock() > > the 'with' version would look like > > def cachedcli(*args) > ? ?hashedpath = hashon(args) > ? ?with cacheon(hashedpath): > ? ? ? ? cli(hashedpath,*args) > ? ?return hashedpath > > > So obviously the 'with' statement would be a good fit, especially since non-python programmers might be wrapping their CLI programs... unfortunately I can't use 'with' because I can't find a clean way to make the with block code conditional. > > PEP377 suggested some mechanics that seemed a bit complicated for getting the desired effect, but I think, and correct me if I'm wrong, that the same effect could be achieved by having the __enter__ function raise a StopIteration that would be caught by the context and skip directly to the __exit__ function. ?The semantics of this even make some sense too me, since the closest I've been able to get to what I had hoped for was using an iterator to execute the appropriate code before and after the loop block: > > def cachedcli(*args) > ? ?hashedpath = hashon(args) > ? ?for _ in cacheon(hashedpath): > ? ? ? ? cli(hashedpath,*args) > ? ?return hashedpath > > this still seems non-ideal to me... Specifically with regard to caching, I recommend writing a CLI execution class which implements the caching logic internally. If you really want to do this with some special syntax sugar, use decorators, which are good for wrapping functions/methods with caching. The "with" statement is IMO not suitable here (and rightfully so). 
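Tal's decorator recommendation needs no language change; a minimal in-memory sketch (the names cached, run_cli, and the dict store are hypothetical stand-ins, not the on-disk, lock-protected design discussed in the thread):

```python
import functools

def cached(func):
    # Memoize results by positional args so repeated calls reuse the
    # stored result instead of re-running the expensive work.
    store = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in store:
            store[args] = func(*args)
        return store[args]
    return wrapper

@cached
def run_cli(x):
    # stand-in for an expensive external CLI invocation
    return x * 10000
```

A real version would hash the args to an output path and store results on disk, as in Craig's examples.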
- Tal Einat

From fuzzyman at gmail.com  Wed Feb 29 15:24:30 2012
From: fuzzyman at gmail.com (Michael Foord)
Date: Wed, 29 Feb 2012 14:24:30 +0000
Subject: [Python-ideas] revisit pep 377: good use case?
In-Reply-To:
References:
Message-ID:

On 29 February 2012 08:23, Nick Coghlan wrote:
> On Wed, Feb 29, 2012 at 5:02 PM, Craig Yoshioka wrote:
> > PEP377 suggested some mechanics that seemed a bit complicated for
> getting the desired effect, but I think, and correct me if I'm wrong, that
> the same effect could be achieved by having the __enter__ function raise a
> StopIteration that would be caught by the context and skip directly to the
> __exit__ function.
>
> It was the overhead of doing exception handling around the __enter__
> call that got PEP 377 rejected.
>
> One way to handle this case is to use a separate if statement to make
> the flow control clear.
>
>     with cm() as run_body:
>         if run_body:
>             # Do stuff
>
> Depending on the use case, the return value from __enter__ may be a
> simple flag as shown, or it may be a more complex object.
>

The trouble with this is it indents all your code an extra level. One
possibility would be allowing continue in a with statement as an early
exit:

    with cm() as run_body:
        if not run_body:
            continue

Michael

>
> Alternatively, you may want to investigate contextlib2, which aims to
> provide improved support for conditional cleanup in with statements.
> (in the current version, this is provided by contextlib2.ContextStack,
> but the next version will offer an improved API as
> contextlib2.CallbackStack. No current ETA on the next update though)
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing
http://www.sqlite.org/different.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From craigyk at me.com  Wed Feb 29 18:49:17 2012
From: craigyk at me.com (Craig Yoshioka)
Date: Wed, 29 Feb 2012 17:49:17 +0000 (GMT)
Subject: [Python-ideas] revisit pep 377: good use case?
In-Reply-To:
Message-ID: <6264a70c-d8cf-49a1-85da-d7d3b40343c3@me.com>

I've tried classes, decorators, and passing the conditional using 'as', as suggested by Michael, so I disagree that with is not suitable here since I have yet to find a better alternative. If you want I can give pretty concrete examples of the ways they aren't as good. Furthermore, I think it could be argued that it makes more sense to be able to safely skip the with body without the user of the with statement having to manually catch the exception themselves... we don't make people catch the StopIteration exception manually when using iterators...

1) I can't think of many instances in Python where a block of code cannot be conditionally executed safely:
    if - obvious
    functions - need to be called
    loops - can have 0 or more iterations
    try/except/finally - even here there is the same notion of the code blocks being conditionally executed, just a bit more scrambled

In my view, the 'with' statement exists just because it is nice sugar for bracketing boilerplate around a block of code, so it might as well do that in the most general, reasonable way. And I think this behavior is pretty reasonable.
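For reference, the flag-based workaround Nick suggested (and Michael commented on) is expressible today; a minimal sketch, with uncached as a hypothetical simplified stand-in for the thread's locking version:

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def uncached(path):
    # __enter__ hands the body a flag saying whether it should run;
    # any cleanup (e.g. releasing a lock) would go after the yield.
    yield not os.path.exists(path)

target = os.path.join(tempfile.gettempdir(), "demo-output-not-created")
with uncached(target) as run_body:
    if run_body:
        pass  # compute and write the result here
```

The body still has to test the flag itself, which is exactly the extra indentation and boilerplate the thread is complaining about.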
On Feb 29, 2012, at 06:07 AM, Tal Einat wrote:

On Wed, Feb 29, 2012 at 09:02, Craig Yoshioka wrote:
> So I've recently been trying to implement something for which I had hoped the 'with' statement would be perfect, but it turns out, won't work because Python provides no mechanism by which to skip the block of code in a with statement.
>
> I want to create some functionality to make it easy to wrap command line programs in a caching architecture. To do this there are some things that need to happen before and after the wrapped CLI program is called; a try/except/finally version might look like this:
>
> def cachedcli(*args):
>     try:
>         hashedoutput = hashon(args)
>         if iscached():
>             return hashedoutput
>         acquirelock()
>         cli(hashedoutput, *args)
>         iscached(True)
>         return hashedoutput
>     except AlreadyLocked:
>         while locked:
>             wait()
>         return example(*args)
>     finally:
>         releaselock()
>
> the 'with' version would look like:
>
> def cachedcli(*args):
>     hashedpath = hashon(args)
>     with cacheon(hashedpath):
>         cli(hashedpath, *args)
>     return hashedpath
>
> So obviously the 'with' statement would be a good fit, especially since non-Python programmers might be wrapping their CLI programs... unfortunately I can't use 'with' because I can't find a clean way to make the with block code conditional.
>
> PEP377 suggested some mechanics that seemed a bit complicated for getting the desired effect, but I think, and correct me if I'm wrong, that the same effect could be achieved by having the __enter__ function raise a StopIteration that would be caught by the context and skip directly to the __exit__ function. The semantics of this even make some sense to me, since the closest I've been able to get to what I had hoped for was using an iterator to execute the appropriate code before and after the loop block:
>
> def cachedcli(*args):
>     hashedpath = hashon(args)
>     for _ in cacheon(hashedpath):
>         cli(hashedpath, *args)
>     return hashedpath
>
> this still seems non-ideal to me...

Specifically with regard to caching, I recommend writing a CLI execution class which implements the caching logic internally. If you really want to do this with some special syntax sugar, use decorators, which are good for wrapping functions/methods with caching.

The "with" statement is IMO not suitable here (and rightfully so).

- Tal Einat
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From raymond.hettinger at gmail.com  Wed Feb 29 20:30:36 2012
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Wed, 29 Feb 2012 11:30:36 -0800
Subject: [Python-ideas] doctest
In-Reply-To:
References:
Message-ID:

On Feb 17, 2012, at 1:57 PM, Mark Janssen wrote:
> I find myself wanting to use doctest for some test-driven development,
> and find myself slightly frustrated

ISTM that you're doing it wrong ;-)

Doctests are all about testing documentation, not about unit testing. And because they are very literal (in fact, intentionally stupid with respect to whitespace), doctests are inappropriate for test-driven development. It is *much* easier to test the function by hand and then cut-and-paste the test/result pair into the docstring.

Extending the doctest module to support your style of using it would likely be counter-productive, as that would encourage more people to use the wrong tool for the job -- the doctest style is almost completely at odds with the principles of unit testing (i.e. isolated/independent tests, etc).

My clients tend to use doctests quite a bit (that is what I teach), yet the need for doctest extensions almost never arises when it is being used as designed. I suggest that you try out some other third-party testing packages that are designed to accommodate other testing styles.

Raymond
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From craigyk at me.com  Wed Feb 29 20:44:01 2012
From: craigyk at me.com (Craig Yoshioka)
Date: Wed, 29 Feb 2012 11:44:01 -0800
Subject: [Python-ideas] revisit pep 377: good use case?
In-Reply-To: <4F4E7CA8.7010109@stoneleaf.us>
References: <6264a70c-d8cf-49a1-85da-d7d3b40343c3@me.com> <4F4E7CA8.7010109@stoneleaf.us>
Message-ID: <9611190B-6909-4037-9A1D-2ED701D38E3C@me.com>

Ok, I'll go clean them up to try and present them as concisely as possible. The code to skip the with body would have to go in the __enter__ method because whether the body should be executed is dependent on the semantics of the context being used. Imagine a context that looks like:

    with uncached('file') as file:
        # write data to file

Making the context skippable only from __enter__ means the person writing the context can be more confident of the possible code paths. And the person writing the body code could always just 'skip' manually anyways by returning early, i.e. per Michael's suggestion:

    with uncached('file') as file:
        if not file:
            return

which isn't so bad, except it is overloading the meaning of file a bit, and why shouldn't the with block be skippable? I can see a couple of ways it might work:

1) catch a raised StopIteration, or a new 'SkipWithBlock', exception thrown from the __enter__ code
2) skip the with block when __enter__ returns a unique value like SkipWithBlock, otherwise assign the returned value using 'as'

In my mind 2 should be easy to implement, and shouldn't break any existing code since the new sentinel value didn't exist before anyways. Maybe it would also be more efficient than wrapping __enter__ in yet another try/except/finally?

On Feb 29, 2012, at 11:29 AM, Ethan Furman wrote:

> Craig Yoshioka wrote:
>> I've tried classes, decorators, and passing the conditional using 'as', as suggested by Michael, so I disagree that with is not suitable here since I have yet to find a better alternative. If you want I can give pretty concrete examples of the ways they aren't as good.
>
> I would be interested in your concrete examples.
>
> As far as conditionally skipping the with body, where would that code go? In __enter__? How would it know whether or not to skip?
>
> ~Ethan~

From ethan at stoneleaf.us  Wed Feb 29 20:29:44 2012
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 29 Feb 2012 11:29:44 -0800
Subject: [Python-ideas] revisit pep 377: good use case?
In-Reply-To: <6264a70c-d8cf-49a1-85da-d7d3b40343c3@me.com>
References: <6264a70c-d8cf-49a1-85da-d7d3b40343c3@me.com>
Message-ID: <4F4E7CA8.7010109@stoneleaf.us>

Craig Yoshioka wrote:
> I've tried classes, decorators, and passing the conditional using 'as',
> as suggested by Michael, so I disagree that with is not suitable here
> since I have yet to find a better alternative. If you want I can give
> pretty concrete examples of the ways they aren't as good.

I would be interested in your concrete examples.

As far as conditionally skipping the with body, where would that code go? In __enter__? How would it know whether or not to skip?

~Ethan~

From arnodel at gmail.com  Wed Feb 29 21:29:43 2012
From: arnodel at gmail.com (Arnaud Delobelle)
Date: Wed, 29 Feb 2012 20:29:43 +0000
Subject: [Python-ideas] revisit pep 377: good use case?
In-Reply-To: <6264a70c-d8cf-49a1-85da-d7d3b40343c3@me.com>
References: <6264a70c-d8cf-49a1-85da-d7d3b40343c3@me.com>
Message-ID:

On 29 February 2012 17:49, Craig Yoshioka wrote:
> I've tried classes, decorators, and passing the conditional using 'as', as
> suggested by Michael, so I disagree that with is not suitable here since I
> have yet to find a better alternative. If you want I can give pretty
> concrete examples of the ways they aren't as good. Furthermore, I think it
> could be argued that it makes more sense to be able to safely skip the with
> body without the user of the with statement having to manually catch the
> exception themselves....
From PEP 343:

    But the final blow came when I read Raymond Chen's rant about
    flow-control macros[1].  Raymond argues convincingly that hiding
    flow control in macros makes your code inscrutable, and I find
    that his argument applies to Python as well as to C.

So it is explicitly stated that the with statement should not be
capable of controlling the flow.

--
Arnaud

[1] Raymond Chen's article on hidden flow control
http://blogs.msdn.com/oldnewthing/archive/2005/01/06/347666.aspx

From ethan at stoneleaf.us  Wed Feb 29 20:55:30 2012
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 29 Feb 2012 11:55:30 -0800
Subject: [Python-ideas] revisit pep 377: good use case?
In-Reply-To: <9611190B-6909-4037-9A1D-2ED701D38E3C@me.com>
References: <6264a70c-d8cf-49a1-85da-d7d3b40343c3@me.com> <4F4E7CA8.7010109@stoneleaf.us> <9611190B-6909-4037-9A1D-2ED701D38E3C@me.com>
Message-ID: <4F4E82B2.5050809@stoneleaf.us>

Craig Yoshioka wrote:
> Ok, I'll go clean them up to try and present them as concisely as possible. The code to skip the with body would have to go in the __enter__ method because whether the body should be executed is dependent on the semantics of the context being used. Imagine a context that looks like:
>
>     with uncached('file') as file:
>         # write data to file
>
> Making the context skippable only from __enter__ means the person writing the context can be more confident of the possible code paths. And the person writing the body code could always just 'skip' manually anyways by returning early, i.e. per Michael's suggestion:
>
>     with uncached('file') as file:
>         if not file:
>             return

Can you give an example of the code that would be in __enter__?

~Ethan~

From ncoghlan at gmail.com  Wed Feb 29 22:11:39 2012
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 1 Mar 2012 07:11:39 +1000
Subject: [Python-ideas] revisit pep 377: good use case?
In-Reply-To:
References: <6264a70c-d8cf-49a1-85da-d7d3b40343c3@me.com>
Message-ID:

On Thu, Mar 1, 2012 at 6:29 AM, Arnaud Delobelle wrote:
> On 29 February 2012 17:49, Craig Yoshioka wrote:
>> I've tried classes, decorators, and passing the conditional using 'as', as
>> suggested by Michael, so I disagree that with is not suitable here since I
>> have yet to find a better alternative. If you want I can give pretty
>> concrete examples of the ways they aren't as good. Furthermore, I think it
>> could be argued that it makes more sense to be able to safely skip the with
>> body without the user of the with statement having to manually catch the
>> exception themselves....
>
> From PEP 343:
>
>     But the final blow came when I read Raymond Chen's rant about
>     flow-control macros[1].  Raymond argues convincingly that hiding
>     flow control in macros makes your code inscrutable, and I find
>     that his argument applies to Python as well as to C.
>
> So it is explicitly stated that the with statement should not be
> capable of controlling the flow.

Indeed.

Craig, if you want to pursue this to the extent of writing up a full
PEP, I suggest starting with the idea I briefly wrote up a while ago
[1].

Instead of changing the semantics of __enter__, add a new optional
method __entered__ to the protocol that executes inside the with
statement's implicit try/except block.

That is (glossing over the complexities in the real with statement
expansion), something roughly like:

    _exit = cm.__exit__
    _entered = getattr(cm, "__entered__", None)
    _var = cm.__enter__()
    try:
        if _entered is not None:
            _var = _entered(_var)
        VAR = _var  # if 'as' clause is present
        # with statement body
    finally:
        _exit(*sys.exc_info())

Then CMs would be free to skip directly from __entered__ to __exit__
by raising a custom exception. GeneratorContextManagers could
similarly be updated to handle the case where the underlying generator
doesn't yield.
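The proposed protocol is hypothetical (no Python version implements __entered__), but the sketched expansion can be emulated today with a helper function to experiment with the semantics; all names below are illustrative:

```python
import sys

class _SkipBody(Exception):
    # Raised from __entered__ to skip the body (hypothetical protocol).
    pass

def run_with(cm, body):
    # Emulates the sketched expansion: __enter__, then the optional
    # __entered__ hook inside the try block, then the body, with
    # __exit__ always invoked.
    _exit = cm.__exit__
    _entered = getattr(cm, "__entered__", None)
    _var = cm.__enter__()
    try:
        if _entered is not None:
            _var = _entered(_var)
        body(_var)
    except _SkipBody:
        pass
    finally:
        _exit(*sys.exc_info())

class MaybeRun:
    # A toy CM that decides in __entered__ whether the body runs.
    def __init__(self, run):
        self.run = run
    def __enter__(self):
        return "resource"
    def __entered__(self, var):
        if not self.run:
            raise _SkipBody
        return var
    def __exit__(self, *exc_info):
        return False
```

Here run_with(MaybeRun(False), body) calls __enter__ and __exit__ but never the body, which is the behavior PEP 377 was after.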
However, that last point highlights why I no longer like the idea: it
makes it *really* easy to accidentally create CMs that, instead of
throwing an exception if you try to reuse them inappropriately, will
instead silently skip the with statement body.

The additional expressiveness provided by such a construct is minimal,
but the additional risk of incorrectly silencing errors is quite high -
that's not a good trade-off for the overall language design.

[1] http://readthedocs.org/docs/ncoghlan_devs-python-notes/en/latest/pep_ideas/skip_with.html

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From barry at python.org  Wed Feb 29 22:35:56 2012
From: barry at python.org (Barry Warsaw)
Date: Wed, 29 Feb 2012 16:35:56 -0500
Subject: [Python-ideas] [OT] rst-mode (was: doctest)
References: <20120228171414.3cc7e38a@limelight.wooz.org> <87aa42ph4d.fsf_-_@benfinney.id.au>
Message-ID: <20120229163556.67d31009@resist.wooz.org>

On Feb 29, 2012, at 12:23 PM, Ben Finney wrote:

>Barry Warsaw writes:
>
>> Actually, Emacs users should use rst-mode, which has not-so-bad support
>> for separate file doctests. Of course, the mode is useful for reST
>> documentation even if your documentation is untested.
>
>Any idea where I should send bug reports for ‘rst-mode’? It's not clear
>to me who develops it.

From the head of the file that I have in my personal elisp:

    ;;; rst.el --- Mode for viewing and editing reStructuredText-documents.
    ;; Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
    ;;   Free Software Foundation, Inc.
    ;; Maintainer: Stefan Merten
    ;; Author: Martin Blais ,
    ;;         David Goodger ,
    ;;         Wei-Wei Guo

Cheers,
-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL:

From craigyk at me.com  Wed Feb 29 22:47:45 2012
From: craigyk at me.com (Craig Yoshioka)
Date: Wed, 29 Feb 2012 13:47:45 -0800
Subject: [Python-ideas] revisit pep 377: good use case?
In-Reply-To: <4F4E82B2.5050809@stoneleaf.us>
References: <6264a70c-d8cf-49a1-85da-d7d3b40343c3@me.com> <4F4E7CA8.7010109@stoneleaf.us> <9611190B-6909-4037-9A1D-2ED701D38E3C@me.com> <4F4E82B2.5050809@stoneleaf.us>
Message-ID: <3395104A-4BE9-4372-BC6F-23AB5EA89A85@me.com>

On Feb 29, 2012, at 11:55 AM, Ethan Furman wrote:

> From PEP 343:
>
>     But the final blow came when I read Raymond Chen's rant about
>     flow-control macros[1].  Raymond argues convincingly that hiding
>     flow control in macros makes your code inscrutable, and I find
>     that his argument applies to Python as well as to C.
>
> So it is explicitly stated that the with statement should not be
> capable of controlling the flow.

I read the rant, and I agree in principle, but I think it's also a far stretch to draw a line between a very confusing non-standard example of macros in C and documentable behavior of a built-in statement. That is, the only reason you might say with would be hiding flow control is because people don't currently expect it to. I also think that when people use non-builtin context managers it's usually within a very specific... context (*dammit*), and so they are likely to look up why they are using an object as a context manager. That's where you would document the behavior:

    with uncached(path):
        # code here only executes if the path does not exist

> Indeed.
>
> Craig, if you want to pursue this to the extent of writing up a full
> PEP, I suggest starting with the idea I briefly wrote up a while ago
> [1].
>
> Instead of changing the semantics of __enter__, add a new optional
> method __entered__ to the protocol that executes inside the with
> statement's implicit try/except block.
>
> That is (glossing over the complexities in the real with statement
> expansion), something roughly like:
>
>     _exit = cm.__exit__
>     _entered = getattr(cm, "__entered__", None)
>     _var = cm.__enter__()
>     try:
>         if _entered is not None:
>             _var = _entered(_var)
>         VAR = _var  # if 'as' clause is present
>         # with statement body
>     finally:
>         _exit(*sys.exc_info())

That is an interesting alternative... do you see that as much better than __enter__ passing some sort of unique value to signal the skip? I can't say I'm enamored of doing it with a signal value, just thought it would be easier to implement (and not require more exception handling):

    _exit = cm.__exit__
    _var = cm.__enter__()
    if _var is SkipWithBody:
        _exit(None, None, None)
    else:
        try:
            VAR = _var  # if 'as' clause is present
            # with statement body
        finally:
            _exit(*sys.exc_info())

From ethan at stoneleaf.us  Wed Feb 29 23:03:00 2012
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 29 Feb 2012 14:03:00 -0800
Subject: [Python-ideas] revisit pep 377: good use case?
In-Reply-To: <3395104A-4BE9-4372-BC6F-23AB5EA89A85@me.com>
References: <6264a70c-d8cf-49a1-85da-d7d3b40343c3@me.com> <4F4E7CA8.7010109@stoneleaf.us> <9611190B-6909-4037-9A1D-2ED701D38E3C@me.com> <4F4E82B2.5050809@stoneleaf.us> <3395104A-4BE9-4372-BC6F-23AB5EA89A85@me.com>
Message-ID: <4F4EA094.30907@stoneleaf.us>

Craig Yoshioka wrote:
> On Feb 29, 2012, at 11:55 AM, Ethan Furman wrote:
>
>> From PEP 343:
>>
>>     But the final blow came when I read Raymond Chen's rant about
>>     flow-control macros[1].  Raymond argues convincingly that hiding
>>     flow control in macros makes your code inscrutable, and I find
>>     that his argument applies to Python as well as to C.
>>
>> So it is explicitly stated that the with statement should not be
>> capable of controlling the flow.
>>
>> I read the rant, and I agree in principle, but I think it's also a far stretch to draw a line between a very confusing non-standard example of macros in C and documentable behavior of a built-in statement. That is, the only reason you might say with would be hiding flow control is because people don't currently expect it to. I also think that when people use non-builtin context managers it's usually within a very specific... context (*dammit*), and so they are likely to look up why they are using an object as a context manager. That's where you would document the behavior:
>>
>>     with uncached(path):
>>         # code here only executes if the path does not exist

I am -1 on the idea.

if / while / for / try are *always* flow control.

Your proposal would have 'with' sometimes being flow control, and sometimes not, and the only way to know is to look at the object's code and/or docs. This makes for a lot more complication for very little gain.

~Ethan~

From ironfroggy at gmail.com  Wed Feb 29 23:23:05 2012
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Wed, 29 Feb 2012 17:23:05 -0500
Subject: [Python-ideas] revisit pep 377: good use case?
In-Reply-To: <4F4EA094.30907@stoneleaf.us>
References: <6264a70c-d8cf-49a1-85da-d7d3b40343c3@me.com> <4F4E7CA8.7010109@stoneleaf.us> <9611190B-6909-4037-9A1D-2ED701D38E3C@me.com> <4F4E82B2.5050809@stoneleaf.us> <3395104A-4BE9-4372-BC6F-23AB5EA89A85@me.com> <4F4EA094.30907@stoneleaf.us>
Message-ID:

On Feb 29, 2012 4:56 PM, "Ethan Furman" wrote:
>
> Craig Yoshioka wrote:
>>
>> On Feb 29, 2012, at 11:55 AM, Ethan Furman wrote:
>>
>>> From PEP 343:
>>>
>>>     But the final blow came when I read Raymond Chen's rant about
>>>     flow-control macros[1].  Raymond argues convincingly that hiding
>>>     flow control in macros makes your code inscrutable, and I find
>>>     that his argument applies to Python as well as to C.
>>>
>>> So it is explicitly stated that the with statement should not be
>>> capable of controlling the flow.
>>>
>>
>> I read the rant, and I agree in principle, but I think it's also a far stretch to draw a line between a very confusing non-standard example of macros in C and documentable behavior of a built-in statement. That is, the only reason you might say with would be hiding flow control is because people don't currently expect it to. I also think that when people use non-builtin context managers it's usually within a very specific... context (*dammit*), and so they are likely to look up why they are using an object as a context manager. That's where you would document the behavior:
>>
>>     with uncached(path):
>>         # code here only executes if the path does not exist
>
> I am -1 on the idea.
>
> if / while / for / try are *always* flow control.
>
> Your proposal would have 'with' sometimes being flow control, and sometimes not, and the only way to know is to look at the object's code and/or docs. This makes for a lot more complication for very little gain.
>
> ~Ethan~

I like the general idea, but a conditionally conditional control syntax is a readability nightmare. However, I wonder if the case in which the with statement acts as a conditional could be explicit, so a reader can distinguish between those that will always execute their body and those which may or may not:

    with cached(key):
        do_caching()
    else:
        update_exp(key)

> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ethan at stoneleaf.us  Wed Feb 29 22:48:26 2012
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 29 Feb 2012 13:48:26 -0800
Subject: [Python-ideas] revisit pep 377: good use case?
In-Reply-To: <4F4E82B2.5050809@stoneleaf.us>
References: <6264a70c-d8cf-49a1-85da-d7d3b40343c3@me.com> <4F4E7CA8.7010109@stoneleaf.us> <9611190B-6909-4037-9A1D-2ED701D38E3C@me.com> <4F4E82B2.5050809@stoneleaf.us>
Message-ID: <4F4E9D2A.9080709@stoneleaf.us>

Ethan Furman wrote:

re-posting to list

Craig Yoshioka wrote:
> Here is what the context might look like:
>
> class Uncached(object):
>     def __init__(self, path):
>         self.path = path
>         self.lock = path + '.locked'
>     def __enter__(self):
>         if os.path.exists(self.path):
>             return SkipWithBlock  # skips body, goes straight to __exit__
>         try:
>             os.close(os.open(self.lock, os.O_CREAT|os.O_EXCL|os.O_RDWR))
>         except OSError as e:
>             if e.errno != errno.EEXIST:
>                 raise
>             while os.path.exists(self.lock):
>                 time.sleep(0.1)
>             return self.__enter__()
>         return self.path
>     def __exit__(self, et, ev, st):
>         if os.path.exists(self.lock):
>             os.unlink(self.lock)
>
> class Cache(object):
>     def __init__(self, *args, **kwargs):
>         self.base = os.path.join(CACHE_DIR, hashon(args, kwargs))
>     #.....
>     def create(self, path):
>         return Uncached(os.path.join(self.base, path))
>     #.....
>
> def cached(func):
>     def wrapper(*args, **kwargs):
>         cache = Cache(*args, **kwargs)
>         return func(cache, *args, **kwargs)
>     return wrapper
>
> ---------------------------------------------------------------------
> Person using code:
> ---------------------------------------------------------------------
>
> @cached
> def createdata(cache, x):
>     path = cache.pathfor('output.data')
>     with cache.create(path) as cpath:
>         with open(cpath, 'wb') as cfile:
>             cfile.write(x*10000)
>     return path
>
> pool.map(createdata, ['x', 'x', 't', 'x', 't'])
>
> ---------------------------------------------------------------------
>
> so separate processes return the path to the cached data and create it
> if it doesn't exist, and even wait if another process is working on
> it.
>
> my collaborators could hopefully very easily wrap their programs with
> minimal effort using the cleanest syntax possible,
> and since inputs get hashed to consistent output paths for each
> wrapped function, the wrapped functions can be easily combined,
> chained, etc. and behind the scenes they are reusing as much work as
> possible.
>
> Here are the current possible alternatives:
>
> 1. use the passed var as a flag; they must insert the if for every use
> of the context -- if not, then cached results get recomputed:
>
> @cached
> def createdata(cache, x):
>     path = cache.pathfor('output.data')
>     with cache.create(path) as cpath:
>         if not cpath: return
>         with open(cpath, 'wb') as cfile:
>             cfile.write(x*10000)
>     return path
>
> 2. using a for loop and an iterator instead of a context is more
> fool-proof, but a bit confusing:
>
> @cached
> def createdata(cache, x):
>     path = cache.pathfor('output.data')
>     for cpath in cache.create(path):
>         if not cpath: return
>         with open(cpath, 'wb') as cfile:
>             cfile.write(x*10000)
>     return path
>
> 3. using a class, the outputs and caching function need to be specified
> separately so that calls can be scripted together; also a lot more
> boilerplate:
>
> class createdata(CachedWrapper):
>     def outputs(self, x):
>         self.outputs += [self.cache.pathfor('output.data')]
>     def tocache(self, x):
>         with open(self.outputs[0], 'wb') as cfile:
>             cfile.write(x*10000)