From 4kir4.1i at gmail.com Mon Sep 1 21:14:04 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Mon, 01 Sep 2014 23:14:04 +0400 Subject: [Python-ideas] Subprocess: Add an encoding argument References: Message-ID: <8738cbrw5f.fsf@gmail.com> Paul Moore writes: > I propose adding an "encoding" parameter to subprocess.Popen (and the > various wrapper routines) to allow specifying the actual encoding to > use. > > Obviously, you can simply wrap the binary streams yourself - the main > use for this facility would be in the higher level functions like > check_output and communicate. > > Does this seem like a reasonable suggestion? Could you provide examples how the final result could look like? For example, to read utf-8 encoded byte stream as a text with universal newline mode enabled: with (Popen(cmd, stdout=PIPE, bufsize=1) as p, TextIOWrapper(p.stdout, encoding='utf-8') as pipe): for line in pipe: process(line) Or the same, all at once: lines = check_output(cmd).decode('utf-8').splitlines() #XXX issue22232 -- Akira From p.f.moore at gmail.com Mon Sep 1 21:29:46 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 1 Sep 2014 20:29:46 +0100 Subject: [Python-ideas] Subprocess: Add an encoding argument In-Reply-To: <8738cbrw5f.fsf@gmail.com> References: <8738cbrw5f.fsf@gmail.com> Message-ID: On 1 September 2014 20:14, Akira Li <4kir4.1i at gmail.com> wrote: > Could you provide examples how the final result could look like? Do you mean what I'm proposing? p = Popen(..., encoding='utf-8') p.stdout is now a text stream assuming the data is in UTF8, rather than assuming it's in the default encoding. > For example, to read utf-8 encoded byte stream as a text with universal > newline mode enabled: > > with (Popen(cmd, stdout=PIPE, bufsize=1) as p, > TextIOWrapper(p.stdout, encoding='utf-8') as pipe): > for line in pipe: > process(line) That looks like sort of what I had in mind as a workaround. I hadn't tried it to confirm it worked, though. 
> Or the same, all at once: > > lines = check_output(cmd).decode('utf-8').splitlines() #XXX issue22232 Yes, essentially, although the need for an explicit decode feels a bit ugly to me... Paul From python at mrabarnett.plus.com Mon Sep 1 22:05:48 2014 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 01 Sep 2014 21:05:48 +0100 Subject: [Python-ideas] Subprocess: Add an encoding argument In-Reply-To: <8738cbrw5f.fsf@gmail.com> References: <8738cbrw5f.fsf@gmail.com> Message-ID: <5404D19C.1080501@mrabarnett.plus.com> On 2014-09-01 20:14, Akira Li wrote: > Paul Moore writes: > >> I propose adding an "encoding" parameter to subprocess.Popen (and the >> various wrapper routines) to allow specifying the actual encoding to >> use. >> >> Obviously, you can simply wrap the binary streams yourself - the main >> use for this facility would be in the higher level functions like >> check_output and communicate. >> >> Does this seem like a reasonable suggestion? > > Could you provide examples how the final result could look like? 
> > For example, to read utf-8 encoded byte stream as a text with universal > newline mode enabled: > > with (Popen(cmd, stdout=PIPE, bufsize=1) as p, > TextIOWrapper(p.stdout, encoding='utf-8') as pipe): > for line in pipe: > process(line) > You can parenthesise multiple context managers like that, and, anyway, I think it would be clearer as: with Popen(cmd, stdout=PIPE, bufsize=1) as p: for line in TextIOWrapper(p.stdout, encoding='utf-8'): process(line) > Or the same, all at once: > > lines = check_output(cmd).decode('utf-8').splitlines() #XXX issue22232 > From 4kir4.1i at gmail.com Mon Sep 1 22:08:58 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Tue, 02 Sep 2014 00:08:58 +0400 Subject: [Python-ideas] Subprocess: Add an encoding argument References: <8738cbrw5f.fsf@gmail.com> Message-ID: <87tx4rqf1h.fsf@gmail.com> Paul Moore writes: > On 1 September 2014 20:14, Akira Li > <4kir4.1i at gmail.com> wrote: >> Could you provide examples how the final result could look like? > > Do you mean what I'm proposing? > > p = Popen(..., encoding='utf-8') > p.stdout is now a text stream assuming the data is in UTF8, rather > than assuming it's in the default encoding. What if you want to specify an error handler e.g., to read a file list from `find -print0` -like program: you could pass errors='surrogateescape', newlines='\0' (issue1152248) to TextIOWrapper(p.stdin). Both errors and newlines can be different for stdin/stdout pipes. >> For example, to read utf-8 encoded byte stream as a text with universal >> newline mode enabled: >> >> with (Popen(cmd, stdout=PIPE, bufsize=1) as p, >> TextIOWrapper(p.stdout, encoding='utf-8') as pipe): >> for line in pipe: >> process(line) > > That looks like sort of what I had in mind as a workaround. I hadn't > tried it to confirm it worked, though. 
> >> Or the same, all at once: >> >> lines = check_output(cmd).decode('utf-8').splitlines() #XXX issue22232 > > Yes, essentially, although the need for an explicit decode feels a bit > ugly to me... > -- Akira From 4kir4.1i at gmail.com Mon Sep 1 22:33:38 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Tue, 02 Sep 2014 00:33:38 +0400 Subject: [Python-ideas] Subprocess: Add an encoding argument References: <8738cbrw5f.fsf@gmail.com> <5404D19C.1080501@mrabarnett.plus.com> Message-ID: <87ppffqdwd.fsf@gmail.com> MRAB writes: > On 2014-09-01 20:14, Akira Li wrote: >> Paul Moore writes: >> >>> I propose adding an "encoding" parameter to subprocess.Popen (and the >>> various wrapper routines) to allow specifying the actual encoding to >>> use. >>> >>> Obviously, you can simply wrap the binary streams yourself - the main >>> use for this facility would be in the higher level functions like >>> check_output and communicate. >>> >>> Does this seem like a reasonable suggestion? >> >> Could you provide examples how the final result could look like? >> >> For example, to read utf-8 encoded byte stream as a text with universal >> newline mode enabled: >> >> with (Popen(cmd, stdout=PIPE, bufsize=1) as p, >> TextIOWrapper(p.stdout, encoding='utf-8') as pipe): >> for line in pipe: >> process(line) >> > You can parenthesise multiple context managers like that, and, anyway, You mean: "can't". I know [1] [1] https://mail.python.org/pipermail/python-dev/2014-August/135769.html > I think it would be clearer as: > > with Popen(cmd, stdout=PIPE, bufsize=1) as p: > for line in TextIOWrapper(p.stdout, encoding='utf-8'): > process(line) > It is a habit to use the explicit with-statement for file-like objects. You are right -- it is not necessary in this case. Though with-statement forces file.close() in time and you don't need to consider carefully what happens if it is called by a garbage collector (if at all) at some indeterminate time in the future. 
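A runnable sketch of the manual wrapping being discussed, with the encoding and error handler chosen per stream (the command used here is only an illustrative stand-in, and 'surrogateescape' is shown as one possible error handler):

```python
# Wrap the byte pipe in TextIOWrapper, choosing encoding and error
# handler explicitly. The command is only illustrative.
import io
from subprocess import Popen, PIPE

cmd = ['find', '.', '-maxdepth', '1']
lines = []
with Popen(cmd, stdout=PIPE) as p:
    # surrogateescape lets undecodable filename bytes round-trip
    # through str and back via os.fsencode()
    with io.TextIOWrapper(p.stdout, encoding='utf-8',
                          errors='surrogateescape') as pipe:
        for line in pipe:
            lines.append(line.rstrip('\n'))
```

The inner with-statement makes the close deterministic, as described above, while the Popen with-statement waits for the child.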
-- Akira From abarnert at yahoo.com Mon Sep 1 22:37:00 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Sep 2014 13:37:00 -0700 Subject: [Python-ideas] Subprocess: Add an encoding argument In-Reply-To: <87tx4rqf1h.fsf@gmail.com> References: <8738cbrw5f.fsf@gmail.com> <87tx4rqf1h.fsf@gmail.com> Message-ID: <1409603820.24466.YahooMailNeo@web181001.mail.ne1.yahoo.com> On Monday, September 1, 2014 1:10 PM, Akira Li <4kir4.1i at gmail.com> wrote: >Paul Moore writes: > >> On 1 September 2014 20:14, Akira Li >> <4kir4.1i at gmail.com> wrote: >>> Could you provide examples how the final result could look like? >> >> Do you mean what I'm proposing? >> >> p = Popen(..., encoding='utf-8') >> p.stdout is now a text stream assuming the data is in UTF8, rather >> than assuming it's in the default encoding. > >What if you want to specify an error handler e.g., to read a file list >from `find -print0` -like program: you could pass >errors='surrogateescape', newlines='\0' (issue1152248) to >TextIOWrapper(p.stdin). Presumably you either meant passing them to `TextIOWrapper(p.stdout)` for `find -print0`, or passing them to `TextIOWrapper(p.stdin)` for `xargs -0`; find doesn't even look at its input. >Both errors and newlines can be different for stdin/stdout pipes. This brings up a good point: having a single encoding, errors, and newlines set of parameters for Popen and the convenience functions implies that you want to pass the same ones to all pipes. But how often is that true? In your particular case, for `find -print0`, you want `newlines='\0'` on stdout, but not on stderr. For the convenience methods that's probably not an issue, because the only way to read both stdout and stderr is to reroute the latter to the former anyway. But even there, you might not necessarily want input and output to be the same -- `xargs -0` is a perfect example of that. And, even forgetting #1152248, it's not hard to think of cases where you want input and output to be different. 
For example, I've got an old script that selects and cats a bunch of old Excel-format CSV files (in CP-1252, CRLF) off a file server, based on input data in native text files (which on my machine means UTF-8, LF). Using it with binary pipes is pretty easy, changing it to explicitly wrap each pipe in the appropriate `TextIOWrapper` would be easy, being able to pass an encoding and newline value to the Popen would be misleading. But as long as there are enough use cases for wanting to pass the same arguments for all pipes, I think the suggestion is OK. Especially considering that often you only want one pipe in the first place, which counts as a use case for passing the same arguments for all 1 pipe, right? (By the way, thanks for this reminder to finish testing and cleaning up that patch for #1152248.) From 4kir4.1i at gmail.com Mon Sep 1 22:53:42 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Tue, 02 Sep 2014 00:53:42 +0400 Subject: [Python-ideas] Subprocess: Add an encoding argument References: <8738cbrw5f.fsf@gmail.com> <87tx4rqf1h.fsf@gmail.com> <1409603820.24466.YahooMailNeo@web181001.mail.ne1.yahoo.com> Message-ID: <87lhq3qcyx.fsf@gmail.com> Andrew Barnert writes: > On Monday, September 1, 2014 1:10 PM, Akira Li <4kir4.1i at gmail.com> wrote: > >>Paul Moore writes: >> >>> On 1 September 2014 20:14, Akira Li >>> <4kir4.1i at gmail.com> wrote: >>>> Could you provide examples how the final result could look like? >>> >>> Do you mean what I'm proposing? >>> >>> p = Popen(..., encoding='utf-8') >>> p.stdout is now a text stream assuming the data is in UTF8, rather >>> than assuming it's in the default encoding. >> >>What if you want to specify an error handler e.g., to read a file list >>from `find -print0` -like program: you could pass >>errors='surrogateescape', newlines='\0' (issue1152248) to >>TextIOWrapper(p.stdin). 
> > Presumably you either meant passing them to `TextIOWrapper(p.stdout)` > for `find -print0`, or passing them to `TextIOWrapper(p.stdin)` for > xargs -0`; find doesn't even look at its input. > You are right. I've looked at 'surrogateescape' that means reading, that is associated with sys.stdin; so I wrote p.stdin instead of p.stdout by mistake. -- Akira From p.f.moore at gmail.com Mon Sep 1 23:15:28 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 1 Sep 2014 22:15:28 +0100 Subject: [Python-ideas] Subprocess: Add an encoding argument In-Reply-To: <1409603820.24466.YahooMailNeo@web181001.mail.ne1.yahoo.com> References: <8738cbrw5f.fsf@gmail.com> <87tx4rqf1h.fsf@gmail.com> <1409603820.24466.YahooMailNeo@web181001.mail.ne1.yahoo.com> Message-ID: On 1 September 2014 21:37, Andrew Barnert wrote: > This brings up a good point: having a single encoding, errors, and newlines set of parameters for Popen and the convenience functions implies that you want to pass the same ones to all pipes. But how often is that true? My proposal was purely for encoding, and was prompted by the fact that the Windows default encoding does not support all of Unicode. Setting PYTHONIOENCODING to utf-8 for a Python subprocess allows handling of all of Unicode if you can set the subprocess channels' encoding to utf-8. As PYTHONIOENCODING affects all 3 channels, being able to set a single value for all 3 channels is sufficient for that use case. Setting newline and the error handler were *not* part of my original proposal, essentially because I know of no other way to force a subprocess to use anything other than the default encoding for the standard IO streams. Handling programs that are defined as using the standard streams for anything other than normal text (nul-terminated lines, explicitly defined non-default encodings) isn't something I have any examples of. The find -print0 example is out of scope, IMO, as newline handling is different from encoding. 
At some point, it becomes easier to manually wrap the streams rather than having huge numbers of parameters to the Popen constructor. I'll think some more on this... Paul From ncoghlan at gmail.com Tue Sep 2 15:25:33 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 2 Sep 2014 23:25:33 +1000 Subject: [Python-ideas] Subprocess: Add an encoding argument In-Reply-To: References: <8738cbrw5f.fsf@gmail.com> <87tx4rqf1h.fsf@gmail.com> <1409603820.24466.YahooMailNeo@web181001.mail.ne1.yahoo.com> Message-ID: On 2 September 2014 07:15, Paul Moore wrote: > The find -print0 example is out of scope, IMO, as newline handling is > different from encoding. At some point, it becomes easier to manually > wrap the streams rather than having huge numbers of parameters to the > Popen constructor. Don't forget Antoine's suggestion of creating a TextPopen subclass that wraps the streams as strict UTF-8 by default and allows the encoding and errors arguments to be either strings (affecting all pipes) or a dictionary mapping "stdin", "stdout" and "stderr" to individual settings. With that, the simple utf-8 example just becomes: with TextPopen(cmd, stdout=PIPE) as p: for line in p.stdout: process(line) > I'll think some more on this... For your torture test, consider the "iconv" (or "win_iconv") utility, which does encoding conversions, and how you might test that from a Python program without needing to do your own encoding and decoding, but instead let the subprocess module handle it for you :) (There's a flip side to that problem which is the question of *writing* an iconv utility in Python 3, and that's why there's an open RFE to support changing the encoding of an existing stream) Cheers, Nick. 
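No TextPopen exists in the stdlib; a rough sketch of how the suggested subclass might wrap the pipes (the parameter names and defaults here are assumptions, not an agreed API):

```python
import io
import subprocess
import sys

class TextPopen(subprocess.Popen):
    """Hypothetical sketch of the suggested TextPopen: any requested
    pipes are wrapped so they carry str instead of bytes. Parameter
    names and defaults are assumptions, not a stdlib API."""

    def __init__(self, *args, encoding='utf-8', errors='strict', **kwargs):
        super().__init__(*args, **kwargs)
        # Wrap whichever pipes were requested with stdout=PIPE etc.
        if self.stdout is not None:
            self.stdout = io.TextIOWrapper(self.stdout,
                                           encoding=encoding, errors=errors)
        if self.stderr is not None:
            self.stderr = io.TextIOWrapper(self.stderr,
                                           encoding=encoding, errors=errors)
        if self.stdin is not None:
            self.stdin = io.TextIOWrapper(self.stdin, encoding=encoding,
                                          errors=errors, write_through=True)

# Demo: stdout arrives as str without manual wrapping.
with TextPopen([sys.executable, '-c', 'print("hi")'],
               stdout=subprocess.PIPE) as p:
    out = p.stdout.read()
```

Extending this so encoding/errors may also be per-pipe dictionaries, as suggested, would be a straightforward change to the wrapping step.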
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Tue Sep 2 15:43:23 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 2 Sep 2014 14:43:23 +0100 Subject: [Python-ideas] Subprocess: Add an encoding argument In-Reply-To: References: <8738cbrw5f.fsf@gmail.com> <87tx4rqf1h.fsf@gmail.com> <1409603820.24466.YahooMailNeo@web181001.mail.ne1.yahoo.com> Message-ID: On 2 September 2014 14:25, Nick Coghlan wrote: > On 2 September 2014 07:15, Paul Moore wrote: >> The find -print0 example is out of scope, IMO, as newline handling is >> different from encoding. At some point, it becomes easier to manually >> wrap the streams rather than having huge numbers of parameters to the >> Popen constructor. > > Don't forget Antoine's suggestion of creating a TextPopen subclass > that wraps the streams as strict UTF-8 by default and allows the > encoding and errors arguments to be either strings (affecting all > pipes) or a dictionary mapping "stdin", "stdout" and "stderr" to > individual settings. > > With that, the simple utf-8 example just becomes: > > with TextPopen(cmd, stdout=PIPE) as p: > for line in p.stdout: > process(line) I'd not forgotten that, but it doesn't help for the -print0 case, which is about using nul as a line ending, and not about encodings. I'm going to carefully avoid getting sucked into that open issue here, and stick to only considering encodings :-) >> I'll think some more on this... > > For your torture test, consider the "iconv" (or "win_iconv") utility, > which does encoding conversions, and how you might test that from a > Python program without needing to do your own encoding and decoding, > but instead let the subprocess module handle it for you :) That's another good use case for this functionality. 
Paul From tjreedy at udel.edu Tue Sep 2 23:29:32 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 02 Sep 2014 17:29:32 -0400 Subject: [Python-ideas] Subprocess: Add an encoding argument In-Reply-To: References: <8738cbrw5f.fsf@gmail.com> <87tx4rqf1h.fsf@gmail.com> <1409603820.24466.YahooMailNeo@web181001.mail.ne1.yahoo.com> Message-ID: On 9/2/2014 9:25 AM, Nick Coghlan wrote: > On 2 September 2014 07:15, Paul Moore wrote: >> The find -print0 example is out of scope, IMO, as newline handling is >> different from encoding. At some point, it becomes easier to manually >> wrap the streams rather than having huge numbers of parameters to the >> Popen constructor. > > Don't forget Antoine's suggestion of creating a TextPopen subclass I would expect something called Textxxx to present with (text) strings, not bytes. > that wraps the streams as strict UTF-8 by default and allows the But this implies to me that I would still get (encoded) bytes. > encoding and errors arguments to be either strings (affecting all > pipes) or a dictionary mapping "stdin", "stdout" and "stderr" to > individual settings. What I would want is automatic conversion of strings to encoded bytes on send to the pipe and automatic reconversion of encoded bytes to strings on receive. For that, there is little reason I can think of to use anything other than utf-8. > With that, the simple utf-8 example just becomes: > > with TextPopen(cmd, stdout=PIPE) as p: > for line in p.stdout: > process(line) Would type(line) be str or bytes? 
-- Terry Jan Reedy From ncoghlan at gmail.com Wed Sep 3 00:20:17 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 3 Sep 2014 08:20:17 +1000 Subject: [Python-ideas] Subprocess: Add an encoding argument In-Reply-To: References: <8738cbrw5f.fsf@gmail.com> <87tx4rqf1h.fsf@gmail.com> <1409603820.24466.YahooMailNeo@web181001.mail.ne1.yahoo.com> Message-ID: On 3 Sep 2014 07:30, "Terry Reedy" wrote: > > On 9/2/2014 9:25 AM, Nick Coghlan wrote: >> >> On 2 September 2014 07:15, Paul Moore wrote: >>> >>> The find -print0 example is out of scope, IMO, as newline handling is >>> different from encoding. At some point, it becomes easier to manually >>> wrap the streams rather than having huge numbers of parameters to the >>> Popen constructor. >> >> >> Don't forget Antoine's suggestion of creating a TextPopen subclass > > > I would expect something call Textxxx to present with (text) strings, not bytes. Exactly. >> that wraps the streams as strict UTF-8 by default and allows the > > > But this implies to me that I would still get (encoded) bytes. I'm not sure how that follows - TextPopen is making the assumption *because* it is providing a str based API, and thus needs to know the appropriate text encoding details. >> encoding and errors arguments to be either strings (affecting all >> pipes) or a dictionary mapping "stdin", "stdout" and "stderr" to >> individual settings. > > > What I would want is automatic conversion of strings to encoded bytes on send to the pipe and automatic reconersion of encoded bytes to strings on received. For that, there is little reason I can think of to use anything other than utf-8 Still plenty of other applications that use other encodings (and as I suggested to Paul, the real stress test for any proposed API is using it to call iconv to do an encoding conversion). 
>> With that, the simple utf-8 example just becomes: >> >> with TextPopen(cmd, stdout=PIPE) as p: >> for line in p.stdout: >> process(line) > > > Would type(line) be str or bytes? str, otherwise this wouldn't be any different to the existing Popen behaviour. Cheers, Nick. > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From 4kir4.1i at gmail.com Wed Sep 3 01:43:11 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Wed, 03 Sep 2014 03:43:11 +0400 Subject: [Python-ideas] Subprocess: Add an encoding argument References: <8738cbrw5f.fsf@gmail.com> <87tx4rqf1h.fsf@gmail.com> <1409603820.24466.YahooMailNeo@web181001.mail.ne1.yahoo.com> Message-ID: <8761h5r3lc.fsf@gmail.com> Paul Moore writes: > On 1 September 2014 21:37, Andrew Barnert > wrote: >> This brings up a good point: having a single encoding, errors, and >> newlines set of parameters for Popen and the convenience functions >> implies that you want to pass the same ones to all pipes. But how >> often is that true? > > My proposal was purely for encoding, and was prompted by the fact that > the Windows default encoding does not support all of Unicode. Setting > PYTHONIOENCODING to utf-8 for a Python subprocess allows handling of > all of Unicode if you can set the subprocess channels' encoding to > utf-8. As PYTHONIOENCODING affects all 3 channels, being able to set a > single value for all 3 channels is sufficient for that use case. > > Setting newline and the error handler were *not* part of my original > proposal, essentially because I know of no other way to force a > subprocess to use anything other than the default encoding for the > standard IO streams. 
Handling programs that are defined as using the > standard streams for anything other than normal text (nul-terminated > lines, explicitly defined non-default encodings) isn't something I > have any examples of. > > The find -print0 example is out of scope, IMO, as newline handling is > different from encoding. At some point, it becomes easier to manually > wrap the streams rather than having huge numbers of parameters to the > Popen constructor. > > I'll think some more on this... PYTHONIOENCODING allows to specify the error handler e.g., to avoid exceptions while reading list of files: $ ls | PYTHONIOENCODING=:surrogateescape python3 -c 'import sys; print(list(sys.stdin))' Or the same but with TextPopen suggested by Antoine: with TextPopen(['ls'], stdout=PIPE, ioencoding=':surrogateescape') as p: for filename in p.stdout: process(filename) os.fsencode(filename) would get original bytes. Note: ioencoding parameter is my interpretation. -- Akira From p.f.moore at gmail.com Wed Sep 3 09:02:47 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 3 Sep 2014 08:02:47 +0100 Subject: [Python-ideas] Subprocess: Add an encoding argument In-Reply-To: <8761h5r3lc.fsf@gmail.com> References: <8738cbrw5f.fsf@gmail.com> <87tx4rqf1h.fsf@gmail.com> <1409603820.24466.YahooMailNeo@web181001.mail.ne1.yahoo.com> <8761h5r3lc.fsf@gmail.com> Message-ID: 3 September 2014 00:43, Akira Li <4kir4.1i at gmail.com> wrote: > PYTHONIOENCODING allows to specify the error handler e.g., to avoid > exceptions while reading list of files: Thanks, I hadn't realised that. Paul From norman at kaapstorm.com Thu Sep 4 19:46:13 2014 From: norman at kaapstorm.com (Norman Hooper) Date: Thu, 04 Sep 2014 19:46:13 +0200 Subject: [Python-ideas] My apologies Message-ID: <5408A565.5060602@kaapstorm.com> I completely missed the same proposal made by Anthony Towns in March this year. 
Kind regards Norman From norman at kaapstorm.com Thu Sep 4 19:35:06 2014 From: norman at kaapstorm.com (Norman Hooper) Date: Thu, 04 Sep 2014 19:35:06 +0200 Subject: [Python-ideas] Proposal: New syntax for OrderedDict, maybe built-in Message-ID: <5408A2CA.9070408@kaapstorm.com> Hi there, I'd like to propose that the OrderedDict get a more readable syntax, the way the syntax for set changed from set(['spam', 'ham', 'eggs']) to {'spam', 'ham', 'eggs'} when it became a built-in type in Python 2.4. The way set is unordered like the keys of a dict, similarly list is ordered like the keys of an OrderedDict. In that sense, what {'ham', 'eggs'} is to {'ham': 'spam', 'eggs': 'spam'}, so ['ham', 'eggs'] is to ... my proposal of a cleaner OrderedDict syntax ... ['ham': 'spam', 'eggs': 'spam']. So a "dict" is like a "set" (hence curly braces) of key-value pairs. And an "OrderedDict" is like a "list" (hence square braces) of key-value pairs. (Of course, I am ignoring how lists can have duplicate items. An ordered set would carry the analogy further, but it wouldn't help illustrate my syntax proposal.) I find "['ham': 'spam', 'eggs': 'spam']"" unambiguous, and more readable than "OrderedDict([('ham', 'spam'), ('eggs', 'spam')])". I work with OrderedDict a lot, because JSON represents an OrderedDict and I need to work with JSON a lot. In an attempt to make the OrderedDicts that I work with more readable, I wrote a module to parse a string representation of an OrderedDict that uses my proposed syntax. You can find it here: https://github.com/kaapstorm/listdict With the ubiquity of JSON, it may also be time to promote OrderedDict to a built-in type too. 
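For comparison, today's spelling next to the proposed one (the proposed literal is not valid Python):

```python
from collections import OrderedDict

# Today's spelling:
od = OrderedDict([('ham', 'spam'), ('eggs', 'spam')])

# Proposed spelling (NOT valid Python; shown for comparison only):
#   od = ['ham': 'spam', 'eggs': 'spam']

print(list(od))  # keys keep insertion order: ['ham', 'eggs']
```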
Kind regards, Norman From dw+python-ideas at hmmz.org Thu Sep 4 20:00:44 2014 From: dw+python-ideas at hmmz.org (David Wilson) Date: Thu, 4 Sep 2014 18:00:44 +0000 Subject: [Python-ideas] Proposal: New syntax for OrderedDict, maybe built-in In-Reply-To: <5408A2CA.9070408@kaapstorm.com> References: <5408A2CA.9070408@kaapstorm.com> Message-ID: <20140904180044.GA15424@k2> On Thu, Sep 04, 2014 at 07:35:06PM +0200, Norman Hooper wrote: > I work with OrderedDict a lot, because JSON represents an OrderedDict > and I need to work with JSON a lot. ECMAScript: 4.3.3 Object An object is a member of the type Object. It is an unordered collection of properties each of which contains a primitive value, object, or function. A function stored in a property of an object is called a method. JSON: 1. Introduction An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array. > With the ubiquity of JSON, it may also be time to promote OrderedDict > to a built-in type too. Neither JSON objects nor JavaScript object properties preserve enumeration order. This is a common misconception since implementations tend to preserve order when the number of keys is small, since they may use a more efficient internal representation in that case, whose enumeration order depends on order the properties were defined in. 
David From norman at kaapstorm.com Thu Sep 4 20:11:10 2014 From: norman at kaapstorm.com (Norman Hooper) Date: Thu, 04 Sep 2014 20:11:10 +0200 Subject: [Python-ideas] Proposal: New syntax for OrderedDict, maybe built-in In-Reply-To: <20140904180044.GA15424@k2> References: <5408A2CA.9070408@kaapstorm.com> <20140904180044.GA15424@k2> Message-ID: <5408AB3E.904@kaapstorm.com> Hi, On 09/04/2014 08:00 PM, David Wilson wrote: > On Thu, Sep 04, 2014 at 07:35:06PM +0200, Norman Hooper wrote: > >> I work with OrderedDict a lot, because JSON represents an OrderedDict >> and I need to work with JSON a lot. > ECMAScript: > > 4.3.3 Object > An object is a member of the type Object. It is an unordered > collection of properties each of which contains a primitive value, > object, or function. A function stored in a property of an object is > called a method. > > JSON: > > 1. Introduction > An object is an unordered collection of zero or more name/value > pairs, where a name is a string and a value is a string, number, > boolean, null, object, or array. > You are absolutely right. And as you point out ... > implementations > tend to preserve order when the number of keys is small, since they may > use a more efficient internal representation in that case, whose > enumeration order depends on order the properties were defined in. I am not aware of any implementations (although I'm also not aware of all implementations) that do not preserve order. So (a bit like natural languages) one needs to conform with how it is used, not how it is defined, if one wants to be fully understood. 
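On the Python side, key order from a JSON document can already be preserved explicitly when parsing, via the json module's object_pairs_hook:

```python
import json
from collections import OrderedDict

text = '{"ham": "spam", "eggs": "spam"}'

# json.loads normally builds a plain dict; object_pairs_hook receives
# the key/value pairs in document order, so OrderedDict keeps them.
od = json.loads(text, object_pairs_hook=OrderedDict)
print(list(od))  # ['ham', 'eggs']
```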
Kind regards, Norman From kn0m0n3 at gmail.com Thu Sep 4 20:14:02 2014 From: kn0m0n3 at gmail.com (Jason Bursey) Date: Thu, 4 Sep 2014 13:14:02 -0500 Subject: [Python-ideas] Proposal: New syntax for OrderedDict, maybe built-in In-Reply-To: <5408AB3E.904@kaapstorm.com> References: <5408A2CA.9070408@kaapstorm.com> <20140904180044.GA15424@k2> <5408AB3E.904@kaapstorm.com> Message-ID: Any word on when Richard will be releasing a fsf/gnu friendly cell phone? thanks, Jason s.o.f.t. www.leap.cc On Thu, Sep 4, 2014 at 1:11 PM, Norman Hooper wrote: > Hi, > > > On 09/04/2014 08:00 PM, David Wilson wrote: > >> On Thu, Sep 04, 2014 at 07:35:06PM +0200, Norman Hooper wrote: >> >> I work with OrderedDict a lot, because JSON represents an OrderedDict >>> and I need to work with JSON a lot. >>> >> ECMAScript: >> >> 4.3.3 Object >> An object is a member of the type Object. It is an unordered >> collection of properties each of which contains a primitive value, >> object, or function. A function stored in a property of an object is >> called a method. >> >> JSON: >> >> 1. Introduction >> An object is an unordered collection of zero or more name/value >> pairs, where a name is a string and a value is a string, number, >> boolean, null, object, or array. >> >> > You are absolutely right. And as you point out ... > > > implementations >> tend to preserve order when the number of keys is small, since they may >> use a more efficient internal representation in that case, whose >> enumeration order depends on order the properties were defined in. >> > > I am not aware of any implementations (although I'm also not aware of all > implementations) that do not preserve order. > > So (a bit like natural languages) one needs to conform with how it is > used, not how it is defined, if one wants to be fully understood. 
> > Kind regards, > > Norman > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dw+python-ideas at hmmz.org Thu Sep 4 20:18:53 2014 From: dw+python-ideas at hmmz.org (David Wilson) Date: Thu, 4 Sep 2014 18:18:53 +0000 Subject: [Python-ideas] Proposal: New syntax for OrderedDict, maybe built-in In-Reply-To: <5408AB3E.904@kaapstorm.com> References: <5408A2CA.9070408@kaapstorm.com> <20140904180044.GA15424@k2> <5408AB3E.904@kaapstorm.com> Message-ID: <20140904181853.GA16389@k2> On Thu, Sep 04, 2014 at 08:11:10PM +0200, Norman Hooper wrote: > I am not aware of any implementations (although I'm also not aware of all > implementations) that do not preserve order. $ jsc >>> function keys(o) { ... var k = []; ... for(var key in o) k.push(key); ... return k; ... } undefined >>> keys({"5": 1, "4": 1}) 4,5 David From skip at pobox.com Thu Sep 4 20:24:38 2014 From: skip at pobox.com (Skip Montanaro) Date: Thu, 4 Sep 2014 13:24:38 -0500 Subject: [Python-ideas] Proposal: New syntax for OrderedDict, maybe built-in In-Reply-To: <20140904181853.GA16389@k2> References: <5408A2CA.9070408@kaapstorm.com> <20140904180044.GA15424@k2> <5408AB3E.904@kaapstorm.com> <20140904181853.GA16389@k2> Message-ID: On Thu, Sep 4, 2014 at 1:18 PM, David Wilson wrote: > $ jsc > >>> function keys(o) { > ... var k = []; > ... for(var key in o) k.push(key); > ... return k; > ... } > undefined > >>> keys({"5": 1, "4": 1}) > 4,5 > I love a simple existence proof! Can we now discard the presumed equivalence of OrderDict and JS objects? Besides, even if JS and Python were the only two languages we cared about, JSON is supported in many other languages. I'd be really surprised if all of them guaranteed key order. 
If fact, I wouldn't be surprised if none of them did. Skip -------------- next part -------------- An HTML attachment was scrubbed... URL: From 4kir4.1i at gmail.com Sat Sep 6 15:53:55 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Sat, 06 Sep 2014 17:53:55 +0400 Subject: [Python-ideas] use gmtime(0) epoch in functions that use mktime() Message-ID: <878ulw6ej0.fsf@gmail.com> Python uses "seconds since the epoch" term to describe time.time() return value. POSIX term is "seconds since the Epoch" (notice the capitalization) where Epoch is 1970-01-01 00:00:00+00:00. C99 term is "calendar time" -- the encoding of the calendar time returned by the time() function is unspecified. Python documentation defines `epoch` as: The :dfn:`epoch` is the point where the time starts. On January 1st of that year, at 0 hours, the "time since the epoch" is zero. For Unix, the epoch is 1970. To find out what the epoch is, look at ``gmtime(0)``. time module documentation specifies calendar.timegm() as the inverse of time.gmtime() while timegm() uses the fixed 1970 Epoch instead of gmtime(0) epoch. datetime.astimezone() (local_timezone()) passes Unix timestamp [1970] to time.localtime() that may expect timestamp with different epoch [gmtime(0)]. email.util.mktime_tz() uses both mktime() [gmtime(0)] and timegm() [1970]. mailbox.py uses both time.time() [gmtime(0)] and timegm() [1970]. http.cookiejar uses both EPOCH_YEAR=1970 and datetime.utcfromtimestamp() [gmtime(0) epoch] for "seconds since epoch". It seems 1970 Epoch is used for file times on Windows (os.stat()) but os.path.getatime() refers to "seconds since epoch" [gmtime(0) epoch]. date{,time}.{,utc}fromtimestamp(), datetime.timestamp() docs equates "POSIX timestamp" [1970 Epoch] and time.time()'s returned value [gmtime(0) epoch]. datetime.timestamp() is inconsistent if gmtime(0) is not 1970. It uses mktime() for naive datetime objects [gmtime(0) epoch]. But it uses POSIX Epoch for aware datetime objects. 
Correct me if I'm wrong here. --- Possible fixes: Say in the `epoch` definition that stdlib doesn't support gmtime(0).tm_year != 1970. OR don't use mktime() if 1970 Epoch is used e.g., create an aware datetime object in the local timezone instead and use it to compute the result with 1970 Epoch. OR add *analog* of TZ=UTC time.mktime() and use it in stdlib where necessary. Looking at previous attempts (e.g., [1], [2]) to implement timegm(), the problem seems over-constrained. A different name could be used, to avoid wrong expectations e.g., datetime could use `(aware_datetime_object - gmtime0_epoch) // sec` [1] http://bugs.python.org/issue6280, [2] http://bugs.python.org/issue1667546 OR set EPOCH_YEAR=gmtime(0).tm_year instead of 1970 in calendar.timegm(). It may break backward compatibility if there is a system with non-1970 epoch. Deal on a case-by-case basis with other places where 1970 Epoch is used. And drop "POSIX timestamp" [1970 Epoch] and use "seconds since the epoch" [gmtime(0) epoch] in the datetime documentation. Change internal EPOCH year accordingly. What is Python-ideas opinion about it? -- Akira From guido at python.org Sat Sep 6 19:06:25 2014 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Sep 2014 10:06:25 -0700 Subject: [Python-ideas] use gmtime(0) epoch in functions that use mktime() In-Reply-To: <878ulw6ej0.fsf@gmail.com> References: <878ulw6ej0.fsf@gmail.com> Message-ID: There used to be systems with a different notion of epoch. Are there still such systems around? OSX has the UNIX epoch -- what's it gmtime(0) on Windows? On Sat, Sep 6, 2014 at 6:53 AM, Akira Li <4kir4.1i at gmail.com> wrote: > Python uses "seconds since the epoch" term to describe time.time() > return value. POSIX term is "seconds since the Epoch" (notice the > capitalization) where Epoch is 1970-01-01 00:00:00+00:00. C99 term is > "calendar time" -- the encoding of the calendar time returned by the > time() function is unspecified. 
> > Python documentation defines `epoch` as: > > The :dfn:`epoch` is the point where the time starts. On January 1st > of that year, at 0 hours, the "time since the epoch" is zero. For > Unix, the epoch is 1970. To find out what the epoch is, look at > ``gmtime(0)``. > > time module documentation specifies calendar.timegm() as the inverse of > time.gmtime() while timegm() uses the fixed 1970 Epoch instead of > gmtime(0) epoch. > > datetime.astimezone() (local_timezone()) passes Unix timestamp [1970] to > time.localtime() that may expect timestamp with different epoch > [gmtime(0)]. > > email.util.mktime_tz() uses both mktime() [gmtime(0)] and timegm() [1970]. > > mailbox.py uses both time.time() [gmtime(0)] and timegm() [1970]. > > http.cookiejar uses both EPOCH_YEAR=1970 and datetime.utcfromtimestamp() > [gmtime(0) epoch] for "seconds since epoch". > > It seems 1970 Epoch is used for file times on Windows (os.stat()) but > os.path.getatime() refers to "seconds since epoch" [gmtime(0) epoch]. > > date{,time}.{,utc}fromtimestamp(), datetime.timestamp() docs equates > "POSIX timestamp" [1970 Epoch] and time.time()'s returned value > [gmtime(0) epoch]. > > datetime.timestamp() is inconsistent if gmtime(0) is not 1970. It uses > mktime() for naive datetime objects [gmtime(0) epoch]. But it > uses POSIX Epoch for aware datetime objects. > > Correct me if I'm wrong here. > > --- > Possible fixes: > > Say in the `epoch` definition that stdlib doesn't support > gmtime(0).tm_year != 1970. > > OR don't use mktime() if 1970 Epoch is used e.g., create an aware > datetime object in the local timezone instead and use it to compute the > result with 1970 Epoch. > > OR add *analog* of TZ=UTC time.mktime() and use it in stdlib where > necessary. Looking at previous attempts (e.g., [1], [2]) to implement > timegm(), the problem seems over-constrained. 
A different name could be > used, to avoid wrong expectations e.g., datetime could use > `(aware_datetime_object - gmtime0_epoch) // sec` > > [1] http://bugs.python.org/issue6280, > [2] http://bugs.python.org/issue1667546 > > OR set EPOCH_YEAR=gmtime(0).tm_year instead of 1970 in > calendar.timegm(). It may break backward compatibility if there is a > system with non-1970 epoch. Deal on a case-by-case basis with other > places where 1970 Epoch is used. And drop "POSIX timestamp" [1970 > Epoch] and use "seconds since the epoch" [gmtime(0) epoch] in the > datetime documentation. Change internal EPOCH year accordingly. > > What is Python-ideas opinion about it? > > > -- > Akira > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Sat Sep 6 19:28:09 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 06 Sep 2014 18:28:09 +0100 Subject: [Python-ideas] use gmtime(0) epoch in functions that use mktime() In-Reply-To: References: <878ulw6ej0.fsf@gmail.com> Message-ID: On 06/09/2014 18:06, Guido van Rossum wrote: > There used to be systems with a different notion of epoch. Are there > still such systems around? OSX has the UNIX epoch -- what's it gmtime(0) > on Windows? > >>> time.gmtime(0) time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=1, tm_isdst=0) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
Mark Lawrence From python at mrabarnett.plus.com Sat Sep 6 19:32:01 2014 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 06 Sep 2014 18:32:01 +0100 Subject: [Python-ideas] use gmtime(0) epoch in functions that use mktime() In-Reply-To: References: <878ulw6ej0.fsf@gmail.com> Message-ID: <540B4511.60000@mrabarnett.plus.com> On 2014-09-06 18:06, Guido van Rossum wrote: > There used to be systems with a different notion of epoch. Are there > still such systems around? OSX has the UNIX epoch -- what's it gmtime(0) > on Windows? > Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:45:13) [MSC v.1600 64 bit (AM D64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import time >>> time.gmtime(0) time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec= 0, tm_wday=3, tm_yday=1, tm_isdst=0) >>> [snip] From random832 at fastmail.us Sat Sep 6 23:56:03 2014 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sat, 06 Sep 2014 17:56:03 -0400 Subject: [Python-ideas] use gmtime(0) epoch in functions that use mktime() In-Reply-To: References: <878ulw6ej0.fsf@gmail.com> Message-ID: <1410040563.2792555.164455181.137FA6D8@webmail.messagingengine.com> On Sat, Sep 6, 2014, at 13:06, Guido van Rossum wrote: > There used to be systems with a different notion of epoch. Are there still > such systems around? OSX has the UNIX epoch -- what's it gmtime(0) on > Windows? If you call the time.h functions from the CRT library, they use 1970 (and always have). Windows has _other_ functions that use a different epoch (1600, if I remember correctly), but they're native win32 functions not called directly by python. The wrinkle you get on windows is that most of the functions don't work with *negative* time_t values (or struct tm values representing dates before 1970), and/or some other arbitrary cutoff dates. 
In particular, time.localtime gives an OSError on negative values, but time.gmtime gives an OSError on values below -43200, and both give errors if passed a value representing a year above 2999, and strftime does not accept years above 9999. But, as I've advocated before, there's no fundamental reason that python should chain itself to the C library's underlying implementation rather than defining its own time functions that do always use an epoch of 1970, and handle leap seconds consistently, have unlimited range, etc. From guido at python.org Sun Sep 7 00:19:44 2014 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Sep 2014 15:19:44 -0700 Subject: [Python-ideas] use gmtime(0) epoch in functions that use mktime() In-Reply-To: <1410040563.2792555.164455181.137FA6D8@webmail.messagingengine.com> References: <878ulw6ej0.fsf@gmail.com> <1410040563.2792555.164455181.137FA6D8@webmail.messagingengine.com> Message-ID: On Sat, Sep 6, 2014 at 2:56 PM, wrote: > On Sat, Sep 6, 2014, at 13:06, Guido van Rossum wrote: > > There used to be systems with a different notion of epoch. Are there > still > > such systems around? OSX has the UNIX epoch -- what's it gmtime(0) on > > Windows? > > If you call the time.h functions from the CRT library, they use 1970 > (and always have). Windows has _other_ functions that use a different > epoch (1600, if I remember correctly), but they're native win32 > functions not called directly by python. > > The wrinkle you get on windows is that most of the functions don't work > with *negative* time_t values (or struct tm values representing dates > before 1970), and/or some other arbitrary cutoff dates. In particular, > time.localtime gives an OSError on negative values, but time.gmtime > gives an OSError on values below -43200, and both give errors if passed > a value representing a year above 2999, and strftime does not accept > years above 9999. 
> > But, as I've advocated before, there's no fundamental reason that python > should chain itself to the C library's underlying implementation rather > than defining its own time functions that do always use an epoch of > 1970, and handle leap seconds consistently, have unlimited range, etc. > I'm fine with that, as long as "handle leap seconds consistently" means "pretend they don't exist" (which is necessary for compatibility with POSIX). But it sounds like a big coding project. Fixing the docs to be consistent and correctly describe the current implementation may be simpler. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sun Sep 7 02:35:30 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 06 Sep 2014 17:35:30 -0700 Subject: [Python-ideas] use gmtime(0) epoch in functions that use mktime() In-Reply-To: References: <878ulw6ej0.fsf@gmail.com> <1410040563.2792555.164455181.137FA6D8@webmail.messagingengine.com> Message-ID: <540BA852.4070706@stoneleaf.us> On 09/06/2014 03:19 PM, Guido van Rossum wrote: > On Sat, Sep 6, 2014 at 2:56 PM, random832 wrote: >> >> But, as I've advocated before, there's no fundamental reason that python >> should chain itself to the C library's underlying implementation rather >> than defining its own time functions that do always use an epoch of >> 1970, and handle leap seconds consistently, have unlimited range, etc. > > I'm fine with that, as long as "handle leap seconds consistently" means "pretend they don't exist" (which is necessary > for compatibility with POSIX). But it sounds like a big coding project. Fixing the docs to be consistent and correctly > describe the current implementation may be simpler. Or at least fix the docs until the new project is done, and then we can fix them again. 
:) -- ~Ethan~ From 4kir4.1i at gmail.com Sun Sep 7 20:05:11 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Sun, 07 Sep 2014 22:05:11 +0400 Subject: [Python-ideas] use gmtime(0) epoch in functions that use mktime() References: <878ulw6ej0.fsf@gmail.com> <1410040563.2792555.164455181.137FA6D8@webmail.messagingengine.com> Message-ID: <87y4tvjoh4.fsf@gmail.com> random832 at fastmail.us writes: > On Sat, Sep 6, 2014, at 13:06, Guido van Rossum wrote: >> There used to be systems with a different notion of epoch. Are there still >> such systems around? OSX has the UNIX epoch -- what's it gmtime(0) on >> Windows? > > If you call the time.h functions from the CRT library, they use 1970 > (and always have). Windows has _other_ functions that use a different > epoch (1600, if I remember correctly), but they're native win32 > functions not called directly by python. python does call these functions, look at the code that uses FILETIME type. For example, os.utime() uses the hardcoded 1970 Epoch on Windows and converts it to FILETIME (1601 epoch in 100s of nanoseconds) to call SetFileTime(). In this case, using 1970 Epoch is justified. gmtime(0) is documented as "midnight (00:00:00), January 1, 1970, coordinated universal time (UTC)" on Windows. -- Akira From 4kir4.1i at gmail.com Mon Sep 8 00:08:00 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Mon, 08 Sep 2014 02:08:00 +0400 Subject: [Python-ideas] use gmtime(0) epoch in functions that use mktime() References: <878ulw6ej0.fsf@gmail.com> <1410040563.2792555.164455181.137FA6D8@webmail.messagingengine.com> Message-ID: <87oaurjd8f.fsf@gmail.com> Guido van Rossum writes: > On Sat, Sep 6, 2014 at 2:56 PM, > wrote: > >> On Sat, Sep 6, 2014, at 13:06, Guido van Rossum wrote: >> > There used to be systems with a different notion of epoch. Are there >> still >> > such systems around? OSX has the UNIX epoch -- what's it gmtime(0) on >> > Windows? 
>> >> If you call the time.h functions from the CRT library, they use 1970 >> (and always have). Windows has _other_ functions that use a different >> epoch (1600, if I remember correctly), but they're native win32 >> functions not called directly by python. >> >> The wrinkle you get on windows is that most of the functions don't work >> with *negative* time_t values (or struct tm values representing dates >> before 1970), and/or some other arbitrary cutoff dates. In particular, >> time.localtime gives an OSError on negative values, but time.gmtime >> gives an OSError on values below -43200, and both give errors if passed >> a value representing a year above 2999, and strftime does not accept >> years above 9999. >> >> But, as I've advocated before, there's no fundamental reason that python >> should chain itself to the C library's underlying implementation rather >> than defining its own time functions that do always use an epoch of >> 1970, and handle leap seconds consistently, have unlimited range, etc. >> > > I'm fine with that, as long as "handle leap seconds consistently" means > "pretend they don't exist" (which is necessary for compatibility with > POSIX). But it sounds like a big coding project. Fixing the docs to be > consistent and correctly describe the current implementation may be > simpler. datetime doesn't support leap seconds. That part is already implemented ;) I've submitted a documentation patch that explicitly mentions the assumption about 1970 epoch -- https://bugs.python.org/issue22356 Does it make sense to submit other patches that only matter if the epoch is not 1970? --- Unrelated: it is possible to convert between POSIX time and UTC without knowing leap seconds: posix_time = (utc_time - datetime(1970, 1, 1)) // timedelta(seconds=1) It is the exact relation. TAI is one bisect() call away from UTC if the list of leap seconds (such as provided by the tz database (used in pep 431)) is known. 
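That exact relation is easy to check against calendar.timegm(), the fixed-1970 inverse of time.gmtime() mentioned earlier in the thread (stdlib only):

```python
import calendar
from datetime import datetime, timedelta

def posix_timestamp(utc_time):
    # The exact integer relation from the post: no leap-second table
    # needed, because POSIX time pretends leap seconds don't exist.
    return (utc_time - datetime(1970, 1, 1)) // timedelta(seconds=1)

utc = datetime(2014, 9, 8, 0, 8, 0)
assert posix_timestamp(utc) == calendar.timegm(utc.timetuple())
assert posix_timestamp(datetime(1970, 1, 2)) == 86400
```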
It is available on all modern OSes (including Windows [1]). [1] http://www.iana.org/time-zones/repository/tz-link.html -- Akira From kerncece at gmail.com Mon Sep 8 16:49:55 2014 From: kerncece at gmail.com (Kernc) Date: Mon, 8 Sep 2014 16:49:55 +0200 Subject: [Python-ideas] Have ConfigParser handle sectionless config files by default Message-ID: Hi all, I recently opened this bug report [1] stating that ConfigParser not handling configuration files without sections (or those that have options above first section definition) is an issue. ConfigParser* should be able to parse "POSIX" config files (key=value, no sections). I'd like to lobby even for it being the default behavior without requiring a strict=False argument. If interested, please have a look at the bug report for some further arguments. Thanks, [1]: http://bugs.python.org/issue22253 * It's called ConfigParser, not IniParser. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lkb.teichmann at gmail.com Tue Sep 9 19:36:35 2014 From: lkb.teichmann at gmail.com (Martin Teichmann) Date: Tue, 9 Sep 2014 19:36:35 +0200 Subject: [Python-ideas] yield from __setitem__ and __setattr__ Message-ID: Hi list, I am a happy user of yield from, especially while also using asyncio. A great feature is that one can seamlessly convert synchronous code into asynchronous code by adding the appropriate yield froms and coroutine statements at the proper place, including special methods like __getitem__, as in "yield from something[7]". But we cannot yield from the special methods __setattr__ and __setitem__. That is unfortunate, as I am programming a proxy object for something remote over the network, and also __setitem__ might take some time. 
So I got the idea that yield from might also be used on the left side of an assignment, as in yield from o[3], yield from o.something = "a", "b" which should be equivalent to yield from o.__setitem__(3, "a") yield from o.__setattr__("something", "b") I know that with this proposal I am probably far up on the list of most ugly syntax proposed ever, so maybe someone has a better idea. As a side note: with asyncio I am using yield from a lot, and I think it is a pity that it is so long and clumsy a keyword-couple. Couldn't it be shortened? I'm thinking about just using "from", especially in an asyncio context, "from sleep(3)" sounds bad to me, but still better than "yield from sleep(3)". Or maybe we could have an auto-yielder, by just declaring that every expression that evaluates to a coroutine is automatically yielded from, as long as the calling function is a coroutine itself. Greetings Martin -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Sep 9 19:48:30 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Sep 2014 10:48:30 -0700 Subject: [Python-ideas] yield from __setitem__ and __setattr__ In-Reply-To: References: Message-ID: There are a number of places where special syntax and yield-from don't mix well. (Another case is a generator to yield items to a consumer and the corresponding for-loop.) I think it's too soon to start inventing new syntax to address these; instead, I strongly recommend changing your APIs to use explicit method calls instead of the special syntax you so favor. Even __getitem__() returning a Future feels awkward to me (unless it's clear that the container you have in front of you is a container full of Futures). Regarding yield-from itself being somewhat awkward, I agree. (I like C#'s "await" keyword for this purpose.)
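A minimal sketch of that explicit-method-call style (the `RemoteProxy` name and its methods are invented for illustration; written with present-day async/await for brevity, where `yield from proxy.set_item(...)` would be the 3.4-era spelling):

```python
import asyncio

class RemoteProxy:
    """Stand-in for a proxy object backed by a network connection."""

    def __init__(self):
        self._data = {}

    async def set_item(self, key, value):
        # Instead of the proposed `yield from o[key] = value` syntax,
        # the slow assignment is an ordinary awaitable method call.
        await asyncio.sleep(0)  # pretend network round-trip
        self._data[key] = value

    async def get_item(self, key):
        await asyncio.sleep(0)
        return self._data[key]

async def main():
    proxy = RemoteProxy()
    await proxy.set_item(3, "a")
    return await proxy.get_item(3)

assert asyncio.run(main()) == "a"
```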
But again I would like to establish asyncio as the de-facto standard for asynchronous work before trying to tweak the language more -- introducing a new keyword is a pretty heavy-handed change due to the way it invalidates all previous uses of the keyword. (Bare "from" cannot be used, since it would be confused with the "from ... import ..." statement.) On Tue, Sep 9, 2014 at 10:36 AM, Martin Teichmann wrote: > Hi list, > > I am a happy user of yield from, especially while also using asyncio. > A great feature is that one can seamlessly convert synchronous code > into asynchronous code by adding the appropriate yield froms and > coroutine statements at the proper place, including special methods > like __getitem__, as in "yield from something[7]". > > But we cannot yield from the special methods __setattr__ and > __setitem__. That is unfortunate, as I am programming a proxy object > for something remote over the network, and also __setitem__ might > take some time. So I got the idea that yield from might also be used > on the left side of an assigment, as in > > yield from o[3], yield from o.something = "a", "b" > > which should be equivalent to > > yield from o.__setitem__(3, "a") > yield from o.__setattr__("something", "b") > > I know that with this proposal I am probably far up on the list of most > ugly syntax proposed ever, so maybe someone has a better idea. > > > As a side note: with asyncio I am using yield from a lot, and I think it > is a pity that it is so long an clumsy a keyword-couple. Couln't it be > shortened? I'm thinking about just using "from", especially in an asyncio > context, "from sleep(3)" sounds bad to me, but still better than "yield > from sleep(3)". > Or maybe we could have an auto-yielder, by just declaring that every > expression that evaluates to a coroutine is automatically yielded from, > as long as the calling function is a coroutine itself.
> > Greetings > > Martin > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.lasher at gmail.com Wed Sep 10 09:04:23 2014 From: chris.lasher at gmail.com (Chris Lasher) Date: Wed, 10 Sep 2014 00:04:23 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 Message-ID: Why did the CPython core developers decide to force the display of ASCII characters in the printable representation of bytes objects in CPython 3? For example: >>> import struct >>> # In go bytes for four floats: >>> my_packed_bytes = struct.pack('ffff', 3.544294848931151e-12, 1.853266900760489e+25, 1.6215185358725202e-19, 0.9742483496665955) >>> # And out comes a speciously human-readable representation of those bytes >>> my_packed_bytes b'Why, Guido? Why?' >>> >>> # But it's just an illusion; it's truly bytes underneath! >>> a_reasonable_representation = bytes((0x57, 0x68, 0x79, 0x2c, 0x20, 0x47, 0x75, 0x69, 0x64, 0x6f, 0x3f, 0x20, 0x57, 0x68, 0x79, 0x3f)) >>> my_packed_bytes == a_reasonable_representation True >>> >>> this_also_seems_reasonable = b'\x57\x68\x79\x2c\x20\x47\x75\x69\x64\x6f\x3f\x20\x57\x68\x79\x3f' >>> my_packed_bytes == this_also_seems_reasonable True I understand bytes literals were brought in to Python 3 to aid the transition from Python 2 to Python 3 [1], but this did not imply that `repr()` on a bytes object ought to display bytes mapping to ASCII characters as ASCII characters. I have not yet found a PEP describing why this decision was made. I am now seeking to put forth a PEP to change the printable representation of bytes to be simple, consistent, and easy to understand.
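For comparison, an unconditional hex display is already one call away via binascii.hexlify() (stdlib only):

```python
import binascii

my_packed_bytes = b'Why, Guido? Why?'

# The default repr shows printable ASCII bytes as characters:
assert repr(my_packed_bytes) == "b'Why, Guido? Why?'"

# An unconditional hex view: one byte, two hex digits, no exceptions.
assert binascii.hexlify(my_packed_bytes) == \
    b'5768792c20477569646f3f205768793f'
```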
The current behavior of printing elements of bytes that map to printable ASCII characters as those characters seems to violate multiple tenets of the Zen of Python [2] * "Explicit is better than implicit." This display happens without the user's explicit request. * "Simple is better than complex." The printable representation of bytes is complex, surprising, and unintuitive: Elements of bytes shall be displayed as their hexadecimal value, unless such a value maps to a printable ASCII character, in which case, the character shall be displayed instead of the hexadecimal value. The underlying values of each element, however, are always integers. The printable representation of an element of a byte will always be an integer representation. The simple thing is to show the hex value for every byte, unconditionally. * "Special cases aren't special enough to break the rules." Implicit decoding of bytes to ASCII characters comes in handy only some of the time. * "In the face of ambiguity, refuse the temptation to guess." Python is guessing that I want to see some bytes as ASCII characters. In the example above, though, what I gave Python was bytes from four floating point numbers. * "There should be one-- and preferably only one --obvious way to do it." `bytes.decode('ascii', errors='backslashreplace')` already provides users the means to display ASCII characters among bytes, as a real string. To be fair, there are two tenets of the Zen of Python that support the implicit display of ASCII characters in bytes: * "Readability counts." * "Although practicality beats purity." In counterargument, though, I would say that the extra readability and practicality are only boosted in special cases (which are not special enough). Much ado was (and continues to be) raised over Python 3 enforcing distinction between (Unicode) strings and bytes.
A lot of this resentment comes from Python programmers who do not yet appreciate the difference between bytes and text†, or from those who remain apathetic and prefer Python 2's it-works-'til-it-doesn't strings. This implicit displaying of ASCII characters in bytes ends up conflating the two data types even deeper in novice programmers' minds. In the example above, `my_packed_bytes` looks like a string. It reads like a string. But it is not a string. The ASCII characters are a lie, as evidenced when trying to access elements of a bytes instance: >>> b'Why, Guido? Why?'[0] 87 >>> # Oh, perhaps you were expecting b'W'? I find this behavior harmful to Python 3 advocacy, and novices and those accustomed to Python 2 find this yet another deterrent in the way of Python 3 adoption. I would like to gauge the feasibility of a PEP to change the printable representation of bytes in CPython 3 to display all elements by their hexadecimal values, and only by their hexadecimal values. Thanks, Chris L. † I write this as someone who, himself, didn't appreciate nor understand the difference between bytes, strings, and Unicode. I have Ned Batchelder [3] to thank and his illuminating "Pragmatic Unicode" presentation [4] for getting me on the right path. [1]: http://legacy.python.org/dev/peps/pep-3112/#rationale [2]: http://legacy.python.org/dev/peps/pep-0020/ [3]: http://nedbatchelder.com/ [4]: http://nedbatchelder.com/text/unipain.html From ncoghlan at gmail.com Wed Sep 10 09:34:34 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Sep 2014 17:34:34 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: Message-ID: On 10 September 2014 17:04, Chris Lasher wrote: > Why did the CPython core developers decide to force the display of > ASCII characters in the printable representation of bytes objects in > CPython 3?
Primarily because it's incredibly useful for debugging ASCII based binary formats (which covers many network protocols and file formats). Early (pre-release) versions of Python 3.0 didn't have this behaviour, and getting the raw integer dumps instead turned out to be *really* annoying in practice, so we decided the easier debugging justified the increased risk of creating incorrect mental models for users (especially those migrating from Python 2). The recently updated docs for binary sequences hopefully do a better job of explaining this "binary data with ASCII compatible segments" favouritism in their description of the bytes and bytearray methods: https://docs.python.org/3/library/stdtypes.html#bytes-and-bytearray-operations (until a couple of months ago, those methods weren't documented separately, which I agree must have been incredibly confusing). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mal at egenix.com Wed Sep 10 09:36:32 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 10 Sep 2014 09:36:32 +0200 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: Message-ID: <540FFF80.70709@egenix.com> On 10.09.2014 09:04, Chris Lasher wrote: > Why did the CPython core developers decide to force the display of > ASCII characters in the printable representation of bytes objects in > CPython 3? This wasn't forced. It's a simple consequence of turning the Python 2 8-bit string type into the Python 3 bytes type while keeping breakage to a pain level which doesn't have Python users skip Python 3 entirely ;-) Seriously, it doesn't help being purist over concepts that are used in a very pragmatic way in every day (programmer's) life. Even when being binary data, most such binary strings do contain encoded text characters and being able to quickly identify those as such helps in debugging, working with the data and writing it down in form of literals. 
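To make the debugging point concrete (a minimal stdlib-only illustration): an ASCII-based request is legible in the default repr, while an unconditional hex dump of the same bytes is not:

```python
import binascii

request = b'GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n'

# The default repr keeps the ASCII protocol structure readable at a glance:
assert repr(request) == (
    "b'GET /index.html HTTP/1.1\\r\\nHost: example.com\\r\\n\\r\\n'"
)

# The same data as unconditional hex is much harder to eyeball:
assert binascii.hexlify(request)[:16] == b'474554202f696e64'
```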
A definite -1 from me on making repr(b"Hello World") harder to read than necessary. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 10 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-09-19: PyCon UK 2014, Coventry, UK ... 9 days to go 2014-09-27: PyDDF Sprint 2014 ... 17 days to go 2014-09-30: Python Meeting Duesseldorf ... 20 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From cory at lukasa.co.uk Wed Sep 10 09:42:32 2014 From: cory at lukasa.co.uk (Cory Benfield) Date: Wed, 10 Sep 2014 08:42:32 +0100 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: Message-ID: On 10 September 2014 08:04, Chris Lasher wrote: > Why did the CPython core developers decide to force the display of > ASCII characters in the printable representation of bytes objects in > CPython 3? I'd argue this is symptomatic of something that got mentioned in the lengthy discussions around PEP 461: namely, that Python's bytestrings are really still very stringy. For example, they retain their 'upper' method, which is so totally bizarre in the context of bytes that it causes me to mentally segfault every time I see it: >>> a = b'hi there' >>> a.upper() b'HI THERE' As Nick mentioned, this is fundamentally because of protocols like HTTP/1.1, which are a weird hybrid of text-based and binary that is only simple if you assume ASCII everywhere. (Of course, HTTP does not assume ASCII everywhere, but that's because it's wildly underspecified). 
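A few more stdlib examples of this stringiness, all operating on the ASCII range only:

```python
# These str-like methods exist on bytes and assume ASCII semantics:
assert b'hi there'.upper() == b'HI THERE'
assert b'hi there'.title() == b'Hi There'
assert b'a,b,c'.split(b',') == [b'a', b'b', b'c']
assert b'  spam  '.strip() == b'spam'

# Non-ASCII bytes are simply left alone by the case methods:
assert bytes([0xe9]).upper() == bytes([0xe9])
```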
I doubt you'll get far with this proposal on this list, which is a shame because I think you have a point. There is an impedance mismatch between the Python community saying "Bytes are not text" and the fact that, wow, they really do look like they are sometimes! For what it's worth, Nick has made this comment: > Primarily because it's incredibly useful for debugging ASCII based > binary formats (which covers many network protocols and file formats). This is true, but it goes both ways: it makes it a lot *harder* to debug pure-binary network formats (like HTTP/2). I basically have to have an ASCII codepage in front of me to debug using the printed representation of a bytestring because I keep getting characters thrown into my nice hex output. Sadly, you can't please everyone. From ncoghlan at gmail.com Wed Sep 10 09:43:50 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Sep 2014 17:43:50 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <540FFF80.70709@egenix.com> References: <540FFF80.70709@egenix.com> Message-ID: On 10 September 2014 17:36, M.-A. Lemburg wrote: > On 10.09.2014 09:04, Chris Lasher wrote: >> Why did the CPython core developers decide to force the display of >> ASCII characters in the printable representation of bytes objects in >> CPython 3? > > This wasn't forced. It's a simple consequence of turning the Python 2 > 8-bit string type into the Python 3 bytes type while keeping breakage > to a pain level which doesn't have Python users skip Python 3 entirely ;-) I believe you may be forgetting the pre-release period where there wasn't an immutable bytes type at all. It wasn't until PEP 3137 [1] was implemented that we got to the status quo for Python 3. Cheers, Nick. P.S. I haven't forgotten my promise to try to put together a recipe for a cleaner wrapper around "memoryview(data).cast('c')", but it may be a while before I get back to the idea.
[1] http://www.python.org/dev/peps/pep-3137/ -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Sep 10 09:45:38 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Sep 2014 17:45:38 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: Message-ID: On 10 September 2014 17:42, Cory Benfield wrote: > For what it's worth, Nick has made this comment: > >> Primarily because it's incredibly useful for debugging ASCII based >> binary formats (which covers many network protocols and file formats). > > This is true, but it goes both ways: it makes it a lot *harder* to > debug pure-binary network formats (like HTTP/2). I basically have to > have an ASCII codepage in front of me to debug using the printed > representation of a bytestring because I keep getting characters > thrown into my nice hex output. Sadly, you can't please everyone. memoryview.cast can be a potentially useful tool for that :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From cory at lukasa.co.uk Wed Sep 10 09:56:21 2014 From: cory at lukasa.co.uk (Cory Benfield) Date: Wed, 10 Sep 2014 08:56:21 +0100 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: Message-ID: On 10 September 2014 08:45, Nick Coghlan wrote: > memoryview.cast can be a potentially useful tool for that :) Sure, and so can binascii.hexlify (which is what I normally use). I'm not saying I can't debug my binary data, that would be a pretty weird complaint. I'm saying that I don't get to do debugging with a simple print statement when using the bytes type to do actual binary work, while those who are doing sort-of binary work do. 
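Both debugging tools mentioned in this exchange are in the standard library today; a quick sketch of each (standard library only, no other assumptions):

```python
import binascii

data = b'\x00\x04Hi\xfe'

# binascii.hexlify: a pure hex view with no ASCII mixed in
assert binascii.hexlify(data) == b'00044869fe'

# memoryview(...).cast('c'): walk the payload one byte at a time
assert list(memoryview(data).cast('c')) == [b'\x00', b'\x04', b'H', b'i', b'\xfe']
```

The hexlify form is the one that avoids stray ASCII characters in otherwise-hex output.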
I'm not actually personally bothered here: I do plenty of HTTP/1.1 work as well where this 'feature' is useful, and I'm experienced enough that I know what I'm getting into and I can work around it. My point is more that this adds further confusion to the notion that 'bytes are not text', when much of the language seems to go out of its way to pretend that they are in fact ASCII-encoded text. This is probably going to get wildly off-topic, so if you'd like to continue this chat Nick we should either take it off-list or to a new thread. =) From mal at egenix.com Wed Sep 10 10:01:18 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 10 Sep 2014 10:01:18 +0200 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <540FFF80.70709@egenix.com> Message-ID: <5410054E.6070103@egenix.com> On 10.09.2014 09:43, Nick Coghlan wrote: > On 10 September 2014 17:36, M.-A. Lemburg wrote: >> On 10.09.2014 09:04, Chris Lasher wrote: >>> Why did the CPython core developers decide to force the display of >>> ASCII characters in the printable representation of bytes objects in >>> CPython 3? >> >> This wasn't forced. It's a simple consequence of turning the Python 2 >> 8-bit string type into the Python 3 bytes type while keeping breakage >> to a pain level which doesn't have Python users skip Python 3 entirely ;-) > > I believe you may be forgetting the pre-release period where there > wasn't an immutable bytes type at all. It wasn't until PEP 3137 [1] > was implemented that we got to the status quo for Python 3. Oh, I do know. That was a path which was luckily quickly abandoned as default bytes type :-) Note that we now have PyByteArray C APIs in Python 3 for bytearray objects. PyBytes C APIs are (mostly) the Python 2 PyString C APIs - unlike what's listed in the PEP. > Cheers, > Nick. > > P.S.
I haven't forgotten my promise to try to put together a recipe > for a cleaner wrapper around "memoryview(data).cast('c')", but it may > be a while before I get back to the idea. > > [1] http://www.python.org/dev/peps/pep-3137/ > -- Marc-Andre Lemburg From tjreedy at udel.edu Wed Sep 10 10:11:28 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 10 Sep 2014 04:11:28 -0400 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: Message-ID: I agree with Chris Lasher's basic point, that the representation of bytes confusingly contradicts the idea that bytes are bytes. But it is not going to change. On 9/10/2014 3:56 AM, Cory Benfield wrote: > On 10 September 2014 08:45, Nick Coghlan wrote: >> memoryview.cast can be a potentially useful tool for that :) > > Sure, and so can binascii.hexlify (which is what I normally use). See http://bugs.python.org/issue9951 to add bytes.hex or .tohex as more or less the inverse of bytes.fromhex or even have hex(bytes) work. This change *is* possible and I think we should pick one of the suggestions for 3.5.
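For reference, one half of that round trip already exists; the other half is what issue 9951 asks for. The `bytes.hex()` spelling shown below is the proposed method, which eventually landed in Python 3.5, so the last line assumes 3.5 or later:

```python
# bytes.fromhex: hex digits -> bytes (already available in Python 3)
data = bytes.fromhex('416263')
assert data == b'Abc'

# the proposed inverse, added as bytes.hex() in Python 3.5
assert data.hex() == '416263'
```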
-- Terry Jan Reedy From lkb.teichmann at gmail.com Wed Sep 10 11:40:34 2014 From: lkb.teichmann at gmail.com (Martin Teichmann) Date: Wed, 10 Sep 2014 11:40:34 +0200 Subject: [Python-ideas] yield from __setitem__ and __setattr__ In-Reply-To: References: Message-ID: Hi Guido, Hi List > I think it's too soon to start inventing new syntax to address these; instead, I strongly > recommend changing your APIs to use explicit method calls instead of the special > syntax you so favor. Even __getitem__() returning a Future feels awkward to me > (unless it's clear that the container you have in front of you is a container full of Futures). > [...] > But again I would like to establish asyncio as the de-facto standard for asynchronous work > before trying to tweak the language more I agree that first we need to settle a bit and look how everything works, but hey, this is python-ideas, the list for "discussing speculative language ideas". Limiting all APIs to just use explicit method calls unfortunately leads to a very Java-like language that misses all the cool features of Python. And this is not limited only to low-level stuff, but also high level interfaces. Right now I am actually working on a distributed framework, where users who are not professional programmers are supposed to write their own plugins. My boss is constantly bugging me "all that yield from stuff looks complicated, can't you just put it in a function the user calls so they don't see it anymore?" Well, as you know, I cannot. Even in the highest level of abstraction, every function that needs to do I/O has to be yielded from. Imagine the client side of a Big Data service like Google Earth, what syntax do you prefer, "yield from map.get_section(long1, long2, long_step, lat1, lat2, lat_step)", or simply "yield from map[long1:long2:long_step, lat1:lat2:lat_step]"? Until now, a yield from was just a rather rare case used only on special occasions.
But once asyncio becomes more widespread, it will simply be everywhere. It will be so ubiquitous that I even thought it should not be spelled out anymore at all - every expression evaluating to a coroutine should automatically be yielded from. But that would actually kill the advantage of this construct - the cooperative multitasking we are doing with yield from. The programmer of a piece of code would not know anymore where other tasks step in, so all the problems with race conditions re-appear. Given that it will be used very often in the future, I thought we could even use a special character to represent the yield from, say, a $. Then code like the following would be possible, please don't hate me for the Perl-look: for a, $b, c in some_iterator(): $ob.foo = $calculate(a + b, c) which would be equivalent to for a, _b, c in some_iterator(): b = yield from _b yield from ob.__setattr__("foo", yield from calculate(a + b, c)) But I guess with that I am just way too speculative... Greetings Martin From steve at pearwood.info Wed Sep 10 12:57:32 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 10 Sep 2014 20:57:32 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: Message-ID: <20140910105732.GE9293@ando.pearwood.info> On Wed, Sep 10, 2014 at 12:04:23AM -0700, Chris Lasher wrote: [...] > I would like to gauge the feasibility of a PEP to change the printable > representation of bytes in CPython 3 to display all elements by their > hexadecimal values, and only by their hexadecimal values. I'm very sympathetic to this "purity" approach. I too consider it a shame that the repr of byte-strings in Python 3 pretends to be ASCII-ish[1], regardless of the source of the bytes. Alas, not only do we have backward compatibility to consider -- there are now five versions of Python 3 where bytes display as ASCII -- but practicality as well.
There are many use-cases where human-readable ASCII bytes are embedded inside otherwise binary bytes. To my regret, I don't think purity arguments are strong enough to justify a change. However, I do support Terry's suggestion that bytes (and, I presume, bytearray) grow some sort of easy way of displaying the bytes in hex. The trouble is, what do we actually want? b'Abc' --> '0x416263' b'Abc' --> '\x41\x62\x63' I can see use-cases for both. After less than two minutes of thought, it seems to me that perhaps the most obvious APIs for these two different representations are: hex(b'Abc') --> '0x416263' b'Abc'.decode('hexescapes') --> '\x41\x62\x63' [1] They're not *strictly* ASCII, since ASCII doesn't support ordinal values above 127. -- Steven From wolfgang.maier at biologie.uni-freiburg.de Wed Sep 10 13:54:17 2014 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Wed, 10 Sep 2014 13:54:17 +0200 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <20140910105732.GE9293@ando.pearwood.info> References: <20140910105732.GE9293@ando.pearwood.info> Message-ID: On 09/10/2014 12:57 PM, Steven D'Aprano wrote: > On Wed, Sep 10, 2014 at 12:04:23AM -0700, Chris Lasher wrote: > > [...] >> I would like to gauge the feasibility of a PEP to change the printable >> representation of bytes in CPython 3 to display all elements by their >> hexadecimal values, and only by their hexadecimal values. > > I'm very sympathetic to this "purity" approach. I too consider it a > shame that the repr of byte-strings in Python 3 pretends to be > ASCII-ish[1], regardless of the source of the bytes. Alas, not only do > we have backward compatibility to consider -- there are now five versions > of Python 3 where bytes display as ASCII -- but practicality as well. > There are many use-cases where human-readable ASCII bytes are embedded > inside otherwise binary bytes. 
To my regret, I don't think purity > arguments are strong enough to justify a change. > > However, I do support Terry's suggestion that bytes (and, I presume, > bytearray) grow some sort of easy way of displaying the bytes in hex. > The trouble is, what do we actually want? > > b'Abc' --> '0x416263' > b'Abc' --> '\x41\x62\x63' > > I can see use-cases for both. After less than two minutes of thought, it > seems to me that perhaps the most obvious APIs for these two different > representations are: > > hex(b'Abc') --> '0x416263' This would require a change in the documented (https://docs.python.org/3/library/functions.html#hex) behavior of hex(), which I think is quite a big deal for a relatively special case. > b'Abc'.decode('hexescapes') --> '\x41\x62\x63' This, OTOH, looks elegant (avoids a new method) and clear (no doubt about the returned type) to me. +1 Wolfgang From ncoghlan at gmail.com Wed Sep 10 15:43:25 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Sep 2014 23:43:25 +1000 Subject: [Python-ideas] yield from __setitem__ and __setattr__ In-Reply-To: References: Message-ID: On 10 Sep 2014 19:41, "Martin Teichmann" wrote: > > Hi Guido, Hi List > > > I think it's too soon to start inventing new syntax to address these; instead, I strongly > > recommend changing your APIs to use explicit method calls instead of the special > > syntax you so favor. Even __getitem__() returning a Future feels awkward to me > > (unless it's clear that the container you have in front of you is a container full of Futures). > > [...] > > But again I would like to establish asyncio as the de-facto standard for asynchronous work > > before trying to tweak the language more > > I agree that first we need to settle a bit and look how everything > works, but hey, this is python-ideas, the list for "discussing speculative > language ideas".
> > Limiting all APIs to just use explicit method calls unfortunately > leads to a very Java-like language that misses all the cool features > of Python. And this is not limited only to low-level stuff, but also > high level interfaces. Right now I am actually working on a > distributed framework, where users who are not professional > programmers are supposed to write their own plugins. My boss > is constantly bugging me "all that yield from stuff looks complicated, > can't you just put it in a function the user calls so they > don't see it anymore?" Well, as you know, I cannot. Even in the > highest level of abstraction, every function that needs to do I/O > has to be yielded from. The local visibility of the asynchronous flow control is deliberate. Folks that don't want that behaviour to be visible in user level code have the option of using gevent as a synchronous to asynchronous adapter to provide a developer experience that is closer to the pre-emptive multithreading many folks are used to working with. For a couple of longer discussions of some of the trade-offs involved in adopting that more implicit approach, you may want to take a look at my post at http://python-notes.curiousefficiency.org/en/latest/pep_ideas/async_programming.html as well as Glyph's at https://glyph.twistedmatrix.com/2014/02/unyielding.html Regards, Nick. From ncoghlan at gmail.com Wed Sep 10 15:50:52 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Sep 2014 23:50:52 +1000 Subject: [Python-ideas] yield from __setitem__ and __setattr__ In-Reply-To: References: Message-ID: On 10 Sep 2014 23:43, "Nick Coghlan" wrote: > > The local visibility of the asynchronous flow control is deliberate. My apologies, I initially missed the part of your email where you made it clear you already understood those benefits.
In that case, I agree that a shorter (perhaps even symbolic) spelling may make sense some day, but also agree with Guido that it's way too soon to be making further changes in this area. asyncio hasn't even graduated from provisional status yet :) Cheers, Nick. From graffatcolmingov at gmail.com Wed Sep 10 16:24:46 2014 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Wed, 10 Sep 2014 09:24:46 -0500 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910105732.GE9293@ando.pearwood.info> Message-ID: On Wed, Sep 10, 2014 at 6:54 AM, Wolfgang Maier wrote: >> I can see use-cases for both. After less than two minutes of thought, it >> seems to me that perhaps the most obvious APIs for these two different >> representations are: >> >> hex(b'Abc') --> '0x416263' > > > This would require a change in the documented > (https://docs.python.org/3/library/functions.html#hex) behavior of hex(), > which I think is quite a big deal for a relatively special case. I agree that we should leave hex alone. >> b'Abc'.decode('hexescapes') --> '\x41\x62\x63' > > > This, OTOH, looks elegant (avoids a new method) and clear (no doubt about > the returned type) to me. > +1 Another +0.5 for me. I think this is quite elegant and reasonable. I'm not sure it needs to be unicode though. Perhaps it's too early for me, but does turning that into a unicode string make sense?
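Note that 'hexescapes' is a proposed codec name, not an existing one; a hypothetical helper with the suggested behaviour might look like this (the function name is illustrative only):

```python
def hexescapes(data):
    """Render every byte as a \\xNN escape, e.g. b'Abc' -> '\\x41\\x62\\x63'."""
    return ''.join('\\x{:02x}'.format(b) for b in data)

assert hexescapes(b'Abc') == r'\x41\x62\x63'
assert hexescapes(b'\x00\xff') == r'\x00\xff'
```

Since the result is meant to be read by a human, returning str rather than bytes seems natural here.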
From rosuav at gmail.com Wed Sep 10 16:30:00 2014 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 11 Sep 2014 00:30:00 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910105732.GE9293@ando.pearwood.info> Message-ID: On Thu, Sep 11, 2014 at 12:24 AM, Ian Cordasco wrote: >>> b'Abc'.decode('hexescapes') --> '\x41\x62\x63' >> >> >> This, OTOH, looks elegant (avoids a new method) and clear (no doubt about >> the returned type) to me. >> +1 > > Another +0.5 for me. I think this is quite elegant and reasonable. I'm > not sure it needs to be unicode though. Perhaps it's too early for me, > but does turning that into a unicode string make sense? It's becoming text. What other type makes more sense than a text string? ChrisA From p.f.moore at gmail.com Wed Sep 10 16:37:03 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 10 Sep 2014 15:37:03 +0100 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910105732.GE9293@ando.pearwood.info> Message-ID: On 10 September 2014 15:24, Ian Cordasco wrote: >>> b'Abc'.decode('hexescapes') --> '\x41\x62\x63' >> >> >> This, OTOH, looks elegant (avoids a new method) and clear (no doubt about >> the returned type) to me. >> +1 > > Another +0.5 for me. I think this is quite elegant and reasonable. I'm > not sure it needs to be unicode though. Perhaps it's too early for me, > but does turning that into a unicode string make sense? It's easy enough to do by hand: >>> print(''.join("\\x{:02x}".format(c) for c in b'Abc')) \x41\x62\x63 And you get any other format you like, just by changing the format string in there, or the string you join on: >>> print(':'.join("{:02x}".format(c) for c in b'Abc')) 41:62:63 Not every one-liner needs to be a builtin... 
Paul From barry at python.org Wed Sep 10 16:48:30 2014 From: barry at python.org (Barry Warsaw) Date: Wed, 10 Sep 2014 10:48:30 -0400 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 References: Message-ID: <20140910104830.5b53279f@anarchist.wooz.org> On Sep 10, 2014, at 08:42 AM, Cory Benfield wrote: >I doubt you'll get far with this proposal on this list, which is a >shame because I think you have a point. There is an impedance mismatch >between the Python community saying "Bytes are not text" and the fact >that, wow, they really do look like they are sometimes! That's the nature of wire protocols - they're like quantum particles, exhibiting both bytes-like and string-like behavior. You can't look too closely, and they have spooky action at a distance too. For the email protocols at least, you also have mind-crushing singularities. -Barry From cory at lukasa.co.uk Wed Sep 10 17:05:17 2014 From: cory at lukasa.co.uk (Cory Benfield) Date: Wed, 10 Sep 2014 16:05:17 +0100 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <20140910104830.5b53279f@anarchist.wooz.org> References: <20140910104830.5b53279f@anarchist.wooz.org> Message-ID: On 10 September 2014 15:48, Barry Warsaw wrote: > That's the nature of wire protocols - they're like quantum particles, > exhibiting both bytes-like and string-like behavior. You can't look too > closely, and they have spooky action at a distance too. For the email > protocols at least, you also have mind-crushing singularities. > Well, it's the nature of *many* wire protocols. Binary protocols are increasing in popularity at the moment, because it turns out that "kinda-text-like" wire protocols are a nightmare to parse correctly.
Thus, the Python decision is great for SMTP and HTTP/1.1, and infuriating for things like HTTP/2. From stephen at xemacs.org Wed Sep 10 18:59:52 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 11 Sep 2014 01:59:52 +0900 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <20140910104830.5b53279f@anarchist.wooz.org> References: <20140910104830.5b53279f@anarchist.wooz.org> Message-ID: <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> Barry Warsaw writes: > On Sep 10, 2014, at 08:42 AM, Cory Benfield wrote: > > >I doubt you'll get far with this proposal on this list, which is a > >shame because I think you have a point. There is an impedance mismatch > >between the Python community saying "Bytes are not text" and the fact > >that, wow, they really do look like they are sometimes! So does 0xDEADBEEF, but actually that's *not* text, it's a 32-bit pointer, conveniently invalid on most 32-bit architectures and very obvious when it shows up in a backtrace. Do you see an impedance mismatch in the C community because of that? In fact, *all* bytes "look like text", because *you can't see them until they're converted to text by repr()*! This is the key to the putative "impedance mismatch" -- it's perceived as such when people don't distinguish the map from the territory. The issue that sometimes it's easier to read hex than ASCII mixed with other stuff (hex escapes or Latin-1) is true enough, though. But it's not about an impedance mismatch, it's a question of what does *this* developer consider to be the convenient repr for *that* task. I just don't see hex-based use cases coming close to being as important as the convenience for those cases where the structure being imposed on some bytes is partly derived from English. The current default repr is, I believe, the right default repr.
That doesn't mean that it would be a terrible idea to provide other reprs in the stdlib (although it is after all a one-liner!) > That's the nature of wire protocols - they're like quantum particles, > exhibiting both bytes-like and string-like behavior. I find the analogy picturesque but unconvincing. Wire protocols are punctuated *by design* with European (mostly English) words, acronyms, and abbreviations, because (a) it's convenient for syntax to be mnemonic, (b) because the arbitrary standard for network streams is octets, and you can't fit much more than an English character into an octet, and (c) historically, English-speakers got there first (and had economic hegemony on their side, too). > You can't look too closely, and they have spooky action at a > distance too. For the email > protocols at least, you also have > mind-crushing singularities. Doom, gloom, DMARC, and boom! But I guess you were referring to From-stuffing, not From-munging. From cory at lukasa.co.uk Wed Sep 10 19:51:57 2014 From: cory at lukasa.co.uk (Cory Benfield) Date: Wed, 10 Sep 2014 18:51:57 +0100 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 10 September 2014 17:59, Stephen J. Turnbull wrote: > So does 0xDEADBEEF, but actually that's *not* text, it's a 32-bit > pointer, conveniently invalid on most 32-bit architectures and very > obvious when it shows up in a backtrace. Do you see an impedance > mismatch in the C community because of that? > > In fact, *all* bytes "look like text", because *you can't see them > until they're converted to text by repr()*! This is the key to the > putative "impedance mismatch" -- it's perceived as such when people > don't distinguish the map from the territory. I apologise, I was insufficiently clear.
I mean that interaction with the bytes type in Python has a lot of textual aspects to it. This is a *deliberate* decision (or at least the documentation makes it seem deliberate), and I can understand the rationale, but it's hard to be surprised that it leads developers astray. Also, while I'm being picky, 0xDEADBEEF is not a 32-bit pointer, it's a 32-bit something. Its type is undefined in that expression. It has a standard usage as a guard word, but still, let's not jump to conclusions here! I accept your core point, however, which I consider to be this: > The issue that sometimes it's easier to read hex than ASCII mixed with > other stuff (hex escapes or Latin-1) is true enough, though. But it's > not about an impedance mismatch, it's a question of what does *this* > developer consider to be the convenient repr for *that* task. This is definitely true, which I believe I've already admitted in this thread. I do happen to believe that having it be hex would provide a better pedagogical position ("you know this isn't text because it looks like gibberish!"), but that ship sailed a long time ago. From chris.lasher at gmail.com Wed Sep 10 20:35:25 2014 From: chris.lasher at gmail.com (Chris Lasher) Date: Wed, 10 Sep 2014 11:35:25 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I originally wrote this late last night but realized today that I only sent this reply to Terry Reedy, not to python-ideas. (Apologies, Terry -- I didn't mean to single you out with my rant!) I'm reposting it in full, below. Some of these ideas have already been raised by others and counter-arguments already posed.
I still feel I have not seen some of these points directly addressed, namely, the unreasonableness of seeing bytes from floating point numbers as ASCII characters, and the sanity of the API I counter-propose. Message now appears below: On Wed, Sep 10, 2014 at 1:11 AM, Terry Reedy wrote: > > I agree with Chris Lasher's basic point, that the representation of bytes confusingly contradicts the idea that bytes are bytes. But it is not going to change. Unless the printable representation of bytes objects appears as part of the language specification for Python 3, it's an implementation detail, thus, it is a candidate for change, especially if the BDFL wills it so. Consider me optimistic that we can change it, or I would have just posted yet another "Python 3 gets it all wrong" blog post to the web instead of writing this pre-proposal. :-) > > > > On 9/10/2014 3:56 AM, Cory Benfield wrote: >> >> On 10 September 2014 08:45, Nick Coghlan wrote: >>> >>> memoryview.cast can be a potentially useful tool for that :) >> >> >> Sure, and so can binascii.hexlify (which is what I normally use). > > > See http://bugs.python.org/issue9951 to add bytes.hex or .tohex as more or less the inverse of bytes.fromhex or even have hex(bytes) work. This change *is* possible and I think we should pick one of the suggestions for 3.5. Here's the API Issue 9951 is proposing: >>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' b'Hello, World!' >>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'.tohex() b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' >>> b'Hello, World!' b'Hello, World!' >>> b'Hello, World!'.tohex() b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' I'll tell you what: here's the API of my counter-proposal: >>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' >>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'.asciify() b'Hello, World!' >>> b'Hello, World!'
b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' >>> b'Hello, World!'.asciify() b'Hello, World!' Here's the prose description of my counter-proposal: add a method to the bytes object called `.asciify`, that returns a printable representation of the bytes, where bytes mapping to printable ASCII characters are displayed as ASCII characters, and the remainder are given as hex codes. That is, .asciify() should round-trip a bytes literal. This frees up repr() to do what universally makes sense on a series of bytes: state the bytes! Marc-Andre Lemburg said: > > A definite -1 from me on making repr(b"Hello World") harder to read than necessary. Okay, but a definite -1e6 from me on making my Python interpreter do this: >>> my_packed_bytes = struct.pack('ffff', 3.544294848931151e-12, 1.853266900760489e+25, 1.6215185358725202e-19, 0.9742483496665955) >>> my_packed_bytes b'Why, Guido? Why?' I do understand the utility of peering into ASCII text, but like Cory Benfield stated earlier: > I'm saying that I don't get to do debugging with a simple > print statement when using the bytes type to do actual binary work, > while those who are doing sort-of binary work do. Does the inconvenience of having to explicitly call the .asciify() method on a bytes object justify the current behavior for repr() on a bytes object? The privilege of being lazy is obstructing the right to see what we've actually got in the bytes object, and is jeopardizing the very argument that "bytes are not strings". On Wed, Sep 10, 2014 at 10:51 AM, Cory Benfield wrote: > On 10 September 2014 17:59, Stephen J. Turnbull wrote: >> So does 0xDEADBEEF, but actually that's *not* text, it's a 32-bit >> pointer, conveniently invalid on most 32-bit architectures and very >> obvious when it shows up in a backtrace. Do you see an impedance >> mismatch in the C community because of that? >> >> In fact, *all* bytes "look like text", because *you can't see them >> until they're converted to text by repr()*!
This is the key to the >> putative "impedance mismatch" -- it's perceived as such when people >> don't distinguish the map from the territory. > > I apologise, I was insufficiently clear. I mean that interaction with > the bytes type in Python has a lot of textual aspects to it. This is a > *deliberate* decision (or at least the documentation makes it seem > deliberate), and I can understand the rationale, but it's hard to be > surprised that it leads developers astray. > > Also, while I'm being picky, 0xDEADBEEF is not a 32-bit pointer, it's > a 32-bit something. Its type is undefined in that expression. It has a > standard usage as a guard word, but still, let's not jump to > conclusions here! > > I accept your core point, however, which I consider to be this: > >> The issue that sometimes it's easier to read hex than ASCII mixed with >> other stuff (hex escapes or Latin-1) is true enough, though. But it's >> not about an impedance mismatch, it's a question of what does *this* >> developer consider to be the convenient repr for *that* task. > > This is definitely true, which I believe I've already admitted in this > thread. I do happen to believe that having it be hex would provide a > better pedagogical position ("you know this isn't text because it > looks like gibberish!"), but that ship sailed a long time ago.
From abarnert at yahoo.com Wed Sep 10 21:27:10 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 10 Sep 2014 12:27:10 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com> On Sep 10, 2014, at 11:35, Chris Lasher wrote: > I originally wrote this late last night but realized today that I only > sent this reply to Terry Reedy, not to python-ideas. (Apologies, Terry > -- I didn't mean to single you out with my rant!) > > I'm reposting it in full, below. Some of these ideas have already been > raised by others and counter-arguments already posed. I still feel I > have not seen some of these points directly addressed, namely, the > unreasonableness of seeing bytes from floating point numbers as ASCII > characters, and the sanity of the API I counter-propose. > > Message now appears below: > > On Wed, Sep 10, 2014 at 1:11 AM, Terry Reedy wrote: >> >> I agree with Chris Lasher's basic point, that the representation of bytes confusingly contradicts the idea that bytes are bytes. But it is not going to change. > > > Unless the printable representation of bytes objects appears as part of > the language specification for Python 3, it's an implementation > detail, thus, it is a candidate for change, especially if the BDFL > wills it so. Consider me optimistic that we can change it, or I would > have just posted yet another "Python 3 gets it all wrong" blog post to > the web instead of writing this pre-proposal.
:-) > >> >> >> >> On 9/10/2014 3:56 AM, Cory Benfield wrote: >>> >>> On 10 September 2014 08:45, Nick Coghlan wrote: >>>> >>>> memoryview.cast can be a potentially useful tool for that :) >>> >>> >>> Sure, and so can binascii.hexlify (which is what I normally use). >> >> >> See http://bugs.python.org/issue9951 to add bytes.hex or .tohex as more or less the inverse of bytes.fromhex or even have hex(bytes) work. This change *is* possible and I think we should pick one of the suggestions for 3.5. > > > > Here's the API Issue 9951 is proposing: > >>>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' > b'Hello, World!' >>>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'.tohex() > b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' >>>> b'Hello, World!' > b'Hello, World!' >>>> b'Hello, World!'.tohex() > b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' > > > I'll tell you what: here's the API of my counter-proposal: > >>>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' > b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' >>>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'.asciify() > b'Hello, World!' >>>> b'Hello, World!' > b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' >>>> b'Hello, World!'.asciify() > b'Hello, World!' It strikes me that we should have both asciify and hexlify (or whatever we call them) so people can be explicit when debugging; the question then becomes which one repr calls. At which point it really is just a question of which group of developers (those working on HTTP/2.0 or those working on HTTP/1.1, for example) get to be "lazy" instead of explicit in their debugging. The argument in favor of "asciify" is that the hex representation is more purist.
The argument in favor of "hexlify" is that it makes Python 3.6 do the same thing as 3.0-3.5, and in fact 1.0-2.7 as well; people have had a few decades to get used to being lazy with mostly-ASCII protocols, while people have had a few decades to get used to being explicit with pure-binary protocols. But maybe there's another potential concern that can help decide. A lot of novices using bytes get confused when they see b'\x05Hello' and ask questions about how to deal with that 8-character string. (You can see them all over StackOverflow, for example.) Of course the same people also ask how to get the b out of their string, etc.; obviously they need to be taught the difference between a bytes and its repr no matter what. Would switching to hexlify as a default help those people by forcing them to confront their confusion early, or slow them down by not letting them write a lot of simple code and learn other important stuff before getting to that confusion? I suspect that the answer to that might be as compelling as the answer to which group of experienced developers (where the groups often overlap) deserves to be allowed to be lazy. But I don't have the answer... > Here's the prose description of my counter-proposal: add a method to > the bytes object called `.asciify`, that returns a printable > representation of the bytes, where bytes mapping to printable ASCII > characters are displayed as ASCII characters, and the remainder are > given as hex codes. That is, .asciify() should round-trip a bytes > literal. This frees up repr() to do what universally makes sense on a > series of bytes: state the bytes! > > > Marc-Andre Lemburg said: >> >> A definite -1 from me on making repr(b"Hello World") harder to read than necessary. > > > Okay, but a definite -1e6 from me on making my Python interpreter do this: > >>>> my_packed_bytes = struct.pack('ffff', 3.544294848931151e-12, > 1.853266900760489e+25, 1.6215185358725202e-19, 0.9742483496665955) >>>> my_packed_bytes > b'Why, Guido? 
Why?' > > I do understand the utility of peering in to ASCII text, but like Cory > Benfield stated earlier: > >> I'm saying that I don't get to do debugging with a simple >> print statement when using the bytes type to do actual binary work, >> while those who are doing sort-of binary work do. > > > Does the inconvenience of having to explicitly call the .asciify() > method on a bytes object justify the current behavior for repr() on a > bytes object? The privilege of being lazy is obstructing the right to > see what we've actually got in the bytes object, and is jeopardizing > the very argument that "bytes are not strings". > > On Wed, Sep 10, 2014 at 10:51 AM, Cory Benfield wrote: >> On 10 September 2014 17:59, Stephen J. Turnbull wrote: >>> So does 0xDEADBEEF, but actually that's *not* text, it's a 32-bit >>> pointer, conveniently invalid on most 32-bit architectures and very >>> obvious when it shows up in a backtrace. Do you see an impedence >>> mismatch in the C community because of that? >>> >>> In fact, *all* bytes "look like text", because *you can't see them >>> until they're converted to text by repr()*! This is the key to the >>> putative "impedence mismatch" -- it's perceived as such when people >>> don't distinguish the map from the territory. >> >> I apologise, I was insufficiently clear. I mean that interaction with >> the bytes type in Python has a lot of textual aspects to it. This is a >> *deliberate* decision (or at least the documentation makes it seem >> deliberate), and I can understand the rationale, but it's hard to be >> surprised that it leads developers astray. >> >> Also, while I'm being picky, 0xDEADBEEF is not a 32-bit pointer, it's >> a 32-bit something. Its type is undefined in that expression. It has a >> standard usage as a guard word, but still, let's not jump to >> conclusions here! 
>> >> I accept your core point, however, which I consider to be this: >> >>> The issue that sometimes it's easier to read hex than ASCII mixed with >>> other stuff (hex escapes or Latin-1) is true enough, though. But it's >>> not about an impedance mismatch, it's a question of what does *this* >>> developer consider to be the convenient repr for *that* task. >> >> This is definitely true, which I believe I've already admitted in this >> thread. I do happen to believe that having it be hex would provide a >> better pedagogical position ("you know this isn't text because it >> looks like gibberish!"), but that ship sailed a long time ago. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From chris.lasher at gmail.com Wed Sep 10 22:29:13 2014 From: chris.lasher at gmail.com (Chris Lasher) Date: Wed, 10 Sep 2014 13:29:13 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com> References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com> Message-ID: On Wed, Sep 10, 2014 at 12:27 PM, Andrew Barnert wrote: > > It strikes me that we should have both asciify and hexlify (or whatever we call them) so people can be explicit when debugging; the question then becomes which one repr calls. Well said, and I agree both methods should be added. "Explicit is better than implicit," here, to me, trumps "There should be one and only one obvious way to do it." 
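[To make the two candidate behaviours concrete, here is a rough sketch of both as plain Python functions. This is an illustration only: neither `asciify` nor `hexlify` exists as a bytes method, and a real repr would also special-case \t, \n and \r, which this sketch deliberately skips.]

```python
def hexlify_repr(data):
    """Render every byte as a \\xNN escape -- the hex-only proposal."""
    return "b'" + "".join("\\x{:02x}".format(b) for b in data) + "'"

def asciify_repr(data):
    """Render printable ASCII bytes as glyphs, everything else as \\xNN."""
    chars = []
    for b in data:
        # printable ASCII range, excluding the quote and backslash bytes
        if 0x20 <= b <= 0x7e and b not in (0x27, 0x5c):
            chars.append(chr(b))
        else:
            chars.append("\\x{:02x}".format(b))
    return "b'" + "".join(chars) + "'"

print(hexlify_repr(b'Hi'))         # b'\x48\x69'
print(asciify_repr(b'\x05Hello'))  # b'\x05Hello'
```

Which of these `bytes.__repr__` delegates to is then exactly the policy question under discussion.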
Using these methods should be preferred when one needs to actually store the results. repr() is, to me, meant as a convenience function for the programmer to inspect her data structure, and is not meant to be relied upon as a shortcut to string representation in production code. But perhaps others here disagree and think repr() can and should be used in production code. > > The argument in favor of "asciify" is that the hex representation is more purist. > > The argument in favor of "hexlify" is that it makes Python 3.6 do the same thing as 3.0-3.5, and in fact 1.0-2.7 as well; people have had a few decades to get used to being lazy with mostly-ASCII protocols, while people have had a few decades to get used to being explicit with pure-binary protocols. Again, very well said! > But maybe there's another potential concern that can help decide. A lot of novices using bytes get confused when they see b'\x05Hello' I guess I wasn't clear: this is precisely why I've raised this issue. I promise I'm not trying to make life harder for folks using Python 3 to work with HTTP/1.1! I'm trying to lower the barrier of comprehension to those who have not used Python 3, and especially those who have never programmed before in their life. I teach these people, in my local Python meetup group, in Software Carpentry courses, and one-on-one with junior developers in my company. Put yourself in the shoes of a beginner. If Python does this >>> bytes(range(15)) b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e' To understand this, you have to learn just two things: 1. Bytes is a sequence of integers in the range 0 to 255. 2. How to translate base-10 integers into hexadecimal. But through the eyes of a beginner, here is what Python actually does: >>> bytes(range(15)) b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e' Now you have four things to explain! 1. Bytes is a sequence of integers in the range 0 to 255. 2. 
How to translate base-10 integers into hexadecimal. 3. How ASCII provides a mapping between some integers and English characters 4. The conditions under which you'll see an ASCII character in place of a hexadecimal value versus the hexadecimal value itself It's easier to teach a student how to decode bytes into ASCII characters when the student can see the bytes, then the resulting ASCII characters in the string, in a one-to-one fashion. It is deeply confusing when they inspect the bytes in the REPL and already see the ASCII characters. The natural question is, "But I already see the character, so why do I have to decode it?!" The current behavior of repr() on bytes puts an unfair cognitive burden on novices (and those of us working with "pure binary" files) compared to the gains to advanced programmers who already can comprehend the mapping of bytes to ASCII characters and can manage the mixture of the two. Think of the children! :-) From erik.m.bray at gmail.com Wed Sep 10 22:59:03 2014 From: erik.m.bray at gmail.com (Erik Bray) Date: Wed, 10 Sep 2014 16:59:03 -0400 Subject: [Python-ideas] Abstract metaclasses? Message-ID: Hi all, I recently ran across an interesting (mis?)-feature at least in CPython that I couldn't find any specific justification for or against. The issue is that abstract *types*, while technically possible, don't behave as abstract classes. To exemplify, say I wanted to create a metaclass which itself has ABCMeta as a metaclass, and which has some abstract classmethod defined: >>> import abc >>> class Meta(type, metaclass=abc.ABCMeta): ... @abc.abstractmethod ... def foo(cls): pass ... Now for all intents and purposes Meta *is* an abstract type: >>> import inspect >>> inspect.isabstract(Meta) True >>> Meta.__abstractmethods__ frozenset({'foo'}) However, nothing prevents Meta from being used as a metaclass for another class, despite it being "abstract": >>> class A(metaclass=Meta): pass ... 
>>> A <class '__main__.A'> This is simply because the check for the Py_TPFLAGS_IS_ABSTRACT flag is implemented in object_new, which is overridden by type_new for type subclasses. type_new does not perform this check. I'm perfectly fine if this is dismissed as too abstract or too academic to be useful, but I will mention that this came up in a real use case. The use case is in a hierarchy of metaclasses involved in a syntactic-sugary class factory framework involving creation of new classes via operators. So I just wonder if this is a bug that should be fixed, or at the very least a feature request. The IS_ABSTRACT flag check is cheap and easy to add to type_new, in principle. In the meantime a workaround, which doesn't seem too terrible, is simply to define something I called AbstractableType: >>> class AbstractableType(type): ... def __new__(mcls, name, bases, members): ... if inspect.isabstract(mcls): ... raise TypeError( ... "Can't instantiate abstract type {0} with " ... "abstract methods {1}".format( ... mcls.__name__, ', '.join(sorted(mcls.__abstractmethods__)))) ... return super(AbstractableType, mcls).__new__(mcls, name, bases, members) ... Now create an abstract metaclass with (meta-)metaclass abc.ABCMeta: >>> class AbstractMeta(AbstractableType, metaclass=abc.ABCMeta): ... @abc.abstractmethod ... def foo(cls): pass ... Creating a class with metaclass AbstractMeta fails as it should: >>> class A(metaclass=AbstractMeta): pass ... Traceback (most recent call last): File "<stdin>", line 1, in <module> File "<stdin>", line 7, in __new__ TypeError: Can't instantiate abstract type AbstractMeta with abstract methods foo However, AbstractMeta can be subclassed with a concrete implementation: >>> class ConcreteMeta(AbstractMeta): ... def foo(cls): print("Concrete method") ... >>> class A(metaclass=ConcreteMeta): pass ... 
>>> A.foo() Concrete method Thanks, Erik From erik.m.bray at gmail.com Wed Sep 10 23:25:51 2014 From: erik.m.bray at gmail.com (Erik Bray) Date: Wed, 10 Sep 2014 17:25:51 -0400 Subject: [Python-ideas] Proposal: New syntax for OrderedDict, maybe built-in In-Reply-To: <20140904180044.GA15424@k2> References: <5408A2CA.9070408@kaapstorm.com> <20140904180044.GA15424@k2> Message-ID: On Thu, Sep 4, 2014 at 2:00 PM, David Wilson wrote: > On Thu, Sep 04, 2014 at 07:35:06PM +0200, Norman Hooper wrote: > >> I work with OrderedDict a lot, because JSON represents an OrderedDict >> and I need to work with JSON a lot. >> >> With the ubiquity of JSON, it may also be time to promote OrderedDict >> to a built-in type too. > > Neither JSON objects nor JavaScript object properties preserve > enumeration order. This is a common misconception since implementations > tend to preserve order when the number of keys is small, since they may > use a more efficient internal representation in that case, whose > enumeration order depends on the order the properties were defined in. JSON/JavaScript might not be the best use case for this reason. But YAML is! YAML has an "omap" construct [1] for representing ordered mappings. In "block style" this is represented like: - aardvark: African pig-like ant eater. Ugly. - anteater: South-American ant eater. Two species. - anaconda: South-American constrictor snake. Scaly. This syntax is basically the same as that for a list of one-element mappings. In YAML's "flow style" this equates to exactly the OP's proposal: ["aardvark": "...", "anteater": "...", "anaconda": "..."] Unfortunately there's no way for the YAML parser to disambiguate this from a list of one-element mappings, unless the "!!omap" tag is explicitly prepended. But in some of my own applications I find it useful enough to just assume this should be an OrderedDict. Though it's just as easy to only make an OrderedDict when "!!omap" is used. 
In PyYAML this is supported by returning a list of tuples, though it's easy to then wrap that in an OrderedDict: >>> yaml.load("['a': 1, 'b': 2]") [{'a': 1}, {'b': 2}] >>> yaml.load("!!omap ['a': 1, 'b': 2]") [('a', 1), ('b', 2)] I have one application that uses this extensively and would personally like to see it in Python. But I think the objections thus far are (mostly) reasonable. Erik [1] http://yaml.org/type/omap.html From ethan at stoneleaf.us Wed Sep 10 23:41:21 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 10 Sep 2014 14:41:21 -0700 Subject: [Python-ideas] Abstract metaclasses? In-Reply-To: References: Message-ID: <5410C581.5030903@stoneleaf.us> On 09/10/2014 01:59 PM, Erik Bray wrote: > > --> import abc > --> class Meta(type, metaclass=abc.ABCMeta): > ... @abc.abstractmethod > ... def foo(cls): pass > ... > --> class A(metaclass=Meta): pass > ... > --> A > I think this is a bug. However, if the class were: --> class A(metaclass=Meta): ... def foo(self): ... pass ... Then this should succeed, and I don't think your Abstractable type allows it. -- ~Ethan~ From ncoghlan at gmail.com Thu Sep 11 00:09:24 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Sep 2014 08:09:24 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com> Message-ID: On 11 Sep 2014 06:30, "Chris Lasher" wrote: > > Put yourself in the shoes of a beginner. We often compromise the beginner experience for backwards compatibility reasons, or to provide a better developer experience in the long run (cf. changing print from a statement to a builtin function). In this case, I *agree* the current behaviour is confusing, since it recreates some of the old "is it binary or is it text?" confusion that was more endemic in Python 2. 
In Python 3, "bytes" is still a hybrid type that can hold: * arbitrary binary data * binary data that contains ASCII segments A pure teaching language wouldn't make that compromise. Python 3 isn't a pure teaching language though - it's a pragmatic professional programming language that is *also* useful for teaching. The problem is that for a lot of data it is *genuinely ambiguous* as to which of those it actually is (and it may change at runtime depending on the specific nature of the data). Both the default repr and the literal form assume the "binary data with ASCII-compatible segments" case, which aligns with the behaviour of the Python 2 str type. That isn't going to change in Python, especially since we actually *did* try it for a while (prior to the 3.0 release) and really didn't like it. However, as others have noted, making it easier to get a pure hex representation is likely worth doing. There are lots of ways of doing that currently, but none that really qualify as "obvious". Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From erik.m.bray at gmail.com Thu Sep 11 00:15:14 2014 From: erik.m.bray at gmail.com (Erik Bray) Date: Wed, 10 Sep 2014 18:15:14 -0400 Subject: [Python-ideas] Abstract metaclasses? In-Reply-To: <5410C581.5030903@stoneleaf.us> References: <5410C581.5030903@stoneleaf.us> Message-ID: On Wed, Sep 10, 2014 at 5:41 PM, Ethan Furman wrote: > On 09/10/2014 01:59 PM, Erik Bray wrote: >> >> >> --> import abc >> --> class Meta(type, metaclass=abc.ABCMeta): >> ... @abc.abstractmethod >> ... def foo(cls): pass >> ... >> --> class A(metaclass=Meta): pass >> ... >> --> A >> > > > I think this is a bug. However, if the class were: > > --> class A(metaclass=Meta): > ... def foo(self): > ... pass > ... > > Then this should succeed, and I don't think your Abstractable type allows > it. I don't necessarily agree that that should succeed. 
The use of an abstract meta-class is basically requiring there to be a concrete *classmethod* of the name "foo" (an unbound instancemethod wouldn't suffice). What maybe *should* work, but doesn't with this implementation is: class A(metaclass=Meta): @classmethod def foo(cls): pass That could be fixed reasonably easily by extending the AbstractableType.__new__ to check for classmethods in the new class's members, à la ABCMeta.__new__. I'm not sure how that would be best handled in CPython though. Alternatively it could just be required that an abstract metaclass simply can't be used as a metaclass unless a concrete subclass is made. But using @classmethod to override abstract class methods does make some intuitive sense. Erik From ethan at stoneleaf.us Thu Sep 11 00:29:07 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 10 Sep 2014 15:29:07 -0700 Subject: [Python-ideas] Abstract metaclasses? In-Reply-To: References: <5410C581.5030903@stoneleaf.us> Message-ID: <5410D0B3.3050100@stoneleaf.us> On 09/10/2014 03:15 PM, Erik Bray wrote: > On Wed, Sep 10, 2014 at 5:41 PM, Ethan Furman wrote: >> On 09/10/2014 01:59 PM, Erik Bray wrote: >>> >>> >>> --> import abc >>> --> class Meta(type, metaclass=abc.ABCMeta): >>> ... @abc.abstractmethod >>> ... def foo(cls): pass >>> ... >>> --> class A(metaclass=Meta): pass >>> ... >>> --> A >>> >> >> >> I think this is a bug. However, if the class were: >> >> --> class A(metaclass=Meta): >> ... def foo(self): >> ... pass >> ... >> >> Then this should succeed, and I don't think your Abstractable type allows >> it. > > I don't necessarily agree that that should succeed. The use of an > abstract meta-class is basically requiring there to be a concrete > *classmethod* of the name "foo", (an unbound instancemethod wouldn't > suffice). If that is what you want you should use `abstractclassmethod`. 
> What maybe *should* work, but doesn't with this implementation is: > > class A(metaclass=Meta): > @classmethod > def foo(cls): > pass Well, take out the 'maybe' and I'm in agreement. ;) > Alternatively it could just be required that an abstract metaclass > simply can't be used as a metaclass unless a concrete subclass is > made. -1 > But using @classmethod to override abstract class methods does > make some intuitive sense. +1 -- ~Ethan~ From erik.m.bray at gmail.com Thu Sep 11 01:00:07 2014 From: erik.m.bray at gmail.com (Erik Bray) Date: Wed, 10 Sep 2014 19:00:07 -0400 Subject: [Python-ideas] Abstract metaclasses? In-Reply-To: <5410D0B3.3050100@stoneleaf.us> References: <5410C581.5030903@stoneleaf.us> <5410D0B3.3050100@stoneleaf.us> Message-ID: On Wed, Sep 10, 2014 at 6:29 PM, Ethan Furman wrote: > On 09/10/2014 03:15 PM, Erik Bray wrote: >> >> On Wed, Sep 10, 2014 at 5:41 PM, Ethan Furman wrote: >>> >>> On 09/10/2014 01:59 PM, Erik Bray wrote: >>>> >>>> >>>> >>>> --> import abc >>>> --> class Meta(type, metaclass=abc.ABCMeta): >>>> ... @abc.abstractmethod >>>> ... def foo(cls): pass >>>> ... >>>> --> class A(metaclass=Meta): pass >>>> ... >>>> --> A >>>> >>> >>> >>> >>> I think this is a bug. However, if the class were: >>> >>> --> class A(metaclass=Meta): >>> ... def foo(self): >>> ... pass >>> ... >>> >>> Then this should succeed, and I don't think your Abstractable type allows >>> it. >> >> >> I don't necessarily agree that that should succeed. The use of an >> abstract meta-class is basically requiring there to be a concrete >> *classmethod* of the name "foo", (an unbound instancemethod wouldn't >> suffice). > > > If that is what you want you should use `abstractclassmethod`. That would be fine if the classmethods were being defined in a normal class. And with a little rearchitecting maybe that would be a simpler workaround for my own issues. But I still think this should work properly for methods belonging to a metaclass. 
For that matter, I feel like this is a bug too: >>> class Foo(metaclass=abc.ABCMeta): ... @classmethod ... @abc.abstractmethod ... def my_classmethod(cls): pass ... >>> class FooSub(Foo): ... def my_classmethod(self): ... pass # Not actually a classmethod ... >>> FooSub() <__main__.FooSub object at 0x7f5d8b8a6dd8> Basically, FooSub does not really implement the interface expected by the Foo ABC. This is especially deceptive considering that the way classmethod.__get__ works gives the impression (to the unwary) that the classmethod is actually a method defined on the class's metaclass: >>> Foo.my_classmethod <bound method type.my_classmethod of <class '__main__.Foo'>> > >> What maybe *should* work, but doesn't with this implementation is: >> >> class A(metaclass=Meta): >> @classmethod >> def foo(cls): >> pass > > Well, take out the 'maybe' and I'm in agreement. ;) > > >> Alternatively it could just be required that an abstract metaclass >> simply can't be used as a metaclass unless a concrete subclass is >> made. > > -1 > >> But using @classmethod to override abstract class methods does >> make some intuitive sense. > > +1 That's fine. I think that can be done. Thanks, Erik From ethan at stoneleaf.us Thu Sep 11 01:10:26 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 10 Sep 2014 16:10:26 -0700 Subject: [Python-ideas] Abstract metaclasses? In-Reply-To: References: <5410C581.5030903@stoneleaf.us> <5410D0B3.3050100@stoneleaf.us> Message-ID: <5410DA62.9050001@stoneleaf.us> On 09/10/2014 04:00 PM, Erik Bray wrote: > On Wed, Sep 10, 2014 at 6:29 PM, Ethan Furman wrote: >> >> If that is what you want you should use `abstractclassmethod`. > > That would be fine if the classmethods were being defined in a normal > class. And with a little rearchitecting maybe that would be a simpler > workaround for my own issues. But I still think this should work > properly for methods belonging to a metaclass. Ah, right -- any method defined on a metaclass is a de facto class method. 
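[That point is easy to demonstrate with a minimal example of my own (not from the thread): an ordinary method defined on a metaclass is reachable from the class, bound to the class, exactly like a classmethod -- but invisible from instances.]

```python
class Meta(type):
    def describe(cls):  # an ordinary method on the metaclass
        return "class " + cls.__name__

class A(metaclass=Meta):
    pass

print(A.describe())              # class A -- bound to the class, like a classmethod
print(hasattr(A(), "describe"))  # False -- instance lookup skips the metaclass
```

The asymmetry in the last line is also why a classmethod defined in the class body is only *almost* the same thing as a metaclass method.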
> For that matter, I feel like this is a bug too: > > --> class Foo(metaclass=abc.ABCMeta): > ... @classmethod > ... @abc.abstractmethod > ... def my_classmethod(cls): pass > ... > --> class FooSub(Foo): > ... def my_classmethod(self): > ... pass # Not actually a classmethod > ... > -> FooSub() You'll have to search the docs, bug-tracker, and mailing lists for that one -- I do seem to remember reading about it, but don't recall where. -- ~Ethan~ From chris.lasher at gmail.com Thu Sep 11 01:23:58 2014 From: chris.lasher at gmail.com (Chris Lasher) Date: Wed, 10 Sep 2014 16:23:58 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com> Message-ID: On Wed, Sep 10, 2014 at 3:09 PM, Nick Coghlan wrote: > In Python 3, "bytes" is still a hybrid type that can hold: > > * arbitrary binary data > * binary data that contains ASCII segments > Let me be clear. Here are things this proposal does NOT include: * Removing string-like methods from bytes * Removing ASCII from bytes literals Those have proven incredibly useful to the Python community. I appreciate that. This proposal does not take these behaviors away from bytes. Here's what my proposal DOES include: 1. Adjust the behavior of repr() on a bytes instance such that only hexadecimal codes appear. The returned value would be the text displaying the bytes literal of hexadecimal codes that would reproduce the bytes instance. 2. Provide a method (suggested: "bytes.asciify") that returns a printable representation of bytes that replaces bytes whose values map to printable ASCII glyphs with the glyphs. The returned value would be the text displaying the bytes literal of ASCII glyphs and hexadecimal codes that would reproduce the bytes instance. 
If you liked the behavior of repr() on bytes in Python 3.0 through 3.4 (or 3.5), it's still available via this method call! 3. Optionally, provide a method (suggested: "bytes.hexlify") which implements the code for creating the printable representation of the bytes with hexadecimal values only, and call this method in bytes.__repr__. > Both the default repr and the literal form assume the "binary data with > ASCII-compatible segments" case, which aligns with the behaviour of the Python 2 str > type. That isn't going to change in Python, especially since we actually > *did* try it for a while (prior to the 3.0 release) and really didn't like > it. > Yes, more specifically you said: > Early (pre-release) versions of Python 3.0 didn't have this behaviour, and > getting the raw integer dumps instead turned out to be *really* annoying > in practice, so we decided the easier debugging justified the increased > risk of creating incorrect mental models for users (especially those > migrating from Python 2). What you haven't said so far, however, and what I still don't know, is whether or not the core team has already tried providing a method on bytes objects à la the proposed .asciify() for projecting bytes as ASCII characters, and rejected that on the basis of it being too inconvenient for the vast majority of Python use cases. Did the core team try this before deciding that repr() should automatically rewrite printable ASCII characters in place of hex values for bytes? So far, I've heard a lot of requests to keep the behavior because it's convenient. But how inconvenient is it to call bytes.asciify()? Are those not in favor of changing the behavior of repr() really going to sit behind the argument that the effort expended in typing ten more characters ought to guarantee that thousands of other programmers are going to have to figure out why there are letters in their bytes -- or rather, how there are actually NOT letters in their bytes? 
And once again, we are talking about changing behavior that is unspecified by the Python 3 language specification. The language is gaining a reputation for confusing implementation details with the language specification, however, as written by Armin Ronacher [1]: > Python is definitely a language that is not perfect. However I think what > frustrates me about the language are largely problems that have to do with > tiny details in the interpreter and less the language itself. These > interpreter details however are becoming part of the language and this is > why they are important. I feel passionately that this implicit ASCII-translation behavior should not propagate into further releases of CPython 3, and I don't want to see it become a de facto specification due to calcification. We're talking about the next 10 to 15 years. Nobody guaranteed the behavior of repr() so far. With the bytes.asciify() method (or whatever it may be called), we have a fair compromise, plus a more explicit specification of the behavior of bytes in Python 3. In closing on this message, I want to say that I appreciate you hearing me out, Nick. I have appreciated your answers, and certainly the historical background. And thanks to the others who have contributed here. I appreciate you taking the time to discuss this. Chris L. [1] http://lucumr.pocoo.org/2014/8/16/the-python-i-would-like-to-see/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Thu Sep 11 01:39:43 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 10 Sep 2014 16:39:43 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com> Message-ID: <5410E13F.5020005@stoneleaf.us> FWIW, I find the ascii-mixed-with-hex difficult to parse, even though I know full-well what it is, and I could easily live with having a 'bytes.asciify' and 'bytes.hexlify' and have the __repr__ be something more consistent -- maybe a list of ints, that way nobody gets to be lazy! ;) -- ~Ethan~ From cs at zip.com.au Thu Sep 11 01:48:38 2014 From: cs at zip.com.au (Cameron Simpson) Date: Thu, 11 Sep 2014 09:48:38 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <20140910105732.GE9293@ando.pearwood.info> References: <20140910105732.GE9293@ando.pearwood.info> Message-ID: <20140910234838.GA87114@cskk.homeip.net> As someone who uses ASCII or more commonly UTF-8 byte sequences, I find the current ascii-ish default display handy. That said... On 10Sep2014 20:57, Steven D'Aprano wrote: >However, I do support Terry's suggestion that bytes (and, I presume, >bytearray) grow some sort of easy way of displaying the bytes in hex. >The trouble is, what do we actually want? > >b'Abc' --> '0x416263' To my eye that is a single number expressed in base 16 and would imply an endianness. I imagine you really mean a transcription of the bytes in hex, with a leading 0x to indicate the transcription. But it is not what my eye sees. Of course, the natural transcription above implies big endianness, as is only right and proper:-) Why not give bytes objects a .hex method, emitting bare hex with no leading 0x? That would be my first approach. 
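[For what it's worth, the stdlib's binascii module already gives something very close to that today: bare hex, no 0x prefix, no implied endianness -- though it returns bytes, so a decode is needed to get text.]

```python
import binascii

raw = b'Abc'
hex_text = binascii.hexlify(raw).decode('ascii')
print(hex_text)  # 416263

# hexlify round-trips cleanly back to the original bytes
assert binascii.unhexlify(hex_text) == raw
```

A `bytes.hex()` method as proposed would essentially be a spelling of this that skips the extra decode step.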
>b'Abc'.decode('hexescapes') --> '\x41\x62\x63'

OTOH, this is really neat. And .decode('hex') for the former?

Cheers, Cameron Simpson

It never will rain roses; when we want to have more roses we must plant more trees. - George Eliot, The Spanish Gypsy

From tjreedy at udel.edu Thu Sep 11 02:31:04 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 10 Sep 2014 20:31:04 -0400
Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3
In-Reply-To: <20140910234838.GA87114@cskk.homeip.net>
References: <20140910105732.GE9293@ando.pearwood.info> <20140910234838.GA87114@cskk.homeip.net>
Message-ID:

On 9/10/2014 7:48 PM, Cameron Simpson wrote:
> As someone who uses ASCII or more commonly UTF-8 byte sequences, I find
> the current ascii-ish default display handy. That said...
>
> On 10Sep2014 20:57, Steven D'Aprano wrote:
>> However, I do support Terry's suggestion that bytes (and, I presume,
>> bytearray) grow some sort of easy way of displaying the bytes in hex.
>> The trouble is, what do we actually want?
>>
>> b'Abc' --> '0x416263'
>
> To my eye that is a single number expressed in base 16 and would

To mine also.

> imply an endianness. I imagine you really mean a transcription of
> the bytes in hex, with a leading 0x to indicate the transcription.
> But it is not what my eye sees.
>
> Of course, the natural transcription above implies big endianness, as is
> only right and proper:-)
>
> Why not give bytes objects a .hex method, emitting bare hex with no leading
> 0x? That would be my first approach.

That is the initial proposal of http://bugs.python.org/issue9951, which I favor.
-- Terry Jan Reedy

From rosuav at gmail.com Thu Sep 11 02:42:41 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 11 Sep 2014 10:42:41 +1000
Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3
In-Reply-To:
References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Thu, Sep 11, 2014 at 4:35 AM, Chris Lasher wrote:
> Unless printable representation of bytes objects appears as part of
> the language specification for Python 3, it's an implementation
> detail, thus, it is a candidate for change, especially if the BDFL
> wills it so.

So this is all about the output of repr(), right? The question then is: How important is backward compatibility with repr? Will there be code breakage?

I've generally considered repr() to be exclusively "take this object and turn it into something a human can use". Nothing is guaranteed about the exact string returned. Something like this description:

"""Any value, debug style. Do not rely on the exact formatting; how the result looks can vary depending on locale, phase of the moon or anything else the lfun::_sprintf() method implementor wanted for debugging."""

(Replace lfun::_sprintf() with __repr__() for that to make sense for Python.)

If repr's meant to be treated that way, then there's no problem changing bytes.__repr__ to produce hex-only output in 3.5 or 3.6. If it's NOT meant to be treated as opaque (and I've seen some Stack Overflow posts where people are parsing repr()), then what is the guarantee?
ChrisA From ncoghlan at gmail.com Thu Sep 11 03:27:19 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Sep 2014 11:27:19 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com> Message-ID: On 11 September 2014 09:23, Chris Lasher wrote: > On Wed, Sep 10, 2014 at 3:09 PM, Nick Coghlan wrote: >> >> In Python 3, "bytes" is still a hybrid type that can hold: >> >> * arbitrary binary data >> * binary data that contains ASCII segments > > Let me be clear. Here are things this proposal does NOT include: > > * Removing string-like methods from bytes > * Removing ASCII from bytes literals > > Those have proven incredibly useful to the Python community. I appreciate > that. This proposal does not take these behaviors away from bytes. > > Here's what my proposal DOES include: > > 1. Adjust the behavior of repr() on a bytes instance such that only > hexadecimal codes appear. The returned value would be the text displaying > the bytes literal of hexadecimal codes that would reproduce the bytes > instance. This is not an acceptable change, for two reasons: 1. It's a *major* compatibility break. It breaks single source Python 2/3 development, it breaks doctests, it breaks user expectations. 2. It breaks the symmetry between the bytes literal format and their representation. It's important to remember we changed *from* a pure binary representation back to the current hybrid representation. It's not an accident or oversight, it's a deliberate design choice, and the reasons driving that original decision haven't changed in the last 8+ years. > 2. Provide a method (suggested: "bytes.asciify") that returns a printable > representation of bytes that replaces bytes whose values map to printable > ASCII glyphs with the glyphs. 
The returned value would be the text > displaying the bytes literal of ASCII glyphs and hexadecimal codes that > would reproduce the bytes instance. If you liked the behavior of repr() on > bytes in Python 3.0 through 3.4 (or 3.5), it's still available via this > method call!

Except that method call won't be available in Python 2 code, and thus not usable in single source Python 2/3 code bases. That's still an incredibly important environment for people to be able to program in, and we're generally aiming to make the common subset *bigger* in Python 3.5 (e.g. by adding bytes.__mod__), not smaller.

> 3. Optionally, provide a method (suggested: "bytes.hexlify") which > implements the code for creating the printable representation of the bytes > with hexadecimal values only, and call this method in bytes.__repr__.

As per the discussion on issue 9951, it is likely Python 3.5 will offer bytes.hex() and bytearray.hex() methods (and perhaps even memoryview.hex()). I have also filed issue 22385 to propose allowing the "x" and "X" string formatting characters (for str.format and the format builtin) to accept arbitrary bytes-like objects. *Additive* changes like that to make it easier to work with pure binary data are relatively non-controversial (although there may still be some argument over *which* of those changes are worth including).

> What you haven't said so far, however, and what I still don't know, is > whether or not the core team has already tried providing a method on bytes > objects à la the proposed .asciify() for projecting bytes as ASCII > characters, and rejected that on the basis of it being too inconvenient for > the vast majority of Python use cases.

That option was never really on the table, as once we decided to switch back to a hybrid ASCII representation, the obvious design model to use was the Python 2 str type, which has inherently hybrid behaviour, and uses the literal form for the "obj == eval(repr(obj))" round trip.
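The "obj == eval(repr(obj))" round trip mentioned above can be checked directly; it holds for the current hybrid repr whether the payload is ASCII, pure binary, or a mix:

```python
# Each sample round-trips through repr() and eval() unchanged,
# regardless of whether its repr shows ASCII glyphs, \xNN escapes, or both.
samples = [b'abc', b'Hello, World!', b'\x00\x01\xfe\xff', bytes(range(16))]
for obj in samples:
    assert obj == eval(repr(obj))
print('round trip holds for all samples')
```

This is the property the Python 2 str type already had, and the one the hybrid bytes repr was designed to preserve.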
> Did the core team try this, before deciding that repr() should
> automatically rewrite printable ASCII characters in place of hex values
> for bytes?
>
> So far, I've heard a lot of requests to keep the behavior because it's
> convenient. But how inconvenient is it to call bytes.asciify()? Are those
> not in favor of changing the behavior of repr() really going to sit behind
> the argument that the effort expended in typing ten more characters ought to
> guarantee that thousands of other programmers are going to have to figure
> out why there's letters in their bytes -- or rather, how there's actually NOT
> letters in their bytes?

No, we're not keeping it because it's convenient, we're keeping it because changing it would be a major compatibility break for (at best) a small reduction in beginner confusion. This change simply wouldn't provide sufficient benefit to justify the massive scale of the disruption it would cause. By contrast, adding better *binary* representation tools is easy (they pose no backwards compatibility challenges), and hence the preferred choice.

When teaching beginners, explaining the difference between:

>>> b"abc"
b'abc'
>>> b"abc".hex()
'616263'

is likely to be pretty straightforward (and will teach them the relevant concept of ASCII based vs hexadecimal representations for binary data). Consider the proposed alternative, which is to instead have to explain:

>>> b"abc"
b'\x61\x62\x63'
>>> b"abc".hex()
'616263'
>>> b"abc".ascii()
'abc'

That's 3 different representations when there are only two underlying concepts to be learned.

> And once again, we are talking about changing behavior that is unspecified
> by the Python 3 language specification.
Something being underspecified in the language specification doesn't mean we have free rein to change it on a whim - sometimes it just means there's an assumed detail that hasn't been explicitly stated, but implementors of alternative implementations hadn't previously commented on the omission because they just followed the behaviour of CPython as the reference interpreter, or the requirements of the regression test suite. It's really necessary to look at the regression test suite, along with the written specification, as things that aren't part of the language spec are marked as "CPython only". Cases where it's CPython that is out of line when other interpreter implementations discover a compatibility issue get filed as CPython bugs (like the one where we sometimes get the operand precedence wrong if both sequences in a binary concatenation operation are implemented in C and the sequences are of different types).

In this case, the underspecification relates to the fact that for builtin types that have dedicated syntax, the expectation is that their repr will use that dedicated syntax. This is not currently stated explicitly in the language reference (and I agree it probably should be), but it's tested extensively by the regression test suite, so it becomes a backwards compatibility constraint and an alternative interpreter compatibility constraint.

> The language is gaining a reputation
> for confusing the two

It isn't "gaining" that reputation, it has always had it. The reputation for it is actually *reducing* over time, as we spend more time working with other implementations like PyPy, Jython and IronPython to get the CPython implementation details marked appropriately.
(C)Python itself hasn't changed in this regard - we're just starting to do a better job of getting the wildly divergent groups of users actually talking to each other (with occasional fireworks as people have to come to grips with some radically different viewpoints on the nature and purpose of software development). In particular, we're starting to see folks that had previously focused almost entirely on the application programming and network service development side of Python (which tends to heavily abstract away the C layer) start to learn more about the system orchestration, hardware automation and scientific programming side of Python that lets you dive as deeply into the machine internals as you like. Most language runtimes only let you handle one or the other of those categories well - CPython is a relatively rare breed in supporting both, which *does* have consequences that make many of our design decisions seem weird to folks that aren't looking at *all* the use cases for the language in general, and the CPython runtime in particular.

> however, as written by Armin Ronacher [1]:
>
>> Python is definitely a language that is not perfect. However I think what
>> frustrates me about the language are largely problems that have to do with
>> tiny details in the interpreter and less the language itself. These
>> interpreter details however are becoming part of the language and this is
>> why they are important.
>
> I feel passionately this implicit ASCII-translation behavior should not
> propagate into further releases of CPython 3, and I don't want to see it become
> a de facto specification due to calcification.

It's not a de facto specification; it's a deliberate design choice, made before Python 3.0 was even released, and captured by the regression test suite.

> We're talking about the next
> 10 to 15 years. Nobody guaranteed the behavior of repr() so far.
With the > bytes.asciify() method (or whatever it may be called), we have a fair > compromise, plus a more explicit specification of behavior of bytes in > Python 3. Lots of folks don't like the fact that CPython doesn't completely hide the underlying memory model of C from the user - it's a deliberately leaky abstraction. The approach certainly has its downsides, but that leaky abstraction is what allows people to be confident that they can use Python as a convenient orchestration language, knowing that we will have easy access to the kind of low level control offered by C (and other systems programming languages) if we need it. This is why the scientific Python stack currently works best on CPython, with the ports to PyPy, Jython and IronPython (which all abstract away the C layer far more heavily) at varying stages of maturity - it's simply harder to do array oriented programming in those environments, since the language runtimes weren't built with that use case in mind (neither was CPython, but the relatively close coupling to the C layer enabled the capability anyway). Computers are complicated layers of messy and leaky abstractions. Working too hard at hiding those layers from the user just means developers can't bypass the abstraction easily when they know what they need for their current use case better than the original author of the language runtime. Regards, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ron3200 at gmail.com Thu Sep 11 03:36:53 2014 From: ron3200 at gmail.com (Ron Adam) Date: Wed, 10 Sep 2014 20:36:53 -0500 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com> Message-ID: On 09/10/2014 05:09 PM, Nick Coghlan wrote: > > On 11 Sep 2014 06:30, "Chris Lasher" > > wrote: > > > > Put yourself in the shoes of a beginner. > > We often compromise the beginner experience for backwards compatibility > reasons, or to provide a better developer experience in the long run (cf. > changing print from a statement to a builtin function). > > In this case, I *agree* the current behaviour is confusing, since it > recreates some of the old "is it binary or is it text?" confusion that was > more endemic in Python 2. > > In Python 3, "bytes" is still a hybrid type that can hold: > * arbitrary binary data > * binary data that contains ASCII segments > > A pure teaching language wouldn't make that compromise. Python 3 isn't a > pure teaching language though - it's a pragmatic professional programming > language that is *also* useful for teaching. > > The problem is that for a lot of data it is *genuinely ambiguous* as to > which of those it actually is (and it may change at runtime depending on > the specific nature of the data). Considering "genuinely ambiguous", if it was a new feature we might quote... "In the face of ambiguity, refuse the temptation to guess." It's interesting that there is nothing in the zen rules about change or backward compatibility. If there were, it might have said... "Changing too much, too fast, is often too disruptive". 
> Both the default repr and the literal form assume the "binary data ASCII > compatible segments", which aligns with the behaviour of the Python 2 str > type. That isn't going to change in Python, especially since we actually > *did* try it for a while (prior to the 3.0 release) and really didn't like it. > > However, as others have noted, making it easier to get a pure hex > representation is likely worth doing. There are lots of ways of doing that > currently, but none that really qualify as "obvious". When working with hex data, I prefer the way hex editors do it. With pairs of hex digits separated by a space. "50 79 74 68 6f 6e" b'Python' But I'm not sure there's a way to make that work cleanly. :-/ Cheers, Ron From stephen at xemacs.org Thu Sep 11 03:50:15 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 11 Sep 2014 10:50:15 +0900 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87d2b353jc.fsf@uwakimon.sk.tsukuba.ac.jp> Chris Lasher writes: > Okay, but a definite -1e6 from me on making my Python interpreter do this: > > >>> my_packed_bytes = struct.pack('ffff', 3.544294848931151e-12, > 1.853266900760489e+25, 1.6215185358725202e-19, 0.9742483496665955) > >>> my_packed_bytes > b'Why, Guido? Why?' If you actually have a struct, why aren't you wrapping your_packed_bytes in a class that validates the struct and displays it nicely formatted? Or, alternatively, simply replaces __repr__? > I do understand the utility of peering in to ASCII text, but like Cory > Benfield stated earlier: > > > I'm saying that I don't get to do debugging with a simple > > print statement when using the bytes type to do actual binary work, > > while those who are doing sort-of binary work do. 
> > Does the inconvenience of having to explicitly call the .asciify() > method on a bytes object justify the current behavior for repr() on a > bytes object? Yes. A choice must be made, because a type has only one repr, and there's no syntax for choosing it. It's a question of whose use case is going to become more convenient and whose becomes less so, and either choice is *justified*. Which is *preferred* is a judgment call. Your judgment doesn't rule, and it definitely doesn't have a weight of 1e6. At this point even Guido's judgment is likely to be dominated by backward compatibility, no matter how much he regrets the necessity. (But I would bet he doesn't regret it at all.) > The privilege of being lazy is obstructing the right to see what > we've actually got in the bytes object, and is jeopardizing the > very argument that "bytes are not strings". It does not jeopardize the *fact* that bytes are not strings. People who don't understand that have a fundamental confusion, and they're going to want bytes to DWIM when mixed with str in their applications. And they'll complain when their bytes don't DWIM, and they'll complain even more when the repr "obstructs the right to see what they've actually got in the bytes object", which (in their applications) is a stream containing tokens borrowed from English using the ASCII coded character set. I agree with you that they're wrong. My point is that they're wrong in such a way that they won't understand that bytes aren't text strings any better merely because they become harder to read. They *know* that there's a text string in there because they put it there! Cory Benfield wrote and Chris Lasher quoted: > > Also, while I'm being picky, 0xDEADBEEF is not a 32-bit pointer, > > it's a 32-bit something. Its type is undefined in that It has a > > standard usage as a guard word, but still, let's not jump to > > conclusions here! I was not jumping to conclusions. I was setting up a scenario. 
The actual use case is something like "int *pi = 0xDEADBEEF;". The point is that C programmers are deliberately choosing a guard word that is readable when printed as hexadecimal, and also satisfies certain restrictions when those bytes are used as a pointer. That doesn't mean that they are confusing text with pointers. The same is true for Python's repr for bytes. > > I do happen to believe that having it be hex would provide a > > better pedagogical position ("you know this isn't text because it > > looks like gibberish!"), but that ship sailed a long time ago. I don't think a gibberish repr will confuse people who think that bytes are text in their application. They'll just get more peeved at Python 3, because they know that there's readable text in there, and Python 3 "obstructs their right to see what's actually in the bytes object". Regards, From ncoghlan at gmail.com Thu Sep 11 03:57:35 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Sep 2014 11:57:35 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 September 2014 10:42, Chris Angelico wrote: > On Thu, Sep 11, 2014 at 4:35 AM, Chris Lasher wrote: >> Unless printable representation of bytes objects appears as part of >> the language specification for Python 3, it's an implementation >> detail, thus, it is a candidate for change, especially if the BDFL >> wills it so. > > So this is all about the output of repr(), right? The question then > is: How important is backward compatibility with repr? Will there be > code breakage? 
I changed PyBytes_Repr to inject a 'Z' after the opening quote to see just how extensive the damage would be in CPython's own regression test suite (as I belatedly realised the magnitude of the impact may not be obvious to everyone, so I figured it was worth quantifying): 355 tests OK. 17 tests failed: test_base64 test_bytes test_configparser test_ctypes test_doctest test_file_eintr test_hash test_io test_pdb test_pickle test_pickletools test_re test_smtpd test_subprocess test_sys test_telnetlib test_tools 1 test altered the execution environment: test_warnings 17 tests skipped: test_curses test_devpoll test_kqueue test_msilib test_ossaudiodev test_smtpnet test_socketserver test_startfile test_timeout test_tk test_ttk_guionly test_urllib2net test_urllibnet test_winreg test_winsound test_xmlrpc_net test_zipfile64 I ran those tests without enabling *any* of the optional resources (and the Windows specific tests won't run on my machine). Folks should keep in mind that when we talk about "hybrid ASCII binary data", we're not just talking about things like SMTP and HTTP 1.1 and debugging network protocol traffic, we're also talking about things like URLs, filesystem paths, email addresses, environment variables, command line arguments, process names, passing UTF-8 encoded data to GUI frameworks, etc that are often both ASCII compatible and human readable *by design*. 
Note the error message produced here with my modified build:

$ ./python -c 'import os; print(os.listdir(b"foo"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: b'Zfoo'

And this directory listing:

$ ./python -c 'import os; print(os.listdir(b"Mac"))'
[b'ZIDLE', b'ZMakefile.in', b'ZTools', b'ZREADME.orig', b'ZPythonLauncher', b'ZIcons', b'ZREADME', b'ZExtras.install.py', b'ZBuildScript', b'ZResources']

Python 3 carved out a whole lot of text processing operations and said "these are clearly and unambiguously working with text data, we shouldn't confuse them with binary data manipulation". The remaining ambiguity in the behaviour of the Python 3 bytes type is largely inherent in the way computers currently work - there's no getting away from it.

Regards, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From stephen at xemacs.org Thu Sep 11 04:17:19 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 11 Sep 2014 11:17:19 +0900
Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3
In-Reply-To:
References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com>
Message-ID: <87bnqm6guo.fsf@uwakimon.sk.tsukuba.ac.jp>

Nick Coghlan writes:
> In Python 3, "bytes" is still a hybrid type that can hold:
> * arbitrary binary data
> * binary data that contains ASCII segments
>
> A pure teaching language wouldn't make that compromise.

Of course it would, because nobody in their right mind would restrict a bytes type to the values 128-255! Yes, I know what you mean: it wouldn't use the hybrid representation for repr or for literals. My point is that even you are making the mistake of framing the issue as whether a bytes object is "arbitrary binary data" or "binary data that contains [readable] ASCII segments" as something inherent in the type. It's not!
It's all about convenience of representation for particular applications, end of story. A repr that obfuscates the content in the "ASCII segment" set of applications *might* be preferable for teaching applications, but I'm not even sure of that. From ncoghlan at gmail.com Thu Sep 11 04:35:29 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Sep 2014 12:35:29 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 September 2014 11:57, Nick Coghlan wrote: > Folks should keep in mind that when we talk about "hybrid ASCII binary > data", we're not just talking about things like SMTP and HTTP 1.1 and > debugging network protocol traffic, we're also talking about things > like URLs, filesystem paths, email addresses, environment variables, > command line arguments, process names, passing UTF-8 encoded data to > GUI frameworks, etc that are often both ASCII compatible and human > readable *by design*. 
> Note the error message produced here with my modified build:
>
> $ ./python -c 'import os; print(os.listdir(b"foo"))'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> FileNotFoundError: [Errno 2] No such file or directory: b'Zfoo'
>
> And this directory listing:
>
> $ ./python -c 'import os; print(os.listdir(b"Mac"))'
> [b'ZIDLE', b'ZMakefile.in', b'ZTools', b'ZREADME.orig',
> b'ZPythonLauncher', b'ZIcons', b'ZREADME', b'ZExtras.install.py',
> b'ZBuildScript', b'ZResources']

After posting that version, I realised actually making the proposed change would be similarly straightforward, and better illustrate the core problem with the idea:

$ ./python -c 'import os; print(os.listdir(b"foo"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: b'\x66\x6f\x6f'
$ ./python -c 'import os; print(os.listdir(b"Mac"))'
[b'\x49\x44\x4c\x45', b'\x4d\x61\x6b\x65\x66\x69\x6c\x65\x2e\x69\x6e', b'\x54\x6f\x6f\x6c\x73', b'\x52\x45\x41\x44\x4d\x45\x2e\x6f\x72\x69\x67', b'\x50\x79\x74\x68\x6f\x6e\x4c\x61\x75\x6e\x63\x68\x65\x72', b'\x49\x63\x6f\x6e\x73', b'\x52\x45\x41\x44\x4d\x45', b'\x45\x78\x74\x72\x61\x73\x2e\x69\x6e\x73\x74\x61\x6c\x6c\x2e\x70\x79', b'\x42\x75\x69\x6c\x64\x53\x63\x72\x69\x70\x74', b'\x52\x65\x73\x6f\x75\x72\x63\x65\x73']

vs

$ python3 -c 'import os; print(os.listdir(b"foo"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'foo'
$ python3 -c 'import os; print(os.listdir(b"Mac"))'
[b'IDLE', b'Makefile.in', b'Tools', b'README.orig', b'PythonLauncher', b'Icons', b'README', b'Extras.install.py', b'BuildScript', b'Resources']

It's more than just a matter of backwards compatibility, it's a matter of asymmetry of impact when the two possible design choices are wrong:

* Using a hex based repr when an ASCII based repr is more appropriate is utterly unreadable
* Using an ASCII based repr when a hex based repr is more appropriate is
somewhat confusing.

This kind of thing is why the original "binary representation by default" design didn't survive the Python 3.0 development cycle - once people started trying it out, it quickly became evident that it was the wrong approach to take (if I remember the original implementation correctly, the repr was along the lines of "bytes([1, 2, 3, 4])" since there wasn't a bytes literal until after PEP 3137 was implemented). Making hex representations of binary data easier to produce is still a good idea, though.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Thu Sep 11 04:40:17 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 11 Sep 2014 12:40:17 +1000
Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3
In-Reply-To:
References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com>
Message-ID:

On 11 September 2014 11:36, Ron Adam wrote:
> When working with hex data, I prefer the way hex editors do it. With pairs
> of hex digits separated by a space.
>
> "50 79 74 68 6f 6e" b'Python'
>
> But I'm not sure there's a way to make that work cleanly. :-/

I realised (http://bugs.python.org/issue22385) we could potentially support that style through the string formatting syntax, using the precision field to specify the number of "bytes per chunk", along with a couple of the other existing formatting flags in the mini-language:

format(b"xyz", "x") -> '78797a'
format(b"xyz", "X") -> '78797A'
format(b"xyz", "#x") -> '0x78797a'
format(b"xyz", ".1x") -> '78 79 7a'
format(b"abcdwxyz", ".4x") -> '61626364 7778797a'
format(b"abcdwxyz", "#.4x") -> '0x61626364 0x7778797a'
format(b"xyz", ",.1x") -> '78,79,7a'
format(b"abcdwxyz", ",.4x") -> '61626364,7778797a'
format(b"abcdwxyz", "#,.4x") -> '0x61626364,0x7778797a'

Cheers, Nick.
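None of those format codes existed at the time of writing, but the grouping behaviour they describe is a few lines of stdlib code. The `chunked_hex` helper below is an illustrative sketch, not an existing API:

```python
import binascii

def chunked_hex(data, group=1, sep=' ', prefix=''):
    """Hex-dump bytes in fixed-size groups, hex-editor style."""
    digits = binascii.hexlify(data).decode('ascii')
    width = group * 2  # two hex digits per byte
    chunks = [digits[i:i + width] for i in range(0, len(digits), width)]
    return sep.join(prefix + chunk for chunk in chunks)

print(chunked_hex(b'Python'))                 # 50 79 74 68 6f 6e
print(chunked_hex(b'abcdwxyz', group=4))      # 61626364 7778797a
print(chunked_hex(b'abcdwxyz', 4, ',', '0x')) # 0x61626364,0x7778797a
```

The three calls reproduce Ron's hex-editor style and two of the proposed format() outputs, with the group size, separator, and prefix playing the roles of the precision field, comma flag, and "#" flag respectively.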
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From wichert at wiggy.net Thu Sep 11 07:58:57 2014
From: wichert at wiggy.net (Wichert Akkerman)
Date: Thu, 11 Sep 2014 07:58:57 +0200
Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3
In-Reply-To:
References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

> On 11 Sep 2014, at 02:42, Chris Angelico wrote:
>
> On Thu, Sep 11, 2014 at 4:35 AM, Chris Lasher wrote:
>> Unless printable representation of bytes objects appears as part of
>> the language specification for Python 3, it's an implementation
>> detail, thus, it is a candidate for change, especially if the BDFL
>> wills it so.
>
> So this is all about the output of repr(), right? The question then
> is: How important is backward compatibility with repr? Will there be
> code breakage?

It's likely to break doctests at least.

Wichert.

From chris.lasher at gmail.com Thu Sep 11 08:39:16 2014
From: chris.lasher at gmail.com (Chris Lasher)
Date: Wed, 10 Sep 2014 23:39:16 -0700
Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3
In-Reply-To: <87d2b353jc.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> <87d2b353jc.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Wed, Sep 10, 2014 at 6:50 PM, Stephen J. Turnbull wrote:
> Chris Lasher writes:
>
> > Okay, but a definite -1e6 from me on making my Python interpreter do this:
> >
> > >>> my_packed_bytes = struct.pack('ffff', 3.544294848931151e-12,
> > 1.853266900760489e+25, 1.6215185358725202e-19, 0.9742483496665955)
> > >>> my_packed_bytes
> > b'Why, Guido? Why?'
>
> If you actually have a struct, why aren't you wrapping
> your_packed_bytes in a class that validates the struct and displays it
> nicely formatted? Or, alternatively, simply replaces __repr__?
> The point was to demonstrate that although text must be represented by bytes, not all bytes represent text. I have the bytes from four 32-bit floating point numbers, but repr() displays these bytes as ASCII characters. It looks like I wrote "Why, Guido? Why?" illustrating how implicit behavior that's "usually helpful" can be rather unhelpful. Explicitly showing the hexadecimal values is always accurate, because bytes are always bytes. > Your judgment doesn't rule, and it definitely doesn't have a weight of > 1e6. I meant the "-1e6" as a cheeky response, not as a reflection of the importance of my opinions or ideas. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.lasher at gmail.com Thu Sep 11 08:47:01 2014 From: chris.lasher at gmail.com (Chris Lasher) Date: Wed, 10 Sep 2014 23:47:01 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Let me start with this, from Nick: > This is not an acceptable change, for two reasons: > 1. It's a *major* compatibility break. It breaks single source Python 2/3 > development, it breaks doctests, it breaks user expectations. > Okay, breaking doctests, I can understand the negative impact. I'm willing to give up because of this. So, on account of the fragility of doctests, I suppose, yes, this proposal will never go through. And I feel that's a shame, because I was never a fan of doctests, either. Regarding user expectations, I've already stated, yes this continues with the expectations of experienced users, who won't stumble when they see ASCII in their bytes. For all other users, though, this behavior otherwise violates the principle of least astonishment. ("Why are there English characters in my bytes?") 2. It breaks the symmetry between the bytes literal format and their > representation. 
Symmetry is already broken for bytes literal format because the user is allowed to enter hex codes, even if they map onto printable ASCII characters: >>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21' b'Hello, World!' On Wed, Sep 10, 2014 at 7:35 PM, Nick Coghlan wrote: > > After posting that version, I realised actually making the proposed > change would be similarly straightforward, and better illustrate the > core problem with the idea: > > $ ./python -c 'import os; print(os.listdir(b"foo"))' > Traceback (most recent call last): > File "", line 1, in > FileNotFoundError: [Errno 2] No such file or directory: b'\x66\x6f\x6f' > $ ./python -c 'import os; print(os.listdir(b"Mac"))' > [b'\x49\x44\x4c\x45', b'\x4d\x61\x6b\x65\x66\x69\x6c\x65\x2e\x69\x6e', > b'\x54\x6f\x6f\x6c\x73', > b'\x52\x45\x41\x44\x4d\x45\x2e\x6f\x72\x69\x67', > b'\x50\x79\x74\x68\x6f\x6e\x4c\x61\x75\x6e\x63\x68\x65\x72', > b'\x49\x63\x6f\x6e\x73', b'\x52\x45\x41\x44\x4d\x45', > b'\x45\x78\x74\x72\x61\x73\x2e\x69\x6e\x73\x74\x61\x6c\x6c\x2e\x70\x79', > b'\x42\x75\x69\x6c\x64\x53\x63\x72\x69\x70\x74', > b'\x52\x65\x73\x6f\x75\x72\x63\x65\x73'] > You passed bytes - not an ASCII string - as an argument to os.listdir; it gave you back bytes, not ASCII strings. You _consented_ to bytes when you put the b'Mac' in there; therefore, you are responsible for decoding those bytes. Yes, all text must be represented as bytes to a computer, but not all bytes represent text. > It's more than just a matter of backwards compatibility, it's a matter > of asymmetry of impact when the two possible design choices are wrong: > > * Using a hex based repr when an ASCII based repr is more appropriate > is utterly unreadable > * Using an ASCII based repr when a hex based repr is more appropriate > is somewhat confusing > I prefer to unframe it from ASCII.
The decision is (well, was) between: * A representation that is always accurate but sometimes inconvenient versus * A representation that is convenient when it is accurate, but is not always accurate (and is inconvenient when it's inaccurate). Earlier, Nick, you wrote > > What you haven't said so far, however, and what I still don't know, is > > whether or not the core team has already tried providing a method on > bytes > > objects à la the proposed .asciify() for projecting bytes as ASCII > > characters, and rejected that on the basis of it being too inconvenient > for > > the vast majority of Python use cases. That option was never really on the table, as once we decided to > switch back to a hybrid ASCII representation, the obvious design model to > use was the Python 2 str type, which has inherently hybrid behaviour, > and uses the literal form for the "obj == eval(repr(obj))" round trip. obj == eval(repr(obj)) round-trip behavior is not violated by the proposed change: >>> r = repr(b'Hello, World!') >>> r "b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'" >>> b'Hello, World!' == eval(r) True -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Sep 11 08:55:00 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Sep 2014 16:55:00 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910105732.GE9293@ando.pearwood.info> Message-ID: <20140911065500.GG9293@ando.pearwood.info> On Wed, Sep 10, 2014 at 01:54:17PM +0200, Wolfgang Maier wrote: > On 09/10/2014 12:57 PM, Steven D'Aprano wrote: > >However, I do support Terry's suggestion that bytes (and, I presume, > >bytearray) grow some sort of easy way of displaying the bytes in hex. > >The trouble is, what do we actually want? > > > >b'Abc' --> '0x416263' > >b'Abc' --> '\x41\x62\x63' > > > >I can see use-cases for both.
After less than two minutes of thought, it > >seems to me that perhaps the most obvious APIs for these two different > >representations are: > > > >hex(b'Abc') --> '0x416263' > > This would require a change in the documented > (https://docs.python.org/3/library/functions.html#hex) behavior of > hex(), which I think is quite a big deal for a relatively special case. Any new functionality is going to require a change to the documentation. Changing hex() is no more of a big deal than adding a new method. I'd call it *less* of a big deal. In Python 2, hex() calls the dunder method __hex__. That has been removed in Python 3. Does anyone know why? As I see it, hex() returns a hexadecimal representation of its argument as a string. That's exactly what we want in this case: we're taking an object which represents a block of integer values, and want a human-readable hexadecimal representation. So hex() is, or ought to be, the obvious solution. As an alternative, if there was an easy, obvious way to convert the bytes b'Abc' (or b'\x41\x62\x63') to the int 4285027 (or 0x416263), then the obvious solution would be hex(int(b'Abc')) and it would require no changes to hex(). Of course the int() built-in isn't the right way to do this. -- Steven From ncoghlan at gmail.com Thu Sep 11 08:58:51 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Sep 2014 16:58:51 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 September 2014 16:47, Chris Lasher wrote: > You passed bytes - not an ASCII string - as an argument to os.listdir; it > gave you back bytes, not ASCII strings. You _consented_ to bytes when you > put the b'Mac' in there; therefore, you are responsible for decoding those
> > > Yes, all text must be represented as bytes to a computer, but not all bytes > represent text. Yes, we know. We debated this 8 years ago. We *tried it* 8 years ago. We found it to provide a horrible developer experience, so we changed it back to be closer to the way Python 2 works. Changing the default representation of binary data to something that we already decided didn't work (or at least its very close cousin) is not up for discussion. Providing better tools for easily producing hexadecimal representations is an excellent idea. Making developers explicitly request non-horrible output when working with binary APIs on POSIX systems is not. You can keep saying "but it's potentially confusing when it really is arbitrary binary data", and I'm telling you *that doesn't matter*. The consequences of flipping the default are worse, because it means defaulting to unreadable output from supported operating system interfaces, which *will* leak through to API consumers and potentially even end users. That's not OK, which means the status quo is the lesser of two evils. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Thu Sep 11 09:30:46 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Sep 2014 17:30:46 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910105732.GE9293@ando.pearwood.info> Message-ID: <20140911073046.GH9293@ando.pearwood.info> On Wed, Sep 10, 2014 at 03:37:03PM +0100, Paul Moore wrote: > On 10 September 2014 15:24, Ian Cordasco wrote: > >>> b'Abc'.decode('hexescapes') --> '\x41\x62\x63' > >> > >> > >> This, OTOH, looks elegant (avoids a new method) and clear (no doubt about > >> the returned type) to me. > >> +1 > > > > Another +0.5 for me. I think this is quite elegant and reasonable. I'm > > not sure it needs to be unicode though.
Perhaps it's too early for me, > > but does turning that into a unicode string make sense? repr() returns a unicode string. hex(), oct() and bin() return unicode strings. The intent is to return a human-readable representation of a binary object, that is, a string from a bytes object. So, yes, a unicode string makes sense. > It's easy enough to do by hand: > > >>> print(''.join("\\x{:02x}".format(c) for c in b'Abc')) > \x41\x62\x63 > > And you get any other format you like, just by changing the format > string in there, or the string you join on: > > >>> print(':'.join("{:02x}".format(c) for c in b'Abc')) > 41:62:63 > > Not every one-liner needs to be a builtin... Until your post just now, there has probably never been anyone anywhere who wanted to display b'Abc' as "41:62:63", and there probably never will be again. For such a specialised use-case, it's perfectly justified to reject a request for such a colon-delimited hex function with "not every one-liner...". But displaying bytes as either "0x416263" or "\x41\x62\x63" hex format is not so obscure, especially if you consider pedagogical uses. For that, your one-liner is hardly convenient: you have to manually walk the bytes object, extracting one byte at a time, format it, debug the inevitable mistake in the formatting code *wink*, then join all the substrings. The complexity of the code (little as it is for an expert) is enough to distract from the pedagogical message, and not quite trivially simple to get right if you aren't a heavy user of string formatting codes. Converting byte strings to a hex representation is quite a common thing to do, as witnessed by the (at least) five different ways to do it: http://bugs.python.org/msg226731 none of which are really obvious or convenient. Hence the long-outstanding request for this. (At least four years now.)
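For comparison, a few of those coexisting spellings side by side (this is my selection, and my guess at which ones the tracker message counts - the message itself isn't quoted here), all producing the same string for the b'Abc' example:

```python
import binascii
import codecs

data = b'Abc'

# Several coexisting ways to turn bytes into a hex string:
via_binascii = binascii.hexlify(data).decode('ascii')    # module function
via_codecs = codecs.encode(data, 'hex').decode('ascii')  # bytes-to-bytes codec
via_format = ''.join('{:02x}'.format(b) for b in data)   # manual loop
via_int = '%x' % int.from_bytes(data, 'big')             # via an integer
                                                         # (drops leading NUL bytes)

print(via_binascii)  # all four give '416263'
```

None of these is wrong, but none of them is the kind of thing a beginner would find on their own, which is rather the point.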
-- Steven From p.f.moore at gmail.com Thu Sep 11 11:44:02 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 11 Sep 2014 10:44:02 +0100 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <20140911073046.GH9293@ando.pearwood.info> References: <20140910105732.GE9293@ando.pearwood.info> <20140911073046.GH9293@ando.pearwood.info> Message-ID: On 11 September 2014 08:30, Steven D'Aprano wrote: >> >>> print(':'.join("{:02x}".format(c) for c in b'Abc')) >> 41:62:63 >> >> Not every one-liner needs to be a builtin... > > Until your post just now, there has probably never been anyone anywhere > who wanted to display b'Abc' as "41:62:63", and there probably never > will be again. For such a specialised use-case, it's perfectly justified > to reject a request for such a colon-delimited hex function with "not > every one-liner...". So I picked a bad example. Sorry. Someone (sorry, I can't recall who) did ask for >>> print(' '.join("{:02x}".format(c) for c in b'Abc')) 41 62 63 My point is that a simple pattern is flexible, whereas a specific method has to pick one "obvious" representation, and there have been a number of representations discussed here. Paul From steve at pearwood.info Thu Sep 11 11:51:01 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Sep 2014 19:51:01 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140911095101.GJ9293@ando.pearwood.info> On Wed, Sep 10, 2014 at 11:47:01PM -0700, Chris Lasher wrote: > Regarding user expectations, I've already stated, yes this continues with > the expectations of experienced users, who won't stumble when they see > ASCII in their bytes. For all other users, though, this behavior otherwise > violates the principle of least astonishment.
("Why are there English > characters in my bytes?") That's easy to explain: "Because Python gets used for many programming tasks where ASCII text is mixed in with arbitrary bytes, as a convenience for those programmers, as well as backward compatibility with the Bad Old Days, Python defaults to show bytes as if they were ASCII text. But they're not, of course, under the hood they're just numbers between 0 and 255, or 0 and 0xFF in hexadecimal, and you can see that by calling the hexify() method." There are plenty of other areas of Python where decisions are made that are not necessarily ideal from a teaching standpoint. We cope. -- Steven From steve at pearwood.info Thu Sep 11 11:52:20 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Sep 2014 19:52:20 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <87bnqm6guo.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com> <87bnqm6guo.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20140911095220.GK9293@ando.pearwood.info> On Thu, Sep 11, 2014 at 11:17:19AM +0900, Stephen J. Turnbull wrote: > It's all about convenience of representation for particular > applications, end of story. A repr that obfuscates the content in the > "ASCII segment" set of applications *might* be preferable for teaching > applications, but I'm not even sure of that. I think it is telling that hex editors, as a general rule, display byte data (i.e. the content of files) as both hex and ASCII. Real-world data is messy, and there are many cases where we want to hunt through an otherwise binary file looking for sequences of ASCII characters. Or vice versa. That's inherently mixing the concepts of text and bytes, but it needs to be done sometimes.
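That side-by-side hex/ASCII view is simple to approximate. The helper below is my own sketch (the name hexdump and the exact layout are made up, not something from the thread); it shows printable ASCII bytes as characters and everything else as '.':

```python
def hexdump(data, width=8):
    """Hex-editor-style view: offset, hex pairs, then an ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hexpart = ' '.join('{:02x}'.format(b) for b in chunk)
        # Printable ASCII range is 0x20..0x7e; everything else becomes '.'
        asciipart = ''.join(chr(b) if 0x20 <= b < 0x7f else '.' for b in chunk)
        # Pad the hex column so the ASCII column lines up on short final rows.
        lines.append('{:08x}  {:<{w}}  {}'.format(offset, hexpart, asciipart,
                                                  w=width * 3 - 1))
    return '\n'.join(lines)

print(hexdump(b'Python\x00\x01'))
# 00000000  50 79 74 68 6f 6e 00 01  Python..
```

Nothing about this needs to live in the core, of course; the question in the thread is only what the *default* view should be.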
I am sad that the default representation of bytes displays ASCII, but I am also convinced that as regrettable as that choice is, the opposite choice would be even more regrettable. So I will be satisfied by an obvious way to display the hexified representation of a byte-string, even if that way is not repr(). -- Steven From lkb.teichmann at gmail.com Thu Sep 11 15:42:43 2014 From: lkb.teichmann at gmail.com (Martin Teichmann) Date: Thu, 11 Sep 2014 15:42:43 +0200 Subject: [Python-ideas] Yielding from the command line Message-ID: Hi List, I'm currently trying to convince my company that asyncio is a great thing. After a lot of critique, the newest thing is, people complain: I cannot test my code on the command line! And indeed they are right, a simple a = yield from some_coroutine() is not possible on the command line, and doesn't make sense. Wait a minute, really? Well, it could make sense, in an asyncio-based command line. I am thinking about a python interpreter whose internal loop is something like @coroutine def commandline(): while True: cmd = yield from input_async() code = compile(cmd, "", "generator") yield from exec(code) A new compile mode would allow to directly, always create a generator, and exec should be certainly be able to handle this. I think this would not only make people happy that want to test code on the command line, but also all those people developing command line-GUI combinations (IPython comes to mind), which have to keep several event loops in sync. Greetings Martin From encukou at gmail.com Thu Sep 11 16:04:58 2014 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 11 Sep 2014 16:04:58 +0200 Subject: [Python-ideas] Yielding from the command line In-Reply-To: References: Message-ID: On Thu, Sep 11, 2014 at 3:42 PM, Martin Teichmann wrote: > Hi List, > > I'm currently trying to convince my company that asyncio is a great > thing. 
After a lot of critique, the newest thing is, people complain: > I cannot test my code on the command line! And indeed they are > right, a simple > > a = yield from some_coroutine() > > is not possible on the command line, and doesn't make sense. For running asyncio code from a non-asyncio context (in tests or on the REPL), you can use: def run_sync(future): return asyncio.get_event_loop().run_until_complete(future) a = run_sync(some_coroutine()) > Wait a minute, really? > > Well, it could make sense, in an asyncio-based command line. > I am thinking about a python interpreter whose internal loop is > something like > > @coroutine > def commandline(): > while True: > cmd = yield from input_async() > code = compile(cmd, "", "generator") > yield from exec(code) > > A new compile mode would allow to directly, always create a > generator, and exec should be certainly be able to handle this. > > I think this would not only make people happy that want to test > code on the command line, but also all those people developing > command line-GUI combinations (IPython comes to mind), > which have to keep several event loops in sync. That sounds like something you could experiment with in a module on pypi. From abarnert at yahoo.com Thu Sep 11 17:31:53 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 11 Sep 2014 08:31:53 -0700 Subject: [Python-ideas] Yielding from the command line In-Reply-To: References: Message-ID: On Sep 11, 2014, at 6:42, Martin Teichmann wrote: > Hi List, > > I'm currently trying to convince my company that asyncio is a great > thing. After a lot of critique, the newest thing is, people complain: > I cannot test my code on the command line! And indeed they are > right, a simple > > a = yield from some_coroutine() > > is not possible on the command line, and doesn't make sense. > > Wait a minute, really? > > Well, it could make sense, in an asyncio-based command line. 
> I am thinking about a python interpreter whose internal loop is > something like > > @coroutine > def commandline(): > while True: > cmd = yield from input_async() > code = compile(cmd, "", "generator") > yield from exec(code) I would love to see this. I'm not sure if I'd love it in practice or not, but until someone implements it and I can play with it I'm not sure how I'd become sure. So... You just volunteered, right? Go build it and put it on PyPI, I want it and I'll be your best friend forever and ever no takebacks if you do it. :) > A new compile mode would allow to directly, always create a > generator, and exec should be certainly be able to handle this. > > I think this would not only make people happy that want to test > code on the command line, but also all those people developing > command line-GUI combinations (IPython comes to mind), > which have to keep several event loops in sync. If it also came with builtin wrappers to embed all the popular GUI event loops in asyncio-style coroutines (or, hell, I'd be happy with just Tkinter, or the native Cocoa runloop) and automatically toss them into the main event loop, that could make interactively experimenting with GUIs as easy as it is in Smalltalk (well, except for not being able to turn your experiment into a persistent image). But that one seems like more work than just making the command prompt event-driven and coro-friendly. From ron3200 at gmail.com Thu Sep 11 18:42:24 2014 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 11 Sep 2014 11:42:24 -0500 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com> Message-ID: On 09/10/2014 09:40 PM, Nick Coghlan wrote: > On 11 September 2014 11:36, Ron Adam wrote: >> >When working with hex data, I prefer the way hex editors do it. 
With pairs >> >of hex digits separated by a space. >> > >> > "50 79 74 68 6f 6e" b'Python' >> > >> >But I'm not sure there's a way to make that work cleanly. :-/ > I realised (http://bugs.python.org/issue22385) we could potentially > support that style through the string formatting syntax, using the > precision field to specify the number of "bytes per chunk", along with > a couple of the other existing formatting flags in the mini-language: > > format(b"xyz", "x") -> '78797a' > format(b"xyz", "X") -> '78797A' > format(b"xyz", "#x") -> '0x78797a' > > format(b"xyz", ".1x") -> '78 79 7a' > format(b"abcdwxyz", ".4x") -> '61626364 7778797a' > format(b"abcdwxyz", "#.4x") -> '0x61626364 0x7778797a' > > format(b"xyz", ",.1x") -> '78,79,7a' > format(b"abcdwxyz", ",.4x") -> '61626364,7778797a' > format(b"abcdwxyz", "#,.4x") -> '0x61626364,0x7778797a' Is there a way to go in the other direction? That is these other hex formats to bytes? Ron From tjreedy at udel.edu Thu Sep 11 19:09:51 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 11 Sep 2014 13:09:51 -0400 Subject: [Python-ideas] Yielding from the command line In-Reply-To: References: Message-ID: On 9/11/2014 9:42 AM, Martin Teichmann wrote: > Hi List, > > I'm currently trying to convince my company that asyncio is a great > thing. After a lot of critique, the newest thing is, people complain: > I cannot test my code on the command line! I view this as a flimsy excuse, not a reason. (Asyncio can be hard to wrap one's head around, but what do they propose as the alternative for the things asyncio does well?) Do the same people oppose 'yield' (and generators) and nonlocal (and writable closures) for the same reason? 
>>> yield from a SyntaxError: 'yield' outside function >>> yield x SyntaxError: 'yield' outside function >>> nonlocal z SyntaxError: nonlocal declaration not allowed at module level > I am thinking about a python interpreter whose internal loop is > something like > > @coroutine > def commandline(): > while True: > cmd = yield from input_async() > code = compile(cmd, "", "generator") > yield from exec(code) > > A new compile mode would allow to directly, always create a > generator, and exec should be certainly be able to handle this. Alternate command line or interactive input interpretation could well make sense, but it should stand on its own with real use cases. > I think this would not only make people happy that want to test > code on the command line, but also all those people developing > command line-GUI combinations (IPython comes to mind), > which have to keep several event loops in sync. -- Terry Jan Reedy From chris.lasher at gmail.com Thu Sep 11 19:51:38 2014 From: chris.lasher at gmail.com (Chris Lasher) Date: Thu, 11 Sep 2014 10:51:38 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910104830.5b53279f@anarchist.wooz.org> <87egvj5s3b.fsf@uwakimon.sk.tsukuba.ac.jp> <5791553B-CA5F-47EB-AB55-24BE57A6CA92@yahoo.com> Message-ID: On Thu, Sep 11, 2014 at 9:42 AM, Ron Adam wrote: > > > On 09/10/2014 09:40 PM, Nick Coghlan wrote: > >> On 11 September 2014 11:36, Ron Adam wrote: >> >>> >When working with hex data, I prefer the way hex editors do it. With >>> pairs >>> >of hex digits separated by a space. >>> > >>> > "50 79 74 68 6f 6e" b'Python' >>> > >>> >But I'm not sure there's a way to make that work cleanly. 
:-/ >>> >> I realised (http://bugs.python.org/issue22385) we could potentially >> support that style through the string formatting syntax, using the >> precision field to specify the number of "bytes per chunk", along with >> a couple of the other existing formatting flags in the mini-language: >> >> format(b"xyz", "x") -> '78797a' >> format(b"xyz", "X") -> '78797A' >> format(b"xyz", "#x") -> '0x78797a' >> >> format(b"xyz", ".1x") -> '78 79 7a' >> format(b"abcdwxyz", ".4x") -> '61626364 7778797a' >> format(b"abcdwxyz", "#.4x") -> '0x61626364 0x7778797a' >> >> format(b"xyz", ",.1x") -> '78,79,7a' >> format(b"abcdwxyz", ",.4x") -> '61626364,7778797a' >> format(b"abcdwxyz", "#,.4x") -> '0x61626364,0x7778797a' >> > > Is there a way to go in the other direction? That is these other hex > formats to bytes? > > Yes, for the forms not prefixed with '0x': >>> bytes.fromhex('78797A') b'xyz' >>> bytes.fromhex('78797a') b'xyz' >>> bytes.fromhex('78 79 7a') b'xyz' >>> bytes.fromhex('0x78797a') Traceback (most recent call last): File "", line 1, in bytes.fromhex('0x78797a') ValueError: non-hexadecimal number found in fromhex() arg at position 0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Thu Sep 11 20:19:52 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 11 Sep 2014 14:19:52 -0400 Subject: [Python-ideas] Yielding from the command line In-Reply-To: References: Message-ID: On 9/11/2014 11:31 AM, Andrew Barnert wrote: > If it also came with builtin wrappers to embed all the popular GUI > event loops in asyncio-style coroutines (or, hell, I'd be happy with > just Tkinter, or the native Cocoa runloop) and automatically toss > them into the main event loop, that could make interactively > experimenting with GUIs as easy as it is in Smalltalk (well, except > for not being able to turn your experiment into a persistent image). 
One can interactively experiment with the visual aspects of tkinter now and at least some dynamic behavior. At console interpreter or Idle (prompts deleted, only fully tested with Idle) import tkinter as tk root = tk.Tk() # empty window displayed def click(): print('clicked') b = tk.Button(root, text='click', command=click) b.pack() #default button default packed # click button -- button visibly changes, 'clicked' is printed on shell # experiment with Button options that affect appearance b.destroy() # button removed I presume full behavior requires the call to root.mainloop(). This has two problems for continued interaction. First, the call blocks until the window is closed, making further entry impossible through normal means. If that were solved with a 'noblock' option, there would still be the problem of getting shell input to a callback that could, on demand, execute code to modify the tk app. The solution would have to be different for the console interpreter, where tkinter is running in the same process, and Idle, where tkinter is running in a separate process. It would probably be easier to create an event handler that would pop up a separate text window and exec the code entered. -- Terry Jan Reedy From lkb.teichmann at gmail.com Fri Sep 12 09:33:04 2014 From: lkb.teichmann at gmail.com (Martin Teichmann) Date: Fri, 12 Sep 2014 09:33:04 +0200 Subject: [Python-ideas] Yielding from the command line Message-ID: Hi Terry, Hi List, > I presume full behavior requires the call to root.mainloop(). This has two > problems for continued interaction. First, the call blocks until the window > is closed, making further entry impossible through normal means. If that > were solved with a 'noblock' option, there would still be the problem of > getting shell input to a callback that could, on demand, execute code to > modify the tk app.
The solution would have to be different for the console > interpreter, where tkinter is running in the same process, and Idle, where > tkinter is running in a separate process. You just gave a good reasoning for the advantages of asyncio. Because once we have an asyncio-aware version of tkinter - and an asyncio-aware command line, this is what I am proposing - all the problems you just described disappear. So, I would call that a good use case for my idea. A tkinter-aware commandline would then just look like: set_event_loop(TKInterEventLoop()) async(commandline()) # i.e. the coroutine defined in my last post get_event_loop().run_forever() # this calls root.mainloop() Greetings Martin From guido at python.org Fri Sep 12 17:01:35 2014 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Sep 2014 08:01:35 -0700 Subject: [Python-ideas] Yielding from the command line In-Reply-To: References: Message-ID: While it may be possible to build "yield from" into a custom read-eval-print loop (REPL), that's tricky because the built-in REPL is written in C. The quickest way to success is definitely the helper function shown in the first response. On Fri, Sep 12, 2014 at 12:33 AM, Martin Teichmann wrote: > Hi Terry, Hi List, > > > I presume full behavior requires the call to root.mainloop(). This has > two > > problems for continued interaction. First, the call blocks until the > window > > is closed, making further entry impossible through normal means. If that > > were solved with a 'noblock' option, there would still be the problem of > > getting shell input to a callback that could, on demand, execute code > to > > modify the tk app. The solution would have to be different for the > console > > interpreter, where tkinter is running in the same process, and Idle, where > > tkinter is running in a separate process. You just gave a good reasoning for the advantages of asyncio.
Because once > we have an asyncio-aware version of tkinter - and an asyncio-aware command > line, this is what I am proposing - all the problems you just described > disappear. So, I would call that a good use case for my idea. > > A tkinter-aware commandline would then just look like: > > set_event_loop(TKInterEventLoop()) > async(commandline()) # i.e. the coroutine defined in my last post > get_event_loop().run_forever() # this calls root.mainloop() > > Greetings > > Martin > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From paultag at gmail.com Fri Sep 12 17:10:09 2014 From: paultag at gmail.com (Paul Tagliamonte) Date: Fri, 12 Sep 2014 11:10:09 -0400 Subject: [Python-ideas] Yielding from the command line In-Reply-To: References: Message-ID: <20140912150932.GA12368@helios.pault.ag> On Fri, Sep 12, 2014 at 08:01:35AM -0700, Guido van Rossum wrote: > While it may be possible to build "yield from" into a custom > read-eval-print loop (REPL), that's tricky because the built-in REPL is > written in C. The quickest way to success is definitely the helper > function shown in the first response. While true, you can extend the REPL in Python by extending code.InteractiveConsole and using its `.interact` method. Sounds like a neat third party extension. I'd use it. Cheers, Paul -- :wq -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: Digital signature URL: From alexander.belopolsky at gmail.com Fri Sep 12 17:12:30 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 12 Sep 2014 11:12:30 -0400 Subject: [Python-ideas] Yielding from the command line In-Reply-To: References: Message-ID: On Fri, Sep 12, 2014 at 11:01 AM, Guido van Rossum wrote: > > that's tricky because the built-in REPL is written in C. CPython comes with a REPL emulation in the (pure Python) code module: https://docs.python.org/3/library/code.html One can customize it by subclassing code.InteractiveInterpreter. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Sep 12 18:20:43 2014 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Sep 2014 09:20:43 -0700 Subject: [Python-ideas] Yielding from the command line In-Reply-To: <20140912150932.GA12368@helios.pault.ag> References: <20140912150932.GA12368@helios.pault.ag> Message-ID: But that module and class work differently in many ways from the built-in REPL. Anyway, don't most people use IPython these days? It should be easier to add there. On Fri, Sep 12, 2014 at 8:10 AM, Paul Tagliamonte wrote: > On Fri, Sep 12, 2014 at 08:01:35AM -0700, Guido van Rossum wrote: > > While it may be possible to build "yield from" into a custom > > read-eval-print loop (REPL), that's tricky because the built-in REPL > is > > written in C. The quickest way to success is definitely the helper > > function shown in the first response. > > While true, you can extend the REPL in Python by extending > code.InteractiveConsole and using its `.interact` method. > > Sounds like a neat third party extension. I'd use it. > > > Cheers, > Paul > > -- > :wq > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From steve at pearwood.info Fri Sep 12 19:25:27 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 13 Sep 2014 03:25:27 +1000 Subject: [Python-ideas] Yielding from the command line In-Reply-To: References: <20140912150932.GA12368@helios.pault.ag> Message-ID: <20140912172527.GM9293@ando.pearwood.info> On Fri, Sep 12, 2014 at 09:20:43AM -0700, Guido van Rossum wrote: > Anyway, don't most people use IPython these days? Not so far as I can see from the questions asked on the tutor and python-list mailing lists. Windows users seem to mostly use IDLE, Linux users the vanilla Python interactive interpreter, and when a Mac user asks a question I normally faint from the shock and don't notice what they're using :-) According to a recent thread on Reddit, all the cool kids are using PyCharm, which (I think) has its own REPL. I have no doubt that there are IPython users, probably mostly in the scientific community, but if they ask questions on the two mailing lists above, they're not often copying and pasting IPython sessions into their posts. -- Steven From guido at python.org Fri Sep 12 20:01:40 2014 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Sep 2014 11:01:40 -0700 Subject: [Python-ideas] Yielding from the command line In-Reply-To: <20140912172527.GM9293@ando.pearwood.info> References: <20140912150932.GA12368@helios.pault.ag> <20140912172527.GM9293@ando.pearwood.info> Message-ID: So certainly the two-line helper function should be in the docs, as it works across all those different REPLs. On Fri, Sep 12, 2014 at 10:25 AM, Steven D'Aprano wrote: > On Fri, Sep 12, 2014 at 09:20:43AM -0700, Guido van Rossum wrote: > > > Anyway, don't most people use IPython these days? > > Not so far as I can see from the questions asked on the tutor and > python-list mailing lists. 
> Windows users seem to mostly use IDLE,
> Linux users the vanilla Python interactive interpreter, and when a Mac
> user asks a question I normally faint from the shock and don't notice
> what they're using :-)
>
> According to a recent thread on Reddit, all the cool kids are using
> PyCharm, which (I think) has its own REPL.
>
> I have no doubt that there are IPython users, probably mostly in the
> scientific community, but if they ask questions on the two mailing lists
> above, they're not often copying and pasting IPython sessions into their
> posts.
>
> --
> Steven
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From abarnert at yahoo.com Fri Sep 12 23:51:05 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 12 Sep 2014 14:51:05 -0700
Subject: [Python-ideas] Yielding from the command line
In-Reply-To: <20140912172527.GM9293@ando.pearwood.info>
References: <20140912150932.GA12368@helios.pault.ag> <20140912172527.GM9293@ando.pearwood.info>
Message-ID:

On Sep 12, 2014, at 10:25, Steven D'Aprano wrote:
> On Fri, Sep 12, 2014 at 09:20:43AM -0700, Guido van Rossum wrote:
>
>> Anyway, don't most people use IPython these days?
>
> Not so far as I can see from the questions asked on the tutor and
> python-list mailing lists.

It's a lot more common on StackOverflow than on the lists for some reason.
IDLE, PyDev, and especially PyCharm are definitely growing faster among
askers, but there's still a good amount of IPython, and not just in the
numpy questions. Also, most of the prolific answerers seem to use IPython,
although it's less visible than it used to be, because everyone got sick of
all the followup questions saying "I got a syntax error on that 'In [3]:'
part."
> Windows users seem to mostly use IDLE,
> Linux users the vanilla Python interactive interpreter, and when a Mac
> user asks a question I normally faint from the shock and don't notice
> what they're using :-)

Most of the Python devs I know use Macs. And there are definitely more Mac
questions than Linux on StackOverflow. But half of them are trying to
figure out how to deal with the 2 extra copies of Python 2.7 they installed
because of 5 blog posts all telling them that Apple still only ships 2.5
even though it's 2008 and offering different solutions to that burning
problem. Since most of them are determined to avoid learning what $PATH
means, they don't get to the point of having code to ask about beyond
"import spam" failing even though "pip install spam" succeeded.

From sturla.molden at gmail.com Sat Sep 13 17:20:56 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Sat, 13 Sep 2014 15:20:56 +0000 (UTC)
Subject: [Python-ideas] Yielding from the command line
References: <20140912150932.GA12368@helios.pault.ag> <20140912172527.GM9293@ando.pearwood.info>
Message-ID: <2009130136432313972.107009sturla.molden-gmail.com@news.gmane.org>

Steven D'Aprano wrote:

> I have no doubt that there are IPython users, probably mostly in the
> scientific community, but if they ask questions on the two mailing lists
> above, they're not often copying and pasting IPython sessions into their
> posts.

The "big thing" in the scientific community today is the IPython Notebook.
http://ipython.org/notebook.html

When a notebook is created and e.g. uploaded to GitHub, it can be shared
using the notebook viewer: http://nbviewer.ipython.org

Sharing a notebook is clearly preferred to pasting an IPython session.
Sturla

From skip at pobox.com Sat Sep 13 17:40:32 2014
From: skip at pobox.com (Skip Montanaro)
Date: Sat, 13 Sep 2014 10:40:32 -0500
Subject: [Python-ideas] Yielding from the command line
In-Reply-To: <2009130136432313972.107009sturla.molden-gmail.com@news.gmane.org>
References: <20140912150932.GA12368@helios.pault.ag> <20140912172527.GM9293@ando.pearwood.info> <2009130136432313972.107009sturla.molden-gmail.com@news.gmane.org>
Message-ID:

IPython is moving beyond just Python as well. From the ipython.org site:

*We ship the official IPython kernel, but kernels for other languages such
as Julia and Haskell are actively developed and used. Additionally, the
IPython kernel supports multi-language integration, letting you for example
mix Python code with Cython, R, Octave, and scripting in Bash, Perl or
Ruby.*

Skip
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From gokoproject at gmail.com Sun Sep 14 20:48:04 2014
From: gokoproject at gmail.com (John Wong)
Date: Sun, 14 Sep 2014 14:48:04 -0400
Subject: [Python-ideas] Bring namedtuple's __str__ and __repr__ behavior to regular classes
Message-ID:

Hi,

>>> from collections import namedtuple
>>> A = namedtuple("A", ["foo"])
>>> print(A(foo=1))
A(foo=1)
>>> str(A(foo=1))
'A(foo=1)'
>>> repr(A(foo=1))
'A(foo=1)'

The relevant code is
https://hg.python.org/cpython/file/2.7/Lib/collections.py#l356

I propose we bring the behavior to regular classes. Instead of

>>> class A(object):
...     def __init__(self):
...         self.foo = 1
...
>>> repr(A())
'<__main__.A object at 0x1090c0990>'

we should be able to see the current values in the display:

>>> repr(A())
'A(foo=1)'

Reasons:

1. Helps debugging (via pdb, print and logging). We no longer have to do
A().foo to find out.
2. I don't know how often people actually rely on repr(A()) or str(A()) and
parse the string, so the risk of breaking compatibility is probably low.
3. People who wish to define their own repr and str are welcome to do so.
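To make the idea concrete, here is a minimal sketch of such a generic repr
(the AutoRepr mixin name is invented purely for illustration, not part of
any concrete proposal):

```python
class AutoRepr(object):
    """Invented mixin: build a namedtuple-style repr from the instance dict."""
    def __repr__(self):
        # Sort by attribute name so the output is stable across runs.
        args = ", ".join("%s=%r" % (k, v)
                         for k, v in sorted(vars(self).items()))
        return "%s(%s)" % (type(self).__name__, args)

class A(AutoRepr):
    def __init__(self):
        self.foo = 1

print(repr(A()))  # A(foo=1)
```

Sorting the attribute names keeps the output deterministic, which matters
for doctests and log diffing.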
Django model for example has a more explicit representation by default
(although many Django users do redefine the representation on their own).
datetime.datetime by default, as a library, is also explicit. So
customization will come.

The main challenge:

Where and how do we actually look for what attributes are relevant?
namedtuple can do it because it has __slots__ and we know in advance how
many attributes are set. In a regular class, we deal with dynamic attribute
setting and single and multiple inheritance. I don't have an answer for
this simply because I lack experience. We can certainly start with the
attributes set in the main instance and one level up in the inheritance
chain.

Other issues:

1. What if there are too many attributes? I don't think the number will
explode beyond 30. I choose this number out of thin air. I can do more
research on this. It doesn't actually hurt to see everything. If you do
have a class with so many attributes (whether you have this many to begin
with, or because you allow arbitrary numbers of attributes to be set -- for
example, a document from a collection in NoSQL like MongoDB), that's still
very useful. We could limit how many are shown by default.
2. How do we order them? We can order them in unsorted or sorted order. I
prefer the sorted order.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com Sun Sep 14 21:00:24 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 15 Sep 2014 05:00:24 +1000
Subject: [Python-ideas] Bring namedtuple's __str__ and __repr__ behavior to regular classes
In-Reply-To: References: Message-ID:

On Mon, Sep 15, 2014 at 4:48 AM, John Wong wrote:
>>>> class A(object):
> ...     def __init__(self):
> ...         self.foo = 1
> ...
>>>> repr(A())
> '<__main__.A object at 0x1090c0990>'
>
> We should be able to see the current values in the display.
>>>> repr(A())
> 'A(foo=1)'

Start with this:

class object(object):
    def __repr__(self):
        return whatever_you_want_to_do

Then whenever you subclass object in this module, you'll subclass your own
subclass of object, and get your own repr. That's something that will work
on all versions of Python, including 2.x which isn't going to get any
changes like this. It's perfectly safe - it can't break anyone's code but
your own - and if you stick that at the top of the file (or in another file
and "from utils import object"), you don't have to change anything else,
assuming you're explicitly subclassing object everywhere.

ChrisA

From rosuav at gmail.com Sun Sep 14 21:08:50 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 15 Sep 2014 05:08:50 +1000
Subject: [Python-ideas] Bring namedtuple's __str__ and __repr__ behavior to regular classes
In-Reply-To: References: Message-ID:

By the way:

On Mon, Sep 15, 2014 at 4:48 AM, John Wong wrote:
> The main challenge:
>
> Where and how do we actually look for what attributes are relevant?
> namedtuple can do it because it has __slot__ and we know in advance how many
> attributes are set. In regular class, we deal with dynamic attribute
> setting, single and inheritances. I don't have an answer for this simply
> because I lack of experience. We can certainly start with the attributes set
> in the main instance and one level up in the inheritance chain.

This is a fundamentally hard problem. Obviously it's easy to see what
attributes are set, but figuring out which are relevant is usually a
job for the class itself. So what you might want to do is have a class
attribute, and then have your custom __repr__ scan through all of
__bases__, collecting up these "important attributes". Something like
this:

class Point2D(object):
    show_in_repr = "x", "y"
    ...

class Point3D(Point2D):
    show_in_repr = "z"
    ...
Then the repr for object could follow the chain, pick up all the
show_in_repr class attributes, and even use that for the order (parents
first, in inheritance order, then this class's attributes).

But at this point, you're definitely in the realm of custom code, not
changes to the language. Which is good, because you'll likely change your
mind about the details, and it's easy to recode your top-level inherit :)

ChrisA

From lkb.teichmann at gmail.com Sun Sep 14 23:24:40 2014
From: lkb.teichmann at gmail.com (Martin Teichmann)
Date: Sun, 14 Sep 2014 23:24:40 +0200
Subject: [Python-ideas] Yielding from the command line
Message-ID:

Hi everyone,

since there seemed to be some interest in my idea of an asyncio-enabled
command line, I just sat down and wrote it. I submitted the parts that
would need to go into CPython as Issue 22412 to the Python bug tracker.

I added a simple command line interpreter, based on
code.InteractiveConsole, which will allow for uses like

>>> from asyncio import sleep
>>> yield from sleep(10)

The following code is mostly a copy of InteractiveConsole, with the
appropriate yield froms stuck in (and comments removed. Yeah!)
Greetings,

Martin

Code follows:

# note: asyncio.input is provided by the patch proposed in Issue 22412
from asyncio import get_event_loop, coroutine, input
from code import InteractiveConsole
import sys

class AsyncConsole(InteractiveConsole):
    def __init__(self, locals=None, filename="<console>"):
        super().__init__(locals, filename)
        self.compile.compiler.flags |= 0x1000

    @coroutine
    def runsource(self, source, filename="<input>", symbol="single"):
        try:
            code = self.compile(source, filename, symbol)
        except (OverflowError, SyntaxError, ValueError):
            self.showsyntaxerror(filename)
            return False
        if code is None:
            return True
        yield from self.runcode(code)
        return False

    @coroutine
    def runcode(self, code):
        try:
            yield from eval(code, self.locals)
        except SystemExit:
            raise
        except:
            self.showtraceback()

    @coroutine
    def push(self, line):
        self.buffer.append(line)
        source = "\n".join(self.buffer)
        more = yield from self.runsource(source, self.filename)
        if not more:
            self.resetbuffer()
        return more

    @coroutine
    def interact(self, banner=None):
        try:
            sys.ps1
        except AttributeError:
            sys.ps1 = ">>> "
        try:
            sys.ps2
        except AttributeError:
            sys.ps2 = "... "
        cprt = 'Type "help", "copyright", "credits" or "license" for more information.'
        if banner is None:
            self.write("Python %s on %s\n%s\n(%s)\n" %
                       (sys.version, sys.platform, cprt,
                        self.__class__.__name__))
        elif banner:
            self.write("%s\n" % str(banner))
        more = 0
        while 1:
            try:
                if more:
                    prompt = sys.ps2
                else:
                    prompt = sys.ps1
                try:
                    line = yield from input(prompt)
                except EOFError:
                    self.write("\n")
                    break
                else:
                    more = yield from self.push(line)
            except KeyboardInterrupt:
                self.write("\nKeyboardInterrupt\n")
                self.resetbuffer()
                more = 0
            except SystemExit:
                return

if __name__ == "__main__":
    console = AsyncConsole()
    get_event_loop().run_until_complete(console.interact())

From ncoghlan at gmail.com Sun Sep 14 23:56:17 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 15 Sep 2014 09:56:17 +1200
Subject: [Python-ideas] Bring namedtuple's __str__ and __repr__ behavior to regular classes
In-Reply-To: References: Message-ID:

On 15 Sep 2014 05:09, "Chris Angelico" wrote:
>
> By the way:
>
> On Mon, Sep 15, 2014 at 4:48 AM, John Wong wrote:
> > The main challenge:
> >
> > Where and how do we actually look for what attributes are relevant?
> > namedtuple can do it because it has __slot__ and we know in advance how many
> > attributes are set. In regular class, we deal with dynamic attribute
> > setting, single and inheritances. I don't have an answer for this simply
> > because I lack of experience. We can certainly start with the attributes set
> > in the main instance and one level up in the inheritance chain.
>
> This is a fundamentally hard problem. Obviously it's easy to see what
> attributes are set, but figuring out which are relevant is usually a
> job for the class itself. So what you might want to do is have a class
> attribute, and then have your custom __repr__ scan through all of
> __bases__, collecting up these "important attributes".
More generally, this is the kind of situation where we're more likely to
provide better tools for *writing* these kinds of representations, rather
than providing them by default (the latter approach is too likely to go
wrong).

There are currently two such key plumbing modules:

pprint: https://docs.python.org/3/library/pprint.html
reprlib: https://docs.python.org/3/library/reprlib.html

Redesigning the way pprint works to make it easier to customise is what
Łukasz had in mind when writing PEP 443 to add functools.singledispatch to
Python 3.4. It also makes it easier to write your own pretty printers and
custom repr functions that fall back to object introspection to provide
more details.

reprlib is currently focused on providing object representations that
aren't overwhelmingly long, even for large (or recursively nested)
containers. However, I seem to recall also seeing proposals on the tracker
to add functions there that make it easier to emit named tuple style
representations for objects.

More generally, the idea of a "functionally equivalent named tuple" is also
relevant to implementing hashing, equality comparisons and ordering
operations in a sensible way, so there's actually potential here for a
third party module dedicated specifically to making it easier to write
classes that behave that way.

Regards,
Nick.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tjreedy at udel.edu Mon Sep 15 03:03:13 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 14 Sep 2014 21:03:13 -0400
Subject: [Python-ideas] Bring namedtuple's __str__ and __repr__ behavior to regular classes
In-Reply-To: References: Message-ID:

On 9/14/2014 3:08 PM, Chris Angelico wrote:
> By the way:
>
> On Mon, Sep 15, 2014 at 4:48 AM, John Wong wrote:
>> The main challenge:
>>
>> Where and how do we actually look for what attributes are relevant?
>> namedtuple can do it because it has __slot__ and we know in advance how many
>> attributes are set.
>> In regular class, we deal with dynamic attribute
>> setting, single and inheritances. I don't have an answer for this simply
>> because I lack of experience. We can certainly start with the attributes set
>> in the main instance and one level up in the inheritance chain.
>
> This is a fundamentally hard problem. Obviously it's easy to see what
> attributes are set, but figuring out which are relevant is usually a
> job for the class itself.

There is also the problem that the representation of even one value can be
arbitrarily long. Named tuples, as used in the stdlib, usually have a
relatively small number of fields with values with (currently) short
representations. That is not so in general.

A third problem is that the change would apply recursively. If a named
tuple now has a member, the representation of that would expand greatly,
and so would the named-tuple representation and that of anything that
contains a named tuple.

--
Terry Jan Reedy

From tjreedy at udel.edu Mon Sep 15 10:12:31 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 15 Sep 2014 04:12:31 -0400
Subject: [Python-ideas] Special-case 3.x 'print x' SyntaxError
Message-ID:

One of the problems with new Python programmers using 3.x is that they
first read 'print x' in 2.x based material, try 'print x' in 3.x, get
"SyntaxError: invalid syntax" (note the uninformative redundant message),
and go "huh?" or worse.

Would it be possible to detect this particular error and print a more
useful message? I am thinking of something like

SyntaxError: calling the 'print' function requires ()s, as in "print(x)"

or maybe

SyntaxError: did you mean "print(...)"?

I was 'inspired' by a recent SO question
https://stackoverflow.com/questions/24273599/idle-gui-is-unable-to-give-output
which was closed as a duplicate of the 2009 question
https://stackoverflow.com/questions/826948/syntax-error-on-print-with-python-3
I imagine that there have been other duplicates.
The same question (and answer) has appeared multiple times on python-list
also.

If we do this, I am sure someone will ask why we do not automatically 'fix'
the error. One answer would be that the closing ) is needed to determine
the intended end of the call. A longer version would be that if we insert
(, we are just guessing that the insertion is correct and we still would
not know, without guessing, where to put the ).

--
Terry Jan Reedy

From rosuav at gmail.com Mon Sep 15 10:23:59 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 15 Sep 2014 18:23:59 +1000
Subject: [Python-ideas] Special-case 3.x 'print x' SyntaxError
In-Reply-To: References: Message-ID:

On Mon, Sep 15, 2014 at 6:12 PM, Terry Reedy wrote:
> One of the problems with new Python programmers using 3.x is that they first
> read 'print x' in 2.x based material, try 'print x' in 3.x, get
> "SyntaxError: invalid syntax" (note the uninformative redundant message),
> and go "huh?" or worse.
>
> Would it be possible to add detect this particular error and print a more
> useful message? I am thinking of something of something like
> SyntaxError: calling the 'print' function requires ()s, as in "print(x)"
> or maybe
> SyntaxError: did you mean "print(...)"?

You mean like this?

rosuav at dewey:~$ python3
Python 3.4.1+ (default, Aug 18 2014, 12:06:51)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print "Hello?"
  File "<stdin>", line 1
    print "Hello?"
                 ^
SyntaxError: Missing parentheses in call to 'print'

ChrisA

From steve at pearwood.info Mon Sep 15 10:43:13 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 15 Sep 2014 18:43:13 +1000
Subject: [Python-ideas] Special-case 3.x 'print x' SyntaxError
In-Reply-To: References: Message-ID: <20140915084312.GU9293@ando.pearwood.info>

On Mon, Sep 15, 2014 at 04:12:31AM -0400, Terry Reedy wrote:
> One of the problems with new Python programmers using 3.x is that they
> first read 'print x' in 2.x based material, try 'print x' in 3.x, get
> "SyntaxError: invalid syntax" (note the uninformative redundant
> message), and go "huh?" or worse.
>
> Would it be possible to add detect this particular error and print a
> more useful message? I am thinking of something of something like
> SyntaxError: calling the 'print' function requires ()s, as in "print(x)"
> or maybe
> SyntaxError: did you mean "print(...)"?

+1

Normally I'm a bit reluctant to include overly specific advice in
exceptions, but I think this is a common and important enough case that if
"print spam" can be distinguished from similar "foo spam" errors, we ought
to give a specific error message rather than a generic one.

> I was 'inspired' by a recent SO question
> https://stackoverflow.com/questions/24273599/idle-gui-is-unable-to-give-output
> which was closed as a duplicate of the 2009 question
> https://stackoverflow.com/questions/826948/syntax-error-on-print-with-python-3
> I imagine that there have been other duplicates. The same question (and
> answer) has appeared multiple times on python-list also.

And multiple times on the tutor list too.

> If we do this, I am sure someone will ask why we do not automatically
> 'fix' the error. One answer would be that the closing ) is needed to
> determine the intended end of the call. A longer version would be that
> if we insert (, we are just guessing that the insertion is correct and
> we still would not know, without guessing, where to put the ).

Yes.
Did you really mean "print spam", or should it be print_spam? Or
print_(spam)? or prentspam? Fixing people's coding errors is not the job of
the compiler -- DWIM code is dubious at best if not outright harmful.
http://www.hacker-dictionary.com/terms/DWIM

--
Steven

From rosuav at gmail.com Mon Sep 15 11:13:59 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 15 Sep 2014 19:13:59 +1000
Subject: [Python-ideas] Special-case 3.x 'print x' SyntaxError
In-Reply-To: <20140915084312.GU9293@ando.pearwood.info>
References: <20140915084312.GU9293@ando.pearwood.info>
Message-ID:

On Mon, Sep 15, 2014 at 6:43 PM, Steven D'Aprano wrote:
>> If we do this, I am sure someone will ask why we do not automatically
>> 'fix' the error. One answer would be that the closing ) is needed to
>> determine the intended end of the call. A longer version would be that
>> if we insert (, we are just guessing that the insertion is correct and
>> we still would not know, without guessing, where to put the ).
>
> Yes. Did you really mean "print spam", or should it be print_spam? Or
> print_(spam)? or prentspam? Fixing people's coding errors is not the
> job of the compiler -- DWIM code is dubious at best if not outright
> harmful.

Yeah, auto-fixing is a really dangerous thing. It might do the wrong thing,
and even if it does the right thing, it encourages sloppiness. But I've
seen plenty of error (or warning) messages that tell you what the 99%
likely fix is, and they're extremely helpful. I like those.
ChrisA

From tjreedy at udel.edu Mon Sep 15 11:35:48 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 15 Sep 2014 05:35:48 -0400
Subject: [Python-ideas] Special-case 3.x 'print x' SyntaxError
In-Reply-To: References: Message-ID:

On 9/15/2014 4:23 AM, Chris Angelico wrote:
> On Mon, Sep 15, 2014 at 6:12 PM, Terry Reedy wrote:
>> One of the problems with new Python programmers using 3.x is that they first
>> read 'print x' in 2.x based material, try 'print x' in 3.x, get
>> "SyntaxError: invalid syntax" (note the uninformative redundant message),
>> and go "huh?" or worse.
>>
>> Would it be possible to add detect this particular error and print a more
>> useful message? I am thinking of something of something like
>> SyntaxError: calling the 'print' function requires ()s, as in "print(x)"
>> or maybe
>> SyntaxError: did you mean "print(...)"?
>
> You mean like this?
>
> rosuav at dewey:~$ python3
> Python 3.4.1+ (default, Aug 18 2014, 12:06:51)
> [GCC 4.9.1] on linux
> Type "help", "copyright", "credits" or "license" for more information.
>>>> print "Hello?"
>   File "<stdin>", line 1
>     print "Hello?"
> ^
> SyntaxError: Missing parentheses in call to 'print'

Excellent! Someone's time machine strikes again.

--
Terry Jan Reedy

From ncoghlan at gmail.com Mon Sep 15 11:41:17 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 15 Sep 2014 19:41:17 +1000
Subject: [Python-ideas] Special-case 3.x 'print x' SyntaxError
In-Reply-To: References: Message-ID:

On 15 Sep 2014 18:13, "Terry Reedy" wrote:
>
> One of the problems with new Python programmers using 3.x is that they
first read 'print x' in 2.x based material, try 'print x' in 3.x, get
"SyntaxError: invalid syntax" (note the uninformative redundant message),
and go "huh?" or worse.
>
> Would it be possible to add detect this particular error and print a more
useful message?
> I am thinking of something of something like
> SyntaxError: calling the 'print' function requires ()s, as in "print(x)"
> or maybe
> SyntaxError: did you mean "print(...)"?
>
> I was 'inspired' by a recent SO question
> https://stackoverflow.com/questions/24273599/idle-gui-is-unable-to-give-output
> which was closed as a duplicate of the 2009 question
> https://stackoverflow.com/questions/826948/syntax-error-on-print-with-python-3

Note my (relatively recent) comment there pointing to this self-answered
question:
https://stackoverflow.com/questions/25445439/what-does-syntaxerror-missing-parentheses-in-call-to-print-mean-in-python/

With any luck, searching for the more specific error message will bring
folks directly to that page. That change will ship with 3.4.2 in a couple
of weeks time. (It was one of the changes we implemented based on the
post-PyCon feedback regarding barriers for newcomers)

> If we do this, I am sure someone will ask why we do not automatically
'fix' the error. One answer would be that the closing ) is needed to
determine the intended end of the call. A longer version would be that if
we insert (, we are just guessing that the insertion is correct and we
still would not know, without guessing, where to put the ).

Implicit call statements are fully implementable within the constraints of
CPython's parser (I wrote a proof of concept last year and posted it on the
tracker). Guido isn't a fan though, and proving it was technically possible
was enough to scratch my own itch. If someone else wants to dig up that
patch and work it up into a full PEP, feel free - just be prepared to be
very responsive to feedback regarding the genuine readability concerns with
the idea, as well as for the chance it may still get rejected regardless :)

Cheers,
Nick.
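The new behaviour is easy to check programmatically; this snippet assumes
CPython 3.4.2 or later (earlier 3.x raises the same SyntaxError but with
the generic "invalid syntax" message):

```python
# Compile a 2.x-style print statement and inspect the resulting error.
try:
    compile('print "Hello?"', "<test>", "exec")
except SyntaxError as err:
    # On 3.4.2+ the message starts with "Missing parentheses in call to 'print'"
    print(err.msg)
```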
>
> --
> Terry Jan Reedy
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mmueller at python-academy.de Mon Sep 15 11:50:26 2014
From: mmueller at python-academy.de (=?ISO-8859-1?Q?Mike_M=FCller?=)
Date: Mon, 15 Sep 2014 11:50:26 +0200
Subject: [Python-ideas] Special-case 3.x 'print x' SyntaxError
In-Reply-To: References: Message-ID: <5416B662.4010300@python-academy.de>

On 15.09.14 11:35, Terry Reedy wrote:
> On 9/15/2014 4:23 AM, Chris Angelico wrote:
>> On Mon, Sep 15, 2014 at 6:12 PM, Terry Reedy wrote:
>>> One of the problems with new Python programmers using 3.x is that they first
>>> read 'print x' in 2.x based material, try 'print x' in 3.x, get
>>> "SyntaxError: invalid syntax" (note the uninformative redundant message),
>>> and go "huh?" or worse.
>>>
>>> Would it be possible to add detect this particular error and print a more
>>> useful message? I am thinking of something of something like
>>> SyntaxError: calling the 'print' function requires ()s, as in "print(x)"
>>> or maybe
>>> SyntaxError: did you mean "print(...)"?
>>
>> You mean like this?
>>
>> rosuav at dewey:~$ python3
>> Python 3.4.1+ (default, Aug 18 2014, 12:06:51)
>> [GCC 4.9.1] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> print "Hello?"
>>   File "<stdin>", line 1
>>     print "Hello?"
>> ^
>> SyntaxError: Missing parentheses in call to 'print'
>
> Excellent! Someone's time machine strikes again.

+1

In general, I think the single most useful improvement for beginners is
better hints in error messages about what possibly went wrong.

If this goes beyond syntax errors, adding a switch to turn this off might
be useful. Just in case somebody actually scrapes messages or your doctests
break.
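For the doctest case specifically, the stdlib already offers an escape
hatch; a small self-contained illustration (the parse_int function and its
docstring are invented for the example):

```python
import doctest

def parse_int():
    """Invented example: the directive makes doctest ignore the message.

    >>> int("not a number")  # doctest: +IGNORE_EXCEPTION_DETAIL
    Traceback (most recent call last):
        ...
    ValueError: the exact message here is ignored
    """

# Run just this docstring's examples, independent of __main__.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(parse_int, "parse_int"):
    runner.run(test)
print(runner.failures, "failed of", runner.tries)  # 0 failed of 1
```

Because the exception's type still has to match, such tests keep their
value even when the wording of the message changes between releases.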
Mike

From steve at pearwood.info Mon Sep 15 15:46:40 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 15 Sep 2014 23:46:40 +1000
Subject: [Python-ideas] Special-case 3.x 'print x' SyntaxError
In-Reply-To: <5416B662.4010300@python-academy.de>
References: <5416B662.4010300@python-academy.de>
Message-ID: <20140915134639.GV9293@ando.pearwood.info>

On Mon, Sep 15, 2014 at 11:50:26AM +0200, Mike Müller wrote:

> In general, I think the single most useful improvement for beginners are
> better hints in error message about what possibly went wrong.
>
> If this goes beyond syntax errors, adding a switch to turn this off might
> be useful. Just in case somebody actually scrapes messages or your doctests
> break.

The exact exception message is not a part of the public API of Python
built-ins or the standard library, so any code which relies on that is
wrong. Error messages are subject to change without notice. As for doc
tests, the doctest module has a directive specifically for ignoring the
exception error message:

https://docs.python.org/3/library/doctest.html#doctest.IGNORE_EXCEPTION_DETAIL

Just add #doctest:+IGNORE_EXCEPTION_DETAIL to your test, and the message
will be ignored.

--
Steven

From ethan at stoneleaf.us Mon Sep 15 16:20:26 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 15 Sep 2014 07:20:26 -0700
Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3
In-Reply-To: <20140911073046.GH9293@ando.pearwood.info>
References: <20140910105732.GE9293@ando.pearwood.info> <20140911073046.GH9293@ando.pearwood.info>
Message-ID: <5416F5AA.8000805@stoneleaf.us>

On 09/11/2014 12:30 AM, Steven D'Aprano wrote:
>
> Until your post just now, there has probably never been anyone anywhere
> who wanted to display b'Abc' as "41:62:63", and there probably never
> will be again.
> For such a specialised use-case, it's perfectly justified
> to reject a request for such a colon-delimited hex function with "not
> every one-liner...".

Make that two. :) Space or colon delimited is far easier to read than no
separator, or the noise of a \x separator.

--
~Ethan~

From ethan at stoneleaf.us Mon Sep 15 16:24:17 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 15 Sep 2014 07:24:17 -0700
Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3
In-Reply-To: <20140911065500.GG9293@ando.pearwood.info>
References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info>
Message-ID: <5416F691.2070800@stoneleaf.us>

On 09/10/2014 11:55 PM, Steven D'Aprano wrote:
> On Wed, Sep 10, 2014 at 01:54:17PM +0200, Wolfgang Maier wrote:
>
>> On 09/10/2014 12:57 PM, Steven D'Aprano wrote:
>
>>> However, I do support Terry's suggestion that bytes (and, I presume,
>>> bytearray) grow some sort of easy way of displaying the bytes in hex.
>>> The trouble is, what do we actually want?
>>>
>>> b'Abc' --> '0x416263'
>>> b'Abc' --> '\x41\x62\x63'
>>>
>>> I can see use-cases for both. After less than two minutes of thought, it
>>> seems to me that perhaps the most obvious APIs for these two different
>>> representations are:
>>>
>>> hex(b'Abc') --> '0x416263'
>>
>> This would require a change in the documented
>> (https://docs.python.org/3/library/functions.html#hex) behavior of
>> hex(), which I think is quite a big deal for a relatively special case.
>
> Any new functionality is going to require a change to the documentation.
>
> Changing hex() is no more of a big deal than adding a new method. I'd
> call it *less* of a big deal.
>
> In Python 2, hex() calls the dunder method __hex__. That has been
> removed in Python 3. Does anyone know why?

__hex__ and __oct__ were removed in favor of __index__.
__index__ returns the number as an integer (if possible to do so without
conversion from, say, float or complex or ...). __hex__ and __oct__ did
the same, and were redundant.

--
~Ethan~

From steve at pearwood.info Mon Sep 15 16:44:15 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 16 Sep 2014 00:44:15 +1000
Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3
In-Reply-To: <5416F691.2070800@stoneleaf.us>
References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us>
Message-ID: <20140915144415.GW9293@ando.pearwood.info>

On Mon, Sep 15, 2014 at 07:24:17AM -0700, Ethan Furman wrote:
> >In Python 2, hex() calls the dunder method __hex__. That has been
> >removed in Python 3. Does anyone know why?
>
> __hex__ and __oct__ were removed in favor of __index__. __index__ returns
> the number as an integer (if possible to do so without conversion from,
> say, float or complex or ...). __hex__ and __oct__ did the same, and were
> redundant.

No, __hex__ returned a string. It could be used to implement (say) a
floating point hex representation, or hex() of bytes.

py> (42).__hex__()
'0x2a'

In Python 2, hex() only had to return a string, and accepted anything with
a __hex__ method. In Python 3, it can only be used on objects which are
int-like, which completely rules out conversions of non-ints to
hexadecimal notation.

py> class MyList(list):
...     def __hex__(self):
...         return '[' + ', '.join(hex(a) for a in self) + ']'
...
py> l = MyList([21, 16, 256, 73])
py> hex(l)
'[0x15, 0x10, 0x100, 0x49]'

Pity. I don't suppose anyone would support bringing back __hex__?
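For the colon-delimited display mentioned earlier in the thread, no dunder
method is needed at all; a minimal sketch (the hexstr name is invented for
illustration):

```python
def hexstr(data, sep=":"):
    """Invented helper: render bytes as separator-delimited two-digit hex."""
    # Iterating over a bytes object in Python 3 yields ints, so %02x works.
    return sep.join("%02x" % b for b in data)

print(hexstr(b'Abc'))       # 41:62:63
print(hexstr(b'Abc', " "))  # 41 62 63
```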
-- Steven From mmueller at python-academy.de Mon Sep 15 16:56:10 2014 From: mmueller at python-academy.de (Mike Müller) Date: Mon, 15 Sep 2014 16:56:10 +0200 Subject: [Python-ideas] Special-case 3.x 'print x' SyntaxError In-Reply-To: <20140915134639.GV9293@ando.pearwood.info> References: <5416B662.4010300@python-academy.de> <20140915134639.GV9293@ando.pearwood.info> Message-ID: <5416FE0A.60905@python-academy.de> On 15.09.14 15:46, Steven D'Aprano wrote: > On Mon, Sep 15, 2014 at 11:50:26AM +0200, Mike Müller wrote: > >> In general, I think the single most useful improvement for beginners is >> better hints in error messages about what possibly went wrong. >> >> If this goes beyond syntax errors, adding a switch to turn this off might >> be useful. Just in case somebody actually scrapes messages or your doctests >> break. > > The exact exception message is not a part of the public API of Python > built-ins or the standard library, so any code which relies on that is > wrong. Error messages are subject to change without notice. As for doc > tests, the doctest module has a directive specifically for ignoring the > exception error message: > > https://docs.python.org/3/library/doctest.html#doctest.IGNORE_EXCEPTION_DETAIL > > Just add #doctest:+IGNORE_EXCEPTION_DETAIL to your test, and the > message will be ignored. Thanks for the hints. Actually, I am familiar with both points. Even though quite a bit of ugly code is out there, it may be better to enforce good programming style instead of adding switches that allow such code to run. So, no switch.
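For anyone who has not used the directive, a quick self-contained demonstration of what it does, using doctest's programmatic API (the test text here is made up for illustration):

```python
import doctest

# A doctest whose expected ValueError message is deliberately wrong:
source = '''
>>> int("xyz")  # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
ValueError: this message is deliberately wrong
'''

test = doctest.DocTestParser().get_doctest(source, {}, 'demo', None, 0)
runner = doctest.DocTestRunner(verbose=False)
runner.run(test)
print(runner.failures)  # 0 -- only the exception *type* had to match
```

With the directive in place, only "ValueError" is compared; the mismatched message is ignored.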
Mike From ethan at stoneleaf.us Mon Sep 15 17:16:58 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 15 Sep 2014 08:16:58 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <20140915144415.GW9293@ando.pearwood.info> References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> Message-ID: <541702EA.2070001@stoneleaf.us> On 09/15/2014 07:44 AM, Steven D'Aprano wrote: > On Mon, Sep 15, 2014 at 07:24:17AM -0700, Ethan Furman wrote: > >>> In Python 2, hex() calls the dunder method __hex__. That has been >>> removed in Python 3. Does anyone know why? >> >> __hex__ and __oct__ were removed in favor of __index__. __index__ returns >> the number as an integer (if possible to do so without conversion from, >> say, float or complex or ...). __hex__ and __oct__ did the same, and were >> redundant. > > No, __hex__ returned a string. It could be used to implement (say) a > floating point hex representation, or hex() of bytes. Right, sorry. I had the wrong return type in mind. Now you have to use the hex format codes. > I don't suppose anyone would support bringing back __hex__? I don't think we need another formatting operator. we already have % and .format() -- do we still have string templates? -- ~Ethan~ From daviesk24 at yahoo.com Mon Sep 15 21:02:20 2014 From: daviesk24 at yahoo.com (Kevin Davies) Date: Mon, 15 Sep 2014 12:02:20 -0700 Subject: [Python-ideas] float comparison in doctes Message-ID: <1410807740.45297.YahooMailNeo@web124505.mail.ne1.yahoo.com> It seems that this didn't reach the list directly (see https://mail.python.org/pipermail/python-ideas/2014-August/028956.html), so I'm resending: Erik Bray (the author of the +FLOAT_CMP extension in Astropy), Bruce Leban, and I had a short off-thread email discussion. 
Here are the points: - [Bruce]: ALMOST_EQUAL is the best flag name. - [Erik]: If there's agreement on this, Erik will develop a patch as soon as he can. - [Erik]: There's no way to adjust the tolerance because there seems to be no easy way to parameterize doctest flags. Ideas are welcome. - [Erik]: Still, "This +FLOAT_CMP flag enabled removing tons of ellipses from the test outputs [of Astropy], and restoring the full outputs which certainly read better in the docs... For more complete unit tests of course we use assert_almost_equal type functions. - [Erik]: This PR is a better link than the one I gave: https://github.com/astropy/astropy/pull/2087 - [Erik]: Most of the code is from the SymPy project with improvements. Erik had started on a similar feature when he found that their implementation was further developed. Kevin From steve at pearwood.info Mon Sep 15 22:31:44 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 16 Sep 2014 06:31:44 +1000 Subject: [Python-ideas] float comparison in doctes In-Reply-To: <1410807740.45297.YahooMailNeo@web124505.mail.ne1.yahoo.com> References: <1410807740.45297.YahooMailNeo@web124505.mail.ne1.yahoo.com> Message-ID: <20140915203144.GY9293@ando.pearwood.info> On Mon, Sep 15, 2014 at 12:02:20PM -0700, Kevin Davies wrote: > It seems that this didn't reach the list directly (see > https://mail.python.org/pipermail/python-ideas/2014-August/028956.html), > so I'm resending: > > Erik Bray (the author of the +FLOAT_CMP extension in Astropy), Bruce > Leban, and I had a short off-thread email discussion. Here are the > points: > > - [Bruce]: ALMOST_EQUAL is the best flag name. > - [Erik]: If there's agreement on this, Erik will develop a patch as soon as he can. > - [Erik]: There's no way to adjust the tolerance because there seems > to be no easy way to parameterize doctest flags. Ideas are welcome. 
With no way to choose between (at minimum) *four* different "almost equal" models, and no way to specify a tolerance, I don't think doctest ought to have such a directive. Almost equal can mean: - round and compare with == (as unittest does) - absolute difference - relative difference - ULP difference Given what a blunt instrument doctest is, I think the nicest solution is also the most explicit: just use ellipses. > - [Erik]: Still, "This +FLOAT_CMP flag enabled removing tons of > ellipses from the test outputs [of Astropy], and restoring the full > outputs which certainly read better in the docs... For more complete > unit tests of course we use assert_almost_equal type functions. > - [Erik]: This PR is a better link than the one I gave: > https://github.com/astropy/astropy/pull/2087 > - [Erik]: Most of the code is from the SymPy project with > improvements. Erik had started on a similar feature when he found that > their implementation was further developed. From tjreedy at udel.edu Mon Sep 15 23:11:43 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 15 Sep 2014 17:11:43 -0400 Subject: [Python-ideas] float comparison in doctes In-Reply-To: <1410807740.45297.YahooMailNeo@web124505.mail.ne1.yahoo.com> References: <1410807740.45297.YahooMailNeo@web124505.mail.ne1.yahoo.com> Message-ID: On 9/15/2014 3:02 PM, Kevin Davies wrote: > Erik Bray (the author of the +FLOAT_CMP extension in Astropy), Bruce Leban, and I had a short off-thread email discussion. Here are the points: > > - [Bruce]: ALMOST_EQUAL is the best flag name. Agreed. I matches assertAlmostEqual. > - [Erik]: If there's agreement on this, Erik will develop a patch as soon as he can. I say go ahead. > - [Erik]: There's no way to adjust the tolerance because there seems to be no easy way to parameterize doctest flags. Ideas are welcome. After thinking about it more, I think this is OK for the purposes of doctests. 
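To make the discussion concrete, here is a rough sketch of what such a checker could do. The flag name and the fixed rounding to 7 places are assumptions (mirroring assertAlmostEqual's default); Erik's actual patch may well differ:

```python
import doctest
import re

# Hypothetical flag; a real patch would define this inside doctest itself.
ALMOST_EQUAL = doctest.register_optionflag('ALMOST_EQUAL')

_FLOAT = re.compile(r'-?\d+\.\d+')

class AlmostEqualChecker(doctest.OutputChecker):
    def check_output(self, want, got, optionflags):
        if optionflags & ALMOST_EQUAL:
            want_floats = _FLOAT.findall(want)
            got_floats = _FLOAT.findall(got)
            if len(want_floats) == len(got_floats) and all(
                    round(float(w) - float(g), 7) == 0
                    for w, g in zip(want_floats, got_floats)):
                # Floats agree to 7 places: compare the remaining text only.
                want = _FLOAT.sub('<float>', want)
                got = _FLOAT.sub('<float>', got)
        return doctest.OutputChecker.check_output(self, want, got, optionflags)
```

An instance would then be passed to DocTestRunner(checker=AlmostEqualChecker()).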
> - [Erik]: Still, "This +FLOAT_CMP flag enabled removing tons of ellipses from the test outputs [of Astropy], and restoring the full outputs which certainly read better in the docs... For more complete unit tests of course we use assert_almost_equal type functions. So I would tune the doctest almost-equal compare algorithm to cover most cases, based on your experience, and leave exact tuning and broad domain coverage to unittests. > - [Erik]: This PR is a better link than the one I gave: https://github.com/astropy/astropy/pull/2087 > - [Erik]: Most of the code is from the SymPy project with improvements. Erik had started on a similar feature when he found that their implementation was further developed. -- Terry Jan Reedy From ncoghlan at gmail.com Tue Sep 16 01:47:42 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Sep 2014 11:47:42 +1200 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <541702EA.2070001@stoneleaf.us> References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> Message-ID: On 16 Sep 2014 01:17, "Ethan Furman" wrote: > I don't think we need another formatting operator. we already have % and .format() -- do we still have string templates? Yes, but those were designed for a specific use case where the templates are written by language translators rather than software developers. The current suggestion on the issue tracker is to add __format__ to bytes/bytearray/memoryview with a suitable symbolic mini-language to control the formatting details. Thrashing out a mini-language design will likely require a PEP, though. Cheers, Nick. 
> > -- > ~Ethan~ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Sep 16 01:55:16 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Sep 2014 11:55:16 +1200 Subject: [Python-ideas] float comparison in doctes In-Reply-To: <20140915203144.GY9293@ando.pearwood.info> References: <1410807740.45297.YahooMailNeo@web124505.mail.ne1.yahoo.com> <20140915203144.GY9293@ando.pearwood.info> Message-ID: On 16 Sep 2014 06:32, "Steven D'Aprano" wrote: > > On Mon, Sep 15, 2014 at 12:02:20PM -0700, Kevin Davies wrote: > > > It seems that this didn't reach the list directly (see > > https://mail.python.org/pipermail/python-ideas/2014-August/028956.html), > > so I'm resending: > > > > > Erik Bray (the author of the +FLOAT_CMP extension in Astropy), Bruce > > Leban, and I had a short off-thread email discussion. Here are the > > points: > > > > - [Bruce]: ALMOST_EQUAL is the best flag name. > > - [Erik]: If there's agreement on this, Erik will develop a patch as soon as he can. > > - [Erik]: There's no way to adjust the tolerance because there seems > > to be no easy way to parameterize doctest flags. Ideas are welcome. > > With no way to choose between (at minimum) *four* different "almost > equal" models, and no way to specify a tolerance, I don't think doctest > ought to have such a directive. > > Almost equal can mean: > > - round and compare with == (as unittest does) > - absolute difference > - relative difference > - ULP difference I think it's OK for a doctest flag to just provide the default behaviour offered by unittest.TestCase.assertAlmostEqual. That aligns well with the originally intended use case of testing examples in documentation. 
If folks want more precise control, they can then switch to full unit tests. It would be reasonable for the docs for the new flag to point that out explicitly. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Tue Sep 16 02:43:42 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 15 Sep 2014 17:43:42 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> Message-ID: <541787BE.1010805@stoneleaf.us> On 09/15/2014 04:47 PM, Nick Coghlan wrote: > > The current suggestion on the issue tracker is to add __format__ to > bytes/bytearray/memoryview with a suitable symbolic mini-language to > control the formatting details. PEP 461 specifically did not add back __format__ to bytes/bytearrays. I think a PEP is appropriate to reverse that decision. -- ~Ethan~ From eric at trueblade.com Tue Sep 16 15:02:08 2014 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 16 Sep 2014 09:02:08 -0400 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <541787BE.1010805@stoneleaf.us> References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> Message-ID: <541834D0.6010402@trueblade.com> On 09/15/2014 08:43 PM, Ethan Furman wrote: > On 09/15/2014 04:47 PM, Nick Coghlan wrote: >> >> The current suggestion on the issue tracker is to add __format__ to >> bytes/bytearray/memoryview with a suitable symbolic mini-language to >> control the formatting details. 
> > PEP 461 specifically did not add back __format__ to bytes/bytearrays. I > think a PEP is appropriate to reverse that decision. That's different. PEP 461 excluded them because it was talking about bytes.format(). bytes.__format__() would be much easier to deal with, because its result must be unicode (str in 3.x). I don't think just adding bytes/bytearray.__format__() per se requires a PEP. It's not a very radical addition, similar to datetime.__format__(). But I wouldn't be opposed to a PEP to decide on the specifics of the mini-language that bytes.__format__() supports. Eric. From ethan at stoneleaf.us Tue Sep 16 17:04:37 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 16 Sep 2014 08:04:37 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <541834D0.6010402@trueblade.com> References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> Message-ID: <54185185.2030608@stoneleaf.us> On 09/16/2014 06:02 AM, Eric V. Smith wrote: > On 09/15/2014 08:43 PM, Ethan Furman wrote: >> On 09/15/2014 04:47 PM, Nick Coghlan wrote: >>> >>> The current suggestion on the issue tracker is to add __format__ to >>> bytes/bytearray/memoryview with a suitable symbolic mini-language to >>> control the formatting details. >> >> PEP 461 specifically did not add back __format__ to bytes/bytearrays. I >> think a PEP is appropriate to reverse that decision. > > That's different. PEP 461 excluded them because it was talking about > bytes.format(). bytes.__format__() would be much easier to deal with, > because its result must be unicode (str in 3.x). > > I don't think just adding bytes/bytearray.__format__() per se requires a > PEP. It's not a very radical addition, similar to datetime.__format__(). 
> But I wouldn't be opposed to a PEP to decide on the specifics of the > mini-language that bytes.__format__() supports. So the difference is: b'Hello, %s' % some_bytes_var --> b'Hello, ' whilst b'Hello, {}'.format(some_uni_var) --> u'Hello, ' (Yes, I remember unicode == str, I was just being explicit ;) That would certainly go along with the idea that `format` is for strings. -- ~Ethan~ From guido at python.org Tue Sep 16 18:14:13 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Sep 2014 09:14:13 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <54185185.2030608@stoneleaf.us> References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> Message-ID: TBH I've lost track what this thread is about, but if any actionable proposals come out, please send them my way in the form of a PEP. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Sep 16 22:09:59 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 16 Sep 2014 13:09:59 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <54185185.2030608@stoneleaf.us> References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> Message-ID: On Sep 16, 2014, at 8:04, Ethan Furman wrote: > On 09/16/2014 06:02 AM, Eric V. 
Smith wrote: >> On 09/15/2014 08:43 PM, Ethan Furman wrote: >>> On 09/15/2014 04:47 PM, Nick Coghlan wrote: >>>> >>>> The current suggestion on the issue tracker is to add __format__ to >>>> bytes/bytearray/memoryview with a suitable symbolic mini-language to >>>> control the formatting details. >>> >>> PEP 461 specifically did not add back __format__ to bytes/bytearrays. I >>> think a PEP is appropriate to reverse that decision. >> >> That's different. PEP 461 excluded them because it was talking about >> bytes.format(). bytes.__format__() would be much easier to deal with, >> because its result must be unicode (str in 3.x). >> >> I don't think just adding bytes/bytearray.__format__() per se requires a >> PEP. It's not a very radical addition, similar to datetime.__format__(). >> But I wouldn't be opposed to a PEP to decide on the specifics of the >> mini-language that bytes.__format__() supports. > > So the difference is: > > b'Hello, %s' % some_bytes_var --> b'Hello, ' > > whilst > > b'Hello, {}'.format(some_uni_var) --> u'Hello, ' No, you're mixing up `format`, an explicit method on str that no one is suggesting adding to bytes, and `__format__`, a dunder method on every type that's used by `str.format` and `format`; the proposal is to extend `bytes.__format__` in some way that I don't think is entirely decided yet, but it would look something like this: u'Hello, {:a}'.format(some_bytes_var) --> u'Hello, ' Or: u'Hello, {:#x}'.format(some_bytes_var) --> u'Hello, \\x2d\\x78\\x68\\x61...' > (Yes, I remember unicode == str, I was just being explicit ;) > > That would certainly go along with the idea that `format` is for strings. 
> > -- > ~Ethan~ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From ethan at stoneleaf.us Tue Sep 16 22:36:10 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 16 Sep 2014 13:36:10 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> Message-ID: <54189F3A.8010304@stoneleaf.us> On 09/16/2014 01:09 PM, Andrew Barnert wrote: > > No, you're mixing up `format`, an explicit method on str that no one > is suggesting adding to bytes, and `__format__`, a dunder method on > every type that's used by `str.format` and `format`; the proposal is > to extend `bytes.__format__` in some way that I don't think is entirely > decided yet, but it would look something like this: > > u'Hello, {:a}'.format(some_bytes_var) --> u'Hello, ' > > Or: > > u'Hello, {:#x}'.format(some_bytes_var) --> u'Hello, \\x2d\\x78\\x68\\x61...' Ah, that makes more sense, thanks for the clarification! 
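For anyone wanting to experiment before any PEP lands, the proposal can be approximated today with a subclass. This is purely illustrative; the ':x' and ':#x' spec names follow Andrew's examples, not any accepted mini-language:

```python
class fbytes(bytes):
    """bytes with an illustrative, made-up __format__ mini-language."""
    def __format__(self, spec):
        if spec == 'x':            # plain hex digits
            return ''.join('%02x' % b for b in self)
        if spec == '#x':           # backslash-x escaped form
            return ''.join('\\x%02x' % b for b in self)
        return super().__format__(spec)   # empty spec falls back to str(self)

print('Hello, {:x}'.format(fbytes(b'Abc')))    # Hello, 416263
print('Hello, {:#x}'.format(fbytes(b'Abc')))   # Hello, \x41\x62\x63
```

Whatever mini-language a PEP settles on would live in bytes.__format__ itself rather than in a subclass, but the mechanics are the same.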
-- ~Ethan~ From tjreedy at udel.edu Wed Sep 17 04:08:14 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 16 Sep 2014 22:08:14 -0400 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> Message-ID: On 9/16/2014 4:09 PM, Andrew Barnert wrote: > the proposal is to extend `bytes.__format__` Currently bytes just inherits object.__format__:

>>> bytes.__format__.__qualname__
'object.__format__'

object.__format__ does not allow a non-empty format string:

>>> 'a{}b'.format(b'c\xdd')
"ab'c\\xdd'b"
>>> 'a{:a}b'.format(b'c\xdd')
Traceback (most recent call last):
  File "", line 1, in
    'a{:a}b'.format(b'c\xdd')
TypeError: non-empty format string passed to object.__format__

> in some way that I don't think is > entirely decided yet, but it would look something like this: > > u'Hello, {:a}'.format(some_bytes_var) --> u'Hello, ' > > Or: > > u'Hello, {:#x}'.format(some_bytes_var) --> u'Hello, > \\x2d\\x78\\x68\\x61...' -- Terry Jan Reedy From tleeuwenburg at gmail.com Wed Sep 17 07:19:52 2014 From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg) Date: Wed, 17 Sep 2014 15:19:52 +1000 Subject: [Python-ideas] Idea: Named code blocks / inline module declarations Message-ID: I would like to be able to use named sections to organise my code, much like inline submodules, but without using classes or functions to organise them. I would use this if I had a group of related functions which were not written in an object-oriented style, possibly due to not needing any shared state. Rather than break these out into a new file, I would like to just be able to use internal structure to declare the relationship.
I've used the keyword 'block' to indicate the start of a named block. For example:

block signin:
    def handle_new_user():
        do_it()

    def handle_existing_user():
        do_it()


while True:
    try:
        signin.handle_existing_user()
    except:
        signin.handle_new_user()

do_other_stuff()

At the moment, I would have to either break out into more files, or somewhat clumsily co-opt things like functions or staticmethods. I think that supporting named blocks or inline module declarations would really help me organise some of my code much better. It could also provide a more seamless way to decide to break out into a new file. Once a named block got big enough, I could easily create a new file and import those functions into the same namespace. I hope this makes sense and that I'm not overlooking anything obvious. Cheers, -Tennessee From al2 at stanford.edu Wed Sep 17 07:27:51 2014 From: al2 at stanford.edu (Ali Alkhatib) Date: Tue, 16 Sep 2014 22:27:51 -0700 Subject: [Python-ideas] Idea: Named code blocks / inline module declarations In-Reply-To: References: Message-ID: This may be a misuse of classes, but why can't you make a class and then not instantiate it?

class signin:
    def handle():
        return "this works"

signin.handle()  # returns "this works"

On Tue, Sep 16, 2014 at 10:19 PM, Tennessee Leeuwenburg < tleeuwenburg at gmail.com> wrote: > I would like to be able to use named sections to organise my code, much like > inline submodules, but without using classes or functions to organise them. > I would use this if I had a group of related functions which were not > written in an object-oriented style, possibly due to not needing any shared > state. Rather than break these out into a new file, I would like to just be > able to use internal structure to declare the relationship. I've used the > keyword 'block' to indicate the start of a named block.
> > For example, > > block signin: > def handle_new_user(): > do_it() > > def handle_existing_user(): > do_it() > > > while True: > try: > signin.handle_existing_user(): > except: > signin.handle_new_user() > > do_other_stuff() > > At the moment, I would have to either break out into more files, or > somewhat clumsily co-opt things like functions or staticmethods. I think > that supporting named blocks or inline module declarations would really > help me organise some of my code much better. It could also provide a more > seamless way to decide to break out into a new file. Once a named block got > big enough, I could easily create a new file and import those functions > into the same namespace. > > I hope this makes sense and that I'm not overlooking anything obvious. > > Cheers, > -Tennessee > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Ali Alkhatib Department of Computer Science PhD Student - Stanford University -------------- next part -------------- An HTML attachment was scrubbed... URL: From tleeuwenburg at gmail.com Wed Sep 17 07:43:34 2014 From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg) Date: Wed, 17 Sep 2014 15:43:34 +1000 Subject: [Python-ideas] Idea: Named code blocks / inline module declarations In-Reply-To: References: Message-ID: Hi Ali, Thanks for the suggestion. I would prefer to avoid that just because it's a potential misuse of classes, and I suspect may lead to confusion for other developers. Otherwise that's exactly what I want to do. Cheers, -T On 17 September 2014 15:27, Ali Alkhatib wrote: > This may be a misuse of classes, but why can't you make a class and then > not instantiate it? 
> > class signin: > def handle(): > return "this works" > > signin.handle() # returns "this works" > > On Tue, Sep 16, 2014 at 10:19 PM, Tennessee Leeuwenburg < > tleeuwenburg at gmail.com> wrote: > >> I would like to be able to use named sections to organise my code, much >> an inline submodules, bit without using classes or functions to organise >> them. I would use this if I had a group of related functions which were not >> written in an object-oriented-style, possibly due to not needing any shared >> state. Rather than break these out into a new file, I would like to just be >> able to use internal structure to declare the relationship. I've used the >> keyword 'block' to indicate the start of a named block. >> >> For example, >> >> block signin: >> def handle_new_user(): >> do_it() >> >> def handle_existing_user(): >> do_it() >> >> >> while True: >> try: >> signin.handle_existing_user(): >> except: >> signin.handle_new_user() >> >> do_other_stuff() >> >> At the moment, I would have to either break out into more files, or >> somewhat clumsily co-opt things like functions or staticmethods. I think >> that supporting named blocks or inline module declarations would really >> help me organise some of my code much better. It could also provide a more >> seamless way to decide to break out into a new file. Once a named block got >> big enough, I could easily create a new file and import those functions >> into the same namespace. >> >> I hope this makes sense and that I'm not overlooking anything obvious. 
>> >> Cheers, >> -Tennessee >> -- > Ali Alkhatib > Department of Computer Science > PhD Student - Stanford University > -- -------------------------------------------------- Tennessee Leeuwenburg http://myownhat.blogspot.com/ "Don't believe everything you think" From gokoproject at gmail.com Wed Sep 17 08:12:33 2014 From: gokoproject at gmail.com (John Yeuk Hon Wong) Date: Wed, 17 Sep 2014 02:12:33 -0400 Subject: [Python-ideas] Idea: Named code blocks / inline module declarations In-Reply-To: References: Message-ID: <54192651.6050708@gmail.com> On 9/17/14 1:19 AM, Tennessee Leeuwenburg wrote: > I would use this if I had a group of related functions which were not > written in an object-oriented style, possibly due to not needing any > shared state. Rather than break these out into a new file, I would > like to just be able to use internal structure to declare the > relationship. Yeah, standalone functions CAN be easier to test, provided that you did everything possible to make the function testable. > It could also provide a more seamless way to decide to break out into > a new file. Once a named block got big enough, I could easily create a > new file and import those functions into the same namespace. So this is how I read your idea: 1. Import the necessary names from other modules if needed 2. Organize them local to the module 3. Use them by attribute look-up as indicated by your example The idea sounds pretty cool, but the first thing I ask myself as a Python user is whether I find signin.xxx more appealing than writing xxx directly. As an import mechanism, the regular import seems good enough to me.
Personally, based on your examples, I would have a module named signin, because in advance I know I will have a bunch of functions that are logically related. These functions will be used when the user is signed in. So I will tell myself: don't wait, just do it now, put them into a module. When I use the module, I just import signin and in my code I just write signin.handle_new_user. It is also my preference to import the module rather than importing individual functions into the namespace; I want to be able to know the origin of the function as I write my code. Now, can handle_new_user be used outside of the signin context? I really doubt it. If handle_new_user is meant to run after the user is signed in, then that function will only live in 1 block. No reuse. As a macro, I think that the equivalent is probably grouping within a function (although that is not what most people think macros do, but er, "close enough"). Just my two cents. John From mertz at gnosis.cx Wed Sep 17 08:21:53 2014 From: mertz at gnosis.cx (David Mertz) Date: Tue, 16 Sep 2014 23:21:53 -0700 Subject: [Python-ideas] Idea: Named code blocks / inline module declarations In-Reply-To: References: Message-ID: Why is this a misuse? Classes are largely just namespaces to start with, and if you want to use them solely for that purpose, it's all there for you. If you wanted to make your intention even more obvious, you could do something like:

>>> class NoInstance(object):
...     def __new__(cls):
...         raise NotImplementedError
...
>>> class signin(NoInstance):
...     def handle(*args):
...         print("Hello", args)
...
>>> signin()
Traceback (most recent call last):
  File "", line 1, in
  File "", line 3, in __new__
NotImplementedError
>>> signin.handle('some','args','here')
Hello ('some', 'args', 'here')

Maybe some other name than NoInstance would be better: NamespaceOnly? In any case, one helper class lets you use existing syntax exactly as you desire.
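And if even a helper class feels like too much ceremony, the same grouping can be had from a plain namespace object plus a tiny registration decorator. This is just a sketch; all the names here are made up:

```python
import types

signin = types.SimpleNamespace()

def in_namespace(ns):
    """Attach the decorated function to the given namespace object."""
    def decorator(fn):
        setattr(ns, fn.__name__, fn)
        return fn
    return decorator

@in_namespace(signin)
def handle_new_user():
    return "handled a new user"

print(signin.handle_new_user())  # handled a new user
```

Unlike the class spelling, the functions stay ordinary module-level functions, so they remain importable individually as well.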
On Tue, Sep 16, 2014 at 10:43 PM, Tennessee Leeuwenburg wrote: > Hi Ali, > > Thanks for the suggestion. I would prefer to avoid that just because it's a > potential misuse of classes, and I suspect may lead to confusion for other > developers. Otherwise that's exactly what I want to do. > > Cheers, > -T > > On 17 September 2014 15:27, Ali Alkhatib wrote: >> >> This may be a misuse of classes, but why can't you make a class and then >> not instantiate it? >> >> class signin: >> def handle(): >> return "this works" >> >> signin.handle() # returns "this works" >> >> On Tue, Sep 16, 2014 at 10:19 PM, Tennessee Leeuwenburg >> wrote: >>> >>> I would like to be able to use named sections to organise my code, much >>> an inline submodules, bit without using classes or functions to organise >>> them. I would use this if I had a group of related functions which were not >>> written in an object-oriented-style, possibly due to not needing any shared >>> state. Rather than break these out into a new file, I would like to just be >>> able to use internal structure to declare the relationship. I've used the >>> keyword 'block' to indicate the start of a named block. >>> >>> For example, >>> >>> block signin: >>> def handle_new_user(): >>> do_it() >>> >>> def handle_existing_user(): >>> do_it() >>> >>> >>> while True: >>> try: >>> signin.handle_existing_user(): >>> except: >>> signin.handle_new_user() >>> >>> do_other_stuff() >>> >>> At the moment, I would have to either break out into more files, or >>> somewhat clumsily co-opt things like functions or staticmethods. I think >>> that supporting named blocks or inline module declarations would really help >>> me organise some of my code much better. It could also provide a more >>> seamless way to decide to break out into a new file. Once a named block got >>> big enough, I could easily create a new file and import those functions into >>> the same namespace. 
>>> >>> I hope this makes sense and that I'm not overlooking anything obvious. >>> >>> Cheers, >>> -Tennessee >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> >> -- >> Ali Alkhatib >> Department of Computer Science >> PhD Student - Stanford University > > > > > -- > -------------------------------------------------- > Tennessee Leeuwenburg > http://myownhat.blogspot.com/ > "Don't believe everything you think" > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. From lkb.teichmann at gmail.com Wed Sep 17 09:59:53 2014 From: lkb.teichmann at gmail.com (Martin Teichmann) Date: Wed, 17 Sep 2014 09:59:53 +0200 Subject: [Python-ideas] Fwd: Yielding from the command line In-Reply-To: References: Message-ID: Hi Andrew, Hi List, > [ some discussion about calling yield from from the command line skipped ] > > I would love to see this. I'm not sure if I'd love it in practice or not, but until > someone implements it and I can play with it I'm not sure how I'd become sure. > > So... You just volunteered, right? Go build it and put it on PyPI, I want it and > I'll be your best friend forever and ever no takebacks if you do it. :) Well, so I did, I wrote an IPython extension that does it and put it up on https://github.com/tecki/ipython-yf It's more a mock-up of how it should actually look like, but it is a functioning mock-up. 
So now you can write on the command line stuff like: >>> %load_ext yf >>> from asyncio import sleep, async >>> def f(): ... yield from sleep(3) ... print("done") >>> yield from f() #[wait three seconds] done >>> async(f()) >>> #[wait three seconds, or type other commands] done So as you see, the event loop runs while you are typing commands, and while they are executed. Greetings Martin From mal at egenix.com Wed Sep 17 10:08:31 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 17 Sep 2014 10:08:31 +0200 Subject: [Python-ideas] Idea: Named code blocks / inline module declarations In-Reply-To: References: Message-ID: <5419417F.9060805@egenix.com> On 17.09.2014 07:19, Tennessee Leeuwenburg wrote: > I would like to be able to use named sections to organise my code, much an > inline submodules, bit without using classes or functions to organise them. > I would use this if I had a group of related functions which were not > written in an object-oriented-style, possibly due to not needing any shared > state. Rather than break these out into a new file, I would like to just be > able to use internal structure to declare the relationship. I've used the > keyword 'block' to indicate the start of a named block. > > For example, > > block signin: > def handle_new_user(): > do_it() > > def handle_existing_user(): > do_it() > > > while True: > try: > signin.handle_existing_user(): > except: > signin.handle_new_user() > > do_other_stuff() > > At the moment, I would have to either break out into more files, or > somewhat clumsily co-opt things like functions or staticmethods. I think > that supporting named blocks or inline module declarations would really > help me organise some of my code much better. It could also provide a more > seamless way to decide to break out into a new file. Once a named block got > big enough, I could easily create a new file and import those functions > into the same namespace. 
> > I hope this makes sense and that I'm not overlooking anything obvious. Change "block" to "class" and you're done :-) You can make your code even better (i.e. more OO-style and future proof), by implementing those functions as true methods and instantiating your Signin class as signin singleton. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 17 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-09-19: PyCon UK 2014, Coventry, UK ... 2 days to go 2014-09-27: PyDDF Sprint 2014 ... 10 days to go 2014-09-30: Python Meeting Duesseldorf ... 13 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ben+python at benfinney.id.au Wed Sep 17 10:39:08 2014 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 17 Sep 2014 18:39:08 +1000 Subject: [Python-ideas] Idea: Named code blocks / inline module declarations References: Message-ID: <85lhpivdxv.fsf@benfinney.id.au> David Mertz writes: > Why is this a misuse? Classes are largely just namespaces to start > with, and if you want to use them solely for that purpose, it's all > there for you. I think it's a misuse of the semantic meaning of classes to use them as pure namespaces. The semantic intent conveyed by defining a class is that you're defining a class *of objects*, and therefore that the class is intended to be instantiated. The programmer reading a class definition is receiving a strong signal that there will be objects of this class in the program. 
To have a class and not instantiate it, merely to have a namespace, is at least misleading the reader of that code. To do this is not an error. But it is IMO a code smell. If you're defining a class and using it only as a namespace, your design is likely poor and you need to re-think it. In this case, I think Tennessee's intent is much better met using modules; those *are* semantically namespace singletons, matching the intent here and therefore much better at communicating that intent. I see no justification given here for avoiding modules if this is what's needed. -- \ Lucifer: ?Just sign the Contract, sir, and the Piano is yours.? | `\ Ray: ?Sheesh! This is long! Mind if I sign it now and read it | _o__) later?? ?http://www.achewood.com/ | Ben Finney From abarnert at yahoo.com Wed Sep 17 10:51:29 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 17 Sep 2014 01:51:29 -0700 Subject: [Python-ideas] Idea: Named code blocks / inline module declarations In-Reply-To: References: Message-ID: <70CB0A34-7B73-407E-AC4A-7F25D1C94BF2@yahoo.com> On Sep 16, 2014, at 23:21, David Mertz wrote: > Why is this a misuse? Well, for one thing, you're relying on the fact that unbound methods are just plain functions, which was not true in 2.x and is still not described that way in the documentation. You're also ignoring the fact that the first parameter of a method should be self and the convention (enforced by the interpreter 2.x, although no longer in 3.x, and by various lint tools, and likely relied on by IDEs, etc.) that when calling an unbound method you pass an instance of the class (or a subclass) as the first argument. In short, you're going to confuse both human and automated readers. Anyone who gets how methods and descriptors work is going to be able to figure it out, but is that really sufficient for readability? 
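To make the confusion concrete, here is a minimal Python 3 demonstration (the class and method are purely illustrative):

```python
class signin:
    def handle(arg):              # no "self": usable only as a plain function
        return arg.upper()

print(signin.handle("hi"))        # works in 3.x: an unbound "method" is just a function

s = signin()                      # nothing stops accidental instantiation...
try:
    s.handle("hi")                # ...but the bound call passes s as `arg`
except TypeError as e:
    print(e)                      # roughly: handle() takes 1 positional argument but 2 were given
```

The same lookup works or breaks depending on whether the reader goes through the class or an instance, which is exactly the sort of thing that trips up humans and lint tools alike.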
Of course you could solve all of that by declaring each method @staticmethod--or by writing a metaclass or class decorator that does that for you automatically, or (if you don't care about 2.x, or about consenting adults accidentally creating an instance and discovering that the methods don't work) just writing one that does nothing except indicate to the human reader that this is a non-instantiatable class whose methods are all static. > Classes are largely just namespaces to start > with, and if you want to use them solely for that purpose, it's all > there for you. If you wanted to make you intention even more > obviously, you could do something like: > >>>> class NoInstance(object): > ... def __new__(cls): > ... raise NotImplementedError > ... >>>> class signin(NoInstance): > ... def handle(*args): > ... print("Hello", args) > ... >>>> signin() > Traceback (most recent call last): > File "", line 1, in > File "", line 3, in __new__ > NotImplementedError >>>> signin.handle('some','args','here') > Hello ('some', 'args', 'here') > > Maybe some other name than NoInstance would be better: NamespaceOnly? > In any case, one helper class lets you use existing syntax exactly as > you desire. > > On Tue, Sep 16, 2014 at 10:43 PM, Tennessee Leeuwenburg > wrote: >> Hi Ali, >> >> Thanks for the suggestion. I would prefer to avoid that just because it's a >> potential misuse of classes, and I suspect may lead to confusion for other >> developers. Otherwise that's exactly what I want to do. >> >> Cheers, >> -T >> >> On 17 September 2014 15:27, Ali Alkhatib wrote: >>> >>> This may be a misuse of classes, but why can't you make a class and then >>> not instantiate it? 
>>> >>> class signin: >>> def handle(): >>> return "this works" >>> >>> signin.handle() # returns "this works" >>> >>> On Tue, Sep 16, 2014 at 10:19 PM, Tennessee Leeuwenburg >>> wrote: >>>> >>>> I would like to be able to use named sections to organise my code, much >>>> an inline submodules, bit without using classes or functions to organise >>>> them. I would use this if I had a group of related functions which were not >>>> written in an object-oriented-style, possibly due to not needing any shared >>>> state. Rather than break these out into a new file, I would like to just be >>>> able to use internal structure to declare the relationship. I've used the >>>> keyword 'block' to indicate the start of a named block. >>>> >>>> For example, >>>> >>>> block signin: >>>> def handle_new_user(): >>>> do_it() >>>> >>>> def handle_existing_user(): >>>> do_it() >>>> >>>> >>>> while True: >>>> try: >>>> signin.handle_existing_user(): >>>> except: >>>> signin.handle_new_user() >>>> >>>> do_other_stuff() >>>> >>>> At the moment, I would have to either break out into more files, or >>>> somewhat clumsily co-opt things like functions or staticmethods. I think >>>> that supporting named blocks or inline module declarations would really help >>>> me organise some of my code much better. It could also provide a more >>>> seamless way to decide to break out into a new file. Once a named block got >>>> big enough, I could easily create a new file and import those functions into >>>> the same namespace. >>>> >>>> I hope this makes sense and that I'm not overlooking anything obvious. 
>>>> >>>> Cheers, >>>> -Tennessee >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >>> >>> >>> -- >>> Ali Alkhatib >>> Department of Computer Science >>> PhD Student - Stanford University >> >> >> >> >> -- >> -------------------------------------------------- >> Tennessee Leeuwenburg >> http://myownhat.blogspot.com/ >> "Don't believe everything you think" >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Wed Sep 17 11:10:54 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 17 Sep 2014 19:10:54 +1000 Subject: [Python-ideas] Idea: Named code blocks / inline module declarations In-Reply-To: References: Message-ID: <20140917091054.GF9293@ando.pearwood.info> On Wed, Sep 17, 2014 at 03:19:52PM +1000, Tennessee Leeuwenburg wrote: > I would like to be able to use named sections to organise my code, much an > inline submodules, bit without using classes or functions to organise them. > I would use this if I had a group of related functions which were not > written in an object-oriented-style, possibly due to not needing any shared > state. 
Rather than break these out into a new file, I would like to just be > able to use internal structure to declare the relationship. I've used the > keyword 'block' to indicate the start of a named block. > > For example, > > block signin: > def handle_new_user(): > do_it() > > def handle_existing_user(): > do_it() I think that this is very close to what C++ calls namespaces, and I think that the Zen of Python has something to say about namespaces :-) For quite some time I've been mulling over the idea of having multiple namespaces within a single module, but my ideas haven't been advanced enough to raise here. While having dedicated syntax for it would be nice: namespace stuff: a = 2 def stuff(x): ... assert stuff.a == 2 I *think* it should be possible to abuse the class statement to get the same effect, by use of a metaclass or possibly a class decorator: @namespace class stuff: a = 2 def stuff(x): ... assert isinstance(stuff, ModuleType) assert stuff.a == 2 The hardest part, I think, is getting scoping right in the functions. What I would expect is that inside a namespace, scoping should go: local current namespace module globals built-ins so that functions inside a single namespace can refer to each other without needing to give a fully-qualified name. In other words, this sort of namespace is just like a module, but it doesn't need to be written in an external file. -- Steven From steve at pearwood.info Wed Sep 17 11:21:54 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 17 Sep 2014 19:21:54 +1000 Subject: [Python-ideas] Idea: Named code blocks / inline module declarations In-Reply-To: <70CB0A34-7B73-407E-AC4A-7F25D1C94BF2@yahoo.com> References: <70CB0A34-7B73-407E-AC4A-7F25D1C94BF2@yahoo.com> Message-ID: <20140917092154.GG9293@ando.pearwood.info> On Wed, Sep 17, 2014 at 01:51:29AM -0700, Andrew Barnert wrote: > On Sep 16, 2014, at 23:21, David Mertz wrote: > > > Why is this a misuse? 
> > Well, for one thing, you're relying on the fact that unbound methods > are just plain functions, which was not true in 2.x and is still not > described that way in the documentation. You're also ignoring the fact > that the first parameter of a method should be self and the convention > (enforced by the interpreter 2.x, although no longer in 3.x, and by > various lint tools, and likely relied on by IDEs, etc.) that when > calling an unbound method you pass an instance of the class (or a > subclass) as the first argument. While all this is true, one can work around it by declaring all your methods @staticmethod. But it's worse than that. By using a class, you imply inheritance and instantiation. Neither is relevant to the basic "namespace" idea. Furthermore, there's no point (in my opinion) in having this sort of namespace unless functions inside a namespace can refer to each other without caring about the name of the namespace they are in. Think of modules. Given a module a.py containing functions f and g, f can call g: def f(): return g() without writing: def f(): return a.g() Classes don't give you that, so they are not up to the job. Modules, on the other hand, give us almost exactly what is needed here. We can create module instances on the fly, and populate them. A class decorator could accept a class and return a module instance, on the fly. That would still be ugly, since @namespace class stuff: *looks* like a class even though it isn't, but it will do as a proof-of-concept. 
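Such a decorator can be sketched in a few lines (a rough proof-of-concept; the names are illustrative, and it makes no attempt at the namespace-aware scoping rules discussed earlier in the thread):

```python
import types

def namespace(cls):
    # Build a module object on the fly from the class body.
    mod = types.ModuleType(cls.__name__, cls.__doc__)
    for name, value in vars(cls).items():
        if not name.startswith('__'):
            setattr(mod, name, value)
    return mod

@namespace
class stuff:
    a = 2
    def double(x):
        return 2 * x

print(isinstance(stuff, types.ModuleType))  # True
print(stuff.a, stuff.double(3))             # 2 6
```

Note that the functions inside still resolve names against the enclosing file's globals, so double could not call a sibling function without qualification -- that remains the hard part.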
-- Steven From ncoghlan at gmail.com Wed Sep 17 12:57:29 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 17 Sep 2014 20:57:29 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> Message-ID: On 17 September 2014 06:09, Andrew Barnert wrote: > No, you're mixing up `format`, an explicit method on str that no one is suggesting adding to bytes, and `__format__`, a dunder method on every type that's used by `str.format` and `format`; the proposal is to extend `bytes.__format__` in some way that I don't think is entirely decided yet, but it would look something like this: > > u'Hello, {:a}'.format(some_bytes_var) --> u'Hello, ' > > Or: > > u'Hello, {:#x}'.format(some_bytes_var) --> u'Hello, \\x2d\\x78\\x68\\x61...' Ignoring the specifics of the minilanguage, here are the examples I posted to http://bugs.python.org/issue22385: format(b"xyz", "x") -> '78797a' format(b"xyz", "X") -> '78797A' format(b"xyz", "#x") -> '0x78797a' format(b"xyz", ".1x") -> '78 79 7a' format(b"abcdwxyz", ".4x") -> '61626364 7778797a' format(b"abcdwxyz", "#.4x") -> '0x61626364 0x7778797a' format(b"xyz", ",.1x") -> '78,79,7a' format(b"abcdwxyz", ",.4x") -> '61626364,7778797a' format(b"abcdwxyz", "#,.4x") -> '0x61626364,0x7778797a' The point on the issue tracker was that while this is a good way to obtain the flexibility, adhering too closely to the "standard format syntax" as I did likely isn't a good idea. Instead, we'd be better going for the strftime model where a type specific format (e.g. as an argument to the new *.hex() methods being discussed in http://bugs.python.org/issue) is *also* supported via __format__. 
For example, inspired directly by the way hex editors work, you could envision an approach where you had a base format character (chosen to be orthogonal to the default format characters): "h": lowercase hex "H": uppercase hex "A": ASCII (using "." for unprintable & extended ASCII) format(b"xyz", "A") -> 'xyz' format(b"xyz", "h") -> '78797a' format(b"xyz", "H") -> '78797A' Followed by a separator and "chunk size": format(b"xyz", "h 1") -> '78 79 7a' format(b"abcdwxyz", "h 4") -> '61626364 7778797a' format(b"xyz", "h,1") -> '78,79,7a' format(b"abcdwxyz", "h,4") -> '61626364,7778797a' format(b"xyz", "h:1") -> '78:79:7a' format(b"abcdwxyz", "h:4") -> '61626364:7778797a' In the "h" and "H" cases, you could request a preceding "0x" on the chunks: format(b"xyz", "h#") -> '0x78797a' format(b"xyz", "h# 1") -> '0x78 0x79 0x7a' format(b"abcdwxyz", "h# 4") -> '0x61626364 0x7778797a' The section before the format character would use the standard string formatting rules: alignment, fill character, width, precision Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From eric at trueblade.com Wed Sep 17 13:48:51 2014 From: eric at trueblade.com (Eric V. Smith) Date: Wed, 17 Sep 2014 07:48:51 -0400 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> Message-ID: <54197523.6080700@trueblade.com> On 09/17/2014 06:57 AM, Nick Coghlan wrote: > The point on the issue tracker was that while this is a good way to > obtain the flexibility, adhering too closely to the "standard format > syntax" as I did likely isn't a good idea. 
Instead, we'd be better > going for the strftime model where a type specific format (e.g. as an > argument to the new *.hex() methods being discussed in > http://bugs.python.org/issue) is *also* supported via __format__. One thing I'd like to not support here that strftime does: arbitrary pass-through text in the format string. It's useful for date/time, but not here. And your examples below don't allow it, I just wanted to be clear. > For example, inspired directly by the way hex editors work, you could > envision an approach where you had a base format character (chosen to > be orthogonal to the default format characters): > > "h": lowercase hex > "H": uppercase hex > "A": ASCII (using "." for unprintable & extended ASCII) > > format(b"xyz", "A") -> 'xyz' > format(b"xyz", "h") -> '78797a' > format(b"xyz", "H") -> '78797A' > > Followed by a separator and "chunk size": > > format(b"xyz", "h 1") -> '78 79 7a' > format(b"abcdwxyz", "h 4") -> '61626364 7778797a' > > format(b"xyz", "h,1") -> '78,79,7a' > format(b"abcdwxyz", "h,4") -> '61626364,7778797a' > > format(b"xyz", "h:1") -> '78:79:7a' > format(b"abcdwxyz", "h:4") -> '61626364:7778797a' > > In the "h" and "H" cases, you could request a preceding "0x" on the chunks: > > format(b"xyz", "h#") -> '0x78797a' > format(b"xyz", "h# 1") -> '0x78 0x79 0x7a' > format(b"abcdwxyz", "h# 4") -> '0x61626364 0x7778797a' > > The section before the format character would use the standard string > formatting rules: alignment, fill character, width, precision I think that's too confusing. For example, '#' is also allowed before the format character: [[fill]align][sign][#][0][width][,][.precision][type] And precision doesn't make sense for bytes (and is currently not allowed for int). So you'd instead have the complete format specifier be: [[fill]align][sign][#][0][width][type][#][internal-fill][chunksize] I think "sign" might have to go: it doesn't make sense. Not sure about "0". 
Let's say they both go, and we're left with:

[[fill]align][width][type][#][internal-fill][chunksize]

I'm not completely opposed to this, but I think we can do better. I see basically 3 options for byte format specifiers:

1. Support exactly what the standard types (int, str, float, etc.) support, but give slightly different semantics to it. This is what Nick originally proposed on the issue tracker.

2. Support a slightly different format specifier. This is what Nick proposes above, and I discuss more below. The downside of this is that it might be confusing to some users, who see the printf-like formatting as some universal standard. It's also hard to document.

3. Do something radically different. I gave an example on the issue tracker, but I'm not totally serious about this.

Here's my proposal for #2. The format specifier becomes:

[[fill]align][#][width][separator][/chunksize][type]

The only difference here (from the standard format specifier) is that I've substituted "/chunksize" for ".precision" and generalized the separator. I think "/" reads well as "divide into chunks this size". We might want to restrict "separator" to a few characters, maybe one of space, colon, dot, comma. I think Nick's usage of 'A', 'H', and 'h' for the "type" character is good, although I'd really prefer 'a'. And it's possible 'x' and 'X' would be less confusing (because they're more familiar), but maybe that would increase confusion. Let's keep 'h' and 'H' for now, just for discussion purposes.
So, Nick's examples become: format(b"xyz", "a") -> 'xyz' format(b"xyz", "h") -> '78797a' format(b"xyz", "H") -> '78797A' Followed by a separator and "chunk size": format(b"xyz", "/1h") -> '78 79 7a' format(b"abcdwxyz", "/4h") -> '61626364 7778797a' format(b"xyz", ",/1h") -> '78,79,7a' format(b"abcdwxyz", ",/4h") -> '61626364,7778797a' format(b"xyz", ":/1h") -> '78:79:7a' format(b"abcdwxyz", ":/4h") -> '61626364:7778797a' format(b"xyz", "#h") -> '0x78797a' format(b"xyz", "#/1h") -> '0x78 0x79 0x7a' format(b"abcdwxyz", "#/4h") -> '0x61626364 0x7778797a' I really haven't thought through parsing this format specifier. Obviously "separator" will have some restrictions, like it can't be a slash. I'll have to give it some more thought. As with the standard format specifiers, there are some restrictions. 'a' couldn't have '#', for example. But I don't see why it couldn't have 'chunksize'. Eric. From abarnert at yahoo.com Wed Sep 17 16:51:22 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 17 Sep 2014 07:51:22 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <54197523.6080700@trueblade.com> References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> <54197523.6080700@trueblade.com> Message-ID: On Sep 17, 2014, at 4:48, "Eric V. Smith" wrote: > 2. Support a slightly different format specifier. This is what Nick > proposes above, and I discuss more below. The downside of this is that > it might be confusing to some users, who see the printf-like formatting > as some universal standard. It's also hard to document. The possibility of confusion might be increased if some of the options to bytes look like they should work for str. 
People will ask, "I can chunk bytes into groups of 4 with /4, why can't I do that with characters when the rest of the format specifier is the same?"

Also, are there other languages that use printf-style specifiers and have %x, with the same options as for int, working for their bytes type? IIRC Go lets you print strings as numbers, as if their UTF-8 representation were a giant big-endian integer; if that's just a consequence of a little-used feature in a language that's nobody's first, it probably won't add to confusion, but if it's more common and widespread, it might be worth either matching what the others do, or deliberately being as different as possible to prevent confusion.

Nick's use of 'h' instead of 'x' and his rearrangement of the fields definitely avoid giving the appearance of being printf-like and any confusion that might cause, while still being able to share fields that make sense. But of course avoiding printf-like-ness means it's a new thing people have to learn. (Of course, eventually they want to do something where the format isn't identical to printf, and many of them seem to go to StackOverflow or IRC and complain that there's a "bug in str.format" instead of just glancing at the docs, so maybe making them learn early isn't such a bad thing...)
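For concreteness, here is a rough pure-Python sketch of the 'h'/'H'/'A' behaviour from Nick's examples upthread -- a toy helper, not a real bytes.__format__, and it assumes the bytes.hex() method that was also under discussion:

```python
def format_bytes(data, spec):
    # Toy parser: type char first ('h', 'H' or 'A'), then optional '#',
    # then optional separator char plus chunk size, e.g. "h# 4".
    kind, rest = spec[0], spec[1:]
    prefix = ''
    if rest.startswith('#'):
        prefix, rest = '0x', rest[1:]
    if kind == 'A':
        # ASCII view: '.' for unprintable and extended bytes
        return ''.join(chr(b) if 32 <= b < 127 else '.' for b in data)
    hexstr = data.hex()
    if kind == 'H':
        hexstr = hexstr.upper()
    if rest:
        sep, size = rest[0], int(rest[1:])
        step = 2 * size  # two hex digits per byte
        return sep.join(prefix + hexstr[i:i + step]
                        for i in range(0, len(hexstr), step))
    return prefix + hexstr

print(format_bytes(b"abcdwxyz", "h# 4"))  # 0x61626364 0x7778797a
print(format_bytes(b"xyz", "A"))          # xyz
```

This only shows the separator/chunking semantics; error handling and the alignment/fill/width fields are deliberately omitted.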
From erik.m.bray at gmail.com Wed Sep 17 17:32:55 2014 From: erik.m.bray at gmail.com (Erik Bray) Date: Wed, 17 Sep 2014 11:32:55 -0400 Subject: [Python-ideas] float comparison in doctests In-Reply-To: References: <1410807740.45297.YahooMailNeo@web124505.mail.ne1.yahoo.com> <20140915203144.GY9293@ando.pearwood.info> Message-ID: On Mon, Sep 15, 2014 at 7:55 PM, Nick Coghlan wrote: > > On 16 Sep 2014 06:32, "Steven D'Aprano" wrote: >> >> On Mon, Sep 15, 2014 at 12:02:20PM -0700, Kevin Davies wrote: >> >> > It seems that this didn't reach the list directly (see >> > https://mail.python.org/pipermail/python-ideas/2014-August/028956.html), >> > so I'm resending: >> >> > >> > Erik Bray (the author of the +FLOAT_CMP extension in Astropy), Bruce >> > Leban, and I had a short off-thread email discussion. Here are the >> > points: >> > >> > - [Bruce]: ALMOST_EQUAL is the best flag name. >> > - [Erik]: If there's agreement on this, Erik will develop a patch as >> > soon as he can. >> > - [Erik]: There's no way to adjust the tolerance because there seems >> > to be no easy way to parameterize doctest flags. Ideas are welcome. >> >> With no way to choose between (at minimum) *four* different "almost >> equal" models, and no way to specify a tolerance, I don't think doctest >> ought to have such a directive. >> >> Almost equal can mean: >> >> - round and compare with == (as unittest does) >> - absolute difference >> - relative difference >> - ULP difference > > I think it's OK for a doctest flag to just provide the default behaviour > offered by unittest.TestCase.assertAlmostEqual. That aligns well with the > originally intended use case of testing examples in documentation. I think that sounds like a perfect compromise. As long as it's documented that "this is equivalent to assertAlmostEqual with the default arguments" it should be clear, and that would handle all the basic use cases this feature was intended for. Naming the flag ALMOST_EQUAL would also help make that clear.
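For reference, a quick illustration of the default behaviour being proposed as the flag's semantics:

```python
import unittest

tc = unittest.TestCase()

print(0.1 + 0.2 == 0.3)                 # False: exact float comparison fails
tc.assertAlmostEqual(0.1 + 0.2, 0.3)    # passes: rounds the difference to 7 places
print("almost equal")
```

That "round the difference to 7 decimal places" rule is what a documentation-oriented ALMOST_EQUAL flag would inherit.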
> If folks want more precise control, they can then switch to full unit tests. > It would be reasonable for the docs for the new flag to point that out > explicitly. Exactly--it's mostly a matter of making doctests more readable. Note, ellipses won't work at all in some cases either. I've had cases where we expected something like 1.0 and got 0.99999999..., for example. Erik From ram.rachum at gmail.com Wed Sep 17 18:42:51 2014 From: ram.rachum at gmail.com (Ram Rachum) Date: Wed, 17 Sep 2014 09:42:51 -0700 (PDT) Subject: [Python-ideas] Make `float('inf') //1 == float('inf')` Message-ID: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Please see this discussion on python-list: https://groups.google.com/forum/#!topic/comp.lang.python/maDZoc-n4bA Currently `float('inf') //1` is equal to NaN. I think that this is really weird. If I understand correctly it's to maintain the invariant `div*y + mod == x`. The question is, do we really care more about maintaining this invariant rather than providing a mathematically reasonable value for floor division? Thanks, Ram. From graffatcolmingov at gmail.com Wed Sep 17 23:52:05 2014 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Wed, 17 Sep 2014 16:52:05 -0500 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: ---------- Forwarded message ---------- From: Ian Cordasco Date: Wed, Sep 17, 2014 at 4:51 PM Subject: Re: [Python-ideas] Make `float('inf') //1 == float('inf')` To: Ram Rachum Cc: "python-ideas at googlegroups.com" On Wed, Sep 17, 2014 at 11:42 AM, Ram Rachum wrote: > Please see this discussion on python-list: > > https://groups.google.com/forum/#!topic/comp.lang.python/maDZoc-n4bA > > Currently `float('inf') //1` is equal to NaN. I think that this is really > weird.
If I understand correctly it's to maintain the invariant `div*y + mod > == x`. The question is, do we really care more about maintaining this > invariant rather than providing a mathematically reasonable value for floor > division?

Actually there are 2 things here:

1. Mathematically speaking, infinity is not a real number, and modulo arithmetic is algebraically not defined for it. So the "mathematically reasonable value" is NaN. Is it intuitive for someone who hasn't studied abstract algebra? Probably not. Is it functional for the scientific python community? Almost certainly, although I won't pretend to speak on their behalf.

2. Changing this behaviour is not something I think we should do in a minor version of 3.4 or in 3.5 (or really 3.x).

From ncoghlan at gmail.com Thu Sep 18 00:08:31 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 18 Sep 2014 08:08:31 +1000
Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')`
In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID:

On 18 September 2014 07:52, Ian Cordasco wrote:
> Actually there are 2 things here:
>
> 1. Mathematically speaking, infinity is not a real number, and modulo
> arithmetic is algebraically not defined for it. So the "mathematically
> reasonable value" is NaN. Is it intuitive for someone who hasn't
> studied abstract algebra? Probably not. Is it functional for the
> scientific python community? Almost certainly, although I won't pretend
> to speak on their behalf.

Right, as with NaN, infinity is a concept rather than a value. The fact that they both map to "kinda sorta values" in a programming language like Python is a limitation of the computer's underlying representational system, and the end result is a leaky abstraction that has some weird artefacts like this one (the fact that key lookup based containers enforce reflexivity, even though floating point NaN comparisons are explicitly defined as non-reflexive, is another).
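Both artefacts are easy to observe in current CPython:

```python
inf = float('inf')
nan = float('nan')

print(inf // 1)       # nan -- the floor division behaviour under discussion
print(nan == nan)     # False: IEEE 754 comparison is non-reflexive
print(nan in [nan])   # True: containers test identity before equality
```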
There's no real way to make floating point arithmetic "not surprising" as soon as NaN and infinities get involved (while I'd be surprised if anyone was inclined to dispute that, here's a fun link on the "infinite values" side that will hopefully deter the unduly optimistic: https://en.wikipedia.org/wiki/Aleph_number). > 2. Changing this behaviour is not something I think we should do in a > minor version of 3.4 or in 3.5 (or really 3.x). For a topic as inherently confusing as infinite values, I believe it would take a battery of extensive usability studies to make the case that any change in behaviour from the status quo would be worth the hassle, and most researchers are going to have more interesting things to do with their time. Some of the things we changed in the Python 3 transition (like rearranging modules) were based on intuition as to what would be easier for newcomers to learn, and if I learned anything from that, it's to rely more heavily on the ethos of "status quo wins a stalemate" when it comes to the way we represent concepts that are just plain hard to learn in their own right. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ram.rachum at gmail.com Thu Sep 18 01:02:44 2014 From: ram.rachum at gmail.com (Ram Rachum) Date: Wed, 17 Sep 2014 16:02:44 -0700 (PDT) Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` Message-ID: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> I suggest introducing a `start=1` argument to `math.factorial`, so the result would be (the C-optimized version of) `product(range(start, x+1), start=1)`. This'll be useful for combinatorial calculations. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From clint.hepner at gmail.com Thu Sep 18 05:04:29 2014 From: clint.hepner at gmail.com (Clint Hepner) Date: Wed, 17 Sep 2014 23:04:29 -0400 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` In-Reply-To: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> Message-ID: -1 on overloading math.factorial to compute something that isn't a factorial, but a falling factorial. Such a new function would be easy to add, though, if deemed useful. math.falling_factorial(x, n) = product(range(x - n + 1, x + 1)) and the similar function math.rising_factorial(x, n) = product(range(x, x+n)) On Wed, Sep 17, 2014 at 7:02 PM, Ram Rachum wrote: > I suggest introducing a `start=1` argument to `math.factorial`, so the > result would be (the C-optimized version of) `product(range(start, x+1), > start=1)`. This'll be useful for combinatorical calculations. > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Sep 18 06:13:50 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 18 Sep 2014 14:13:50 +1000 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` In-Reply-To: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> Message-ID: <20140918041349.GH9293@ando.pearwood.info> On Wed, Sep 17, 2014 at 04:02:44PM -0700, Ram Rachum wrote: > I suggest introducing a `start=1` argument to `math.factorial`, so the > result would be (the C-optimized version of) `product(range(start, x+1), > start=1)`. This'll be useful for combinatorical calculations. Then it wouldn't be the factorial function any more. 
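For concreteness, the two helpers Clint describes above can be sketched in a few lines of plain Python (the names and signatures are his suggestion, not an existing math-module API):

```python
from functools import reduce
from operator import mul

def falling_factorial(x, n):
    # x * (x - 1) * ... * (x - n + 1), per the definition given above
    return reduce(mul, range(x - n + 1, x + 1), 1)

def rising_factorial(x, n):
    # x * (x + 1) * ... * (x + n - 1)
    return reduce(mul, range(x, x + n), 1)

print(falling_factorial(5, 2))  # 5 * 4 = 20
print(rising_factorial(5, 2))   # 5 * 6 = 30
print(falling_factorial(6, 6))  # 720, i.e. 6! -- the ordinary factorial
```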
There are lots of functions which could be useful for combinatorial calculations, including !n and n!!. Do you think this particular one would be of broad enough interest that it deserves to be in the standard library? Do you know of any other programming languages which offer this "partial factorial" function in their standard library? -- Steven From stephen at xemacs.org Thu Sep 18 06:21:56 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 18 Sep 2014 13:21:56 +0900 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> <54197523.6080700@trueblade.com> Message-ID: <87oaud1rtn.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > The possibility of confusion might be increased if some of the > options to bytes look like they should work for str. People will > ask, "I can chunk bytes into groups of 4 with /4, why can't I do > that with characters when the rest of the format specifier is the > same?" Isn't the answer to that kind of question "because you haven't written the PEP yet?" Or "Repeat after me, 'bytes are not str' ... Very good, now do a set of 100 before each meal for a week." After all, there are things you can do with integer or float formats that you can't do with str and vice versa. bytes are indeed very similar to str as streams of code units (octets vs. characters), but the specific usages for human-oriented text (including such unnatural languages as C and Perl) require some differences in semantics.
The sooner people get comfortable with that, the better, of course, but I don't think the language should be prevented from evolving because many people are going to take a while to get the difference and its importance. > (Of course eventually they want to do something where the format > isn't identical to printf, and many of them seem to go to > StackOverflow or IRC and complain that there's a "bug in > str.format" instead of just glancing at the docs, so maybe making > them learn early isn't such a bad thing...) Obviously, given the snotty remark above, I sympathize. But I doubt it's really going to help that. It's just going to give them one more thing to complain about. From abarnert at yahoo.com Thu Sep 18 08:05:30 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 17 Sep 2014 23:05:30 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <87oaud1rtn.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> <54197523.6080700@trueblade.com> <87oaud1rtn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4353FDEB-373F-4592-8016-CA8500D8DC58@yahoo.com> On Sep 17, 2014, at 21:21, "Stephen J. Turnbull" wrote: > Andrew Barnert writes: > >> The possibility of confusion might be increased if some of the >> options to bytes look like they should work for str. People will >> ask, "I can chunk bytes into groups of 4 with /4, why can't I do >> that with characters when the rest of the format specifier is the >> same?" > > Isn't the answer to that kind of question "because you haven't written > the PEP yet?" > > Or "Repeat after me, 'bytes are not str' ... Very good, now do a set > of 100 before each meal for a week." 
As long as you don't ask for a set of 100 bytearrays, because they're not hashable. > After all, there are things you > can do with integer or float formats that you can't do with str and > vice versa. > > bytes are indeed very similar to str as streams of code units (octets > vs. characters), but the specific usages for human-oriented text > (including such unnatural languages as C and Perl) require some > differences in semantics. The sooner people get comfortable with > that, the better, of course, but I don't think the language should be > prevented from evolving because many people are going to take a while > to get the difference and its importance. I think we agree on all of that. (By the way, is there a word for that Unicode ignorance and confusion? Something like "illiteracy" and "innumeracy", but probably spelled with a non-BMP character, maybe U+1F4A9?) My point is that, given a choice between two APIs, one which reinforces the illusion that bytes are text and one which doesn't, the latter gets points. (And similarly for format vs. printf.) Of course on the other hand, when str and bytes really _are_ perfect parallels in some way, making them gratuitously inconsistent just adds more things to learn and memorize. At this point, I'm not sure that adds up to an argument for Nick's less-str-like version of his original proposal, or against it, but I'm pretty sure it's a good argument for one or other... >> (Of course eventually they want to do something where the format >> isn't identical to printf, and many of them seem to go to >> StackOverflow or IRC and complain that there's a "bug in >> str.format" instead of just glancing at the docs, so maybe making >> them learn early isn't such a bad thing...) > > Obviously, given the snotty remark above, I sympathize. But I doubt > it's really going to help that. It's just going to give them one more > thing to complain about. Yes, people can be amazingly good at avoiding learning. 
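The two jokes above are easy to verify at the interpreter, and between them they capture why "bytes are not str":

```python
# bytes are immutable and hashable; bytearray is mutable and is not,
# so only bytes can be set members or dict keys.
s = {b'spam', b'eggs'}
try:
    s.add(bytearray(b'spam'))
except TypeError as exc:
    print('no set of bytearrays:', exc)

# Indexing is the other classic trap: str yields characters, bytes yields ints.
print('abc'[0])   # 'a'
print(b'abc'[0])  # 97
```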
From ncoghlan at gmail.com Thu Sep 18 08:15:38 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 18 Sep 2014 16:15:38 +1000 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` In-Reply-To: <20140918041349.GH9293@ando.pearwood.info> References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> <20140918041349.GH9293@ando.pearwood.info> Message-ID: On 18 September 2014 14:13, Steven D'Aprano wrote: > On Wed, Sep 17, 2014 at 04:02:44PM -0700, Ram Rachum wrote: >> I suggest introducing a `start=1` argument to `math.factorial`, so the >> result would be (the C-optimized version of) `product(range(start, x+1), >> start=1)`. This'll be useful for combinatorical calculations. > > Then it wouldn't be the factorial function any more. > > There are lots of functions which could be useful for combinatorical > calculations, including !n and n!!, do you think this particular one > would be of broad enough interest that it deserves to be in the standard > library? > > Do you know of any other programming languages which offer this "partial > factorial" function in their standard library? It's also worth noting that "pip install mpmath" will provide rising and falling factorials (http://mpmath.org/doc/current/functions/gamma.html#rising-and-falling-factorials) and a whole lot more. There's no need to add such complexity to the standard library. However, now that CPython ships with pip by default, we may want to consider providing more explicit pointers to such "If you want more advanced functionality than the standard library provides" libraries. Yes, that may be contentious in the near term as folks argue over which "stdlib++" modules to recommend, but in some cases there are clear "next step beyond the standard library" category winners that are worth introducing to newcomers, rather than making them do their own research. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From encukou at gmail.com Thu Sep 18 09:30:59 2014 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 18 Sep 2014 09:30:59 +0200 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: For the record, this gives inf in Numpy.
>>> import numpy
>>> numpy.array(float('inf')) // 1
inf
AFAIK this and http://bugs.python.org/issue22198 are the only differences from Python floats, at least on my machine. On Thu, Sep 18, 2014 at 12:08 AM, Nick Coghlan wrote: > On 18 September 2014 07:52, Ian Cordasco wrote: >> Actually there are 2 things here: >> >> 1. Mathematically speaking, infinity is a real number and modulo >> arithmetic is algebraically not defined for it. So the "mathematically >> reasonable value" is NaN. Is it intuitive for someone who hasn't >> studied abstract algebra? Probably not. Is it functional for the >> scientific python community? Almost certainly although I won't pretend >> to speak on their behalf > > Right, as with NaN, infinity is a concept rather than a value. The > fact that they both map to "kinda sorta values" in a programming > language like Python is a limitation of the computer's underlying > representational system, and the end result is a leaky abstraction > that has some weird artefacts like this one (the fact that key lookup > based containers enforce reflexivity, even though floating point NaN > comparisons are explicitly defined as non-reflexive, is another). > > There's no real way to make floating point arithmetic "not surprising" > as soon as NaN and infinities get involved (while I'd be surprised if > anyone was inclined to dispute that, here's a fun link on the > "infinite values" side that will hopefully deter the unduly > optimistic: https://en.wikipedia.org/wiki/Aleph_number). > >> 2.
Changing this behaviour is not something I think we should do in a >> minor version of 3.4 or in 3.5 (or really 3.x). > > For a topic as inherently confusing as infinite values, I believe it > would take a battery of extensive usability studies to make the case > that any change in behaviour from the status quo would be worth the > hassle, and most researchers are going to have more interesting things > to do with their time. > > Some of the things we changed in the Python 3 transition (like > rearranging modules) were based on intuition as to what would be > easier for newcomers to learn, and if I learned anything from that, > it's to rely more heavily on the ethos of "status quo wins a > stalemate" when it comes to the way we represent concepts that are > just plain hard to learn in their own right. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From stephen at xemacs.org Thu Sep 18 13:21:16 2014 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Thu, 18 Sep 2014 20:21:16 +0900 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <4353FDEB-373F-4592-8016-CA8500D8DC58@yahoo.com> References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> <54197523.6080700@trueblade.com> <87oaud1rtn.fsf@uwakimon.sk.tsukuba.ac.jp> <4353FDEB-373F-4592-8016-CA8500D8DC58@yahoo.com> Message-ID: <878ulh18er.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > (By the way, is there a word for that Unicode ignorance and > confusion? Something like "illiteracy" and "innumeracy", but > probably spelled with a non-BMP character, maybe U+1F4A9?) "Non-superhuman." "Noncharacter" is a case in point. And yes, it's properly spelled with U+1F4A9, but my spellchecker has "parental controls" and I can't enter it. [various perceptive comments elided] > At this point, I'm not sure that adds up to an argument for Nick's > less-str-like version of his original proposal, or against it, but > I'm pretty sure it's a good argument for one or other... That's exactly the way I feel. So I would say "damn the torpedos" and "Just Do It" and if it's wrong we'll fix it in the mythical-never-to- be-implemented-and-so-unmentionable-that-Big-Brother-will-undoubtedly- come-take-me-away Python 4000. Of course we should wait to see if Guido or other reliable oracle has a particular opinion, but I really don't think we're going to get proof without trying. 
From ncoghlan at gmail.com Thu Sep 18 13:51:35 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 18 Sep 2014 21:51:35 +1000 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <878ulh18er.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> <54197523.6080700@trueblade.com> <87oaud1rtn.fsf@uwakimon.sk.tsukuba.ac.jp> <4353FDEB-373F-4592-8016-CA8500D8DC58@yahoo.com> <878ulh18er.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 18 Sep 2014 21:22, "Stephen J. Turnbull" wrote: > > Of course we should wait to see if Guido or other reliable oracle has > a particular opinion, but I really don't think we're going to get > proof without trying. 3.5 is still a year or so away, so we have time to ponder the details. I do think it's a good direction to be considering, though. Note also that this is something that could (and probably should) be experimented with on PyPI via a wrapper class that iterated over a wrapped buffer exporter in __format__. Cheers, Nick. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From graffatcolmingov at gmail.com Thu Sep 18 15:38:58 2014 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Thu, 18 Sep 2014 08:38:58 -0500 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: On Sep 18, 2014 2:31 AM, "Petr Viktorin" wrote: > > For the record, this gives inf in Numpy. > > >>> import numpy > >>> numpy.array(float('inf')) // 1 > inf > > AFAIK this and http://bugs.python.org/issue22198 are the only > differences from Python floats, at least on my machine. That's an interesting bug report and it's significantly different (mathematically speaking) from the discussion here. That aside, I have to wonder if numpy has its own way of representing infinity and how that behaves. I still maintain that it's least surprising for float('inf') // 1 to be NaN. You're trying to satisfy float('inf') = mod + 1 * y and in this case mod and y are both indeterminate (because this is basically a nonsensical equation). -------------- next part -------------- An HTML attachment was scrubbed... URL: From encukou at gmail.com Thu Sep 18 16:09:26 2014 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 18 Sep 2014 16:09:26 +0200 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: On Thu, Sep 18, 2014 at 3:38 PM, Ian Cordasco wrote: > > On Sep 18, 2014 2:31 AM, "Petr Viktorin" wrote: >> >> For the record, this gives inf in Numpy. >> >> >>> import numpy >> >>> numpy.array(float('inf')) // 1 >> inf >> >> AFAIK this and http://bugs.python.org/issue22198 are the only >> differences from Python floats, at least on my machine. > > That's an interesting bug report and it's significantly different > (mathematically speaking) from the discussion here. That aside, I have to > wonder if numpy has its own way of representing infinity and how that > behaves. 
I still maintain that it's least surprising for float('inf') // 1 > to be NaN. You're trying to satisfy float('inf') = mod + 1 * y and in this > case mod and y are both indeterminate (because this is basically a > nonsensical equation). Well, in `x = y // a`, as y tends towards infinity, x will also tend towards infinity, though in discrete steps. Yes, you get an indeterminate value, but one that's larger than any real number. Are any Numpy developers around? Is there a reason it has different behavior from Python? From tjreedy at udel.edu Thu Sep 18 16:14:18 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 18 Sep 2014 10:14:18 -0400 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` In-Reply-To: References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> <20140918041349.GH9293@ando.pearwood.info> Message-ID: On 9/18/2014 2:15 AM, Nick Coghlan wrote: > On 18 September 2014 14:13, Steven D'Aprano wrote: >> On Wed, Sep 17, 2014 at 04:02:44PM -0700, Ram Rachum wrote: >>> I suggest introducing a `start=1` argument to `math.factorial`, so the >>> result would be (the C-optimized version of) `product(range(start, x+1), >>> start=1)`. This'll be useful for combinatorical calculations. >> >> Then it wouldn't be the factorial function any more. >> >> There are lots of functions which could be useful for combinatorical >> calculations, including !n and n!!, do you think this particular one >> would be of broad enough interest that it deserves to be in the standard >> library? >> >> Do you know of any other programming languages which offer this "partial >> factorial" function in their standard library? > > It's also worth noting that "pip install mpmath" will provide rising > and falling factorials > (http://mpmath.org/doc/current/functions/gamma.html#rising-and-falling-factorials) > and a whole lot more. There's no need to add such complexity to the > standard library. 
> > However, now that CPython ships with pip by default, we may want to > consider providing more explicit pointers to such "If you want more > advanced functionality than the standard library provides" libraries. Having used pip install a few times, I have begun to regard pip-installable packages as almost being extensions of the stdlib. I think the main remaining problem is equally easy access to documentation as to the code. It would be nice, for instance, if /Doc, like /Lib, had a site-packages subdirectory with an index.html updated by pip with a link to either a package.html put in the directory or an external file, such as one at readthedocs.org. If there were something like this, I would add an item to Idle's help menu. > Yes, that may be contentious in the near term as folks argue over > which "stdlib++" modules to recommend, but in some cases there are > clear "next step beyond the standard library" category winners that > are worth introducing to newcomers, rather than making them do their > own research. Choosing 1 *or more* packages to list should not be more contentions than choosing just 1 package to add to the stdlib. -- Terry Jan Reedy From steve at pearwood.info Thu Sep 18 16:45:50 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 19 Sep 2014 00:45:50 +1000 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` In-Reply-To: References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> <20140918041349.GH9293@ando.pearwood.info> Message-ID: <20140918144549.GK9293@ando.pearwood.info> On Thu, Sep 18, 2014 at 10:14:18AM -0400, Terry Reedy wrote: > On 9/18/2014 2:15 AM, Nick Coghlan wrote: > >However, now that CPython ships with pip by default, we may want to > >consider providing more explicit pointers to such "If you want more > >advanced functionality than the standard library provides" libraries. 
> > Having used pip install a few times, I have begun to regard > pip-installable packages as almost being extensions of the stdlib. Sounds great, but let's not get carried away. Remember that many people, for reasons of company policy, cannot easily, or at all, install unapproved software. Whether for good or bad reasons, they're still stuck with what is in the std lib and nothing else. -- Steven From graffatcolmingov at gmail.com Thu Sep 18 17:20:19 2014 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Thu, 18 Sep 2014 10:20:19 -0500 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: On Thu, Sep 18, 2014 at 9:09 AM, Petr Viktorin wrote: > On Thu, Sep 18, 2014 at 3:38 PM, Ian Cordasco > wrote: >> >> On Sep 18, 2014 2:31 AM, "Petr Viktorin" wrote: >>> >>> For the record, this gives inf in Numpy. >>> >>> >>> import numpy >>> >>> numpy.array(float('inf')) // 1 >>> inf >>> >>> AFAIK this and http://bugs.python.org/issue22198 are the only >>> differences from Python floats, at least on my machine. >> >> That's an interesting bug report and it's significantly different >> (mathematically speaking) from the discussion here. That aside, I have to >> wonder if numpy has its own way of representing infinity and how that >> behaves. I still maintain that it's least surprising for float('inf') // 1 >> to be NaN. You're trying to satisfy float('inf') = mod + 1 * y and in this >> case mod and y are both indeterminate (because this is basically a >> nonsensical equation). > > Well, in `x = y // a`, as y tends towards infinity, x will also tend > towards infinity, though in discrete steps. Yes, you get an > indeterminate value, but one that's larger than any real number. Sorry? If you've studied mathematics you'd know there's no discrete value that is the same as infinity. I'm not even sure how anyone could begin define floor(infinity). 
Infinity is not present in any discrete set. Yes, float('inf') / 1 should be float('inf'); no one is arguing that. That's easily shown through limits. floor(float('inf') / 1) has no intrinsic meaning. Discrete sets such as the naturals, integers, and rationals are all "countably infinite" but there's no bijective mapping between them and the real numbers (and therefore, no such mapping to the complex numbers) because there are uncountably many real numbers. I understand that intuitively float('inf') // 1 being equal to infinity is nice, but it is inherently undefined. We don't really have the concept of undefined so NaN seems most acceptable. > Are any Numpy developers around? Is there a reason it has different > behavior from Python? I expect because of np.array semantics it is different. I'm not sure if it's intentional or if it's a bug, but I'm curious as well. From encukou at gmail.com Thu Sep 18 17:23:39 2014 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 18 Sep 2014 17:23:39 +0200 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` In-Reply-To: <20140918144549.GK9293@ando.pearwood.info> References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> <20140918041349.GH9293@ando.pearwood.info> <20140918144549.GK9293@ando.pearwood.info> Message-ID: On Thu, Sep 18, 2014 at 4:45 PM, Steven D'Aprano wrote: > On Thu, Sep 18, 2014 at 10:14:18AM -0400, Terry Reedy wrote: >> On 9/18/2014 2:15 AM, Nick Coghlan wrote: > >> >However, now that CPython ships with pip by default, we may want to >> >consider providing more explicit pointers to such "If you want more >> >advanced functionality than the standard library provides" libraries. >> >> Having used pip install a few times, I have begun to regard >> pip-installable packages as almost being extensions of the stdlib. > > Sounds great, but let's not get carried away. Remember that many people, > for reasons of company policy, cannot easily, or at all, install > unapproved software.
Whether for good or bad reasons, they're still > stuck with what is in the std lib and nothing else. Not just company policy -- it can be licencing issues. Or just general trust/paranoia -- installing packages from PyPI just because they look useful is not the most secure thing to do. Another reason is sustainability -- I trust Python won't go unmaintained in a few years, and the few necessary breaking API changes will be well thought out and properly announced. For a PyPI project, there are no expectations. Even if it is well run (which would presumably be a requirement to land in a "stdlib++" list), you need to gauge an extra project's health, and keep up with an extra release note stream. I believe that's what Nick meant by "[doing] research". Listing "stdlib++" projects would mean vouching for them, even if only implicitly. Indeed, let's not get too carried away. From p.f.moore at gmail.com Thu Sep 18 17:37:23 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 18 Sep 2014 16:37:23 +0100 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` In-Reply-To: References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> <20140918041349.GH9293@ando.pearwood.info> <20140918144549.GK9293@ando.pearwood.info> Message-ID: On 18 September 2014 16:23, Petr Viktorin wrote: > Listing "stdlib++" projects would mean vouching for them, even if only > implicitly. Indeed, let's not get too carried away. Nevertheless, there is community knowledge "out there" on what constitute best of breed packages. For example "everyone knows" that requests is the thing to use if you want to issue web requests. And equally, requests is "clearly" well-maintained and something you can rely on. Collecting that knowledge together somewhere so that people for whom the above is *not* self-evident could easily find it, would be a worthwhile exercise. Paul. 
From python at mrabarnett.plus.com Thu Sep 18 17:40:41 2014 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 18 Sep 2014 16:40:41 +0100 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: <541AFCF9.9030407@mrabarnett.plus.com> On 2014-09-18 16:20, Ian Cordasco wrote: > On Thu, Sep 18, 2014 at 9:09 AM, Petr Viktorin wrote: >> On Thu, Sep 18, 2014 at 3:38 PM, Ian Cordasco >> wrote: >>> >>> On Sep 18, 2014 2:31 AM, "Petr Viktorin" wrote: >>>> >>>> For the record, this gives inf in Numpy. >>>> >>>> >>> import numpy >>>> >>> numpy.array(float('inf')) // 1 >>>> inf >>>> >>>> AFAIK this and http://bugs.python.org/issue22198 are the only >>>> differences from Python floats, at least on my machine. >>> >>> That's an interesting bug report and it's significantly different >>> (mathematically speaking) from the discussion here. That aside, I have to >>> wonder if numpy has its own way of representing infinity and how that >>> behaves. I still maintain that it's least surprising for float('inf') // 1 >>> to be NaN. You're trying to satisfy float('inf') = mod + 1 * y and in this >>> case mod and y are both indeterminate (because this is basically a >>> nonsensical equation). >> >> Well, in `x = y // a`, as y tends towards infinity, x will also tend >> towards infinity, though in discrete steps. Yes, you get an >> indeterminate value, but one that's larger than any real number. > > Sorry? If you've studied mathematics you'd know there's no discrete > value that is the same as infinity. [snip] He didn't say that infinity was a discrete value, he said that x will tend towards infinity in discrete steps. 
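For anyone following the thread without an interpreter handy, the behaviour under discussion (plain CPython 3.x floats) looks like this:

```python
import math

inf = float('inf')
print(inf / 1)         # inf: true division is well-defined
print(inf // 1)        # nan: the floor-division result being debated
print(inf % 1)         # nan: the matching modulo
print(divmod(inf, 1))  # (nan, nan): keeps div*y + mod == x formally consistent
print(math.isnan(inf // 1))  # True
```

(NumPy's `inf // 1 == inf`, quoted earlier in the thread, is the divergent behaviour being compared against.)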
From alexander.belopolsky at gmail.com Thu Sep 18 17:48:43 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 18 Sep 2014 11:48:43 -0400 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: On Thu, Sep 18, 2014 at 11:20 AM, Ian Cordasco wrote: > I understand that intuitively float('inf') // 1 being equal to > infinity is nice, but it is inherently undefined. We don't really have > the concept of undefined so NaN seems most acceptable. > The most acceptable would be to have x // 1 do the same as math.floor(x) for any float x. Note that math.floor() behavior changed from 2.x to 3.x: Python 2.7.7: >>> math.floor(float('inf')) inf Python 3.4.1: >>> math.floor(float('inf')) Traceback (most recent call last): File "", line 1, in OverflowError: cannot convert float infinity to integer It looks like float // float case was overlooked when it was decided that math.floor() should return int. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Sep 18 17:55:53 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 18 Sep 2014 17:55:53 +0200 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> <20140918041349.GH9293@ando.pearwood.info> <20140918144549.GK9293@ando.pearwood.info> Message-ID: <20140918175553.26040a79@fsol> On Thu, 18 Sep 2014 16:37:23 +0100 Paul Moore wrote: > On 18 September 2014 16:23, Petr Viktorin wrote: > > Listing "stdlib++" projects would mean vouching for them, even if only > > implicitly. Indeed, let's not get too carried away. > > Nevertheless, there is community knowledge "out there" on what > constitute best of breed packages. For example "everyone knows" that > requests is the thing to use if you want to issue web requests. Is it? 
That sounds like a caricatural statement. If I'm using Tornado, Twisted or asyncio, then requests is certainly not "the thing to use" to issue Web requests. And there are many cases where urlopen() is good enough, as well. Not to mention other contenders such as pycurl. > Collecting that knowledge together somewhere so that people for whom > the above is *not* self-evident could easily find it, would be a > worthwhile exercise. If it's community knowledge, then surely that job can be done by the community. I don't think Python's official documentation is the right place to reify that knowledge. Regards Antoine. From alexander.belopolsky at gmail.com Thu Sep 18 17:58:17 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 18 Sep 2014 11:58:17 -0400 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: On Thu, Sep 18, 2014 at 11:48 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > It looks like float // float case was overlooked when it was decided that > math.floor() should return int. Note that PEP 3141 specifies that Real.__floordiv__ should return an integer. http://legacy.python.org/dev/peps/pep-3141/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From encukou at gmail.com Thu Sep 18 17:59:02 2014 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 18 Sep 2014 17:59:02 +0200 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: On Thu, Sep 18, 2014 at 5:20 PM, Ian Cordasco wrote: > On Thu, Sep 18, 2014 at 9:09 AM, Petr Viktorin wrote: >> On Thu, Sep 18, 2014 at 3:38 PM, Ian Cordasco >> wrote: >>> >>> On Sep 18, 2014 2:31 AM, "Petr Viktorin" wrote: >>>> >>>> For the record, this gives inf in Numpy. 
>>>> >>>> >>> import numpy >>>> >>> numpy.array(float('inf')) // 1 >>>> inf >>>> >>>> AFAIK this and http://bugs.python.org/issue22198 are the only >>>> differences from Python floats, at least on my machine. >>> >>> That's an interesting bug report and it's significantly different >>> (mathematically speaking) from the discussion here. That aside, I have to >>> wonder if numpy has its own way of representing infinity and how that >>> behaves. I still maintain that it's least surprising for float('inf') // 1 >>> to be NaN. You're trying to satisfy float('inf') = mod + 1 * y and in this >>> case mod and y are both indeterminate (because this is basically a >>> nonsensical equation). >> >> Well, in `x = y // a`, as y tends towards infinity, x will also tend >> towards infinity, though in discrete steps. Yes, you get an >> indeterminate value, but one that's larger than any real number. > > Sorry? If you've studied mathematics you'd know there's no discrete > value that is the same as infinity. I'm not even sure how anyone could > begin to define floor(infinity). Infinity is not present in any discrete > set. Yes float('inf') / 1 should be float('inf'), no one is arguing > that. That's easily shown through limits. floor(float('inf') / 1) has > no intrinsic meaning. Discrete sets such as the naturals, integers, > and rationals are all "countably infinite" but there's no bijective > mapping between them and the real numbers (and therefore, no such > mapping to the complex numbers) because there are uncountably infinite > real numbers. There is even no *real* number that is the same as infinity. The result of // on floats is a float; discrete sets don't come into play. Practically, floor(f) is f minus a value between 0 and 1. If you have (x1 = y1 // a) and (x2 = y2 // a) and (y1 < y2 - a), then (x1 < x2); as y grows without bound, x will also grow without bound. Correct me if I'm wrong, I did study math but it is rusty.
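The monotonicity point is easy to check numerically in plain CPython (a quick sketch; `a = 3.0` is just an arbitrary finite divisor):

```python
import math

a = 3.0  # arbitrary finite divisor
ys = [1e10, 1e100, 1e200, 1e300]
xs = [y // a for y in ys]

# x = y // a grows without bound as y does ...
assert all(x1 < x2 for x1, x2 in zip(xs, xs[1:]))

# ... yet at the limit CPython currently answers nan rather than inf:
assert math.isnan(float('inf') // a)
```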
> I understand that intuitively float('inf') // 1 being equal to > infinity is nice, but it is inherently undefined. We don't really have > the concept of undefined so NaN seems most acceptable. I disagree with is the conclusion that NaN is the most appropriate. But the discussion is starting to go in circles here, and I actually don't much care for the actual result. >> Are any Numpy developers around? Is there a reason it has different >> behavior from Python? > > I expect because of np.array semantics it is different. I'm not sure > if it's intentional or if it's a bug, but I'm curious as well. What I do care about is that Python and Numpy should give the same result. It would be nice to see this changed in either Python or Numpy, whatever the "correct" result is. From abarnert at yahoo.com Thu Sep 18 18:10:40 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 18 Sep 2014 09:10:40 -0700 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` In-Reply-To: References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> <20140918041349.GH9293@ando.pearwood.info> <20140918144549.GK9293@ando.pearwood.info> Message-ID: <1114D326-304E-46C8-B04A-48F13F9E0D38@yahoo.com> On Sep 18, 2014, at 8:37, Paul Moore wrote: > On 18 September 2014 16:23, Petr Viktorin wrote: >> Listing "stdlib++" projects would mean vouching for them, even if only >> implicitly. Indeed, let's not get too carried away. > > Nevertheless, there is community knowledge "out there" on what > constitute best of breed packages. For example "everyone knows" that > requests is the thing to use if you want to issue web requests. Except in those cases that requests actually makes harder, like trying to send files over SOAP+MIME. Or when you can most easily explain what you want in terms of libcurl code or a curl command line. But as Terry pointed out earlier in the thread, one advantage of the "stdlib++" idea is that you don't have to pick one, you can pick one or more. 
If the urllib.request docs said that requests makes easy cases, and even many pretty complex ones, easy; urllib3 provides as much flexibility as possible in a stdlib-like interface for those rare cases that requests can't make easy; pycurl makes it easier to translate web requests from C programs or shell scripts; etc., then there's no problem. > And > equally, requests is "clearly" well-maintained and something you can > rely on. If Kenneth got hit by a bus, requests would be in more trouble than something in the stdlib would if Guido, or the module's maintainer, did. The risk isn't _high_--it's certainly never deterred me from using it, or even convincing managers in corporate settings that we should use it--but that doesn't mean it's not _higher than the stdlib_. > Collecting that knowledge together somewhere so that people for whom > the above is *not* self-evident could easily find it, would be a > worthwhile exercise. Agreed. From steve at pearwood.info Thu Sep 18 18:17:36 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 19 Sep 2014 02:17:36 +1000 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: <20140918161735.GO9293@ando.pearwood.info> On Thu, Sep 18, 2014 at 05:59:02PM +0200, Petr Viktorin wrote: [...] > > Sorry? If you've studied mathematics you'd know there's no discrete > > value that is the same as infinity. I'm not even sure how anyone could > > begin define floor(infinity). Infinity is not present in any discrete > > set. Yes float('inf') / 1 should be float('inf'), no one is arguing > > that. That's easily shown through limits. floor(float('inf') / 1) has > > no intrinsic meaning. 
Discrete sets such as the naturals, integers, > > and rationals are all "countably infinite" but there's no bijective > > mapping between them and the real numbers (and therefore, no such > > mapping to the complex numbers) because there are uncountably infinite > > real numbers. > > There is even no *real* number that is the same as infinity. Correct. Once we start talking about a value representing infinity, we're being impure and mathematically dubious, regardless of whether we have ints or floats. But more than that, an IEEE-754 infinity doesn't just represent mathematical infinity, but also finite numbers which overflowed. Hence: py> 1.7976931348623157e+308 * 1.1 inf I don't think it is clear what inf//1 should return, which suggests to me that returning a NAN is perhaps less wrong than returning inf. (If we really don't know how to interpret inf//1, then a NAN, or an exception, is the right answer.) [...] > What I do care about is that Python and Numpy should give the same > result. It would be nice to see this changed in either Python or > Numpy, whatever the "correct" result is. That is reasonable. -- Steven From abarnert at yahoo.com Thu Sep 18 19:01:01 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 18 Sep 2014 10:01:01 -0700 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` In-Reply-To: References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> <20140918041349.GH9293@ando.pearwood.info> Message-ID: <444A4235-3211-48A1-B32E-8DF5817F6EF5@yahoo.com> On Sep 17, 2014, at 23:15, Nick Coghlan wrote: > However, now that CPython ships with pip by default, we may want to > consider providing more explicit pointers to such "If you want more > advanced functionality than the standard library provides" libraries. I love this idea, but there's one big potential problem, and one smaller one. Many of the most popular and useful packages require C extensions.
In itself, that doesn't have to be a problem; if you provide wheels for the official 3.4+ Win32, Win64, and Mac64 CPython builds, it can still be as simple as `pip install spam` for most users, including the ones with the least ability to figure it out for themselves. But what about packages that require third-party C libraries? Does lxml have to have wheels that statically link libxml2, or that download the DLLs at install time for Windows users, or some other such solution before it can be recommended? Many of the most popular packages fall into similar situations, but lxml may be the most obvious because many of its users don't think of it as a wrapper around libxml2, they just think of it as a better ElementTree (or even a thing that magically makes BeautifulSoup work better). Also, is it acceptable to recommend packages whose C extension modules don't work, or don't work well, with PyPy? > Yes, that may be contentious in the near term as folks argue over > which "stdlib++" modules to recommend, but in some cases there are > clear "next step beyond the standard library" category winners that > are worth introducing to newcomers, rather than making them do their > own research. There are plenty of clear winners that are worth introducing to newcomers, but aren't the next step beyond a particular module. In fact, I think that's the case for _most_ of them. pytz, dateutil, requests, urllib3, pycurl, and maybe more-itertools or blist and a couple of math libs, the main things people are going to want to find (not counting frameworks like Django or Scrapy or PySide) are things like NumPy, Pandas, Pillow, PyYAML, lxml, BeautifulSoup, PyWin32, PyObjC, pyparsing, paramiko, ? and where do any of those go? Does this mean we have to add pages in the docs for things the stdlib doesn't do, just to provide external references? Or turn the chapter-header blurbs into real pages? Or reorganize the docs more dramatically? 
Or just leave out some of the most prominent and useful libraries on PyPI just because they don't fit anywhere, while mentioning others? From alexander.belopolsky at gmail.com Thu Sep 18 19:04:50 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 18 Sep 2014 13:04:50 -0400 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: <20140918161735.GO9293@ando.pearwood.info> References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> <20140918161735.GO9293@ando.pearwood.info> Message-ID: On Thu, Sep 18, 2014 at 12:17 PM, Steven D'Aprano wrote: > > What I do care about is that Python and Numpy should give the same > > result. It would be nice to see this changed in either Python or > > Numpy, whatever the "correct" result is. > > That is reasonable. Consistency with NumPy cannot always be achieved. Note that numpy.floor returns floats while math.floor returns ints. (NumPy cannot follow stdlib here because it does not have arbitrary precision integers.) From encukou at gmail.com Thu Sep 18 19:32:25 2014 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 18 Sep 2014 19:32:25 +0200 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> <20140918161735.GO9293@ando.pearwood.info> Message-ID: On Thu, Sep 18, 2014 at 7:04 PM, Alexander Belopolsky wrote: > > On Thu, Sep 18, 2014 at 12:17 PM, Steven D'Aprano > wrote: >> >> > What I do care about is that Python and Numpy should give the same >> > result. It would be nice to see this changed in either Python or >> > Numpy, whatever the "correct" result is. >> >> That is reasonable. > > Consistency with NumPy cannot always be achieved. Note that numpy.floor > returns floats while math.floor returns ints. (NumPy cannot follow stdlib > here because it does not have arbitrary precision integers.)
Right. That particular difference has a reason behind it, however. There is no reason to differ here, except perhaps backwards compatibility. And if backwards compatibility is the reason for Python not changing, it should be documented (as an unfortunate mistake) so other implementations do the same thing. And it should have a test (which I'm happy to write once a decision is reached -- wanting to write a test for floor floatdiv is largely why I'm here debating this). From graffatcolmingov at gmail.com Thu Sep 18 20:10:58 2014 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Thu, 18 Sep 2014 13:10:58 -0500 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` In-Reply-To: <1114D326-304E-46C8-B04A-48F13F9E0D38@yahoo.com> References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> <20140918041349.GH9293@ando.pearwood.info> <20140918144549.GK9293@ando.pearwood.info> <1114D326-304E-46C8-B04A-48F13F9E0D38@yahoo.com> Message-ID: On Thu, Sep 18, 2014 at 11:10 AM, Andrew Barnert wrote: > On Sep 18, 2014, at 8:37, Paul Moore wrote: > >> On 18 September 2014 16:23, Petr Viktorin wrote: >>> Listing "stdlib++" projects would mean vouching for them, even if only >>> implicitly. Indeed, let's not get too carried away. >> >> Nevertheless, there is community knowledge "out there" on what >> constitute best of breed packages. For example "everyone knows" that >> requests is the thing to use if you want to issue web requests. > > Except in those cases that requests actually makes harder, like trying to send files over SOAP+MIME. Or when you can most easily explain what you want in terms of libcurl code or a curl command line. > > But as Terry pointed out earlier in the thread, one advantage of the "stdlib++" idea is that you don't have to pick one, you can pick one or more. 
If the urllib.request docs said that requests makes easy cases, and even many pretty complex ones, easy; urllib3 provides as much flexibility as possible in a stdlib-like interface for those rare cases that requests can't make easy; pycurl makes it easier to translate web requests from C programs or shell scripts; etc., then there's no problem. > >> And >> equally, requests is "clearly" well-maintained and something you can >> rely on. > > If Kenneth got hit by a bus, requests would be in more trouble than something in the stdlib would if Guido, or the module's maintainer, did. The risk isn't _high_--it's certainly never deterred me from using it, or even convincing managers in corporate settings that we should use it--but that doesn't mean it's not _higher than the stdlib_. Seeing as Kenneth has two core-developers working on it, one of whom works on it as part of their contributions to OpenStack, I think your estimation of the bus number is too low. Granted either Cory or I need the ability to push to PyPI, but I think Richard and Donald know Cory and I well enough to trust us to maintain the package as we currently do. From casevh at gmail.com Thu Sep 18 19:27:52 2014 From: casevh at gmail.com (Case Van Horsen) Date: Thu, 18 Sep 2014 10:27:52 -0700 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: On Thu, Sep 18, 2014 at 8:20 AM, Ian Cordasco wrote: > On Thu, Sep 18, 2014 at 9:09 AM, Petr Viktorin wrote: >> On Thu, Sep 18, 2014 at 3:38 PM, Ian Cordasco >> wrote: >>> >>> On Sep 18, 2014 2:31 AM, "Petr Viktorin" wrote: >>>> >>>> For the record, this gives inf in Numpy. >>>> >>>> >>> import numpy >>>> >>> numpy.array(float('inf')) // 1 >>>> inf >>>> >>>> AFAIK this and http://bugs.python.org/issue22198 are the only >>>> differences from Python floats, at least on my machine. 
>>> >>> That's an interesting bug report and it's significantly different >>> (mathematically speaking) from the discussion here. That aside, I have to >>> wonder if numpy has its own way of representing infinity and how that >>> behaves. I still maintain that it's least surprising for float('inf') // 1 >>> to be NaN. You're trying to satisfy float('inf') = mod + 1 * y and in this >>> case mod and y are both indeterminate (because this is basically a >>> nonsensical equation). >> >> Well, in `x = y // a`, as y tends towards infinity, x will also tend >> towards infinity, though in discrete steps. Yes, you get an >> indeterminate value, but one that's larger than any real number. > > Sorry? If you've studied mathematics you'd know there's no discrete > value that is the same as infinity. I'm not even sure how anyone could > begin define floor(infinity). Infinity is not present in any discrete > set. Yes float('inf') / 1 should be float('inf'), no one is arguing > that. That's easily shown through limits. floor(float('inf') / 1) has > no intrinsic meaning. Discrete sets such as the naturals, integers, > and rationals are all "countably infinite" but there's no bijective > mapping between them and the real numbers (and therefore, no such > mapping to the complex numbers) because the are uncountably infinite > real numbers. > > I understand that intuitively float('inf') // 1 being equal to > infinity is nice, but it is inherently undefined. We don't really have > the concept of undefined so NaN seems most acceptable. > >> Are any Numpy developers around? Is there a reason it has different >> behavior from Python? > > I expect because of np.array semantics it is different. I'm not sure > if it's intentional or if it's a bug, but I'm curious as well. The ISO C99/C11 (Annex F) (and POSIX and IEEE-754) standards define a function called "roundToIntegralTowardsNegative". For the special value +Inf, it specifies a return value of +Inf. 
However, the integral value returned is still an IEEE-754 formatted value and can return +Inf. PEP-3141 changed the behavior of math.floor() (and __floor__ in general) to return an actual integer. That makes it impossible to comply with ISO etc. standards. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From p.f.moore at gmail.com Thu Sep 18 20:26:56 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 18 Sep 2014 19:26:56 +0100 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) Message-ID: On 18 September 2014 18:01, Andrew Barnert wrote: > On Sep 17, 2014, at 23:15, Nick Coghlan wrote: > >> However, now that CPython ships with pip by default, we may want to >> consider providing more explicit pointers to such "If you want more >> advanced functionality than the standard library provides" libraries. > > I love this idea, but there's one big potential problem, and one smaller one. > > Many of the most popular and useful packages require C extensions. In itself, > that doesn't have to be a problem; if you provide wheels for the official 3.4+ Win32, > Win64, and Mac64 CPython builds, it can still be as simple as `pip install spam` > for most users, including the ones with the least ability to figure it out for themselves. OK, the key thing to look at here is the user experience for someone who has Python installed, and has a job to do, but needs to branch out into external packages because the stdlib doesn't provide enough functionality. To make this example concrete, I'll focus on a specific use case, which I believe is relatively common, although I can't back this up with hard data. 
Assume: * A user who is comfortable with Python, or with scripting languages in general * No licensing or connectivity issues to worry about * An existing manual process that the user wants to automate In my line of work, this constitutes the vast bulk of Python use - informal, simple automation scripts. So I'm writing this script, and I discover I need to do something that the stdlib doesn't cover, but I feel like it should be available "out there", and it's sufficiently fiddly that I'd prefer not to write it myself. Examples I've come across in the past: * A console progress bar * Scraping some data off a web page * Writing data into an Excel spreadsheet with formatting * Querying an Oracle database Every time an issue like this comes up, I know that I'm looking to do "pip install XXX". It's working out what XXX is that's the problem. So I go and ask Google. A quick check on the progress bar case gets me to a StackOverflow article that offers me a lot of "write it yourself" solutions, and pointers to a couple of libraries. Further down there are a few pointers to python-progressbar, which was mentioned in the StackOverflow article, which in turn leads me to the PyPI page for it. The latest version (2.3-dev) is not hosted on PyPI, so I hit all the fun of --allow-external. Ironically, "pip install tqdm" gives me what I want instantly. But it never came up via Google. The rest of the cases are similar, lots of Google searching, often combined with evaluating multiple options, followed by more or less pain installing the software. Things that aren't Python 3 or Windows compatible suck me into the "shall I patch it and submit a PR" minefield. For the last case (an Oracle driver), where I need a C extension and access to external libraries, ironically it's pretty easy. 
There's no real competition to cx_Oracle, and the PyPI page has what I need, although they ship wininst exes rather than wheels, which means I need to do a download then a wheel convert then a pip install, so it's not ideal, but doable. From this example, I'd like to see the following improvements to the process: 1. Somewhere I can go to find useful modules, that's better than Google. 2. Someone else choosing the "best option" - I don't want to evaluate 3 different progressbar modules, I just want to write "57% complete" and a few dots! 3. C extensions aren't a huge problem to me on Windows, although I'm looking forward to the day when everyone distributes wheels (wheel convert is good enough for now though). [1] 4. Much more community pressure for projects to host their code on PyPI. Some projects have genuine issues with hosting on PyPI, and there are changes being looked at to support them, but for most projects it seems to just be history and inertia. [1] A Linux/OS X user might have more issues with C extensions. Maybe this can't be solved in any meaningful sense, and maybe it's not something the "Python project" should take responsibility for, but without any doubt, it's the single most significant improvement that could be made to my experience with PyPI. Paul. PS I should also note that even in its current state, PyPI is streets ahead of the 3rd party module story I've experienced for any other language - C/C++, Lua, Powershell, and Java are all far worse. Perl/CPAN may be as good or better, it's so long since I used Perl that I don't really know these days.
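For what it's worth, the progress-bar case shows why the discovery problem stings: the "57% complete and a few dots" version really is only a few lines to hand-roll (a minimal sketch, not any particular PyPI package):

```python
import sys

def render_bar(fraction, width=20):
    """Return a one-line text bar, e.g. '[###########.........] 57%'."""
    filled = int(fraction * width)
    percent = int(fraction * 100 + 0.5)  # round to nearest whole percent
    return '[%s] %d%%' % ('#' * filled + '.' * (width - filled), percent)

for done in (0.0, 0.25, 0.57, 1.0):
    # '\r' rewinds to the start of the line so the bar redraws in place
    sys.stdout.write('\r' + render_bar(done))
    sys.stdout.flush()
sys.stdout.write('\n')
```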
From g.brandl at gmx.net Thu Sep 18 20:53:37 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 18 Sep 2014 20:53:37 +0200 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: References: Message-ID: On 09/18/2014 08:26 PM, Paul Moore wrote: > Every time an issue like this comes up, I know that I'm looking to do > "pip install XXX". It's working out what XXX is that's the problem. > > So I go and ask Google. A quick check on the progress bar case gets me > to a StackOverflow article that offers me a lot of "write it yourself" > solutions, and pointers to a couple of libraries. Further down there > are a few pointers to python-progressbar, which was mentioned in the > StackOverflow article, which in turn leads me to the PyPI page for it. > The latest version (2.3-dev) is not hosted on PyPI, so I hit all the > fun of --allow-external. I'd recommend searching PyPI itself. This: https://pypi.python.org/pypi?%3Aaction=search&term=progress+bar gives about 20 top results that look highly relevant. A Google search with "site:pypi.python.org progress bar" also looks like it could have given you what you wanted. > Ironically, "pip install tqdm" gives me what I want instantly. But it > never came up via Google. 
It also doesn't come up in the PyPI search, because its PyPI page isn't exactly very full of information :) cheers, Georg From g.brandl at gmx.net Thu Sep 18 21:01:31 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 18 Sep 2014 21:01:31 +0200 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` In-Reply-To: <444A4235-3211-48A1-B32E-8DF5817F6EF5@yahoo.com> References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> <20140918041349.GH9293@ando.pearwood.info> <444A4235-3211-48A1-B32E-8DF5817F6EF5@yahoo.com> Message-ID: On 09/18/2014 07:01 PM, Andrew Barnert wrote: >> Yes, that may be contentious in the near term as folks argue over which >> "stdlib++" modules to recommend, but in some cases there are clear "next >> step beyond the standard library" category winners that are worth >> introducing to newcomers, rather than making them do their own research. > > There are plenty of clear winners that are worth introducing to newcomers, > but aren't the next step beyond a particular module. In fact, I think that's > the case for _most_ of them. pytz, dateutil, requests, urllib3, pycurl, and > maybe more-itertools or blist and a couple of math libs, the main things > people are going to want to find (not counting frameworks like Django or > Scrapy or PySide) are things like NumPy, Pandas, Pillow, PyYAML, lxml, > BeautifulSoup, PyWin32, PyObjC, pyparsing, paramiko, ? and where do any of > those go? > > Does this mean we have to add pages in the docs for things the stdlib doesn't > do, just to provide external references? Or turn the chapter-header blurbs > into real pages? Or reorganize the docs more dramatically? Or just leave out > some of the most prominent and useful libraries on PyPI just because they > don't fit anywhere, while mentioning others? 
I don't think the docs should generically recommend external packages, except for cases like "if you need this functionality that exists only in 3.5 and higher, use backports.foobar from PyPI". Sure, you basically can't go wrong with recommending the "big players" like Numpy, but usually they are well known anyway. Any smaller package could quickly become obsolete, and we're not exactly quick with updating outdated docs (that do not deal with a specific API item) anyway -- see e.g. the HOWTO documents. I think that a list of "stdlib ++" should be maintained by the greater community, after all, it is about stuff *not* prepared by the CPython team. It may be that a better categorization of PyPI is all we need (i.e. replace the Trove classifiers with something more prominent and more straightforward). cheers, Georg From alexander.belopolsky at gmail.com Thu Sep 18 21:31:05 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 18 Sep 2014 15:31:05 -0400 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: On Thu, Sep 18, 2014 at 1:27 PM, Case Van Horsen wrote: > The ISO C99/C11 (Annex F) (and POSIX and IEEE-754) standards define > a function called "roundToIntegralTowardsNegative". For the special value > +Inf, it specifies a return value of +Inf. > > However, the integral value returned is still an IEEE-754 formatted value > and can return +Inf. PEP-3141 changed the behavior of math.floor() (and > __floor__ in general) to return an actual integer. That makes it impossible > to comply with ISO etc. standards. > I don't think there is any dispute over what math.floor(inf) should return. POSIX, C99, IEEE and probably many other standards agree that inf should be returned as long as the resulting type can represent it.
Note that before math.floor() was changed to return int in 3.x, we had >>> math.floor(float('inf')) inf (Python 2.7.8) The question here is not about floor, but about floor_division. NumPy implements it as simple-minded floor(x/y), while Python attempts to be more precise by doing floor((x - fmod(x, y))/y). $ hg blame -d -u -w Objects/floatobject.c nascheme Thu Jan 04 01:44:34 2001 +0000: float_divmod(PyObject *v, PyObject *w) .. guido Sun Oct 20 20:16:45 1991 +0000: mod = fmod(vx, wx); tim Sat Sep 16 03:54:24 2000 +0000: /* fmod is typically exact, so vx-mod is *mathematically* an guido Thu May 06 14:26:34 1999 +0000: exact multiple of wx. But this is fp arithmetic, and fp guido Thu May 06 14:26:34 1999 +0000: vx - mod is an approximation; the result is that div may guido Thu May 06 14:26:34 1999 +0000: not be an exact integral value after the division, although guido Thu May 06 14:26:34 1999 +0000: it will always be very close to one. guido Thu May 06 14:26:34 1999 +0000: */ guido Sun Oct 20 20:16:45 1991 +0000: div = (vx - mod) / wx; .. Given that this logic dates back to 1991, I doubt not-finite case was seriously considered. In light of PEP 3141, if anything should be done about float // float it would be to make it return an int and as a consequence inf // 1 should raise an OverflowError. From guido at python.org Thu Sep 18 21:35:46 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 18 Sep 2014 12:35:46 -0700 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: On Sep 18, 2014 12:31 PM, "Alexander Belopolsky" < alexander.belopolsky at gmail.com> wrote: > > > On Thu, Sep 18, 2014 at 1:27 PM, Case Van Horsen wrote: >> >> The ISO C99/C11 (Annex F) (and POSIX and IEEE-754) standards define >> a function called "roundToIntegralTowardsNegative.
For the special value >> +Inf, it specifies a return value of +Inf. >> >> However, the integral value returned is still an IEEE-754 formatted value >> and can return +Inf. PEP-3141 changed the behavior of math.floor() (and >> __floor__ in general) to return an actual integer. That makes it impossible >> to comply with ISO etc. standards. > > > I don't think there is any dispute over what math.floor(inf) should return. POSIX, C99, IEEE and probably many other standards agree that inf should be returned as long as the resulting type can represent it. > > Note that before math.floor() was changed to return int in 3.x, we had > > >>> math.floor(float('inf')) > inf > > (Python 2.7.8) > > The question here is not about floor, but about floor_division. NumPy implements it as simple-minded floor(x/y), while Python attempts to be more precise by doing floor((x - fmod(x, y))/y). > > > $ hg blame -d -u -w Objects/floatobject.c > nascheme Thu Jan 04 01:44:34 2001 +0000: float_divmod(PyObject *v, PyObject *w) > .. > guido Sun Oct 20 20:16:45 1991 +0000: mod = fmod(vx, wx); > tim Sat Sep 16 03:54:24 2000 +0000: /* fmod is typically exact, so vx-mod is *mathematically* an > guido Thu May 06 14:26:34 1999 +0000: exact multiple of wx. But this is fp arithmetic, and fp > guido Thu May 06 14:26:34 1999 +0000: vx - mod is an approximation; the result is that div may > guido Thu May 06 14:26:34 1999 +0000: not be an exact integral value after the division, although > guido Thu May 06 14:26:34 1999 +0000: it will always be very close to one. > guido Thu May 06 14:26:34 1999 +0000: */ > guido Sun Oct 20 20:16:45 1991 +0000: div = (vx - mod) / wx; > .. > > Given that this logic dates back to 1991, I doubt not-finite case was seriously considered. Right. > In light of PEP 3141, if anything should be done about float // float it would be to make it return and int and as a consequence inf // 1 should raise an OverflowError. 
+1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Sep 18 23:27:50 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 19 Sep 2014 07:27:50 +1000 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: References: Message-ID: On 19 Sep 2014 04:54, "Georg Brandl" wrote: > > On 09/18/2014 08:26 PM, Paul Moore wrote: > > > Every time an issue like this comes up, I know that I'm looking to do > > "pip install XXX". It's working out what XXX is that's the problem. > > > > So I go and ask Google. A quick check on the progress bar case gets me > > to a StackOverflow article that offers me a lot of "write it yourself" > > solutions, and pointers to a couple of libraries. Further down there > > are a few pointers to python-progressbar, which was mentioned in the > > StackOverflow article, which in turn leads me to the PyPI page for it. > > The latest version (2.3-dev) is not hosted on PyPI, so I hit all the > > fun of --allow-external. Paul, this could make a good "What problem are we actually trying to fix?" summary on pypa.io. We have spent a lot of time so far on the "getting people the packages they ask for" side of things, but have barely scratched the surface of "helping people find the packages that can help them". At the moment "word of mouth" is one of our main discovery tools, and that's an issue for newcomers that may not have a big network of fellow developers yet. This is a problem I think the Django community actually addressed fairly well through https://www.djangopackages.com/ There are similar comparison sites for Pyramid & Plone, after Audrey & Danny broke out the back end of Django Packages to make it independently deployable (see http://opencomparison.readthedocs.org/en/latest/) > > I'd recommend searching PyPI itself. 
This: > > https://pypi.python.org/pypi?%3Aaction=search&term=progress+bar > > gives about 20 top results that look highly relevant. A Google search > with "site:pypi.python.org progress bar" also looks like it could have > given you what you wanted. > > > Ironically, "pip install tqdm" gives me what I want instantly. But it > > never came up via Google. It actually occurs to me: I wonder whether anyone at Google might be interested in enhancing its usefulness as a Python packaging search tool by looking specifically at PyPI's own search index data. (And if they could do that for us, they might be willing to do it for CPAN, RubyGems, npm, PEAR, etc) (on the other hand, being a vector for influencing Google search results would mean being a higher priority target for spam, so we may not actually want that) > > It also doesn't come up in the PyPI search, because its PyPI page isn't > exactly very full of information :) "SEO for Python Packages" could be a good advanced topic for packaging.python.org :) Cheers, Nick. > > cheers, > Georg > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Sep 18 23:55:16 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 18 Sep 2014 22:55:16 +0100 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: References: Message-ID: On 18 September 2014 22:27, Nick Coghlan wrote: >> It also doesn't come up in the PyPI search, because its PyPI page isn't >> exactly very full of information :) > > "SEO for Python Packages" could be a good advanced topic for > packaging.python.org :) At a bare minimum, some sort of guidance on what keywords to include in the metadata would be useful...
A quick sample of 4-5 packages got none with *any* keywords specified. Paul From donald at stufft.io Thu Sep 18 23:56:37 2014 From: donald at stufft.io (Donald Stufft) Date: Thu, 18 Sep 2014 17:56:37 -0400 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: References: Message-ID: <5D99BF1F-F0AA-4302-AE30-539AF31A2463@stufft.io> On Sep 18, 2014, at 2:26 PM, Paul Moore wrote: > > Maybe this can't be solved in any meaningful sense, and maybe it's not > something the "Python project" should take responsibility for, but > without any doubt, it's the single most significant improvement that > could be made to my experience with PyPI. Package Discovery is absolutely a thing we stink at, and something we should do better at. This is squarely in the PyPI side of things; I don't think that python-dev needs to recommend a stdlib++ nor any hand-picked group of people... maybe. The PyPI search is kinda grody; it's a horrible, inefficient SQL query that uses a bunch of LIKEs and regexes, if I recall. Warehouse has switched to an Elasticsearch backend and I've attempted to do a little tuning of it, however I haven't done a whole lot, largely because I'm not an expert and it wasn't that high of a priority. Fundamentally though the problem is what do we use to determine if a package is "good" or not. Folks may or may not remember the great ratings/comments war of yore, which was an attempt to add some end-user-driven ratings to packages on PyPI. That didn't particularly go well and they've long since been disabled. One of the problems is that the names of packages are basically either extremely relevant or not relevant at all. Taking a look at Django migrations prior to Django 1.7, you had "South", which was the de facto standard, and "django-migrations", which was a more or less defunct project.
Then you'd have a ton of django-* packages which mention both Django and the fact that they have migrations shipped within their long_description. It got somewhat hard to get South to score fairly high on "django migrations"[1][2]. Popularity is a reasonable metric; if I recall, Warehouse uses the download counts of a project to weight the search results (although it should probably use rolling counts and not total counts). The idea here is that something that is downloaded more often is likely to be a better overall choice for most people. On Crate I had "favorites", which were functionally equivalent to stars on GitHub, which would influence the search results as well. That's about all that I've been able to think of to glean information from; any sort of ratings-or-what-have-you system needs to be done very carefully so that it actually provides value and isn't just a pain point. [1] https://pypi.python.org/pypi?%3Aaction=search&term=django+migrations&submit=search [2] https://warehouse.python.org/search/project/?q=django+migrations --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Sep 19 00:20:21 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 18 Sep 2014 15:20:21 -0700 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: References: Message-ID: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> On Sep 18, 2014, at 11:26, Paul Moore wrote: > On 18 September 2014 18:01, Andrew Barnert > wrote: >> On Sep 17, 2014, at 23:15, Nick Coghlan wrote: >> >>> However, now that CPython ships with pip by default, we may want to >>> consider providing more explicit pointers to such "If you want more >>> advanced functionality than the standard library provides" libraries. >> >> I love this idea, but there's one big potential problem, and one smaller one.
>> >> Many of the most popular and useful packages require C extensions. In itself, >> that doesn't have to be a problem; if you provide wheels for the official 3.4+ Win32, >> Win64, and Mac64 CPython builds, it can still be as simple as `pip install spam` >> for most users, including the ones with the least ability to figure it out for themselves. > > OK, the key thing to look at here is the user experience for someone > who has Python installed, and has a job to do, but needs to branch out > into external packages because the stdlib doesn't provide enough > functionality. > > To make this example concrete, I'll focus on a specific use case, > which I believe is relatively common, although I can't back this up > with hard data. > > Assume: > > * A user who is comfortable with Python, or with scripting languages in general > * No licensing or connectivity issues to worry about > * An existing manual process that the user wants to automate > > In my line of work, this constitutes the vast bulk of Python use - > informal, simple automation scripts. > > So I'm writing this script, and I discover I need to do something that > the stdlib doesn't cover, but I feel like it should be available "out > there", and it's sufficiently fiddly that I'd prefer not to write it > myself. Examples I've come across in the past: > > * A console progress bar > * Scraping some data off a web page > * Writing data into an Excel spreadsheet with formatting > * Querying an Oracle database > > Every time an issue like this comes up, I know that I'm looking to do > "pip install XXX". It's working out what XXX is that's the problem. > > So I go and ask Google. Hold on. I'm pretty sure that the intended answer to this problem has, for years, been that you go and search PyPI. Is that too broken to use, or are people just not aware of it? 
> A quick check on the progress bar case gets me > to a StackOverflow article that offers me a lot of "write it yourself" > solutions, and pointers to a couple of libraries. StackOverflow is the _last_ place you should be looking. It's part of their policy that "library-shopping" questions should be closed, and has been for quite some time. Anything you find is likely to be either out of date, or in some niche area that few of the active users see. > From this example, I'd like to see the following improvements to the process: > > 1. Somewhere I can go to find useful modules, that's better than Google. Again, isn't that PyPI? > 2. Someone else choosing the "best option" - I don't want to evaluate > 3 different progressbar modules, I just want to write "57% complete" > and a few dots! I don't think anything like this can be "curated", unless it really is restricted to the dozen or two projects that are clearly "best in category" in areas of widespread demand, as (I think) Nick and Terry were discussing. So you're looking for something crowd sourced, which I don't think exists yet. Maybe the newish Software Recs StackExchange site will eventually serve. Or maybe someone will build something specific to the Python community, possibly even built on top of PyPI. But whatever it is, I don't think it's going to be designed by a mailing list discussion; someone has to have a clever idea and hack it up until it works, or at least inspires further ideas. The same way we got easy_install and then pip. There's also the problem that in many areas "the best" is different for different applications, and often you don't know the right question to ask. The best console-mode progress bar, OK, people can disagree with the answer, but they won't disagree much on the criteria. But for anything less trivial, that's not likely to be true. Which is the best XMPP client library, plugin framework, arbitrary-precision float library, sorted mapping, markdown converter, ...? 
In other words, I agree that this is an important problem, but it's not going to be an easy one to solve, and I don't think solving it should be a prerequisite, or even really relevant, to Nick's stdlib++ idea. > 3. C extensions aren't a huge problem to me on Windows, although I'm > looking forward to the day when everyone distributes wheels (wheel > convert is good enough for now though). [1] I think this is pretty uncommon for the user group you're representing. Most Windows users who are comfortable with scripting languages don't have MSVC or MinGW set up, don't know how to do so (or even which one they should choose), etc. Many packages come with official Windows binaries, but an awful lot of them don't--and, even when they do, the "install" docs often start off with how to set up prereqs and build on *nix machines. Most of the Windows users I see tend to go straight to Christoph Gohlke's archive and use his installers, and for anything not there they just throw their hands up and panic. Making wheels widespread is not just a nice-to-have; wheels are a great solution to a very real problem, and until they're pervasive the problem isn't solved. But meanwhile, wheels don't solve everything. I don't want to repeat the whole second point from my previous email, but given that pip doesn't handle non-Python requirements, some of the most important packages on PyPI are still going to be out of reach for many people, and I don't know what the solution is. (Build a statically linked lxml? Include libxml2.dll in the wheel? Just give a better error message that includes a link to how to download it?) > 4. Much more community pressure for projects to host their code on > PyPI. Some projects have genuine issues with hosting on PyPI, and > there are changes being looked at to support them, but for most > projects it seems to just be history and inertia. > > [1] A Linux/OS X user might have more more issues with C extensions. In general, I think they have far fewer problems. 
Linux users have one thing to learn (install libxml2-devel, not just libxml2). Same for Mac users (install Xcode and Homebrew). After that, they tend to have a lot fewer problems. (Except with scipy, since it needs a fortran compiler and sometimes needs you to force-rebuild numpy. And except for Mac users who end up with multiple copies of Python 2.7, which isn't going to be a problem for 3.x to worry about in the foreseeable future.) > Maybe this can't be solved in any meaningful sense, and maybe it's not > something the "Python project" should take responsibility for, but > without any doubt, it's the single most significant improvement that > could be made to my experience with PyPI. I agree that having some kind of > > Paul. > > PS I should also note that even in its current state, PyPI is streets > ahead of the 3rd party module story I've experienced for any other > language - C/C++, Lua, Powershell, and Java are all far worse. > Perl/CPAN may be as good or better, it's so long since I used Perl > that I don't really know these days. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From p.f.moore at gmail.com Fri Sep 19 00:24:00 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 18 Sep 2014 23:24:00 +0100 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> References: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> Message-ID: On 18 September 2014 23:20, Andrew Barnert wrote: > Hold on. I'm pretty sure that the intended answer to this problem has, for years, been that you go and search PyPI. Is that too broken to use, or are people just not aware of it? Sorry, I should have mentioned that.
I've found that PyPI search is great for getting packages named like you expect, but a lot worse when you have something vague ("website scraping" didn't bring up html5lib, BeautifulSoup or lxml, all of which are tools I've used in the past). So I tend to not bother these days on the assumption that it'll just be another dead end. I know, I'm old and cynical, sorry ;-) Paul From donald at stufft.io Fri Sep 19 00:44:13 2014 From: donald at stufft.io (Donald Stufft) Date: Thu, 18 Sep 2014 18:44:13 -0400 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> References: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> Message-ID: > On Sep 18, 2014, at 6:20 PM, Andrew Barnert wrote: > > But meanwhile, wheels don't solve everything. I don't want to repeat the whole second point from my previous email, but given that pip doesn't handle non-Python requirements, some of the most important packages on PyPI are still going to be out of reach for many people, and I don't know what the solution is. (Build a statically linked lxml? Include libxml2.dll in the wheel? Just give a better error message that includes a link to how to download it?) I think the cryptography project has had good success recently switching to statically linking OpenSSL on Windows in the Wheels. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From casevh at gmail.com Fri Sep 19 00:57:29 2014 From: casevh at gmail.com (Case Van Horsen) Date: Thu, 18 Sep 2014 15:57:29 -0700 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: On Thu, Sep 18, 2014 at 12:31 PM, Alexander Belopolsky wrote: > > On Thu, Sep 18, 2014 at 1:27 PM, Case Van Horsen wrote: >> >> The ISO C99/C11 (Annex F) (and POSIX and IEEE-754) standards define >> a function called "roundToIntegralTowardsNegative". For the special value >> +Inf, it specifies a return value of +Inf. >> >> However, the integral value returned is still an IEEE-754 formatted value >> and can return +Inf. PEP-3141 changed the behavior of math.floor() (and >> __floor__ in general) to return an actual integer. That makes it >> impossible >> to comply with ISO etc. standards. > > > I don't think there is any dispute over what math.floor(inf) should return. > POSIX, C99, IEEE and probably many other standards agree that inf should be > returned as long as the resulting type can represent it. I dispute that there is no dispute over what math.floor(inf) should return. ;-) All the standards specify a result type that can represent +-Inf and +-0. A standards compliant version should return +-Inf and +-0. lrint() and llrint() are defined to return long or long long, respectively. It would be fine if they raised an exception. The current math.floor() actually behaves more like llrint() than floor(). I accept that having math.floor() return an integer (and raise an exception for +-Inf) may be useful in many cases but it is different from the standard. Other floating-point libraries still return a floating-point value. >>> numpy.floor(numpy.float64('Inf')) inf >>> mpmath.floor(mpmath.mpf('inf')) mpf('+inf') >>> gmpy2.floor(gmpy2.mpfr('inf')) mpfr('inf') >>> bigfloat.floor(bigfloat.BigFloat('inf')) BigFloat.exact('Infinity', precision=53) Disclaimer: I maintain gmpy2.
> > Note that before math.floor() was changed to return int in 3.x, we had > >>>> math.floor(float('inf')) > inf > > (Python 2.7.8) > > The question here is not about floor, but about floor_division. NumPy > implements it as simple-minded floor(x/y), while Python attempts to be more > precise by doing floor((x - fmod(x, y))/y). > > > $ hg blame -d -u -w Objects/floatobject.c > nascheme Thu Jan 04 01:44:34 2001 +0000: float_divmod(PyObject *v, > PyObject *w) > .. > guido Sun Oct 20 20:16:45 1991 +0000: mod = fmod(vx, wx); > tim Sat Sep 16 03:54:24 2000 +0000: /* fmod is typically exact, > so vx-mod is *mathematically* an > guido Thu May 06 14:26:34 1999 +0000: exact multiple of wx. But > this is fp arithmetic, and fp > guido Thu May 06 14:26:34 1999 +0000: vx - mod is an > approximation; the result is that div may > guido Thu May 06 14:26:34 1999 +0000: not be an exact integral > value after the division, although > guido Thu May 06 14:26:34 1999 +0000: it will always be very > close to one. > guido Thu May 06 14:26:34 1999 +0000: */ > guido Sun Oct 20 20:16:45 1991 +0000: div = (vx - mod) / wx; > .. > > Given that this logic dates back to 1991, I doubt not-finite case was > seriously considered. > > In light of PEP 3141, if anything should be done about float // float it > would be to make it return and int and as a consequence inf // 1 should > raise an OverflowError. Since divmod() and floor_division aren't defined by a standard, Python can define its own standard. But each time Python changes behavior, external libraries will need to change, or not; and another difference between Python versions is introduced. From alexander.belopolsky at gmail.com Fri Sep 19 04:16:14 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 18 Sep 2014 22:16:14 -0400 Subject: [Python-ideas] What math.floor(inf) should return? 
Was: Make `float('inf') //1 == float('inf')` Message-ID: On Thu, Sep 18, 2014 at 6:57 PM, Case Van Horsen wrote: > > > I don't think there is any dispute over what math.floor(inf) should return. > > POSIX, C99, IEEE and probably many other standards agree that inf should be > > returned as long as the resulting type can represent it. > > I dispute that there is no dispute over what math.floor(inf) should return. ;-) > I am changing the subject so that we don't mix making new decisions with a critique and defense of the decisions that were made in the past. I wrote: "inf should be returned as long as the resulting type can represent it". This is the part that I still believe is not disputed. No one has suggested that math.floor(inf) should return nan, for example. > All the standards specify a result type can represent +-Inf and +-0. A > standards compliant version should return +-Inf and +-0. lrint() and llrint() > are defined to return long or long long, respectively. It would be fine if > they raised an exception. The current math.floor() actually behaves more > like llrint() than floor(). > POSIX does not preclude raising an exception: "If the correct value would cause overflow, a range error shall occur" ( http://pubs.opengroup.org/onlinepubs/009695399/functions/floor.html). > > I accept that having math.floor() return an integer (and raise an exception > for +-Inf) may be useful in many cases but it is different from the standard. > Other floating-point libraries still return a floating-point value. The standards are influenced by the limitation inherent in many languages where ints have finite range and cannot represent floor() of many finite floating point values. Python does not have this limitation. (Granted - PEP 3141 could do a better job explaining why floor, ceil, round, //, etc. should return Integer rather than Real.) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alexander.belopolsky at gmail.com Fri Sep 19 05:20:00 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 18 Sep 2014 23:20:00 -0400 Subject: [Python-ideas] What math.floor(inf) should return? Was: Make `float('inf') //1 == float('inf')` In-Reply-To: References: Message-ID: On Thu, Sep 18, 2014 at 11:16 PM, Case Van Horsen wrote: > 2) Add an alternate math library, say stdmath, (or ieeemath or...) > .. or numpy :-) > that can follow a different set of rules than the math module. In > addition to the functions provided by the math module, it could > define additional functions such as stdmath.div such that > stdmath.div(2.0, 0) return Inf instead of raising an exception. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From casevh at gmail.com Fri Sep 19 05:16:25 2014 From: casevh at gmail.com (Case Van Horsen) Date: Thu, 18 Sep 2014 20:16:25 -0700 Subject: [Python-ideas] What math.floor(inf) should return? Was: Make `float('inf') //1 == float('inf')` In-Reply-To: References: Message-ID: On Thu, Sep 18, 2014 at 7:16 PM, Alexander Belopolsky wrote: > > On Thu, Sep 18, 2014 at 6:57 PM, Case Van Horsen wrote: >> >> > I don't think there is any dispute over what math.floor(inf) should >> > return. >> > POSIX, C99, IEEE and probably many other standards agree that inf should >> > be >> > returned as long as the resulting type can represent it. >> >> I dispute that there is no dispute over what math.floor(inf) should >> return. ;-) >> > > I am changing the subject so that we don't mix making new decisions with a > critique and defense of the decisions that were made in the past. > > I wrote: "inf should be returned as long as the resulting type can represent > it". This is the part that I still believe is not disputed. No one has > suggested that math.floor(inf) should return nan, for example. > > >> All the standards specify a result type can represent +-Inf and +-0. 
A >> standards compliant version should return +-Inf and +-0. lrint() and >> llrint() >> are defined to return long or long long, respectively. It would be fine if >> they raised an exception. The current math.floor() actually behaves more >> like llrint() than floor(). >> > > POSIX does not preclude raising an exception: "If the correct value would > cause overflow, a range error shall occur" > (http://pubs.opengroup.org/onlinepubs/009695399/functions/floor.html). Under the section RETURN VALUE, it also states: If x = +-0 or +-Inf, then x shall be returned. And under APPLICATION USAGE, it states: The floor() function can only overflow when the floating-point representation has DBL_MANT_DIG > DBL_MAX_EXP. > >> >> I accept that having math.floor() return an >> integer (and raise an >> exception >> for +-Inf) may be useful in many cases but it is different from the >> standard. >> Other floating-point libraries still return a floating-point value. > > The standards are influenced by the limitation inherent in many languages > where ints have finite range and cannot represent floor() of many finite > floating point values. Python does not have this limitation. (Granted - > PEP 3141 could do a better job explaining why floor, ceil, round, //, etc. > should return Integer rather than Real.) For ceil, trunc, and floor, I can't think of any situation where returning an integer is more "precise" than returning a float. Assuming standard IEEE 64-bit format, only the first 53 bits will be significant. The remaining bits will just be 0. (If there is a counter-example, please let me know. I know enough about floating-point to know I don't know very much about floating-point. ;-) ) I appreciate that returning an Integral value can make using the result of floor(), etc., easier by eliminating the need for int(floor()) but I think it adds a false sense of precision and changes the behavior of math.floor() for +-0 and +-Inf.
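[Editor's note: both of these points can be illustrated with the stdlib alone. A small sketch; 1e23 is the usual example of a decimal literal that a 53-bit double cannot represent exactly.]

```python
import math

# "False sense of precision": math.floor() returns an exact int, but the
# float it came from only ever carried 53 significant bits. The literal
# 1e23 is actually stored as the double 99999999999999991611392, and
# floor() dutifully reports every digit of that approximation:
print(math.floor(1e23))  # 99999999999999991611392

# Behaviour change for signed zero: an int cannot carry the sign of -0.0,
# so floor() silently drops it, while the float itself still has it:
print(math.floor(-0.0))          # 0 -- an int; the sign is gone
print(math.copysign(1.0, -0.0))  # -1.0 -- the float remembers the sign
```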
Even though I don't think it is worth changing, I can think of two options. 1) Add trunc, ceil, floor as reserved words (just like round) and let them call the methods defined by PEP-3141. Then math.floor() can revert back to returning a float. 2) Add an alternate math library, say stdmath, (or ieeemath or...) that can follow a different set of rules than the math module. In addition to the functions provided by the math module, it could define additional functions such as stdmath.div such that stdmath.div(2.0, 0) returns Inf instead of raising an exception. From greg.ewing at canterbury.ac.nz Fri Sep 19 00:09:23 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 19 Sep 2014 10:09:23 +1200 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <4353FDEB-373F-4592-8016-CA8500D8DC58@yahoo.com> References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> <54197523.6080700@trueblade.com> <87oaud1rtn.fsf@uwakimon.sk.tsukuba.ac.jp> <4353FDEB-373F-4592-8016-CA8500D8DC58@yahoo.com> Message-ID: <541B5813.9040404@canterbury.ac.nz> Andrew Barnert wrote: > (By the way, is there a word for that Unicode ignorance and confusion? > Something like "illiteracy" and "innumeracy",
-- Greg From guido at python.org Fri Sep 19 07:10:26 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 18 Sep 2014 22:10:26 -0700 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: <878ulh18er.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> <54197523.6080700@trueblade.com> <87oaud1rtn.fsf@uwakimon.sk.tsukuba.ac.jp> <4353FDEB-373F-4592-8016-CA8500D8DC58@yahoo.com> <878ulh18er.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: What is this "it" that you propose to just do? I'm sure I have an opinion on it once you describe it to me. On Thursday, September 18, 2014, Stephen J. Turnbull wrote: > Andrew Barnert writes: > > > (By the way, is there a word for that Unicode ignorance and > > confusion? Something like "illiteracy" and "innumeracy", but > > probably spelled with a non-BMP character, maybe U+1F4A9?) > > "Non-superhuman." "Noncharacter" is a case in point. And yes, it's > properly spelled with U+1F4A9, but my spellchecker has "parental > controls" and I can't enter it. > > [various perceptive comments elided] > > > At this point, I'm not sure that adds up to an argument for Nick's > > less-str-like version of his original proposal, or against it, but > > I'm pretty sure it's a good argument for one or other... > > That's exactly the way I feel. So I would say "damn the torpedos" and > "Just Do It" and if it's wrong we'll fix it in the mythical-never-to- > be-implemented-and-so-unmentionable-that-Big-Brother-will-undoubtedly- > come-take-me-away Python 4000. 
> > Of course we should wait to see if Guido or other reliable oracle has > a particular opinion, but I really don't think we're going to get > proof without trying. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokoproject at gmail.com Fri Sep 19 07:39:12 2014 From: gokoproject at gmail.com (John Wong) Date: Fri, 19 Sep 2014 01:39:12 -0400 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> References: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> Message-ID: I like stdlib++, but I also want to say it should remain a non-Python-dev endorsement kind of thing. As a user and a library developer, I see pros and cons. Official endorsement can lead people to abandon whatever they are working on or make them feel excluded or unappreciated, which is not a very positive thing to do. On the other hand, there may only be a very limited number of stdlib replacements people can vouch for easily. For HTTP requests, it is obvious that at the moment, Requests is the most widely used library in modern Python codebases in the past several years. For async networking and async tasks, Twisted and Tornado are probably the best. You might add celery to the async-task list just because many people who choose to run async tasks will end up queuing. For cryptography and security, what do you suggest? There are APIs cryptography doesn't have yet but pycrypto or M2Crypto does. It isn't that either is a bad library, I've used pycrypto and I am careful with using the API, but do we endorse cryptography over the other two (the latter one is pretty dead based on its commit activity, I don't know).
-------------- next part -------------- An HTML attachment was scrubbed... URL: From gokoproject at gmail.com Fri Sep 19 07:42:02 2014 From: gokoproject at gmail.com (John Wong) Date: Fri, 19 Sep 2014 01:42:02 -0400 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: References: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> Message-ID: On 09/18/2014 08:26 PM, Paul Moore wrote: > From this example, I'd like to see the following improvements to the process: > 1. Somewhere I can go to find useful modules, that's better than Google. > 2. Someone else choosing the "best option" - I don't want to evaluate Well judgement is always required. As an example: searching for an async version of the famous Requests library on PyPI doesn't give beginners any immediate obvious choices. https://pypi.python.org/pypi?%3Aaction=search&term=async+requests&submit=search If we base it off just popularity counts, we will mislead users. The description ought to be more descriptive, but how descriptive? Can an author cover all the possible keywords? This is why, naturally, most people use a search engine and eventually end up either on stackoverflow or some blog post written by me \o/. One idea, similar to the npm or gem stores, is commentary. It can become spam, so how about enabling tagging and suggestions? What if people can suggest their workflow in a more obvious way than a blog post? If anyone has the time and interest, maybe scan each PyPI package and analyze how a package on PyPI is being used by other packages. We can even go as far as scanning public repositories should anyone feel the urge to do that, privately. That, however, is a multi-billion-dollar industry, known as search. Though like App Store, Play Store, gem store, whatever, the number of useful comments and the number of participants can vary. Thus, the improvement may be suboptimal.
On Fri, Sep 19, 2014 at 1:39 AM, John Wong wrote: > I like stdlib++, but I also want to say it should remain a non-Python-dev > endorsement > kind of thing. As a user and a library developer, I see pros and cons. > > Official endorsement can lead people to abandon whatever they are > working on > or make them feel excluded or unappreciated, which is not a very positive > thing to do. > > On the other hand, there may only be a very limited number of stdlib > replacements > people can vouch for easily. > > For HTTP requests, it is obvious that at the moment, Requests is the most > widely > used library in modern Python codebases in the past several years. > > For async networking and async tasks, Twisted and Tornado are probably the > best. > You might add celery to the async-task list just because many people who choose to > run async tasks will end up queuing. > > > For cryptography and security, what do you suggest? There are APIs > cryptography > doesn't have yet but pycrypto or M2Crypto does. It isn't that either is a > bad library, > I've used pycrypto and I am careful with using the API, but do we endorse > cryptography > over the other two (the latter one is pretty dead based on its commit > activity, I don't know). > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wolfgang.maier at biologie.uni-freiburg.de Fri Sep 19 09:35:47 2014 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Fri, 19 Sep 2014 09:35:47 +0200 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: References: Message-ID: On 18.09.2014 23:27, Nick Coghlan wrote: > > It actually occurs to me: I wonder whether anyone at Google might be > interested in enhancing its usefulness as a Python packaging search > tool by looking specifically at PyPI's own search index data.
(And if > they could do that for us, > they might be willing to do it for CPAN, > RubyGems, npm, PEAR, etc) > > (on the other hand, being a vector for influencing Google search results > would mean being a higher priority target for spam, so we may not > actually want that) > I'm not sure, but isn't Google doing this already? At least package updates uploaded to PyPI usually appear in Google searches quite fast (sometimes instantaneously) so I don't think this is only the result of regular crawls. From toddrjen at gmail.com Fri Sep 19 10:50:24 2014 From: toddrjen at gmail.com (Todd) Date: Fri, 19 Sep 2014 10:50:24 +0200 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: References: Message-ID: On Thu, Sep 18, 2014 at 11:27 PM, Nick Coghlan wrote: > > On 19 Sep 2014 04:54, "Georg Brandl" wrote: > > > > On 09/18/2014 08:26 PM, Paul Moore wrote: > > > > > Every time an issue like this comes up, I know that I'm looking to do > > > "pip install XXX". It's working out what XXX is that's the problem. > > > > > > So I go and ask Google. A quick check on the progress bar case gets me > > > to a StackOverflow article that offers me a lot of "write it yourself" > > > solutions, and pointers to a couple of libraries. Further down there > > > are a few pointers to python-progressbar, which was mentioned in the > > > StackOverflow article, which in turn leads me to the PyPI page for it. > > > The latest version (2.3-dev) is not hosted on PyPI, so I hit all the > > > fun of --allow-external. > > Paul, this could make a good "What problem are we actually trying to fix?" > summary on pypa.io. > > We have spent a lot of time so far on the "getting people the packages > they ask for" side of things, but have barely scratched the surface of > "helping people find the packages that can help them".
At the moment "word > of mouth" is one of our main discovery tools, and that's an issue for > newcomers who may not have a big network of fellow developers yet. > > This is a problem I think the Django community actually addressed fairly > well through https://www.djangopackages.com/ > > There are similar comparison sites for Pyramid & Plone, after Audrey & > Danny broke out the back end of Django Packages to make it independently > deployable (see http://opencomparison.readthedocs.org/en/latest/) > > > There is also scipy central for scientific code: http://scipy-central.org/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From encukou at gmail.com Fri Sep 19 10:54:22 2014 From: encukou at gmail.com (Petr Viktorin) Date: Fri, 19 Sep 2014 10:54:22 +0200 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: On Thu, Sep 18, 2014 at 9:35 PM, Guido van Rossum wrote: > > On Sep 18, 2014 12:31 PM, "Alexander Belopolsky" > wrote: > > Right. > >> In light of PEP 3141, if anything should be done about float // float it >> would be to make it return an int and as a consequence inf // 1 should >> raise an OverflowError. > > +1 But, is this a good change to make in 3.5? I assume this would apply to __divmod__[0] as well. Should there be a float-only math.divmod?
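[For readers following the thread, the behaviour being debated is easy to reproduce. A quick sketch of what CPython 3 did at the time of this discussion: float floor division is computed via C fmod()/floor(), so an infinite numerator silently propagates to nan, while math.floor() (which returns an int since PEP 3141) raises instead.]

```python
import math

# Floor division on an infinite float quietly yields nan,
# because the result is derived from fmod(inf, 1.0), which is nan.
print(float('inf') // 1)        # nan

# divmod() goes through the same machinery, so both parts are nan.
print(divmod(float('inf'), 1))  # (nan, nan)

# math.floor() returns an int in Python 3, and no int can
# represent infinity, so it raises OverflowError instead:
try:
    math.floor(float('inf'))
except OverflowError as exc:
    print(exc)                  # cannot convert float infinity to integer
```

This asymmetry (// gives nan, math.floor raises) is precisely what the proposals in this thread are trying to resolve in one direction or the other.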
From toddrjen at gmail.com Fri Sep 19 10:57:59 2014 From: toddrjen at gmail.com (Todd) Date: Fri, 19 Sep 2014 10:57:59 +0200 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> References: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> Message-ID: On Fri, Sep 19, 2014 at 12:20 AM, Andrew Barnert < abarnert at yahoo.com.dmarc.invalid> wrote: > On Sep 18, 2014, at 11:26, Paul Moore wrote: > > > > OK, the key thing to look at here is the user experience for someone > > who has Python installed, and has a job to do, but needs to branch out > > into external packages because the stdlib doesn't provide enough > > functionality. > > > > To make this example concrete, I'll focus on a specific use case, > > which I believe is relatively common, although I can't back this up > > with hard data. > > > > Assume: > > > > * A user who is comfortable with Python, or with scripting languages in > general > > * No licensing or connectivity issues to worry about > > * An existing manual process that the user wants to automate > > > > In my line of work, this constitutes the vast bulk of Python use - > > informal, simple automation scripts. > > > > So I'm writing this script, and I discover I need to do something that > > the stdlib doesn't cover, but I feel like it should be available "out > > there", and it's sufficiently fiddly that I'd prefer not to write it > > myself. Examples I've come across in the past: > > > > * A console progress bar > > * Scraping some data off a web page > > * Writing data into an Excel spreadsheet with formatting > > * Querying an Oracle database > > > > Every time an issue like this comes up, I know that I'm looking to do > > "pip install XXX". It's working out what XXX is that's the problem. > > > > So I go and ask Google. > > Hold on. 
I'm pretty sure that the intended answer to this problem has, for > years, been that you go and search PyPI. Is that too broken to use, or are > people just not aware of it? > > In addition to the issues others have brought up, this is only useful if you are trying to get a subject-specific toolkit that incorporates a whole range of general tools on that subject. So if you are looking for something like scipy, django, requests, etc. However, what if you are looking for something much more specific? For example, say I have some specific thing I want to do with a dict. How can I find a library that provides that functionality? -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at bytereef.org Fri Sep 19 11:16:15 2014 From: stefan at bytereef.org (Stefan Krah) Date: Fri, 19 Sep 2014 11:16:15 +0200 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: Message-ID: <20140919091615.GA20011@sleipnir.bytereef.org> Case Van Horsen wrote: > > I don't think there is any dispute over what math.floor(inf) should return. > > POSIX, C99, IEEE and probably many other standards agree that inf should be > > returned as long as the resulting type can represent it. > > I dispute that there is no dispute over what math.floor(inf) should return. ;-) > > All the standards specify result types that can represent +-Inf and +-0. A > standards-compliant version should return +-Inf and +-0. lrint() and llrint() > are defined to return long or long long, respectively. It would be fine if > they raised an exception. The current math.floor() actually behaves more > like llrint() than floor(). I've always found the behavior of math.floor() a bit surprising, especially since most functions in the module are thin wrappers around the C functions from math.h. >>> math.modf(float("inf")) (0.0, inf) So (if we can change it) I'd prefer that floor() behaves according to the C standard.
Stefan Krah From stefan at bytereef.org Fri Sep 19 11:40:38 2014 From: stefan at bytereef.org (Stefan Krah) Date: Fri, 19 Sep 2014 11:40:38 +0200 Subject: [Python-ideas] What math.floor(inf) should return? Was: Make `float('inf') //1 == float('inf')` In-Reply-To: References: Message-ID: <20140919094038.GA20185@sleipnir.bytereef.org> Alexander Belopolsky wrote: > > I accept that having math.floor() return an integer (and raise an exception > > for +-Inf) may be useful in many cases but it is different from the standard. > > Other floating-point libraries still return a floating-point value. > > The standards are influenced by the limitation inherent in many languages where > ints have finite range and cannot represent floor() of many finite floating > point values. Python does not have this limitation. (Granted - PEP 3141 could > do a better job explaining why floor, ceil, round, //, etc. should return > Integer rather than Real.) Scheme (which IIRC influenced PEP-3141) has arbitrary-precision ints, but guile at least returns floats: scheme@(guile-user)> (floor 2e308) $3 = +inf.0 [I'm mentioning Decimal now since the standard is *very* close to IEEE 754-2008.]
Decimal's divide_int function returns "integers" (Decimals with exponent 0), but only if they don't overflow the context precision: >>> c = getcontext() >>> c.prec 28 >>> c.divide_int(Decimal("333e25"), 1) Decimal('3330000000000000000000000000') >>> c.divide_int(Decimal("333e250"), 1) decimal.InvalidOperation: quotient too large in //, % or divmod Despite this fact, Decimal still returns inf in the disputed case: >>> c.divide_int(Decimal("inf"), 1) Decimal('Infinity') Also, even when the Overflow trap is set, Decimal only raises Overflow when an operation actually overflows: >>> c.traps[Overflow] = True >>> Decimal("1e999999999") * 10 decimal.Overflow: above Emax >>> Decimal("inf") * 10 Decimal('Infinity') >>> c.divide_int(Decimal("inf"), 1) Decimal('Infinity') Stefan Krah From steve at pearwood.info Fri Sep 19 11:52:20 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 19 Sep 2014 19:52:20 +1000 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: <20140919095220.GZ9293@ando.pearwood.info> On Thu, Sep 18, 2014 at 03:31:05PM -0400, Alexander Belopolsky wrote: > I don't think there is any dispute over what math.floor(inf) should return. > POSIX, C99, IEEE and probably many other standards agree that inf should > be returned as long as the resulting type can represent it. Why don't we add an integer infinity? Python int is not a low-level primitive type, it's an object, and can support as rich a set of behaviour as we like. We could subclass int, give it a pair of singletons Infinity and -Infinity (say), and have math.floor(float('inf')) return one of them as needed. - The subclass need not be a builtin public class, like bool, it could be a private implementation detail. The only promise made is that isinstance(Infinity, int) returns True. - The Infinity and -Infinity instances need not be built-in.
- We could, but don't necessarily need to, support int('inf'). - But int(float('inf')) and int(Decimal('inf')) should return the int Infinity. -- Steven From abarnert at yahoo.com Fri Sep 19 12:13:13 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 19 Sep 2014 03:13:13 -0700 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: <20140919095220.GZ9293@ando.pearwood.info> References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> <20140919095220.GZ9293@ando.pearwood.info> Message-ID: On Sep 19, 2014, at 2:52, Steven D'Aprano wrote: > On Thu, Sep 18, 2014 at 03:31:05PM -0400, Alexander Belopolsky wrote: > >> I don't think there is any dispute over what math.floor(inf) should return. >> POSIX, C99, IEEE and probably many other standards agree that inf should >> be returned as long as the resulting type can represent it. > > Why don't we add an integer infinity? I was thinking the same thing. In addition to your points, while it's not _useful_ as often, there's nothing inherently any more strange or unmathematical about an affine extended integral line than an extended real line. (In fact, if anything, not having a "largest int" akin to the largest float makes it more sensible.) And it would mean people wouldn't have to use the float -inf as a starting value for finding a maximum, they could use an actual integer. That does raise the question of whether there should be an int NaN... > Python int is not a low-level primitive type, it's an object, and can > support as rich a set of behaviour as we like. We could subclass int, > give it a pair of singletons Infinity and -Infinity (say), and have > math.floor(float('inf')) return one of them as needed. > > - The subclass need not be a builtin public class, like bool, it > could be a private implementation detail. The only promise made > is that isinstance(Infinity, int) returns True. > > - The Infinity and -Infinity instances need not be built-in. 
> > - For that matter, they may not even be singletons. > > - We could, but don't necessarily need to, support int('inf'). > > - But int(float('inf')) and int(Decimal('inf')) should return > the int Infinity. > > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Fri Sep 19 13:13:38 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 19 Sep 2014 21:13:38 +1000 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <20140919095220.GZ9293@ando.pearwood.info> Message-ID: <20140919111338.GB9293@ando.pearwood.info> On Fri, Sep 19, 2014 at 03:13:13AM -0700, Andrew Barnert wrote: > On Sep 19, 2014, at 2:52, Steven D'Aprano wrote: > > > On Thu, Sep 18, 2014 at 03:31:05PM -0400, Alexander Belopolsky wrote: > > > >> I don't think there is any dispute over what math.floor(inf) should return. > >> POSIX, C99, IEEE and probably many other standards agree that inf should > >> be returned as long as the resulting type can represent it. > > > > Why don't we add an integer infinity? > > I was thinking the same thing. [...] > That does raise the question of whether there should be an int NaN... Ah, you had to raise that issue! Now it's sure to be rejected! *wink* An int NAN can and should be considered only when there is a clear need for it. Since Python tends to prefer raising exceptions than returning NANs, e.g. 0/0 raises, I'm not sure that there are any operations that return ints which would benefit from an int NAN. (Personally, I'd rather get NANs than exceptions, but apparently I'm in the minority.) 
-- Steven From steve at pearwood.info Fri Sep 19 13:59:28 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 19 Sep 2014 21:59:28 +1000 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: References: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> Message-ID: <20140919115928.GD9293@ando.pearwood.info> On Fri, Sep 19, 2014 at 01:39:12AM -0400, John Wong wrote: > I like stdlib++, but I also want to say it should remain as non-Python-dev > endorsement > kind of thing. As a user and a library developer, I see pros and cons. > > Official endorsement can lead to people to abandon whatever they are working > or make them feel excluded or unappreciated, which is not a very positive > thing to do. Yes. Also, it can discourage innovation and encourage monoculture. As has often been said, the standard library is where good packages go to die. In the standard library, a package is ubiquitous, but it's also pretty much in stasis, unlikely to change much. "We biologists have a word for stable: 'dead'." - some biologist So if you want something under active development, likely to gain new features, you normally look outside the std lib, where there is plenty of innovation, experimentation and competition. Now imagine that one package gets recommended as "best of breed" for some particular task in this hypothetical stdlib++. It will combine the ubiquity of the stdlib (because everyone uses it) without the disadvantage of stasis. That could make it much harder for a new package in the same field to become well-known enough to compete for users and developers. Now obviously there are "best of breed" third party libraries. I don't see many people trying to build a better BeautifulSoup. (Maybe they are, and I just don't know about them, which demonstrates the problem.) That's not necessarily a bad thing. But I think we should be careful of putting the thumb on the scales too much. 
-- Steven From 4kir4.1i at gmail.com Fri Sep 19 14:15:49 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Fri, 19 Sep 2014 16:15:49 +0400 Subject: [Python-ideas] The stdlib++ user experience References: Message-ID: <87wq8zst56.fsf@gmail.com> Paul Moore writes: ... > 1. Somewhere I can go to find useful modules, that's better than > Google. Google is very good at searching programming related topics on the web compared to other search engines or custom searches on niche sites. I sporadically try to use alternatives and at best they are good enough but worse than Google. If you know any examples to the contrary, please share. > 2. Someone else choosing the "best option" - I don't want to evaluate > 3 different progressbar modules, I just want to write "57% complete" > and a few dots! The issue is that "best option" is often different for different people or fashion-driven and therefore transient. > 3. C extensions aren't a huge problem to me on Windows, although I'm > looking forward to the day when everyone distributes wheels (wheel > convert is good enough for now though). [1] I have the opposite impression. http://pythonforengineers.com/stop-struggling-with-python-on-windows/ C extensions are not an issue on POSIX systems: there are package managers (official or not) and the compilers are easily available if you want the latest and greatest. ... > [1] A Linux/OS X user might have more more issues with C extensions. > ... > PS I should also note that even in its current state, PyPI is streets > ahead of the 3rd party module story I've experienced for any other > language - C/C++, Lua, Powershell, and Java are all far worse. > Perl/CPAN may be as good or better, it's so long since I used Perl > that I don't really know these days. Opinions may vary: http://www.reddit.com/r/Python/comments/1ew4l5/im_giving_a_demo_of_python_to_a_bunch_of_java/ Or: "The artifact approach is unambiguously better for any production deployment. 
The source-based approach found in Ruby, Perl, and Python is a problem for me more often than a solution." https://news.ycombinator.com/item?id=7070464 Though wheel binary package format is designed to solve it. -- Akira From p.f.moore at gmail.com Fri Sep 19 14:26:53 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 19 Sep 2014 13:26:53 +0100 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: <20140919115928.GD9293@ando.pearwood.info> References: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> <20140919115928.GD9293@ando.pearwood.info> Message-ID: On 19 September 2014 12:59, Steven D'Aprano wrote: > Now obviously there are "best of breed" third party libraries. I don't > see many people trying to build a better BeautifulSoup. (Maybe they are, > and I just don't know about them, which demonstrates the problem.) > That's not necessarily a bad thing. But I think we should be careful of > putting the thumb on the scales too much. Agreed. I hadn't considered the point about discouraging innovation. So it seems to me that the key things needed are: 1. Discoverability - making it easy for package authors to ensure people find their packages. There's a social aspect here as well - quirky project names like celery or freezegun are a part of the Python culture, but they do make things less discoverable (what do those two projects do?). We need a way to balance that. 2. A clearer way for projects to declare what they support. Python 2 or 3 or both? Is Windows supported? And it's not just "yes or no", often it's somewhere in the middle - for example I know projects that want to work on Windows, but don't have the expertise, so they'll accept patches but you have to expect to do some of the work for them... Classifiers should do this, but use is patchy (particularly around the supported OS). 3. Most controversially, some way of rating otherwise-equal packages. 
Ultimately, I don't want a choice, I just want to install something and move on. Download counts may be the easiest compromise here (at least lots of people use the project...) As well as the technical aspect of making it possible to provide this information (much of it, projects *can* provide right now, although maybe not always as easily as might be ideal) there is a social aspect - making it so that it's the norm for projects to do so, and not doing so is viewed as an indication of overall attention to detail on the project. Paul From p.f.moore at gmail.com Fri Sep 19 14:42:08 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 19 Sep 2014 13:42:08 +0100 Subject: [Python-ideas] The stdlib++ user experience In-Reply-To: <87wq8zst56.fsf@gmail.com> References: <87wq8zst56.fsf@gmail.com> Message-ID: On 19 September 2014 13:15, Akira Li <4kir4.1i at gmail.com> wrote: > Paul Moore writes: >> 3. C extensions aren't a huge problem to me on Windows, although I'm >> looking forward to the day when everyone distributes wheels (wheel >> convert is good enough for now though). [1] > > I have the opposite impression. > http://pythonforengineers.com/stop-struggling-with-python-on-windows/ Well yes, but that article is clearly focused on scientific use (a known "worst case" area) and ends up recommending conda (which is a perfectly fair solution, and as the author states, works well). My comment was a bit unfair, though. I have a C compiler installed (which, although it's not hard to set up VS Express, isn't normal). And I also treat "find a wininst installer or egg and run wheel convert on it" as trivial and acceptable, which it isn't for people who aren't packaging specialists. But we're working on this, and I stand by the statement that when projects routinely distribute wheels if they include C extensions, binary dependencies will be a minor issue. 
>> PS I should also note that even in its current state, PyPI is streets >> ahead of the 3rd party module story I've experienced for any other >> language - C/C++, Lua, Powershell, and Java are all far worse. >> Perl/CPAN may be as good or better, it's so long since I used Perl >> that I don't really know these days. > > Opinions may vary: > http://www.reddit.com/r/Python/comments/1ew4l5/im_giving_a_demo_of_python_to_a_bunch_of_java/ The discussion here is about packaging, not about finding 3rd party packages to solve a problem. I know of no better way to find a package to parse an ini file in Java than google, which is no better than Python. And what I found was packages on sourceforge and other generic hosting sites. Maven may be a central repository - I've never used it myself as the complexity has always scared me off (you could say that about most of Java, though ;-)) > Or: "The artifact approach is unambiguously better for any production > deployment. The source-based approach found in Ruby, Perl, and Python is > a problem for me more often than a solution." > https://news.ycombinator.com/item?id=7070464 Again, that's deployment rather than discovery. Paul From dickinsm at gmail.com Fri Sep 19 14:44:20 2014 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 19 Sep 2014 13:44:20 +0100 Subject: [Python-ideas] What math.floor(inf) should return? Was: Make `float('inf') //1 == float('inf')` In-Reply-To: References: Message-ID: On Fri, Sep 19, 2014 at 3:16 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > The standards are influenced by the limitation inherent in many languages > where ints have finite range and cannot represent floor() of many finite > floating point values. Python does not have this limitation. (Granted - > PEP 3141 could do a better job explaining why floor, ceil, round, //, etc. > should return Integer rather than Real.) > Indeed. 
FWIW, I think it was a mistake to change the return type of math.floor and math.ceil in Python 3. There's no longer any way to spell the simple, fast, float->float floor operation. It would be nice to have those basic floating-point math operations accessible again. -- Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From 4kir4.1i at gmail.com Fri Sep 19 14:57:33 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Fri, 19 Sep 2014 16:57:33 +0400 Subject: [Python-ideas] The stdlib++ user experience References: <87wq8zst56.fsf@gmail.com> Message-ID: <87r3z7sr7m.fsf@gmail.com> Paul Moore writes: > On 19 September 2014 13:15, Akira Li <4kir4.1i at gmail.com> wrote: >> Paul Moore writes: > >>> 3. C extensions aren't a huge problem to me on Windows, although I'm >>> looking forward to the day when everyone distributes wheels (wheel >>> convert is good enough for now though). [1] >> >> I have the opposite impression. >> http://pythonforengineers.com/stop-struggling-with-python-on-windows/ > > Well yes, but that article is clearly focused on scientific use (a > known "worst case" area) and ends up recommending conda (which is a > perfectly fair solution, and as the author states, works well). > > My comment was a bit unfair, though. I have a C compiler installed > (which, although it's not hard to set up VS Express, isn't normal). > And I also treat "find a wininst installer or egg and run wheel > convert on it" as trivial and acceptable, which it isn't for people > who aren't packaging specialists. > > But we're working on this, and I stand by the statement that when > projects routinely distribute wheels if they include C extensions, > binary dependencies will be a minor issue. > >>> PS I should also note that even in its current state, PyPI is streets >>> ahead of the 3rd party module story I've experienced for any other >>> language - C/C++, Lua, Powershell, and Java are all far worse. 
>>> Perl/CPAN may be as good or better, it's so long since I used Perl >>> that I don't really know these days. >> >> Opinions may vary: >> http://www.reddit.com/r/Python/comments/1ew4l5/im_giving_a_demo_of_python_to_a_bunch_of_java/ > > The discussion here is about packaging, not about finding 3rd party > packages to solve a problem. I know of no better way to find a package > to parse an ini file in Java than google, which is no better than > Python. And what I found was packages on sourceforge and other generic > hosting sites. The main idea of my message [1] is that Google is very good at "finding 3rd party packages to solve a problem." If you know a better solution for any programming language, do tell. [1]: https://mail.python.org/pipermail/python-ideas/2014-September/029417.html The rest is just some references to balance your statements. > Maven may be a central repository - I've never used it myself as the > complexity has always scared me off (you could say that about most of > Java, though ;-)) > >> Or: "The artifact approach is unambiguously better for any production >> deployment. The source-based approach found in Ruby, Perl, and Python is >> a problem for me more often than a solution." >> https://news.ycombinator.com/item?id=7070464 > > Again, that's deployment rather than discovery.
> > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Akira From ncoghlan at gmail.com Fri Sep 19 15:08:06 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 19 Sep 2014 23:08:06 +1000 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: <20140919115928.GD9293@ando.pearwood.info> References: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> <20140919115928.GD9293@ando.pearwood.info> Message-ID: On 19 September 2014 21:59, Steven D'Aprano wrote: > > Now obviously there are "best of breed" third party libraries. I don't > see many people trying to build a better BeautifulSoup. (Maybe they are, > and I just don't know about them, which demonstrates the problem.) > That's not necessarily a bad thing. But I think we should be careful of > putting the thumb on the scales too much. The problem is that *failing* to provide recommendations can induce analysis paralysis in newcomers, such that they decide a less mature, more centralised ecosystem with fewer choices is "easier", or even "more powerful". While the "easier" can be accurate (since there are fewer choices to make), the "more powerful" is generally not. It's just that the discoverability problem is *much* easier to solve when a language community is dominated by a single use case or a single major vendor, so providing opinionated guidance becomes less controversial. This can lead to comparing one language's entire ecosystem to another (less familiar) language's standard library, rather than comparing the full strength of both ecosystems.
As an example of how a default recommendation can help address the discoverability problem, these days, if someone is completely new to Python web development, I will tell them "start with Django, and use djangopackages.com for package recommendations". However, I will also tell them that Django isn't necessarily the best fit for everything, so sometimes something like Flask or Pyramid (et al) might be a better choice. The reason I do it that way is that Django is far more opinionated than most other Python web frameworks and makes a lot more decisions *for* you. Experts have legitimate reasons for debating several of Django's choices, but newcomers don't yet know enough to understand when and why you might want to do something differently. Recommending a framework that makes those choices on their behalf is actually helpful at that point, even if it means they'll be missing out on some other cool libraries (at least for the time being). This approach provides an experience far closer to the modern Ruby web development model where Rails is the default choice (by a long way), and folks only later start considering the use of something less opinionated like Sinatra for problems where an all-inclusive solution like Rails is less appropriate. Providing "pip" by default is the same way - there are actually times when it isn't the best choice for solving particular packaging related problems, but it's always enough to get you started, *and* it lets you more easily bootstrap other tools like conda and zc.buildout when they're a better fit. The goal in providing "default recommendations" is thus to minimise the "time to awesome" for newcomers, while gently nudging them in the direction of good development practices, rather than raising barriers in front of them that say "you must first learn how to make this apparently irrelevant decision before you can continue". And if someone never needs to tackle problems that require moving beyond the default recommendations? 
That's fine - there are an awful lot of people happily solving problems without moving beyond what the standard library or their redistributor provide. You also see this pattern with some of the design guidance we give newcomers like "don't use metaclasses" and "don't use monkeypatching (except as part of a test mocking library)". What we actually mean is "these are power tools, and genuinely hard to use well. By the time you're ready to wield them, you'll likely also have realised that the implied caveat on the standard advice is 'unless it's the only way to solve a specific problem'". That's a complex mouthful to inflict on someone who is still learning though, so we just oversimplify the situation instead (even though the end result is so oversimplified as to technically be a lie). My experience is also that the scientific community appear to be *far* more pragmatic about library choices than the professional development community (or anyone that is interested in programming for its own sake, rather than what it lets us do). We're a bit more prone to looking under the hood and deciding we don't like a library because of how it's made - competing libraries may arise based on differences in opinion regarding good design and development practices, rather than fundamental limitations in what a library makes possible. Scientists and data analysts are far more likely to just grab a library because of what it lets them *do* (or communicate), without looking too closely at how it's put together. That greater level of pragmatism then seems to make it easier for category killers to arise at the library level. However, I have to admit this particular idea is a speculative hypothesis, rather than something that has been subjected to rigorous objective analysis. Anyone looking for a PhD topic in sociology? :) Regards, Nick. P.S.
As far as I can tell, the relative ease with which dominance can be asserted is also why VCs tend to favour newer languages with more centralised ecosystems - it's easier for them to assert control, and attempt to "punch the ticket" if the language achieves future success. It's much harder to do that for a massive decentralised sprawl like the Python community. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Fri Sep 19 16:05:47 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 19 Sep 2014 23:05:47 +0900 Subject: [Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3 In-Reply-To: References: <20140910105732.GE9293@ando.pearwood.info> <20140911065500.GG9293@ando.pearwood.info> <5416F691.2070800@stoneleaf.us> <20140915144415.GW9293@ando.pearwood.info> <541702EA.2070001@stoneleaf.us> <541787BE.1010805@stoneleaf.us> <541834D0.6010402@trueblade.com> <54185185.2030608@stoneleaf.us> <54197523.6080700@trueblade.com> <87oaud1rtn.fsf@uwakimon.sk.tsukuba.ac.jp> <4353FDEB-373F-4592-8016-CA8500D8DC58@yahoo.com> <878ulh18er.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87tx43zow4.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > What is this "it" that you propose to just do? I'm sure I have an > opinion on it once you describe it to me. I'm sorry, I probably shouldn't have taken your name in vain at this stage. There are no solid proposals yet, the details of format characters, how to use "precision", the symbol to indicate "chunking" etc are all under discussion. Brief summary and links, if you care to read further: At present there are at least three kinds of proposals on the table for a __format__ for bytes objects, with the proposals and dicussion being collected in http://bugs.python.org/issue22385. Eric V. Smith gave the following summary (edited by me for brevity) in https://mail.python.org/pipermail/python-ideas/2014-September/029353.html 1. 
Support exactly what the standard types (int, str, float, etc.) support, but give slightly different semantics to it. 2. Support a slightly different format specifier. The downside of this is that it might be confusing to some users, who see the printf-like formatting as some universal standard. It's also hard to document. 3. Do something radically different. I gave an example on the issue tracker [cited above], but I'm not totally serious about this. My "Just Do It" was mostly ignoring the possibility of Eric's #3 (Eric was even more deprecatory in the issue, saying "although it's insane, you could ..."). I was specifically referring to Eric's and Andrew Barnhart's discussion of potential confusion, Eric saying "if it's different, users used to printf may get confused" and Andrew saying (among other ideas) "if it's too close to the notation for str, it could exacerbate the existing confusion between bytes and str". I don't see the too close/too different issue as something we can decide without implementing it. Perhaps experience with a PyPI module would give guidance, but I'm not optimistic, the kind of user who would use a PyPI module for this feature is atypical, I think. ***** In somewhat more detail, Nick's original proposal (in that issue) follows existing format strings very closely: "x": display a-f as lowercase digits "X": display A-F as uppercase digits "#": includes 0x prefix ".prec": chunks output, placing a space after every bytes ",": uses a comma as the separator, rather than a space Further discussion and examples in https://mail.python.org/pipermail/python-ideas/2014-September/029352.html. There he made a second proposal, rather different: "h": lowercase hex "H": uppercase hex "A": ASCII (using "." 
for unprintable & extended ASCII) format(b"xyz", "A") -> 'xyz' format(b"xyz", "h") -> '78797a' format(b"xyz", "H") -> '78797A' Followed by a separator and "chunk size": format(b"xyz", "h 1") -> '78 79 7a' format(b"abcdwxyz", "h 4") -> '61626364 7778797a' format(b"xyz", "h,1") -> '78,79,7a' format(b"abcdwxyz", "h,4") -> '61626364,7778797a' format(b"xyz", "h:1") -> '78:79:7a' format(b"abcdwxyz", "h:4") -> '61626364:7778797a' In the "h" and "H" cases, you could request a preceding "0x" on the chunks: format(b"xyz", "h#") -> '0x78797a' format(b"xyz", "h# 1") -> '0x78 0x79 0x7a' format(b"abcdwxyz", "h# 4") -> '0x61626364 0x7778797a' Nick was clear that all of the notation in the above is tentative in his mind. The third proposal is from Eric Smith, in https://mail.python.org/pipermail/python-ideas/2014-September/029353.html (already cited above): Here's my proposal for #2: The format specifier becomes: [[fill]align][#][width][separator]][/chunksize][type] From abarnert at yahoo.com Fri Sep 19 17:15:23 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 19 Sep 2014 08:15:23 -0700 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: References: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> <20140919115928.GD9293@ando.pearwood.info> Message-ID: <14231BE5-26AF-45F9-800B-594A030F4DDF@yahoo.com> On Sep 19, 2014, at 5:26, Paul Moore wrote: > On 19 September 2014 12:59, Steven D'Aprano wrote: >> Now obviously there are "best of breed" third party libraries. I don't >> see many people trying to build a better BeautifulSoup. (Maybe they are, >> and I just don't know about them, which demonstrates the problem.) >> That's not necessarily a bad thing. But I think we should be careful of >> putting the thumb on the scales too much. > > Agreed. I hadn't considered the point about discouraging innovation. > So it seems to me that the key things needed are: > > 1. 
Discoverability - making it easy for package authors to ensure > people find their packages. There's a social aspect here as well - > quirky project names like celery or freezegun are a part of the Python > culture, but they do make things less discoverable (what do those two > projects do?). We need a way to balance that. There's more to it. Many of the most important packages are doing something that didn't have a category until they invented it. For example, what is BeautifulSoup exactly? Originally, IIRC, it was intended to be a more lenient HTML parser. But nowadays it doesn't even do the parsing itself; it drives a stdlib or other third-party parser. It's sort-of a better (than what?) DOM interface for HTML and XML, which is something no one realized they needed until it existed. If the Python community had the kind of less-fanciful, easier-to-find, specific names that, say, perl has, what would it be called? HTML::Parsers::TagSoupParser? Also, the boring names tend to mean that, when there _is_ an obvious category, the first entrant stakes out the best name, and everyone else ends up with just less-felicitous variants of the same name. The fact that SleekXMPP and Wokkel replaced Jabberpy as the most popular XMPP libraries means that it's pretty easy for me to recommend SleekXMPP over Jabberpy. In another language, you have to tell people to use libXMPPClient or XMPPLib instead of libXMPP. > 2. A clearer way for projects to declare what they support. Python 2 > or 3 or both? Is Windows supported? And it's not just "yes or no", > often it's somewhere in the middle - for example I know projects that > want to work on Windows, but don't have the expertise, so they'll > accept patches but you have to expect to do some of the work for > them... Classifiers should do this, but use is patchy (particularly > around the supported OS). > 3. Most controversially, some way of rating otherwise-equal packages. 
> Ultimately, I don't want a choice, I just want to install something > and move on. Download counts may be the easiest compromise here (at > least lots of people use the project...) > > As well as the technical aspect of making it possible to provide this > information (much of it, projects *can* provide right now, although > maybe not always as easily as might be ideal) there is a social aspect > - making it so that it's the norm for projects to do so, and not doing > so is viewed as an indication of overall attention to detail on the > project. > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From p.f.moore at gmail.com Fri Sep 19 17:25:41 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 19 Sep 2014 16:25:41 +0100 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: <14231BE5-26AF-45F9-800B-594A030F4DDF@yahoo.com> References: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> <20140919115928.GD9293@ando.pearwood.info> <14231BE5-26AF-45F9-800B-594A030F4DDF@yahoo.com> Message-ID: On 19 September 2014 16:15, Andrew Barnert wrote: > There's more to it. Many of the most important packages are doing something that didn't have a category until they invented it. For example, what is BeautifulSoup exactly? Originally, IIRC, it was intended to be a more lenient HTML parser. But nowadays it doesn't even do the parsing itself; it drives a stdlib or other third-party parser. It's sort-of a better (than what?) DOM interface for HTML and XML, which is something no one realized they needed until it existed. If the Python community had the kind of less-fanciful, easier-to-find, specific names that, say, perl has, what would it be called? HTML::Parsers::TagSoupParser? 
> > Also, the boring names tend to mean that, when there _is_ an obvious category, the first entrant stakes out the best name, and everyone else ends up with just less-felicitous variants of the same name. The fact that SleekXMPP and Wokkel replaced Jabberpy as the most popular XMPP libraries means that it's pretty easy for me to recommend SleekXMPP over Jabberpy. In another language, you have to tell people to use libXMPPClient or XMPPLib instead of libXMPP. I wasn't trying to say that "boring" names were better - far from it, quirky names are part of the Python culture. But there isn't anything stopping Beautiful Soup adding keywords like "html parser scraping" or something like that. The trick is picking keywords that people will search for, which is often hard (and that's probably why the keyword metadata is underused). Having a registry of commonly-used keywords, and a user interface that means you don't have to go online and search when you're writing your app, might help. Think of something like Stack Overflow's auto-completed tags. I don't have any good solutions here, I just think it *should* be possible for the author of Beautiful Soup to make it clear to potential users that it might be what they are looking for when they think of . As somebody said earlier in this thread, think of it as search engine optimisation for projects. 
Paul From abarnert at yahoo.com Fri Sep 19 17:39:11 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 19 Sep 2014 08:39:11 -0700 Subject: [Python-ideas] The stdlib++ user experience In-Reply-To: References: <87wq8zst56.fsf@gmail.com> Message-ID: On Sep 19, 2014, at 5:42, Paul Moore wrote: > Maven may be a central repository - I've never used it myself as the > complexity has always scared me off (you could say that about most of > Java, though ;-)) Don't you just go to the authority factory, ask it to give you an authority, then ask the authority for repository recommender factor, and so on through 8 more steps until you get a general package object that you can cast to the type you need? And the beauty is that, thanks to the magic of static type checking with a rigid but weak type system, if it turns out not to be able to do what you want, instead of a TypeError: BeautifulSoup has no SGML parser that you've never seen before? you get the same convenient NullPointerException from your ISGMLParser as in every other bug you've had so far. :) But seriously, I think it is worth looking at what other languages' package systems have that we might want to be jealous of as far as discovery, especially those (Cargo, go pkg, Node, gems, CocoaPods, etc.) that were designed after CPAN and the Cheeseshop and the related communities existed. I think Java, C++, and C may be the _least_ relevant places to look (although I could definitely be wrong about that). For example, some of the newer languages' systems make it easier for you to fork an existing package, and to tie packaging to DVCS repos, with the consequence that if abarnert/spamify falls by the wayside, it might be more likely pmoore/spamify that replaces it than in Python, where it would more likely be a punny PIL->Pillow type of thing. That seems like a nice feature. Do we want that? Can we get from here to there without radical changes? 
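The division of labour described earlier in the thread - BeautifulSoup offering a friendly DOM-style interface while delegating the actual tokenizing to the stdlib or a third-party parser - can be illustrated with the stdlib layer alone. Below is a minimal sketch using only html.parser; the LinkCollector class is invented for illustration and is not BeautifulSoup's API:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href attributes from <a> tags -- the kind of low-level,
    event-driven parsing that BeautifulSoup wraps in a friendlier API."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

parser = LinkCollector()
parser.feed('<p>See <a href="https://example.com">this page</a>.</p>')
assert parser.links == ["https://example.com"]
```

BeautifulSoup's value is precisely that its users never have to hand-write event handlers like this, which is part of why it resists easy categorisation.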
From donald at stufft.io Fri Sep 19 17:45:38 2014 From: donald at stufft.io (Donald Stufft) Date: Fri, 19 Sep 2014 11:45:38 -0400 Subject: [Python-ideas] The stdlib++ user experience In-Reply-To: References: <87wq8zst56.fsf@gmail.com> Message-ID: > On Sep 19, 2014, at 11:39 AM, Andrew Barnert wrote: > > On Sep 19, 2014, at 5:42, Paul Moore wrote: > >> Maven may be a central repository - I've never used it myself as the >> complexity has always scared me off (you could say that about most of >> Java, though ;-)) > > Don't you just go to the authority factory, ask it to give you an authority, then ask the authority for repository recommender factor, and so on through 8 more steps until you get a general package object that you can cast to the type you need? And the beauty is that, thanks to the magic of static type checking with a rigid but weak type system, if it turns out not to be able to do what you want, instead of a TypeError: BeautifulSoup has no SGML parser that you've never seen before? you get the same convenient NullPointerException from your ISGMLParser as in every other bug you've had so far. :) > > But seriously, I think it is worth looking at what other languages' package systems have that we might want to be jealous of as far as discovery, especially those (Cargo, go pkg, Node, gems, CocoaPods, etc.) that were designed after CPAN and the Cheeseshop and the related communities existed. I think Java, C++, and C may be the _least_ relevant places to look (although I could definitely be wrong about that). > > For example, some of the newer languages' systems make it easier for you to fork an existing package, and to tie packaging to DVCS repos, with the consequence that if abarnert/spamify falls by the wayside, it might be more likely pmoore/spamify that replaces it than in Python, where it would more likely be a punny PIL->Pillow type of thing. That seems like a nice feature. Do we want that? Can we get from here to there without radical changes? 
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ I think at the point we?re well into the territory of what should move to distutils-sig. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Sep 19 17:54:25 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 19 Sep 2014 08:54:25 -0700 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: References: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> <20140919115928.GD9293@ando.pearwood.info> <14231BE5-26AF-45F9-800B-594A030F4DDF@yahoo.com> Message-ID: On Sep 19, 2014, at 8:25, Paul Moore wrote: > On 19 September 2014 16:15, Andrew Barnert wrote: >> There's more to it. Many of the most important packages are doing something that didn't have a category until they invented it. For example, what is BeautifulSoup exactly? Originally, IIRC, it was intended to be a more lenient HTML parser. But nowadays it doesn't even do the parsing itself; it drives a stdlib or other third-party parser. It's sort-of a better (than what?) DOM interface for HTML and XML, which is something no one realized they needed until it existed. If the Python community had the kind of less-fanciful, easier-to-find, specific names that, say, perl has, what would it be called? HTML::Parsers::TagSoupParser? >> >> Also, the boring names tend to mean that, when there _is_ an obvious category, the first entrant stakes out the best name, and everyone else ends up with just less-felicitous variants of the same name. The fact that SleekXMPP and Wokkel replaced Jabberpy as the most popular XMPP libraries means that it's pretty easy for me to recommend SleekXMPP over Jabberpy. 
In another language, you have to tell people to use libXMPPClient or XMPPLib instead of libXMPP. > > I wasn't trying to say that "boring" names were better - far from it, > quirky names are part of the Python culture. But there isn't anything > stopping Beautiful Soup adding keywords like "html parser scraping" or > something like that. The trick is picking keywords that people will > search for, which is often hard (and that's probably why the keyword > metadata is underused). Having a registry of commonly-used keywords, > and a user interface that means you don't have to go online and search > when you're writing your app, might help. Think of something like > Stack Overflow's auto-completed tags. > > I don't have any good solutions here, I just think it *should* be > possible for the author of Beautiful Soup to make it clear to > potential users that it might be what they are looking for when they > think of . As somebody said earlier in this > thread, think of it as search engine optimisation for projects. That's a good point. BeautifulSoup, lxml, requests, Scrapy, Mechanize, and pyv8 certainly don't belong in any sane category together, but they also clearly share in common the fact that they're useful for (different but overlapping kinds of) scraping projects, and a keyword-influenced search (as opposed to keyword browsing) seems like the right way to get that across. The problem is that search tends to work best when, even if you don't know what you're looking for, you'll easily recognize it when you see it. But if I'm searching for scraping, is Mechanize or BeautifulSoup better? That question doesn't just not have an answer, it doesn't even make sense. I pretty much have to look at what they both do to figure out which one fills the hole in my scraping project. Which means that "search is the answer" tends to conflict with the earlier goal from this discussion of "just tell me the best answer, don't make me research it". 
That may just be because that goal is impossible to do well, and making it a lot easier to research by giving you pointers at a few different things that you are likely to want to look into. Or it may mean that "find the most stable implementation of frequently-implemented simple thing X" (like a console-mode progress bar) and "find me good tools to help with complex thing Y" (like scraping web sites) are inherently different problems, and there might be a solution that's _helpful_ for both, but anything that's perfect for one will be useless for the other. From toddrjen at gmail.com Fri Sep 19 18:09:35 2014 From: toddrjen at gmail.com (Todd) Date: Fri, 19 Sep 2014 18:09:35 +0200 Subject: [Python-ideas] The stdlib++ user experience (Was: Introduce `start=1` argument to `math.factorial`) In-Reply-To: References: <23C6FBF1-5FB0-4042-93DA-E0C5DB11FFE6@yahoo.com> <20140919115928.GD9293@ando.pearwood.info> <14231BE5-26AF-45F9-800B-594A030F4DDF@yahoo.com> Message-ID: On Fri, Sep 19, 2014 at 5:25 PM, Paul Moore wrote: > On 19 September 2014 16:15, Andrew Barnert wrote: > > There's more to it. Many of the most important packages are doing > something that didn't have a category until they invented it. For example, > what is BeautifulSoup exactly? Originally, IIRC, it was intended to be a > more lenient HTML parser. But nowadays it doesn't even do the parsing > itself; it drives a stdlib or other third-party parser. It's sort-of a > better (than what?) DOM interface for HTML and XML, which is something no > one realized they needed until it existed. If the Python community had the > kind of less-fanciful, easier-to-find, specific names that, say, perl has, > what would it be called? HTML::Parsers::TagSoupParser? > > > > Also, the boring names tend to mean that, when there _is_ an obvious > category, the first entrant stakes out the best name, and everyone else > ends up with just less-felicitous variants of the same name. 
The fact that > SleekXMPP and Wokkel replaced Jabberpy as the most popular XMPP libraries > means that it's pretty easy for me to recommend SleekXMPP over Jabberpy. In > another language, you have to tell people to use libXMPPClient or XMPPLib > instead of libXMPP. > > I wasn't trying to say that "boring" names were better - far from it, > quirky names are part of the Python culture. But there isn't anything > stopping Beautiful Soup adding keywords like "html parser scraping" or > something like that. The trick is picking keywords that people will > search for, which is often hard (and that's probably why the keyword > metadata is underused). Having a registry of commonly-used keywords, > and a user interface that means you don't have to go online and search > when you're writing your app, might help. Think of something like > Stack Overflow's auto-completed tags. > > Or some sort of keyword aliases, so that if someone searches for one word in the list of aliases, any project with any keyword in the corresponding list of aliases is also matched. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Fri Sep 19 18:58:44 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 19 Sep 2014 12:58:44 -0400 Subject: [Python-ideas] Fwd: Make `float('inf') //1 == float('inf')` In-Reply-To: References: <783e53bd-fda3-4f6a-aa42-69596fd8cb05@googlegroups.com> Message-ID: On Thu, Sep 18, 2014 at 3:35 PM, Guido van Rossum wrote: > > In light of PEP 3141, if anything should be done about float // float it > would be to make it return an int and as a consequence inf // 1 should > raise an OverflowError. > > +1 > I opened http://bugs.python.org/issue22444 for this. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tjreedy at udel.edu Sat Sep 20 05:40:09 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 19 Sep 2014 23:40:09 -0400 Subject: [Python-ideas] Introduce `start=1` argument to `math.factorial` In-Reply-To: <20140918175553.26040a79@fsol> References: <59660dbc-35df-4040-ba97-75b7f1cccfad@googlegroups.com> <20140918041349.GH9293@ando.pearwood.info> <20140918144549.GK9293@ando.pearwood.info> <20140918175553.26040a79@fsol> Message-ID: On 9/18/2014 11:55 AM, Antoine Pitrou wrote: > On Thu, 18 Sep 2014 16:37:23 +0100 > Paul Moore wrote: >> On 18 September 2014 16:23, Petr Viktorin wrote: >>> Listing "stdlib++" projects would mean vouching for them, even if only >>> implicitly. Indeed, let's not get too carried away. >> >> Nevertheless, there is community knowledge "out there" on what >> constitute best of breed packages. For example "everyone knows" that >> requests is the thing to use if you want to issue web requests. > > Is it? That sounds like a caricatural statement. If I'm using Tornado, > Twisted or asyncio, then requests is certainly not "the thing to use" > to issue Web requests. And there are many cases where urlopen() is good > enough, as well. Not to mention other contenders such as pycurl. > >> Collecting that knowledge together somewhere so that people for whom >> the above is *not* self-evident could easily find it, would be a >> worthwhile exercise. > > If it's community knowledge, then surely that job can be done by the > community. I don't think Python's official documentation is the right > place to reify that knowledge. In some cases, perhaps most, the official docs could simply point to community-maintained wiki pages. Web requests seems to be a topic with multiple alternatives, and to me a good candidate for a wiki page. -- Terry Jan Reedy From casevh at gmail.com Sat Sep 20 06:08:22 2014 From: casevh at gmail.com (Case Van Horsen) Date: Fri, 19 Sep 2014 21:08:22 -0700 Subject: [Python-ideas] What math.floor(inf) should return? 
Was: Make `float('inf') //1 == float('inf')` In-Reply-To: References: Message-ID: On Fri, Sep 19, 2014 at 5:44 AM, Mark Dickinson wrote: > On Fri, Sep 19, 2014 at 3:16 AM, Alexander Belopolsky > wrote: >> >> The standards are influenced by the limitation inherent in many languages >> where ints have finite range and cannot represent floor() of many finite >> floating point values. Python does not have this limitation. (Granted - >> PEP 3141 could do a better job explaining why floor, ceil, round, //, etc. >> should return Integer rather than Real.) > > > Indeed. FWIW, I think it was a mistake to change the return type of > math.floor and math.ceil in Python 3. There's no longer any way to spell > the simple, fast, float->float floor operation. It would be nice to have > those basic floating-point math operations accessible again.

It looks like the Scheme standards committee agrees with you. PEP 3141 references: http://groups.csail.mit.edu/mac/ftpdir/scheme-reports/r5rs-html/r5rs_8.html#SEC50

There is a later version that specifically discusses the behavior for +-Inf and NaN: http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-14.html#node_sec_11.7.2

Here is the pertinent section: Although infinities and NaNs are not integer objects, these procedures return an infinity when given an infinity as an argument, and a NaN when given a NaN.

(floor -4.3) ⇒ -5.0
(ceiling -4.3) ⇒ -4.0
(truncate -4.3) ⇒ -4.0
(round -4.3) ⇒ -4.0

(floor 3.5) ⇒ 3.0
(ceiling 3.5) ⇒ 4.0
(truncate 3.5) ⇒ 3.0
(round 3.5) ⇒ 4.0

(round 7/2) ⇒ 4
(round 7) ⇒ 7

(floor +inf.0) ⇒ +inf.0
(ceiling -inf.0) ⇒ -inf.0
(round +nan.0) ⇒ +nan.0

Also note that a floating-point representation is used for the results when the argument is a floating-point number. Scheme does not use the internal representation (IEEE-754 or long int or...) to determine whether a number is an integer or real, but rather its value. Here is an example from a Scheme session:

(integer? 2) ;Value: #t
(integer? 2.0) ;Value: #t
(integer? 2.1) ;Value: #f

(#t and #f are equivalent to True and False.)

I think PEP 3141 incorrectly interpreted the meaning of "integer" to imply the use of an "integer representation", while I think the Scheme standard implies an "integer value"; i.e. x is an "integer" iff x - round(x) == 0.

> > -- > Mark >

From mistersheik at gmail.com Sat Sep 20 17:50:21 2014 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 20 Sep 2014 11:50:21 -0400 Subject: [Python-ideas] keyword for introducing generators In-Reply-To: References: <084223ab-aa29-43e3-b3bd-1af57b07d859@googlegroups.com> <53EFD891.5030602@sotecware.net> Message-ID: Sorry for the late reply. If I mark a function as returning Iterable, the linter can check that I use yield in the function, but if I don't mark the function as returning anything, the linter should also check that I didn't accidentally use yield, which is the time-consuming bug that we're talking about. So this works for me. Best, Neil

On Sat, Aug 16, 2014 at 7:42 PM, Guido van Rossum wrote: > On Sat, Aug 16, 2014 at 3:17 PM, Jonas Wielicki > wrote: > >> On 16.08.2014 23:46, Neil Girdhar wrote: >> > I'm sure this has been suggested before, but I just spent two days trying >> > to figure out why a method wasn't being called, only to find that I'd >> > accidentally pasted a yield into the function. What is the argument >> > against a different keyword for introducing generator functions/methods? >> > >> > If it's backward compatibility, then my suggestion is to have a from >> > __future__ and then make it real in Python 4. >> >> For what it's worth, I know this problem very well, and it can take >> hours to figure out what's wrong. >> > > A linter should be able to figure this out. For example, mypy will insist > that a generator has a return type of Iterable[...]. So maybe you won't > have to wait for Python 4; if the mypy proposal goes forward you will be > able to use type annotations to distinguish generators.
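The failure mode being discussed, and the annotation-based distinction Guido describes, can be sketched at runtime as well. This is an illustrative sketch, not thread code; the function names are made up:

```python
import inspect
from typing import Iterator

results: list = []

def process_items() -> None:
    """Meant to run side effects when called."""
    results.append("worked")   # never executes: see below
    yield                      # the accidentally pasted statement

def count_up(n: int) -> Iterator[int]:
    """Annotated as returning an Iterator, so the yield here is clearly intended."""
    for i in range(n):
        yield i

process_items()  # looks like a normal call, but only builds a generator object
assert results == []                               # the body never ran
assert inspect.isgeneratorfunction(process_items)  # what a checker could flag
assert list(count_up(3)) == [0, 1, 2]
```

A checker that treats "contains yield but is not annotated as returning an iterable" as an error would catch the `process_items` bug immediately, which is the mypy behaviour being described.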
> > -- > --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/5-Qm2od4xQ8/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram.rachum at gmail.com Sat Sep 20 17:18:54 2014 From: ram.rachum at gmail.com (Ram Rachum) Date: Sat, 20 Sep 2014 08:18:54 -0700 (PDT) Subject: [Python-ideas] Putting `blist` into collections module Message-ID: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> Hi everybody, In 2007, this PEP was created that suggested integrating Daniel Stutzbach's blist into Python: http://legacy.python.org/dev/peps/pep-3128/ The PEP was rejected, but Raymond Hettinger made a note that "after a few months, I intend to poll comp.lang.python for BList success stories. If they exist, then I have no problem with inclusion in the collections module." I realize that way more than a few months have passed, but I'd still like to give my input. I really wish that `blist` would be made available in the `collections` module. I'm working on an open-source project right now that needs to use it and I'm really reluctant to include `blist` as a dependency, given that it would basically mean my package wouldn't be pip-installable on Windows machines. Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Sun Sep 21 06:37:39 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 21 Sep 2014 14:37:39 +1000 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> Message-ID: <20140921043739.GD29494@ando.pearwood.info> On Sat, Sep 20, 2014 at 08:18:54AM -0700, Ram Rachum wrote: > Hi everybody, > > In 2007, this PEP was created that suggested integrating Daniel Stutzbach's > blist into Python: http://legacy.python.org/dev/peps/pep-3128/ > > The PEP was rejected, but Raymond Hettinger made a note that "after a few > months, I intend to poll comp.lang.python for BList success stories. If > they exist, then I have no problem with inclusion in the collections > module." I have not used blist, but I have no objection to it becoming a collections type if the author agrees. > I realize that way more than a few months have passed, but I'd still like > to give my input. I really wish that `blist` would be made available in the > `collections` module. I'm working on an open-source project right now that > needs to use it and I'm really reluctant to include `blist` as a > dependency, given that it would basically mean my package wouldn't be > pip-installable on Windows machines. Does it need to be a dependency though? You could make it optional. Assuming that blist has the same API as a list, you could do:

try:
    from blist import blist
except ImportError:
    blist = list

which makes blist an optimization, if and when it is available.
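Put together, the optional-dependency pattern being suggested looks like this; `prepend_all` is a hypothetical example function, not something from the thread:

```python
try:
    from blist import blist  # use the C-accelerated type when installed
except ImportError:
    blist = list             # otherwise fall back to the built-in list

def prepend_all(items):
    """Build a sequence by repeated front-insertion.

    Works identically with either type; blist merely makes the O(n)
    front inserts of a plain list cheaper (roughly O(log n)).
    """
    seq = blist()
    for item in items:
        seq.insert(0, item)
    return seq

assert list(prepend_all(range(3))) == [2, 1, 0]
```

The calling code never needs to know which type it got, which is what makes the fallback safe as long as only the common list API is used.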
-- Steven From dw+python-ideas at hmmz.org Sun Sep 21 07:18:38 2014 From: dw+python-ideas at hmmz.org (David Wilson) Date: Sun, 21 Sep 2014 05:18:38 +0000 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <20140921043739.GD29494@ando.pearwood.info> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> Message-ID: <20140921051838.GA863@k2> On Sun, Sep 21, 2014 at 02:37:39PM +1000, Steven D'Aprano wrote: > > In 2007, this PEP was created that suggested integrating Daniel Stutzbach's > > blist into Python: http://legacy.python.org/dev/peps/pep-3128/ > > > > The PEP was rejected, but Raymond Hettinger made a note that "after a few > > months, I intend to poll comp.lang.python for BList success stories. If > > they exist, then I have no problem with inclusion in the collections > > module." > > I have not used blist, but I have no objection to it becoming a > collections type if the author agrees. It seems unsettling that we'd consider adding another special use collection to the stdlib, when more widely applicable generalizations of it are missing. In blist's case, it can (mostly) be trivially reimplemented using an ordered map, which the standard library lacks. The remainder of blist (IIRC) are some fancy slicing and merging methods that exploit the underlying structure. Even after reviewing the original PEP, the presence of OrderedDict (and particularly under that moniker) feels wrong. Since its addition, in every case I've encountered it in commercial code, the use has been superfluous, diabolically miscomprehended, or used as a hacky stand-in for some cleaner, simpler approach. Coming from this perspective, I'd prefer that further additions were limited to clean and far better understood structures. In this light, could we perhaps instead discuss the merits of a collections.Tree, collections.SortedDict or similar? > which makes blist an optimization, if and when it is available. 
blist has functionality that would require significant boilerplate to replicate in normal code, so it's definitely not just useful as an optimization. David From guido at python.org Sun Sep 21 07:36:11 2014 From: guido at python.org (Guido van Rossum) Date: Sat, 20 Sep 2014 22:36:11 -0700 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <20140921051838.GA863@k2> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> Message-ID: So write a PEP for sorted dict and tree. I rarely seem to need ordereddict myself, but it makes a clean LRU cache. On Saturday, September 20, 2014, David Wilson wrote: > On Sun, Sep 21, 2014 at 02:37:39PM +1000, Steven D'Aprano wrote: > > > > In 2007, this PEP was created that suggested integrating Daniel > Stutzbach's > > > blist into Python: http://legacy.python.org/dev/peps/pep-3128/ > > > > > > The PEP was rejected, but Raymond Hettinger made a note that "after a > few > > > months, I intend to poll comp.lang.python for BList success stories. If > > > they exist, then I have no problem with inclusion in the collections > > > module." > > > > I have not used blist, but I have no objection to it becoming a > > collections type if the author agrees. > > It seems unsettling that we'd consider adding another special use > collection to the stdlib, when more widely applicable generalizations of > it are missing. In blist's case, it can (mostly) be trivially > reimplemented using an ordered map, which the standard library lacks. > The remainder of blist (IIRC) are some fancy slicing and merging methods > that exploit the underlying structure. > > Even after reviewing the original PEP, the presence of OrderedDict (and > particularly under that moniker) feels wrong. 
Since its addition, in > every case I've encountered it in commercial code, the use has been > superfluous, diabolically miscomprehended, or used as a hacky stand-in > for some cleaner, simpler approach. > > Coming from this perspective, I'd prefer that further additions were > limited to clean and far better understood structures. In this light, > could we perhaps instead discuss the merits of a collections.Tree, > collections.SortedDict or similar? > > > > which makes blist an optimization, if and when it is available. > > blist has functionality that would require significant boilerplate to > replicate in normal code, so it's definitely not just useful as an > optimization. > > > David > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Sep 21 07:50:32 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Sep 2014 15:50:32 +1000 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <20140921051838.GA863@k2> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> Message-ID: On 21 September 2014 15:18, David Wilson wrote: > > Even after reviewing the original PEP, the presence of OrderedDict (and > particularly under that moniker) feels wrong. Since its addition, in > every case I've encountered it in commercial code, the use has been > superfluous, diabolically miscomprehended, or used as a hacky stand-in > for some cleaner, simpler approach. The main intended use case for OrderedDict was preserving insertion order, such as when executing code, parsing a JSON file, or maintaining an LRU cache. 
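The LRU-cache use case mentioned above is a good illustration of why insertion order is the useful property; a minimal sketch (the class name is made up, and `functools.lru_cache` is the stdlib way to get this behaviour for function results):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: insertion order doubles as recency order."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)   # evict the least recently used

cache = LRUCache(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # touch "a"; "b" becomes the eviction candidate
cache.put("c", 3)    # evicts "b"
assert cache.get("b") is None
assert cache.get("a") == 1
```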
For many cases involving a *sorted* key, just sorting when necessary is often easier than preserving sort order on every update (not necessarily *faster*, but often fast enough that the extra dependency isn't justified). That's the background any proposal needs to compete against: OrderedDict covers preserving insertion order, while actually sorting the keys or items when the sorted order is needed covers the sorting case. The needle to be threaded to get a "sorted container" into the standard library is clearly explaining *in terms a relatively junior programmer can understand* when you would use it over dict or OrderedDict. In particular, it likely needs to explain the trade-offs between maintaining sort order on insert, and using an unordered container that is sorted for display or serialisation. That justification needs to be in the PEP to justify the initial inclusion, but also in the eventual docs to help folks know when their current problem is the "it" in There Should Be One Obvious Way To Do It for the new type. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan at drees.name Sun Sep 21 11:21:15 2014 From: stefan at drees.name (Stefan Drees) Date: Sun, 21 Sep 2014 11:21:15 +0200 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> Message-ID: <541E988B.4010701@drees.name> On 2014-09-21 07:36 +02:00, Guido van Rossum wrote: > So write a PEP for sorted dict and tree. but let us all read it - and the implementation samples - exactly before even thinking about rejecting it ;-) as in the rejected PEP3128 I read (in the fifth paragraph of http://legacy.python.org/dev/peps/pep-3128/#use-case-trade-offs not counting the list): """ The performance for the LIFO use case could be improved to O(n) time, by caching a pointer to the right-most leaf within the root node.
For lists that do not change size, the common case of sequential access could also be improved to O(n) time via caching in the root node. [...] """ which - not really being on topic - I would rather expect to read: """ The performance for the LIFO use case could be improved to O(1) time, by caching a pointer to the right-most leaf within the root node. For lists that do not change size, the common case of sequential access could also be improved to O(1) time via caching in the root node. [...] """ At least this is what I would consider an enhancement over O(log n) and also expect from a single cached pointer implementation and the like. All the best, Stefan. > I rarely seem to need ordereddict myself, but it makes a clean LRU cache. > > On Saturday, September 20, 2014, David Wilson > wrote: > > On Sun, Sep 21, 2014 at 02:37:39PM +1000, Steven D'Aprano wrote: > > > > In 2007, this PEP was created that suggested integrating Daniel > Stutzbach's > > > blist into Python: http://legacy.python.org/dev/peps/pep-3128/ > > > > > > The PEP was rejected, but Raymond Hettinger made a note that > "after a few > > > months, I intend to poll comp.lang.python for BList success > stories. If > > > they exist, then I have no problem with inclusion in the > collections > > > module." > > > > I have not used blist, but I have no objection to it becoming a > > collections type if the author agrees. > > It seems unsettling that we'd consider adding another special use > collection to the stdlib, when more widely applicable generalizations of > it are missing. In blist's case, it can (mostly) be trivially > reimplemented using an ordered map, which the standard library lacks. > The remainder of blist (IIRC) are some fancy slicing and merging methods > that exploit the underlying structure. > > Even after reviewing the original PEP, the presence of OrderedDict (and > particularly under that moniker) feels wrong.
Since its addition, in > every case I've encountered it in commercial code, the use has been > superfluous, diabolically miscomprehended, or used as a hacky stand-in > for some cleaner, simpler approach. > > Coming from this perspective, I'd prefer that further additions were > limited to clean and far better understood structures. In this light, > could we perhaps instead discuss the merits of a collections.Tree, > collections.SortedDict or similar? > > > > which makes blist an optimization, if and when it is available. > > blist has functionality that would require significant boilerplate to > replicate in normal code, so it's definitely not just useful as an > optimization. > > > David > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > --Guido van Rossum (on iPad) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From solipsis at pitrou.net Sun Sep 21 11:54:17 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Sep 2014 11:54:17 +0200 Subject: [Python-ideas] Putting `blist` into collections module References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> <541E988B.4010701@drees.name> Message-ID: <20140921115417.6ebdb5cc@fsol> On Sun, 21 Sep 2014 11:21:15 +0200 Stefan Drees wrote: > On 2014-09-21 07:36 +02:00, Guido van Rossum wrote: > > So write a PEP for sorted dict and tree. 
> > but let us all read it - and the implementation samples - exactly before > even thinking about rejecting it ;-) as in the rejected PEP3128 I read > (in the fifth paragraph of > http://legacy.python.org/dev/peps/pep-3128/#use-case-trade-offs not > counting the list): > > """ > The performance for the LIFO use case could be improved to O(n) time, by > caching a pointer to the right-most leaf within the root node. For lists > that do not change size, the common case of sequential access could also > be improved to O(n) time via caching in the root node. [...] > """ > > which - not really being on topic - I would rather expect to read: > """ > The performance for the LIFO use case could be improved to O(1) time, by > caching a pointer to the right-most leaf within the root node. For lists > that do not change size, the common case of sequential access could also > be improved to O(1) time via caching in the root node. [...] > """ > > At least this is what I would consider an enhancement over O(log n) and > also expect from a single cached pointer implementation and the like. I suspect O(n) means the total cost of doing n LIFO operations, so O(1) amortized. Also, the LIFO (i.e. stack) use case was important for the prospect of replacing the list type with blist. If blist is merely a new container, the LIFO use case is perfectly satisfied by the list type already. By the way, one should remember the PEP was written years ago. I don't know how much the blist type has changed (the author doesn't seem to provide a changelog, unfortunately), but according to the following benchmarks there doesn't seem to be any remaining performance drawback compared to list: http://stutzbachenterprises.com/performance-blist Regards Antoine.
From solipsis at pitrou.net Sun Sep 21 11:56:40 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Sep 2014 11:56:40 +0200 Subject: [Python-ideas] Putting `blist` into collections module References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> Message-ID: <20140921115640.40b23c35@fsol> On Sun, 21 Sep 2014 15:50:32 +1000 Nick Coghlan wrote: > On 21 September 2014 15:18, David Wilson wrote: > > > > Even after reviewing the original PEP, the presence of OrderedDict (and > > particularly under that moniker) feels wrong. Since its addition, in > > every case I've encountered it in commercial code, the use has been > > superfluous, diabolically miscomprehended, or used as a hacky stand-in > > for some cleaner, simpler approach. > > The main intended use case for OrderedDict was preserving insertion > order, such as when executing code, parsing a JSON file, or > maintaining an LRU cache. > > For many cases involving a *sorted* key, just sorting when necessary > is often easier than preserving sort order on every update (not > necessarily *faster*, but often fast enough that the extra dependency > isn't justified). Except precisely when you need to preserve sort order after every update ;-) Which is at least O(n) using .sort(), but O(log n) using an appropriate structure. That said, I agree with your basic point that OrderedDict is quite useful in itself, and not a poor man's replacement for a binary tree. (what's more, the O(1) lookup behaviour makes it superior to a tree for the many cases where lookup performance is dominant) Regards Antoine.
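The trade-off under discussion (sort when needed vs. keep sorted on every update) can be sketched with the stdlib `bisect` module; this is an illustrative sketch, not the API anyone is proposing, and the class name is made up:

```python
import bisect

class SortedKeyDict:
    """Dict plus a separately maintained sorted key list.

    Lookup stays O(1) via the dict; finding the insertion point is
    O(log n) via bisect, though the list insert itself is an O(n)
    memmove -- a real tree or skip-list container avoids that part.
    """

    def __init__(self):
        self._data = {}
        self._keys = []   # always kept sorted

    def __setitem__(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # preserve order on every update
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def items_in_order(self):
        return [(k, self._data[k]) for k in self._keys]

d = SortedKeyDict()
for k in (3, 1, 2):
    d[k] = str(k)
assert d.items_in_order() == [(1, "1"), (2, "2"), (3, "3")]
```

The alternative being argued for, `sorted(plain_dict.items())` at the point of use, costs O(n log n) only when the ordered view is actually needed, which is why it often wins in practice.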
From solipsis at pitrou.net Sun Sep 21 12:04:02 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Sep 2014 12:04:02 +0200 Subject: [Python-ideas] Putting `blist` into collections module References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> Message-ID: <20140921120402.0aa7ccb3@fsol> On Sun, 21 Sep 2014 05:18:38 +0000 David Wilson wrote: > On Sun, Sep 21, 2014 at 02:37:39PM +1000, Steven D'Aprano wrote: > > > > In 2007, this PEP was created that suggested integrating Daniel Stutzbach's > > > blist into Python: http://legacy.python.org/dev/peps/pep-3128/ > > > > > > The PEP was rejected, but Raymond Hettinger made a note that "after a few > > > months, I intend to poll comp.lang.python for BList success stories. If > > > they exist, then I have no problem with inclusion in the collections > > > module." > > > > I have not used blist, but I have no objection to it becoming a > > collections type if the author agrees. > > It seems unsettling that we'd consider adding another special use > collection to the stdlib, when more widely applicable generalizations of > it are missing. In blist's case, it can (mostly) be trivially > reimplemented using an ordered map, which the standard library lacks. But can it be *efficiently* reimplemented using an ordered map? blist uses chunks of 128 pointers, which makes it almost as memory efficient as a standard list. A traditional ordered map (I suppose you mean some kind of balanced tree) would generally show much more overhead, if only because it has to store the keys explicitly. And also, you now have to do costly key comparisons every time you do a lookup ("costly" because they go through the object layer and its indirections, as opposed to simple integer arithmetic). > Coming from this perspective, I'd prefer that further additions were > limited to clean and far better understood structures. 
In this light, > could we perhaps instead discuss the merits of a collections.Tree, > collections.SortedDict or similar? This sounds like a false dilemma. We could have a collections.blist *and* a collections.Tree. Regards Antoine. From breamoreboy at yahoo.co.uk Sun Sep 21 13:32:34 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sun, 21 Sep 2014 12:32:34 +0100 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> Message-ID: On 20/09/2014 16:18, Ram Rachum wrote: > Hi everybody, > > In 2007, this PEP was created that suggested integrating > Daniel Stutzbach's blist into Python: > http://legacy.python.org/dev/peps/pep-3128/ > > The PEP was rejected, but Raymond Hettinger made a note that "after a > few months, I intend to poll comp.lang.python for BList success stories. > If they exist, then I have no problem with inclusion in the collections > module." > > I realize that way more than a few months have passed, but I'd still > like to give my input. I really wish that `blist` would be made > available in the `collections` module. I'm working on an open-source > project right now that needs to use it and I'm really reluctant to > include `blist` as a dependency, given that it would basically me an my > package wouldn't be pip-installable on Windows machines. > > > Thanks, > Ram. > I think we need one PEP that says "let's include everything from pypi in the stdlib". Pros - saves writing and reiewing lots of PEPs. Cons - maintainance of Python is made fractionally more difficult. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. 
Mark Lawrence From ncoghlan at gmail.com Sun Sep 21 14:36:35 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Sep 2014 22:36:35 +1000 Subject: [Python-ideas] Fwd: Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> Message-ID: And forwarding to the list because Google Groups is still broken as a mailing list mirror. ---------- Forwarded message ---------- From: Nick Coghlan Date: 21 September 2014 22:26 Subject: Re: [Python-ideas] Putting `blist` into collections module To: Ram Rachum Cc: "python-ideas at googlegroups.com" On 21 September 2014 01:18, Ram Rachum wrote: > I realize that way more than a few months have passed, but I'd still like to > give my input. I really wish that `blist` would be made available in the > `collections` module. I'm working on an open-source project right now that > needs to use it and I'm really reluctant to include `blist` as a dependency, > given that it would basically mean my package wouldn't be pip-installable on > Windows machines. I didn't originally notice that your main concern was with installation on Windows. For that particular concern, I suggest filing an RFE with the blist project, requesting that they publish wheel files for Windows. Offering to help in building them would likely be appreciated, as many projects may not have access to systems that allow them to build Windows binaries (alternatively, if Daniel approached the PSF for assistance in providing Windows binaries, we can generally provide some help in situations like that, especially to folks that have already been accepted as CPython core developers). 
"Compilation can be hard on Windows" is no longer a factor that is taken into account when deciding whether or not to add things to the standard library - we're taking other steps to deal with that problem as part of the packaging toolchain, and one of the key ones is allowing publication of binary wheels for Mac OS X and Windows on PyPI. (Further down the track, we'd like to offer a build farm as part of PyPI, but we're still a *long* way from reaching a point where such a proposal can be seriously considered) Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Sun Sep 21 14:53:22 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Sep 2014 14:53:22 +0200 Subject: [Python-ideas] Putting `blist` into collections module References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> Message-ID: <20140921145322.7f1f38c6@fsol> On Sun, 21 Sep 2014 22:36:35 +1000 Nick Coghlan wrote: > > "Compilation can be hard on Windows" is no longer a factor that is > taken into account when deciding whether or not to add things to the > standard library Where is the pronouncement or discussion on that point? > - we're taking other steps to deal with that problem > as part of the packaging toolchain, and one of the key ones is > allowing publication of binary wheels for Mac OS X and Windows on > PyPI. I don't see how that changes anything. The hard (or, at least, tedious) part is not publishing packages, it's building the packages in the first place. "setup.py upload" has always worked fine. Regards Antoine. 
From ncoghlan at gmail.com Sun Sep 21 15:20:14 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Sep 2014 23:20:14 +1000 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <20140921145322.7f1f38c6@fsol> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921145322.7f1f38c6@fsol> Message-ID: On 21 September 2014 22:53, Antoine Pitrou wrote: > > On Sun, 21 Sep 2014 22:36:35 +1000 > Nick Coghlan wrote: >> >> "Compilation can be hard on Windows" is no longer a factor that is >> taken into account when deciding whether or not to add things to the >> standard library > > Where is the pronouncement or discussion on that point? "no longer taken into account" was too strong - "has significantly less weight due to the existence of other options" is more accurate. It's still part of the rationale for the ssl feature backports, for example. >> - we're taking other steps to deal with that problem >> as part of the packaging toolchain, and one of the key ones is >> allowing publication of binary wheels for Mac OS X and Windows on >> PyPI. > > I don't see how that changes anything. The hard (or, at least, tedious) > part is not publishing packages, it's building the packages in the first > place. "setup.py upload" has always worked fine. Yep, one of the later steps is a build farm integrated into PyPI, so you can just upload the tarball in most cases and you're done. We're still a *long* way from having that be feasible at this point, but we'll get there eventually. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From graffatcolmingov at gmail.com Sun Sep 21 15:27:45 2014 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Sun, 21 Sep 2014 08:27:45 -0500 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921145322.7f1f38c6@fsol> Message-ID: On Sun, Sep 21, 2014 at 8:20 AM, Nick Coghlan wrote: > On 21 September 2014 22:53, Antoine Pitrou wrote: >> >> On Sun, 21 Sep 2014 22:36:35 +1000 >> Nick Coghlan wrote: >>> >>> "Compilation can be hard on Windows" is no longer a factor that is >>> taken into account when deciding whether or not to add things to the >>> standard library >> >> Where is the pronouncement or discussion on that point? > > "no longer taken into acount" was too strong - "has significantly less > weight due to the existence of other options" is more accurate. It's > still part of the rationale for the ssl feature backports, for > example. > >>> - we're taking other steps to deal with that problem >>> as part of the packaging toolchain, and one of the key ones is >>> allowing publication of binary wheels for Mac OS X and Windows on >>> PyPI. >> >> I don't see how that changes anything. The hard (or, at least, tedious) >> part is not publishing packages, it's building the packages in the first >> place. "setup.py upload" has always worked fine. > > Yep, one of the later steps is a build farm integrated into PyPI, so > you can just upload the tarball in most cases and you're done. We' re > still a *long* way from having that be feasible at this point, but > we'll get there eventually. > > Cheers, > Nick. There's also an option that's free for Open Source that I've been looking at for some Ruby projects I maintain. AppVeyor [1] is a continuous integration system that integrates well with services like GitHub and BitBucket and will build wheels for Python projects once they've passed tests. 
This may be a good solution until PyPI can produce a build farm. A quick search of GitHub shows that this seems to be picking up momentum in the Python community more than others. [1]: http://www.appveyor.com/ From ncoghlan at gmail.com Sun Sep 21 15:47:05 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Sep 2014 23:47:05 +1000 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921145322.7f1f38c6@fsol> Message-ID: On 21 September 2014 23:27, Ian Cordasco wrote: > There's also an option that's free for Open Source that I've been > looking at for some Ruby projects I maintain. AppVeyor [1] is a > continuous integration system that integrates well with services like > GitHub and BitBucket and will build wheels for Python projects once > they've passed tests. This may be a good solution until PyPI can > produce a build farm. Oh, that's very cool - yes, I'll definitely recommend it to folks now I'm aware of it :) > A quick search of GitHub shows that this seems to be picking up > momentum in the Python community more than others. If it means more wheel files on PyPI, even before we're able to put together a build farm, great :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Sun Sep 21 22:54:53 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 21 Sep 2014 21:54:53 +0100 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921145322.7f1f38c6@fsol> Message-ID: On 21 September 2014 14:47, Nick Coghlan wrote: > On 21 September 2014 23:27, Ian Cordasco wrote: >> There's also an option that's free for Open Source that I've been >> looking at for some Ruby projects I maintain. 
AppVeyor [1] is a >> continuous integration system that integrates well with services like >> GitHub and BitBucket and will build wheels for Python projects once >> they've passed tests. This may be a good solution until PyPI can >> produce a build farm. > > Oh, that's very cool - yes, I'll definitely recommend it to folks now > I'm aware of it :) That's a *very* good point. I was aware of AppVeyor as a CI tool, I'd thought of it as essentially "Travis for Windows" but it had never occurred to me that it would work for building wheels as well. I may try to put together a "How to set up AppVeyor to build wheels for your project" document - Ian, do you have any examples of projects doing this, that I could look to for details? Paul From rymg19 at gmail.com Sun Sep 21 23:47:46 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sun, 21 Sep 2014 16:47:46 -0500 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921145322.7f1f38c6@fsol> Message-ID: I'm guessing the appveyor.yml file might look like this: install: - cinst python - cinst pip - pip install wheel build: off # It's Python; no building allowed! test_script: - py.test # or whatever to run tests deploy_script: - python setup.py sdist bdist_wheel upload On Sun, Sep 21, 2014 at 3:54 PM, Paul Moore wrote: > On 21 September 2014 14:47, Nick Coghlan wrote: > > On 21 September 2014 23:27, Ian Cordasco > wrote: > >> There's also an option that's free for Open Source that I've been > >> looking at for some Ruby projects I maintain. AppVeyor [1] is a > >> continuous integration system that integrates well with services like > >> GitHub and BitBucket and will build wheels for Python projects once > >> they've passed tests. This may be a good solution until PyPI can > >> produce a build farm. > > > > Oh, that's very cool - yes, I'll definitely recommend it to folks now > > I'm aware of it :) > > That's a *very* good point. 
I was aware of AppVeyor as a CI tool, I'd > thought of it as essentially "Travis for Windows" but it had never > occurred to me that it would work for building wheels as well. > > I may try to put together a "How to set up AppVeyor to build wheels > for your project" document - Ian, do you have any examples of projects > doing this, that I could look to for details? > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Ryan If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated." Personal reality distortion fields are immune to contradictory evidence. - srean Check out my website: http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Sep 22 00:16:45 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 21 Sep 2014 23:16:45 +0100 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921145322.7f1f38c6@fsol> Message-ID: On 21 September 2014 22:47, Ryan Gonzalez wrote: > I'm guessing the appveyor.yml file might look like this: > > install: > - cinst python > - cinst pip > - pip install wheel > > build: off # It's Python; no building allowed! > > test_script: > - py.test # or whatever to run tests > > deploy_script: > - python setup.py sdist bdist_wheel upload The one I'm working from (cookiecutter) is more complex - essentially because it manually installs Python etc. I'd not seen cinst before, but from a quick search I see that's chocolatey. So yes, something like that. Can you specify which version of Python cinst installs? 
You'd actually want to make sure you had all the versions of Python you supported installed. Also, you probably couldn't do the upload in deploy_script unless you were willing to store your credentials in AppVeyor. But essentially, it's not hard to set up, AFAICT. Paul From p.f.moore at gmail.com Mon Sep 22 00:18:23 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 21 Sep 2014 23:18:23 +0100 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921145322.7f1f38c6@fsol> Message-ID: On 21 September 2014 23:16, Paul Moore wrote: > On 21 September 2014 22:47, Ryan Gonzalez wrote: >> I'm guessing the appveyor.yml file might look like this: >> >> install: >> - cinst python >> - cinst pip >> - pip install wheel >> >> build: off # It's Python; no building allowed! >> >> test_script: >> - py.test # or whatever to run tests >> >> deploy_script: >> - python setup.py sdist bdist_wheel upload > > The one I'm working from (cookiecutter) is more complex - essentially > because it manually installs Python etc. I'd not seen cinst before, > but from a quick search I see that's chocolatey. So yes, something > like that. Can you specify which version of Python cinst installs? > You'd actually want to make sure you had all the versions of Python > you supported installed. > > Also, you probably couldn't do the upload in deploy_script unless you > were willing to store your credentials in AppVeyor. > > But essentially, it's not hard to set up, AFAICT. > Paul Also, I've yet to work out how to get AppVeyor to do 32-bit and 64-bit builds. But I'm looking into it... 
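Covering all the supported interpreter versions, as above, is what AppVeyor's environment matrix is for — a hedged sketch of how that might look (untested; assumes the interpreters are preinstalled at the conventional C:/PythonXY locations, and the "supported versions" chosen here are illustrative):

```yaml
# Hypothetical appveyor.yml sketch: run the whole pipeline once per
# interpreter listed in the matrix, then collect each built wheel.
environment:
  matrix:
    - PYTHON: "C:/Python27"
    - PYTHON: "C:/Python33"
    - PYTHON: "C:/Python34"

install:
  - "%PYTHON%/python.exe -m pip install wheel"

build: off

test_script:
  - "%PYTHON%/python.exe setup.py test"

after_test:
  - "%PYTHON%/python.exe setup.py bdist_wheel"

artifacts:
  - path: dist\*.whl
```

The matrix means one job per PYTHON value; the artifacts section makes the wheels downloadable from the build page, which sidesteps the credentials-in-CI problem for upload.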
Paul From rymg19 at gmail.com Mon Sep 22 00:24:04 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sun, 21 Sep 2014 17:24:04 -0500 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921145322.7f1f38c6@fsol> Message-ID: After more testing, my code doesn't work. AppVeyor seems to come with Python 3.4 and 2.7 installed in their usual locations on drive C. I'm testing it out at https://github.com/kirbyfan64/appveyor_python. My new file looks like this: install: - ps: (new-object net.webclient).DownloadFile('https://raw.github.com/pypa/pip/master/contrib/get-pip.py', 'C:/get-pip.py') # fetch get-pip.py - C:/Python34/python.exe C:/get-pip.py # run it to install pip - C:/Python34/Scripts/pip.exe install wheel # install wheel build_script: - python setup.py build # or whatever test_script: - py.test # or something else deploy_script: - python setup.py sdist bdist_wheel upload The build hasn't completed, though; I'm still waiting for the results. I keep making dumb mistakes with pytest. Status is at https://ci.appveyor.com/project/kirbyfan64/appveyor-python. On Sun, Sep 21, 2014 at 5:16 PM, Paul Moore wrote: > On 21 September 2014 22:47, Ryan Gonzalez wrote: > > I'm guessing the appveyor.yml file might look like this: > > > > install: > > - cinst python > > - cinst pip > > - pip install wheel > > > > build: off # It's Python; no building allowed! > > > > test_script: > > - py.test # or whatever to run tests > > > > deploy_script: > > - python setup.py sdist bdist_wheel upload > > The one I'm working from (cookiecutter) is more complex - essentially > because it manually installs Python etc. I'd not seen cinst before, > but from a quick search I see that's chocolatey. So yes, something > like that. Can you specify which version of Python cinst installs? > You'd actually want to make sure you had all the versions of Python > you supported installed.
> > Also, you probably couldn't do the upload in deploy_script unless you > were willing to store your credentials in AppVeyor. > > But essentially, it's not hard to set up, AFAICT. > Paul > -- Ryan If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated." Personal reality distortion fields are immune to contradictory evidence. - srean Check out my website: http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Mon Sep 22 00:50:02 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sun, 21 Sep 2014 17:50:02 -0500 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921145322.7f1f38c6@fsol> Message-ID: Also, you might find this interesting: http://kirbyfan64.github.io/posts/using-appveyor-to-distribute-python-wheels.html . On Sun, Sep 21, 2014 at 5:16 PM, Paul Moore wrote: > On 21 September 2014 22:47, Ryan Gonzalez wrote: > > I'm guessing the appveyor.yml file might look like this: > > > > install: > > - cinst python > > - cinst pip > > - pip install wheel > > > > build: off # It's Python; no building allowed! > > > > test_script: > > - py.test # or whatever to run tests > > > > deploy_script: > > - python setup.py sdist bdist_wheel upload > > The one I'm working from (cookiecutter) is more complex - essentially > because it manually installs Python etc. I'd not seen cinst before, > but from a quick search I see that's chocolatey. So yes, something > like that. Can you specify which version of Python cinst installs? > You'd actually want to make sure you had all the versions of Python > you supported installed. > > Also, you probably couldn't do the upload in deploy_script unless you > were willing to store your credentials in AppVeyor. 
> > But essentially, it's not hard to set up, AFAICT. > Paul > -- Ryan If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated." Personal reality distortion fields are immune to contradictory evidence. - srean Check out my website: http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmcgibbo at gmail.com Mon Sep 22 02:05:40 2014 From: rmcgibbo at gmail.com (Robert McGibbon) Date: Sun, 21 Sep 2014 17:05:40 -0700 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921145322.7f1f38c6@fsol> Message-ID: The following python appveyor examples might be useful: - https://github.com/rmcgibbo/python-appveyor-conda-example - https://github.com/ogrisel/python-appveyor-demo -Robert On Sun, Sep 21, 2014 at 3:50 PM, Ryan Gonzalez wrote: > Also, you might find this interesting: > http://kirbyfan64.github.io/posts/using-appveyor-to-distribute-python-wheels.html > . > > On Sun, Sep 21, 2014 at 5:16 PM, Paul Moore wrote: > >> On 21 September 2014 22:47, Ryan Gonzalez wrote: >> > I'm guessing the appveyor.yml file might look like this: >> > >> > install: >> > - cinst python >> > - cinst pip >> > - pip install wheel >> > >> > build: off # It's Python; no building allowed! >> > >> > test_script: >> > - py.test # or whatever to run tests >> > >> > deploy_script: >> > - python setup.py sdist bdist_wheel upload >> >> The one I'm working from (cookiecutter) is more complex - essentially >> because it manually installs Python etc. I'd not seen cinst before, >> but from a quick search I see that's chocolatey. So yes, something >> like that. Can you specify which version of Python cinst installs? >> You'd actually want to make sure you had all the versions of Python >> you supported installed. 
>> >> Also, you probably couldn't do the upload in deploy_script unless you >> were willing to store your credentials in AppVeyor. >> >> But essentially, it's not hard to set up, AFAICT. >> Paul >> > > > > -- > Ryan > If anybody ever asks me why I prefer C++ to C, my answer will be simple: > "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was > nul-terminated." > Personal reality distortion fields are immune to contradictory evidence. - > srean > Check out my website: http://kirbyfan64.github.io/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From grant.jenks at gmail.com Mon Sep 22 02:30:06 2014 From: grant.jenks at gmail.com (Grant Jenks) Date: Sun, 21 Sep 2014 17:30:06 -0700 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <20140921120402.0aa7ccb3@fsol> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> <20140921120402.0aa7ccb3@fsol> Message-ID: Long time lurker, first time poster. I think there may be multiple discussions happening here so I wanted to highlight a competing module. I think blist.blist is an excellent data type with a lot of value. But the performance graphs can be a bit misleading if you think they apply to the sortedlist, sorteddict, and sortedset types. In those scenarios, I do not believe blist is the best choice. The SortedContainers module (https://pypi.python.org/pypi/sortedcontainers) provides SortedList, SortedDict, and SortedSet data types. It is implemented in pure-Python, has 100% coverage and hours of stress testing. 
The API implemented is very close to blist's and a lot of effort has been put into documentation (http://www.grantjenks.com/docs/sortedcontainers/). Furthermore, the data types provided are often faster than their blist counterparts. Extensive performance comparisons against other implementations of sorted list, sorted dict, and sorted set types are documented (http://www.grantjenks.com/docs/sortedcontainers/performance.html) along with a comparison of runtimes and load-factors (similar to blist, sortedcontainers uses a modified B-tree but with a tunable node size.) For SortedList, an analysis of use cases on Github has been made. These describe five use cases with performance analysis (http://www.grantjenks.com/docs/sortedcontainers/performance-workload.html). Disclaimer: I am the author of the Python SortedContainers project. Feedback welcome. Grant Jenks On Sun, Sep 21, 2014 at 3:04 AM, Antoine Pitrou wrote: > On Sun, 21 Sep 2014 05:18:38 +0000 > David Wilson wrote: > > On Sun, Sep 21, 2014 at 02:37:39PM +1000, Steven D'Aprano wrote: > > > > > > In 2007, this PEP was created that suggested integrating Daniel Stutzbach's > > > > blist into Python: http://legacy.python.org/dev/peps/pep-3128/ > > > > > > > > The PEP was rejected, but Raymond Hettinger made a note that "after a few > > > > months, I intend to poll comp.lang.python for BList success stories. If > > > > they exist, then I have no problem with inclusion in the collections > > > > module." > > > > > > I have not used blist, but I have no objection to it becoming a > > > collections type if the author agrees. > > > > It seems unsettling that we'd consider adding another special use > > collection to the stdlib, when more widely applicable generalizations of > > it are missing. In blist's case, it can (mostly) be trivially > > reimplemented using an ordered map, which the standard library lacks. > > But can it be *efficiently* reimplemented using an ordered map?
> blist uses chunks of 128 pointers, which makes it almost as memory > efficient as a standard list. A traditional ordered map (I suppose you > mean some kind of balanced tree) would generally show much more > overhead, if only because it has to store the keys explicitly. And also, > you now have to do costly key comparisons every time you do a lookup > ("costly" because they go through the object layer and its > indirections, as opposed to simple integer arithmetic). > > > Coming from this perspective, I'd prefer that further additions were > > limited to clean and far better understood structures. In this light, > > could we perhaps instead discuss the merits of a collections.Tree, > > collections.SortedDict or similar? > > This sounds like a false dilemma. We could have a collections.blist > *and* a collections.Tree. > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Sep 22 03:32:43 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 21 Sep 2014 18:32:43 -0700 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> <20140921120402.0aa7ccb3@fsol> Message-ID: <890F1132-817E-4FEE-AD36-A8DA24F59BED@yahoo.com> On Sep 21, 2014, at 17:30, Grant Jenks wrote: > Long time lurker, first time poster. I think there may be multiple discussions happening here so I wanted to highlight a competing module. > > I think blist.blist is an excellent data type with a lot of value. But the performance graphs can be a bit misleading if you think they apply to the sortedlist, sorteddict, and sortedset types. 
In those scenarios, I do not believe blist is the best choice. > > The SortedContainers module (https://pypi.python.org/pypi/sortedcontainers) provides SortedList, SortedDict, and SortedSet data types. It is implemented in pure-Python, has 100% coverage and hours of stress testing. The API implemented is very close to blist's and a lot of effort has been put into documentation (http://www.grantjenks.com/docs/sortedcontainers/). Furthermore, the data types provided are often faster than their blist counterparts. Honestly, this is exactly why I think, despite what I've said in the past, maybe we don't need SortedDict, etc. in the stdlib. The API is something nontrivial, but arguably with a single right answer (and half the implementations I've seen on PyPI get either SortedSet or SortedList wrong), so that makes perfect sense to belong in the stdlib, in collections.abc. But the implementation, there are multiple right answers: B-tree, binary tree, skip list, array that sorts after every insert or before every lookup after an insert, same plus hash... Which one is "best" depends entirely on your data and how you intend to use them. The fact most other languages out there go with red-black trees, but it's hard to find a Python application where those actually perform best, seems like further argument against picking a best solution. Also, making people glance at the docs and see that they have to "pip install SortedCollections" seems like just enough hurdle to slow down the kind of people who have no idea why they're using it beyond "I saw some Java code that used a sorted map", without being a serious roadblock for anyone who actually does need it. 
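Of the implementation strategies listed above, the simplest — an array kept sorted with binary-search insertion — can be sketched with nothing but the stdlib bisect module (a toy illustration of the approach, not any of the libraries mentioned):

```python
import bisect

class ToySortedList:
    """Toy 'array that stays sorted': O(log n) search, O(n) insert."""

    def __init__(self, iterable=()):
        self._items = sorted(iterable)

    def add(self, value):
        bisect.insort(self._items, value)  # binary search + list insert

    def __contains__(self, value):
        i = bisect.bisect_left(self._items, value)
        return i < len(self._items) and self._items[i] == value

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

sl = ToySortedList([5, 1, 3])
sl.add(2)
print(list(sl._items))    # [1, 2, 3, 5]
print(3 in sl, 4 in sl)   # True False
```

This is the "sort on insert" point in the design space; B-trees and skip lists exist to avoid the O(n) memory move on insert, which is exactly where the trade-offs between the PyPI packages differ.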
So, unless Nick's stdlib++ idea is shot down, I think the best thing to do would be: Add the ABCs, then link to a few different libraries on PyPI--blist, SortedCollections, bintrees, the one I forget the name of that wraps up the sort-an-OrderedDict-on-the-fly recipe that the docs already link to, while encouraging the authors of those packages to implement the APIs and to provide wheels if they have any extension modules. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Sep 22 09:37:45 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 22 Sep 2014 08:37:45 +0100 Subject: [Python-ideas] Building wheels on AppVeyor (was: Putting `blist` into collections module) Message-ID: On 21 September 2014 23:18, Paul Moore wrote: > Also, I've yet to work out how to get AppVeyor to do 32-bit and 64-bit > builds. But I'm looking into it... AppVeyor has Python 2.7, 3.3 and 3.4 32-bit, and Visual Studio Express, installed on builders by default. I've put in a request to get the 64-bit versions of Python installed as well, and the details added to the "List of installed software" page (http://www.appveyor.com/docs/installed-software). They seem very responsive to such requests. Paul From stutzbach at google.com Mon Sep 22 18:49:13 2014 From: stutzbach at google.com (Daniel Stutzbach) Date: Mon, 22 Sep 2014 09:49:13 -0700 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <20140921043739.GD29494@ando.pearwood.info> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> Message-ID: On Sat, Sep 20, 2014 at 9:37 PM, Steven D'Aprano wrote: > I have not used blist, but I have no objection to it becoming a > collections type if the author agrees. > While I would be delighted to see blist in the collections package, I worry that it would create a burden on other implementations of Python. 
I did make a prototype of blist in Python before I wrote the C implementation, but the performance of the Python version was abysmal in practice. > which makes blist an optimization, if and when it is available. > If someone uses blist, it usually means they're relying on the better asymptotic performance for certain operations, and the code will be unacceptably slow when using list. -- Daniel Stutzbach -------------- next part -------------- An HTML attachment was scrubbed... URL: From stutzbach at google.com Mon Sep 22 19:07:16 2014 From: stutzbach at google.com (Daniel Stutzbach) Date: Mon, 22 Sep 2014 10:07:16 -0700 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <20140921115417.6ebdb5cc@fsol> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> <541E988B.4010701@drees.name> <20140921115417.6ebdb5cc@fsol> Message-ID: On Sun, Sep 21, 2014 at 2:54 AM, Antoine Pitrou wrote: > By the way, one should remember the PEP was written years ago. I don't > know how much the blist type has changed From memory, below are the major changes that were made after the PEP was written. Functionality: - Added sorteddict, sortedlist, sortedset, weaksortedlist, weaksortedset types. - Added a btuple type (works like a regular tuple but slices use copy-on-write). Performance: - Added a cache to find the leaf nodes in one step for code that's dominated by __getitem__ operations. - When sorting, switch to a O(n) radix sort if all of the keys are C integers or floats. - Lots of fine-tuning. > (the author doesn't seem to provide a changelog, unfortunately), but The changelog is at: https://github.com/DanielStutzbach/blist/commits/master > according to the following > benchmarks there doesn't seem to be any remaining performance drawback > compared to list: > > http://stutzbachenterprises.com/performance-blist Microbenchmarks should always be taken with a grain of salt.
:-) I was hoping http://speed.python.org/ would create an opportunity to make some macrobenchmark comparisons, but the project stalled. -- Daniel Stutzbach -------------- next part -------------- An HTML attachment was scrubbed... URL: From stutzbach at google.com Mon Sep 22 19:15:08 2014 From: stutzbach at google.com (Daniel Stutzbach) Date: Mon, 22 Sep 2014 10:15:08 -0700 Subject: [Python-ideas] Fwd: Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> Message-ID: On Sun, Sep 21, 2014 at 5:36 AM, Nick Coghlan wrote: > Offering to help in building them would likely be > appreciated, as many projects may not have access to systems that > allow them to build Windows binaries (alternatively, if Daniel > approached the PSF for assistance in providing Windows binaries, we > can generally provide some help in situations like that, especially to > folks that have already been accepted as CPython core developers). > Help building Windows binaries would definitely be appreciated. The only Windows machine in my life runs an entertainment center and isn't set up to build anything. Who should I contact at the PSF? -- Daniel Stutzbach -------------- next part -------------- An HTML attachment was scrubbed...
URL: From grant.jenks at gmail.com Mon Sep 22 20:20:33 2014 From: grant.jenks at gmail.com (Grant Jenks) Date: Mon, 22 Sep 2014 11:20:33 -0700 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <890F1132-817E-4FEE-AD36-A8DA24F59BED@yahoo.com> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> <20140921120402.0aa7ccb3@fsol> <890F1132-817E-4FEE-AD36-A8DA24F59BED@yahoo.com> Message-ID: On Sun, Sep 21, 2014 at 6:32 PM, Andrew Barnert wrote: > The API is something nontrivial, but arguably with a single right answer > (and half the implementations I've seen on PyPI get either SortedSet or > SortedList wrong), so that makes perfect sense to belong in the stdlib, in > collections.abc. +1 It would give the dozen or so implementations an API under which to unify and it would make performance comparisons easier for end-users if they could just drop-in an alternative. Also, I think when Andrew refers to SortedCollections, he means SortedContainers. Unless that was a kind of placeholder name. Grant -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Mon Sep 22 20:48:02 2014 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 22 Sep 2014 14:48:02 -0400 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <20140921051838.GA863@k2> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> Message-ID: <1411411682.4169904.170444585.208F39AC@webmail.messagingengine.com> On Sun, Sep 21, 2014, at 01:18, David Wilson wrote: > Coming from this perspective, I'd prefer that further additions were > limited to clean and far better understood structures. In this light, > could we perhaps instead discuss the merits of a collections.Tree, > collections.SortedDict or similar? 
As I understand it, the problem with a tree, SortedDict/SortedSet, or in general any collection that relies on comparison relationships (heap, etc), is: unlike hashing (where Hashable implies an immutable hash relationship), there is no way to detect whether an object implements an immutable well-defined ordering. And that, unlike a mutable hashed object (which can AIUI only lose itself), a mutable sorted object (or a badly-behaved one like NaN) can cause other objects in the set to be inserted in the wrong place or not found. From guido at python.org Mon Sep 22 20:54:03 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Sep 2014 11:54:03 -0700 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <1411411682.4169904.170444585.208F39AC@webmail.messagingengine.com> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> <1411411682.4169904.170444585.208F39AC@webmail.messagingengine.com> Message-ID: On Mon, Sep 22, 2014 at 11:48 AM, wrote: > On Sun, Sep 21, 2014, at 01:18, David Wilson wrote: > > Coming from this perspective, I'd prefer that further additions were > > limited to clean and far better understood structures. In this light, > > could we perhaps instead discuss the merits of a collections.Tree, > > collections.SortedDict or similar? > > As I understand it, the problem with a tree, SortedDict/SortedSet, or in > general any collection that relies on comparison relationships (heap, > etc), is: unlike hashing (where Hashable implies an immutable hash > relationship), there is no way to detect whether an object implements an > immutable well-defined ordering. And that, unlike a mutable hashed > object (which can AIUI only lose itself), a mutable sorted object (or a > badly-behaved one like NaN) can cause other objects in the set to be > inserted in the wrong place or not found. > A good point, too often overlooked. 
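The failure mode described above can be reproduced in a few lines of stdlib code — a contrived Box class (made up purely for illustration) with value-based ordering, kept in a list that bisect assumes stays sorted:

```python
import bisect

class Box:
    """Mutable object with value-based ordering: a sorted-container hazard."""
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        return self.value < other.value
    def __eq__(self, other):
        return self.value == other.value

items = []
for v in (10, 20, 30):
    bisect.insort(items, Box(v))   # invariant: items stays sorted

items[0].value = 99                # mutate in place: order is now 99, 20, 30

# A linear scan (which only uses ==) still finds the element...
print(Box(20) in items)            # True
# ...but a binary search, which trusts the broken sorted invariant, misses it:
i = bisect.bisect_left(items, Box(20))
print(i < len(items) and items[i] == Box(20))  # False
```

The element is still physically present; it is the other objects' positions relative to it that have become lies, which is exactly why the damage is not confined to the mutated object the way a bad __hash__ confines it.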
We could introduce a convention requiring __hash__() even though it is not used by the implementation. After all, if the object is immutable when it comes to a well-defined ordering, it is immutable when it comes to equality. I don't really want to think about classes that are immutable when it comes to == but not when it comes to <; that seems terrible. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From dw+python-ideas at hmmz.org Mon Sep 22 21:19:45 2014 From: dw+python-ideas at hmmz.org (David Wilson) Date: Mon, 22 Sep 2014 19:19:45 +0000 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> <1411411682.4169904.170444585.208F39AC@webmail.messagingengine.com> Message-ID: <20140922191945.GA6365@k2> On Mon, Sep 22, 2014 at 11:54:03AM -0700, Guido van Rossum wrote: > On Mon, Sep 22, 2014 at 11:48 AM, wrote: > As I understand it, the problem with a tree, SortedDict/SortedSet, or in > general any collection that relies on comparison relationships (heap, > etc), is: unlike hashing (where Hashable implies an immutable hash > relationship), there is no way to detect whether an object implements an > immutable well-defined ordering. And that, unlike a mutable hashed > object (which can AIUI only lose itself), a mutable sorted object (or a > badly-behaved one like NaN) can cause other objects in the set to be > inserted in the wrong place or not found. > A good point, too often overlooked. Another concern is structures that rely on comparison could become performance hazards as they are mixed with user subclasses that perform nontrivial work in their __lt__ methods, and suchlike. That's a potentially undesirable trait for a built-in type. 
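The cost concern above is easy to quantify: instrument the comparison and count how often a sorted insert calls it (stdlib-only sketch; Key is an invented class for illustration):

```python
import bisect

calls = 0

class Key:
    """Instrumented key: counts how often __lt__ is invoked."""
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        global calls
        calls += 1
        return self.v < other.v

items = [Key(i) for i in range(1024)]   # sorted by construction
bisect.insort(items, Key(500))
print(calls)   # ~10 comparisons: one __lt__ per halving step, O(log n)
```

The count grows only logarithmically, but a nontrivial __lt__ multiplies that constant on every insert and lookup — the container amplifies whatever cost the element's comparison already carries.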
Revisiting my previous concern for blist, I realize it can't be separated from any hypothetical Tree or SortedDict or whatever else -- it's not possible to prevent every new addition to the stdlib from becoming an "attractive nuisance". A SortedDict would be as open to misuse as OrderedDict is today, and there would always be calls for a blist literal, etc. My only thought would be to make the interface to such classes sufficiently weird as to ward the unwary away, i.e. not to support the dict interface. Grant's reply (accidentally?) went furthest in convincing me there is sufficient variety in the structures available that fulfill this need, that I'm probably unqualified to comment on which (if any) are appropriate for inclusion in the stdlib. David From guido at python.org Mon Sep 22 21:28:37 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Sep 2014 12:28:37 -0700 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <20140922191945.GA6365@k2> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> <1411411682.4169904.170444585.208F39AC@webmail.messagingengine.com> <20140922191945.GA6365@k2> Message-ID: On Mon, Sep 22, 2014 at 12:19 PM, David Wilson wrote: > Another concern is structures that rely on comparison could become > performance hazards as they are mixed with user subclasses that perform > nontrivial work in their __lt__ methods, and suchlike. That's a > potentially undesirable trait for a built-in type. > I don't get this objection. If a user-defined subclass has slow comparisons then the container containing it becomes slow. You can't blame that on the (fast) base class nor on the container type. The same applies to ==: if I write a str subclass that overrides __eq__ with a slower version then dict lookups become slower. The solution is simple: don't do that.
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Sep 23 00:20:56 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 22 Sep 2014 15:20:56 -0700 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> <1411411682.4169904.170444585.208F39AC@webmail.messagingengine.com> Message-ID: On Sep 22, 2014, at 11:54, Guido van Rossum wrote: > On Mon, Sep 22, 2014 at 11:48 AM, wrote: >> On Sun, Sep 21, 2014, at 01:18, David Wilson wrote: >> > Coming from this perspective, I'd prefer that further additions were >> > limited to clean and far better understood structures. In this light, >> > could we perhaps instead discuss the merits of a collections.Tree, >> > collections.SortedDict or similar? >> >> As I understand it, the problem with a tree, SortedDict/SortedSet, or in >> general any collection that relies on comparison relationships (heap, >> etc), is: unlike hashing (where Hashable implies an immutable hash >> relationship), there is no way to detect whether an object implements an >> immutable well-defined ordering. And that, unlike a mutable hashed >> object (which can AIUI only lose itself), a mutable sorted object (or a >> badly-behaved one like NaN) can cause other objects in the set to be >> inserted in the wrong place or not found. > > A good point, too often overlooked. > > We could introduce a convention requiring __hash__() even though it is not used by the implementation. After all, if the object is immutable when it comes to a well-defined ordering, it is immutable when it comes to equality. I don't really want to think about classes that are immutable when it comes to == but not when it comes to <; that seems terrible. 
I went through this and other API issues in http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.html In addition to the mutability issue, there's also nothing stopping you from adding elements that don't have even a total weak order. You will sometimes get an exception, but often get away with it even though you shouldn't. Most of the existing implementations seem to take a "consenting users" approach: they document that only immutable objects with a total (strong) order are supported, but don't attempt to check or enforce that, and I didn't find a single bug report or StackOverflow question or rant blog complaining about that. Obviously the bar would be raised a bit for a collection in, or even recommended by, the stdlib, but I think this might still be acceptable. (Notice that even statically typed languages like C++ and Swift only validate that the objects are less-than-comparable to the same type, not that they're comparable to all possible values of that type.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Sep 23 00:26:40 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Sep 2014 15:26:40 -0700 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> <1411411682.4169904.170444585.208F39AC@webmail.messagingengine.com> Message-ID: I think the consenting users model is fine. Again, it's not so different for __eq__ and __hash__: you can define a hash on a mutable object and get into a big mess. (However, there are limits to the mess you can get yourself into: the dict implementation shouldn't crash or read or write memory locations it shouldn't.) 
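The "often get away with it even though you shouldn't" failure mode can be shown with a bisect-maintained sorted list; NaN, which is not totally ordered with other floats, is enough (a sketch, not from the thread):

```python
# Sketch: a bisect-maintained "sorted" list silently corrupted by NaN.
import bisect

lst = [float('nan')]                 # the badly-behaved element sneaks in first
for v in (2.0, 1.0, 3.0):
    bisect.insort(lst, v)            # every insert trusts the ordering

assert 1.0 in lst                    # a linear scan with == still finds it
i = bisect.bisect_left(lst, 1.0)     # ...but binary search lands on the NaN
print(lst[i] == 1.0)                 # no exception, just a wrong answer
```

Nothing crashes and no memory is corrupted, which is consistent with the "consenting users" model: the container misbehaves, but only for the user who put an unorderable value into it.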
On Mon, Sep 22, 2014 at 3:20 PM, Andrew Barnert < abarnert at yahoo.com.dmarc.invalid> wrote: > On Sep 22, 2014, at 11:54, Guido van Rossum wrote: > > On Mon, Sep 22, 2014 at 11:48 AM, wrote: > >> On Sun, Sep 21, 2014, at 01:18, David Wilson wrote: >> > Coming from this perspective, I'd prefer that further additions were >> > limited to clean and far better understood structures. In this light, >> > could we perhaps instead discuss the merits of a collections.Tree, >> > collections.SortedDict or similar? >> >> As I understand it, the problem with a tree, SortedDict/SortedSet, or in >> general any collection that relies on comparison relationships (heap, >> etc), is: unlike hashing (where Hashable implies an immutable hash >> relationship), there is no way to detect whether an object implements an >> immutable well-defined ordering. And that, unlike a mutable hashed >> object (which can AIUI only lose itself), a mutable sorted object (or a >> badly-behaved one like NaN) can cause other objects in the set to be >> inserted in the wrong place or not found. >> > > A good point, too often overlooked. > > We could introduce a convention requiring __hash__() even though it is not > used by the implementation. After all, if the object is immutable when it > comes to a well-defined ordering, it is immutable when it comes to > equality. I don't really want to think about classes that are immutable > when it comes to == but not when it comes to <; that seems terrible. > > > I went through this and other API issues in > > http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.html > > In addition to the mutability issue, there's also nothing stopping you > from adding elements that don't have even a total weak order. You will > sometimes get an exception, but often get away with it even though you > shouldn't. 
> > Most of the existing implementations seem to take a "consenting users" > approach: they document that only immutable objects with a total (strong) > order are supported, but don't attempt to check or enforce that, and I > didn't find a single bug report or StackOverflow question or rant blog > complaining about that. > > Obviously the bar would be raised a bit for a collection in, or even > recommended by, the stdlib, but I think this might still be acceptable. > (Notice that even statically typed languages like C++ and Swift only > validate that the objects are less-than-comparable to the same type, not > that they're comparable to all possible values of that type.) > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua at landau.ws Tue Sep 23 02:01:41 2014 From: joshua at landau.ws (Joshua Landau) Date: Tue, 23 Sep 2014 01:01:41 +0100 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> <20140921120402.0aa7ccb3@fsol> Message-ID: On 22 September 2014 01:30, Grant Jenks wrote: > I think blist.blist is an excellent data type with a lot of value. But the > performance graphs can be a bit misleading if you think they apply to the > sortedlist, sorteddict, and sortedset types. In those scenarios, I do not > believe blist is the best choice. > > The SortedContainers module (https://pypi.python.org/pypi/sortedcontainers) > provides SortedList, SortedDict, and SortedSet data types. > > Disclaimer: I am the author of the Python SortedContainers project. Feedback > welcome. How about positive feedback? I think your module is ridiculously awesome. However, despite keeping these in the back of my mind for times I might want to use them, I so rarely seem to. 
With the overhead of a few small Python files being so low, and the need for these collections being so little, I honestly can't recommend this for the standard library. And if I can't recommend this, I definitely can't recommend the blist alternatives I view as typically worse. I think there is merit in considering a blist.blist as the standard list type; the asymptotics are pretty awesome. We'd be part of a language that's trying something I don't think any other language has dared. But with the amount of one-directional iteration, it seems like the risks are high. I would also hesitate to do anything that would damage PyPy's claim to fame, seeing as it's faster than C++ in some benchmarks¹ and that is the kind of crown you want to keep. But having a second list type would mostly prevent it from ever showing up because people just don't test with multiple list types. Hooking into speed.pypy.org and getting positive results would really invigorate the debate. ¹ https://gist.github.com/Veedrac/d25148faf20669589993 This is actually a somewhat real-life scenario; I was helping someone yesterday on their thesis where they were optimising some Python code which had exactly the sort of characteristics that this benchmark shows off. Getting close to the speed of C++ by just changing interpreters is a much better option than rewriting in C++ and missing all the optimisations you have to do for it to run fast!
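For reference, the simplest of the implementation strategies discussed in this thread — a plain list kept sorted with bisect — fits in a few lines. This is an illustrative sketch only, not the blist or SortedContainers implementation:

```python
# Illustrative "array + binary search" sorted list, kept ordered with bisect.
import bisect

class SortedList:
    def __init__(self, iterable=()):
        self._items = sorted(iterable)

    def add(self, value):
        # O(log n) to find the slot, O(n) to shift -- fine for modest sizes
        bisect.insort(self._items, value)

    def __contains__(self, value):
        i = bisect.bisect_left(self._items, value)
        return i < len(self._items) and self._items[i] == value

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

s = SortedList([3, 1, 2])
s.add(0)
print(list(s))   # [0, 1, 2, 3]
```

The point several posters make is that which strategy wins (this one, B-trees, skip lists, ...) depends entirely on the workload, which is one argument against blessing any single implementation in the stdlib.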
From breamoreboy at yahoo.co.uk Tue Sep 23 02:32:31 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 23 Sep 2014 01:32:31 +0100 Subject: [Python-ideas] Putting `blist` into collections module In-Reply-To: <890F1132-817E-4FEE-AD36-A8DA24F59BED@yahoo.com> References: <9ff64ccf-7f5c-47b3-a32e-338b4b6575f4@googlegroups.com> <20140921043739.GD29494@ando.pearwood.info> <20140921051838.GA863@k2> <20140921120402.0aa7ccb3@fsol> <890F1132-817E-4FEE-AD36-A8DA24F59BED@yahoo.com> Message-ID: On 22/09/2014 02:32, Andrew Barnert wrote: > On Sep 21, 2014, at 17:30, Grant Jenks > > wrote: > >> Long time lurker, first time poster. I think there may be multiple >> discussions happening here so I wanted to highlight a competing module. >> >> I think blist.blist is an excellent data type with a lot of value. But >> the performance graphs can be a bit misleading if you think they apply >> to the sortedlist, sorteddict, and sortedset types. In those >> scenarios, I do not believe blist is the best choice. >> >> The SortedContainers module >> (https://pypi.python.org/pypi/sortedcontainers) provides SortedList, >> SortedDict, and SortedSet data types. It is implemented in >> pure-Python, has 100% coverage and hours of stress testing. The API >> implemented is very close to blist's and a lot of effort has been put >> into documentation (http://www.grantjenks.com/docs/sortedcontainers/). >> Furthermore, the data types provided are often faster than their blist >> counterparts. > > Honestly, this is exactly why I think, despite what I've said in the > past, maybe we don't need SortedDict, etc. in the stdlib. > > The API is something nontrivial, but arguably with a single right answer > (and half the implementations I've seen on PyPI get either SortedSet or > SortedList wrong), so that makes perfect sense to belong in the stdlib, > in collections.abc. 
> > But the implementation, there are multiple right answers: B-tree, binary > tree, skip list, array that sorts after every insert or before every > lookup after an insert, same plus hash... Which one is "best" depends > entirely on your data and how you intend to use them. The fact most > other languages out there go with red-black trees, but it's hard to find > a Python application where those actually perform best, seems like > further argument against picking a best solution. > > Also, making people glance at the docs and see that they have to "pip > install SortedCollections" seems like just enough hurdle to slow down > the kind of people who have no idea why they're using it beyond "I saw > some Java code that used a sorted map", without being a serious > roadblock for anyone who actually does need it. > > So, unless Nick's stdlib++ idea is shot down, I think the best thing to > do would be: Add the ABCs, then link to a few different libraries on > PyPI--blist, SortedCollections, bintrees, the one I forget the name of > that wraps up the sort-an-OrderedDict-on-the-fly recipe that the docs > already link to, while encouraging the authors of those packages to > implement the APIs and to provide wheels if they have any extension modules. > There's all sorts of interesting things discussed here http://kmike.ru/python-data-structures/ so if you didn't know about it you do now :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From tarek at ziade.org Tue Sep 23 09:19:29 2014 From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Tue, 23 Sep 2014 09:19:29 +0200 Subject: [Python-ideas] proposal: os.path.joinuser() Message-ID: <54211F01.9030606@ziade.org> Hello I realize I am using a lot this pattern: >>> os.path.join(os.path.expanduser('~'), 'something', 'here') '/Users/tarek/something/here' It's quite complicated, and not really intuitive. 
What about adding in os.path a "joinuser()" equivalent function, e.g. >>> os.path.joinuser('something', 'here') '/Users/tarek/something/here' With an optional "user" argument when you want to specify the ~user >>> os.path.joinuser('something', 'here', user="foo") '/Users/bill/something/here' That would be, in my opinion, much more explicit & readable. If there's already something like that in the stdlib, forgive my ignorance! Cheers Tarek From g.rodola at gmail.com Tue Sep 23 09:52:56 2014 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Tue, 23 Sep 2014 09:52:56 +0200 Subject: [Python-ideas] proposal: os.path.joinuser() In-Reply-To: <54211F01.9030606@ziade.org> References: <54211F01.9030606@ziade.org> Message-ID: On Tue, Sep 23, 2014 at 9:19 AM, Tarek Ziadé wrote: > Hello > > I realize I am using a lot this pattern: > > >>> os.path.join(os.path.expanduser('~'), 'something', 'here') > '/Users/tarek/something/here' > > > It's quite complicated, and not really intuitive. > I don't find that complicated, just a bit verbose perhaps, but it's definitely clear what it's doing and it is also explicit. os.path.joinuser('something', 'here') would probably be a bit less verbose but IMO it's not a good enough reason to introduce a new API. -- Giampaolo - http://grodola.blogspot.com From rosuav at gmail.com Tue Sep 23 10:40:57 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 23 Sep 2014 18:40:57 +1000 Subject: [Python-ideas] proposal: os.path.joinuser() In-Reply-To: <54211F01.9030606@ziade.org> References: <54211F01.9030606@ziade.org> Message-ID: On Tue, Sep 23, 2014 at 5:19 PM, Tarek Ziadé wrote: > I realize I am using a lot this pattern: > > >>> os.path.join(os.path.expanduser('~'), 'something', 'here') > '/Users/tarek/something/here' > > > It's quite complicated, and not really intuitive. Not every one-liner needs to be in the stdlib.
def joinuser(*parts, user=''): return os.path.join(os.path.expanduser('~'+user), *parts) Suggestion: Build up your own utilities module (I used to call it "oddsends" - Odds & Ends - although back when I started, it was a Q-Basic file from which I would copy and paste code, rather than a Python module from which I'd import), and put this kind of convenience function in it. Then it's as simple as: from utils import joinuser at the top of your script. ChrisA From tarek at ziade.org Tue Sep 23 14:32:24 2014 From: tarek at ziade.org (=?UTF-8?B?VGFyZWsgWmlhZMOp?=) Date: Tue, 23 Sep 2014 14:32:24 +0200 Subject: [Python-ideas] proposal: os.path.joinuser() In-Reply-To: References: <54211F01.9030606@ziade.org> Message-ID: <54216858.4000300@ziade.org> Le 23/09/14 09:52, Giampaolo Rodola' a ?crit : > > > On Tue, Sep 23, 2014 at 9:19 AM, Tarek Ziad? > wrote: > > Hello > > I realize I am using a lot this pattern: > > >>> os.path.join(os.path.expanduser('~'), 'something', 'here') > '/Users/tarek/something/here' > > > It's quite complicated, and not really intuitive. > > > I don't find that complicated, just a bit verbose perhaps, but it's > definitively clear what it's doing and it is also explicit. ~ is a Unix notion I think, and since expanduser() works under Windows, I don't think it's that intuitive and explicit. Unless we'd change it so we omit "~" => e.g. os.path.expanduser() and os.path.expanduser('specificuser') Cheers Tarek -------------- next part -------------- An HTML attachment was scrubbed... URL: From tarek at ziade.org Tue Sep 23 14:34:14 2014 From: tarek at ziade.org (=?UTF-8?B?VGFyZWsgWmlhZMOp?=) Date: Tue, 23 Sep 2014 14:34:14 +0200 Subject: [Python-ideas] proposal: os.path.joinuser() In-Reply-To: References: <54211F01.9030606@ziade.org> Message-ID: <542168C6.70202@ziade.org> Le 23/09/14 10:40, Chris Angelico a ?crit : > On Tue, Sep 23, 2014 at 5:19 PM, Tarek Ziad? 
wrote: >> I realize I am using a lot this pattern: >> >> >>> os.path.join(os.path.expanduser('~'), 'something', 'here') >> '/Users/tarek/something/here' >> >> >> It's quite complicated, and not really intuitive. > Not every one-liner needs to be in the stdlib. Unless it makes a lot of sense - thus my e-mail here to see if it gets traction :-) From tarek at ziade.org Tue Sep 23 14:37:30 2014 From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Tue, 23 Sep 2014 14:37:30 +0200 Subject: [Python-ideas] proposal: os.path.joinuser() In-Reply-To: <54216858.4000300@ziade.org> References: <54211F01.9030606@ziade.org> <54216858.4000300@ziade.org> Message-ID: <5421698A.6040103@ziade.org> Le 23/09/14 14:32, Tarek Ziad? a ?crit : [...] > ~ is a Unix notion I think, and since expanduser() works under Windows, I don't think it's that intuitive and explicit. > > Unless we'd change it so we omit "~" => > > e.g. os.path.expanduser() and os.path.expanduser('specificuser') I would even say that the api name "expanduser" is redundant with the fact that you absolutely need to pass a '~' as a first char if you want it to do something at all. The more I think about it, the more I find it unintuitive From toddrjen at gmail.com Tue Sep 23 14:51:04 2014 From: toddrjen at gmail.com (Todd) Date: Tue, 23 Sep 2014 14:51:04 +0200 Subject: [Python-ideas] proposal: os.path.joinuser() In-Reply-To: <5421698A.6040103@ziade.org> References: <54211F01.9030606@ziade.org> <54216858.4000300@ziade.org> <5421698A.6040103@ziade.org> Message-ID: On Tue, Sep 23, 2014 at 2:37 PM, Tarek Ziad? wrote: > Le 23/09/14 14:32, Tarek Ziad? a ?crit : > [...] > > ~ is a Unix notion I think, and since expanduser() works under > Windows, I don't think it's that intuitive and explicit. > > > > Unless we'd change it so we omit "~" => > > > > e.g. 
os.path.expanduser() and os.path.expanduser('specificuser') > I would even say that the api name "expanduser" is redundant with the > fact that you absolutely need to pass a '~' as a first char > if you want it to do something at all. > > The more I think about it, the more I find it unintuitive > > > What about something like os.path.homedir or os.homedir? You could use, for example, homedir() to get the home directory of the current user, and homedir('username') to get the home directory of a given user (if that is possible). -------------- next part -------------- An HTML attachment was scrubbed... URL: From tarek at ziade.org Tue Sep 23 14:52:45 2014 From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Tue, 23 Sep 2014 14:52:45 +0200 Subject: [Python-ideas] proposal: os.path.joinuser() In-Reply-To: References: <54211F01.9030606@ziade.org> <54216858.4000300@ziade.org> <5421698A.6040103@ziade.org> Message-ID: <54216D1D.5060207@ziade.org> Le 23/09/14 14:51, Todd a ?crit : > > What about something like os.path.homedir or os.homedir? You could > use, for example, homedir() to get the home directory of the current > user, and homedir('username') to get the home directory of a given > user (if that is possible). Sounds way better! From ncoghlan at gmail.com Tue Sep 23 15:03:36 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 23 Sep 2014 23:03:36 +1000 Subject: [Python-ideas] proposal: os.path.joinuser() In-Reply-To: <54216D1D.5060207@ziade.org> References: <54211F01.9030606@ziade.org> <54216858.4000300@ziade.org> <5421698A.6040103@ziade.org> <54216D1D.5060207@ziade.org> Message-ID: On 23 September 2014 22:52, Tarek Ziad? wrote: > Le 23/09/14 14:51, Todd a ?crit : >> >> What about something like os.path.homedir or os.homedir? You could >> use, for example, homedir() to get the home directory of the current >> user, and homedir('username') to get the home directory of a given >> user (if that is possible). > Sounds way better! 
Given the naming of site.getuserbase(), site.getusersite(), and os.expanduser(), a simple "os.userdir()" (to parallel "os.curdir()") may be appropriate. Then making a path that is relative to the user dir absolute would be: abspath = os.path.join(os.userdir(), relpath) While the inverse operation would be: relpath = os.path.relpath(abspath, start=os.userdir()) os.path.expanduser() would then be primarily about dealing with *longer* paths that have "~" embedded, rather than the degenerate case of "os.expanduser('~')". Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tarek at ziade.org Tue Sep 23 15:34:41 2014 From: tarek at ziade.org (=?UTF-8?B?VGFyZWsgWmlhZMOp?=) Date: Tue, 23 Sep 2014 15:34:41 +0200 Subject: [Python-ideas] proposal: os.path.joinuser() In-Reply-To: References: <54211F01.9030606@ziade.org> <54216858.4000300@ziade.org> <5421698A.6040103@ziade.org> <54216D1D.5060207@ziade.org> Message-ID: <542176F1.2030108@ziade.org> Le 23/09/14 15:03, Nick Coghlan a ?crit : > On 23 September 2014 22:52, Tarek Ziad? wrote: >> Le 23/09/14 14:51, Todd a ?crit : >>> What about something like os.path.homedir or os.homedir? You could >>> use, for example, homedir() to get the home directory of the current >>> user, and homedir('username') to get the home directory of a given >>> user (if that is possible). >> Sounds way better! > Given the naming of site.getuserbase(), site.getusersite(), and > os.expanduser(), a simple "os.userdir()" (to parallel "os.curdir()") > may be appropriate. > > Then making a path that is relative to the user dir absolute would be: > > abspath = os.path.join(os.userdir(), relpath) > > While the inverse operation would be: > > relpath = os.path.relpath(abspath, start=os.userdir()) > > os.path.expanduser() would then be primarily about dealing with > *longer* paths that have "~" embedded, rather than the degenerate case > of "os.expanduser('~')". > > Regards, > Nick. 
> This is getting better and better :-) From brett at python.org Tue Sep 23 17:18:40 2014 From: brett at python.org (Brett Cannon) Date: Tue, 23 Sep 2014 15:18:40 +0000 Subject: [Python-ideas] proposal: os.path.joinuser() References: <54211F01.9030606@ziade.org> Message-ID: On Tue Sep 23 2014 at 3:20:09 AM Tarek Ziadé wrote: > Hello > > I realize I am using a lot this pattern: > > >>> os.path.join(os.path.expanduser('~'), 'something', 'here') > '/Users/tarek/something/here' > > > It's quite complicated, and not really intuitive. > > What about adding in os.path a "joinuser()" equivalent function, e.g. > > >>> os.path.joinuser('something', 'here') > '/Users/tarek/something/here' > > > With an optional "user" argument when you want to specify the ~user > > >>> os.path.joinuser('something', 'here', user="foo") > '/Users/bill/something/here' > > That would be, in my opinion, much more explicit & readable. > > If there's already something like that in the stdlib, forgive my ignorance! > You probably want to propose a change off of pathlib instead of os.path since that's the future of high-level path manipulation in the stdlib. From solipsis at pitrou.net Tue Sep 23 17:50:09 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Sep 2014 17:50:09 +0200 Subject: [Python-ideas] proposal: os.path.joinuser() References: <54211F01.9030606@ziade.org> Message-ID: <20140923175009.10ff9c43@fsol> On Tue, 23 Sep 2014 15:18:40 +0000 Brett Cannon wrote: > > You probably want to propose a change off of pathlib instead of os.path > since that's the future of high-level path manipulation in the stdlib. Speaking of which: http://bugs.python.org/issue19776 http://bugs.python.org/issue19777 Regards Antoine.
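Pulling the thread's pieces together, Nick's hypothetical os.userdir() and Tarek's joinuser() can be sketched on top of today's os.path.expanduser. Neither name exists in the stdlib — both are proposals from this thread:

```python
# Illustrative sketch: userdir() and joinuser() are hypothetical names
# from this thread, built on the existing os.path.expanduser.
import os.path

def userdir(user=''):
    """Absolute path of the current (or named) user's home directory."""
    return os.path.expanduser('~' + user)

def joinuser(*parts, user=''):
    """Join path parts onto a user's home directory (Chris's one-liner)."""
    return os.path.join(userdir(user), *parts)

abspath = joinuser('something', 'here')            # e.g. /Users/tarek/something/here
relpath = os.path.relpath(abspath, start=userdir())
print(relpath)                                     # something/here on POSIX
```

This also shows Nick's point: with userdir() available, expanduser() is only needed for longer paths with an embedded "~", not for the degenerate expanduser('~') case.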
From tarek at ziade.org Wed Sep 24 10:08:45 2014 From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Wed, 24 Sep 2014 10:08:45 +0200 Subject: [Python-ideas] proposal: os.path.joinuser() In-Reply-To: <20140923175009.10ff9c43@fsol> References: <54211F01.9030606@ziade.org> <20140923175009.10ff9c43@fsol> Message-ID: <54227C0D.5050101@ziade.org> Le 23/09/14 17:50, Antoine Pitrou a ?crit : > On Tue, 23 Sep 2014 15:18:40 +0000 > Brett Cannon wrote: >> You probably want to propose a change off of pathlib instead of os.path >> since that's the future of high-level path manipulation in the stdlib. > Speaking of which: > http://bugs.python.org/issue19776 > http://bugs.python.org/issue19777 oh ok, so I guess http://bugs.python.org/issue19777 is it. - beside the name, and location Is there anything I can do to help moving this forward ? > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From solipsis at pitrou.net Wed Sep 24 14:35:47 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 24 Sep 2014 14:35:47 +0200 Subject: [Python-ideas] proposal: os.path.joinuser() References: <54211F01.9030606@ziade.org> <20140923175009.10ff9c43@fsol> <54227C0D.5050101@ziade.org> Message-ID: <20140924143547.7bc71847@fsol> On Wed, 24 Sep 2014 10:08:45 +0200 Tarek Ziad? wrote: > Le 23/09/14 17:50, Antoine Pitrou a ?crit : > > On Tue, 23 Sep 2014 15:18:40 +0000 > > Brett Cannon wrote: > >> You probably want to propose a change off of pathlib instead of os.path > >> since that's the future of high-level path manipulation in the stdlib. > > Speaking of which: > > http://bugs.python.org/issue19776 > > http://bugs.python.org/issue19777 > oh ok, so I guess http://bugs.python.org/issue19777 is it. - beside the > name, and location > > Is there anything I can do to help moving this forward ? 
You can write a patch! cheers Antoine. From t_glaessle at gmx.de Wed Sep 24 19:57:36 2014 From: t_glaessle at gmx.de (=?UTF-8?B?VGhvbWFzIEdsw6TDn2xl?=) Date: Wed, 24 Sep 2014 19:57:36 +0200 Subject: [Python-ideas] Implicit submodule imports Message-ID: <54230610.7060305@gmx.de> Hey folks, What do you think about making it easier to use packages by automatically importing submodules on attribute access? Consider this example: >>> import matplotlib >>> figure = matplotlib.figure.Figure() AttributeError: 'module' object has no attribute 'figure' For the newcomer (like me some months ago) it's not obvious that the solution is to import matplotlib.figure. Worse even: it may sometimes/later on work, if the submodule has been imported from another place. Here's how I'd like it to behave instead (in pseudo code, since `package` is not a Python class right now): class package: def __getattr__(self, name): try: return self.__dict__[name] except KeyError: # either try to import `name` or raise a nicer error message The automatic import feature could also play nicely when porting a package with submodules to or from a simple module with namespaces (as suggested in [1]), making this transition seamless to any user. I'm not sure about potential problems from auto-importing. I currently see the following issues: - harmless looking attribute access can lead to significant code execution including side effects. On the other hand, that could always be the case. - you can't use attribute access anymore to test whether a submodule is imported (must use sys.modules instead, I guess) In principle one can already make this feature happen today, by replacing the object in sys.modules - which is kind of ugly and has probably more flaws. This would also be made easier if there were a module.__getattr__ ([2]) or "metaclass" like feature for modules (which would be just a class then, I guess). Sorry if this has come up before and I missed it.
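The sys.modules trick alluded to above can be sketched like this. It is one possible approach, not the only one: AutoImportModule and enable_auto_import are illustrative names, and reassigning a live module's __class__ is a CPython-specific hack very much in the spirit of the hacks discussed in this thread.

```python
# Sketch: give a live package a __getattr__ that imports missing
# submodules on demand.
import importlib
import sys
import types

class AutoImportModule(types.ModuleType):
    def __getattr__(self, name):        # only runs when normal lookup fails
        try:
            return importlib.import_module(self.__name__ + '.' + name)
        except ImportError:
            raise AttributeError('module %r has no attribute or submodule %r'
                                 % (self.__name__, name))

def enable_auto_import(package_name):
    # Swap the module's class so attribute access gains the fallback.
    sys.modules[package_name].__class__ = AutoImportModule

import xml                    # stdlib package; xml.etree is a submodule
enable_auto_import('xml')
print(xml.etree.__name__)     # 'xml.etree', imported on attribute access
```

Missing names still raise AttributeError (with a nicer message), so the main observable change is the side effect Thomas lists: innocent attribute access can now trigger an import.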
Anyhow, just interested if anyone else considers this a nice feature. Best regards, Thomas [1] https://mail.python.org/pipermail/python-ideas/2014-September/029341.html [2] https://mail.python.org/pipermail/python-ideas/2012-April/014957.html -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 949 bytes Desc: OpenPGP digital signature URL: From mal at egenix.com Wed Sep 24 20:10:27 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 24 Sep 2014 20:10:27 +0200 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <54230610.7060305@gmx.de> References: <54230610.7060305@gmx.de> Message-ID: <54230913.4060401@egenix.com> On 24.09.2014 19:57, Thomas Gl??le wrote: > Hey folks, > > What do you think about making it easier to use packages by > automatically importing submodules on attribute access. > > Consider this example: > > >>> import matplotlib > >>> figure = matplotlib.figure.Figure() > AttributeError: 'module' object has no attribute 'figure' > > For the newcomer (like me some months ago) it's not obvious that the > solution is to import matplotlib.figure. > > Worse even: it may sometimes/later on work, if the submodule has been > imported from another place. > > How, I'd like it to behave instead (in pseudo code, since `package` is > not a python class right now): > > class package: > > def __getattr__(self, name): > try: > return self.__dict__[name] > except KeyError: > # either try to import `name` or raise a nicer error message > > The automatic import feature could also play nicely when porting a > package with submodules to or from a simple module with namespaces (as > suggested in [1]), making this transition seemless to any user. > > I'm not sure about potential problems from auto-importing. I currently > see the following issues: > > - harmless looking attribute access can lead to significant code > execution including side effects. 
On the other hand, that could always > be the case. > > - you can't use attribute access anymore to test whether a submodule is > imported (must use sys.modules instead, I guess) > > > In principle one can already make this feature happen today, by > replacing the object in sys.modules - which is kind of ugly and has > probably more flaws. This would also be made easier if there were a > module.__getattr__ ([2]) or "metaclass" like feature for modules (which > would be just a class then, I guess). > > Sorry, if this has come up before and I missed it. Anyhow, just > interested if anyone else considers this a nice feature. Agreed, it's a nice feature :-) I've been using this in our mx packages since 1999 using a module called LazyModule.py. See e.g. http://educommons.com/dev/browser/3.2/installers/windows/src/eduCommons/python/Lib/site-packages/mx/URL/LazyModule.py Regarding making module more class like: we've played with this a bit at PyCon UK and it's really easy to turn a module into a regular class (with all its features) by tweaking sys.modules - we even got .__getattr__() to work. With some more effort, we could have a main() function automatically called upon direct import from the command line. The whole thing is a huge hack, though, so I'll leave out the details :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 24 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-09-27: PyDDF Sprint 2014 ... 3 days to go 2014-09-30: Python Meeting Duesseldorf ... 6 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From abarnert at yahoo.com Wed Sep 24 20:14:06 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 24 Sep 2014 11:14:06 -0700 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <54230610.7060305@gmx.de> References: <54230610.7060305@gmx.de> Message-ID: On Sep 24, 2014, at 10:57, Thomas Gl??le wrote: > Hey folks, > > What do you think about making it easier to use packages by > automatically importing submodules on attribute access. Doesn't IPython already have this feature as an option? I know that not everyone who uses scipy and matplotlib uses IPython, and they aren't the only two packages used by novices that have sub modules they don't automatically import for you, but... I'm guessing the percentages are high. Of course this support could also be added to scipy and matplotlib itself. And maybe importlib could have a function that makes automatic lazy loading of submodules on demand a one-liner for packages that want to support it. > > Consider this example: > >>>> import matplotlib >>>> figure = matplotlib.figure.Figure() > AttributeError: 'module' object has no attribute 'figure' > > For the newcomer (like me some months ago) it's not obvious that the > solution is to import matplotlib.figure. > > Worse even: it may sometimes/later on work, if the submodule has been > imported from another place. > > How, I'd like it to behave instead (in pseudo code, since `package` is > not a python class right now): > > class package: > > def __getattr__(self, name): > try: > return self.__dict__[name] > except KeyError: > # either try to import `name` or raise a nicer error message > > The automatic import feature could also play nicely when porting a > package with submodules to or from a simple module with namespaces (as > suggested in [1]), making this transition seemless to any user. 
> > I'm not sure about potential problems from auto-importing. I currently > see the following issues: > > - harmless looking attribute access can lead to significant code > execution including side effects. On the other hand, that could always > be the case. > > - you can't use attribute access anymore to test whether a submodule is > imported (must use sys.modules instead, I guess) > > > In principle one can already make this feature happen today, by > replacing the object in sys.modules - which is kind of ugly and has > probably more flaws. This would also be made easier if there were a > module.__getattr__ ([2]) or "metaclass" like feature for modules (which > would be just a class then, I guess). > > Sorry, if this has come up before and I missed it. Anyhow, just > interested if anyone else considers this a nice feature. > > Best regards, > Thomas > > > > [1] > https://mail.python.org/pipermail/python-ideas/2014-September/029341.html > [2] https://mail.python.org/pipermail/python-ideas/2012-April/014957.html > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Wed Sep 24 20:22:59 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 24 Sep 2014 11:22:59 -0700 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <54230913.4060401@egenix.com> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> Message-ID: <70843F4E-FC41-4F5D-B20D-C4F7AEE1E37E@yahoo.com> On Sep 24, 2014, at 11:10, "M.-A. Lemburg" wrote: > On 24.09.2014 19:57, Thomas Gläßle wrote: >> Hey folks, >> >> What do you think about making it easier to use packages by >> automatically importing submodules on attribute access.
>> >> Consider this example: >> >>>>> import matplotlib >>>>> figure = matplotlib.figure.Figure() >> AttributeError: 'module' object has no attribute 'figure' >> >> For the newcomer (like me some months ago) it's not obvious that the >> solution is to import matplotlib.figure. >> >> Worse even: it may sometimes/later on work, if the submodule has been >> imported from another place. >> >> How I'd like it to behave instead (in pseudo code, since `package` is >> not a python class right now): >> >> class package: >> >> def __getattr__(self, name): >> try: >> return self.__dict__[name] >> except KeyError: >> # either try to import `name` or raise a nicer error message >> >> The automatic import feature could also play nicely when porting a >> package with submodules to or from a simple module with namespaces (as >> suggested in [1]), making this transition seamless to any user. >> >> I'm not sure about potential problems from auto-importing. I currently >> see the following issues: >> >> - harmless looking attribute access can lead to significant code >> execution including side effects. On the other hand, that could always >> be the case. >> >> - you can't use attribute access anymore to test whether a submodule is >> imported (must use sys.modules instead, I guess) >> >> >> In principle one can already make this feature happen today, by >> replacing the object in sys.modules - which is kind of ugly and has >> probably more flaws. This would also be made easier if there were a >> module.__getattr__ ([2]) or "metaclass" like feature for modules (which >> would be just a class then, I guess). >> >> Sorry, if this has come up before and I missed it. Anyhow, just >> interested if anyone else considers this a nice feature. > > Agreed, it's a nice feature :-) > > I've been using this in our mx packages since 1999 using a module > called LazyModule.py. See e.g.
> http://educommons.com/dev/browser/3.2/installers/windows/src/eduCommons/python/Lib/site-packages/mx/URL/LazyModule.py Could LazyModule be easily added to the stdlib, or split out into a separate PyPI package? It seems to me that would be a pretty good solution. Today, a package has to eagerly preload modules, make the users do it manually, or write a few dozen lines of code to lazily load modules on demand, so it's not surprising that many of them don't use the third option even when it would be best for their users. If that could be one or two lines instead, I'm guessing a lot more packages would do so. From stefano.borini at ferrara.linux.it Wed Sep 24 21:21:21 2014 From: stefano.borini at ferrara.linux.it (Stefano Borini) Date: Wed, 24 Sep 2014 21:21:21 +0200 Subject: [Python-ideas] including psutil in the standard library? Message-ID: <542319B1.2090606@ferrara.linux.it> I am wondering if it would be possible to include psutil (https://pypi.python.org/pypi/psutil ) in the standard library, and if not, what would be needed. I am not a developer of it, but I am using psutil at work with good success. it provides a good deal of services for querying and managing processes in a cross platform way. Any thoughts? From rymg19 at gmail.com Wed Sep 24 22:10:17 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 24 Sep 2014 15:10:17 -0500 Subject: [Python-ideas] including psutil in the standard library? In-Reply-To: <542319B1.2090606@ferrara.linux.it> References: <542319B1.2090606@ferrara.linux.it> Message-ID: I would honestly prefer it to be merged into an already in-place module (maybe platform?) over it being added separately. On Wed, Sep 24, 2014 at 2:21 PM, Stefano Borini < stefano.borini at ferrara.linux.it> wrote: > I am wondering if it would be possible to include psutil ( > https://pypi.python.org/pypi/psutil ) in the standard library, and if > not, what would be needed. > > I am not a developer of it, but I am using psutil at work with good > success. 
it provides a good deal of services for querying and managing > processes in a cross platform way. > > Any thoughts? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Ryan If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated." Personal reality distortion fields are immune to contradictory evidence. - srean Check out my website: http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From j.wielicki at sotecware.net Wed Sep 24 22:13:45 2014 From: j.wielicki at sotecware.net (Jonas Wielicki) Date: Wed, 24 Sep 2014 22:13:45 +0200 Subject: [Python-ideas] including psutil in the standard library? In-Reply-To: References: <542319B1.2090606@ferrara.linux.it> Message-ID: <542325F9.20805@sotecware.net> On 24.09.2014 22:10, Ryan Gonzalez wrote: > I would honestly prefer it to be merged into an already in-place module > (maybe platform?) over it being added separately. That would unnecessarily break code. Also, os would be a better fit, if any. platform contains rather static informational data. regards, jwi > > On Wed, Sep 24, 2014 at 2:21 PM, Stefano Borini < > stefano.borini at ferrara.linux.it> wrote: > >> I am wondering if it would be possible to include psutil ( >> https://pypi.python.org/pypi/psutil ) in the standard library, and if >> not, what would be needed. >> >> I am not a developer of it, but I am using psutil at work with good >> success. it provides a good deal of services for querying and managing >> processes in a cross platform way. >> >> Any thoughts?
>> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From barry at python.org Wed Sep 24 22:24:47 2014 From: barry at python.org (Barry Warsaw) Date: Wed, 24 Sep 2014 16:24:47 -0400 Subject: [Python-ideas] including psutil in the standard library? References: <542319B1.2090606@ferrara.linux.it> Message-ID: <20140924162447.712c0846@anarchist.wooz.org> On Sep 24, 2014, at 03:10 PM, Ryan Gonzalez wrote: >I would honestly prefer it to be merged into an already in-place module >(maybe platform?) over it being added separately. Why platform? That seems out of place. If it were to be adopted into the stdlib, I think a top-level module name would be fine. psutil is great, but has anybody asked upstream if they even want to be part of the stdlib? Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From mal at egenix.com Wed Sep 24 22:30:41 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 24 Sep 2014 22:30:41 +0200 Subject: [Python-ideas] including psutil in the standard library? In-Reply-To: <20140924162447.712c0846@anarchist.wooz.org> References: <542319B1.2090606@ferrara.linux.it> <20140924162447.712c0846@anarchist.wooz.org> Message-ID: <542329F1.1090501@egenix.com> On 24.09.2014 22:24, Barry Warsaw wrote: > On Sep 24, 2014, at 03:10 PM, Ryan Gonzalez wrote: > >> I would honestly prefer it to be merged into an already in-place module >> (maybe platform?) over it being added separately. > > Why platform? That seems out of place. Agreed. 
The typical approach is to put a renamed version into the stdlib, so that there's a possibility to continue using the package from PyPI. > If it were to be adopted into the stdlib, I think a top-level module name > would be fine. psutil is great, but has anybody asked upstream if they even > want to be part of the stdlib? That'll have to happen first, indeed. The next question to answer is: Who will maintain the code in the stdlib ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 24 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-09-27: PyDDF Sprint 2014 ... 3 days to go 2014-09-30: Python Meeting Duesseldorf ... 6 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Wed Sep 24 22:38:05 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 24 Sep 2014 22:38:05 +0200 Subject: [Python-ideas] including psutil in the standard library? References: <542319B1.2090606@ferrara.linux.it> <20140924162447.712c0846@anarchist.wooz.org> <542329F1.1090501@egenix.com> Message-ID: <20140924223805.41e0af95@fsol> On Wed, 24 Sep 2014 22:30:41 +0200 "M.-A. Lemburg" wrote: > On 24.09.2014 22:24, Barry Warsaw wrote: > > On Sep 24, 2014, at 03:10 PM, Ryan Gonzalez wrote: > > > >> I would honestly prefer it to be merged into an already in-place module > >> (maybe platform?) over it being added separately. > > > > Why platform? That seems out of place. > > Agreed. 
The typical approach is to put a renamed version into > the stdlib, so that there's a possibility to continue using the > package from PyPI. > > > If it were to be adopted into the stdlib, I think a top-level module name > > would be fine. psutil is great, but has anybody asked upstream if they even > > want to be part of the stdlib? > > That'll have to happen first, indeed. The next question to answer > is: Who will maintain the code in the stdlib ? "Upstream" is Giampaolo Rodola, who is a core developer. Regards Antoine. From mal at egenix.com Wed Sep 24 22:44:45 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 24 Sep 2014 22:44:45 +0200 Subject: [Python-ideas] including psutil in the standard library? In-Reply-To: <20140924223805.41e0af95@fsol> References: <542319B1.2090606@ferrara.linux.it> <20140924162447.712c0846@anarchist.wooz.org> <542329F1.1090501@egenix.com> <20140924223805.41e0af95@fsol> Message-ID: <54232D3D.6040001@egenix.com> On 24.09.2014 22:38, Antoine Pitrou wrote: > On Wed, 24 Sep 2014 22:30:41 +0200 > "M.-A. Lemburg" wrote: >> On 24.09.2014 22:24, Barry Warsaw wrote: >>> On Sep 24, 2014, at 03:10 PM, Ryan Gonzalez wrote: >>> >>>> I would honestly prefer it to be merged into an already in-place module >>>> (maybe platform?) over it being added separately. >>> >>> Why platform? That seems out of place. >> >> Agreed. The typical approach is to put a renamed version into >> the stdlib, so that there's a possibility to continue using the >> package from PyPI. >> >>> If it were to be adopted into the stdlib, I think a top-level module name >>> would be fine. psutil is great, but has anybody asked upstream if they even >>> want to be part of the stdlib? >> >> That'll have to happen first, indeed. The next question to answer >> is: Who will maintain the code in the stdlib ? > > "Upstream" is Giampaolo Rodola, who is a core developer. Cool. 
That makes the second question a lot easier to answer ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 24 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-09-27: PyDDF Sprint 2014 ... 3 days to go 2014-09-30: Python Meeting Duesseldorf ... 6 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From stefano.borini at ferrara.linux.it Wed Sep 24 23:18:37 2014 From: stefano.borini at ferrara.linux.it (Stefano Borini) Date: Wed, 24 Sep 2014 23:18:37 +0200 Subject: [Python-ideas] including psutil in the standard library? In-Reply-To: <20140924223805.41e0af95@fsol> References: <542319B1.2090606@ferrara.linux.it> <20140924162447.712c0846@anarchist.wooz.org> <542329F1.1090501@egenix.com> <20140924223805.41e0af95@fsol> Message-ID: <5423352D.6020803@ferrara.linux.it> On 9/24/14 10:38 PM, Antoine Pitrou wrote: > "Upstream" is Giampaolo Rodola, who is a core developer. > I already sent him an email about this. I am waiting for his answer. I would enjoy handling the library as a maintainer in the stdlib, but I have never done anything at this level, so I am not sure I am worthy/able to do it ;) From cathalgarvey at cathalgarvey.me Thu Sep 25 00:09:09 2014 From: cathalgarvey at cathalgarvey.me (Cathal Garvey) Date: Wed, 24 Sep 2014 23:09:09 +0100 Subject: [Python-ideas] "continue with" for dynamic iterable injection Message-ID: <54234105.8070806@cathalgarvey.me> Hello all, First time post, so go gentle. 
:) During a twitter conversation with @simeonvisser, an idea emerged that he suggested I share here. The conversation began with me complaining that I'd like a third mode of explicit flow control in Python for-loops; the ability to repeat a loop iteration in whole. The reason for this was that I was parsing data where a datapoint indicated the end of a preceding sub-group, so the datapoint was both a data structure indicator *and* data in its own right. So I'd like to have iterated over the line once, branching into the flow control management part of an if/else, and then again to branch into the data management part of the same if/else. Yes, this is unnecessary and just a convenience for parsing that I'd like to see. During the discussion though, a much more versatile solution presented itself: repurposing the `continue` keyword to `continue with`, similarly to the repurposing of `yield` with `yield from`. This would avoid keyword bloat, but it would be explicit and intuitive to use, and would allow a pretty interesting extension of iteration with the for-loop. The idea is that `continue with` would allow injection of any datatype or variable, which becomes the next iterated item. So, saying `continue with 5` means that the next *x* in a loop structured as `for x in..` would be 5. This would satisfy my original, niche-y desire to re-iterate over something, but would also more broadly allow dynamic injection of *any* value into a for-loop iteration. As an extension to the language, it would present new and exciting ways to iterate infinitely by accident. It would also present new and exciting ways for people to trip up on mutable data; one way to misuse this is to iterate over data, modifying the data, then iterating over it again. 
The intention, rather, is that the repeated iteration allows re-iteration over unaltered data in response to a mid-iteration change in state (i.e., in my case, parsing a token, changing parser state, and then parsing the token again because it also carries informational content). However, bad data hygiene has always been part of Python, because it's a natural consequence of the pan-mutability that makes it such a great language. So, given that the same practices that an experienced Python dev learns to use would apply in this case, I don't think it adds anything that an experienced dev would trip over. As a language extension, I've never seen something like "continue with" in another language. You can mimic it with recursive functions and `cons`-ing the injection, but as a flow-control system for an imperative language, nope..and Python doesn't encourage or facilitate recursion anyway (no TCE). I'd love thoughts on this. It's unnecessary; but then, once something's declared turing complete you can accuse any additional feature of being unnecessary. I feel, more to the point, that it's in keeping with the Python philosophy and would add a novel twist to the language not seen elsewhere. best, Cathal -- Twitter: @onetruecathal, @formabiolabs Phone: +353876363185 Blog: http://indiebiotech.com miniLock.io: JjmYYngs7akLZUjkvFkuYdsZ3PyPHSZRBKNm6qTYKZfAM -------------- next part -------------- A non-text attachment was scrubbed... Name: 0x988B9099.asc Type: application/pgp-keys Size: 6176 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From erik.m.bray at gmail.com Thu Sep 25 02:01:46 2014 From: erik.m.bray at gmail.com (Erik Bray) Date: Wed, 24 Sep 2014 20:01:46 -0400 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> Message-ID: On Wed, Sep 24, 2014 at 2:14 PM, Andrew Barnert wrote: > On Sep 24, 2014, at 10:57, Thomas Gl??le wrote: > >> Hey folks, >> >> What do you think about making it easier to use packages by >> automatically importing submodules on attribute access. > > Doesn't IPython already have this feature as an option? I don't *think* IPython has exactly this feature. Rather, when typing an import statement it will check to see if there are any submodules and add them to the autocomplete list. So I don't think typing >>> import foo will automatically mean a submodule foo.bar is imported, even though it shows up on the autocomplete list when typing the import statement. I could be wrong about this though. In any case unless it were a feature built into Python I think this has potential to be highly confusing to newcomers. They might type >>> import scipy and start using scipy.stats in their code. But then when they dump this code to a script they won't have `import scipy.stats` in script, just `import scipy`. Then, suddenly, when they write their script they'll get `AttributeError: stats` and then come complaining to some mailinglist or StackOverflow that something broke their scipy installation ;) > I know that not everyone who uses scipy and matplotlib uses IPython, and they aren't the only two packages used by novices that have sub modules they don't automatically import for you, but... I'm guessing the percentages are high. > > Of course this support could also be added to scipy and matplotlib itself. 
> > And maybe importlib could have a function that makes automatic lazy loading of submodules on demand a one-liner for packages that want to support it. This definitely has some appeal though, and shouldn't be outside the realm of possibility. I especially like the suggestion of making it optional. Erik From mertz at gnosis.cx Thu Sep 25 02:02:53 2014 From: mertz at gnosis.cx (David Mertz) Date: Wed, 24 Sep 2014 17:02:53 -0700 Subject: [Python-ideas] "continue with" for dynamic iterable injection In-Reply-To: <54234105.8070806@cathalgarvey.me> References: <54234105.8070806@cathalgarvey.me> Message-ID: Can't we achieve the same effect with code like the following? Maybe slightly less elegant, but only slightly and hence not worth adding to the language.

class Continuation(object):
    def __init__(self, val):
        self.val = val

# Other stuff, including creating the iterable

x = None
while True:
    if isinstance(x, Continuation):
        x = x.val
    else:
        try:
            x = next(it)
        except StopIteration:
            break
    if something(x):
        do_things(x)
    else:
        x = Continuation(special_value)
        continue

On Wed, Sep 24, 2014 at 3:09 PM, Cathal Garvey wrote: > Hello all, > First time post, so go gentle. :) > > During a twitter conversation with @simeonvisser, an idea emerged that > he suggested I share here. > > The conversation began with me complaining that I'd like a third mode of > explicit flow control in Python for-loops; the ability to repeat a loop > iteration in whole. The reason for this was that I was parsing data > where a datapoint indicated the end of a preceding sub-group, so the > datapoint was both a data structure indicator *and* data in its own > right. So I'd like to have iterated over the line once, branching into > the flow control management part of an if/else, and then again to branch > into the data management part of the same if/else. > > Yes, this is unnecessary and just a convenience for parsing that I'd > like to see.
> > During the discussion though, a much more versatile solution presented > itself: repurposing the `continue` keyword to `continue with`, similarly > to the repurposing of `yield` with `yield from`. This would avoid > keyword bloat, but it would be explicit and intuitive to use, and would > allow a pretty interesting extension of iteration with the for-loop. > > The idea is that `continue with` would allow injection of any datatype > or variable, which becomes the next iterated item. So, saying `continue > with 5` means that the next *x* in a loop structured as `for x in..` > would be 5. > > This would satisfy my original, niche-y desire to re-iterate over > something, but would also more broadly allow dynamic injection of *any* > value into a for-loop iteration. > > As an extension to the language, it would present new and exciting ways > to iterate infinitely by accident. It would also present new and > exciting ways for people to trip up on mutable data; one way to misuse > this is to iterate over data, modifying the data, then iterating over it > again. The intention, rather, is that the repeated iteration allows > re-iteration over unaltered data in response to a mid-iteration change > in state (i.e., in my case, parsing a token, changing parser state, and > then parsing the token again because it also carries informational > content). > > However, bad data hygiene has always been part of Python, because it's a > natural consequence of the pan-mutability that makes it such a great > language. So, given that the same practices that an experienced Python > dev learns to use would apply in this case, I don't think it adds > anything that an experienced dev would trip over. > > As a language extension, I've never seen something like "continue with" > in another language. 
You can mimic it with recursive functions and > `cons`-ing the injection, but as a flow-control system for an imperative > language, nope..and Python doesn't encourage or facilitate recursion > anyway (no TCE). > > I'd love thoughts on this. It's unnecessary; but then, once something's > declared turing complete you can accuse any additional feature of being > unnecessary. I feel, more to the point, that it's in keeping with the > Python philosophy and would add a novel twist to the language not seen > elsewhere. > > best, > Cathal > > -- > Twitter: @onetruecathal, @formabiolabs > Phone: +353876363185 > Blog: http://indiebiotech.com > miniLock.io: JjmYYngs7akLZUjkvFkuYdsZ3PyPHSZRBKNm6qTYKZfAM > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothy.c.delaney at gmail.com Thu Sep 25 02:03:07 2014 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Thu, 25 Sep 2014 10:03:07 +1000 Subject: [Python-ideas] "continue with" for dynamic iterable injection In-Reply-To: <54234105.8070806@cathalgarvey.me> References: <54234105.8070806@cathalgarvey.me> Message-ID: On 25 September 2014 08:09, Cathal Garvey wrote: > > The idea is that `continue with` would allow injection of any datatype > or variable, which becomes the next iterated item. So, saying `continue > with 5` means that the next *x* in a loop structured as `for x in..` > would be 5. > You can effectively do this just with generators. 
def process(iterable):
    for e in iterable:
        yield e
        yield -e

for e in process([1, 2, 3]):
    print(e)

If you need to get more complex, investigate the send() method of generators. Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Sep 25 03:06:39 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 24 Sep 2014 18:06:39 -0700 Subject: [Python-ideas] "continue with" for dynamic iterable injection In-Reply-To: <54234105.8070806@cathalgarvey.me> References: <54234105.8070806@cathalgarvey.me> Message-ID: <983AC20E-B143-4347-BBB3-1D6A498E1130@yahoo.com> On Sep 24, 2014, at 15:09, Cathal Garvey wrote: > Hello all, > First time post, so go gentle. :) > > During a twitter conversation with @simeonvisser, an idea emerged that > he suggested I share here. > > The conversation began with me complaining that I'd like a third mode of > explicit flow control in Python for-loops; the ability to repeat a loop > iteration in whole. The reason for this was that I was parsing data > where a datapoint indicated the end of a preceding sub-group, so the > datapoint was both a data structure indicator *and* data in its own > right. So I'd like to have iterated over the line once, branching into > the flow control management part of an if/else, and then again to branch > into the data management part of the same if/else. > > Yes, this is unnecessary and just a convenience for parsing that I'd > like to see. It would really help to have specific use cases, so we can look at how much the syntactic sugar helps readability vs. what we can write today. Otherwise all anyone can say is, "Well, it sounds like it might be nice, but I can't tell if it would be nice enough to be worth a language change", or try to invent their own use cases that might not be as nice as yours and then unfairly dismiss it as unnecessary.
> > During the discussion though, a much more versatile solution presented > itself: repurposing the `continue` keyword to `continue with`, similarly > to the repurposing of `yield` with `yield from`. This would avoid > keyword bloat, but it would be explicit and intuitive to use, and would > allow a pretty interesting extension of iteration with the for-loop. > > The idea is that `continue with` would allow injection of any datatype > or variable, which becomes the next iterated item. So, saying `continue > with 5` means that the next *x* in a loop structured as `for x in..` > would be 5. > > This would satisfy my original, niche-y desire to re-iterate over > something, but would also more broadly allow dynamic injection of *any* > value into a for-loop iteration. > > As an extension to the language, it would present new and exciting ways > to iterate infinitely by accident. It would also present new and > exciting ways for people to trip up on mutable data; one way to misuse > this is to iterate over data, modifying the data, then iterating over it > again. The intention, rather, is that the repeated iteration allows > re-iteration over unaltered data in response to a mid-iteration change > in state (i.e., in my case, parsing a token, changing parser state, and > then parsing the token again because it also carries informational content). > > However, bad data hygiene has always been part of Python, because it's a > natural consequence of the pan-mutability that makes it such a great > language. So, given that the same practices that an experienced Python > dev learns to use would apply in this case, I don't think it adds > anything that an experienced dev would trip over. > > As a language extension, I've never seen something like "continue with" > in another language. 
You can mimic it with recursive functions and > `cons`-ing the injection, but as a flow-control system for an imperative > language, nope..and Python doesn't encourage or facilitate recursion > anyway (no TCE). > > I'd love thoughts on this. It's unnecessary; but then, once something's > declared turing complete you can accuse any additional feature of being > unnecessary. I feel, more to the point, that it's in keeping with the > Python philosophy and would add a novel twist to the language not seen > elsewhere. > > best, > Cathal > > -- > Twitter: @onetruecathal, @formabiolabs > Phone: +353876363185 > Blog: http://indiebiotech.com > miniLock.io: JjmYYngs7akLZUjkvFkuYdsZ3PyPHSZRBKNm6qTYKZfAM > <0x988B9099.asc> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From cs at zip.com.au Thu Sep 25 03:13:40 2014 From: cs at zip.com.au (Cameron Simpson) Date: Thu, 25 Sep 2014 11:13:40 +1000 Subject: [Python-ideas] "continue with" for dynamic iterable injection In-Reply-To: <54234105.8070806@cathalgarvey.me> References: <54234105.8070806@cathalgarvey.me> Message-ID: <20140925011340.GA97480@cskk.homeip.net> On 24Sep2014 23:09, Cathal Garvey wrote: >The conversation began with me complaining that I'd like a third mode of >explicit flow control in Python for-loops; the ability to repeat a loop >iteration in whole. The reason for this was that I was parsing data >where a datapoint indicated the end of a preceding sub-group, so the >datapoint was both a data structure indicator *and* data in its own >right. So I'd like to have iterated over the line once, branching into >the flow control management part of an if/else, and then again to branch >into the data management part of the same if/else. Sounds a bit like Perl's "redo" statement. 
Example (pretending Python syntax):

for x in iterable_thing:
    if special_circumstance_here:
        x = 9
        redo

Perl lets you redo to a special label, too (good practice actually - all sorts of horrible subtle bugs can come in with bare redos). In Python you'd be better off writing a special iterator with "push back". You used to find this kind of thing on I/O streams and parsers. Example use:

I2 = PushableIter(iterable_thing)
for x in I2:
    if special_circumstance_here:
        I2.push_back(9)
    else:
        ...  # rest of loop

Then "PushableIter" is a special iterator that keeps a "pushed back" variable or stack. Simple untested example:

class PushableIter:
    def __init__(self, iterable):
        self.pushed = []
        self.iterator = iter(iterable)
    def __iter__(self):
        return self
    def __next__(self):
        if self.pushed:
            return self.pushed.pop()
        return next(self.iterator)
    def push_back(self, value):
        self.pushed.append(value)

Should slot into the above example directly. Keep it around in your convenience library. This doesn't require modifying the language to insert this special purpose and error prone construct. Instead, you're just making a special iterator. Which also leaves you free to make _other_ special iterators for other needs. Cheers, Cameron Simpson [...] post-block actions should be allowed everywhere, not just on subroutines. The ALWAYS keyword was agreed upon as a good way of doing this, although POST was also suggested. This led to the semi-inevitable rehash of the try-catch exception handling debate. According to John Porter, "There is no try, there is only do.
:-)" - from the perl6 development discussion From njs at pobox.com Thu Sep 25 03:50:08 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 25 Sep 2014 02:50:08 +0100 Subject: [Python-ideas] "continue with" for dynamic iterable injection In-Reply-To: <983AC20E-B143-4347-BBB3-1D6A498E1130@yahoo.com> References: <54234105.8070806@cathalgarvey.me> <983AC20E-B143-4347-BBB3-1D6A498E1130@yahoo.com> Message-ID: On 25 Sep 2014 02:09, "Andrew Barnert" wrote: > > On Sep 24, 2014, at 15:09, Cathal Garvey wrote: > > > > > The conversation began with me complaining that I'd like a third mode of > > explicit flow control in Python for-loops; the ability to repeat a loop > > iteration in whole. The reason for this was that I was parsing data > > where a datapoint indicated the end of a preceding sub-group, so the > > datapoint was both a data structure indicator *and* data in its own > > right. So I'd like to have iterated over the line once, branching into > > the flow control management part of an if/else, and then again to branch > > into the data management part of the same if/else. > > > > Yes, this is unnecessary and just a convenience for parsing that I'd > > like to see. > > It would really help to have specific use cases, so we can look at how much the syntactic sugar helps readability vs. what we can write today. Otherwise all anyone can say is, "Well, it sounds like it might be nice, but I can't tell if it would be nice enough to be worth a language change", or try to invent their own use cases that might not be as nice as yours and then unfairly dismiss it as unnecessary. The way I would describe this is, the proposal is to add single-item pushback support to all for loops. Tokenizers are a common case that needs pushback ("if we are in the IDENTIFIER state and the next character is not alphanumeric, then set state to NEW_TOKEN and process it again"). 
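That tokenizer pattern can be made concrete with a small self-contained sketch. The wrapper class, the character set, and the token rules below are invented for illustration; they are not taken from any post in the thread:

```python
# A push-back wrapper plus a toy tokenizer: when a non-identifier
# character ends an identifier, it is pushed back so the loop can
# process it again as the start of a new token.

IDENT_CHARS = set("abcdefghijklmnopqrstuvwxyz")

class Pushback:
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._stack = []

    def __iter__(self):
        return self

    def __next__(self):
        return self._stack.pop() if self._stack else next(self._it)

    def push_back(self, ch):
        self._stack.append(ch)

def tokenize(text):
    """Split text into identifier tokens and single-character tokens."""
    tokens = []
    current = []                     # characters of an in-progress identifier
    chars = Pushback(text)
    for ch in chars:
        if current:                  # in the IDENTIFIER state
            if ch in IDENT_CHARS:
                current.append(ch)
            else:
                tokens.append(''.join(current))
                current = []
                chars.push_back(ch)  # reprocess ch in the NEW_TOKEN state
        elif ch in IDENT_CHARS:
            current = [ch]
        else:
            tokens.append(ch)
    if current:
        tokens.append(''.join(current))
    return tokens

print(tokenize("foo+bar!"))  # ['foo', '+', 'bar', '!']
```

The push-back happens exactly once per identifier boundary, which is why a single-item stack is enough for this kind of grammar.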
I don't know how common such cases are in the grand scheme of things, but they are somewhat cumbersome to handle when they come up.

The most elegant solution I know is:

    class PushbackAdaptor:
        def __init__(self, iterable):
            self.base = iter(iterable)
            self.stack = []

        def next(self):
            if self.stack:
                return self.stack.pop()
            else:
                return self.base.next()

        def pushback(self, obj):
            self.stack.append(obj)

    it = iter(character_source)
    for char in it:
        ...
        if state is IDENTIFIER and char not in IDENT_CHARS:
            state = NEW_TOKEN
            it.push_back(char)
            continue
        ...

In modern python, I think the natural meaning for 'continue with' wouldn't be to special-case something like this. Instead, where 'continue' triggers a call to 'it.next()', I'd expect 'continue with x' to trigger a call to 'it.send(x)'. I suspect this might enable some nice idioms in coroutiney code, though I'm not very familiar with such.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tleeuwenburg at gmail.com Thu Sep 25 04:33:17 2014
From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg)
Date: Thu, 25 Sep 2014 12:33:17 +1000
Subject: [Python-ideas] Idea: Named code blocks / inline module declarations
In-Reply-To: <20140917092154.GG9293@ando.pearwood.info>
References: <70CB0A34-7B73-407E-AC4A-7F25D1C94BF2@yahoo.com> <20140917092154.GG9293@ando.pearwood.info>
Message-ID:

Thanks to everyone for the replies! I read them all with interest.

The fundamental issue for me is that you shouldn't just co-opt functionality. The semantics of a class is clearly intended to be a class of objects -- an instantiatable thing which is a core part of OO design. Re-using it for named blocks really just seems like it would be massively confusing, particularly if one were to interleave the two.

I also take Steve's point about the ability for functions to refer to each other without using the full namespace. That would be another useful effect.
Does anyone think this idea is worth developing further, or is it best left as an interesting discussion? On 17 September 2014 19:21, Steven D'Aprano wrote: > On Wed, Sep 17, 2014 at 01:51:29AM -0700, Andrew Barnert wrote: > > On Sep 16, 2014, at 23:21, David Mertz wrote: > > > > > Why is this a misuse? > > > > Well, for one thing, you're relying on the fact that unbound methods > > are just plain functions, which was not true in 2.x and is still not > > described that way in the documentation. You're also ignoring the fact > > that the first parameter of a method should be self and the convention > > (enforced by the interpreter 2.x, although no longer in 3.x, and by > > various lint tools, and likely relied on by IDEs, etc.) that when > > calling an unbound method you pass an instance of the class (or a > > subclass) as the first argument. > > While all this is true, one can work around it by declaring all your > methods @staticmethod. But it's worse than that. > > By using a class, you imply inheritance and instantiation. Neither is > relevant to the basic "namespace" idea. > > Furthermore, there's no point (in my opinion) in having this sort of > namespace unless functions inside a namespace can refer to each other > without caring about the name of the namespace they are in. Think of > modules. Given a module a.py containing functions f and g, f can call g: > > def f(): > return g() > > without writing: > > def f(): > return a.g() > > Classes don't give you that, so they are not up to the job. > > Modules, on the other hand, give us almost exactly what is needed here. > We can create module instances on the fly, and populate them. A class > decorator could accept a class and return a module instance, on the fly. > That would still be ugly, since > > @namespace > class stuff: > > *looks* like a class even though it isn't, but it will do as a > proof-of-concept. 
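Steven's proof-of-concept can be fleshed out. Here is one possible sketch of such a `namespace` decorator; the globals-rebinding trick used to let `f` call `g` unqualified is my choice of mechanism, not something specified in the thread:

```python
import types

def namespace(cls):
    """Class decorator: return a module instance populated from the
    class body, so the functions inside can call each other without
    qualifying the namespace's name."""
    mod = types.ModuleType(cls.__name__, cls.__doc__)
    for name, value in vars(cls).items():
        if name.startswith('__'):
            continue
        if isinstance(value, types.FunctionType):
            # Re-create the function with the new module's dict as its
            # globals, so names defined in the namespace resolve directly.
            value = types.FunctionType(value.__code__, mod.__dict__,
                                       name, value.__defaults__,
                                       value.__closure__)
        setattr(mod, name, value)
    return mod

@namespace
class stuff:
    def g():
        return 42

    def f():
        return g()   # unqualified: 'g' is found in the module's globals

print(stuff.f())  # 42
```

Because `stuff` really is a module instance, the functions are plain functions rather than methods, and no `self` or `stuff.` prefix is needed.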
>
>
> --
> Steven
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

--
--------------------------------------------------
Tennessee Leeuwenburg
http://myownhat.blogspot.com/
"Don't believe everything you think"
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From njs at pobox.com Thu Sep 25 05:51:45 2014
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 25 Sep 2014 04:51:45 +0100
Subject: [Python-ideas] "continue with" for dynamic iterable injection
In-Reply-To:
References: <54234105.8070806@cathalgarvey.me> <983AC20E-B143-4347-BBB3-1D6A498E1130@yahoo.com>
Message-ID:

On Thu, Sep 25, 2014 at 2:50 AM, Nathaniel Smith wrote:
> The most elegant solution I know is:
>
>     class PushbackAdaptor:
>         def __init__(self, iterable):
>             self.base = iter(iterable)
>             self.stack = []
>
>         def next(self):
>             if self.stack:
>                 return self.stack.pop()
>             else:
>                 return self.base.next()
>
>         def pushback(self, obj):
>             self.stack.append(obj)
>
>     it = iter(character_source)
>     for char in it:
>         ...
>         if state is IDENTIFIER and char not in IDENT_CHARS:
>             state = NEW_TOKEN
>             it.push_back(char)
>             continue
>         ...
>
> In modern python, I think the natural meaning for 'continue with' wouldn't
> be to special-case something like this. Instead, where 'continue' triggers a
> call to 'it.next()', I'd expect 'continue with x' to trigger a call to
> 'it.send(x)'. I suspect this might enable some nice idioms in coroutiney
> code, though I'm not very familiar with such.

In fact, given the 'send' definition of 'continue with x', the above tokenization code would become simply:

    def redoable(iterable):
        for obj in iterable:
            while (yield obj) == "redo":
                pass

    for char in redoable(character_source):
        ...
        if state is IDENTIFIER and char not in IDENT_CHARS:
            state = NEW_TOKEN
            continue with "redo"
        ...
which I have to admit is fairly sexy. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Thu Sep 25 07:07:29 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 25 Sep 2014 06:07:29 +0100 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <54230913.4060401@egenix.com> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> Message-ID: On Wed, Sep 24, 2014 at 7:10 PM, M.-A. Lemburg wrote: > Agreed, it's a nice feature :-) > > I've been using this in our mx packages since 1999 using a module > called LazyModule.py. See e.g. > http://educommons.com/dev/browser/3.2/installers/windows/src/eduCommons/python/Lib/site-packages/mx/URL/LazyModule.py > > Regarding making module more class like: we've played with this > a bit at PyCon UK and it's really easy to turn a module into a > regular class (with all its features) by tweaking sys.modules - > we even got .__getattr__() to work. With some more effort, we > could have a main() function automatically called upon direct > import from the command line. > > The whole thing is a huge hack, though, so I'll leave out the > details :-) Indeed. I can think of multiple places where there are compelling reasons to want to hook module attribute lookup: Lazy loading: as per above. E.g., ten years ago for whatever reason, someone decided that 'import numpy' ought to automatically execute 'import numpy.testing' as well. So now backcompat means we're stuck with it. 'import numpy.testing' is rather slow, to the point that it can be a substantial part of the total overhead for launching numpy-using scripts. We get bug reports about this, from people who are irritated that their production code is spending all this time loading unit-test harnesses and whatnot that it doesn't even use. 
Module attribute deprecation: For reasons that are even more lost in the mists of time, numpy re-exports some objects from the __builtins__ namespace (e.g., numpy.float exists but is __builtins__.float; if you want the default numpy floating-point type you have to write numpy.float_). As you can probably imagine this is massively confusing to everyone, but if we just removed these re-exports then it would break existing working code (e.g., 'numpy.array([1, 2, 3], dtype=numpy.float)' does work and do the right thing right now), so according to our deprecation policy we have to spend a few releases issuing warnings every time someone writes 'numpy.float'. Which requires executing arbitrary code at attribute lookup time.

I think both of these use cases arise very commonly in long-lived projects, but right now the only ways to accomplish either of these things involve massive disgusting hacks. They are really really hard to do cleanly, and you risk all kinds of breakage in edge-cases (e.g. try reload()'ing a module that's been replaced by an object). So, we haven't dared release anything like this in production, and the above problems just hang around indefinitely.

What I'd really like is for module attribute lookup to start supporting the descriptor protocol. This would be super-easy to work with and fast (you only pay the extra overhead for the attributes which have been hooked).

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

From mertz at gnosis.cx Thu Sep 25 08:06:14 2014
From: mertz at gnosis.cx (David Mertz)
Date: Wed, 24 Sep 2014 23:06:14 -0700
Subject: [Python-ideas] "continue with" for dynamic iterable injection
In-Reply-To:
References: <54234105.8070806@cathalgarvey.me> <983AC20E-B143-4347-BBB3-1D6A498E1130@yahoo.com>
Message-ID:

There are a couple minor errors in Nathaniel's (presumably untested) code.
But I think it looks quite elegant overall, actually:

    #!/usr/bin/env python3
    from string import ascii_lowercase
    from random import random

    class PushbackAdaptor(object):
        def __init__(self, iterable):
            self.base = iter(iterable)
            self.stack = []

        def __next__(self):
            if self.stack:
                return self.stack.pop()
            else:
                return next(self.base)

        def pushback(self, obj):
            self.stack.append(obj)

        def __iter__(self):
            return self

    def repeat_some(it):
        it = PushbackAdaptor(it)
        for x in it:
            print(x, end='')
            if random() > 0.5:
                it.pushback(x)
                continue
        print()

    repeat_some(ascii_lowercase)
    repeat_some(range(10))

On Wed, Sep 24, 2014 at 6:50 PM, Nathaniel Smith wrote: > On 25 Sep 2014 02:09, "Andrew Barnert" > wrote: > > > > On Sep 24, 2014, at 15:09, Cathal Garvey > wrote: > > > > > > > > The conversation began with me complaining that I'd like a third mode > of > > > explicit flow control in Python for-loops; the ability to repeat a loop > > > iteration in whole. The reason for this was that I was parsing data > > > where a datapoint indicated the end of a preceding sub-group, so the > > > datapoint was both a data structure indicator *and* data in its own > > > right. So I'd like to have iterated over the line once, branching into > > > the flow control management part of an if/else, and then again to > branch > > > into the data management part of the same if/else. > > > > > > Yes, this is unnecessary and just a convenience for parsing that I'd > > > like to see. > > > > It would really help to have specific use cases, so we can look at how > much the syntactic sugar helps readability vs. what we can write today. > Otherwise all anyone can say is, "Well, it sounds like it might be nice, > but I can't tell if it would be nice enough to be worth a language change", > or try to invent their own use cases that might not be as nice as yours and > then unfairly dismiss it as unnecessary. > > The way I would describe this is, the proposal is to add single-item > pushback support to all for loops.
Tokenizers are a common case that needs > pushback ("if we are in the IDENTIFIER state and the next character is not > alphanumeric, then set state to NEW_TOKEN and process it again"). > > I don't know how common such cases are in the grand scheme of things, but > they are somewhat cumbersome to handle when they come up. > > The most elegant solution I know is: > > class PushbackAdaptor: > def __init__(self, iterable): > self.base = iter(iterable) > self.stack = [] > > def next(self): > if self.stack: > return self.stack.pop() > else: > return self.base.next() > > def pushback(self, obj): > self.stack.append(obj) > > it = iter(character_source) > for char in it: > ... > if state is IDENTIFIER and char not in IDENT_CHARS: > state = NEW_TOKEN > it.push_back(char) > continue > ... > > In modern python, I think the natural meaning for 'continue with' wouldn't > be to special-case something like this. Instead, where 'continue' triggers > a call to 'it.next()', I'd expect 'continue with x' to trigger a call to > 'it.send(x)'. I suspect this might enable some nice idioms in coroutiney > code, though I'm not very familiar with such. > > -n > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From t_glaessle at gmx.de Thu Sep 25 12:16:31 2014 From: t_glaessle at gmx.de (=?UTF-8?B?VGhvbWFzIEdsw6TDn2xl?=) Date: Thu, 25 Sep 2014 12:16:31 +0200 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> Message-ID: <5423EB7F.40309@gmx.de> Nathaniel Smith wrote on 09/25/2014 07:07 AM: > What I'd really like is for module attribute lookup to start > supporting the descriptor protocol. This would be super-easy to work > with and fast (you only pay the extra overhead for the attributes > which have been hooked). -n I'm not sure, I picture this the same way you intended, but I believe supporting the descriptor protocol is too confusing and breaks too much code in many cases. You wouldn't normally expect to execute x.__get__, etc on module attribute access if you are just trying to export some object x that happens to be a descriptor. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 949 bytes Desc: OpenPGP digital signature URL: From tleeuwenburg at gmail.com Thu Sep 25 12:22:29 2014 From: tleeuwenburg at gmail.com (Tennessee Leeuwenburg) Date: Thu, 25 Sep 2014 20:22:29 +1000 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <5423EB7F.40309@gmx.de> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <5423EB7F.40309@gmx.de> Message-ID: I love it. +1 :). On 25 September 2014 20:16, Thomas Gl??le wrote: > > Nathaniel Smith wrote on 09/25/2014 07:07 AM: > > What I'd really like is for module attribute lookup to start > > supporting the descriptor protocol. This would be super-easy to work > > with and fast (you only pay the extra overhead for the attributes > > which have been hooked). -n > > I'm not sure, I picture this the same way you intended, but I believe > supporting the descriptor protocol is too confusing and breaks too much > code in many cases. 
You wouldn't normally expect to execute x.__get__, > etc on module attribute access if you are just trying to export some > object x that happens to be a descriptor. > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- -------------------------------------------------- Tennessee Leeuwenburg http://myownhat.blogspot.com/ "Don't believe everything you think" -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Sep 25 12:30:48 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 25 Sep 2014 20:30:48 +1000 Subject: [Python-ideas] Idea: Named code blocks / inline module declarations In-Reply-To: References: <70CB0A34-7B73-407E-AC4A-7F25D1C94BF2@yahoo.com> <20140917092154.GG9293@ando.pearwood.info> Message-ID: On 25 September 2014 12:33, Tennessee Leeuwenburg wrote: > Thanks to everyone for the replies! I read them all with interest. > > The fundamental issue for me is that you shouldn't just co-opt > functionality. The semantics of a class is clearly intended to be a class of > objects -- and instantiatable thing which is a core part of OO design. > Re-using it for named blocks really just seems like it would be massively > confusing, particularly if one were to interleave the two. The metaclass system already allows for fairly significant variations in "class" semantics. In this case, a metaclass that disallowed instantiation and bypassed the normal class lookup machinery seems entirely feasible. That doesn't seem any more fundamentally confusing than using the same syntax for normal classes, metaclasses, ABCs, enumerations, database ORM models, web framework form and view definitions, etc. Cheers, Nick. 
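A sketch of the kind of metaclass Nick describes: one that disallows instantiation, so the "class" serves purely as a named namespace. (The name-lookup bypass he also mentions is not attempted here, and all names below are invented for illustration.)

```python
class NamespaceMeta(type):
    """Metaclass for 'namespace' blocks: calling the class is an error,
    so it can only be used as a grouping of names."""
    def __call__(cls, *args, **kwargs):
        raise TypeError(
            "namespace %r cannot be instantiated" % cls.__name__)

class config(metaclass=NamespaceMeta):
    host = "localhost"
    port = 8080

    def url():   # a plain function, no self: the block is not a real class
        return "http://%s:%d" % (config.host, config.port)

print(config.url())   # http://localhost:8080

try:
    config()
except TypeError as exc:
    print(exc)        # namespace 'config' cannot be instantiated
```

Attribute access and plain-function calls work as usual; only instantiation is blocked, which is the half of the idea that needs no interpreter support.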
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From dholth at gmail.com Thu Sep 25 12:34:05 2014 From: dholth at gmail.com (Daniel Holth) Date: Thu, 25 Sep 2014 06:34:05 -0400 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <5423EB7F.40309@gmx.de> Message-ID: Have you tried apipkg? On Sep 25, 2014 6:22 AM, "Tennessee Leeuwenburg" wrote: > I love it. +1 :). > > On 25 September 2014 20:16, Thomas Gl??le wrote: > >> >> Nathaniel Smith wrote on 09/25/2014 07:07 AM: >> > What I'd really like is for module attribute lookup to start >> > supporting the descriptor protocol. This would be super-easy to work >> > with and fast (you only pay the extra overhead for the attributes >> > which have been hooked). -n >> >> I'm not sure, I picture this the same way you intended, but I believe >> supporting the descriptor protocol is too confusing and breaks too much >> code in many cases. You wouldn't normally expect to execute x.__get__, >> etc on module attribute access if you are just trying to export some >> object x that happens to be a descriptor. >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > -------------------------------------------------- > Tennessee Leeuwenburg > http://myownhat.blogspot.com/ > "Don't believe everything you think" > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From cathalgarvey at cathalgarvey.me Thu Sep 25 13:31:15 2014
From: cathalgarvey at cathalgarvey.me (Cathal Garvey)
Date: Thu, 25 Sep 2014 12:31:15 +0100
Subject: [Python-ideas] "continue with" for dynamic iterable injection
In-Reply-To:
References: <54234105.8070806@cathalgarvey.me> <983AC20E-B143-4347-BBB3-1D6A498E1130@yahoo.com>
Message-ID: <5423FD03.8090109@cathalgarvey.me>

I appreciate the comparison to coroutines, it helps frame some of the use-cases. However (forgive me for saying), I find that Python's coroutine model isn't intuitive, certainly for newcomers. I often find it hard to envisage use-cases for Python coroutines that wouldn't be better served with a class, for the sake of readability; Python's overall philosophy leans toward readability, so that suggests it's leaning away from coroutines.

Now, that's an aside, so I don't want to fall off-topic; I also realise Python is drifting into more uniform and useful coroutine based code, based on asyncio. So, if `continue with` were useful to building better or more intuitive coroutines, then by all means that's a valid application.

What I had more in mind was to have it as something that would stand independent of the custom-class spaghetti normally required to implement "clean" alternative looping or coroutines. That is, everyone can accept that it's not necessary; but does it clean up real-world code?

So, for clarity, this is the kind of thing I had in mind, which I've encountered in various forms and which I think `continue with` would help with, at its most basic:

    Ingredient (unit) (unit/100g):
    Protein g, 5
    Carbohydrate g, 50
    Fibre g 10
    Insoluble Fibre g 5
    Soluble Fibre g 5
    Starches g 20
    Sugars g 20
    Sucrose g 10
    Glucose g 5
    Fructose g 5
    Vitamins mg 100
    Ascorbic Acid mg 50
    Niacin mg 50

The above is invented, but I was actually parsing an ingredient/nutrition list when the idea occurred to me.
As you can see, there are "totals" followed by sub-categories, some of which are subtotals which form their own category. When parsed, I might want (pseudo-json):

    {
      Protein: 5g
      Carbohydrates: {
        total: 50g
        fibre: {
          total: 10g,
          soluble: 5
          insoluble: 5
        }
        <...>
      }
    }

To parse this, I create code like this, with some obviously-named functions that aren't given. With recursive subtables, obviously this isn't going to work as-is, but it illustrates the point:

```
table = {}
subtable = ''
for line in raw_table:
    name, unit, quant = line.strip().rsplit(None, 2)
    if subtable:
        if is_valid_subtable_element(subtable, name):
            table[subtable][name] = quant + unit
        else:
            subtable = ''
            table[name] = quant + unit  # DRY!
    else:
        if is_subtable_leader(name):
            subtable = name
            table[subtable] = {'total': quant + unit}
        else:
            table[name] = quant + unit  # DRY!
```

Now, if I have to maintain this code, which will quickly become nontrivial for enough branches, I have several locations that need parallel fixes and modifications. One solution is to functionalise this and build functions to which the container (table) and the tokens are passed; changes are then made in the functions, and the repeated calls in different code branches become more maintainable. Another is to make an object instead of a dict-table, and the object performs some magic to handle things correctly.

However, with `continue with` the solution is more straightforward.
Because the problem, essentially, is that some tokens cause a state-change in addition to presenting data in their own right, by using `continue with` you can handle them in one code branch first, then repeat to handle them in the other branch:

```
table = {}
subtable = ''
for line in raw_table.splitlines():
    name, unit, quant = line.rsplit(None, 2)
    if subtable:
        if is_valid_subtable_element(subtable, name):
            table[subtable][name] = quant + unit
        else:
            subtable = ''
            continue with line
    else:
        if is_subtable_leader(name):
            subtable = name
            table[subtable] = {}
            continue with 'total {} {}'.format(unit, quant)
        else:
            table[name] = quant + unit
```

The result is a single table entry per branch; one for subtables, one for base table. The handling of tokens that present flow-control issues, like titles of subtables or values that indicate the subtable should end (like a vitamin, when we were parsing carbohydrates), is handled first as flow-control issues and then again as data. (in this case, assume that the function is_valid_subtable_element accepts "total" as a valid subtable element always, and judges the rest according to predefined valid items for categories like "carbohydrates", "fibres", "vitamins", etcetera).

The flow control is cleaner, more intuitive to read IMO, and there is less call for the definition of special flow-control classes, functions or coroutines. In my opinion, anything that removes the need for custom classes and functions *for the purpose of flow control and readability* is an improvement to the language.

Now, as indicated above, `continue with` does not merely repeat the current iteration; you can dynamically generate the next iteration cycle. In the above example, that changed a line like "Carbohydrates g 50" into "total g 50" for use in the subtable iteration. More creative uses of dynamic iterable injection will surely present themselves with further thought.
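Since `continue with` does not exist, the same two-branch structure can be written today with the push-back iterator discussed earlier in the thread. This is a runnable sketch: the helper functions, the SUBTABLES rule table, and the sample data are stand-ins for Cathal's unspecified ones, not taken from his post.

```python
class Pushback:
    """Iterator wrapper with push-back, standing in for 'continue with'."""
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._stack = []

    def __iter__(self):
        return self

    def __next__(self):
        return self._stack.pop() if self._stack else next(self._it)

    def push(self, value):
        self._stack.append(value)

# Stand-ins for the unspecified helpers (assumptions for illustration):
SUBTABLES = {'Carbohydrate': {'total', 'Fibre', 'Starches', 'Sugars'}}

def is_subtable_leader(name):
    return name in SUBTABLES

def is_valid_subtable_element(subtable, name):
    return name in SUBTABLES[subtable]

raw_table = """\
Protein g 5
Carbohydrate g 50
Fibre g 10
Starches g 20
Vitamins mg 100"""

table = {}
subtable = ''
lines = Pushback(raw_table.splitlines())
for line in lines:
    name, unit, quant = line.rsplit(None, 2)
    if subtable:
        if is_valid_subtable_element(subtable, name):
            table[subtable][name] = quant + unit
        else:
            subtable = ''
            lines.push(line)          # plays the role of: continue with line
    else:
        if is_subtable_leader(name):
            subtable = name
            table[subtable] = {}
            # plays the role of injecting a synthetic 'total g 50' line:
            lines.push('total {} {}'.format(unit, quant))
        else:
            table[name] = quant + unit

print(table)
```

Each branch still writes exactly one table entry; the state-changing tokens take one extra trip through the loop, just as the proposal intends.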
Sorry for the poor clarity last night, and perhaps today; I'm recovering from illness and distracted by various other things. :) Thanks for your feedback and thoughts! Cathal On 25/09/14 04:51, Nathaniel Smith wrote: > On Thu, Sep 25, 2014 at 2:50 AM, Nathaniel Smith wrote: >> The most elegant solution I know is: >> >> class PushbackAdaptor: >> def __init__(self, iterable): >> self.base = iter(iterable) >> self.stack = [] >> >> def next(self): >> if self.stack: >> return self.stack.pop() >> else: >> return self.base.next() >> >> def pushback(self, obj): >> self.stack.append(obj) >> >> it = iter(character_source) >> for char in it: >> ... >> if state is IDENTIFIER and char not in IDENT_CHARS: >> state = NEW_TOKEN >> it.push_back(char) >> continue >> ... >> >> In modern python, I think the natural meaning for 'continue with' wouldn't >> be to special-case something like this. Instead, where 'continue' triggers a >> call to 'it.next()', I'd expect 'continue with x' to trigger a call to >> 'it.send(x)'. I suspect this might enable some nice idioms in coroutiney >> code, though I'm not very familiar with such. > > In fact, given the 'send' definition of 'continue with x', the above > tokenization code would become simply: > > def redoable(iterable): > for obj in iterable: > while yield obj == "redo": > pass > > for char in redoable(character_source): > ... > if state is IDENTIFIER and char not in IDENT_CHARS: > state = NEW_TOKEN > continue with "redo" > ... > > which I have to admit is fairly sexy. > -- Twitter: @onetruecathal, @formabiolabs Phone: +353876363185 Blog: http://indiebiotech.com miniLock.io: JjmYYngs7akLZUjkvFkuYdsZ3PyPHSZRBKNm6qTYKZfAM -------------- next part -------------- A non-text attachment was scrubbed... Name: 0x988B9099.asc Type: application/pgp-keys Size: 6176 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL:

From t_glaessle at gmx.de Thu Sep 25 13:55:09 2014
From: t_glaessle at gmx.de (=?UTF-8?B?VGhvbWFzIEdsw6TDn2xl?=)
Date: Thu, 25 Sep 2014 13:55:09 +0200
Subject: [Python-ideas] Implicit submodule imports
In-Reply-To:
References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com>
Message-ID: <5424029D.8060407@gmx.de>

Nathaniel Smith wrote on 09/25/2014 07:07 AM:
> Indeed. I can think of multiple places where there are compelling
> reasons to want to hook module attribute lookup:
>
> Lazy loading: [...]
>
> Module attribute deprecation: [...]
>
> I think both of these use cases arise very commonly in long-lived
> projects, but right now the only ways to accomplish either of these
> things involve massive disgusting hacks. They are really really hard
> to do cleanly, and you risk all kinds of breakage in edge-cases (e.g.
> try reload()'ing a module that's been replaced by an object). So, we
> haven't dared release anything like this in production, and the above
> problems just hang around indefinitely.

The reason I brought implicit imports up in isolation from (well, maybe not isolated enough) supporting a module.__getattr__ protocol altogether, is that it's much less involved. The former can be added without also adding the latter and already cover a lot of its use cases. If module.__getattr__ can be added, I'm all for it. But it also suggests to enable other class-like features in modules, which might not be so easy anymore, conceptually.

In contrast, IMO, it is natural to expect package.module to *just work*, regardless of whether the submodule has already been imported. At least, if packages were only collections of modules.

Maybe, this is the more fundamental problem with packages. They are more like module/package hybrids with a mixed-up namespace. This also causes other irritating issues. E.g.:

package/__init__.py:

    foo = "foo"
    from . import foo
    from . import bar
    bar = "bar"
    baz = "baz"

# has the following submodules:
package/foo.py:
    ...
package/bar.py:
    ...
package/baz.py:
    ...

user:

    >>> package.foo
    >>> package.bar
    bar
    >>> import package.bar as bar
    >>> bar # not the module you might expect..
    bar
    >>> package.baz
    baz
    >>> from package import baz
    baz
    >>> import package.baz as baz
    >>> baz

The "baz" case can be especially confusing. I know, you shouldn't write code like this. But sometimes it happens, because it's just so easy.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 949 bytes
Desc: OpenPGP digital signature
URL:

From random832 at fastmail.us Thu Sep 25 15:41:44 2014
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Thu, 25 Sep 2014 09:41:44 -0400
Subject: [Python-ideas] use gmtime(0) epoch in functions that use mktime()
In-Reply-To:
References: <878ulw6ej0.fsf@gmail.com> <1410040563.2792555.164455181.137FA6D8@webmail.messagingengine.com>
Message-ID: <1411652504.1030461.171665257.178BFEEA@webmail.messagingengine.com>

On Sat, Sep 6, 2014, at 18:19, Guido van Rossum wrote:
> I'm fine with that, as long as "handle leap seconds consistently" means
> "pretend they don't exist" (which is necessary for compatibility with
> POSIX).

Consistently pretend they don't exist. AFAIK you're more likely to encounter a system using the so-called "right" timezones in tzdata (and therefore _not_ pretending that leap seconds don't exist) than one which doesn't have an epoch of 1970.
In which case you would need to use "time2posix" and "posix2time" when calling any platform-specific functions that use time_t (with the user-visible python side, of course, being the non-leap-second posix timestamps)

I did an inventory of names in the time module, by whether they can be implemented platform-independently or not:

Depends on system-dependent ways of getting the current time:
    clock
    clock_getres
    clock_gettime
    clock_settime
    get_clock_info
    monotonic
    perf_counter
    process_time
    time
    Various default values (gmtime etc), can be implemented in terms of time

Depends on system-dependent ways of getting the local timezone:
    ctime, can be implemented in terms of localtime
    localtime
    mktime
    tzset, can be used to set constants: timezone tzname altzone daylight

Otherwise system-dependent:
    sleep

Can be implemented in a platform-independent or pure python way:
    asctime
    gmtime
    calendar.timegm
    strftime (except %z %Z and locale)
    strptime (except %Z and locale)
    struct_time

The list of theoretically platform-independent functions is, as it turns out, depressingly small. It might also be worthwhile to make a windows-specific implementation of some of the platform-dependent functions, rather than one relying on the C library (for example, localtime only has a range of 1970 to 2199, whereas SystemTimeToTzSpecificLocalTime has a range of 1601 to 30828.) But it would have the issue of not having the C library's somewhat obscure support of part of the POSIX TZ standard. (However, a full implementation of POSIX TZ could be done in a platform-independent way).

From g.rodola at gmail.com Thu Sep 25 19:15:09 2014
From: g.rodola at gmail.com (Giampaolo Rodola')
Date: Thu, 25 Sep 2014 19:15:09 +0200
Subject: [Python-ideas] including psutil in the standard library?
In-Reply-To: <5423352D.6020803@ferrara.linux.it> References: <542319B1.2090606@ferrara.linux.it> <20140924162447.712c0846@anarchist.wooz.org> <542329F1.1090501@egenix.com> <20140924223805.41e0af95@fsol> <5423352D.6020803@ferrara.linux.it> Message-ID: On Wed, Sep 24, 2014 at 11:18 PM, Stefano Borini < stefano.borini at ferrara.linux.it> wrote: > On 9/24/14 10:38 PM, Antoine Pitrou wrote: > >> "Upstream" is Giampaolo Rodola, who is a core developer. >> >> > I already sent him an email about this. I am waiting for his answer. > I would enjoy handling the library as a maintainer in the stdlib, but I > have never done anything at this level, so I am not sure I am worthy/able > to do it ;) Hello all and thanks for the positive feedback. It's good to know psutil is so appreciated and it's the best payback for all the hard work. Personally I would be glad to offer psutil for inclusion into the stdlib but at the moment I'm crazily busy with the relocation (I moved to Berlin from Italy) and I wouldn't really have time to dedicate to such an expensive task (which would probably also deserve a PEP) and careful thinking. Also, I still want to work on a couple of new features first (NIC addresses and stats), address a couple of high-priority issues such as: https://github.com/giampaolo/psutil/issues/512 https://github.com/giampaolo/psutil/issues/496 https://github.com/giampaolo/psutil/issues/428 ...and be 100% sure that I'm happy with the API as it is right now (I already went through a major breakage once, see http://grodola.blogspot.com/2014/01/psutil-20-porting.html). In summary, I'd love to do this but not right now as I'm not quite ready yet. -- Giampaolo - http://grodola.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Fri Sep 26 00:31:23 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 26 Sep 2014 10:31:23 +1200 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> Message-ID: <542497BB.6020302@canterbury.ac.nz> Nathaniel Smith wrote: > They are really really hard > to do cleanly, and you risk all kinds of breakage in edge-cases (e.g. > try reload()'ing a module that's been replaced by an object). One small thing that might help is to allow the __class__ of a module to be reassigned to a subclass of the module type. That would allow a module to be given custom behaviours, while remaining a real module object so that reload() etc. continue to work. -- Greg From njs at pobox.com Fri Sep 26 03:02:01 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 26 Sep 2014 02:02:01 +0100 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <542497BB.6020302@canterbury.ac.nz> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> Message-ID: On Thu, Sep 25, 2014 at 11:31 PM, Greg Ewing wrote: > Nathaniel Smith wrote: >> >> They are really really hard >> to do cleanly, and you risk all kinds of breakage in edge-cases (e.g. >> try reload()'ing a module that's been replaced by an object). > > One small thing that might help is to allow the > __class__ of a module to be reassigned to a > subclass of the module type. That would allow > a module to be given custom behaviours, while > remaining a real module object so that reload() > etc. continue to work. Heh, I was actually just pondering whether it would be opening too big a can of worms to suggest this myself. This is the best design I managed to come up with last time I looked at it, though in existing versions of python it requires ctypes hackitude to accomplish the __class__ reassignment. 
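The __class__ switch Greg describes can be sketched with a plain ModuleType
subclass. A caveat on versions: the CPython releases current at the time of
this thread reject the assignment (hence the ctypes hackitude mentioned
above), but interpreters that do permit re-classing module instances run the
sketch below as-is; the `DeprecatingModule` class and its renaming table are
purely illustrative:

```python
import types

class DeprecatingModule(types.ModuleType):
    """Module subclass that transparently forwards renamed attributes."""
    _renamed = {"old_name": "new_name"}  # illustrative renaming table

    def __getattr__(self, name):
        # Called only when normal module attribute lookup fails.
        if name in self._renamed:
            return getattr(self, self._renamed[name])
        raise AttributeError(name)

# A real module would do this to itself via sys.modules[__name__];
# a throwaway module object shows the mechanism just as well.
mod = types.ModuleType("demo")
mod.new_name = 42
mod.__class__ = DeprecatingModule   # the reassignment under discussion

print(mod.old_name)  # -> 42: still a real module, with custom behaviour
```

Because `mod` remains a genuine module object, machinery that insists on the
module type keeps working, which is the advantage over replacing the module
with an arbitrary object.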
(The advantages of this approach are that (1) you get to use the full
class machinery to define your "metamodule", (2) any existing references
to the module get transformed in-place, so you don't have to worry about
ending up with a mixture of old and new instances existing in the same
program, (3) by subclassing and avoiding copying you automatically
support all the subtleties and internal fields of actual module objects
in a forward- and backward-compatible way.)

This would work today, and would solve all these problems, except for
the following code in Objects/typeobject.c:object_set_class:

    if (!(newto->tp_flags & Py_TPFLAGS_HEAPTYPE) ||
        !(oldto->tp_flags & Py_TPFLAGS_HEAPTYPE))
    {
        PyErr_Format(PyExc_TypeError,
                     "__class__ assignment: only for heap types");
        return -1;
    }
    if (compatible_for_assignment(oldto, newto, "__class__")) {
        Py_INCREF(newto);
        Py_TYPE(self) = newto;
        Py_DECREF(oldto);
        return 0;
    }

The builtin "module" type is not a HEAPTYPE, so if we try to do
mymodule.__class__ = mysubclass, then the !(oldto->tp_flags &
Py_TPFLAGS_HEAPTYPE) check gets triggered and the assignment fails.

This code has been around forever, but I don't know why. AFAIK we could
replace the above with

    if (compatible_for_assignment(oldto, newto, "__class__")) {
        if (newto->tp_flags & Py_TPFLAGS_HEAPTYPE) {
            Py_INCREF(newto);
        }
        Py_TYPE(self) = newto;
        if (oldto->tp_flags & Py_TPFLAGS_HEAPTYPE) {
            Py_DECREF(oldto);
        }
        return 0;
    }

and everything would just work, but I could well be missing something?
Is there some dragon lurking inside Python's memory management or is
this just an ancient overabundance of caution?

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

From random832 at fastmail.us  Fri Sep 26 05:32:49 2014
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Thu, 25 Sep 2014 23:32:49 -0400
Subject: [Python-ideas] Implicit submodule imports
In-Reply-To: 
References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com>
 <542497BB.6020302@canterbury.ac.nz>
Message-ID: <1411702369.201686.171927421.3309F6F0@webmail.messagingengine.com>

On Thu, Sep 25, 2014, at 21:02, Nathaniel Smith wrote:
> and everything would just work, but I could well be missing something?
> Is there some dragon lurking inside Python's memory management or is
> this just an ancient overabundance of caution?

Currently, this is the message you get if you attempt to reassign the
class of a list, or an int. Is there something else that would prevent
it? Maybe the "object layout differs" check?

From guido at python.org  Fri Sep 26 06:24:42 2014
From: guido at python.org (Guido van Rossum)
Date: Thu, 25 Sep 2014 21:24:42 -0700
Subject: [Python-ideas] Implicit submodule imports
In-Reply-To: <1411702369.201686.171927421.3309F6F0@webmail.messagingengine.com>
References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com>
 <542497BB.6020302@canterbury.ac.nz>
 <1411702369.201686.171927421.3309F6F0@webmail.messagingengine.com>
Message-ID: 

On Thu, Sep 25, 2014 at 8:32 PM, wrote:

> On Thu, Sep 25, 2014, at 21:02, Nathaniel Smith wrote:
> > and everything would just work, but I could well be missing something?
> > Is there some dragon lurking inside Python's memory management or is
> > this just an ancient overabundance of caution?
>
> Currently, this is the message you get if you attempt to reassign the
> class of a list, or an int. Is there something else that would prevent
> it? Maybe the "object layout differs" check?
What if the class you are > assigning is a legitimate subclass of the basic type? > IIRC the caution is for the case where a built-in type has its own allocation policy, such as the custom free list used by float and a few other types. The custom deallocation code is careful not to use the free list for subclass instances. But (depending on how the free list is implemented) if you could switch the type out for an object that's in a custom free list, the free list could become corrupt. There is no custom allocation for modules, and even for float I don't see how switching types back and forth between float and a subclass could corrupt the free list (assuming the struct size and layout constraints are met), but it is certainly possible to have a custom allocation policy that would be broken. So indeed the smell of dragons is still there (they may exist in 3rd party modules). Perhaps we can rename HEAPTYPE to NO_CUSTOM_ALLOCATOR and set it for most built-in types (or at least for the module type) and all will be well. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Sep 26 10:15:59 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 26 Sep 2014 18:15:59 +1000 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <54230610.7060305@gmx.de> References: <54230610.7060305@gmx.de> Message-ID: <20140926081559.GC19757@ando.pearwood.info> On Wed, Sep 24, 2014 at 07:57:36PM +0200, Thomas Gl??le wrote: > Hey folks, > > What do you think about making it easier to use packages by > automatically importing submodules on attribute access. I think it is a bad idea. And yet I also think that optionally supporting module.__getattr__ and friends is a good idea. What's the difference between the two? (1) "Automatically importing submodules on attribute access" implies that every package and module does this, whether it is appropriate for it or not. 
(2) Building some sort of support for module.__getattr__ implies that
it is opt-in. Whatever the mechanism ends up being, the module author
has to actively make some sort of __getattr__ hook.

The Zen already has something to say about this:

    Explicit is better than implicit.

Automatic importing is implicit importing. Now, of course the Zen is
not an absolute, and modules/packages can preload sub-modules if they
so choose, e.g. os automatically imports os.path. But as a general rule
if you want to import a module, you have to import the module, not its
parent.

> Consider this example:
>
>     >>> import matplotlib
>     >>> figure = matplotlib.figure.Figure()
>     AttributeError: 'module' object has no attribute 'figure'
>
> For the newcomer (like me some months ago) it's not obvious that the
> solution is to import matplotlib.figure.

I sympathise. This issue comes up occasionally on the tutor@ and
python-list at python.org mailing lists. Beginners sometimes don't
understand when they need to do an import and when they don't, so we
get things like `import sys.version`. In hindsight, it is a little
unfortunate that package dotted names and attribute access use the
same notation.

> I'm not sure about potential problems from auto-importing. I currently
> see the following issues:
>
> - harmless looking attribute access can lead to significant code
> execution including side effects. On the other hand, that could always
> be the case.

True, but today it is quite rare that the second line of:

    import spam
    spam.thing

will execute arbitrary code. (The initial import will, of course.)

By making importing automatic, every failed attribute access has to
determine whether or not there is a sub-module to import, which could
be quite expensive.
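One way to keep that cost opt-in, rather than paying it on every failed
attribute access, is an explicit lazy proxy that a package assigns to a name
in its own namespace. A hypothetical sketch (the `LazyModule` name is
invented, and a real version would also need to think about reload() and
about the proxy escaping before the real import runs):

```python
import importlib

class LazyModule:
    """Defers importing `name` until the first attribute access."""

    def __init__(self, name):
        # Write through __dict__ so __getattr__ never sees "_name" missing.
        self.__dict__["_name"] = name

    def __getattr__(self, attr):
        module = importlib.import_module(self._name)
        # Cache the real namespace so later lookups bypass __getattr__.
        self.__dict__.update(module.__dict__)
        return getattr(module, attr)

json = LazyModule("json")   # nothing has been imported yet
print(json.dumps([1, 2]))   # -> [1, 2]; first access triggers the import
```

Only the names a package author explicitly wraps pay the lazy-lookup cost;
every other failed attribute access still raises AttributeError immediately.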
-- Steven From rosuav at gmail.com Fri Sep 26 10:44:48 2014 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 26 Sep 2014 18:44:48 +1000 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <20140926081559.GC19757@ando.pearwood.info> References: <54230610.7060305@gmx.de> <20140926081559.GC19757@ando.pearwood.info> Message-ID: On Fri, Sep 26, 2014 at 6:15 PM, Steven D'Aprano wrote: > By making importing automatic, every failed attribute access has to > determine whether or not there is a sub-module to import, which could be > quite expensive. What if the package had to explicitly do a "stub import" that creates something that, on first access (or first access of any of *its* members), goes and loads up the module? ChrisA From ncoghlan at gmail.com Fri Sep 26 11:43:03 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 26 Sep 2014 19:43:03 +1000 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <20140926081559.GC19757@ando.pearwood.info> Message-ID: On 26 September 2014 18:44, Chris Angelico wrote: > On Fri, Sep 26, 2014 at 6:15 PM, Steven D'Aprano wrote: >> By making importing automatic, every failed attribute access has to >> determine whether or not there is a sub-module to import, which could be >> quite expensive. > > What if the package had to explicitly do a "stub import" that creates > something that, on first access (or first access of any of *its* > members), goes and loads up the module? It's also worth noting the caution in https://docs.python.org/dev/library/importlib.html#importlib.util.LazyLoader Yes, the AttributeError when you try to access a submodule that hasn't been imported yet can be a little confusing, but it's positively crystal clear compared to the confusion you encounter when an attribute access attempt fails with ImportError (or, worse, if the AttributeError is hiding an import error). 
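For reference, the documented recipe for the LazyLoader that the caution
above refers to looks roughly like this (LazyLoader was added in Python 3.5;
the `lazy_import` wrapper name is not part of the stdlib):

```python
import importlib.util
import sys

def lazy_import(name):
    """Create a module whose body runs only on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)   # arms the laziness; does not run the body
    return module

fractions = lazy_import("fractions")   # module object exists, body deferred
half = fractions.Fraction(1, 2)        # first access executes fractions.py
print(half + half)  # -> 1
```

As the documentation warns, any error in the module body is postponed until
that first attribute access, which is exactly the confusing failure mode
described above.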
Explicit, eager imports make it clear when module level code execution might be triggered, with all the associated potential for failure (whether in module lookup, in compilation, in bytecode caching or in code execution). Implicit and lazy imports take that complexity, and run it automatically as part of a __getattr__ operation. There are valid reasons for doing that (such as to improve startup time in large applications), but postponing the point where new users need to learn the difference between "package attribute set in __init__" and "imported submodule" likely isn't one of them. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Fri Sep 26 13:32:23 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 26 Sep 2014 13:32:23 +0200 Subject: [Python-ideas] Implicit submodule imports References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> Message-ID: <20140926133223.75076d5e@fsol> On Fri, 26 Sep 2014 02:02:01 +0100 Nathaniel Smith wrote: > > This code has been around forever, but I don't know why. AFAIK we > could replace the above with > > if (compatible_for_assignment(oldto, newto, "__class__")) { > if (newto->tp_flags & Py_TPFLAGS_HEAPTYPE) { > Py_INCREF(newto); > } > Py_TYPE(self) = newto; > if (oldto->tp_flags & Py_TPFLAGS_HEAPTYPE) { > Py_DECREF(oldto); > } > return 0; > } > > and everything would just work, but I could well be missing something? > Is there some dragon lurking inside Python's memory management or is > this just an ancient overabundance of caution? The tp_dealloc for a heap type is not the same as the non-heap base type's tp_dealloc. See subtype_dealloc() in typeobject.c. Switching the __class__ would deallocate the instance with an incompatible tp_dealloc. 
(in particular, a heap type is always incref'ed when an instance is created and decref'ed when an instance is destroyed, but the base type wouldn't) Also, look at compatible_for_assignment(): it calls same_slots_added() which assumes both args are heap types. Note that this can be a gotcha when using the stable ABI: http://bugs.python.org/issue16690 Regards Antoine. From abarnert at yahoo.com Fri Sep 26 16:59:39 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 26 Sep 2014 07:59:39 -0700 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> Message-ID: <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> On Sep 25, 2014, at 18:02, Nathaniel Smith wrote: > On Thu, Sep 25, 2014 at 11:31 PM, Greg Ewing > wrote: >> Nathaniel Smith wrote: >>> >>> They are really really hard >>> to do cleanly, and you risk all kinds of breakage in edge-cases (e.g. >>> try reload()'ing a module that's been replaced by an object). >> >> One small thing that might help is to allow the >> __class__ of a module to be reassigned to a >> subclass of the module type. That would allow >> a module to be given custom behaviours, while >> remaining a real module object so that reload() >> etc. continue to work. > > Heh, I was actually just pondering whether it would be opening too big > a can of worms to suggest this myself. This is the best design I > managed to come up with last time I looked at it, though in existing > versions of python it requires ctypes hackitude to accomplish the > __class__ reassignment. 
(The advantages of this approach are that (1) > you get to use the full class machinery to define your "metamodule", > (2) any existing references to the module get transformed in-place, so > you don't have to worry about ending up with a mixture of old and new > instances existing in the same program, (3) by subclassing and > avoiding copying you automatically support all the subtleties and > internal fields of actual module objects in a forward- and > backward-compatible way.) When I tried this a year or two ago, I did I with an import hook that allows you to specify metaclass=absolute.qualified.spam in any comment that comes before any non-comment lines, so you actually construct the module object as a subclass instance rather than re-classing it. In theory that seems a lot cleaner. In practice it's a weird way to specify your type; it only works if the import-hooking module and the module that defines your type have already been imported and otherwise silently does the wrong thing; and my implementation was pretty hideous. Is there a cleaner version of that we could do if we were modifying the normal import machinery instead of hooking it, and if it didn't have to work pre-3.4, and if it were part of the language instead of a hack? IIRC (too hard to check from my phone on the train), a module is built by calling exec with a new global dict and then calling the module constructor with that dict, so it's just a matter of something like: cls = g.get('__metamodule__', module) if not issubclass(cls, module): raise TypeError('metamodule {} is not a module type'.format(cls)) mod = cls(name, doc, g) # etc. Then you could import the module subclass and assign it to __metamodule__ from inside, rather than needing to pre-import stuff, and you'd get perfectly understandable errors, and so on. 
It seems less hacky and more flexible than re-classing the module after construction, for the same reason metaclasses and, for that matter, normal class constructors are better than reclassing after the fact. Of course I could be misremembering how modules are constructed, in which case... Never mind. > > This would work today, and would solve all these problems, except for > the following code in Objects/typeobject.c:object_set_class: > > if (!(newto->tp_flags & Py_TPFLAGS_HEAPTYPE) || > !(oldto->tp_flags & Py_TPFLAGS_HEAPTYPE)) > { > PyErr_Format(PyExc_TypeError, > "__class__ assignment: only for heap types"); > return -1; > } > if (compatible_for_assignment(oldto, newto, "__class__")) { > Py_INCREF(newto); > Py_TYPE(self) = newto; > Py_DECREF(oldto); > return 0; > } > > The builtin "module" type is not a HEAPTYPE, so if we try to do > mymodule.__class__ = mysubclass, then the !(oldto->tp_flags & > Py_TPFLAGS_HEAPTYPE) check gets triggered and the assignment fails. > > This code has been around forever, but I don't know why. AFAIK we > could replace the above with > > if (compatible_for_assignment(oldto, newto, "__class__")) { > if (newto->tp_flags & Py_TPFLAGS_HEAPTYPE) { > Py_INCREF(newto); > } > Py_TYPE(self) = newto; > if (oldto->tp_flags & Py_TPFLAGS_HEAPTYPE) { > Py_DECREF(oldto); > } > return 0; > } > > and everything would just work, but I could well be missing something? > Is there some dragon lurking inside Python's memory management or is > this just an ancient overabundance of caution? > > -n > > -- > Nathaniel J. 
Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From random832 at fastmail.us Fri Sep 26 19:43:31 2014 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 26 Sep 2014 13:43:31 -0400 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <1411702369.201686.171927421.3309F6F0@webmail.messagingengine.com> Message-ID: <1411753411.1878759.172167425.75AAE026@webmail.messagingengine.com> On Fri, Sep 26, 2014, at 00:24, Guido van Rossum wrote: > There is no custom allocation for modules, and even for float I don't see > how switching types back and forth between float and a subclass could > corrupt the free list For float I'd be worried more about the fact that it's supposed to be immutable. It would be entirely reasonable for an implementation to make all floats with the same value the same object (as cpython does do for ints in a certain range), and what happens if you change its type then? And even if it doesn't do so, it does for literals with the same value in the same function. So, realistically, an immutable type (especially an immutable type which has literals or another interning mechanism) needs to forbid __class__ from being assigned. 
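That restriction is in fact already enforced: CPython rejects `__class__`
assignment on instances of the built-in immutable types even when the target
is an otherwise layout-compatible subclass:

```python
class MyFloat(float):
    """Layout-compatible subclass; adds nothing to the instance layout."""
    pass

x = 1.0
try:
    # Rejected: the float 1.0 may be shared between code objects, so
    # re-classing it could change behaviour at a distance.
    x.__class__ = MyFloat
    outcome = "allowed"
except TypeError:
    outcome = "rejected"

print(outcome)  # -> rejected
```

The exact error message differs between interpreter versions, but the
TypeError itself is consistent.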
From guido at python.org Fri Sep 26 19:49:56 2014 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Sep 2014 10:49:56 -0700 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <1411753411.1878759.172167425.75AAE026@webmail.messagingengine.com> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <1411702369.201686.171927421.3309F6F0@webmail.messagingengine.com> <1411753411.1878759.172167425.75AAE026@webmail.messagingengine.com> Message-ID: On Fri, Sep 26, 2014 at 10:43 AM, wrote: > On Fri, Sep 26, 2014, at 00:24, Guido van Rossum wrote: > > There is no custom allocation for modules, and even for float I don't see > > how switching types back and forth between float and a subclass could > > corrupt the free list > > For float I'd be worried more about the fact that it's supposed to be > immutable. It would be entirely reasonable for an implementation to make > all floats with the same value the same object (as cpython does do for > ints in a certain range), and what happens if you change its type then? > And even if it doesn't do so, it does for literals with the same value > in the same function. > > So, realistically, an immutable type (especially an immutable type which > has literals or another interning mechanism) needs to forbid __class__ > from being assigned. > That's also a good one, but probably not exactly what the code we're discussing is protecting against -- the same issue could happen with immutable values implemented in pure Python. It's likely though that the HEAPTYPE flag is a proxy for a variety of invariants maintained for the built-in base types, and that is what makes it smell like dragon. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ram at rachum.com Fri Sep 26 19:54:08 2014 From: ram at rachum.com (Ram Rachum) Date: Fri, 26 Sep 2014 20:54:08 +0300 Subject: [Python-ideas] `numbers.Natural` Message-ID: I wish the `numbers` module would include a `Natural` class that would simply check whether the number is integral and positive. -------------- next part -------------- An HTML attachment was scrubbed... URL: From t_glaessle at gmx.de Fri Sep 26 20:10:11 2014 From: t_glaessle at gmx.de (=?windows-1252?Q?Thomas_Gl=E4=DFle?=) Date: Fri, 26 Sep 2014 20:10:11 +0200 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: Message-ID: <5425AC03.3060507@gmx.de> At first glance it sounds nice and straight forward. But - is a natural positive, or just non-negative? I guess, I'd have to look it up in the docs each time, since either definition is used in lots of places. Also, Natural does not correspond to the python type, but its value. So, you couldn't use it with issubclass. Ram Rachum wrote on 09/26/2014 07:54 PM: > I wish the `numbers` module would include a `Natural` class that would > simply check whether the number is integral and positive. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 949 bytes Desc: OpenPGP digital signature URL: From ram at rachum.com Fri Sep 26 20:13:03 2014 From: ram at rachum.com (Ram Rachum) Date: Fri, 26 Sep 2014 21:13:03 +0300 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: <5425AC03.3060507@gmx.de> References: <5425AC03.3060507@gmx.de> Message-ID: I agree with both the points you raised, they're both disadvantages. 
The question is whether the uses would be worth these two disadvantages. (`collections.Hashable` also has the second disadvantage you mentioned, and it's still in the stdlib, so there's hope.) On Fri, Sep 26, 2014 at 9:10 PM, Thomas Gl??le wrote: > At first glance it sounds nice and straight forward. > But - is a natural positive, or just non-negative? I guess, I'd have to > look it up in the docs each time, since either definition is used in lots > of places. > Also, Natural does not correspond to the python type, but its value. So, > you couldn't use it with issubclass. > > > Ram Rachum wrote on 09/26/2014 07:54 PM: > > I wish the `numbers` module would include a `Natural` class that would > simply check whether the number is integral and positive. > > > _______________________________________________ > Python-ideas mailing listPython-ideas at python.orghttps://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Sep 26 20:15:35 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 26 Sep 2014 19:15:35 +0100 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> Message-ID: On 26 Sep 2014 15:59, "Andrew Barnert" wrote: > > On Sep 25, 2014, at 18:02, Nathaniel Smith wrote: > > > On Thu, Sep 25, 2014 at 11:31 PM, Greg Ewing > > wrote: > >> Nathaniel Smith wrote: > >>> > >>> They are really really hard > >>> to do cleanly, and you risk all kinds of breakage in edge-cases (e.g. > >>> try reload()'ing a module that's been replaced by an object). > >> > >> One small thing that might help is to allow the > >> __class__ of a module to be reassigned to a > >> subclass of the module type. 
That would allow > >> a module to be given custom behaviours, while > >> remaining a real module object so that reload() > >> etc. continue to work. > > > > Heh, I was actually just pondering whether it would be opening too big > > a can of worms to suggest this myself. This is the best design I > > managed to come up with last time I looked at it, though in existing > > versions of python it requires ctypes hackitude to accomplish the > > __class__ reassignment. (The advantages of this approach are that (1) > > you get to use the full class machinery to define your "metamodule", > > (2) any existing references to the module get transformed in-place, so > > you don't have to worry about ending up with a mixture of old and new > > instances existing in the same program, (3) by subclassing and > > avoiding copying you automatically support all the subtleties and > > internal fields of actual module objects in a forward- and > > backward-compatible way.) > > When I tried this a year or two ago, I did I with an import hook that allows you to specify metaclass=absolute.qualified.spam in any comment that comes before any non-comment lines, so you actually construct the module object as a subclass instance rather than re-classing it. > > In theory that seems a lot cleaner. In practice it's a weird way to specify your type; it only works if the import-hooking module and the module that defines your type have already been imported and otherwise silently does the wrong thing; and my implementation was pretty hideous. > > Is there a cleaner version of that we could do if we were modifying the normal import machinery instead of hooking it, and if it didn't have to work pre-3.4, and if it were part of the language instead of a hack? 
> > IIRC (too hard to check from my phone on the train), a module is built by calling exec with a new global dict and then calling the module constructor with that dict, so it's just a matter of something like: > > cls = g.get('__metamodule__', module) > if not issubclass(cls, module): > raise TypeError('metamodule > {} is not a module type'.format(cls)) > mod = cls(name, doc, g) > # etc. > > Then you could import the module subclass and assign it to __metamodule__ from inside, rather than needing to pre-import stuff, and you'd get perfectly understandable errors, and so on. > > It seems less hacky and more flexible than re-classing the module after construction, for the same reason metaclasses and, for that matter, normal class constructors are better than reclassing after the fact. > > Of course I could be misremembering how modules are constructed, in which case... Never mind. Alas, in this regard module objects are different than classes; they're constructed and placed in sys.modules before the body is exec'ed. And unfortunately it has to work that way, because if foo/__init__.py does 'import foo.bar', then the module 'foo' has to be immediately resolvable, before __init__.py finishes executing. A similar issue arises for circular imports. So this would argue for your 'magic comment' or special syntax approach, sort of like how __future__ imports work. This part we could do non-hackily if we modified the import mechanism itself. But we'd still have the other problem you mention, that the metamodule would have to be defined before the module is imported. I think this is a showstopper, given that the main use cases for metamodule support involve using it for top-level package namespaces. If the numpy project wants to define a metamodule for the 'numpy' namespace, then where do they put it? So I think we necessarily will always start out with a regular module object, and our goal is to end up with a metamodule instance instead. 
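The clone-and-swap route to ending up with a metamodule instance can be
sketched without any interpreter changes; the manual namespace copy is where
the fragility lives. The `AutoImporter` and `swap_in_metamodule` names are
invented, and the demo swaps a throwaway module object rather than a real
package:

```python
import sys
import types

class AutoImporter(types.ModuleType):
    """Stand-in metamodule; a real one might lazily import submodules."""
    def __getattr__(self, name):
        raise AttributeError("no attribute or submodule %r" % name)

def swap_in_metamodule(name, cls):
    """Replace sys.modules[name] with a `cls` instance.

    Must run while the reference in sys.modules is the only one, i.e.
    at the very top of the package's __init__.py.
    """
    old = sys.modules[name]
    new = cls(old.__name__, old.__doc__)
    # The error-prone step: shallow-copy the namespace and hope that no
    # other internal module-object field matters.
    new.__dict__.update(old.__dict__)
    sys.modules[name] = new
    return new

# Demonstration with a synthetic module:
sys.modules["demo_pkg"] = types.ModuleType("demo_pkg")
sys.modules["demo_pkg"].x = 1
meta = swap_in_metamodule("demo_pkg", AutoImporter)
print(isinstance(meta, AutoImporter), meta.x)  # -> True 1
del sys.modules["demo_pkg"]   # tidy up the demo entry
```

Any reference to the old module object taken before the swap keeps pointing
at the plain module, which is precisely the mixture-of-instances hazard noted
above.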
If this is right then it means that even in principle we really only
have two options, so we should focus our attention on these.

Option 1: allocate a new object, shallowly copy over all the old object
properties into the new one, and then find all references to the old
object and replace them with the new object. This is possible right
now, but error prone: cloning a module object requires intimate
knowledge of which fields exist, and swapping all the references
requires that we be careful to perform the swap very early, when the
only reference is the one in sys.modules.

Option 2: the __class__ switcheroo. This avoids the two issues above.
In exchange it's fairly brain-hurty.

Oh wait, I just thought of a third option. It only works for packages,
but that's okay, you can always convert a module into a package by a
simple mechanical transformation. The proposal is that before exec'ing
__init__.py, we check for the existence of a __preinit__.py, and if
found we do something like:

    sys.modules[package] = sentinel  # to block circular imports
    namespace = {}
    exec __preinit__.py in namespace
    cls = namespace.get("__metamodule__", ModuleType)
    mod = cls(name, doc, namespace)
    sys.modules[package] = mod
    exec __init__.py in namespace

So preinit runs in the same namespace as init, but with a special
restriction that if it tries to (directly or indirectly) import the
current package, then this will trigger an ImportError. This is
somewhat restrictive, but it does allow arbitrary code to be run
before the module object is created.

-n

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From abarnert at yahoo.com Fri Sep 26 20:33:16 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 26 Sep 2014 11:33:16 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: Message-ID: <7271A3FC-7A7D-42E6-9F23-044901CD07B4@yahoo.com> On Sep 26, 2014, at 10:54, Ram Rachum wrote: > I wish the `numbers` module would include a `Natural` class that would simply check whether the number is integral and positive. That's easy to add yourself--and that means nobody else has to be involved in deciding whether "positive" or "nonnegative" is the one true definition for "natural", since you can decide on a per-project basis. Also, it's kind of a strange check. Normally ABCs are used to check a value's _type_, not its _value_. For example, I don't think Integral considers 1.0 an integer, so why should Natural consider it a natural number? It makes more sense to create a concrete natural-number type, which does the appropriate thing in cases like subtraction underflow (is the appropriate thing returning an int instead? raising? depends on your application...), then just make Natural a normal ABC which returns true for any instance of anything that subclasses or registers with Natural, and false for anything else. From abarnert at yahoo.com Fri Sep 26 20:35:57 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 26 Sep 2014 11:35:57 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> Message-ID: <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> On Sep 26, 2014, at 11:13, Ram Rachum wrote: > I agree with both the points you raised, they're both disadvantages. The question is whether the uses would be worth these two disadvantages. (`collections.Hashable` also has the second disadvantage you mentioned, and it's still in the stdlib, so there's hope.) Hashable doesn't have that disadvantage. It checks whether the object's class or any superclass has a __hash__ method. 
So it's still based on the type, not on the value, and it works as expected with issubclass. > > On Fri, Sep 26, 2014 at 9:10 PM, Thomas Gl??le wrote: >> At first glance it sounds nice and straight forward. >> But - is a natural positive, or just non-negative? I guess, I'd have to look it up in the docs each time, since either definition is used in lots of places. >> Also, Natural does not correspond to the python type, but its value. So, you couldn't use it with issubclass. >> >> >> Ram Rachum wrote on 09/26/2014 07:54 PM: >>> I wish the `numbers` module would include a `Natural` class that would simply check whether the number is integral and positive. >>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram at rachum.com Fri Sep 26 20:37:39 2014 From: ram at rachum.com (Ram Rachum) Date: Fri, 26 Sep 2014 21:37:39 +0300 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> Message-ID: I checked it and you're right. So I guess `Hashable` is a bit confusing: >>> isinstance(([3],), collections.Hashable) True On Fri, Sep 26, 2014 at 9:35 PM, Andrew Barnert wrote: > On Sep 26, 2014, at 11:13, Ram Rachum wrote: > > I agree with both the points you raised, they're both disadvantages. The > question is whether the uses would be worth these two disadvantages. 
> (`collections.Hashable` also has the second disadvantage you mentioned, and > it's still in the stdlib, so there's hope.) > > > Hashable doesn't have that disadvantage. It checks whether the object's > class or any superclass has a __hash__ method. So it's still based on the > type, not on the value, and it works as expected with issubclass. > > > On Fri, Sep 26, 2014 at 9:10 PM, Thomas Gl??le wrote: > >> At first glance it sounds nice and straight forward. >> But - is a natural positive, or just non-negative? I guess, I'd have to >> look it up in the docs each time, since either definition is used in lots >> of places. >> Also, Natural does not correspond to the python type, but its value. So, >> you couldn't use it with issubclass. >> >> >> Ram Rachum wrote on 09/26/2014 07:54 PM: >> >> I wish the `numbers` module would include a `Natural` class that would >> simply check whether the number is integral and positive. >> >> >> _______________________________________________ >> Python-ideas mailing listPython-ideas at python.orghttps://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Sep 26 20:47:16 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 26 Sep 2014 11:47:16 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> Message-ID: <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> On Sep 26, 2014, at 11:37, Ram Rachum wrote: > I checked it and you're right. 
So I guess `Hashable` is a bit confusing: > > >>> isinstance(([3],), collections.Hashable) > True It makes sense if you think of isinstance as a type check, which is what it's supposed to be, rather than a value check. ([3],) is a tuple, and tuples are hashable as a type, even though some specific tuple values might not be. That's exactly why I think you want a type that you can check for Natural, rather than something about the value. That's what isinstance is for, and if you start subverting it to mean other things, that's when it leads to confusion. > > On Fri, Sep 26, 2014 at 9:35 PM, Andrew Barnert wrote: >> On Sep 26, 2014, at 11:13, Ram Rachum wrote: >> >>> I agree with both the points you raised, they're both disadvantages. The question is whether the uses would be worth these two disadvantages. (`collections.Hashable` also has the second disadvantage you mentioned, and it's still in the stdlib, so there's hope.) >> >> Hashable doesn't have that disadvantage. It checks whether the object's class or any superclass has a __hash__ method. So it's still based on the type, not on the value, and it works as expected with issubclass. >> >>> >>> On Fri, Sep 26, 2014 at 9:10 PM, Thomas Gl??le wrote: >>>> At first glance it sounds nice and straight forward. >>>> But - is a natural positive, or just non-negative? I guess, I'd have to look it up in the docs each time, since either definition is used in lots of places. >>>> Also, Natural does not correspond to the python type, but its value. So, you couldn't use it with issubclass. >>>> >>>> >>>> Ram Rachum wrote on 09/26/2014 07:54 PM: >>>>> I wish the `numbers` module would include a `Natural` class that would simply check whether the number is integral and positive. 
>>>>> >>>>> >>>>> _______________________________________________ >>>>> Python-ideas mailing list >>>>> Python-ideas at python.org >>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Sep 26 21:12:17 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 26 Sep 2014 20:12:17 +0100 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> Message-ID: On 26 Sep 2014 19:15, "Nathaniel Smith" wrote: > The proposal is that before exec'ing __init__.py, we check for the existence of a __preinit__.py, Silly me, obviously the right and proper name for this file would be __new__.py. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Sep 26 21:43:12 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 26 Sep 2014 12:43:12 -0700 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> Message-ID: <5425C1D0.6030906@stoneleaf.us> On 09/26/2014 11:15 AM, Nathaniel Smith wrote: > > Option 1: allocate a new object, shallowly copy over all the old object properties into the new one, and then find all > references to the old object and replace them with the new object. 
This is possible right now, but error prone: cloning > a module object requires intimate knowledge of which fields exist, and swapping all the references requires that we be > careful to perform the swap very early, when the only reference is the one in sys.modules. > > Option 2: the __class__ switcheroo. This avoids the two issues above. In exchange it's fairly brain-hurty. > > Option 3: Oh wait, I just thought of a third option. It only works for packages, but that's okay, you can always convert a module > into a package by a simple mechanical transformation. The proposal is that before exec'ing __init__.py, we check for the > existence of a __preinit__.py, and if found we do something like
>
> sys.modules[package] = sentinel  # block circular imports
> namespace = {}
> exec __preinit__.py in namespace
> cls = namespace.get("__metamodule__", ModuleType)
> mod = cls(name, doc, namespace)
> sys.modules[package] = mod
> exec __init__.py in namespace
>
> So preinit runs in the same namespace as init, but with a special restriction that if it tries to (directly or > indirectly) import the current package, then this will trigger an ImportError. This is somewhat restrictive, but it does > allow arbitrary code to be run before the module object is created. What about Option 4: have reload work with modules converted into classes? This may mean having some extra fields in the class, and probably some extra code in the module loading, but it might be the simplest approach. -- ~Ethan~ From mertz at gnosis.cx Fri Sep 26 22:03:02 2014 From: mertz at gnosis.cx (David Mertz) Date: Fri, 26 Sep 2014 13:03:02 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> Message-ID: Isn't the right answer to create an isNatural() function? Why do we need a type or class rather than just a function?
It's not any easier to write 'isinstance(x, Natural)' than it is to write 'isNatural(x)'. On Fri, Sep 26, 2014 at 11:47 AM, Andrew Barnert < abarnert at yahoo.com.dmarc.invalid> wrote: > On Sep 26, 2014, at 11:37, Ram Rachum wrote: > > I checked it and you're right. So I guess `Hashable` is a bit confusing: > > >>> isinstance(([3],), collections.Hashable) > True > > > It makes sense if you think of isinstance as a type check, which is what > it's supposed to be, rather than a value check. ([3],) is a tuple, and > tuples are hashable as a type, even though some specific tuple values might > not be. > > That's exactly why I think you want a type that you can check for Natural, > rather than something about the value. That's what isinstance is for, and > if you start subverting it to mean other things, that's when it leads to > confusion. > > > On Fri, Sep 26, 2014 at 9:35 PM, Andrew Barnert > wrote: > >> On Sep 26, 2014, at 11:13, Ram Rachum wrote: >> >> I agree with both the points you raised, they're both disadvantages. The >> question is whether the uses would be worth these two disadvantages. >> (`collections.Hashable` also has the second disadvantage you mentioned, and >> it's still in the stdlib, so there's hope.) >> >> >> Hashable doesn't have that disadvantage. It checks whether the object's >> class or any superclass has a __hash__ method. So it's still based on the >> type, not on the value, and it works as expected with issubclass. >> >> >> On Fri, Sep 26, 2014 at 9:10 PM, Thomas Gl??le wrote: >> >>> At first glance it sounds nice and straight forward. >>> But - is a natural positive, or just non-negative? I guess, I'd have to >>> look it up in the docs each time, since either definition is used in lots >>> of places. >>> Also, Natural does not correspond to the python type, but its value. So, >>> you couldn't use it with issubclass. 
>>> >>> >>> Ram Rachum wrote on 09/26/2014 07:54 PM: >>> >>> I wish the `numbers` module would include a `Natural` class that would >>> simply check whether the number is integral and positive. >>> >>> >>> _______________________________________________ >>> Python-ideas mailing listPython-ideas at python.orghttps://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >>> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram at rachum.com Fri Sep 26 22:06:34 2014 From: ram at rachum.com (Ram Rachum) Date: Fri, 26 Sep 2014 23:06:34 +0300 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> Message-ID: Hmm, I don't know, but I like it better. I probably had a reason and forgot it. Sorry :( On Fri, Sep 26, 2014 at 11:03 PM, David Mertz wrote: > Isn't the right answer to create an isNatural() function? Why do we need a > type or class rather than just a function? It's not any easier to write > 'isinstance(x, Natural)' than it is to write 'isNatural(x)'. 
> > On Fri, Sep 26, 2014 at 11:47 AM, Andrew Barnert < > abarnert at yahoo.com.dmarc.invalid> wrote: > >> On Sep 26, 2014, at 11:37, Ram Rachum wrote: >> >> I checked it and you're right. So I guess `Hashable` is a bit confusing: >> >> >>> isinstance(([3],), collections.Hashable) >> True >> >> >> It makes sense if you think of isinstance as a type check, which is what >> it's supposed to be, rather than a value check. ([3],) is a tuple, and >> tuples are hashable as a type, even though some specific tuple values might >> not be. >> >> That's exactly why I think you want a type that you can check for >> Natural, rather than something about the value. That's what isinstance is >> for, and if you start subverting it to mean other things, that's when it >> leads to confusion. >> >> >> On Fri, Sep 26, 2014 at 9:35 PM, Andrew Barnert >> wrote: >> >>> On Sep 26, 2014, at 11:13, Ram Rachum wrote: >>> >>> I agree with both the points you raised, they're both disadvantages. The >>> question is whether the uses would be worth these two disadvantages. >>> (`collections.Hashable` also has the second disadvantage you mentioned, and >>> it's still in the stdlib, so there's hope.) >>> >>> >>> Hashable doesn't have that disadvantage. It checks whether the object's >>> class or any superclass has a __hash__ method. So it's still based on the >>> type, not on the value, and it works as expected with issubclass. >>> >>> >>> On Fri, Sep 26, 2014 at 9:10 PM, Thomas Gl??le >>> wrote: >>> >>>> At first glance it sounds nice and straight forward. >>>> But - is a natural positive, or just non-negative? I guess, I'd have to >>>> look it up in the docs each time, since either definition is used in lots >>>> of places. >>>> Also, Natural does not correspond to the python type, but its value. >>>> So, you couldn't use it with issubclass. 
>>>> >>>> >>>> Ram Rachum wrote on 09/26/2014 07:54 PM: >>>> >>>> I wish the `numbers` module would include a `Natural` class that would >>>> simply check whether the number is integral and positive. >>>> >>>> >>>> _______________________________________________ >>>> Python-ideas mailing listPython-ideas at python.orghttps://mail.python.org/mailman/listinfo/python-ideas >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>> >>>> >>>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Fri Sep 26 22:16:31 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 26 Sep 2014 13:16:31 -0700 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> Message-ID: <6F6F829B-0731-4F99-B96F-9227D8BC46E8@yahoo.com> On Sep 26, 2014, at 12:12, Nathaniel Smith wrote: > On 26 Sep 2014 19:15, "Nathaniel Smith" wrote: > > The proposal is that before exec'ing __init__.py, we check for the existence of a __preinit__.py, > > Silly me, obviously the right and proper name for this file would be __new__.py. > I had an email written just to say "this sounds brilliant, but why isn't it called __new__", with three paragraphs explaining why it was a good analogy... Now I guess I can delete draft. :) Anyway, I definitely like this better than re-classing modules in mid-initialization, and better than my magic comment hack (and looking at the code again, of course you're right that my magic comment hack was necessary with anything like my approach, I guess I just forgot in the intervening time). -------------- next part -------------- An HTML attachment was scrubbed... URL: From graffatcolmingov at gmail.com Fri Sep 26 22:27:03 2014 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Fri, 26 Sep 2014 15:27:03 -0500 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> Message-ID: On Fri, Sep 26, 2014 at 3:06 PM, Ram Rachum wrote: > Hmm, I don't know, but I like it better. I probably had a reason and forgot > it. Sorry :( > > > On Fri, Sep 26, 2014 at 11:03 PM, David Mertz wrote: >> >> Isn't the right answer to create an isNatural() function? Why do we need a >> type or class rather than just a function? 
It's not any easier to write >> 'isinstance(x, Natural)' than it is to write 'isNatural(x)'. >> >> On Fri, Sep 26, 2014 at 11:47 AM, Andrew Barnert >> wrote: >>> >>> On Sep 26, 2014, at 11:37, Ram Rachum wrote: >>> >>> I checked it and you're right. So I guess `Hashable` is a bit confusing: >>> >>> >>> isinstance(([3],), collections.Hashable) >>> True >>> >>> >>> It makes sense if you think of isinstance as a type check, which is what >>> it's supposed to be, rather than a value check. ([3],) is a tuple, and >>> tuples are hashable as a type, even though some specific tuple values might >>> not be. >>> >>> That's exactly why I think you want a type that you can check for >>> Natural, rather than something about the value. That's what isinstance is >>> for, and if you start subverting it to mean other things, that's when it >>> leads to confusion. >>> >>> >>> On Fri, Sep 26, 2014 at 9:35 PM, Andrew Barnert >>> wrote: >>>> >>>> On Sep 26, 2014, at 11:13, Ram Rachum wrote: >>>> >>>> I agree with both the points you raised, they're both disadvantages. The >>>> question is whether the uses would be worth these two disadvantages. >>>> (`collections.Hashable` also has the second disadvantage you mentioned, and >>>> it's still in the stdlib, so there's hope.) >>>> >>>> >>>> Hashable doesn't have that disadvantage. It checks whether the object's >>>> class or any superclass has a __hash__ method. So it's still based on the >>>> type, not on the value, and it works as expected with issubclass. >>>> >>>> >>>> On Fri, Sep 26, 2014 at 9:10 PM, Thomas Gl??le >>>> wrote: >>>>> >>>>> At first glance it sounds nice and straight forward. >>>>> But - is a natural positive, or just non-negative? I guess, I'd have to >>>>> look it up in the docs each time, since either definition is used in lots of >>>>> places. >>>>> Also, Natural does not correspond to the python type, but its value. >>>>> So, you couldn't use it with issubclass. 
>>>>> >>>>> Ram Rachum wrote on 09/26/2014 07:54 PM: >>>>> >>>>> I wish the `numbers` module would include a `Natural` class that would >>>>> simply check whether the number is integral and positive.

The only motivations I can think of for making this a type (not that they're good ones) is you could do something like:

    if isinstance(x, (Natural, Complex, Real)):
        # ...
    elif isinstance(x, Rational):
        # ...
    else:
        raise ValueError

Which is naturally (no pun intended) slightly nicer than

    if isinstance(x, (Complex, Real)) or isNatural(x):
        # ...

But really, I agree with David that *if* this were added to the stdlib, it should be as a function. I'm, however, unconvinced about the usefulness of having this as a stdlib function (since it should be easy for anyone to define):

    def isNatural(x):
        if not isinstance(x, int):
            return False
        return x > 0

But here we could argue over whether 0 is Natural or not. Further, this could be rewritten as a one-liner:

    def isNatural(x):
        return isinstance(x, int) and x > 0

Which falls under the adage I've oft heard repeated here: "Not every one-liner belongs in the standard library" From ethan at stoneleaf.us Fri Sep 26 22:33:45 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 26 Sep 2014 13:33:45 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> Message-ID: <5425CDA9.2070407@stoneleaf.us> On 09/26/2014 01:27 PM, Ian Cordasco wrote: > > The only motivations I can think of for making this a type (not that > they're good ones) is you could do something like: > > if isinstance(x, (Natural, Complex, Real)): > # > elif isinstance(x, Rational): > # > else: > raise ValueError > > Which is naturally (no pun intended) slightly nicer than > > if isinstance(x, (Complex, Real)) or isNatural(x): > # The Good Reason for having a Natural type would be not having to always check for
(attempted to be) created, an exception is raised on the spot. n5 = Natural(5) n7 = Natural(7) n5 - n7 Traceback... -- ~Ethan~ From mertz at gnosis.cx Fri Sep 26 22:38:01 2014 From: mertz at gnosis.cx (David Mertz) Date: Fri, 26 Sep 2014 13:38:01 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> Message-ID: On Fri, Sep 26, 2014 at 1:27 PM, Ian Cordasco wrote: > > The only motivations I can think of for making this a type (not that > they're good ones) is you could do something like: > > if isinstance(x, (Natural, Complex, Real)): > # > elif isinstance(x, Rational): > # > else: > raise ValueError > This code seems a bit broken, given: >>> issubclass(numbers.Rational, numbers.Real) Out[3]: True > def isNatural(x): > return isinstance(x, int) and x > 0 > I think the original desired behavior is this different one-liner: isNatural = lambda x: int(x)==x and x>0 -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Fri Sep 26 22:42:08 2014 From: mertz at gnosis.cx (David Mertz) Date: Fri, 26 Sep 2014 13:42:08 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> Message-ID: > > isNatural = lambda x: int(x)==x and x>0 >> > Actually, sorry. 
This is crashy; what I mean is: from numbers import Number isNatural = lambda x: isinstance(x, Number) and int(x)==x and x>0 -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Fri Sep 26 22:51:20 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 27 Sep 2014 08:51:20 +1200 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> Message-ID: <5425D1C8.2020306@canterbury.ac.nz> Nathaniel Smith wrote: > This code has been around forever, but I don't know why. AFAIK we > could replace the above with > > if (compatible_for_assignment(oldto, newto, "__class__")) { > if (newto->tp_flags & Py_TPFLAGS_HEAPTYPE) { > Py_INCREF(newto); > } > Py_TYPE(self) = newto; > if (oldto->tp_flags & Py_TPFLAGS_HEAPTYPE) { > Py_DECREF(oldto); > } > return 0; > } Is it even necessary to worry about the refcounting? Presumably the non-heap type objects all start out with a refcount of at least 1, so there's no danger of them getting deallocated. 
-- Greg From t_glaessle at gmx.de Fri Sep 26 22:54:08 2014 From: t_glaessle at gmx.de (=?windows-1252?Q?Thomas_Gl=E4=DFle?=) Date: Fri, 26 Sep 2014 22:54:08 +0200 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> Message-ID: <5425D270.4080200@gmx.de> David Mertz wrote on 09/26/2014 10:38 PM: > On Fri, Sep 26, 2014 at 1:27 PM, Ian Cordasco > > wrote: > > > The only motivations I can think of for making this a type (not that > they're good ones) is you could do something like: > > if isinstance(x, (Natural, Complex, Real)): > # > elif isinstance(x, Rational): > # > else: > raise ValueError > > > This code seems a bit broken, given: > > >>> issubclass(numbers.Rational, numbers.Real) > Out[3]: True > > > def isNatural(x): > return isinstance(x, int) and x > 0 > > > I think the original desired behavior is this different one-liner: > > isNatural = lambda x: int(x)==x and x>0 Not sure about the original desired behaviour. I would have guessed the former function as well, as it more closely resembles the isinstance checks with the numbers types: >>> isinstance(1.0, (numbers.Rational, numbers.Integral)) False (one might argue, that 1.0 is a rational value, as is true for most float values, but the type has a different intention) > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
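[To make the two semantics being debated in this subthread concrete, here is a side-by-side sketch. The function names are invented for illustration; neither is a stdlib API, and the value-based variant adds guards so that complex, NaN, and infinity return False instead of raising, which David's bare lambda would not do.]

```python
from fractions import Fraction
from numbers import Real

def is_natural_by_type(x):
    # Type-based, like Ian's version: only int instances qualify.
    # (Note: bool is an int subclass, so True would count as natural 1.)
    return isinstance(x, int) and x > 0

def is_natural_by_value(x):
    # Value-based, like David's lambda, but restricted to Real (so complex
    # is rejected) and guarded against nan/inf, where int(x) raises.
    if not isinstance(x, Real):
        return False
    try:
        return int(x) == x and x > 0
    except (ValueError, OverflowError):   # nan / inf
        return False
```

Under the type-based reading, 1.0 is not natural (matching isinstance(1.0, numbers.Integral) being False); under the value-based one it is, while 3.5 fails both.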
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 949 bytes Desc: OpenPGP digital signature URL: From t_glaessle at gmx.de Fri Sep 26 22:57:07 2014 From: t_glaessle at gmx.de (=?windows-1252?Q?Thomas_Gl=E4=DFle?=) Date: Fri, 26 Sep 2014 22:57:07 +0200 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: <5425D270.4080200@gmx.de> References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> <5425D270.4080200@gmx.de> Message-ID: <5425D323.6060000@gmx.de> Thomas Gl??le wrote on 09/26/2014 10:54 PM: > > David Mertz wrote on 09/26/2014 10:38 PM: >> I think the original desired behavior is this different one-liner: >> >> isNatural = lambda x: int(x)==x and x>0 > > Not sure about the original desired behaviour. I would have guessed > the former function as well, as it more closely resembles the > isinstance checks with the numbers types: > > >>> isinstance(1.0, (numbers.Rational, numbers.Integral)) > False And more importantly: you wouldn't expect isNatural(3.5) to return True. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 949 bytes Desc: OpenPGP digital signature URL: From guido at python.org Fri Sep 26 22:56:49 2014 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Sep 2014 13:56:49 -0700 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <5425D1C8.2020306@canterbury.ac.nz> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <5425D1C8.2020306@canterbury.ac.nz> Message-ID: The older custom allocators may not bump the type's refcount when creating an instance. 
On Fri, Sep 26, 2014 at 1:51 PM, Greg Ewing wrote: > Nathaniel Smith wrote: > >> This code has been around forever, but I don't know why. AFAIK we >> could replace the above with >> >> if (compatible_for_assignment(oldto, newto, "__class__")) { >> if (newto->tp_flags & Py_TPFLAGS_HEAPTYPE) { >> Py_INCREF(newto); >> } >> Py_TYPE(self) = newto; >> if (oldto->tp_flags & Py_TPFLAGS_HEAPTYPE) { >> Py_DECREF(oldto); >> } >> return 0; >> } >> > > Is it even necessary to worry about the refcounting? > Presumably the non-heap type objects all start out with > a refcount of at least 1, so there's no danger of them > getting deallocated. > > -- > Greg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Sep 26 23:00:12 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 26 Sep 2014 14:00:12 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: <5425CDA9.2070407@stoneleaf.us> References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> <5425CDA9.2070407@stoneleaf.us> Message-ID: <65F05F41-A6F2-4AC5-AAFB-A30BA7C24E8D@yahoo.com> On Sep 26, 2014, at 13:33, Ethan Furman wrote: > On 09/26/2014 01:27 PM, Ian Cordasco wrote: >> >> The only motivations I can think of for making this a type (not that >> they're good ones) is you could do something like: >> >> if isinstance(x, (Natural, Complex, Real)): >> # >> elif isinstance(x, Rational): >> # >> else: >> raise ValueError >> >> Which is naturally (no pun intended) slightly nicer than >> >> if isinstance(x, (Complex, Real)) or isNatural(x): >> # > > The Good Reason for having a Natural type would be not having to always check for 
illegal values -- if one is (attempted to be) created, an exception is raised on the spot. > > n5 = Natural(5) > n7 = Natural(7) > n5 - n7 > Traceback... Except that 2/3 is a float, and (-1)**.5 is a complex, so it might be perfectly reasonable to expect n5-n7 to be an int. Or maybe Natural(0) or Natural('nan') even, on analogy with 2//3 and whatever float operations return NaN values. As I said in my first message, which one makes more sense probably depends on your application, which is part of why this doesn't belong in the stdlib in the first place... But if it did belong, I'd expect it to act like int and float and underflow to an int. (In which case a type check for Natural might be sensible, but would not often be useful--just like a type check on the result of a/b for integers is sensible but not often useful.) From t_glaessle at gmx.de Fri Sep 26 23:03:50 2014 From: t_glaessle at gmx.de (=?windows-1252?Q?Thomas_Gl=E4=DFle?=) Date: Fri, 26 Sep 2014 23:03:50 +0200 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <6F6F829B-0731-4F99-B96F-9227D8BC46E8@yahoo.com> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> <6F6F829B-0731-4F99-B96F-9227D8BC46E8@yahoo.com> Message-ID: <5425D4B6.8050702@gmx.de> Andrew Barnert wrote on 09/26/2014 10:16 PM: > On Sep 26, 2014, at 12:12, Nathaniel Smith > wrote: > >> On 26 Sep 2014 19:15, "Nathaniel Smith" > > wrote: >> > The proposal is that before exec'ing __init__.py, we check for the >> existence of a __preinit__.py, >> >> Silly me, obviously the right and proper name for this file would be >> __new__.py. >> > > I had an email written just to say "this sounds brilliant, but why > isn't it called __new__", with three paragraphs explaining why it was > a good analogy... Now I guess I can delete draft. 
:) > > Anyway, I definitely like this better than re-classing modules in > mid-initialization, and better than my magic comment hack (and looking > at the code again, of course you're right that my magic comment hack > was necessary with anything like my approach, I guess I just forgot in > the intervening time). I like this one. Imagine we had a LazyModule class (or w/e name) in the stdlib. One could just do: __new__.py: from importlib import LazyModule as __metamodule__ # (or __metapackage?) Maybe LazyModule could resolve only attributes that are explicitly given in __all__ (or more explicitly __autoimport__?). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 949 bytes Desc: OpenPGP digital signature URL: From mertz at gnosis.cx Fri Sep 26 23:04:39 2014 From: mertz at gnosis.cx (David Mertz) Date: Fri, 26 Sep 2014 14:04:39 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: <5425D323.6060000@gmx.de> References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> <5425D270.4080200@gmx.de> <5425D323.6060000@gmx.de> Message-ID: On Fri, Sep 26, 2014 at 1:57 PM, Thomas Gläßle wrote > And more importantly: you wouldn't expect isNatural(3.5) to return True.
> That's not a problem with my one-liner, nor Ian's: >>> isNatural = lambda x: int(x)==x and x>0 >>> isNatural(3.5) False However, my correction to avoid crashing on some inputs is still wrong, given: >>> isinstance(0j, numbers.Number) True >>> int(0j) TypeError So I think I really need a couple lines: def isNatural(x): try: return int(x)==x and x>0 except (TypeError, ValueError): return False Of course... if you don't want that behavior, you need to write a different function :-). -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. From ethan at stoneleaf.us Fri Sep 26 23:12:34 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 26 Sep 2014 14:12:34 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> <5425D270.4080200@gmx.de> <5425D323.6060000@gmx.de> Message-ID: <5425D6C2.8010108@stoneleaf.us> On 09/26/2014 02:04 PM, David Mertz wrote: > On Fri, Sep 26, 2014 at 1:57 PM, Thomas Gläßle > wrote > > And more importantly: you wouldn't expect isNatural(3.5) to return True.
> > > That's not a problem with my one-liner, nor Ian's: > >>>> isNatural = lambda x: int(x)==x and x>0 >>>> isNatural(3.5) > False > > However, my correction to avoid crashing on some inputs is still wrong, given: > >>>> isinstance(0j, numbers.Number) > True >>>> int(0j) > TypeError > > So I think I really need a couple lines: > > def isNatural(x): > try: > return int(x)==x and x>0 > except (TypeError, ValueError): > return False If we had PEP 463 [1] it could still be a one-liner ;) def is_natural(x): return int(x) == x and x > 0 except TypeError, ValueError: False -- ~Ethan~ [1] http://legacy.python.org/dev/peps/pep-0463/ From t_glaessle at gmx.de Fri Sep 26 23:31:05 2014 From: t_glaessle at gmx.de (=?windows-1252?Q?Thomas_Gl=E4=DFle?=) Date: Fri, 26 Sep 2014 23:31:05 +0200 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <5425D4B6.8050702@gmx.de> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> <6F6F829B-0731-4F99-B96F-9227D8BC46E8@yahoo.com> <5425D4B6.8050702@gmx.de> Message-ID: <5425DB19.1080904@gmx.de> Thomas Gläßle wrote on 09/26/2014 11:03 PM: > Imagine we had a LazyModule class (or w/e name) in the stdlib. > One could just do: > > __new__.py: > from importlib import LazyModule as __metamodule__ # (or > __metapackage?) > > > Maybe LazyModule could resolve only attributes that are explicitly > given in __all__ (or more explicitly __autoimport__?). On second thought, scratch all of that. This is easy enough to do in a few lines of code and customize to the specific use case. Sorry for the noise, I think it's too late for my brain to work well ;) Using an __autoimport__ list could still be an option if not resorting to the metamodule. -------------- next part -------------- A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 949 bytes Desc: OpenPGP digital signature URL: From greg.ewing at canterbury.ac.nz Fri Sep 26 23:35:10 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 27 Sep 2014 09:35:10 +1200 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <20140926081559.GC19757@ando.pearwood.info> References: <54230610.7060305@gmx.de> <20140926081559.GC19757@ando.pearwood.info> Message-ID: <5425DC0E.8040901@canterbury.ac.nz> Steven D'Aprano wrote: > By making importing automatic, every failed attribute access has to > determine whether or not there is a sub-module to import, which could be > quite expensive. Another thing to consider is that code executed during an import runs with the import lock held. This can lead to surprises in multi-threaded code. I got caught out by it once, and it took me a while to figure out what was going on. As long as the import lock exists, it's probably better for importing to remain an explicit action, at least by default. -- Greg From greg.ewing at canterbury.ac.nz Fri Sep 26 23:43:53 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 27 Sep 2014 09:43:53 +1200 Subject: [Python-ideas] Do we need non-heap types any more? (Was: Implicit submodule imports) In-Reply-To: <20140926133223.75076d5e@fsol> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> Message-ID: <5425DE19.20509@canterbury.ac.nz> Antoine Pitrou wrote: > The tp_dealloc for a heap type is not the same as the non-heap base > type's tp_dealloc. > > Also, look at compatible_for_assignment(): it calls same_slots_added() > which assumes both args are heap types. It looks like the easiest way to address this particular use case would be to make the module type a heap type. In the long term, how about turning *all* types into heap types?
We're already having to call PyType_Ready on all the static type objects, so allocating them from the heap shouldn't incur much extra overhead. Seems to me that this would simplify a lot of the CPython code and make it easier to maintain. As it is, thinking about all the tricky differences between heap and non-heap types makes my head hurt. -- Greg From ben+python at benfinney.id.au Sat Sep 27 01:41:38 2014 From: ben+python at benfinney.id.au (Ben Finney) Date: Sat, 27 Sep 2014 09:41:38 +1000 Subject: [Python-ideas] `numbers.Natural` References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> Message-ID: <85zjdmt0el.fsf@benfinney.id.au> David Mertz writes: > Isn't the right answer to create an isNatural() function? No, the right answer (by PEP 8) is to create an ‘is_natural’ or ‘isnatural’ function :-) > Why do we need a type or class rather than just a function? Less facetiously: I suspect what is being asked for here is a type which will *ensure* the values are natural numbers. That is a good use of types, IMO: they enforce a domain of values. I'm not convinced there is a pressing need to add such a type to the standard type hierarchy though. -- \ “I find the whole business of religion profoundly interesting. | `\ But it does mystify me that otherwise intelligent people take | _o__) it seriously.” —Douglas Adams | Ben Finney From abarnert at yahoo.com Sat Sep 27 01:43:45 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 26 Sep 2014 16:43:45 -0700 Subject: [Python-ideas] Do we need non-heap types any more?
(Was: Implicit submodule imports) In-Reply-To: <5425DE19.20509@canterbury.ac.nz> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> Message-ID: On Sep 26, 2014, at 14:43, Greg Ewing wrote: > Antoine Pitrou wrote: >> The tp_dealloc for a heap type is not the same as the non-heap base >> type's tp_dealloc. >> Also, look at compatible_for_assignment(): it calls same_slots_added() >> which assumes both args are heap types. > > It looks like the easiest way to address this particular > use case would be to make the module type a heap type. > > In the long term, how about turning *all* types into > heap types? We're already having to call PyType_Ready > on all the static type objects, so allocating them > from the heap shouldn't incur much extra overhead. What about extension modules? Deprecate static types? Automatically copy them to heap types? Use some horrible macro tricks in Python.h or a custom preprocessor in distutils? From mertz at gnosis.cx Sat Sep 27 02:01:46 2014 From: mertz at gnosis.cx (David Mertz) Date: Fri, 26 Sep 2014 17:01:46 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: <85zjdmt0el.fsf@benfinney.id.au> References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> <85zjdmt0el.fsf@benfinney.id.au> Message-ID: Well, yeah. I think I lost that discussion in the type annotation thread though. I'd love types to include predicates, but Guido wouldn't ?. An 'is_natural()' function--by whichever spelling--needs to contain a test which isn't in the type system, whichever result one chooses for 'is_natural(1.0)'. FWIW, I want 'isNatural(0j+1) == True'. But I'm on the losing end of at least three arguments there.... Unless I just roll my own in 5-8 lines of code.[*] [*] I'd want the non-PEP8 spelling because "Natural" in mathematics is always capitalized. 
That's why we have the built-in 'isGuido(BDFL)' rather than 'is_guido()'. On Sep 26, 2014 4:42 PM, "Ben Finney" wrote: > David Mertz writes: > > > Isn't the right answer to create an isNatural() function? > > No, the right answer (by PEP 8) is to create an ‘is_natural’ or > ‘isnatural’ function :-) > > > Why do we need a type or class rather than just a function? > > Less facetiously: I suspect what is being asked for here is a type which > will *ensure* the values are natural numbers. > > That is a good use of types, IMO: they enforce a domain of values. > > I'm not convinced there is a pressing need to add such a type to the > standard type hierarchy though. > > -- > \ “I find the whole business of religion profoundly interesting. | > `\ But it does mystify me that otherwise intelligent people take | > _o__) it seriously.” —Douglas Adams | > Ben Finney From njs at pobox.com Sat Sep 27 02:03:04 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 27 Sep 2014 01:03:04 +0100 Subject: [Python-ideas] Do we need non-heap types any more? (Was: Implicit submodule imports) In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> Message-ID: On Sat, Sep 27, 2014 at 12:43 AM, Andrew Barnert wrote: > On Sep 26, 2014, at 14:43, Greg Ewing wrote: > >> Antoine Pitrou wrote: >>> The tp_dealloc for a heap type is not the same as the non-heap base >>> type's tp_dealloc. >>> Also, look at compatible_for_assignment(): it calls same_slots_added() >>> which assumes both args are heap types.
>> >> It looks like the easiest way to address this particular >> use case would be to make the module type a heap type. >> >> In the long term, how about turning *all* types into >> heap types? We're already having to call PyType_Ready >> on all the static type objects, so allocating them >> from the heap shouldn't incur much extra overhead. > > What about extension modules? Deprecate static types? Automatically copy them to heap types? Use some horrible macro tricks in Python.h or a custom preprocessor in distutils? I think the name "heap types" is misleading. The actual distinction being made isn't really about where the type object is allocated. Static type objects are still subject to the refcounting machinery in most cases (try sys.getrefcount(int)), but this is fine because the refcount never reaches zero. AFAICT from skimming the source a bit, what happened back in the 2.2 days is that the devs went around fixing all the random places where the assumption that all type objects were immortal had snuck in, and they hid all this fixes behind a generic switch called "heap types". It's all stuff like "we'll carefully only do standard refcounting if HEAPTYPE is set" (even though refcounting could be applied to all types without causing any problems), or "we will disable the GC machinery when walking non-heap types" (even though again, who cares), or "heap types all use the same tp_dealloc function". I'm sure some of this stuff we're stuck with due to backcompat with C extension modules that make funny assumptions, but presumably a lot of it could be cleaned up -- I think that's what Greg means. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From solipsis at pitrou.net Sat Sep 27 02:07:51 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 27 Sep 2014 02:07:51 +0200 Subject: [Python-ideas] Do we need non-heap types any more? 
(Was: Implicit submodule imports) References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> Message-ID: <20140927020751.044eaf32@fsol> On Sat, 27 Sep 2014 01:03:04 +0100 Nathaniel Smith wrote: > Static type objects are still subject to the refcounting machinery in > most cases (try sys.getrefcount(int)), So what about it? :-) >>> sys.getrefcount(int) 69 >>> x = list(range(10000)) >>> sys.getrefcount(int) 69 Regards Antoine. From njs at pobox.com Sat Sep 27 02:23:08 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 27 Sep 2014 01:23:08 +0100 Subject: [Python-ideas] Do we need non-heap types any more? (Was: Implicit submodule imports) In-Reply-To: <20140927020751.044eaf32@fsol> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> <20140927020751.044eaf32@fsol> Message-ID: On 27 Sep 2014 01:08, "Antoine Pitrou" wrote: > > On Sat, 27 Sep 2014 01:03:04 +0100 > Nathaniel Smith wrote: > > Static type objects are still subject to the refcounting machinery in > > most cases (try sys.getrefcount(int)), > > So what about it? :-) > > >>> sys.getrefcount(int) > 69 > >>> x = list(range(10000)) > >>> sys.getrefcount(int) > 69 Yes, that's why I said "most cases", not all cases :-). My point was that being statically allocated doesn't make list a special snowflake that *needs* some sort of protection from refcounting. If heap and non-heap types were treated the same in this regard then nothing horrible would happen. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Sat Sep 27 02:33:29 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 27 Sep 2014 10:33:29 +1000 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <5425C1D0.6030906@stoneleaf.us> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> <5425C1D0.6030906@stoneleaf.us> Message-ID: <20140927003328.GD19757@ando.pearwood.info> On Fri, Sep 26, 2014 at 12:43:12PM -0700, Ethan Furman wrote: > What about > > Option 4: have reload work with modules converted into classes > > ? > > This may mean having some extra fields in the class, and probably some > extra code in the module loading, but it might be the simplest approach. I don't know that this is strictly necessary. You can put anything you like into sys.modules, and reload() just raises a TypeError: py> sys.modules['spam'] = 23 py> import spam py> spam 23 py> reload(spam) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: reload() argument must be module Since reload() is mostly intended as a convenience at the REPL, I'd be willing to forgo that convenience for special "modules". Or perhaps these special "modules" could subclass ModuleType and somehow get reloading to work correctly. In 2.7 at least you can manually copy a module to a module subclass, install it into sys.modules, and reload will accept it. Not only that, but after reloading it still uses the same subclass. Unfortunately, when I tried it in 3.3, imp.reload complained about my custom module subclass not being a module, so it seems that 3.3 at least is more restrictive than 2.7. (Perhaps 3.3 reload does a "type(obj) is ModuleType" instead of isinstance test?) Nevertheless, I got this proof of concept more-or-less working in 2.7 and 3.3: import sys from types import ModuleType class MagicModule(ModuleType): def __getattr__(self, name): if name == "spam": return "Spam spam spam!"
raise AttributeError eggs = 23 _tmp = MagicModule(__name__) _tmp.__dict__.update(sys.modules[__name__].__dict__) sys.modules[__name__] = _tmp del _tmp -- Steven From abarnert at yahoo.com Sat Sep 27 03:13:53 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 26 Sep 2014 18:13:53 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> <85zjdmt0el.fsf@benfinney.id.au> Message-ID: On Sep 26, 2014, at 17:01, David Mertz wrote: > Well, yeah. I think I lost that discussion in the type annotation thread though. I'd love types to include predicates, but Guido wouldn't ?. An 'is_natural()' function--by whichever spelling--needs to contain a test which isn't in the type system, whichever result one chooses for 'is_natural(1.0)'. > It is (or should be) in the type system for any application which has a good use for a natural number type. That's pretty much why Python has classes in the first place. Anyway, the only time I remember ever writing a Natural class in a language that didn't have one was to demonstrate Peano arithmetic, so in Python I'd be inheriting from or encapsulating a frozenset, not an int. I can imagine doing something similar (but more useful) in a symbolic math package. I could even see a Natural type being useful for the same kinds of things C uses unsigned for, but not if it was defined as 1.. instead of 0.. Do people have an actual use case for the type we're talking about here? > FWIW, I want 'isNatural(0j+1) == True'. But I'm on the losing end of at least three arguments there.... Unless I just roll my own in 5-8 lines of code.[*] > What about something that's one denornal bit of rounding error away from 0j+1? > [*] I'd want the non-PEP8 spelling because "Natural" in mathematics is always capitalized. That's why we have the built-in 'isGuido(BDFL)' rather than 'is_guido()'. 
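A predicate along the lines being discussed here can be sketched in a few lines. The variant below takes the reading David wants, treating any Number whose value is a positive integer as natural, including exact complex integers; the name and that behavioral choice are illustrative only, not anything actually proposed for the stdlib:

```python
import numbers

def is_natural(x):
    """Return True for any Number whose value is a positive integer.

    Illustrative variant: a complex value with zero imaginary part can
    qualify, so is_natural(0j + 1) is True.
    """
    try:
        if isinstance(x, numbers.Complex):
            if x.imag != 0:
                return False
            x = x.real          # for Real subtypes this is a no-op
        return int(x) == x and x > 0
    except (TypeError, ValueError, OverflowError):
        # non-numeric input, NaN, or infinity
        return False

print(is_natural(5), is_natural(3.5), is_natural(0j + 1), is_natural(1j))
# True False True False
```

The OverflowError clause is what keeps float('inf') from crashing the check, which the two-exception version earlier in the thread would miss.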
> Wouldn't is\N{ELEMENT}\N{DOUBLE_N} or something like that (sorry, don't know how to type the actual Unicode on my phone, do know how to look up the right names but too lazy to do so) be better if "in mathematics" is what you really want? In math textbooks and papers, the symbols are usually translated to "is a natural number", not "is a Natural number", so I don't think violating PEP 8 is warranted here. > On Sep 26, 2014 4:42 PM, "Ben Finney" wrote: >> David Mertz writes: >> >> > Isn't the right answer to create an isNatural() function? >> >> No, the right answer (by PEP 8) is to create an ?is_natural? or >> ?isnatural? function :-) >> >> > Why do we need a type or class rather than just a function? >> >> Less facetiously: I suspect what is being asked for here is a type which >> will *ensure* the values are natural numbers. >> >> That is a good use of types, IMO: they enforce a domain of values. >> >> I'm not convinced there is a pressing need to add such a type to the >> standard type hierarchy though. >> >> -- >> \ ?I find the whole business of religion profoundly interesting. | >> `\ But it does mystify me that otherwise intelligent people take | >> _o__) it seriously.? ?Douglas Adams | >> Ben Finney >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Sep 27 03:25:17 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 26 Sep 2014 18:25:17 -0700 Subject: [Python-ideas] Do we need non-heap types any more? 
(Was: Implicit submodule imports) In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> Message-ID: On Sep 26, 2014, at 17:03, Nathaniel Smith wrote: > On Sat, Sep 27, 2014 at 12:43 AM, Andrew Barnert > wrote: >> On Sep 26, 2014, at 14:43, Greg Ewing wrote: >> >>> Antoine Pitrou wrote: >>>> The tp_dealloc for a heap type is not the same as the non-heap base >>>> type's tp_dealloc. >>>> Also, look at compatible_for_assignment(): it calls same_slots_added() >>>> which assumes both args are heap types. >>> >>> It looks like the easiest way to address this particular >>> use case would be to make the module type a heap type. >>> >>> In the long term, how about turning *all* types into >>> heap types? We're already having to call PyType_Ready >>> on all the static type objects, so allocating them >>> from the heap shouldn't incur much extra overhead. >> >> What about extension modules? Deprecate static types? Automatically copy them to heap types? Use some horrible macro tricks in Python.h or a custom preprocessor in distutils? > > I think the name "heap types" is misleading. Yes, I wasn't sure whether Greg was suggesting to get rid of actual non-heap-allocated types, or just making static types fit HEAPTYPE. The former would be a lot more work, but it would also allow simplifying a lot of additional things, so they both seem like reasonable things to suggest (whether or not they're both reasonable things to actually do), > The actual distinction > being made isn't really about where the type object is allocated. > Static type objects are still subject to the refcounting machinery in > most cases (try sys.getrefcount(int)), but this is fine because the > refcount never reaches zero. 
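As an aside, the distinction Nathaniel describes is observable from pure Python: tp_flags is exposed on type objects as __flags__, so the Py_TPFLAGS_HEAPTYPE bit (bit 9 in CPython's object.h) can be checked directly. A small sketch:

```python
import sys

# Py_TPFLAGS_HEAPTYPE is bit 9 of tp_flags in CPython's object.h;
# tp_flags is exposed to Python code as the __flags__ attribute.
Py_TPFLAGS_HEAPTYPE = 1 << 9

class PyLevel:
    """Any class statement produces a heap type."""

print(bool(PyLevel.__flags__ & Py_TPFLAGS_HEAPTYPE))  # True
print(bool(int.__flags__ & Py_TPFLAGS_HEAPTYPE))      # False: int is a static type
print(sys.getrefcount(int) > 0)  # yet the static type still carries a refcount
```

Which is exactly the point: the flag marks where (and how) the type object was allocated, not whether the refcounting machinery applies to it.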
> > AFAICT from skimming the source a bit, what happened back in the 2.2 > days is that the devs went around fixing all the random places where > the assumption that all type objects were immortal had snuck in, and > they hid all this fixes behind a generic switch called "heap types". > It's all stuff like "we'll carefully only do standard refcounting if > HEAPTYPE is set" (even though refcounting could be applied to all > types without causing any problems), or "we will disable the GC > machinery when walking non-heap types" (even though again, who cares), Well, there's obviously a non-zero performance cost to doing all this stuff with all types. Of course there's also a non-zero cost to checking the heap-type-ness of all types. And both costs may be so minimal they're hard to even measure. > or "heap types all use the same tp_dealloc function". I'm sure some of > this stuff we're stuck with due to backcompat with C extension modules > that make funny assumptions, but presumably a lot of it could be > cleaned up -- I think that's what Greg means. > > -n > > > -- > Nathaniel J. 
Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org From abarnert at yahoo.com Sat Sep 27 03:32:05 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 26 Sep 2014 18:32:05 -0700 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <20140927003328.GD19757@ando.pearwood.info> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> <5425C1D0.6030906@stoneleaf.us> <20140927003328.GD19757@ando.pearwood.info> Message-ID: <00D7B5C2-5F06-4EBF-9771-01916CE6D901@yahoo.com> On Sep 26, 2014, at 17:33, Steven D'Aprano wrote: > On Fri, Sep 26, 2014 at 12:43:12PM -0700, Ethan Furman wrote: > >> What about >> >> Option 4: have reload work with modules converted into classes >> >> ? >> >> This may mean having some extra fields in the class, and probably some >> extra code in the module loading, but it might be the simplest approach. > > > I don't know that this is strictly necessary. You can put anything you > like into sys.modules, and reload() just raises a TypeError: > > > py> sys.modules['spam'] = 23 > py> import spam > py> spam > 23 > py> reload(spam) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: reload() argument must be module > > > Since reload() is mostly intended as a convenience at the REPL, I'd be > willing to forgo that convenience for special "modules". > > Or perhaps these special "modules" could subclass ModuleType and somehow > get reloading to work correctly. In 2.7 at least you can manually copy a > module to a module subclass, install it into sys.modules, and reload > will accept it. Not only that, but after reloading it still uses the > same subclass.
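The recipe quoted above can be exercised end to end on modern Python 3, where reload's check is isinstance-based and therefore accepts ModuleType subclasses. The sketch below writes a throwaway module to a temp directory so that importlib.reload has a real spec and loader to work with; the module name demo_mod is invented for this example:

```python
import importlib
import os
import sys
import tempfile
from types import ModuleType

class MagicModule(ModuleType):
    """Module subclass that synthesizes the attribute 'spam' on demand."""
    def __getattr__(self, name):
        if name == "spam":
            return "Spam spam spam!"
        raise AttributeError(name)

# A throwaway module on disk (demo_mod is a name invented for this demo).
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "demo_mod.py"), "w") as f:
    f.write("eggs = 23\n")
sys.path.insert(0, tmpdir)

import demo_mod

# Re-class the module: copy its namespace (including __spec__) into a
# subclass instance and swap that instance into sys.modules.
magic = MagicModule(demo_mod.__name__)
magic.__dict__.update(demo_mod.__dict__)
sys.modules["demo_mod"] = magic

print(magic.eggs)   # 23: a real attribute from the module body
print(magic.spam)   # synthesized by __getattr__

# importlib.reload does an isinstance(module, ModuleType) check, so the
# subclass is accepted, and the same object is re-executed in place.
reloaded = importlib.reload(magic)
print(reloaded is magic, type(reloaded).__name__)
```

Because reload re-executes the source into the existing object's __dict__ rather than building a fresh module, the MagicModule behavior survives the reload, matching what Steven observed on 2.7.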
> > Unfortunately, when I tried it in 3.3, imp.reload complained about my > custom module subclass not being a module, so it seems that 3.3 at least > is more restrictive than 2.7. (Perhaps 3.3 reload does a "type(obj) is > ModuleType" instead of isinstance test?) I don't know about 3.3 (and who cares?), but in trunk it's an isinstance test: https://hg.python.org/cpython/file/default/Lib/importlib/__init__.py#l115 > Nevertheless, I got this proof of concept more-or-less working in 2.7 > and 3.3: > > import sys > from types import ModuleType > > class MagicModule(ModuleType): > def __getattr__(self, name): > if name == "spam": > return "Spam spam spam!" > raise AttributeError > > eggs = 23 > > _tmp = MagicModule(__name__) > _tmp.__dict__.update(sys.modules[__name__].__dict__) > sys.modules[__name__] = _tmp > del _tmp > > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Sep 27 07:04:46 2014 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 27 Sep 2014 15:04:46 +1000 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> <85zjdmt0el.fsf@benfinney.id.au> Message-ID: On Sat, Sep 27, 2014 at 11:13 AM, Andrew Barnert wrote: > Wouldn't is\N{ELEMENT}\N{DOUBLE_N} or something like that (sorry, don't know > how to type the actual Unicode on my phone, do know how to look up the right > names but too lazy to do so) be better if "in mathematics" is what you > really want? In math textbooks and papers, the symbols are usually > translated to "is a natural number", not "is a Natural number", so I don't > think violating PEP 8 is warranted here. 
Assuming the characters you're after are U+2208 'ELEMENT OF' and U+2115 'DOUBLE-STRUCK CAPITAL N', your name would be is∈ℕ(). I'd prefer isℕ() for two reasons: firstly, ∈ is a symbol, so it's not valid in a name (though you could open the other can of worms and ask for it to be an operator - then you could spell it "x ∈ ℕ" instead of "isℕ(x)"), and secondly because it's much more common to ask "is natural?" than "is element-of natural?" in function names. But I think this has long gone into crazyland. ChrisA From mertz at gnosis.cx Sat Sep 27 08:46:11 2014 From: mertz at gnosis.cx (David Mertz) Date: Fri, 26 Sep 2014 23:46:11 -0700 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> <85zjdmt0el.fsf@benfinney.id.au> Message-ID: I've mentioned before that I used vim-cute-python ( https://github.com/ehamberg/vim-cute-python), but customized myself ( http://gnosis.cx/bin/.vim/after/syntax/python.vim). I do not utilize U+2115 'DOUBLE-STRUCK CAPITAL N' anywhere, but it is an elegant character. I use others such as:
The fact aleph visually resembles capital-N(one) adds to the appeal. So onscreen I see something like:

for x ∈ it:
    if isinstance(x, (ℤ, ℝ, ℂ)):
        y = (x ? z) ? w

I could easily define something to use ℕ but still have the actual source spelling of "Natural", just displayed in a fancy way. On Fri, Sep 26, 2014 at 10:04 PM, Chris Angelico wrote: > On Sat, Sep 27, 2014 at 11:13 AM, Andrew Barnert > wrote: > > Wouldn't is\N{ELEMENT}\N{DOUBLE_N} or something like that (sorry, don't > know > > how to type the actual Unicode on my phone, do know how to look up the > right > > names but too lazy to do so) be better if "in mathematics" is what you > > really want? In math textbooks and papers, the symbols are usually > > translated to "is a natural number", not "is a Natural number", so I > don't > > think violating PEP 8 is warranted here. > > Assuming the characters you're after are U+2208 'ELEMENT OF' and > U+2115 'DOUBLE-STRUCK CAPITAL N', your name would be is∈ℕ(). I'd > prefer isℕ() for two reasons: firstly, ∈ is a symbol, so it's not > valid in a name (though you could open the other can of worms and ask > for it to be an operator - then you could spell it "x ∈ ℕ" instead of > "isℕ(x)"), and secondly because it's much more common to ask "is > natural?" than "is element-of natural?" in function names. But I think > this has long gone into crazyland. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Sat Sep 27 11:00:44 2014 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 27 Sep 2014 21:00:44 +1200 Subject: [Python-ideas] Do we need non-heap types any more? (Was: Implicit submodule imports) In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> Message-ID: <54267CBC.3090703@canterbury.ac.nz> Nathaniel Smith wrote: > I'm sure some of > this stuff we're stuck with due to backcompat with C extension modules > that make funny assumptions, but presumably a lot of it could be > cleaned up -- I think that's what Greg means. Yes, it's probably not necessary to actually allocate them on the heap (that would cause big problems for existing extension modules that assume they can statically declare them). But I'm thinking it should be possible to reduce the differences to the point where that's the *only* distinction, so the vast majority of code doesn't have to care, and the same tp_* functions can be used for both. -- Greg From njs at pobox.com Sat Sep 27 17:30:49 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 27 Sep 2014 16:30:49 +0100 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <20140927003328.GD19757@ando.pearwood.info> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> <5425C1D0.6030906@stoneleaf.us> <20140927003328.GD19757@ando.pearwood.info> Message-ID: On Sat, Sep 27, 2014 at 1:33 AM, Steven D'Aprano wrote: > Or perhaps these special "modules" could subclass ModuleType and somehow > get reloading to work correctly. In 2.7 at least you can manually copy a > module to a module subclass, install it into sys.modules, and reload > will accept it. Not only that, but after reloading it still uses the > same subclass. 
> > Unfortunately, when I tried it in 3.3, imp.reload complained about my > custom module subclass not being a module, so it seems that 3.3 at least > is more restrictive than 2.7. (Perhaps 3.3 reload does a "type(obj) is > ModuleType" instead of isinstance test?) Yeah, it looks like 3.3 does an explicit 'type(obj) is ModuleType' check, but is the only version that works like this -- earlier and later versions both use isinstance. > Nevertheless, I got this proof of concept more-or-less working in 2.7 > and 3.3: > > import sys > from types import ModuleType > > class MagicModule(ModuleType): > def __getattr__(self, name): > if name == "spam": > return "Spam spam spam!" > raise AttributeError > > eggs = 23 > > _tmp = MagicModule(__name__) > _tmp.__dict__.update(sys.modules[__name__].__dict__) > sys.modules[__name__] = _tmp > del _tmp This approach won't work well for packages -- imagine that instead of 'eggs = 23', the body of the file imports a bunch of submodules. If those submodules then import the top-level package in turn, then they'll end up with the original module object and namespace, not the modified one. One could move the sys.modules assignment up to the top of the file, but you can't move the __dict__.update call up to the top of the file, because you can't copy the old namespace until after it's finished being initialized. OTOH leaving the __dict__.update at the bottom of the file is pretty risky too, because then any submodule that imports the top-level package will see a weird inconsistent view of it until after the import has finished. The solution is, instead of having two dicts and updating one to match the other, simply point the new module directly at the existing namespace dict, so they always stay in sync: _tmp = MagicModule(__name__) _tmp.__dict__ = sys.modules[__name__].__dict__ ...except this gives an error because module objects disallow assignment to __dict__. Sooooo you're kinda doomed no matter what you do. -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From dholth at gmail.com Sat Sep 27 18:02:26 2014 From: dholth at gmail.com (Daniel Holth) Date: Sat, 27 Sep 2014 12:02:26 -0400 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> <5425C1D0.6030906@stoneleaf.us> <20140927003328.GD19757@ando.pearwood.info> Message-ID: About apipkg... When using apipkg, you define your module's API in one package, and implement it in another:

# mypkg/__init__.py
import apipkg
apipkg.initpkg(__name__, {
    'path': {
        'Class1': "_mypkg.somemodule:Class1",
        'clsattr': "_mypkg.othermodule:Class2.attr",
    }
})

apipkg replaces sys.modules[mypkg] with a subclass of ModuleType. Anything in the apipkg is exposed under an alias and lazily imported on first use, including submodules. I've really enjoyed using it. It lets me think about the API as a separate entity from the implementation, and it lets me delay a slow import until during the first method call, for much more pleasing interactive tinkering. From random832 at fastmail.us Sun Sep 28 06:13:43 2014 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sun, 28 Sep 2014 00:13:43 -0400 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: References: <5425AC03.3060507@gmx.de> <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> <85zjdmt0el.fsf@benfinney.id.au> Message-ID: <1411877623.951802.172534381.3236F6CB@webmail.messagingengine.com> On Sat, Sep 27, 2014, at 01:04, Chris Angelico wrote: > Assuming the characters you're after are U+2208 'ELEMENT OF' and > U+2115 'DOUBLE-STRUCK CAPITAL N', your name would be is∈ℕ(). I'd > prefer isℕ() for two reasons: firstly, ∈ 
is a symbol, so it's not > valid in a name (though you could open the other can of worms and ask > for it to be an operator - then you could spell it "x ∈ ℕ" instead of > "isℕ(x)"), and secondly because it's much more common to ask "is > natural?" than "is element-of natural?" in function names. But I think > this has long gone into crazyland. Speaking of the other can of worms... we already _have_ that operator, it is spelled "in". What we don't have is infinite sets. From steve at pearwood.info Sun Sep 28 08:37:16 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 28 Sep 2014 16:37:16 +1000 Subject: [Python-ideas] `numbers.Natural` In-Reply-To: <1411877623.951802.172534381.3236F6CB@webmail.messagingengine.com> References: <0EEDDEF3-785E-45D2-8DE8-0ACC8114004B@yahoo.com> <3F8C986F-B675-4900-9B82-316E8BA2DCB6@yahoo.com> <85zjdmt0el.fsf@benfinney.id.au> <1411877623.951802.172534381.3236F6CB@webmail.messagingengine.com> Message-ID: <20140928063715.GE19757@ando.pearwood.info> On Sun, Sep 28, 2014 at 12:13:43AM -0400, random832 at fastmail.us wrote: > On Sat, Sep 27, 2014, at 01:04, Chris Angelico wrote: > > Assuming the characters you're after are U+2208 'ELEMENT OF' and > > U+2115 'DOUBLE-STRUCK CAPITAL N', your name would be is∈ℕ(). [...] > Speaking of the other can of worms... we already _have_ that operator, > it is spelled "in". What we don't have is infinite sets. Guys, a reminder please: Python is a general purpose programming language with a general-purpose notation, not Mathematica. The more specialised the task, or the notation, the less likely it is to belong in Python the language or the standard library. But feel free to create your own libraries, or even your own parser for a mini-language capable of interpreting things like x ∈ ℕ. Or help contribute to Sage. 
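For what it's worth, the membership half of that needs only a few lines of today's Python. The following is a hypothetical sketch (the class and names are invented here, not something from the thread):

```python
from numbers import Integral

class Naturals:
    """A hypothetical unbounded 'set' of natural numbers.

    Membership is computed on demand in __contains__, so the ordinary
    ``in`` operator works even though the set itself is infinite.
    """

    def __contains__(self, x):
        # Exclude bool: it subclasses int but is not a natural number here.
        return isinstance(x, Integral) and not isinstance(x, bool) and x >= 0

N = Naturals()
print(3 in N)    # True
print(-1 in N)   # False
print(0.5 in N)  # False
```

Iterating over such a set would of course never terminate; only containment tests are cheap.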
http://www.sagemath.org -- Steven From brett at python.org Sun Sep 28 16:09:09 2014 From: brett at python.org (Brett Cannon) Date: Sun, 28 Sep 2014 14:09:09 +0000 Subject: [Python-ideas] Implicit submodule imports References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> <5425C1D0.6030906@stoneleaf.us> <20140927003328.GD19757@ando.pearwood.info> Message-ID: On Sat Sep 27 2014 at 11:37:16 AM Nathaniel Smith wrote: > On Sat, Sep 27, 2014 at 1:33 AM, Steven D'Aprano > wrote: > > Or perhaps these special "modules" could subclass ModuleType and somehow > > get reloading to work correctly. In 2.7 at least you can manually copy a > > module to a module subclass, install it into sys.modules, and reload > > will accept it. Not only that, but after reloading it still uses the > > same subclass. > > > > Unfortunately, when I tried it in 3.3, imp.reload complained about my > > custom module subclass not being a module, so it seems that 3.3 at least > > is more restrictive than 2.7. (Perhaps 3.3 reload does a "type(obj) is > > ModuleType" instead of isinstance test?) > > Yeah, it looks like 3.3 does an explicit 'type(obj) is ModuleType' > check, but is the only version that works like this -- earlier and > later versions both use isinstance. > Feel free to file an issue about this. -Brett > > > Nevertheless, I got this proof of concept more-or-less working in 2.7 > > and 3.3: > > > > import sys > > from types import ModuleType > > > > class MagicModule(ModuleType): > > def __getattr__(self, name): > > if name == "spam": > > return "Spam spam spam!" > > raise AttributeError > > > > eggs = 23 > > > > _tmp = MagicModule(__name__) > > _tmp.__dict__.update(sys.modules[__name__].__dict__) > > sys.modules[__name__] = _tmp > > del _tmp > > This approach won't work well for packages -- imagine that instead of > 'eggs = 23', the body of the file imports a bunch of submodules. 
If > those submodules then import the top-level package in turn, then > they'll end up with the original module object and namespace, not the > modified one. > > One could move the sys.modules assignment up to the top of the file, > but you can't move the __dict__.update call up to the top of the file, > because you can't copy the old namespace until after it's finished > being initialized. OTOH leaving the __dict__.update at the bottom of > the file is pretty risky too, because then any submodule that imports > the top-level package will see a weird inconsistent view of it until > after the import has finished. > > The solution is, instead of having two dicts and updating one to > match the other, simply point the new module directly at the existing > namespace dict, so they always stay in sync: > > _tmp = MagicModule(__name__) > _tmp.__dict__ = sys.modules[__name__].__dict__ > > ...except this gives an error because module objects disallow > assignment to __dict__. > > Sooooo you're kinda doomed no matter what you do. > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sun Sep 28 18:03:01 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 28 Sep 2014 17:03:01 +0100 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> <5425C1D0.6030906@stoneleaf.us> <20140927003328.GD19757@ando.pearwood.info> Message-ID: On 28 Sep 2014 15:09, "Brett Cannon" wrote: > > On Sat Sep 27 2014 at 11:37:16 AM Nathaniel Smith wrote: >> >> On Sat, Sep 27, 2014 at 1:33 AM, Steven D'Aprano wrote: >> > Or perhaps these special "modules" could subclass ModuleType and somehow >> > get reloading to work correctly. In 2.7 at least you can manually copy a >> > module to a module subclass, install it into sys.modules, and reload >> > will accept it. Not only that, but after reloading it still uses the >> > same subclass. >> > >> > Unfortunately, when I tried it in 3.3, imp.reload complained about my >> > custom module subclass not being a module, so it seems that 3.3 at least >> > is more restrictive than 2.7. (Perhaps 3.3 reload does a "type(obj) is >> > ModuleType" instead of isinstance test?) >> >> Yeah, it looks like 3.3 does an explicit 'type(obj) is ModuleType' >> check, but is the only version that works like this -- earlier and >> later versions both use isinstance. > > > Feel free to file an issue about this. I thought 3.3 is in security-fix only mode? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sun Sep 28 18:44:31 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 28 Sep 2014 16:44:31 +0000 (UTC) Subject: [Python-ideas] Do we need non-heap types any more? 
(Was: Implicit submodule imports) References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> Message-ID: <698291304433615279.784952sturla.molden-gmail.com@news.gmane.org> Andrew Barnert wrote: > Well, there's obviously a non-zero performance cost to doing all this > stuff with all types. Of course there's also a non-zero cost to checking > the heap-type-ness of all types. And both costs may be so minimal they're > hard to even measure. > With branch prediction on a modern CPU an "if unlikely()" can probably push it down to impunity. Both the Linux kernel and Cython do this liberally. Sturla From sturla.molden at gmail.com Sun Sep 28 19:13:06 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 28 Sep 2014 17:13:06 +0000 (UTC) Subject: [Python-ideas] Do we need non-heap types any more? (Was: Implicit submodule imports) References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> <698291304433615279.784952sturla.molden-gmail.com@news.gmane.org> Message-ID: <95288578433616380.823406sturla.molden-gmail.com@news.gmane.org> Sturla Molden wrote: > With branch prediction on a modern CPU an "if unlikely()" can probably push > it down to impunity. Both the Linux kernel and Cython do this liberally. Just for reference, the definition of these macros in Cython and Linux are:

#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

Typical use cases are

fd = open(...);
if (unlikely(fd < 0)) {
    /* handle unlikely error */
}

or

ptr = malloc(...);
if (unlikely(!ptr)) {
    /* handle unlikely error */
}

If the conditionals fail, these checks have exactly zero impact on the run-time with a processor that supports branch prediction. 
Microsoft compilers don't know about __builtin_expect, but GCC, Clang and Intel compilers know what to do with it. Sturla From tjreedy at udel.edu Sun Sep 28 19:55:08 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 28 Sep 2014 13:55:08 -0400 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> <5425C1D0.6030906@stoneleaf.us> <20140927003328.GD19757@ando.pearwood.info> Message-ID: On 9/28/2014 12:03 PM, Nathaniel Smith wrote: > On 28 Sep 2014 15:09, "Brett Cannon" > > wrote: > > > > On Sat Sep 27 2014 at 11:37:16 AM Nathaniel Smith > > wrote: > >> > >> On Sat, Sep 27, 2014 at 1:33 AM, Steven D'Aprano > > wrote: > >> > Or perhaps these special "modules" could subclass ModuleType and > somehow > >> > get reloading to work correctly. In 2.7 at least you can manually > copy a > >> > module to a module subclass, install it into sys.modules, and reload > >> > will accept it. Not only that, but after reloading it still uses the > >> > same subclass. > >> > > >> > Unfortunately, when I tried it in 3.3, imp.reload complained about my > >> > custom module subclass not being a module, so it seems that 3.3 at > least > >> > is more restrictive than 2.7. (Perhaps 3.3 reload does a "type(obj) is > >> > ModuleType" instead of isinstance test?) > >> > >> Yeah, it looks like 3.3 does an explicit 'type(obj) is ModuleType' > >> check, but is the only version that works like this -- earlier and > >> later versions both use isinstance. > > > > > > Feel free to file an issue about this. > > I thought 3.3 is in security-fix only mode? It is. -- Terry Jan Reedy From abarnert at yahoo.com Sun Sep 28 20:50:36 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 28 Sep 2014 11:50:36 -0700 Subject: [Python-ideas] Do we need non-heap types any more? 
(Was: Implicit submodule imports) In-Reply-To: <698291304433615279.784952sturla.molden-gmail.com@news.gmane.org> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> <698291304433615279.784952sturla.molden-gmail.com@news.gmane.org> Message-ID: On Sep 28, 2014, at 9:44, Sturla Molden wrote: > Andrew Barnert > wrote: > >> Well, there's obviously a non-zero performance cost to doing all this >> stuff with all types. Of course there's also a non-zero cost to checking >> the heap-type-ness of all types. And both costs may be so minimal they're >> hard to even measure. >> > With branch prediction on a modern CPU an "if unlikely()" can probably push > it down to impunity. Both the Linux kernel and Cython do this liberally. On what modern CPU does unlikely have any effect at all? x86 has an opcode to provide static branch prediction hints, but it's been a no-op since Core 2; ARM doesn't have one; I don't know about other instruction sets but I'd be surprised if they did. And that's a good thing. If that macro still controlled branch prediction, using it would mean blowing away the entire pipeline on every use of a non-heap type. A modern CPU will use recent history to decide which branch is more likely, so whether your loop is using a heap type or a non-heap type, it won't mispredict anything after the first run through the loop. From sturla.molden at gmail.com Sun Sep 28 21:55:29 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 28 Sep 2014 19:55:29 +0000 (UTC) Subject: [Python-ideas] Do we need non-heap types any more? 
(Was: Implicit submodule imports) References: <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> <698291304433615279.784952sturla.molden-gmail.com@news.gmane.org> Message-ID: <801291282433626554.566166sturla.molden-gmail.com@news.gmane.org> Andrew Barnert wrote: > On what modern CPU does unlikely have any effect at all? x86 has an > opcode to provide static branch prediction hints, but it's been a no-op > since Core 2; ARM doesn't have one; I don't know about other instruction > sets but I'd be surprised if they did. AFAIK, the branch prediction is somewhat controlled by the order of instructions. And this compiler hint allows the compiler to restructure the code to better exploit this behavior. It does not result in specific opcodes being inserted. Sturla From solipsis at pitrou.net Sun Sep 28 22:08:15 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 28 Sep 2014 22:08:15 +0200 Subject: [Python-ideas] Do we need non-heap types any more? (Was: Implicit submodule imports) References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> <698291304433615279.784952sturla.molden-gmail.com@news.gmane.org> <95288578433616380.823406sturla.molden-gmail.com@news.gmane.org> Message-ID: <20140928220815.60057b9a@fsol> On Sun, 28 Sep 2014 17:13:06 +0000 (UTC) Sturla Molden wrote: > > If the conditionals fail, these checks have exactly zero impact on the > run-time with a processor that supports branch prediction. Branch prediction is typically implemented using branch predictors, which is a form of cache updated with the results of previous branches. "Impunity" can therefore only be achieved with an infinite number of branch predictors :-) Regards Antoine. 
From sturla.molden at gmail.com Sun Sep 28 22:17:43 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 28 Sep 2014 20:17:43 +0000 (UTC) Subject: [Python-ideas] Do we need non-heap types any more? (Was: Implicit submodule imports) References: <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> <698291304433615279.784952sturla.molden-gmail.com@news.gmane.org> Message-ID: <1567481794433627631.407791sturla.molden-gmail.com@news.gmane.org> Andrew Barnert wrote: > On what modern CPU does unlikely have any effect at all? x86 has an > opcode to provide static branch prediction hints, but it's been a no-op > since Core 2; ARM doesn't have one; I don't know about other instruction > sets but I'd be surprised if they did. http://madalanarayana.wordpress.com/2013/08/29/__builtin_expect-a-must-for-stack-developers/ http://benyossef.com/helping-the-compiler-help-you/ From abarnert at yahoo.com Mon Sep 29 04:58:08 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 28 Sep 2014 19:58:08 -0700 Subject: [Python-ideas] Do we need non-heap types any more? (Was: Implicit submodule imports) In-Reply-To: <1567481794433627631.407791sturla.molden-gmail.com@news.gmane.org> References: <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> <698291304433615279.784952sturla.molden-gmail.com@news.gmane.org> <1567481794433627631.407791sturla.molden-gmail.com@news.gmane.org> Message-ID: <9711086C-66DB-43B2-B935-DEA358526365@yahoo.com> On Sep 28, 2014, at 13:17, Sturla Molden wrote: > Andrew Barnert > wrote: > >> On what modern CPU does unlikely have any effect at all? x86 has an >> opcode to provide static branch prediction hints, but it's been a no-op >> since Core 2; ARM doesn't have one; I don't know about other instruction >> sets but I'd be surprised if they did. 
> > http://madalanarayana.wordpress.com/2013/08/29/__builtin_expect-a-must-for-stack-developers/ The example in this post shows the exact opposite of what it purports to: the generated code puts the unlikely i++ operation immediately after the conditional branch; because Haswell processors assume, in the absence of any information, that forward branches are unlikely, this will cause the wrong branch to be speculatively executed. In other words, gcc has completely ignored the builtin_expect here--as it often does. Also note the comment in the quoted source: > In general, you should prefer to use actual profile feedback for this (`-fprofile-arcs'), as programmers are notoriously bad at predicting how their programs actually perform > http://benyossef.com/helping-the-compiler-help-you/ This one vaguely waves its hands at the idea without providing any examples, before concluding: > It should be noted that GCC also provide a run time parameter -fprofile-arcs, which can profile the code for the actual statistics for each branch and the use of it should be prefered above guessing. Meanwhile, this whole thing started with you saying that branch prediction means we can add conditional checks "with impunity". The exact opposite is true. On older processors, we _could_ issue checks with impunity; branch prediction means they're now an order of magnitude more expensive than they used to be unless we're very careful. The ability to hint the CPU by rearranging code (whether manually, with builtin_expect, or using PGO) partly mitigated this effect, but it doesn't reverse it. And at any rate, consider the case we're talking about. We have some heap types and some non-heap types. Neither branch is very unlikely, which means that no matter which version you mark as unlikely, it's going to be wrong quite often. Which means, exactly as I said at the start, that the check for non-heap is not free. Unnecessary refcounts are also not free. Which one is more costly? 
Is either one costly enough to matter? Hell if I know; that's the kind of thing you pretty much have to test. Trying to reason it from first principles is hard enough even if you get all the principles right, but even harder if you're thinking in terms of P4 chips. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Mon Sep 29 15:27:16 2014 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 29 Sep 2014 09:27:16 -0400 Subject: [Python-ideas] Do we need non-heap types any more? (Was: Implicit submodule imports) In-Reply-To: <9711086C-66DB-43B2-B935-DEA358526365@yahoo.com> References: <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> <698291304433615279.784952sturla.molden-gmail.com@news.gmane.org> <1567481794433627631.407791sturla.molden-gmail.com@news.gmane.org> <9711086C-66DB-43B2-B935-DEA358526365@yahoo.com> Message-ID: <1411997236.2737009.172955017.48A59CCC@webmail.messagingengine.com> On Sun, Sep 28, 2014, at 22:58, Andrew Barnert wrote: > And at any rate, consider the case we're talking about. We have some heap > types and some non-heap types. Neither branch is very unlikely, What? It is very unlikely, especially in existing code where it won't work at all, for someone to attempt to reassign the __class__ of a non heap type object. We are not talking about something that gets run on every object. 
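For concreteness, the restriction being discussed is easy to demonstrate from Python itself (a minimal sketch; the exact wording of the error differs between Python versions):

```python
class A:
    pass

class B:
    pass

a = A()
a.__class__ = B   # both are ordinary heap types with compatible layout: allowed
assert type(a) is B

try:
    (5).__class__ = B   # int instances belong to a non-heap type: rejected
except TypeError as exc:
    print("rejected:", exc)
```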
From brett at python.org Mon Sep 29 16:07:29 2014 From: brett at python.org (Brett Cannon) Date: Mon, 29 Sep 2014 14:07:29 +0000 Subject: [Python-ideas] Implicit submodule imports References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <542497BB.6020302@canterbury.ac.nz> <6904EACC-ACDA-468B-9FF0-3233596C7C3C@yahoo.com> <5425C1D0.6030906@stoneleaf.us> <20140927003328.GD19757@ando.pearwood.info> Message-ID: On Sun Sep 28 2014 at 1:56:35 PM Terry Reedy wrote: > On 9/28/2014 12:03 PM, Nathaniel Smith wrote: > > On 28 Sep 2014 15:09, "Brett Cannon" > > > > wrote: > > > > > > On Sat Sep 27 2014 at 11:37:16 AM Nathaniel Smith > > > > wrote: > > >> > > >> On Sat, Sep 27, 2014 at 1:33 AM, Steven D'Aprano > > > > wrote: > > >> > Or perhaps these special "modules" could subclass ModuleType and > > somehow > > >> > get reloading to work correctly. In 2.7 at least you can manually > > copy a > > >> > module to a module subclass, install it into sys.modules, and > reload > > >> > will accept it. Not only that, but after reloading it still uses > the > > >> > same subclass. > > >> > > > >> > Unfortunately, when I tried it in 3.3, imp.reload complained about > my > > >> > custom module subclass not being a module, so it seems that 3.3 at > > least > > >> > is more restrictive than 2.7. (Perhaps 3.3 reload does a > "type(obj) is > > >> > ModuleType" instead of isinstance test?) > > >> > > >> Yeah, it looks like 3.3 does an explicit 'type(obj) is ModuleType' > > >> check, but is the only version that works like this -- earlier and > > >> later versions both use isinstance. > > > > > > > > > Feel free to file an issue about this. > > > > I thought 3.3 is in security-fix only mode? > > It is. > Sorry, my brain read that as "since 3.3", not as "in 3.3" and missed the "later versions" bit. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sturla.molden at gmail.com Mon Sep 29 16:15:18 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 29 Sep 2014 14:15:18 +0000 (UTC) Subject: [Python-ideas] Do we need non-heap types any more? (Was: Implicit submodule imports) References: <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> <698291304433615279.784952sturla.molden-gmail.com@news.gmane.org> <1567481794433627631.407791sturla.molden-gmail.com@news.gmane.org> <9711086C-66DB-43B2-B935-DEA358526365@yahoo.com> <1411997236.2737009.172955017.48A59CCC@webmail.messagingengine.com> Message-ID: <437720645433691929.761968sturla.molden-gmail.com@news.gmane.org> wrote: > What? It is very unlikely, especially in existing code where it won't > work at all, for someone to attempt to reassign the __class__ of a non > heap type object. We are not talking about something that gets run on > every object. And because of that it is better to have the pipeline flushed whenever it happens, rather than, say, 50 % of the times it might happen. But I agree with Andrew that it is something we should try to measure. Similarly, tagging functions 'hot' or 'cold' might also be a good idea. We know there are functions that will execute a lot, and there are error handlers that will only rarely be run. Anyone that has used Fortran will also know that tagging a function 'pure' is of great help to the compiler, particularly if arrays or pointers are involved. This informs the compiler that the function has no side effects. For example if we assert that a function like sin(x) is pure, it does not have to assume that calling this function will change something elsewhere. In Fortran it is a keyword, but we can use it in C as a GNU extension. 
Sturla From alexander.belopolsky at gmail.com Mon Sep 29 16:15:57 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 29 Sep 2014 10:15:57 -0400 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <70843F4E-FC41-4F5D-B20D-C4F7AEE1E37E@yahoo.com> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <70843F4E-FC41-4F5D-B20D-C4F7AEE1E37E@yahoo.com> Message-ID: On Wed, Sep 24, 2014 at 2:22 PM, Andrew Barnert < abarnert at yahoo.com.dmarc.invalid> wrote: > Could LazyModule be easily added to the stdlib, or split out into a > separate PyPI package? How is it different from apipkg? https://pypi.python.org/pypi/apipkg/1.2 -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Sep 29 16:19:00 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 29 Sep 2014 07:19:00 -0700 Subject: [Python-ideas] Do we need non-heap types any more? (Was: Implicit submodule imports) In-Reply-To: <1411997236.2737009.172955017.48A59CCC@webmail.messagingengine.com> References: <542497BB.6020302@canterbury.ac.nz> <20140926133223.75076d5e@fsol> <5425DE19.20509@canterbury.ac.nz> <698291304433615279.784952sturla.molden-gmail.com@news.gmane.org> <1567481794433627631.407791sturla.molden-gmail.com@news.gmane.org> <9711086C-66DB-43B2-B935-DEA358526365@yahoo.com> <1411997236.2737009.172955017.48A59CCC@webmail.messagingengine.com> Message-ID: On Sep 29, 2014, at 6:27, random832 at fastmail.us wrote: > On Sun, Sep 28, 2014, at 22:58, Andrew Barnert wrote: >> And at any rate, consider the case we're talking about. We have some heap >> types and some non-heap types. Neither branch is very unlikely, > > What? It is very unlikely, especially in existing code where it won't > work at all, for someone to attempt to reassign the __class__ of a non > heap type object. We are not talking about something that gets run on > every object. Look at the subject of this thread. 
Go back to the first message in the thread. Greg's suggestion is that, instead of just working around the __class__ assignment test, "I'm thinking it should be possible to reduce the differences to the point where [heap allocation itself is] the *only* distinction, so the vast majority of code doesn't have to care, and the same tp_* functions can be used for both." That's what we're talking about here. Is there a potential performance impact for making all of those changes? There could be a benefit from removing the tests; there could be a cost from adding work we didn't used to do (e.g., extra refcounting or other tests that we can currently skip). So, the fact that the one check on __class__ can be statically predicted pretty well doesn't have much to do with the potential cost or benefit of removing all of the differences between heap and non-heap types instead of just the check on __class__. From abarnert at yahoo.com Mon Sep 29 16:25:18 2014 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 29 Sep 2014 07:25:18 -0700 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <70843F4E-FC41-4F5D-B20D-C4F7AEE1E37E@yahoo.com> Message-ID: <0BD49DAC-BF5B-4586-A17B-AB3BA2953938@yahoo.com> On Sep 29, 2014, at 7:15, Alexander Belopolsky wrote: > On Wed, Sep 24, 2014 at 2:22 PM, Andrew Barnert wrote: >> Could LazyModule be easily added to the stdlib, or split out into a separate PyPI package? > > How is it different from apipkg? > > https://pypi.python.org/pypi/apipkg/1.2 No idea. Could apipkg be easily added to the stdlib? Is it actively maintained? ("virtually all Python versions, including CPython2.3 to Python3.1" sounds a bit worrisome...). Does it provide all the same functionality as Mark-Andre's package? If the answers are all "yes" then you can take my message as support for adding either one. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alexander.belopolsky at gmail.com Mon Sep 29 16:32:11 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 29 Sep 2014 10:32:11 -0400 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <0BD49DAC-BF5B-4586-A17B-AB3BA2953938@yahoo.com> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <70843F4E-FC41-4F5D-B20D-C4F7AEE1E37E@yahoo.com> <0BD49DAC-BF5B-4586-A17B-AB3BA2953938@yahoo.com> Message-ID: On Mon, Sep 29, 2014 at 10:25 AM, Andrew Barnert wrote: > Is [apipkg] actively maintained? It is distributed as a part of the popular "py" library, so I would assume it is fairly well maintained. See . -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Tue Sep 30 09:39:42 2014 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 30 Sep 2014 09:39:42 +0200 Subject: [Python-ideas] Implicit submodule imports In-Reply-To: <70843F4E-FC41-4F5D-B20D-C4F7AEE1E37E@yahoo.com> References: <54230610.7060305@gmx.de> <54230913.4060401@egenix.com> <70843F4E-FC41-4F5D-B20D-C4F7AEE1E37E@yahoo.com> Message-ID: <542A5E3E.2030907@egenix.com> On 24.09.2014 20:22, Andrew Barnert wrote: > On Sep 24, 2014, at 11:10, "M.-A. Lemburg" wrote: > >> On 24.09.2014 19:57, Thomas Gläßle wrote: >>> Hey folks, >>> >>> What do you think about making it easier to use packages by >>> automatically importing submodules on attribute access. >>> >>> Consider this example: >>> >>>>>> import matplotlib >>>>>> figure = matplotlib.figure.Figure() >>> AttributeError: 'module' object has no attribute 'figure' >>> >>> For the newcomer (like me some months ago) it's not obvious that the >>> solution is to import matplotlib.figure. >>> >>> Worse even: it may sometimes/later on work, if the submodule has been >>> imported from another place.
>>> >>> How I'd like it to behave instead (in pseudo code, since `package` is >>> not a python class right now): >>> >>> class package: >>> >>> def __getattr__(self, name): >>> try: >>> return self.__dict__[name] >>> except KeyError: >>> # either try to import `name` or raise a nicer error message >>> >>> The automatic import feature could also play nicely when porting a >>> package with submodules to or from a simple module with namespaces (as >>> suggested in [1]), making this transition seamless to any user. >>> >>> I'm not sure about potential problems from auto-importing. I currently >>> see the following issues: >>> >>> - harmless looking attribute access can lead to significant code >>> execution including side effects. On the other hand, that could always >>> be the case. >>> >>> - you can't use attribute access anymore to test whether a submodule is >>> imported (must use sys.modules instead, I guess) >>> >>> >>> In principle one can already make this feature happen today, by >>> replacing the object in sys.modules - which is kind of ugly and has >>> probably more flaws. This would also be made easier if there were a >>> module.__getattr__ ([2]) or "metaclass" like feature for modules (which >>> would be just a class then, I guess). >>> >>> Sorry, if this has come up before and I missed it. Anyhow, just >>> interested if anyone else considers this a nice feature. >> >> Agreed, it's a nice feature :-) >> >> I've been using this in our mx packages since 1999 using a module >> called LazyModule.py. See e.g. >> http://educommons.com/dev/browser/3.2/installers/windows/src/eduCommons/python/Lib/site-packages/mx/URL/LazyModule.py > > Could LazyModule be easily added to the stdlib, or split out into a separate PyPI package? > > It seems to me that would be a pretty good solution.
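For concreteness, the sys.modules-replacement idea mentioned above can be sketched in a few lines of Python 3. This is not the mx LazyModule code — `LazyPackage` is an invented name, and the trick of reassigning a module's `__class__` to a `types.ModuleType` subclass is the assumption here:

```python
import importlib
import sys
import types

class LazyPackage(types.ModuleType):
    """Module subclass that imports submodules on first attribute access."""
    def __getattr__(self, name):   # only called when normal lookup fails
        try:
            submodule = importlib.import_module(self.__name__ + '.' + name)
        except ImportError:
            raise AttributeError("module %r has no attribute or submodule %r"
                                 % (self.__name__, name))
        setattr(self, name, submodule)   # cache so __getattr__ isn't hit again
        return submodule

# A package can opt in by swapping its class in sys.modules; shown here
# with the stdlib email package, whose mime subpackage is not preloaded:
import email
sys.modules['email'].__class__ = LazyPackage
print(email.mime.__name__)   # email.mime imported on demand
```

(In current Python the same effect is available more directly via a module-level `__getattr__`, PEP 562 — which arrived well after this thread.)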
Today, a package has to eagerly preload modules, make the users do it manually, or write a few dozen lines of code to lazily load modules on demand, so it's not surprising that many of them don't use the third option even when it would be best for their users. If that could be one or two lines instead, I'm guessing a lot more packages would do so. If there's enough interest, then yes, separating it out into a PyPI package or adding it to the stdlib would be an option. The code is pretty simple. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 30 2014) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2014-09-30: Python Meeting Duesseldorf ... today ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From rymg19 at gmail.com Tue Sep 30 19:16:13 2014 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Tue, 30 Sep 2014 12:16:13 -0500 Subject: [Python-ideas] Python 2's re module should take longs Message-ID: This works: re.search('(abc)', 'abc').group(1) but this doesn't: re.search('(abc)', 'abc').group(1L) The latter raises "IndexError: no such group". Shouldn't that technically work? -- Ryan If anybody ever asks me why I prefer C++ to C, my answer will be simple: "It's becauseslejfp23(@#Q*(E*EIdc-SEGFAULT. Wait, I don't think that was nul-terminated." Personal reality distortion fields are immune to contradictory evidence. - srean Check out my website: http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Tue Sep 30 21:31:59 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 30 Sep 2014 15:31:59 -0400 Subject: [Python-ideas] Python 2's re module should take longs In-Reply-To: References: Message-ID: On 9/30/2014 1:16 PM, Ryan Gonzalez wrote: > This works: > > re.search('(abc)', 'abc').group(1) > > but this doesn't: > > re.search('(abc)', 'abc').group(1L) > > The latter raises "IndexError: no such group". Shouldn't that > technically work? If groups were stored in a list, then technically, yes; not if groups are stored in a dict to support named groups with just one structure. Since the number of groups is limited to 99 or 100 in 2.7 (just changed for 3.5), there is no technical reason to use longs. Even if the exception were considered a bug, I would not change it since using longs would restrict code to 2.7.9+ and make it less portable to 3.x. -- Terry Jan Reedy From solipsis at pitrou.net Tue Sep 30 21:57:44 2014 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 30 Sep 2014 21:57:44 +0200 Subject: [Python-ideas] Python 2's re module should take longs References: Message-ID: <20140930215744.6967bfbd@fsol> On Tue, 30 Sep 2014 12:16:13 -0500 Ryan Gonzalez wrote: > This works: > > re.search('(abc)', 'abc').group(1) > > but this doesn't: > > re.search('(abc)', 'abc').group(1L) > > The latter raises "IndexError: no such group". Shouldn't that technically > work? Yes, it's a bug. Feel free to open an issue. Regards Antoine. From random832 at fastmail.us Tue Sep 30 22:06:23 2014 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 30 Sep 2014 16:06:23 -0400 Subject: [Python-ideas] Python 2's re module should take longs In-Reply-To: References: Message-ID: <1412107583.3704066.173609841.6B6D30C5@webmail.messagingengine.com> On Tue, Sep 30, 2014, at 15:31, Terry Reedy wrote: > If groups were stored in a list, then technically, yes; not if groups > are stored in a dict to support named groups with just one structure.
Longs work fine interchangeably with ints in a dict. And even if they didn't, the group function _could_ convert a small-valued long argument to an int. This is an error raised by a function implemented in C that forces static type checking on its arguments. The core problem is that the PyInt_AsLong function does not check (and handle) the case that its argument is a small-valued PyLong. From random832 at fastmail.us Tue Sep 30 22:12:12 2014 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 30 Sep 2014 16:12:12 -0400 Subject: [Python-ideas] Python 2's re module should take longs In-Reply-To: <1412107583.3704066.173609841.6B6D30C5@webmail.messagingengine.com> References: <1412107583.3704066.173609841.6B6D30C5@webmail.messagingengine.com> Message-ID: <1412107932.3705322.173613161.5FA58146@webmail.messagingengine.com> Disregard my last message, I was looking at the wrong code. But looking at what I think is the right code (https://hg.python.org/cpython/file/d49b9c8ee8ed/Modules), I am confused, since this error is raised after the index has already been converted to a Py_ssize_t. On Tue, Sep 30, 2014, at 16:06, random832 at fastmail.us wrote: > On Tue, Sep 30, 2014, at 15:31, Terry Reedy wrote: > > If groups were stored in a list, then technically, yes, not if groups > > are stored in a dict to support named groups with just one structure. > > Longs work fine interchangeably with ints in a dict. And even if they > didn't, the group function _could_ convert a small-valued long argument > to an int. This is an error raised by a function implemented in C that > forces static type checking on its arguments. The core problem is that > the PyInt_AsLong function does not check (and handle) the case that its > argument is a small-valued PyLong.
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Random832 From ckaynor at zindagigames.com Tue Sep 30 22:32:14 2014 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Tue, 30 Sep 2014 13:32:14 -0700 Subject: [Python-ideas] Python 2's re module should take longs In-Reply-To: <1412107932.3705322.173613161.5FA58146@webmail.messagingengine.com> References: <1412107583.3704066.173609841.6B6D30C5@webmail.messagingengine.com> <1412107932.3705322.173613161.5FA58146@webmail.messagingengine.com> Message-ID: On Tue, Sep 30, 2014 at 1:12 PM, wrote: > Disregard my last message, I was looking at the wrong code. > > But looking at what I think is the right code > (https://hg.python.org/cpython/file/d49b9c8ee8ed/Modules), I am > confused, since this error is raised after the index has already been > converted to a Py_ssize_t. According to my quick look at the code[1], it looks like the problem is in match_getindex (~line 3304). If the line "if (PyInt_Check(index))" read "if (PyInt_Check(index) || PyLong_Check(index))" instead, it appears that it would properly handle longs as well as ints (at least based on what is happening a little farther down, near line 3312). It may be possible that the conditions need to be separated so that the long case calls PyLong_AsSsize_t rather than PyInt_AsSsize_t, but that may not be needed. It appears that in case an index is passed in, the re module just converts that to a C size_t, otherwise it looks it up in the group name dictionary to get the index. I suspect the indexes don't exist as keys in the mapping, only the group names. As the initial conversion checks for int specifically, and ignores longs, longs are treated differently than ints.
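That suspicion about the mapping can be checked from the Python side: a pattern's `groupindex` holds only the *named* groups (mapping name to numeric index), so a plain numeric index never appears as a key and must take the other branch of the C code:

```python
import re

p = re.compile('(?P<word>abc)(def)')
assert dict(p.groupindex) == {'word': 1}   # only the named group is a key
assert p.groups == 2                       # but both groups are counted

m = p.search('abcdef')
assert m.group('word') == m.group(1) == 'abc'   # name resolves via groupindex
assert m.group(2) == 'def'                      # number bypasses the mapping
```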
As a side note, it appears the documentation at https://docs.python.org/2/c-api/long.html is slightly incorrect: there appear to be two instances of a few functions, with slightly different documentation, but the same return, arguments, and name. The ones I can see are "PyLong_FromSsize_t" and "PyLong_AsSsize_t". Perhaps I am just missing some subtle difference in the names or arguments? [1] https://hg.python.org/cpython/file/d49b9c8ee8ed/Modules/_sre.c -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Tue Sep 30 23:03:58 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 30 Sep 2014 23:03:58 +0200 Subject: [Python-ideas] Python 2's re module should take longs In-Reply-To: References: <1412107583.3704066.173609841.6B6D30C5@webmail.messagingengine.com> <1412107932.3705322.173613161.5FA58146@webmail.messagingengine.com> Message-ID: On 09/30/2014 10:32 PM, Chris Kaynor wrote: > As a side note, it appears the documentation > at https://docs.python.org/2/c-api/long.html is slightly incorrect: there appear > to be two instances of a few functions, with slightly different documentation, > but the same return, arguments, and name. The ones I can see are > "PyLong_FromSsize_t" and "PyLong_AsSsize_t". Perhaps I am just missing some > subtle difference in the names or arguments? It just looks like a duplication, maybe from editing a merge conflict. Fixed. Georg
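As a closing note on the `group(1L)` thread: in Python 3 the int/long split no longer exists, so the distinction cannot arise there — `group()` takes the single `int` type (or a group name). A quick check:

```python
import re

m = re.search('(abc)', 'xabcx')
assert m.group(1) == 'abc'   # only one integer type in Python 3
assert m.group(0) == 'abc'   # group 0 is the whole match
```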