From timothy.c.delaney at gmail.com Tue Jul 1 00:07:23 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Tue, 1 Jul 2014 08:07:23 +1000
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To:
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

Message-ID:

On 1 July 2014 03:05, Ben Hoyt wrote:

> > So, here's my alternative proposal: add an "ensure_lstat" flag to
> > scandir() itself, and don't have *any* methods on DirEntry, only
> > attributes.
> ...
> > Most importantly, *regardless of platform*, the cached stat result (if
> > not None) would reflect the state of the entry at the time the
> > directory was scanned, rather than at some arbitrary later point in
> > time when lstat() was first called on the DirEntry object.
>

I'm torn between whether I'd prefer the stat fields to be populated on
Windows if ensure_lstat=False or not. There are good arguments each way,
but overall I'm inclining towards having it consistent with POSIX - don't
populate them unless ensure_lstat=True.

+0 for stat fields to be None on all platforms unless ensure_lstat=True.

> Yeah, I quite like this. It does make the caching more explicit and
> consistent. It's slightly annoying that it's less like pathlib.Path
> now, but DirEntry was never pathlib.Path anyway, so maybe it doesn't
> matter. The differences in naming may highlight the difference in
> caching, so maybe it's a good thing.
>

See my comments below on .fullname.

> Two further questions from me:
>
> 1) How does error handling work? Now os.stat() will/may be called
> during iteration, so in __next__. But it's hard to catch errors because
> you don't call __next__ explicitly. Is this a problem? How do other
> iterators that make system calls or raise errors handle this?
>

I think it just needs to be documented that iterating may throw the same
exceptions as os.lstat(). It's a little trickier if you don't want the
scope of your exception to be too broad, but you can always wrap the
iteration in a generator to catch and handle the exceptions you care
about, and allow the rest to propagate.
    import os

    def scandir_accessible(path='.'):
        gen = os.scandir(path)
        while True:
            try:
                yield next(gen)
            except StopIteration:
                # End of the directory: finish the generator cleanly.
                return
            except PermissionError:
                # Skip entries we are not allowed to inspect.
                pass

> 2) There's still the open question in the PEP of whether to include a
> way to access the full path. This is cheap to build, it has to be
> built anyway on POSIX systems, and it's quite useful for further
> operations on the file. I think the best way to handle this is a
> .fullname or .full_name attribute as suggested elsewhere. Thoughts?
>

+1 for .fullname. The earlier suggestion to have __str__ return the name
is killed, I think, by the fact that .fullname could be bytes.

It would be nice if pathlib.Path objects were enhanced to take a DirEntry
and use the .fullname automatically, but you could always call
Path(direntry.fullname).

Tim Delaney

From ethan at stoneleaf.us Tue Jul 1 00:38:45 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 30 Jun 2014 15:38:45 -0700
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To:
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

Message-ID: <53B1E6F5.2040905@stoneleaf.us> On 06/30/2014 03:07 PM, Tim Delaney wrote: > On 1 July 2014 03:05, Ben Hoyt wrote: >> >> So, here's my alternative proposal: add an "ensure_lstat" flag to >> scandir() itself, and don't have *any* methods on DirEntry, only >> attributes. >> ... >> Most importantly, *regardless of platform*, the cached stat result (if >> not None) would reflect the state of the entry at the time the >> directory was scanned, rather than at some arbitrary later point in >> time when lstat() was first called on the DirEntry object. > > I'm torn between whether I'd prefer the stat fields to be populated > on Windows if ensure_lstat=False or not. There are good arguments each > way, but overall I'm inclining towards having it consistent with POSIX > - don't populate them unless ensure_lstat=True. > > +0 for stat fields to be None on all platforms unless ensure_lstat=True. If a Windows user just needs the free info, why should s/he have to pay the price of a full stat call? I see no reason to hold the Windows side back and not take advantage of what it has available. There are plenty of posix calls that Windows is not able to use, after all. -- ~Ethan~ From timothy.c.delaney at gmail.com Tue Jul 1 01:15:59 2014 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Tue, 1 Jul 2014 09:15:59 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <53B1E6F5.2040905@stoneleaf.us> References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

<53B1E6F5.2040905@stoneleaf.us> Message-ID:

On 1 July 2014 08:38, Ethan Furman wrote:

> On 06/30/2014 03:07 PM, Tim Delaney wrote:
>
>> I'm torn between whether I'd prefer the stat fields to be populated
>> on Windows if ensure_lstat=False or not. There are good arguments each
>> way, but overall I'm inclining towards having it consistent with POSIX
>> - don't populate them unless ensure_lstat=True.
>>
>> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>>
>
> If a Windows user just needs the free info, why should s/he have to pay
> the price of a full stat call? I see no reason to hold the Windows side
> back and not take advantage of what it has available. There are plenty of
> posix calls that Windows is not able to use, after all.
>

On Windows ensure_lstat would either be a NOP (if the fields are always
populated), or it simply determines if the fields get populated. No extra
stat call.

On POSIX it's the difference between an extra stat call or not.

Tim Delaney

From jeanpierreda at gmail.com Tue Jul 1 01:25:49 2014
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Mon, 30 Jun 2014 16:25:49 -0700
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To:
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

Message-ID: On Mon, Jun 30, 2014 at 3:07 PM, Tim Delaney wrote: > On 1 July 2014 03:05, Ben Hoyt wrote: >> >> > So, here's my alternative proposal: add an "ensure_lstat" flag to >> > scandir() itself, and don't have *any* methods on DirEntry, only >> > attributes. >> ... >> >> > Most importantly, *regardless of platform*, the cached stat result (if >> > not None) would reflect the state of the entry at the time the >> > directory was scanned, rather than at some arbitrary later point in >> > time when lstat() was first called on the DirEntry object. > > > I'm torn between whether I'd prefer the stat fields to be populated on > Windows if ensure_lstat=False or not. There are good arguments each way, but > overall I'm inclining towards having it consistent with POSIX - don't > populate them unless ensure_lstat=True. > > +0 for stat fields to be None on all platforms unless ensure_lstat=True. This won't work well if lstat info is only needed for some entries. Is that a common use-case? It was mentioned earlier in the thread. -- Devin From ethan at stoneleaf.us Tue Jul 1 01:45:18 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 30 Jun 2014 16:45:18 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

<53B1E6F5.2040905@stoneleaf.us> Message-ID: <53B1F68E.5000908@stoneleaf.us> On 06/30/2014 04:15 PM, Tim Delaney wrote: > On 1 July 2014 08:38, Ethan Furman wrote: >> On 06/30/2014 03:07 PM, Tim Delaney wrote: >>> >>> I'm torn between whether I'd prefer the stat fields to be populated >>> on Windows if ensure_lstat=False or not. There are good arguments each >>> way, but overall I'm inclining towards having it consistent with POSIX >>> - don't populate them unless ensure_lstat=True. >>> >>> +0 for stat fields to be None on all platforms unless ensure_lstat=True. >> >> If a Windows user just needs the free info, why should s/he have to pay >> the price of a full stat call? I see no reason to hold the Windows side >> back and not take advantage of what it has available. There are plenty >> of posix calls that Windows is not able to use, after all. > > On Windows ensure_lstat would either be either a NOP (if the fields are > always populated), or it simply determines if the fields get populated. > No extra stat call. I suppose the exact behavior is still under discussion, as there are only two or three fields one gets "for free" on Windows (I think...), where as an os.stat call would get everything available for the platform. > On POSIX it's the difference between an extra stat call or not. Agreed on this part. Still, no reason to slow down the Windows side by throwing away info unnecessarily -- that's why this PEP exists, after all. -- ~Ethan~ From benhoyt at gmail.com Tue Jul 1 03:28:00 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Mon, 30 Jun 2014 21:28:00 -0400 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <53B1F68E.5000908@stoneleaf.us> References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

<53B1E6F5.2040905@stoneleaf.us> <53B1F68E.5000908@stoneleaf.us> Message-ID:

> I suppose the exact behavior is still under discussion, as there are only
> two or three fields one gets "for free" on Windows (I think...), whereas an
> os.stat call would get everything available for the platform.

No, Windows is nice enough to give you all the same stat_result fields
during scandir (via FindFirstFile/FindNextFile) as a regular
os.stat().

-Ben

From v+python at g.nevcal.com Tue Jul 1 04:04:43 2014
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 30 Jun 2014 19:04:43 -0700
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To:
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

Message-ID: <53B2173B.1010709@g.nevcal.com> On 6/30/2014 4:25 PM, Devin Jeanpierre wrote: > On Mon, Jun 30, 2014 at 3:07 PM, Tim Delaney > wrote: >> On 1 July 2014 03:05, Ben Hoyt wrote: >>>> So, here's my alternative proposal: add an "ensure_lstat" flag to >>>> scandir() itself, and don't have *any* methods on DirEntry, only >>>> attributes. >>> ... >>> >>>> Most importantly, *regardless of platform*, the cached stat result (if >>>> not None) would reflect the state of the entry at the time the >>>> directory was scanned, rather than at some arbitrary later point in >>>> time when lstat() was first called on the DirEntry object. >> >> I'm torn between whether I'd prefer the stat fields to be populated on >> Windows if ensure_lstat=False or not. There are good arguments each way, but >> overall I'm inclining towards having it consistent with POSIX - don't >> populate them unless ensure_lstat=True. >> >> +0 for stat fields to be None on all platforms unless ensure_lstat=True. > This won't work well if lstat info is only needed for some entries. Is > that a common use-case? It was mentioned earlier in the thread. If it is, use ensure_lstat=False, and use the proposed (by me) .refresh() API to update the data for those that need it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Tue Jul 1 04:17:00 2014 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Mon, 30 Jun 2014 19:17:00 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <53B2173B.1010709@g.nevcal.com> References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

<53B2173B.1010709@g.nevcal.com> Message-ID:

The proposal I was replying to was that:

- There is no .refresh()
- ensure_lstat=False means no OS has populated attributes
- ensure_lstat=True means every OS has populated attributes

Even if we add a .refresh(), the latter two items mean that you can't
avoid doing extra work (either too much on Windows, or too much on
Linux) if you want only a subset of the files' lstat info.

-- Devin

P.S. your mail client's quoting breaks my mail client (gmail)'s quoting.

On Mon, Jun 30, 2014 at 7:04 PM, Glenn Linderman wrote:
> On 6/30/2014 4:25 PM, Devin Jeanpierre wrote:
>
> On Mon, Jun 30, 2014 at 3:07 PM, Tim Delaney
> wrote:
>
> On 1 July 2014 03:05, Ben Hoyt wrote:
>
> So, here's my alternative proposal: add an "ensure_lstat" flag to
> scandir() itself, and don't have *any* methods on DirEntry, only
> attributes.
>
> ...
>
> Most importantly, *regardless of platform*, the cached stat result (if
> not None) would reflect the state of the entry at the time the
> directory was scanned, rather than at some arbitrary later point in
> time when lstat() was first called on the DirEntry object.
>
> I'm torn between whether I'd prefer the stat fields to be populated on
> Windows if ensure_lstat=False or not. There are good arguments each way, but
> overall I'm inclining towards having it consistent with POSIX - don't
> populate them unless ensure_lstat=True.
>
> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>
> This won't work well if lstat info is only needed for some entries. Is
> that a common use-case? It was mentioned earlier in the thread.
>
>
> If it is, use ensure_lstat=False, and use the proposed (by me) .refresh()
> API to update the data for those that need it.
> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/jeanpierreda%40gmail.com > From ncoghlan at gmail.com Tue Jul 1 04:17:44 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 1 Jul 2014 12:17:44 +1000 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <53B2173B.1010709@g.nevcal.com> References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

<53B2173B.1010709@g.nevcal.com> Message-ID: On 30 Jun 2014 19:13, "Glenn Linderman" wrote: > > > If it is, use ensure_lstat=False, and use the proposed (by me) .refresh() API to update the data for those that need it. I'm -1 on a refresh API for DirEntry - just use pathlib in that case. Cheers, Nick. From ethan at stoneleaf.us Tue Jul 1 03:44:57 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 30 Jun 2014 18:44:57 -0700 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

<53B1E6F5.2040905@stoneleaf.us> <53B1F68E.5000908@stoneleaf.us> Message-ID: <53B21299.1020006@stoneleaf.us> On 06/30/2014 06:28 PM, Ben Hoyt wrote: >> I suppose the exact behavior is still under discussion, as there are only >> two or three fields one gets "for free" on Windows (I think...), where as an >> os.stat call would get everything available for the platform. > > No, Windows is nice enough to give you all the same stat_result fields > during scandir (via FindFirstFile/FindNextFile) as a regular > os.stat(). Very nice. Even less reason then to throw it away. :) -- ~Ethan~ From eric at trueblade.com Tue Jul 1 04:59:33 2014 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 30 Jun 2014 22:59:33 -0400 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

<53B2173B.1010709@g.nevcal.com> Message-ID: <53B22415.1080801@trueblade.com>

On 6/30/2014 10:17 PM, Nick Coghlan wrote:
>
> On 30 Jun 2014 19:13, "Glenn Linderman" wrote:
>>
>>
>> If it is, use ensure_lstat=False, and use the proposed (by me)
>> .refresh() API to update the data for those that need it.
>
> I'm -1 on a refresh API for DirEntry - just use pathlib in that case.

I'm not sure refresh() is the best name, but I think a
"get_stat_info_from_direntry_or_call_stat()" (hah!) makes sense. If you
really need the stat info, then you can write simple code like:

    for entry in os.scandir(path):
        mtime = entry.get_stat_info_from_direntry_or_call_stat().st_mtime

And it won't call stat() any more times than needed. Once per file on
Posix, zero times per file on Windows. Without an API like this, you'll
need a check in the application code on whether or not to call stat().

Eric.

From tjreedy at udel.edu Tue Jul 1 06:35:24 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 01 Jul 2014 00:35:24 -0400
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To: <53B21299.1020006@stoneleaf.us>
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

<53B1E6F5.2040905@stoneleaf.us> <53B1F68E.5000908@stoneleaf.us> <53B21299.1020006@stoneleaf.us> Message-ID: On 6/30/2014 9:44 PM, Ethan Furman wrote: > On 06/30/2014 06:28 PM, Ben Hoyt wrote: >>> I suppose the exact behavior is still under discussion, as there are >>> only >>> two or three fields one gets "for free" on Windows (I think...), >>> where as an >>> os.stat call would get everything available for the platform. >> >> No, Windows is nice enough to give you all the same stat_result fields >> during scandir (via FindFirstFile/FindNextFile) as a regular >> os.stat(). > > Very nice. Even less reason then to throw it away. :) I agree. -- Terry Jan Reedy From victor.stinner at gmail.com Tue Jul 1 08:55:12 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 1 Jul 2014 08:55:12 +0200 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: <53B2173B.1010709@g.nevcal.com> References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>

<53B2173B.1010709@g.nevcal.com> Message-ID:

2014-07-01 4:04 GMT+02:00 Glenn Linderman :
>> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>
> This won't work well if lstat info is only needed for some entries. Is
> that a common use-case? It was mentioned earlier in the thread.
>
> If it is, use ensure_lstat=False, and use the proposed (by me) .refresh()
> API to update the data for those that need it.

We should make DirEntry as simple as possible. In Python, the classic
behaviour is to not define an attribute if it's not available on a
platform. For example, stat().st_file_attributes is only available on
Windows.

I don't like the idea of the ensure_lstat parameter because os.scandir
would have to make two system calls, which makes it harder to guess
which syscall failed (readdir or lstat). If you need lstat on UNIX,
write:

    if hasattr(entry, 'lstat_result'):
        size = entry.lstat_result.st_size
    else:
        size = os.lstat(entry.fullname()).st_size

Victor

From victor.stinner at gmail.com Tue Jul 1 09:44:02 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 1 Jul 2014 09:44:02 +0200
Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
Message-ID:

Hi,

IMO we must decide whether scandir() must support file descriptors or
not. It's an important decision which has an important impact on the
API.

To support scandir(fd), the minimum is to store dir_fd in DirEntry:
dir_fd would be None for scandir(str).

scandir(fd) must not close the file descriptor, it should be done by
the caller. Handling the lifetime of the file descriptor is a
difficult problem, it's better to let the user decide how to handle
it.

There is the problem of the limit of open file descriptors, usually
1024 but it can be lower. It *can* be an issue for a very deep file
hierarchy.

If we choose to support scandir(fd), it's probably safer to not use
scandir(fd) by default in os.walk() (use scandir(str) instead), wait
until the feature is well tested, corner cases are well known, etc.

The second step is to enhance pathlib.Path to support an optional file
descriptor. Path already has methods on filenames like chmod(),
exists(), rename(), etc.

Example:

    fd = os.open(path, os.O_DIRECTORY)
    try:
        for entry in os.scandir(fd):
            # ... use entry to benefit of entry cache: is_dir(), lstat_result ...
            path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
            # ... use path which uses dir_fd ...
    finally:
        os.close(fd)

Problem: if the path object is stored somewhere and used after the
loop, Path methods will fail because dir_fd was closed. It's even
worse if a new directory uses the same file descriptor :-/ (security
issue, or at least tricky bugs!)

Victor

From victor.stinner at gmail.com Tue Jul 1 09:48:49 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 1 Jul 2014 09:48:49 +0200
Subject: [Python-Dev] My summary of the scandir (PEP 471)
Message-ID:

Hi,

@Ben: it's time to update your PEP to complete it with this
discussion! IMO DirEntry must be as simple as possible and portable:

- os.scandir(str)
- DirEntry.lstat_result object only available on Windows, same result
  as os.lstat()
- DirEntry.fullname(): os.path.join(directory, DirEntry.name), where
  directory would be a hidden attribute of DirEntry

Notes:

- DirEntry.lstat_result is better than DirEntry.lstat() because it
  makes it explicit that lstat_result is only computed once. When I call
  DirEntry.lstat(), I expect to get the current status of the file, not
  the cached one. It's also hard to explain (document) that
  DirEntry.lstat() may or may not call a system call. Don't do that, use
  DirEntry.lstat_result.

- I don't think that we should support scandir(bytes). If you really
  want to support os.scandir(bytes), it must raise an error on Windows
  since bytes filenames are already deprecated.
  It wouldn't make sense to add a new function with a deprecated
  feature. Since we have the PEP 383 (surrogateescape), it's better to
  advise using Unicode on all platforms. Almost all Python functions
  are able to encode back Unicode filenames automatically. Use
  os.fsencode() to encode manually if needed.

- We may not define a DirEntry.fullname() method: the directory name
  is usually well known. Ok, but every time that I use os.listdir(), I
  write os.path.join(directory, name) because in some cases I want the
  full path. Example:

      interesting = []
      for name in os.listdir(path):
          fullpath = os.path.join(path, name)
          if os.path.isdir(fullpath):
              continue
          if ... test on the file ...:
              # I need the full path here, not the relative path
              # (ex: my own recursive "scandir"/"walk" function)
              interesting.append(fullpath)

- It must not be possible to "refresh" a DirEntry object. Call
  os.stat(entry.fullname()) or pathlib.Path(entry.fullname()) to get
  fresh data. DirEntry is only computed once, that's all. It's well
  defined.

- No Windows wildcard, you wrote that the feature has many corner
  cases, and it's only available on Windows. It's easy to combine
  scandir with fnmatch.

Victor

From benhoyt at gmail.com Tue Jul 1 14:26:15 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 1 Jul 2014 08:26:15 -0400
Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
In-Reply-To:
References:
Message-ID:

Thanks, Victor.

I don't have any experience with dir_fd handling, so unfortunately
can't really comment here.

What advantages does it bring? I notice that even os.listdir() on
Python 3.4 doesn't have anything related to file descriptors, so I'd
be in favour of not including support. We can always add it later.

-Ben

On Tue, Jul 1, 2014 at 3:44 AM, Victor Stinner wrote:
> Hi,
>
> IMO we must decide if scandir() must support or not file descriptor.
> It's an important decision which has an important impact on the API.
>
>
> To support scandir(fd), the minimum is to store dir_fd in DirEntry:
> dir_fd would be None for scandir(str).
>
>
> scandir(fd) must not close the file descriptor, it should be done by
> the caller. Handling the lifetime of the file descriptor is a
> difficult problem, it's better to let the user decide how to handle
> it.
>
> There is the problem of the limit of open file descriptors, usually
> 1024 but it can be lower. It *can* be an issue for very deep file
> hierarchy.
>
> If we choose to support scandir(fd), it's probably safer to not use
> scandir(fd) by default in os.walk() (use scandir(str) instead), wait
> until the feature is well tested, corner cases are well known, etc.
>
>
> The second step is to enhance pathlib.Path to support an optional file
> descriptor. Path already has methods on filenames like chmod(),
> exists(), rename(), etc.
>
>
> Example:
>
> fd = os.open(path, os.O_DIRECTORY)
> try:
>     for entry in os.scandir(fd):
>         # ... use entry to benefit of entry cache: is_dir(), lstat_result ...
>         path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
>         # ... use path which uses dir_fd ...
> finally:
>     os.close(fd)
>
> Problem: if the path object is stored somewhere and used after the
> loop, Path methods will fail because dir_fd was closed. It's even
> worse if a new directory uses the same file descriptor :-/ (security
> issue, or at least tricky bugs!)
>
> Victor

From victor.stinner at gmail.com Tue Jul 1 15:01:26 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 1 Jul 2014 15:01:26 +0200
Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
In-Reply-To:
References:
Message-ID:

2014-07-01 14:26 GMT+02:00 Ben Hoyt :
> Thanks, Victor.
>
> I don't have any experience with dir_fd handling, so unfortunately
> can't really comment here.
>
> What advantages does it bring? I notice that even os.listdir() on
> Python 3.4 doesn't have anything related to file descriptors, so I'd
> be in favour of not including support.

See https://docs.python.org/dev/library/os.html#dir-fd

The idea is to make sure that you get files from the same directory.
Problems occur when a directory is moved or a symlink is modified.
Example:

- you're browsing /tmp/test/x as root (!), /tmp/copy/passwd is owned
  by www user (website)
- you would like to remove the file "x": call unlink("/tmp/copy/passwd")
- ... but just before that, an attacker replaces the /tmp/copy
  directory with a symlink to /etc
- you will remove /etc/passwd instead of /tmp/copy/passwd, oh oh

Using unlink("passwd", dir_fd=tmp_copy_fd), you don't have this issue.
You are sure that you are working in the /tmp/copy directory.

You can imagine a lot of other scenarios to overwrite files and read
sensitive files.

Hopefully, the Linux rm command knows the unlinkat() syscall ;-)

    haypo@selma$ mkdir -p a/b/c
    haypo@selma$ strace -e unlinkat rm -rf a
    unlinkat(5, "c", AT_REMOVEDIR)          = 0
    unlinkat(4, "b", AT_REMOVEDIR)          = 0
    unlinkat(AT_FDCWD, "a", AT_REMOVEDIR)   = 0
    +++ exited with 0 +++

We should implement a similar thing in shutil.rmtree().

See also os.fwalk() which is a version of os.walk() providing dir_fd.

> We can always add it later.

I would prefer to discuss that right now. My proposition is to accept
an int for scandir() and copy the int into DirEntry.dir_fd. It's not
that complex :-)

The enhancement of the pathlib module can be done later. By the way, I
know that Antoine Pitrou wanted to implement file descriptors in
pathlib, but the feature was rejected or at least delayed.
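
A minimal sketch of the dir_fd idea described above, assuming a platform where os.unlink() accepts dir_fd (this can be checked with os.supports_dir_fd). The helper name remove_in_directory is hypothetical and only for illustration; the directory is opened once and the removal is resolved relative to that descriptor rather than to a path string that could be swapped in the meantime.

```python
import os
import tempfile


def remove_in_directory(directory, name):
    """Remove `name` relative to an open descriptor of `directory`.

    Hypothetical helper, for illustration only.
    """
    # Open the directory once; the unlink below is resolved relative to
    # this descriptor, not by walking the path string a second time.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.unlink(name, dir_fd=dir_fd)
    finally:
        # The caller owns the descriptor and is responsible for closing it.
        os.close(dir_fd)


# Only run the demonstration where unlink() supports dir_fd.
if os.unlink in os.supports_dir_fd:
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "passwd")
        open(target, "w").close()
        remove_in_directory(tmp, "passwd")
        print(os.path.exists(target))
```

Closing the descriptor in a finally block mirrors the point made earlier in the thread: the code that opens the descriptor decides how long it lives.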

Victor

From benhoyt at gmail.com Tue Jul 1 15:00:32 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 1 Jul 2014 09:00:32 -0400
Subject: [Python-Dev] My summary of the scandir (PEP 471)
In-Reply-To:
References:
Message-ID:

Thanks for spinning this off to (hopefully) finish the discussion. I
agree it's nearly time to update the PEP.

> @Ben: it's time to update your PEP to complete it with this
> discussion! IMO DirEntry must be as simple as possible and portable:
>
> - os.scandir(str)
> - DirEntry.lstat_result object only available on Windows, same result
> as os.lstat()
> - DirEntry.fullname(): os.path.join(directory, DirEntry.name), where
> directory would be a hidden attribute of DirEntry

I'm quite strongly against this, and I think it's actually the worst
of both worlds. It is not as good an API because:

(a) it doesn't call stat for you (on POSIX), so you have to check an
attribute and call scandir manually if you need it, turning what
should be one line of code into four. Your proposal above was kind of
how I had it originally, where you had to do extra tests and call
scandir manually if you needed it (see
https://mail.python.org/pipermail/python-dev/2013-May/126119.html)

(b) the .lstat_result attribute is available on Windows but not on
POSIX, meaning it's very easy for Windows developers to write code
that will run and work fine on Windows, but then break horribly on
POSIX; I think it'd be better if it broke hard on Windows to make
writing cross-platform code easy

The two alternates are:

1) the original proposal in the current version of PEP 471, where
DirEntry has an .lstat() method which calls stat() on POSIX but is
free on Windows

2) Nick Coghlan's proposal on the previous thread
(https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
suggesting an ensure_lstat keyword param to scandir if you need the
lstat_result value

I would make one small tweak to Nick Coghlan's proposal to make
writing cross-platform code easier.
Instead of .lstat_result being None sometimes (on POSIX), have it None
always unless you specify ensure_lstat=True. (Actually, call it
get_lstat=True to kind of make this more obvious.) Per (b) above, this
means Windows developers wouldn't accidentally write code which failed
on POSIX systems -- it'd fail fast on Windows too if you accessed
.lstat_result without specifying get_lstat=True.

I'm still unsure which of these I like better. I think #1's API is
slightly nicer without the ensure_lstat parameter, and error handling
of the stat() is more explicit. But #2 always fetches the stat info at
the same time as the dir entry info, so eliminates the problem of
having the file info change between scandir iteration and the .lstat()
call.

I'm leaning towards preferring #2 (Nick's proposal) because it solves
or gets around the caching issue. My one concern is error handling. Is
it an issue if scandir's __next__ can raise an OSError either from the
readdir() call or the call to stat()? My thinking is probably not. In
practice, would it ever really happen that readdir() would succeed but
an os.stat() immediately after would fail? I guess it could if the
file is deleted, but then if it were deleted a microsecond earlier the
readdir() would fail anyway, or not? Or does readdir give you a
consistent, "snap-shotted" view on things?

The one other thing I'm not quite sure about with Nick's proposal is
the name .lstat_result, as it's long. I can see why he suggested that,
as .lstat sounds like a verb, but maybe that's okay? If we can have
.is_dir and .is_file as attributes, my thinking is an .lstat attribute
is fine too. I don't feel too strongly though.

> - I don't think that we should support scandir(bytes). If you really
> want to support os.scandir(bytes), it must raise an error on Windows
> since bytes filename are already deprecated. It wouldn't make sense to
> add new function with a deprecated feature.
> Since we have the PEP 383 (surrogateescape), it's better to advise
> using Unicode on all platforms. Almost all Python functions are able
> to encode back Unicode filenames automatically. Use os.fsencode() to
> encode manually if needed.

Really, are bytes filenames deprecated? I think maybe they should be,
as they don't work on Windows :-), but the latest Python "os" docs
(https://docs.python.org/3.5/library/os.html) still say that all
functions that accept path names accept either str or bytes, and
return a value of the same type where necessary. So I think scandir()
should do the same thing.

> - We may not define a DirEntry.fullname() method: the directory name
> is usually well known. Ok, but every time that I use os.listdir(), I
> write os.path.join(directory, name) because in some cases I want the
> full path.

Agreed. I use this a lot too. However, I'd prefer a .fullname
attribute rather than a method, as it's free/cheap to compute and
doesn't require OS calls.

Out of interest, why do we have .is_dir and .stat_result but .fullname
rather than .full_name? .fullname seems reasonable to me, but maybe
consistency is a good thing here?

> - It must not be possible to "refresh" a DirEntry object. Call
> os.stat(entry.fullname()) or pathlib.Path(entry.fullname()) to get
> fresh data. DirEntry is only computed once, that's all. It's well
> defined.

I agree refresh() is not needed -- just use os.stat() or pathlib.

> - No Windows wildcard, you wrote that the feature has many corner
> cases, and it's only available on Windows. It's easy to combine
> scandir with fnmatch.

Agreed.
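
A minimal sketch of combining scandir with fnmatch, as suggested in the last point. It assumes os.scandir() as it later shipped in the standard library (Python 3.6 or later for the with-statement form); scandir_glob is a hypothetical helper name, not part of the PEP.

```python
import fnmatch
import os
import tempfile


def scandir_glob(path, pattern):
    """Yield names of entries in `path` matching a shell-style pattern.

    Hypothetical helper, for illustration only.
    """
    with os.scandir(path) as entries:
        for entry in entries:
            # The filtering is done in Python, so it behaves the same on
            # every platform instead of relying on Windows wildcards.
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.name


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.py", "b.txt", "c.py"):
        open(os.path.join(tmp, name), "w").close()
    print(sorted(scandir_glob(tmp, "*.py")))  # ['a.py', 'c.py']
```

Directory order is not guaranteed, which is why the usage example sorts the result before printing.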
-Ben

From victor.stinner at gmail.com Tue Jul 1 16:28:10 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 1 Jul 2014 16:28:10 +0200
Subject: [Python-Dev] My summary of the scandir (PEP 471)
In-Reply-To:
References:
Message-ID:

2014-07-01 15:00 GMT+02:00 Ben Hoyt :
> (a) it doesn't call stat for you (on POSIX), so you have to check an
> attribute and call scandir manually if you need it,

Yes, and that's something common when you use the os module. For
example, don't try to call os.fork(), os.getgid() or os.fchmod() on
Windows :-) Closer to your PEP, the following OS attributes are only
available on UNIX: st_blocks, st_blksize, st_rdev, st_flags; and
st_file_attributes is only available on Windows.

I don't think that using lstat_result is a common need when browsing a
directory. In most cases, you only need is_dir() and the name
attribute.

> 1) the original proposal in the current version of PEP 471, where
> DirEntry has an .lstat() method which calls stat() on POSIX but is
> free on Windows

On UNIX, does it mean that .lstat() calls os.lstat() at the first
call, and then always returns the same result? It would be different
from os.lstat() and pathlib.Path.stat() :-( I would prefer to have the
same behaviour as pathlib and os (you know, the well known consistency
of Python stdlib). As I wrote, I expect a function call to always
retrieve the new status.

> 2) Nick Coghlan's proposal on the previous thread
> (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
> suggesting an ensure_lstat keyword param to scandir if you need the
> lstat_result value

I don't like this idea because it makes error handling more complex.
The syntax to catch exceptions on an iterator is verbose (while: try:
next() except ...).

Whereas calling os.lstat(entry.fullname()) is explicit and it's easy
to surround it with try/except.

> .lstat_result being None sometimes (on POSIX),

Don't do that, it's not how Python handles portability. We use hasattr().
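
The explicit "call os.lstat() yourself inside try/except" style
described above can be sketched as follows. This assumes os.scandir()
as it later shipped, where the full path is exposed as entry.path
rather than the fullname() spelling used in this thread; entry_sizes is
a hypothetical helper name.

```python
import os


def entry_sizes(path):
    """Map each entry name in `path` to its size, or None if lstat() fails.

    `entry_sizes` is a hypothetical helper for illustration only.
    """
    sizes = {}
    for entry in os.scandir(path):
        try:
            # Explicit, fresh lstat() per entry: errors surface right here.
            sizes[entry.name] = os.lstat(entry.path).st_size
        except OSError:
            # The entry may have vanished since the listing was read, or
            # the directory may be listable but not searchable.
            sizes[entry.name] = None
    return sizes
```

The trade-off is one extra system call per entry in exchange for error
handling that sits next to the call that can fail.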
> would it ever really happen that readdir() would succeed but an os.stat() immediately after would fail?

Yes, it can happen. The filesystem is system-wide and shared by all
users. The file can be deleted.

> Really, are bytes filenames deprecated?

Yes, in all functions of the os module since Python 3.3. I'm sure
because I implemented the deprecation :-) Try open(b'test.txt', 'w')
on Windows with python -Werror.

> I think maybe they should be, as they don't work on Windows :-)

Windows has an API dedicated to bytes filenames, the ANSI API. But
this API has annoying bugs: it replaces unencodable characters with
question marks, and there is no option to be notified of the encoding
error. Different users complained about that. It was decided not to
change Python since Python is a light wrapper over the kernel system
calls. But bytes filenames are now deprecated to advise users to use
the native type for filenames on Windows: Unicode!

> but the latest Python "os" docs
> (https://docs.python.org/3.5/library/os.html) still say that all
> functions that accept path names accept either str or bytes,

Maybe I forgot to update the documentation :-(

> So I think scandir() should do the same thing.

You may support scandir(bytes) on Windows but you will need to emit a
deprecation warning too. (which are silent by default.)

Victor

From j.wielicki at sotecware.net Tue Jul 1 16:59:13 2014
From: j.wielicki at sotecware.net (Jonas Wielicki)
Date: Tue, 01 Jul 2014 16:59:13 +0200
Subject: [Python-Dev] My summary of the scandir (PEP 471)
In-Reply-To:
References:
Message-ID: <53B2CCC1.3000409@sotecware.net>

On 01.07.2014 15:00, Ben Hoyt wrote:
> I'm leaning towards preferring #2 (Nick's proposal) because it solves
> or gets around the caching issue. My one concern is error handling. Is
> it an issue if scandir's __next__ can raise an OSError either from the
> readdir() call or the call to stat()? My thinking is probably not.
> In practice, would it ever really happen that readdir() would succeed
> but an os.stat() immediately after would fail? I guess it could if the
> file is deleted, but then if it were deleted a microsecond earlier the
> readdir() would fail anyway, or not? Or does readdir give you a
> consistent, "snap-shotted" view on things?

No need for a microsecond-timed deletion -- a directory with +r but
without +x will allow you to list the entries, but stat calls on the
files will fail with EPERM:

$ ls -l
drwxr--r--. 2 root root 60 1. Jul 16:52 test
$ sudo ls -l test
total 0
-rw-r--r--. 1 root root 0 1. Jul 16:52 foo
$ ls test
ls: cannot access test/foo: Permission denied
total 0
-????????? ? ? ? ? ? foo
$ stat test/foo
stat: cannot stat ‘test/foo’: Permission denied

I had the idea to treat a failing lstat() inside scandir() as if the
entry wasn't found at all, but in this context, this seems wrong too.

regards,
jwi

From techtonik at gmail.com Tue Jul 1 07:16:52 2014
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 1 Jul 2014 08:16:52 +0300
Subject: [Python-Dev] Excess help() output
Message-ID:

Hi,

The help() output is confusing for beginners:

>>> class B(object):
...     pass
...
>>> help(B)
Help on class B in module __main__:

class B(__builtin__.object)
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)

Is it possible to remove this section from help output? Why is it
here at all?

>>> dir(B)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__',
'__getattribute__', '__hash__', '__init__', '__module__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '__weakref__']

--
anatoly t.
From benhoyt at gmail.com Tue Jul 1 17:30:37 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 1 Jul 2014 11:30:37 -0400 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: <53B2CCC1.3000409@sotecware.net> References: <53B2CCC1.3000409@sotecware.net> Message-ID: > No need for a microsecond-timed deletion -- a directory with +r but > without +x will allow you to list the entries, but stat calls on the > files will fail with EPERM: Ah -- very good to know, thanks. This definitely points me in the direction of wanting better control over error handling. Speaking of errors, and thinking of handling errors during iteration -- in what cases (if any) would an individual readdir fail if the opendir succeeded? -Ben From ncoghlan at gmail.com Tue Jul 1 17:33:06 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Jul 2014 01:33:06 +1000 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: <53B2CCC1.3000409@sotecware.net> References: <53B2CCC1.3000409@sotecware.net> Message-ID: On 1 Jul 2014 07:31, "Victor Stinner" wrote: > > 2014-07-01 15:00 GMT+02:00 Ben Hoyt : > > 2) Nick Coghlan's proposal on the previous thread > > (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) > > suggesting an ensure_lstat keyword param to scandir if you need the > > lstat_result value > > I don't like this idea because it makes error handling more complex. > The syntax to catch exceptions on an iterator is verbose (while: try: > next() except ...). Actually, we may need to copy the os.walk API and accept an "onerror" callback as a scandir argument. Regardless of whether or not we have "ensure_lstat", the iteration step could fail, so I don't believe we can just transfer the existing approach of catching exceptions from the listdir call. > Whereas calling os.lstat(entry.fullname()) is explicit and it's easy > to surround it with try/except. 
> > > > .lstat_result being None sometimes (on POSIX), > > Don't do that, it's not how Python handles portability. We use hasattr(). That's not true in general - we do either, depending on context. With the addition of an os.walk style onerror callback, I'm still in favour of a "get_lstat" flag (tweaked as Ben suggests to always be None unless requested, so Windows code is less likely to be inadvertently non-portable) > > would it ever really happen that readdir() would succeed but an os.stat() immediately after would fail? > > Yes, it can happen. The filesystem is system-wide and shared by all > users. The file can be deleted. We need per-iteration error handling for the readdir call anyway, so I think an onerror callback is a better option than dropping the ability to easily obtain full stat information as part of the iteration. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From benhoyt at gmail.com Tue Jul 1 17:42:25 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 1 Jul 2014 11:42:25 -0400 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: <53B2CCC1.3000409@sotecware.net> Message-ID: > We need per-iteration error handling for the readdir call anyway, so I think > an onerror callback is a better option than dropping the ability to easily > obtain full stat information as part of the iteration. I don't mind the idea of an "onerror" callback, but it's adding complexity. Putting aside the question of caching/timing for a second and assuming .lstat() as per the current PEP 471, do we really need per-iteration error handling for readdir()? When would that actually fail in practice? 
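
For reference, the existing os.walk() "onerror" hook that this
discussion proposes copying behaves roughly as in the sketch below.
Only os.walk() and its onerror parameter are real here; the helper
walk_collecting_errors is a made-up name for illustration.

```python
import os


def walk_collecting_errors(top):
    """Walk `top`, returning (file_paths, errors) instead of raising.

    Hypothetical helper; only os.walk() and its onerror hook are real.
    """
    errors = []
    paths = []
    # os.walk() calls onerror with the OSError instance when listing a
    # directory fails, then carries on with the remaining directories.
    for dirpath, dirnames, filenames in os.walk(top, onerror=errors.append):
        paths.extend(os.path.join(dirpath, name) for name in filenames)
    return paths, errors
```

With the default onerror=None, os.walk() silently skips directories it
cannot list, which is one reason a callback was considered for scandir
as well.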
-Ben

From ethan at stoneleaf.us Tue Jul 1 17:34:20 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 01 Jul 2014 08:34:20 -0700
Subject: [Python-Dev] My summary of the scandir (PEP 471)
In-Reply-To: <53B2CCC1.3000409@sotecware.net>
References: <53B2CCC1.3000409@sotecware.net>
Message-ID: <53B2D4FC.2090601@stoneleaf.us>

On 07/01/2014 07:59 AM, Jonas Wielicki wrote:
>
> I had the idea to treat a failing lstat() inside scandir() as if the
> entry wasn't found at all, but in this context, this seems wrong too.

Well, os.walk supports passing in an error handler -- perhaps scandir
should as well.

--
~Ethan~

From janzert at janzert.com Tue Jul 1 18:06:58 2014
From: janzert at janzert.com (Janzert)
Date: Tue, 01 Jul 2014 12:06:58 -0400
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To:
References:
Message-ID:

On 6/26/2014 6:59 PM, Ben Hoyt wrote:
> Rationale
> =========
>
> Python's built-in ``os.walk()`` is significantly slower than it needs
> to be, because -- in addition to calling ``os.listdir()`` on each
> directory -- it executes the system call ``os.stat()`` or
> ``GetFileAttributes()`` on each file to determine whether the entry is
> a directory or not.
>
> But the underlying system calls -- ``FindFirstFile`` /
> ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X --
> already tell you whether the files returned are directories or not, so
> no further system calls are needed. In short, you can reduce the
> number of system calls from approximately 2N to N, where N is the
> total number of files and directories in the tree. (And because
> directory trees are usually much wider than they are deep, it's often
> much better than this.)
>

One of the major reasons for this seems to be efficiently using
information that is already available from the OS "for free".
Unfortunately it seems that the current API and most of the leading alternate proposals hide from the user what information is actually there "free" and what is going to incur an extra cost. I would prefer an API that simply gives whatever came for free from the OS and then let the user decide if the extra expense is worth the extra information. Maybe that stat information was only going to be used for an informational log that can be skipped if it's going to incur extra expense? Janzert From 4kir4.1i at gmail.com Tue Jul 1 17:58:03 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Tue, 01 Jul 2014 19:58:03 +0400 Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None) References: Message-ID: <87d2dpm5pw.fsf@gmail.com> Ben Hoyt writes: > Thanks, Victor. > > I don't have any experience with dir_fd handling, so unfortunately > can't really comment here. > > What advantages does it bring? I notice that even os.listdir() on > Python 3.4 doesn't have anything related to file descriptors, so I'd > be in favour of not including support. We can always add it later. > > -Ben FYI, os.listdir does support file descriptors in Python 3.3+ try: >>> import os >>> os.listdir(os.open('.', os.O_RDONLY)) NOTE: os.supports_fd and os.supports_dir_fd are different sets. See also, https://mail.python.org/pipermail/python-dev/2014-June/135265.html -- Akira P.S. Please, don't put your answer on top of the message you are replying to. > > On Tue, Jul 1, 2014 at 3:44 AM, Victor Stinner wrote: >> Hi, >> >> IMO we must decide if scandir() must support or not file descriptor. >> It's an important decision which has an important impact on the API. >> >> >> To support scandir(fd), the minimum is to store dir_fd in DirEntry: >> dir_fd would be None for scandir(str). >> >> >> scandir(fd) must not close the file descriptor, it should be done by >> the caller. 
Handling the lifetime of the file descriptor is a >> difficult problem, it's better to let the user decide how to handle >> it. >> >> There is the problem of the limit of open file descriptors, usually >> 1024 but it can be lower. It *can* be an issue for very deep file >> hierarchy. >> >> If we choose to support scandir(fd), it's probably safer to not use >> scandir(fd) by default in os.walk() (use scandir(str) instead), wait >> until the feature is well tested, corner cases are well known, etc. >> >> >> The second step is to enhance pathlib.Path to support an optional file >> descriptor. Path already has methods on filenames like chmod(), >> exists(), rename(), etc. >> >> >> Example: >> >> fd = os.open(path, os.O_DIRECTORY) >> try: >> for entry in os.scandir(fd): >> # ... use entry to benefit of entry cache: is_dir(), lstat_result ... >> path = pathlib.Path(entry.name, dir_fd=entry.dir_fd) >> # ... use path which uses dir_fd ... >> finally: >> os.close(fd) >> >> Problem: if the path object is stored somewhere and use after the >> loop, Path methods will fail because dir_fd was closed. It's even >> worse if a new directory uses the same file descriptor :-/ (security >> issue, or at least tricky bugs!) >> >> Victor >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/benhoyt%40gmail.com From ncoghlan at gmail.com Tue Jul 1 18:50:48 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 1 Jul 2014 09:50:48 -0700 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: <53B2CCC1.3000409@sotecware.net> Message-ID: On 1 July 2014 08:42, Ben Hoyt wrote: >> We need per-iteration error handling for the readdir call anyway, so I think >> an onerror callback is a better option than dropping the ability to easily >> obtain full stat information as part of the iteration. 
> > I don't mind the idea of an "onerror" callback, but it's adding > complexity. Putting aside the question of caching/timing for a second > and assuming .lstat() as per the current PEP 471, do we really need > per-iteration error handling for readdir()? When would that actually > fail in practice? An NFS mount dropping the connection or a USB key being removed are the first that come to mind, but I expect there are others. I find it's generally better to just assume that any system call may fail for obscure reasons and put the infrastructure in place to deal with it rather than getting ugly, hard to track down bugs later. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From alex.gaynor at gmail.com Tue Jul 1 20:26:27 2014 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Tue, 1 Jul 2014 18:26:27 +0000 (UTC) Subject: [Python-Dev] Network Security Backport Status Message-ID: Hi all, I wanted to bring everyone up to speed on the status of PEP 466, what's been completed, and what's left to do. First the completed stuff: * hmac.compare_digest * hashlib.pbkdf2_hmac Are both backported, and I've added support to use them in Django, so users should start seeing these benefits just as soon as we get a Python release into their hands. Now the uncompleted stuff: * Persistent file descriptor for ``os.urandom`` * SSL module It's the SSL module that I'll spend the rest of this email talking about. Backporting the features from the Python3 version of this module has proven more difficult than I had expected. This is primarily because the stdlib took a maintenance strategy that was different from what most Python projects have done for their 2/3 support: multiple independent codebases. I've tried a few different strategies for the backport, none of which has worked: * Copying the ``ssl.py``, ``test_ssl.py``, and ``_ssl.c`` files from Python3 and trying to port all the code. 
* Copying just ``test_ssl.py`` and then copying individual chunks/functions as
  necessary to get stuff to pass.
* Manually doing stuff.

All of these proved to be a massive undertaking, and made it too easy to
accidentally introduce breaking changes.

I've come up with a new approach, which I believe is most likely to be
successful, but I'll need help to implement it.

The idea is to find the most recent commit which is a parent of both the
``2.7`` and ``default`` branches. Then take every single change to an ``ssl``
related file on the ``default`` branch, and attempt to replay it on the ``2.7``
branch. Require manual review on each commit to make sure it compiles, and to
ensure it doesn't make any backwards incompatible changes.

I think this provides the most iterative and guided approach to getting this
done.

I can do all the work of reviewing each commit, but I need some help from a
mercurial expert to automate the cherry-picking/rebasing of every single
commit.

What do folks think? Does this approach make sense? Anyone willing to help with
the mercurial scripting?

Cheers,
Alex

From ncoghlan at gmail.com Tue Jul 1 21:00:38 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 1 Jul 2014 12:00:38 -0700
Subject: [Python-Dev] Network Security Backport Status
In-Reply-To:
References:
Message-ID:

On 1 Jul 2014 11:28, "Alex Gaynor" wrote:
>
> I've come up with a new approach, which I believe is most likely to be
> successful, but I'll need help to implement it.
>
> The idea is to find the most recent commit which is a parent of both the
> ``2.7`` and ``default`` branches. Then take every single change to an ``ssl``
> related file on the ``default`` branch, and attempt to replay it on the ``2.7``
> branch. Require manual review on each commit to make sure it compiles, and to
> ensure it doesn't make any backwards incompatible changes.
>
> I think this provides the most iterative and guided approach to getting this
> done.
Sounds promising, although it may still have some challenges if the SSL code depends on earlier changes to other code. > I can do all the work of reviewing each commit, but I need some help from a > mercurial expert to automate the cherry-picking/rebasing of every single > commit. > > What do folks think? Does this approach make sense? Anyone willing to help with > the mercurial scripting? For the Mercurial part, it's probably worth posing that as a Stack Overflow question: Given two named branches in http://hg.python.org (default and 2.7) and 4 files (Python module, C module, tests, docs): - find the common ancestor - find all the commits affecting those files on default & graft them to 2.7 (with a chance to test and edit each one first) It's just a better environment for asking & answering that kind of question :) Cheers, Nick. > > Cheers, > Alex > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From j.wielicki at sotecware.net Tue Jul 1 20:45:22 2014 From: j.wielicki at sotecware.net (Jonas Wielicki) Date: Tue, 01 Jul 2014 20:45:22 +0200 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: <53B2CCC1.3000409@sotecware.net> Message-ID: <53B301C2.6070206@sotecware.net> On 01.07.2014 17:30, Ben Hoyt wrote: >> No need for a microsecond-timed deletion -- a directory with +r but >> without +x will allow you to list the entries, but stat calls on the >> files will fail with EPERM: > > Ah -- very good to know, thanks. This definitely points me in the > direction of wanting better control over error handling. > > Speaking of errors, and thinking of handling errors during iteration > -- in what cases (if any) would an individual readdir fail if the > opendir succeeded? 
readdir(3) manpage suggests that readdir can only fail if an invalid
directory fd was passed.

regards,
jwi

>
> -Ben
>

From antoine at python.org Tue Jul 1 22:54:28 2014
From: antoine at python.org (Antoine Pitrou)
Date: Tue, 01 Jul 2014 16:54:28 -0400
Subject: [Python-Dev] Network Security Backport Status
In-Reply-To:
References:
Message-ID:

On 01/07/2014 14:26, Alex Gaynor wrote:
>
> I can do all the work of reviewing each commit, but I need some help from a
> mercurial expert to automate the cherry-picking/rebasing of every single
> commit.
>
> What do folks think? Does this approach make sense? Anyone willing to help with
> the mercurial scripting?

I don't think this makes much sense; Mercurial won't be smarter than
you are. I think you'd have a better chance of succeeding by
backporting one feature at a time. IMO, you'd first want to backport
the _SSLContext base class and SSLContext.wrap_socket(). The latter
*will* require some manual coding to adapt to 2.7's different
SSLSocket implementation, not just applying patch hunks around.

Regards

Antoine.

From guido at python.org Tue Jul 1 22:59:00 2014
From: guido at python.org (Guido van Rossum)
Date: Tue, 1 Jul 2014 13:59:00 -0700
Subject: [Python-Dev] Network Security Backport Status
In-Reply-To:
References:
Message-ID:

I have to agree with Antoine -- I don't think there's a shortcut that
avoids *someone* actually having to understand the code to the point
of being able to recreate the same behavior in the different context
(pun not intended) of Python 2.

On Tue, Jul 1, 2014 at 1:54 PM, Antoine Pitrou wrote:
> On 01/07/2014 14:26, Alex Gaynor wrote:
>
>
>> I can do all the work of reviewing each commit, but I need some help from a
>> mercurial expert to automate the cherry-picking/rebasing of every single
>> commit.
>>
>> What do folks think? Does this approach make sense? Anyone willing to
>> help with the mercurial scripting?
>> > > I don't think this makes much sense; Mercurial won't be smarter than you > are. I think you'd have a better chance of succeeding by backporting one > feature at a time. IMO, you'd first want to backport the _SSLContext base > class and SSLContext.wrap_socket(). The latter *will* require some manual > coding to adapt to 2.7's different SSLSocket implementation, not just > applying patch hunks around. > > Regards > > Antoine. > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Tue Jul 1 23:20:17 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 1 Jul 2014 22:20:17 +0100 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: Message-ID: On 1 July 2014 14:00, Ben Hoyt wrote: > 2) Nick Coghlan's proposal on the previous thread > (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) > suggesting an ensure_lstat keyword param to scandir if you need the > lstat_result value > > I would make one small tweak to Nick Coghlan's proposal to make > writing cross-platform code easier. Instead of .lstat_result being > None sometimes (on POSIX), have it None always unless you specify > ensure_lstat=True. (Actually, call it get_lstat=True to kind of make > this more obvious.) Per (b) above, this means Windows developers > wouldn't accidentally write code which failed on POSIX systems -- it'd > fail fast on Windows too if you accessed .lstat_result without > specifying get_lstat=True. This is getting very complicated (at least to me, as a Windows user, where the basic idea seems straightforward). 
It seems to me that the right model is the standard "thin wrapper
round the OS feature" that acts as a building block - it's typical of
the rest of the os module. I think that thin wrapper is needed - even
if the various bells and whistles are useful, they can be built on top
of a low-level version (whereas the converse is not the case).
Typically, such thin wrappers expose POSIX semantics by default, and
Windows behaviour follows as closely as possible (see for example
stat, where st_ino makes no sense on Windows, but is present). In this
case, we're exposing Windows semantics, and POSIX is the one needing
to fit the model, but the principle is the same.

On that basis, optional attributes (as used in stat results) seem
entirely sensible.

The documentation for DirEntry could easily be written to parallel
that of a stat result:

"""
The return value is an object whose attributes correspond to the data
the OS returns about a directory entry:

  * name - the object's name
  * full_name - the object's full name (including path)
  * is_dir - whether the object is a directory
  * is_file - whether the object is a plain file
  * is_symlink - whether the object is a symbolic link

On Windows, the following attributes are also available

  * st_size - the size, in bytes, of the object (only meaningful for files)
  * st_atime - time of last access
  * st_mtime - time of last write
  * st_ctime - time of creation
  * st_file_attributes - Windows file attribute bits (see the
    FILE_ATTRIBUTE_* constants in the stat module)
"""

That's no harder to understand (or to work with) than the equivalent
stat result. The only difference is that the unavailable attributes
can be queried on POSIX, there's just a separate system call involved
(with implications in terms of performance, error handling and
potential race conditions).
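
The "optional attributes, as used in stat results" pattern referred to
here can be sketched with the existing os.lstat() result, which already
exposes platform-specific fields that callers probe with hasattr(). The
helper name describe is hypothetical; st_blocks and st_file_attributes
are real but platform-dependent fields.

```python
import os


def describe(path):
    """Return a dict of stat fields for `path`, skipping absent ones.

    `describe` is a hypothetical helper for illustration.
    """
    st = os.lstat(path)
    info = {"size": st.st_size, "mtime": st.st_mtime}
    # st_blocks exists on most Unix systems, st_file_attributes only on
    # Windows (Python 3.5+); probe for them rather than assuming.
    for optional in ("st_blocks", "st_file_attributes"):
        if hasattr(st, optional):
            info[optional] = getattr(st, optional)
    return info
```

The same dictionary therefore has a different set of keys on Windows
and on POSIX, which is the portability style being argued for.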
The version of scandir with the ensure_lstat argument is easy to write
based on one with optional arguments (I'm playing fast and loose with
adding attributes to DirEntry values here, just for the sake of an
example - the details are left as an exercise)

    def scandir_ensure(path='.', ensure_lstat=False):
        for entry in os.scandir(path):
            if ensure_lstat and not hasattr(entry, 'st_size'):
                stat_data = os.lstat(entry.full_name)
                entry.st_size = stat_data.st_size
                entry.st_atime = stat_data.st_atime
                entry.st_mtime = stat_data.st_mtime
                entry.st_ctime = stat_data.st_ctime
                # Ignore file_attributes, as we'll never get here on Windows
            yield entry

Variations on how you handle errors in the lstat call, etc, can be
added to taste.

Please, let's stick to a low-level wrapper round the OS API for the
first iteration of this feature. Enhancements can be added later, when
real-world usage has proved their value.

Paul

From v+python at g.nevcal.com Tue Jul 1 23:39:51 2014
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Tue, 01 Jul 2014 14:39:51 -0700
Subject: [Python-Dev] My summary of the scandir (PEP 471)
In-Reply-To:
References:

Message-ID: <53B32AA7.1050305@g.nevcal.com>

On 7/1/2014 2:20 PM, Paul Moore wrote:
> Please, let's stick to a low-level wrapper round the OS API for the
> first iteration of this feature. Enhancements can be added later, when
> real-world usage has proved their value.

I almost wrote this whole message this morning, but didn't have time.
Thanks, Paul, for digging through the details. +1

From ethan at stoneleaf.us Tue Jul 1 23:30:48 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 01 Jul 2014 14:30:48 -0700
Subject: [Python-Dev] My summary of the scandir (PEP 471)
In-Reply-To:
References:

Message-ID: <53B32888.4020604@stoneleaf.us> On 07/01/2014 02:20 PM, Paul Moore wrote: > > Please, let's stick to a low-level wrapper round the OS API for the > first iteration of this feature. Enhancements can be added later, when > real-world usage has proved their value. +1 From rosuav at gmail.com Wed Jul 2 03:13:56 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Jul 2014 11:13:56 +1000 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References:

Message-ID: On Wed, Jul 2, 2014 at 7:20 AM, Paul Moore wrote: > I think that thin wrapper is needed - even > if the various bells and whistles are useful, they can be built on top > of a low-level version (whereas the converse is not the case). +1. Make everything as simple as possible (but no simpler). ChrisA From benjamin at python.org Wed Jul 2 07:55:14 2014 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 01 Jul 2014 22:55:14 -0700 Subject: [Python-Dev] [RELEASE] Python 2.7.8 Message-ID: <1404280514.30741.136823729.74EB0C0B@webmail.messagingengine.com> Greetings, I have the distinct privilege of informing you that the latest release of the Python 2.7 series, 2.7.8, has been released and is available for download. 2.7.8 contains several important regression fixes and security changes: - The openssl version bundled in the Windows installer has been updated. - A regression in the mimetypes module on Windows has been fixed. [1] - A possible overflow in the buffer type has been fixed. [2] - A bug in the CGIHTTPServer module which allows arbitrary execution of code in the server root has been patched. [3] - A regression in the handling of UNC paths in os.path.join has been fixed. [4] Downloads of 2.7.8 are at https://www.python.org/download/releases/2.7.8/ The full changelog is located at http://hg.python.org/cpython/raw-file/v2.7.8/Misc/NEWS This is a production release. As always, please report bugs to http://bugs.python.org/ Till next time, Benjamin Peterson 2.7 Release Manager (on behalf of all of Python's contributors) [1] http://bugs.python.org/issue21652 [2] http://bugs.python.org/issue21831 [3] http://bugs.python.org/issue21766 [4] http://bugs.python.org/issue21672 From ncoghlan at gmail.com Wed Jul 2 08:35:48 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 1 Jul 2014 23:35:48 -0700 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References:

Message-ID: On 1 July 2014 14:20, Paul Moore wrote: > On 1 July 2014 14:00, Ben Hoyt wrote: >> 2) Nick Coghlan's proposal on the previous thread >> (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) >> suggesting an ensure_lstat keyword param to scandir if you need the >> lstat_result value >> >> I would make one small tweak to Nick Coghlan's proposal to make >> writing cross-platform code easier. Instead of .lstat_result being >> None sometimes (on POSIX), have it None always unless you specify >> ensure_lstat=True. (Actually, call it get_lstat=True to kind of make >> this more obvious.) Per (b) above, this means Windows developers >> wouldn't accidentally write code which failed on POSIX systems -- it'd >> fail fast on Windows too if you accessed .lstat_result without >> specifying get_lstat=True. > > This is getting very complicated (at least to me, as a Windows user, > where the basic idea seems straightforward). > > It seems to me that the right model is the standard "thin wrapper > round the OS feature" that acts as a building block - it's typical of > the rest of the os module. I think that thin wrapper is needed - even > if the various bells and whistles are useful, they can be built on top > of a low-level version (whereas the converse is not the case). > Typically, such thin wrappers expose POSIX semantics by default, and > Windows behaviour follows as closely as possible (see for example > stat, where st_ino makes no sense on Windows, but is present). In this > case, we're exposing Windows semantics, and POSIX is the one needing > to fit the model, but the principle is the same. > > On that basis, optional attributes (as used in stat results) seem > entirely sensible. 
>
> The documentation for DirEntry could easily be written to parallel
> that of a stat result:
>
> """
> The return value is an object whose attributes correspond to the data
> the OS returns about a directory entry:
>
> * name - the object's name
> * full_name - the object's full name (including path)
> * is_dir - whether the object is a directory
> * is_file - whether the object is a plain file
> * is_symlink - whether the object is a symbolic link
>
> On Windows, the following attributes are also available
>
> * st_size - the size, in bytes, of the object (only meaningful for files)
> * st_atime - time of last access
> * st_mtime - time of last write
> * st_ctime - time of creation
> * st_file_attributes - Windows file attribute bits (see the
>   FILE_ATTRIBUTE_* constants in the stat module)
> """
>
> That's no harder to understand (or to work with) than the equivalent
> stat result. The only difference is that the unavailable attributes
> can be queried on POSIX, there's just a separate system call involved
> (with implications in terms of performance, error handling and
> potential race conditions).
>
> The version of scandir with the ensure_lstat argument is easy to write
> based on one with optional arguments (I'm playing fast and loose with
> adding attributes to DirEntry values here, just for the sake of an
> example - the details are left as an exercise)
>
> def scandir_ensure(path='.', ensure_lstat=False):
>     for entry in os.scandir(path):
>         if ensure_lstat and not hasattr(entry, 'st_size'):
>             stat_data = os.lstat(entry.full_name)
>             entry.st_size = stat_data.st_size
>             entry.st_atime = stat_data.st_atime
>             entry.st_mtime = stat_data.st_mtime
>             entry.st_ctime = stat_data.st_ctime
>             # Ignore file_attributes, as we'll never get here on Windows
>         yield entry
>
> Variations on how you handle errors in the lstat call, etc, can be
> added to taste.
>
> Please, let's stick to a low-level wrapper round the OS API for the
> first iteration of this feature.
Enhancements can be added later, when > real-world usage has proved their value. +1 from me - especially if this recipe goes in at least the PEP, and potentially even the docs. I'm also OK with postponing onerror support for the time being - that should be straightforward to add later if we decide we need it. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From j.wielicki at sotecware.net Wed Jul 2 12:25:47 2014 From: j.wielicki at sotecware.net (Jonas Wielicki) Date: Wed, 02 Jul 2014 12:25:47 +0200 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References:

Message-ID: <53B3DE2B.3050209@sotecware.net> On 01.07.2014 23:20, Paul Moore wrote: > [snip] > Please, let's stick to a low-level wrapper round the OS API for the > first iteration of this feature. Enhancements can be added later, when > real-world usage has proved their value. > > Paul +1 to the whole thing. That's an ingeniously simple solution to the issues we're having here. regards, jwi From cf.natali at gmail.com Wed Jul 2 12:51:43 2014 From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Wed, 2 Jul 2014 11:51:43 +0100 Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None) In-Reply-To: References: Message-ID: 2014-07-01 8:44 GMT+01:00 Victor Stinner : > > IMO we must decide if scandir() must support or not file descriptor. > It's an important decision which has an important impact on the API. I don't think we should support it: it's way too complicated to use, error-prone, and leads to messy APIs. From victor.stinner at gmail.com Wed Jul 2 13:59:26 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 2 Jul 2014 13:59:26 +0200 Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None) In-Reply-To: References:

Message-ID: 2014-07-02 12:51 GMT+02:00 Charles-François Natali : > I don't think we should support it: it's way too complicated to use, > error-prone, and leads to messy APIs. Can you please elaborate? Which kind of issue do you see? Handling the lifetime of the directory file descriptor? You don't like the dir_fd parameter of os functions? I don't have an opinion on supporting scandir(int). I asked to discuss it in the PEP directly. Victor From benhoyt at gmail.com Wed Jul 2 14:41:28 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 2 Jul 2014 08:41:28 -0400 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References:

Message-ID:

Thanks for the effort in your response, Paul.

I'm all for KISS, but let's just slow down a bit here.

> I think that thin wrapper is needed - even
> if the various bells and whistles are useful, they can be built on top
> of a low-level version (whereas the converse is not the case).

Yes, but API design is important. For example, urllib2 takes a kind of
"thin wrapper" approach, but millions of people use the 3rd-party
"requests" library because it's just so much nicer to use.

There are low-level functions in the "os" module, but there are also a
lot of higher-level functions (os.walk) and functions that smooth over
cross-platform issues (os.stat).

Detailed comments below.

> The return value is an object whose attributes correspond to the data
> the OS returns about a directory entry:
>
> * name - the object's name
> * full_name - the object's full name (including path)
> * is_dir - whether the object is a directory
> * is_file - whether the object is a plain file
> * is_symlink - whether the object is a symbolic link
>
> On Windows, the following attributes are also available
>
> * st_size - the size, in bytes, of the object (only meaningful for files)
> * st_atime - time of last access
> * st_mtime - time of last write
> * st_ctime - time of creation
> * st_file_attributes - Windows file attribute bits (see the
>   FILE_ATTRIBUTE_* constants in the stat module)

Again, this seems like a nice simple idea, but I think it's actually a
worst-of-both-worlds solution -- it has a few problems:

1) It's a nasty API to actually write code with. If you try to use it,
it gives off a "made only for low-level library authors" rather than
"designed for developers" smell.
For example, here's a get_tree_size()
function I use written in both versions (original is the PEP 471
version with the addition of .full_name):

def get_tree_size_original(path):
    """Return total size of all files in directory tree at path."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir():
            total += get_tree_size_original(entry.full_name)
        else:
            total += entry.lstat().st_size
    return total

def get_tree_size_new(path):
    """Return total size of all files in directory tree at path."""
    total = 0
    for entry in os.scandir(path):
        if hasattr(entry, 'is_dir') and hasattr(entry, 'st_size'):
            is_dir = entry.is_dir
            size = entry.st_size
        else:
            st = os.lstat(entry.full_name)
            is_dir = stat.S_ISDIR(st.st_mode)
            size = st.st_size
        if is_dir:
            total += get_tree_size_new(entry.full_name)
        else:
            total += size
    return total

I know which version I'd rather write and maintain! It seems to me new
users and folks new to Python could easily write the top version, but
the bottom is longer, more complicated, and harder to get right. It
would also be very easy to write code in a way that works on Windows
but bombs hard on POSIX.

2) It seems like your assumption is that is_dir/is_file/is_symlink are
always available on POSIX via readdir. This isn't actually the case
(this was discussed in the original threads) -- if readdir() returns
dirent.d_type as DT_UNKNOWN, then you actually have to call os.stat()
anyway to get it. So, as the above definition of get_tree_size_new()
shows, you have to use getattr/hasattr on everything:
is_dir/is_file/is_symlink as well as the st_* attributes.

3) It's not much different in concept to the PEP 471 version, except
that PEP 471 has a built-in .lstat() method, making the user's life
much easier. This is the sense in which it's the worst of both worlds
-- it's a far less nice API to use, but it still has the same issues
with race conditions the original does.
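
Point 2 can be made concrete with a minimal sketch. This is not code
from the PEP or from the thread: is_dir_with_fallback and its
d_type_is_dir parameter are hypothetical names, standing in for "what
readdir() reported about this entry". It only illustrates that when
the type is unknown, an lstat() call (and its possible OSError) cannot
be avoided.

```python
import os
import stat


def is_dir_with_fallback(path, d_type_is_dir=None):
    """Return True if path names a directory, without following symlinks.

    d_type_is_dir is a hypothetical stand-in for what readdir() reported:
    True or False when the entry type was known, None for DT_UNKNOWN.
    """
    if d_type_is_dir is not None:
        # The type came for free with the directory listing: no system call.
        return d_type_is_dir
    # DT_UNKNOWN: pay for one lstat() call, which may raise OSError.
    return stat.S_ISDIR(os.lstat(path).st_mode)
```

With a known type the function never touches the filesystem; with None
it behaves like an explicit lstat() check, so callers still need to be
ready for OSError in that case.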
So thinking about this again:

First, based on the +1's to Paul's new solution, I don't think people
are too concerned about the race condition issue (attributes being
different between the original readdir and the os.stat calls). I think
this is probably fair -- if folks care, they can handle it in an
application-specific way. So that means Paul's new solution and the
original PEP 471 approach are both okay on that score.

Second, comparing PEP 471 to Nick's solution: error handling is much
more straightforward and simple to document with the original PEP 471
approach (just try/catch around the function calls) than with Nick's
get_lstat=True approach of doing the stat() if needed inside the
iterator. To catch errors with that approach, you'd either have to do
a "while True" loop and try/catch around next(it) manually (which is
very yucky code), or we'd have to add an onerror callback, which is
somewhat less nice to use and harder to document (signature of the
callback, exception object, etc).

So given all of the above, I'm fairly strongly in favour of the
approach in the original PEP 471 due to its easy-to-use API and
straightforward try/catch approach to error handling. (My second
option would be Nick's get_lstat=True with the onerror callback. My
third option would be Paul's attribute-only solution, as it's just
very hard to use.)

Thoughts?

-Ben

From p.f.moore at gmail.com Wed Jul 2 15:48:12 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 2 Jul 2014 14:48:12 +0100 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References:

Message-ID: tl;dr - I agree with your points and think that the original PEP 471 proposal is fine. The details here are just clarification of why my proposal wasn't just "use PEP 471 as written" in the first place... On 2 July 2014 13:41, Ben Hoyt wrote: > 1) It's a nasty API to actually write code with. If you try to use it, > it gives off a "made only for low-level library authors" rather than > "designed for developers" smell. For example, here's a get_tree_size() > function I use written in both versions (original is the PEP 471 > version with the addition of .full_name): > > def get_tree_size_original(path): > """Return total size of all files in directory tree at path.""" > total = 0 > for entry in os.scandir(path): > if entry.is_dir(): > total += get_tree_size_original(entry.full_name) > else: > total += entry.lstat().st_size > return total > > def get_tree_size_new(path): > """Return total size of all files in directory tree at path.""" > total = 0 > for entry in os.scandir(path): > if hasattr(entry, 'is_dir') and hasattr(entry, 'st_size'): > is_dir = entry.is_dir > size = entry.st_size > else: > st = os.lstat(entry.full_name) > is_dir = stat.S_ISDIR(st.st_mode) > size = st.st_size > if is_dir: > total += get_tree_size_new(entry.full_name) > else: > total += size > return total > > I know which version I'd rather write and maintain! Fair point. But *only* because is_dir isn't guaranteed to be available. I could debate other aspects of your translation to use my API, but it's not relevant as my proposal was flawed in terms of is_XXX. > It seems to me new > users and folks new to Python could easily write the top version, but > the bottom is longer, more complicated, and harder to get right. Given the is_dir point, agreed. > It > would also be very easy to write code in a way that works on Windows > but bombs hard on POSIX. You may have a point here - my Windows bias may be showing. 
It's already awfully easy to write code that works on POSIX but bombs hard on Windows (deleting open files, for example) so I find it tempting to think "give them a taste of their own medicine" :-) More seriously, it seems to me that the scandir API is specifically designed to write efficient code on platforms where the OS gives information that allows you to do so. Warping the API too much to cater for platforms where that isn't possible seems to have the priorities backwards. Making the API not be an accident waiting to happen is fine, though. And let's be careful, too. My position is that it's not too hard to write code that works on Windows, Linux and OS X but you're right you could miss the problem with platforms that don't even support a free is_dir(). It's *easier* to write Windows-only code by mistake, but the fix to cover the "big three" is pretty simple (if not hasattr, lstat). > 2) It seems like your assumption is that is_dir/is_file/is_symlink are > always available on POSIX via readdir. This isn't actually the case > (this was discussed in the original threads) -- if readdir() returns > dirent.d_type as DT_UNKNOWN, then you actually have to call os.stat() > anyway to get it. So, as the above definition of get_tree_size_new() > shows, you have to use getattr/hasattr on everything: > is_dir/is_file/is_symlink as well as the st_* attributes. Ah, the wording in the PEP says "Linux, Windows, OS X". Superficially, that said "everywhere" to me. It might be worth calling out specifically some examples where it's not available without an extra system call, just to make the point explicit. You're right, though, that blows away the simplicity of my proposal. The original PEP 471 seems precisely right to me, in that case. I was only really arguing for attributes because they seem more obviously static than a method call. And personally I don't care about that aspect. 
> 3) It's not much different in concept to the PEP 471 version, except > that PEP 471 has a built-in .lstat() method, making the user's life > much easier. This is the sense in which it's the worst of both worlds > -- it's a far less nice API to use, but it still has the same issues > with race conditions the original does. Agreed. My intent was never to remove the race conditions, I see them as the responsibility of the application to consider (many applications simply won't care, and those that do will likely want a specific solution, not a library-level compromise). > So thinking about this again: > > First, based on the +1's to Paul's new solution, I don't think people > are too concerned about the race condition issue (attributes being > different between the original readdir and the os.stat calls). I think > this is probably fair -- if folks care, they can handle it in an > application-specific way. So that means Paul's new solution and the > original PEP 471 approach are both okay on that score. +1. That was my main point, in actual fact > Second, comparing PEP 471 to Nick's solution: error handling is much > more straight-forward and simple to document with the original PEP 471 > approach (just try/catch around the function calls) than with Nick's > get_lstat=True approach of doing the stat() if needed inside the > iterator. To catch errors with that approach, you'd either have to do > a "while True" loop and try/catch around next(it) manually (which is > very yucky code), or we'd have to add an onerror callback, which is > somewhat less nice to use and harder to document (signature of the > callback, exception object, etc). Agreed. If my solution had worked, it would have been by isolating a few extra cases where you could guarantee errors won't happen. But actually, errors *can* happen in those cases, on certain systems. So PEP 471 wins on all counts here too. 
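
The two error-handling styles compared above might look roughly like
this. It is only a sketch under stated assumptions: failing_entries is
a made-up stand-in for an iterator that stats inside __next__, and
get_size is whatever per-entry call the caller chooses; neither is
part of any proposed scandir API.

```python
def failing_entries():
    """Hypothetical iterator whose second next() call raises OSError."""
    yield "a.txt"
    raise OSError("stat failed during iteration")


def collect_with_manual_next(iterable):
    """Drive the iterator by hand so errors raised by next() can be caught."""
    names, errors = [], []
    it = iter(iterable)
    while True:
        try:
            name = next(it)
        except StopIteration:
            break
        except OSError as exc:
            errors.append(exc)
            break  # a generator is exhausted once it has raised
        names.append(name)
    return names, errors


def collect_sizes(names, get_size):
    """Per-entry style: the call that can fail sits inside the loop body."""
    sizes, errors = {}, []
    for name in names:
        try:
            sizes[name] = get_size(name)
        except OSError as exc:
            errors.append((name, exc))
    return sizes, errors
```

In the second function an ordinary for loop is enough, and a failure
on one entry does not stop the others from being processed, which is
the practical difference being argued for.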
> So given all of the above, I'm fairly strongly in favour of the > approach in the original PEP 471 due to it's easy-to-use API and > straight-forward try/catch approach to error handling. (My second > option would be Nick's get_lstat=True with the onerror callback. My > third option would be Paul's attribute-only solution, as it's just > very hard to use.) Agreed. The solution I proposed isn't just "very hard to use", it's actually wrong. If is_XXX are optional attributes, that's not my solution, and I agree it's *awful*. Paul. PS I'd suggest adding a "Rejected proposals" section to the PEP which mentions the race condition issue and points to this discussion as an indication that people didn't seem to see it as a problem. From benhoyt at gmail.com Wed Jul 2 16:48:50 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 2 Jul 2014 10:48:50 -0400 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References:

Message-ID: Thanks for the clarifications and support. > Ah, the wording in the PEP says "Linux, Windows, OS X". Superficially, > that said "everywhere" to me. It might be worth calling out > specifically some examples where it's not available without an extra > system call, just to make the point explicit. Good call. I'll update the wording in the PEP here and try to call out specific examples of where is_dir() could call os.stat(). Hard-core POSIX people, do you know when readdir() d_type will be DT_UNKNOWN on (for example) Linux or OS X? I suspect this can happen on certain network filesystems, but I'm not sure. > PS I'd suggest adding a "Rejected proposals" section to the PEP which > mentions the race condition issue and points to this discussion as an > indication that people didn't seem to see it as a problem. Definitely agreed. I'll add this, and clarify various other issues in the PEP, and then repost. -Ben From cf.natali at gmail.com Wed Jul 2 19:20:41 2014 From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Wed, 2 Jul 2014 18:20:41 +0100 Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None) In-Reply-To: References:

Message-ID: > 2014-07-02 12:51 GMT+02:00 Charles-François Natali : >> I don't think we should support it: it's way too complicated to use, >> error-prone, and leads to messy APIs. > > Can you please elaborate? Which kind of issue do you see? Handling the > lifetime of the directory file descriptor? Yes, among other things. You can e.g. have a look at os.fwalk() or shutil._rmtree_safe_fd() to see that using those *properly* is far from being trivial. > You don't like the dir_fd parameter of os functions? Exactly, I think it complicates the API for little benefit (FWIW, no other language I know of exposes them). From Nikolaus at rath.org Wed Jul 2 23:59:01 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Wed, 02 Jul 2014 14:59:01 -0700 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: (Ben Hoyt's message of "Wed, 2 Jul 2014 10:48:50 -0400") References:

Message-ID: <877g3vxw0q.fsf@rath.org> Ben Hoyt writes: > Thanks for the clarifications and support. > >> Ah, the wording in the PEP says "Linux, Windows, OS X". Superficially, >> that said "everywhere" to me. It might be worth calling out >> specifically some examples where it's not available without an extra >> system call, just to make the point explicit. > > Good call. I'll update the wording in the PEP here and try to call out > specific examples of where is_dir() could call os.stat(). > > Hard-core POSIX people, do you know when readdir() d_type will be > DT_UNKNOWN on (for example) Linux or OS X? I suspect this can happen > on certain network filesystems, but I'm not sure. Any fuse file system mounted by some other user and without -o allow_other. For these entries, stat() will fail as well. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.« From bcannon at gmail.com Fri Jul 4 15:00:26 2014 From: bcannon at gmail.com (Brett Cannon) Date: Fri, 04 Jul 2014 13:00:26 +0000 Subject: [Python-Dev] [Python-checkins] Daily reference leaks (42917d774476): sum=9 References: Message-ID: Looks like there is an actual leak found by test_io. Any ideas on what may have introduced it? On Fri Jul 04 2014 at 5:01:02 AM, wrote: > results for 42917d774476 on branch "default" > -------------------------------------------- > > test_functools leaked [0, 0, 3] memory blocks, sum=3 > test_io leaked [2, 2, 2] references, sum=6 > > > Command line was: ['./python', '-m', 'test.regrtest', '-uall', '-R', > '3:3:/home/antoine/cpython/refleaks/reflogODkfML', '-x'] > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > https://mail.python.org/mailman/listinfo/python-checkins > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From status at bugs.python.org Fri Jul 4 18:07:58 2014 From: status at bugs.python.org (Python tracker) Date: Fri, 4 Jul 2014 18:07:58 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20140704160758.440EB56A6A@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2014-06-27 - 2014-07-04) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 4603 (-40) closed 29086 (+82) total 33689 (+42) Open issues with patches: 2150 Issues opened (34) ================== #8631: subprocess.Popen.communicate(...) hangs on Windows http://bugs.python.org/issue8631 reopened by terry.reedy #20155: Regression test test_httpservers fails, hangs on Windows http://bugs.python.org/issue20155 reopened by r.david.murray #21876: os.rename(src,dst) does nothing when src and dst files are har http://bugs.python.org/issue21876 opened by Aaron.Swan #21877: External.bat and pcbuild of tkinter do not match. 
http://bugs.python.org/issue21877 opened by terry.reedy #21878: wsgi.simple_server's wsgi.input read/readline waits forever in http://bugs.python.org/issue21878 opened by rschoon #21879: str.format() gives poor diagnostic on placeholder mismatch http://bugs.python.org/issue21879 opened by roysmith #21880: IDLE: Ability to run 3rd party code checkers http://bugs.python.org/issue21880 opened by sahutd #21881: python cannot parse tcl value http://bugs.python.org/issue21881 opened by schwab #21882: turtledemo modules imported by test___all__ cause side effects http://bugs.python.org/issue21882 opened by ned.deily #21883: relpath: Provide better errors when mixing bytes and strings http://bugs.python.org/issue21883 opened by Matt.Bachmann #21885: shutil.copytree hangs (on copying root directory of a lxc cont http://bugs.python.org/issue21885 opened by krichter #21886: asyncio: Future.set_result() called on cancelled Future raises http://bugs.python.org/issue21886 opened by haypo #21888: plistlib.FMT_BINARY behavior doesn't send required dict parame http://bugs.python.org/issue21888 opened by n8henrie #21889: https://docs.python.org/2/library/multiprocessing.html#process http://bugs.python.org/issue21889 opened by krichter #21890: wsgiref.simple_server sends headers on empty bytes http://bugs.python.org/issue21890 opened by rschoon #21895: signal.pause() doesn't wake up on SIGCHLD in non-main thread http://bugs.python.org/issue21895 opened by bkabrda #21896: Unexpected ConnectionResetError in urllib.request against a va http://bugs.python.org/issue21896 opened by Tymoteusz.Paul #21897: frame.f_locals causes segfault on Python >=3.4.1 http://bugs.python.org/issue21897 opened by msmhrt #21898: .hgignore: Missing ignores for Eclipse/pydev http://bugs.python.org/issue21898 opened by andymaier #21899: Futures are not marked as completed http://bugs.python.org/issue21899 opened by Sebastian.Kreft.Deezer #21901: test_selectors.PollSelectorTestCase.test_above_fd_setsize repo 
http://bugs.python.org/issue21901 opened by r.david.murray #21902: Docstring of math.acosh, asinh, and atanh http://bugs.python.org/issue21902 opened by kdavies4 #21903: ctypes documentation MessageBoxA example produces error http://bugs.python.org/issue21903 opened by Dan.O'Donovan #21905: RuntimeError in pickle.whichmodule when sys.modules if mutate http://bugs.python.org/issue21905 opened by Olivier.Grisel #21906: Tools\Scripts\md5sum.py doesn't work in Python 3.x http://bugs.python.org/issue21906 opened by torrin #21907: Update Windows build batch scripts http://bugs.python.org/issue21907 opened by zach.ware #21909: PyLong_FromString drops const http://bugs.python.org/issue21909 opened by h.venev #21910: File protocol should document if writelines must handle genera http://bugs.python.org/issue21910 opened by JanKanis #21911: "IndexError: tuple index out of range" should include the requ http://bugs.python.org/issue21911 opened by cool-RR #21913: Possible deadlock in threading.Condition.wait() in Python 2.7. 
http://bugs.python.org/issue21913 opened by sangeeth #21914: Create unit tests for Turtle guionly http://bugs.python.org/issue21914 opened by Lita.Cho #21915: telnetlib.Telnet constructor does not match telnetlib.Telnet._ http://bugs.python.org/issue21915 opened by yaneurabeya #21916: Create unit tests for turtle textonly http://bugs.python.org/issue21916 opened by ingrid #21917: Python 2.7.7 Tests fail, and math is faulty http://bugs.python.org/issue21917 opened by repcsike Most recent 15 issues with no replies (15) ========================================== #21916: Create unit tests for turtle textonly http://bugs.python.org/issue21916 #21909: PyLong_FromString drops const http://bugs.python.org/issue21909 #21899: Futures are not marked as completed http://bugs.python.org/issue21899 #21898: .hgignore: Missing ignores for Eclipse/pydev http://bugs.python.org/issue21898 #21889: https://docs.python.org/2/library/multiprocessing.html#process http://bugs.python.org/issue21889 #21885: shutil.copytree hangs (on copying root directory of a lxc cont http://bugs.python.org/issue21885 #21874: test_strptime fails on rhel/centos/fedora systems http://bugs.python.org/issue21874 #21865: Improve invalid category exception for warnings.filterwarnings http://bugs.python.org/issue21865 #21859: Add Python implementation of FileIO http://bugs.python.org/issue21859 #21854: Fix cookielib in unicodeless build http://bugs.python.org/issue21854 #21853: Fix inspect in unicodeless build http://bugs.python.org/issue21853 #21852: Fix optparse in unicodeless build http://bugs.python.org/issue21852 #21851: Fix gettext in unicodeless build http://bugs.python.org/issue21851 #21850: Fix httplib and SimpleHTTPServer in unicodeless build http://bugs.python.org/issue21850 #21847: Fix xmlrpc in unicodeless build http://bugs.python.org/issue21847 Most recent 15 issues waiting for review (15) ============================================= #21916: Create unit tests for turtle textonly 
http://bugs.python.org/issue21916 #21914: Create unit tests for Turtle guionly http://bugs.python.org/issue21914 #21907: Update Windows build batch scripts http://bugs.python.org/issue21907 #21906: Tools\Scripts\md5sum.py doesn't work in Python 3.x http://bugs.python.org/issue21906 #21905: RuntimeError in pickle.whichmodule when sys.modules if mutate http://bugs.python.org/issue21905 #21903: ctypes documentation MessageBoxA example produces error http://bugs.python.org/issue21903 #21902: Docstring of math.acosh, asinh, and atanh http://bugs.python.org/issue21902 #21898: .hgignore: Missing ignores for Eclipse/pydev http://bugs.python.org/issue21898 #21897: frame.f_locals causes segfault on Python >=3.4.1 http://bugs.python.org/issue21897 #21890: wsgiref.simple_server sends headers on empty bytes http://bugs.python.org/issue21890 #21883: relpath: Provide better errors when mixing bytes and strings http://bugs.python.org/issue21883 #21880: IDLE: Ability to run 3rd party code checkers http://bugs.python.org/issue21880 #21868: Tbuffer in turtle allows negative size http://bugs.python.org/issue21868 #21865: Improve invalid category exception for warnings.filterwarnings http://bugs.python.org/issue21865 #21862: cProfile command-line should accept "-m module_name" as an alt http://bugs.python.org/issue21862 Top 10 most discussed issues (10) ================================= #21902: Docstring of math.acosh, asinh, and atanh http://bugs.python.org/issue21902 13 msgs #21911: "IndexError: tuple index out of range" should include the requ http://bugs.python.org/issue21911 11 msgs #12067: Doc: remove errors about mixed-type comparisons. 
http://bugs.python.org/issue12067 8 msgs

#20155: Regression test test_httpservers fails, hangs on Windows
http://bugs.python.org/issue20155 8 msgs

#12750: datetime.strftime('%s') should respect tzinfo
http://bugs.python.org/issue12750 7 msgs

#21090: File read silently stops after EIO I/O error
http://bugs.python.org/issue21090 7 msgs

#12420: distutils tests fail if PATH is not defined
http://bugs.python.org/issue12420 6 msgs

#14050: Tutorial, list.sort() and items comparability
http://bugs.python.org/issue14050 6 msgs

#21882: turtledemo modules imported by test___all__ cause side effects
http://bugs.python.org/issue21882 6 msgs

#2571: can cmd.py's API/docs for the use of an alternate stdin be imp
http://bugs.python.org/issue2571 5 msgs



Issues closed (72)
==================

#2057: difflib: add patch capability
http://bugs.python.org/issue2057 closed by terry.reedy

#4899: doctest should support fixtures
http://bugs.python.org/issue4899 closed by terry.reedy

#5207: extend strftime/strptime format for RFC3339 and RFC2822
http://bugs.python.org/issue5207 closed by belopolsky

#5638: test_httpservers fails CGI tests if --enable-shared
http://bugs.python.org/issue5638 closed by ned.deily

#5862: multiprocessing 'using a remote manager' example errors and po
http://bugs.python.org/issue5862 closed by berker.peksag

#5930: Transient error in multiprocessing (test_number_of_objects)
http://bugs.python.org/issue5930 closed by haypo

#6692: asyncore kqueue support
http://bugs.python.org/issue6692 closed by haypo

#7506: multiprocessing.managers.BaseManager.__reduce__ references Bas
http://bugs.python.org/issue7506 closed by berker.peksag

#7885: test_distutils fails if Python built in separate directory
http://bugs.python.org/issue7885 closed by ned.deily

#9860: Building python outside of source directory fails
http://bugs.python.org/issue9860 closed by belopolsky

#10000: mark more tests as CPython specific
http://bugs.python.org/issue10000 closed by rhettinger

#10236: Sporadic failures of test_ssl
http://bugs.python.org/issue10236 closed by ned.deily

#10402: sporadic test_bsddb3 failures
http://bugs.python.org/issue10402 closed by jcea

#10445: _ast py3k : add lineno back to "args" node
http://bugs.python.org/issue10445 closed by Claudiu.Popa

#10941: imaplib: Internaldate2tuple produces wrong result if date is n
http://bugs.python.org/issue10941 closed by r.david.murray

#11273: asyncore creates selec (or poll) on every iteration
http://bugs.python.org/issue11273 closed by haypo

#11279: test_posix and lack of "id -G" support - less noise required?
http://bugs.python.org/issue11279 closed by python-dev

#11389: unittest: no way to control verbosity of doctests from cmd
http://bugs.python.org/issue11389 closed by terry.reedy

#11453: asyncore.file_wrapper should implement __del__ and call close
http://bugs.python.org/issue11453 closed by haypo

#11762: Ast doc: warning and version number
http://bugs.python.org/issue11762 closed by berker.peksag

#12401: unset PYTHON* environment variables when running tests
http://bugs.python.org/issue12401 closed by haypo

#12498: asyncore.dispatcher_with_send, disconnection problem + miss-co
http://bugs.python.org/issue12498 closed by haypo

#12814: Possible intermittent bug in test_array
http://bugs.python.org/issue12814 closed by ned.deily

#12842: Docs: first parameter of tp_richcompare() always has the corre
http://bugs.python.org/issue12842 closed by asvetlov

#12876: Make Test Error : ImportError: No module named _sha256
http://bugs.python.org/issue12876 closed by gregory.p.smith

#13103: copy of an asyncore dispatcher causes infinite recursion
http://bugs.python.org/issue13103 closed by haypo

#13413: time.daylight incorrect behavior in linux glibc
http://bugs.python.org/issue13413 closed by belopolsky

#13689: fix CGI Web Applications with Python link in howto/urllib2
http://bugs.python.org/issue13689 closed by berker.peksag

#13985: Menu.tk_popup : menu doesn't disapear when main window is ico
http://bugs.python.org/issue13985 closed by ned.deily

#14069: In extensions (?...) the lookbehind assertion cannot choose be
http://bugs.python.org/issue14069 closed by ezio.melotti

#14097: Improve the "introduction" page of the tutorial
http://bugs.python.org/issue14097 closed by zach.ware

#14235: test_cmd.py does not correctly call reload()
http://bugs.python.org/issue14235 closed by berker.peksag

#14709: http.client fails sending read()able Object
http://bugs.python.org/issue14709 closed by ned.deily

#15014: smtplib: add support for arbitrary auth methods
http://bugs.python.org/issue15014 closed by r.david.murray

#15549: openssl version in windows builds does not support renegotiati
http://bugs.python.org/issue15549 closed by ned.deily

#15750: test_localtime_daylight_false_dst_true raises OverflowError: m
http://bugs.python.org/issue15750 closed by haypo

#15870: PyType_FromSpec should take metaclass as an argument
http://bugs.python.org/issue15870 closed by belopolsky

#16188: Windows C Runtime Library Mismatch
http://bugs.python.org/issue16188 closed by rlinscheer

#16474: More code coverage for imp module
http://bugs.python.org/issue16474 closed by berker.peksag

#17399: test_multiprocessing hang on Windows, non-sockets
http://bugs.python.org/issue17399 closed by terry.reedy

#18258: Fix test discovery for test_codecmaps*.py
http://bugs.python.org/issue18258 closed by zach.ware

#18592: Idle: test SearchDialogBase.py
http://bugs.python.org/issue18592 closed by terry.reedy

#19024: Document asterisk (*), splat or star operator
http://bugs.python.org/issue19024 closed by terry.reedy

#19870: Backport Cookie fix to 2.7 (httponly / secure flag)
http://bugs.python.org/issue19870 closed by berker.peksag

#20218: Add methods to `pathlib.Path`: `write_text`, `read_text`, `wri
http://bugs.python.org/issue20218 closed by cool-RR

#20961: Fix usages of the note directive in the documentation
http://bugs.python.org/issue20961 closed by berker.peksag

#21046: Document formulas used in statistics
http://bugs.python.org/issue21046 closed by ezio.melotti

#21151: winreg.SetValueEx causes crash if value = None
http://bugs.python.org/issue21151 closed by python-dev

#21447: Intermittent asyncio.open_connection / futures.InvalidStateErr
http://bugs.python.org/issue21447 closed by haypo

#21582: use support.captured_stdx context managers - test_asyncore
http://bugs.python.org/issue21582 closed by python-dev

#21652: Python 2.7.7 regression in mimetypes module on Windows
http://bugs.python.org/issue21652 closed by python-dev

#21679: Prevent extraneous fstat during open()
http://bugs.python.org/issue21679 closed by pitrou

#21755: test_importlib.test_locks fails --without-threads
http://bugs.python.org/issue21755 closed by berker.peksag

#21778: PyBuffer_FillInfo() from 3.3
http://bugs.python.org/issue21778 closed by skrah

#21780: make unicodedata module 64-bit safe
http://bugs.python.org/issue21780 closed by python-dev

#21781: make _ssl module 64-bit clean
http://bugs.python.org/issue21781 closed by haypo

#21811: Anticipate fixes to 3.x and 2.7 for OS X 10.10 Yosemite suppor
http://bugs.python.org/issue21811 closed by ned.deily

#21856: memoryview: test slice clamping
http://bugs.python.org/issue21856 closed by terry.reedy

#21857: assert that functions clearing the current exception are not c
http://bugs.python.org/issue21857 closed by haypo

#21863: Display module names of C functions in cProfile
http://bugs.python.org/issue21863 closed by pitrou

#21871: Python 2.7.7 regression in mimetypes read_windows_registry
http://bugs.python.org/issue21871 closed by python-dev

#21884: turtle regression of issue #21823: "uncaught exception" on "AM
http://bugs.python.org/issue21884 closed by ned.deily

#21887: Python3 can't detect Tcl/Tk 8.6.1
http://bugs.python.org/issue21887 closed by ned.deily

#21891: sysmodule.c, #define terminated with semicolon.
http://bugs.python.org/issue21891 closed by ned.deily

#21892: hashtable.c not using PY_FORMAT_SIZE_T
http://bugs.python.org/issue21892 closed by python-dev

#21893: unicodeobject.c not using PY_FORMAT_SIZE_T
http://bugs.python.org/issue21893 closed by haypo

#21894: ImportError: cannot import name jit
http://bugs.python.org/issue21894 closed by ned.deily

#21900: .hgignore: Missing ignores for downloaded doc build tools
http://bugs.python.org/issue21900 closed by r.david.murray

#21904: Multiple closures accessing the same non-local variable always
http://bugs.python.org/issue21904 closed by r.david.murray

#21908: Grammatical error in 3.4 tutorial
http://bugs.python.org/issue21908 closed by r.david.murray

#21912: Deferred logging may use outdated references
http://bugs.python.org/issue21912 closed by vinay.sajip

#777588: asyncore/Windows: select() doesn't report errors for a non-blo
http://bugs.python.org/issue777588 closed by haypo

From geertj at gmail.com Sat Jul 5 20:04:04 2014
From: geertj at gmail.com (Geert Jansen)
Date: Sat, 5 Jul 2014 20:04:04 +0200
Subject: [Python-Dev] Memory BIO for _ssl
Message-ID:

Hi,

the topic of a memory BIO for the _ssl module in the stdlib was
discussed before here:

http://mail.python.org/pipermail/python-ideas/2012-November/017686.html

Since I need this for my Gruvi async framework, I want to volunteer to
write a patch. It should be useful as well to Py3K's asyncio and other
async frameworks. It would be good to get some feedback before I start
on this.

I was thinking of the following approach:

 * Add a new type to _ssl: PySSLMemoryBIO
 * PySSLMemoryBIO has a public constructor, and at least the following
methods: puts(), puts_eof() and gets(). I aligned the terminology with
the method names in OpenSSL. puts_eof() does a
BIO_set_mem_eof_return(-1).
 * All accesses to the memory BIO are non-blocking.
 * Update PySSLSocket to add support for SSL_set_bio(). The fact that
the memory BIO is non-blocking makes it easier.
None of the logic in
and around check_socket_and_wait_for_timeout() for example needs to be
changed. For the parts that deal with the socket directly, and that
are in the code path for non-blocking IO, I think the preference would
be i) try to change the code to use BIO methods that work for both
sockets and memory BIOs, and ii) if not possible, special case it.
 * At this point the PySSLSocket name is a bit of a misnomer as it
does more than sockets. Probably not an issue.
 * Add a method _wrap_bio(rbio, wbio, ...) to _SSLContext.
 * Expose the low-level methods via the "ssl" module.

Creating an SSLSocket with a memory BIO would work something like this:

  context = SSLContext()
  rbio = ssl.MemoryBIO()
  wbio = ssl.MemoryBIO()
  sslsock = ssl.wrap_bio(rbio, wbio)

To pass SSL data from the network and decrypt it into application
level data (and potentially new SSL level data):

  rbio.puts(ssldata)
  appdata = sslsock.read()
  ssldata = wbio.gets()

I currently have a utility class in my async IO framework (gruvi.io)
called SslPipe that does the above, but it uses a socketpair instead
of a memory BIO, and hence it works with the current _ssl. See here:

https://github.com/geertj/gruvi/blob/master/gruvi/ssl.py#L86

This approach, while fine and very fast on Linux, gives me problems on
Windows. It appears that on some older Windows versions, when I write
data to one side of an (emulated) socket pair, it takes some time for
it to become available at the other side. That breaks the synchronous
interface that I need in order for this to work. And I can't fully
work around it as I do not know in all situations whether or not to
expect data on the socketpair. A memory BIO should be the right
solution to this.

Any feedback?
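
For comparison, a minimal sketch of the same data flow using the
interface that later shipped in Python 3.5: ssl.MemoryBIO (with
read(), write() and write_eof() rather than gets()/puts()) and
SSLContext.wrap_bio(incoming, outgoing, ...). It is not the proposal
above, only an illustration of it under those later names. The
hostname "example.org" is a placeholder and no network connection is
made:

```python
import ssl

# Client-side context with default settings; nothing is sent over a network here.
context = ssl.create_default_context()

incoming = ssl.MemoryBIO()   # TLS bytes received from the peer are written here
outgoing = ssl.MemoryBIO()   # TLS bytes destined for the peer are read from here

# "example.org" is only a placeholder used for SNI and hostname checking.
sslobj = context.wrap_bio(incoming, outgoing, server_hostname="example.org")

try:
    sslobj.do_handshake()
except ssl.SSLWantReadError:
    # Expected at this point: the handshake cannot complete until the
    # peer's reply has been written into `incoming`.
    pass

# The first handshake flight is now waiting in the outgoing BIO; the caller
# hands these bytes to whatever transport it uses.
client_hello = outgoing.read()

# Bytes arriving from the transport would be fed back in like this:
#     incoming.write(data_from_peer)
#     appdata = sslobj.read()
```

Because both BIOs live purely in memory, every call returns
immediately, which is the synchronous behaviour the socketpair
workaround could not guarantee on Windows.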

Regards,
Geert

From breamoreboy at yahoo.co.uk Sun Jul 6 02:19:02 2014
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Sun, 06 Jul 2014 01:19:02 +0100
Subject: [Python-Dev] Pending issues
Message-ID:

The following is a list of the 18 pending issues on the bug tracker.
All have been in this state for at least one month so I'm assuming that
they can be closed or they wouldn't have been set to pending in the
first place. Can somebody take a look at them with a view to closing
them or setting them back to open if needed?

16221 tokenize.untokenize() "compat" mode misses the encoding when using an iterator
15600 expose the finder details used by the FileFinder path hook
12588 test_capi.test_subinterps() failed on OpenBSD (powerpc)
7979 connect_ex returns 103 often
17668 re.split loses characters matching ungrouped parts of a pattern
11204 re module: strange behaviour of space inside {m, n}
14518 Add bcrypt $2a$ to crypt.py
15883 Add Py_errno to work around multiple CRT issue
19919 SSL: test_connect_ex_error fails with EWOULDBLOCK
20026 sqlite: handle correctly invalid isolation_level
18228 AIX locale parsing failure
1602742 itemconfigure returns incorrect text property of text items
19954 test_tk floating point exception on my gentoo box with tk 8.6.1
21084 IDLE can't deal with characters above the range (U+0000-U+FFFF)
20997 Wrong URL fragment identifier in search result
6895 locale._parse_localename fails when localename does not contain encoding information
1669539 Improve Windows os.path.join (ntpath.join) "smart" joining
21231 Issue a python 3 warning when old style classes are defined.

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

From antoine at python.org Mon Jul 7 01:49:23 2014
From: antoine at python.org (Antoine Pitrou)
Date: Sun, 06 Jul 2014 19:49:23 -0400
Subject: [Python-Dev] Memory BIO for _ssl
In-Reply-To:
References:
Message-ID:

Hi,

On 05/07/2014 14:04, Geert Jansen wrote:
> Since I need this for my Gruvi async framework, I want to volunteer to
> write a patch. It should be useful as well to Py3K's asyncio and other
> async frameworks. It would be good to get some feedback before I start
> on this.

Thanks for volunteering! This would be a very welcome addition.

Thoughts:

> I was thinking of the following approach:
>
> * Add a new type to _ssl: PySSLMemoryBIO
> * PySSLMemoryBIO has a public constructor, and at least the following
> methods: puts(), puts_eof() and gets(). I aligned the terminology with
> the method names in OpenSSL. puts_eof() does a
> BIO_set_mem_eof_return(-1).

Hmm... I haven't looked in detail, but at least I'd like those to be
called read() and write() (and write_eof()), like most other I/O
methods in Python. Or if we want to avoid confusion, add an explicit
suffix (write_incoming?).

> * All accesses to the memory BIO are non-blocking.

Sounds sensible indeed (otherwise what would they wait for?).

> * Update PySSLSocket to add support for SSL_set_bio(). The fact that
> the memory BIO is non-blocking makes it easier. None of the logic in
> and around check_socket_and_wait_for_timeout() for example needs to be
> changed. For the parts that deal with the socket directly, and that
> are in the code path for non-blocking IO, I think the preference would
> be i) try to change the code to use BIO methods that work for both
> sockets and memory BIOs, and ii) if not possible, special case it.

That sounds good in principle. I don't know enough about memory BIOs
to know whether you will have issues doing so :-)

> * At this point the PySSLSocket name is a bit of a misnomer as it
> does more than sockets. Probably not an issue.

Agreed.

> * Add a method _wrap_bio(rbio, wbio, ...) to _SSLContext.
> * Expose the low-level methods via the "ssl" module.
>
> Creating an SSLSocket with a memory BIO would work something like this:
>
> context = SSLContext()
> rbio = ssl.MemoryBIO()
> wbio = ssl.MemoryBIO()
> sslsock = ssl.wrap_bio(rbio, wbio)

The one thing I find confusing is the r(ead)bio / w(rite)bio
terminology (because you actually read and write from both). Perhaps
incoming and outgoing would be clearer.

Regards

Antoine.

From nad at acm.org Mon Jul 7 01:54:50 2014
From: nad at acm.org (Ned Deily)
Date: Sun, 06 Jul 2014 16:54:50 -0700
Subject: [Python-Dev] buildbot.python.org down again?
Message-ID:

As of the moment, buildbot.python.org seems to be down again. Where is
the best place to report problems like this?

--
Ned Deily, nad at acm.org

From tjreedy at udel.edu Mon Jul 7 08:33:04 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 07 Jul 2014 02:33:04 -0400
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To:
References:
Message-ID:

On 7/6/2014 7:54 PM, Ned Deily wrote:
> As of the moment, buildbot.python.org seems to be down again.

Several hours later, back up.

> Where is the best place to report problems like this?

We should have, if we do not already, an automatic system to detect
down servers and report (email) to appropriate persons.

--
Terry Jan Reedy

From martin at v.loewis.de Mon Jul 7 08:39:07 2014
From: martin at v.loewis.de (Martin v. Löwis)
Date: Mon, 07 Jul 2014 08:39:07 +0200
Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
In-Reply-To:
References:
Message-ID: <53BA408B.5050901@v.loewis.de>

On 01.07.14 09:44, Victor Stinner wrote:
> scandir(fd) must not close the file descriptor, it should be done by
> the caller. Handling the lifetime of the file descriptor is a
> difficult problem, it's better to let the user decide how to handle
> it.

This is an open issue still: when is the file descriptor closed.
I think the generator returned from scandir needs to support a .close
method that guarantees to close the file descriptor. AFAICT, the
pure-Python prototype of scandir already does, but it should be
specified in the PEP.

While we are at it: is it intended that the generator will also
support the other generator methods, in particular .send and .throw?

Regards,
Martin

From andreas.r.maier at gmx.de Mon Jul 7 13:22:27 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Mon, 07 Jul 2014 13:22:27 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
Message-ID: <53BA82F3.1070403@gmx.de>

While discussing Python issue #12067
(http://bugs.python.org/issue12067#msg222442), I learned that Python 3.4
implements '==' and '!=' on the object type such that if no special
equality test operations are implemented in derived classes, there is a
default implementation that tests for identity (as opposed to equality
of the values).

The relevant code is in function do_richcompare() in Objects/object.c.

IMHO, that default implementation contradicts the definition that '=='
and '!=' test for equality of the values of an object.

Python 2.x does not seem to have such a default implementation; == and
!= raise an exception if attempted on objects that don't implement
equality in derived classes.

I'd like to gather comments on this issue, specifically:

-> Can someone please elaborate what the reason for that is?

-> Where is the discrepancy between the documentation of == and its
default implementation on object documented?

To me, a sensible default implementation for == on object would be (in
Python):

  if v is w:
    return True;
  elif type(v) != type(w):
    return False
  else:
    raise ValueError("Equality cannot be determined in default implementation")

Andy

From benjamin at python.org Mon Jul 7 17:15:47 2014
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 07 Jul 2014 08:15:47 -0700
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BA82F3.1070403@gmx.de>
References: <53BA82F3.1070403@gmx.de>
Message-ID: <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com>

On Mon, Jul 7, 2014, at 04:22, Andreas Maier wrote:
> While discussing Python issue #12067
> (http://bugs.python.org/issue12067#msg222442), I learned that Python 3.4
> implements '==' and '!=' on the object type such that if no special
> equality test operations are implemented in derived classes, there is a
> default implementation that tests for identity (as opposed to equality
> of the values).
>
> The relevant code is in function do_richcompare() in Objects/object.c.
>
> IMHO, that default implementation contradicts the definition that '=='
> and '!=' test for equality of the values of an object.
>
> Python 2.x does not seem to have such a default implementation; == and
> != raise an exception if attempted on objects that don't implement
> equality in derived classes.

Why do you think that?

% python
Python 2.7.6 (default, May 29 2014, 22:22:15)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class x(object): pass
...
>>> class y(object): pass
...
>>> x != y
True
>>> x == y
False

From rosuav at gmail.com Mon Jul 7 17:22:54 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 8 Jul 2014 01:22:54 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com>
References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com>
Message-ID:

On Tue, Jul 8, 2014 at 1:15 AM, Benjamin Peterson wrote:
> Why do you think that?
>
> % python
> Python 2.7.6 (default, May 29 2014, 22:22:15)
> [GCC 4.7.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> class x(object): pass
> ...
>>>> class y(object): pass
> ...
>>>> x != y
> True
>>>> x == y
> False

Your analysis is flawed - you're testing the equality of the types,
not of instances. But your conclusion's correct; testing instances
does work the same way you're implying:

rosuav at sikorsky:~$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class x(object): pass
...
>>> class y(object): pass
...
>>> x() != y()
True
>>> x() == y()
False
>>> x() == x()
False
>>> z = x()
>>> z == z
True

ChrisA

From guido at python.org Mon Jul 7 17:44:28 2014
From: guido at python.org (Guido van Rossum)
Date: Mon, 7 Jul 2014 08:44:28 -0700
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To:
References:
Message-ID:

It would still be nice to know who "the appropriate persons" are. Too
much of our infrastructure seems to be maintained by house elves or the
ITA.

On Sun, Jul 6, 2014 at 11:33 PM, Terry Reedy wrote:
> On 7/6/2014 7:54 PM, Ned Deily wrote:
>
>> As of the moment, buildbot.python.org seems to be down again.
>>
>
> Several hours later, back up.
>
>
> > Where is the best place to report problems like this?
>
> We should have, if not already, an automatic system to detect down servers
> and report (email) to appropriate persons.
>
> --
> Terry Jan Reedy

--
--Guido van Rossum (python.org/~guido)

From benjamin at python.org Mon Jul 7 17:55:50 2014
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 07 Jul 2014 08:55:50 -0700
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To:
References:
Message-ID: <1404748550.13353.138929529.030DAD36@webmail.messagingengine.com>

On Mon, Jul 7, 2014, at 08:44, Guido van Rossum wrote:
> It would still be nice to know who "the appropriate persons" are. Too much
> of our infrastructure seems to be maintained by house elves or the ITA.

:) Is ITA "International Trombone Association"?

From andreas.r.maier at gmx.de Mon Jul 7 17:29:54 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Mon, 07 Jul 2014 17:29:54 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com>
References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com>
Message-ID: <53BABCF2.50607@gmx.de>

On 07.07.2014 17:15, Benjamin Peterson wrote:
> On Mon, Jul 7, 2014, at 04:22, Andreas Maier wrote:
>>
>> Python 2.x does not seem to have such a default implementation; == and
>> != raise an exception if attempted on objects that don't implement
>> equality in derived classes.
>
> Why do you think that?

Because I looked at the source code of try_rich_compare() in object.c of
the 2.7 stream in the repository.
Now, looking deeper into that module, it
turns out there is a whole number of variations of comparison functions,
so maybe I looked at the wrong one.

Instead of trying to figure out how they are called, it is probably
easier to just try it out, as you did. Your example certainly shows that
== between instances of type object returns a value.

So the Python 2.7 implementation shows the same discrepancy as Python
3.x regarding the == and != default implementation.

Does anyone know why?

Andy

From python-dev at masklinn.net Mon Jul 7 17:58:39 2014
From: python-dev at masklinn.net (Xavier Morel)
Date: Mon, 7 Jul 2014 17:58:39 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BA82F3.1070403@gmx.de>
References: <53BA82F3.1070403@gmx.de>
Message-ID: <791FBFD1-C906-4E11-9144-3062B78702E8@masklinn.net>

On 2014-07-07, at 13:22 , Andreas Maier wrote:

> While discussing Python issue #12067
> (http://bugs.python.org/issue12067#msg222442), I learned that Python 3.4
> implements '==' and '!=' on the object type such that if no special
> equality test operations are implemented in derived classes, there is a
> default implementation that tests for identity (as opposed to equality
> of the values).
>
> The relevant code is in function do_richcompare() in Objects/object.c.
>
> IMHO, that default implementation contradicts the definition that '=='
> and '!=' test for equality of the values of an object.
>
> Python 2.x does not seem to have such a default implementation; == and
> != raise an exception if attempted on objects that don't implement
> equality in derived classes.

That's incorrect on two levels:

1. What Terry notes in the bug comments is that because all Python 3
   types inherit from object this can be done as a default __eq__/__ne__,
   in Python 2 the fallback is encoded in the comparison framework
   (PyObject_Compare and friends):
   http://hg.python.org/cpython/file/01ec8bb7187f/Objects/object.c#l756
2. Unless comparison methods are overloaded and throw an error it will
   always return either True or False (for comparison operator), never throw.

> I'd like to gather comments on this issue, specifically:
>
> -> Can someone please elaborate what the reason for that is?
>
> -> Where is the discrepancy between the documentation of == and its
> default implementation on object documented?
>
> To me, a sensible default implementation for == on object would be (in
> Python):
>
> if v is w:
> return True;
> elif type(v) != type(w):
> return False
> else:
> raise ValueError("Equality cannot be determined in default implementation")

Why would comparing two objects of different types return False but
comparing two objects of the same type raise an error?

From andreas.r.maier at gmx.de Mon Jul 7 18:11:07 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Mon, 07 Jul 2014 18:11:07 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <791FBFD1-C906-4E11-9144-3062B78702E8@masklinn.net>
References: <53BA82F3.1070403@gmx.de> <791FBFD1-C906-4E11-9144-3062B78702E8@masklinn.net>
Message-ID: <53BAC69B.70901@gmx.de>

On 07.07.2014 17:58, Xavier Morel wrote:
>
> On 2014-07-07, at 13:22 , Andreas Maier wrote:
>
>> While discussing Python issue #12067
>> (http://bugs.python.org/issue12067#msg222442), I learned that Python 3.4
>> implements '==' and '!=' on the object type such that if no special
>> equality test operations are implemented in derived classes, there is a
>> default implementation that tests for identity (as opposed to equality
>> of the values).
>>
>> The relevant code is in function do_richcompare() in Objects/object.c.
>>
>> IMHO, that default implementation contradicts the definition that '=='
>> and '!=' test for equality of the values of an object.
>>
>> Python 2.x does not seem to have such a default implementation; == and
>> != raise an exception if attempted on objects that don't implement
>> equality in derived classes.
>
> That's incorrect on two levels:
>
> 1. What Terry notes in the bug comments is that because all Python 3
> types inherit from object this can be done as a default __eq__/__ne__,
> in Python 2 the fallback is encoded in the comparison framework
> (PyObject_Compare and friends):
> http://hg.python.org/cpython/file/01ec8bb7187f/Objects/object.c#l756
> 2. Unless comparison methods are overloaded and throw an error it will
> always return either True or False (for comparison operator), never throw.

I was incorrect for Python 2.x.

>> I'd like to gather comments on this issue, specifically:
>>
>> -> Can someone please elaborate what the reason for that is?
>>
>> -> Where is the discrepancy between the documentation of == and its
>> default implementation on object documented?
>>
>> To me, a sensible default implementation for == on object would be (in
>> Python):
>>
>> if v is w:
>> return True;
>> elif type(v) != type(w):
>> return False
>> else:
>> raise ValueError("Equality cannot be determined in default implementation")
>
> Why would comparing two objects of different types return False

Because I think (but I'm not sure) that the type should play a role
for comparison of values. But maybe that does not embrace duck typing
sufficiently, and the type should be ignored by default for comparing
object values.

> but comparing two objects of the same type raise an error?

That I'm sure of: Because the default implementation (after having
exhausted all possibilities of calling __eq__ and friends) has no way to
find out whether the values(!!) of the objects are equal.

Andy

From ethan at stoneleaf.us Mon Jul 7 17:55:08 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 07 Jul 2014 08:55:08 -0700
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BA82F3.1070403@gmx.de>
References: <53BA82F3.1070403@gmx.de>
Message-ID: <53BAC2DC.9030600@stoneleaf.us>

On 07/07/2014 04:22 AM, Andreas Maier wrote:
>
> Where is the discrepancy between the documentation of == and its
> default implementation on object documented?

There seems to be no discrepancy (at least, you have not shown it), but
to answer the question about why the default equals operation is an
identity test:

- all objects should be equal to themselves (there is only one that
  isn't, and it's weird)

- equality tests should not, as a general rule, raise exceptions --
  they should return True or False

--
~Ethan~

From andreas.r.maier at gmx.de Mon Jul 7 18:56:10 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Mon, 07 Jul 2014 18:56:10 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BAC2DC.9030600@stoneleaf.us>
References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us>
Message-ID: <53BAD12A.20209@gmx.de>

On 07.07.2014 17:55, Ethan Furman wrote:
> On 07/07/2014 04:22 AM, Andreas Maier wrote:
>>
>> Where is the discrepancy between the documentation of == and its
>> default implementation on object documented?
>
> There seems to be no discrepancy (at least, you have not shown it),

The documentation states consistently that == tests the equality of the
value of an object. The default implementation of == in both 2.x and 3.x
tests the object identity. Is that not a discrepancy?

> but to answer the question about why the default equals operation is an
> identity test:
>
> - all objects should be equal to themselves (there is only one that
> isn't, and it's weird)

I agree. But that is not a reason to conclude that different objects (as
per their identity) should be unequal.
Which is what the default
implementation does.

> - equality tests should not, as a general rule, raise exceptions --
> they should return True or False

Why not? Ordering tests also raise exceptions if ordering is not
implemented.

Andy

From guido at python.org Mon Jul 7 19:22:21 2014
From: guido at python.org (Guido van Rossum)
Date: Mon, 7 Jul 2014 10:22:21 -0700
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To: <1404748550.13353.138929529.030DAD36@webmail.messagingengine.com>
References: <1404748550.13353.138929529.030DAD36@webmail.messagingengine.com>
Message-ID:

It's a reference to Neal Stephenson's Anathem.

On Jul 7, 2014 8:55 AM, "Benjamin Peterson" wrote:
> On Mon, Jul 7, 2014, at 08:44, Guido van Rossum wrote:
> > It would still be nice to know who "the appropriate persons" are. Too much
> > of our infrastructure seems to be maintained by house elves or the ITA.
>
> :) Is ITA "International Trombone Association"?

From antoine at python.org Mon Jul 7 19:47:31 2014
From: antoine at python.org (Antoine Pitrou)
Date: Mon, 07 Jul 2014 13:47:31 -0400
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To:
References: <1404748550.13353.138929529.030DAD36@webmail.messagingengine.com>
Message-ID:

On 07/07/2014 13:22, Guido van Rossum wrote:
> It's a reference to Neal Stephenson's Anathem.

According to Google, it doesn't look like he played the trombone, though.

Regards

Antoine.

>
> On Jul 7, 2014 8:55 AM, "Benjamin Peterson" wrote:
>
> On Mon, Jul 7, 2014, at 08:44, Guido van Rossum wrote:
> > It would still be nice to know who "the appropriate persons" are. Too much
> > of our infrastructure seems to be maintained by house elves or the ITA.
>
> :) Is ITA "International Trombone Association"?
> > > From ethan at stoneleaf.us Mon Jul 7 19:43:34 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 10:43:34 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BAD12A.20209@gmx.de> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> Message-ID: <53BADC46.40400@stoneleaf.us> On 07/07/2014 09:56 AM, Andreas Maier wrote: > Am 07.07.2014 17:55, schrieb Ethan Furman: >> On 07/07/2014 04:22 AM, Andreas Maier wrote: >>> >>> Where is the discrepancy between the documentation of == and its >>> default implementation on object documented? >> >> There's seems to be no discrepancy (at least, you have not shown it), > > The documentation states consistently that == tests the equality of the value of an object. The default implementation > of == in both 2.x and 3.x tests the object identity. Is that not a discrepancy? One could say that the value of an object is the object itself. Since different objects are different, then they are not equal. >> but to answer the question about why the default equals operation is an >> identity test: >> >> - all objects should be equal to themselves (there is only one that >> isn't, and it's weird) > > I agree. But that is not a reason to conclude that different objects (as per their identity) should be unequal. Which is > what the default implementation does. Python cannot know which values are important in an equality test, and which are not. So it refuses to guess. Think of a chess board, for example. Are any two black pawns equal? All 16 pawns came from the same Pawn class, the only differences would be in the color and position, but the movement type is the same for all. So equality for a pawn might mean the same color, or it might mean color and position, or it might mean can move to the same position... it's up to the programmer to decide which of the possibilities is the correct one. 
Quite frankly, have equality mean identity in this case also makes a lot of sense. >> - equality tests should not, as a general rule, raise exceptions -- >> they should return True or False > > Why not? Ordering tests also raise exceptions if ordering is not implemented. Besides the pawn example, this is probably a matter of practicality over purity -- equality tests are used extensively through-out Python, and having exceptions raised at possibly any moment would not be a fun nor productive environment. Ordering is much less frequent, and since we already tried always ordering things, falling back to type name if necessary, we have discovered that that is not a good trade-off. So now if one tries to order things without specifying how it should be done, one gets an exception. -- ~Ethan~ From tjreedy at udel.edu Mon Jul 7 20:20:42 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 07 Jul 2014 14:20:42 -0400 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BA82F3.1070403@gmx.de> References: <53BA82F3.1070403@gmx.de> Message-ID: On 7/7/2014 7:22 AM, Andreas Maier wrote: > While discussing Python issue #12067 > (http://bugs.python.org/issue12067#msg222442), I learned that Python 3.4 > implements '==' and '!=' on the object type such that if no special > equality test operations are implemented in derived classes, there is a > default implementation that tests for identity (as opposed to equality > of the values). > > The relevant code is in function do_richcompare() in Objects/object.c. > > IMHO, that default implementation contradicts the definition that '==' > and '!=' test for equality of the values of an object. A discrepancy between code and doc can be solved by changing either the code or doc. This is a case where the code should not change (for back compatibility with long standing behavior, if nothing else) and the doc should. 
-- Terry Jan Reedy From francismb at email.de Mon Jul 7 21:01:59 2014 From: francismb at email.de (francis) Date: Mon, 07 Jul 2014 21:01:59 +0200 Subject: [Python-Dev] Tracker Stats In-Reply-To: <20140623201225.0DA80250DE6@webabinitio.net> References: <53A84D41.6070508@email.de> <20140623201225.0DA80250DE6@webabinitio.net> Message-ID: <53BAEEA7.8050408@email.de> On 06/23/2014 10:12 PM, R. David Murray wrote: > The stats graphs are based on the data generated for the > weekly issue report. I have a patched version of that > report that adds the bug/enhancement info. I'll try to dig > it up this week; someone ping me if I forget :) It think > the patch will need to be updated based on Ezio's changes. > ping From ethan at stoneleaf.us Mon Jul 7 21:26:12 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 12:26:12 -0700 Subject: [Python-Dev] Tracker Stats In-Reply-To: <53BAEEA7.8050408@email.de> References: <53A84D41.6070508@email.de> <20140623201225.0DA80250DE6@webabinitio.net> <53BAEEA7.8050408@email.de> Message-ID: <53BAF454.6060304@stoneleaf.us> On 07/07/2014 12:01 PM, francis wrote: > On 06/23/2014 10:12 PM, R. David Murray wrote: > >> The stats graphs are based on the data generated for the >> weekly issue report. I have a patched version of that >> report that adds the bug/enhancement info. I'll try to dig >> it up this week; someone ping me if I forget :) It think >> the patch will need to be updated based on Ezio's changes. 
>> > ping pong From ethan at stoneleaf.us Mon Jul 7 18:09:28 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 09:09:28 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BABCF2.50607@gmx.de> References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com> <53BABCF2.50607@gmx.de> Message-ID: <53BAC638.7030704@stoneleaf.us> On 07/07/2014 08:29 AM, Andreas Maier wrote: > > So the Python 2.7 implementation shows the same discrepancy as Python 3.x regarding the == and != default implementation. Why do you see this as a discrepancy? Just because two instances from the same object have the same value does not mean they are equal. For a real-life example, look at twins: biologically identical, yet not equal. looking-forward-to-the-rebuttal-mega-thread'ly yrs, -- ~Ethan~ From zuo at chopin.edu.pl Mon Jul 7 23:11:03 2014 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Mon, 07 Jul 2014 23:11:03 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BAC69B.70901@gmx.de> References: <53BA82F3.1070403@gmx.de> <791FBFD1-C906-4E11-9144-3062B78702E8@masklinn.net> <53BAC69B.70901@gmx.de> Message-ID: <8564322772978800ae89623d1426b469@chopin.edu.pl> 07.07.2014 18:11, Andreas Maier wrote: > On 07.07.2014 17:58, Xavier Morel wrote: >> >> On 2014-07-07, at 13:22 , Andreas Maier >> wrote: >> >>> While discussing Python issue #12067 >>> (http://bugs.python.org/issue12067#msg222442), I learned that Python >>> 3.4 implements '==' and '!=' on the object type such that if no >>> special equality test operations are implemented in derived classes, >>> there is a default implementation that tests for identity (as opposed >>> to equality of the values). [...] >>> IMHO, that default implementation contradicts the definition that >>> '==' and '!=' test for equality of the values of an object. [...] 
>>> To me, a sensible default implementation for == on object would be >>> (in Python): >>> >>> if v is w: >>> return True; >>> elif type(v) != type(w): >>> return False >>> else: >>> raise ValueError("Equality cannot be determined in default >>> implementation") >> >> Why would comparing two objects of different types return False > > Because I think (but I'm not sure) that the type should play a role > for comparison of values. But maybe that does not embrace duck typing > sufficiently, and the type should be ignored by default for comparing > object values. > >> but comparing two objects of the same type raise an error? > > That I'm sure of: Because the default implementation (after having > exhausted all possibilities of calling __eq__ and friends) has no way > to find out whether the values(!!) of the objects are equal. IMHO, in Python context, "value" is a very vague term. Quite often we can read it as the very basic (but not the only one) notion of "what makes objects being equal or not" -- and then saying that "objects are compared by value" is a tautology. In other words, what object's "value" is -- is dependent on its nature: e.g. the value of a list is what are the values of its consecutive (indexed) items; the value of a set is based on values of all its elements without notion of order or repetition; the value of a number is a set of its abstract mathematical properties that determine what makes objects being equal, greater, lesser, how particular arithmetic operations work etc... I think, there is no universal notion of "the value of a Python object". The notion of identity seems to be most generic (every object has it, event if it does not have any other property) -- and that's why by default it is used to define the most basic feature of object's *value*, i.e. "what makes objects being equal or not" (== and !=). Another possibility would be to raise TypeError but, as Ethan Furman wrote, it would be impractical (e.g. 
key-type-heterogenic dicts or sets would be practically impossible to work with). On the other hand, the notion of sorting order (< > <= >=) is a much more specialized object property. Cheers. *j From rob.cliffe at btinternet.com Mon Jul 7 23:31:55 2014 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Mon, 07 Jul 2014 22:31:55 +0100 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <8564322772978800ae89623d1426b469@chopin.edu.pl> References: <53BA82F3.1070403@gmx.de> <791FBFD1-C906-4E11-9144-3062B78702E8@masklinn.net> <53BAC69B.70901@gmx.de> <8564322772978800ae89623d1426b469@chopin.edu.pl> Message-ID: <53BB11CB.8020802@btinternet.com> On 07/07/2014 22:11, Jan Kaliszewski wrote: > [snip] > > IMHO, in Python context, "value" is a very vague term. Quite often we > can read it as the very basic (but not the only one) notion of "what > makes objects being equal or not" -- and then saying that "objects are > compared by value" is a tautology. > > In other words, what object's "value" is -- is dependent on its > nature: e.g. the value of a list is what are the values of its > consecutive (indexed) items; the value of a set is based on values of > all its elements without notion of order or repetition; the value of a > number is a set of its abstract mathematical properties that determine > what makes objects being equal, greater, lesser, how particular > arithmetic operations work etc... > > I think, there is no universal notion of "the value of a Python > object". The notion of identity seems to be most generic (every > object has it, event if it does not have any other property) -- and > that's why by default it is used to define the most basic feature of > object's *value*, i.e. "what makes objects being equal or not" (== and > !=). Another possibility would be to raise TypeError but, as Ethan > Furman wrote, it would be impractical (e.g. key-type-heterogenic dicts > or sets would be practically impossible to work with). 
On the other > hand, the notion of sorting order (< > <= >=) is a much more > specialized object property. Quite so.

    x, y = object(), object()
    print 'Equal:', ' '.join(attr for attr in dir(x) if getattr(x,attr)==getattr(y,attr))
    print 'Unequal:', ' '.join(attr for attr in dir(x) if getattr(x,attr)!=getattr(y,attr))

    Equal: __class__ __doc__ __new__ __subclasshook__
    Unequal: __delattr__ __format__ __getattribute__ __hash__ __init__ __reduce__ __reduce_ex__ __repr__ __setattr__ __sizeof__ __str__

Andreas, what attribute or combination of attributes do you think should be the "values" of x and y? Rob Cliffe From ezio.melotti at gmail.com Tue Jul 8 00:38:05 2014 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Tue, 8 Jul 2014 01:38:05 +0300 Subject: [Python-Dev] Tracker Stats In-Reply-To: <53BAEEA7.8050408@email.de> References: <53A84D41.6070508@email.de> <20140623201225.0DA80250DE6@webabinitio.net> <53BAEEA7.8050408@email.de> Message-ID: On Mon, Jul 7, 2014 at 10:01 PM, francis wrote: > On 06/23/2014 10:12 PM, R. David Murray wrote: > >> The stats graphs are based on the data generated for the >> weekly issue report. I have a patched version of that >> report that adds the bug/enhancement info. I'll try to dig >> it up this week; someone ping me if I forget :) I think >> the patch will need to be updated based on Ezio's changes. 
>> > ping > If you just want some numbers you can try this:

    >>> import xmlrpclib
    >>> x = xmlrpclib.ServerProxy('http://bugs.python.org/xmlrpc', allow_none=True)
    >>> open_issues = x.filter('issue', None, dict(status=1)) # 1 == open
    >>> len(open_issues)
    4541
    >>> len(x.filter('issue', open_issues, dict(type=5))) # behavior
    1798
    >>> len(x.filter('issue', open_issues, dict(type=6))) # enhancement
    1557
    >>> len(x.filter('issue', open_issues, dict(type=1))) # crash
    122
    >>> len(x.filter('issue', open_issues, dict(type=2))) # compile error
    141
    >>> len(x.filter('issue', open_issues, dict(type=3))) # resource usage
    103
    >>> len(x.filter('issue', open_issues, dict(type=4))) # security
    32
    >>> len(x.filter('issue', open_issues, dict(type=7))) # performance
    83

Best Regards, Ezio Melotti From andreas.r.maier at gmx.de Tue Jul 8 01:36:25 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 01:36:25 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BADC46.40400@stoneleaf.us> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> Message-ID: <53BB2EF9.80002@gmx.de> On 2014-07-07 19:43, Ethan Furman wrote: > On 07/07/2014 09:56 AM, Andreas Maier wrote: >> On 07.07.2014 17:55, Ethan Furman wrote: >>> On 07/07/2014 04:22 AM, Andreas Maier wrote: >>>> >>>> Where is the discrepancy between the documentation of == and its >>>> default implementation on object documented? >>> >>> There seems to be no discrepancy (at least, you have not shown it), >> >> The documentation states consistently that == tests the equality of >> the value of an object. The default implementation >> of == in both 2.x and 3.x tests the object identity. Is that not a >> discrepancy? > > One could say that the value of an object is the object itself. Since > different objects are different, then they are not equal. 
> >>> but to answer the question about why the default equals operation is an >>> identity test: >>> >>> - all objects should be equal to themselves (there is only one that >>> isn't, and it's weird) >> >> I agree. But that is not a reason to conclude that different objects >> (as per their identity) should be unequal. Which is >> what the default implementation does. > > Python cannot know which values are important in an equality test, and > which are not. So it refuses to guess. > Well, one could argue that using the address of an object for its value equality test is pretty close to guessing, considering that given a sensible definition of value equality, objects of different identity can very well be equal but will always be considered unequal based on the address. > Think of a chess board, for example. Are any two black pawns equal? > All 16 pawns came from the same Pawn class, the only differences would > be in the color and position, but the movement type is the same for all. > > So equality for a pawn might mean the same color, or it might mean > color and position, or it might mean can move to the same position... > it's up to the programmer to decide which of the possibilities is the > correct one. Quite frankly, have equality mean identity in this case > also makes a lot of sense. That's why I think equality is only defined once the class designer has defined it. Using the address as a default for equality (that is, in absence of such a designer's definition) may be an easy-to-implement default, but not a very logical or sensible one. > >>> - equality tests should not, as a general rule, raise exceptions -- >>> they should return True or False >> >> Why not? Ordering tests also raise exceptions if ordering is not >> implemented. 
> > Besides the pawn example, this is probably a matter of practicality > over purity -- equality tests are used extensively through-out Python, > and having exceptions raised at possibly any moment would not be a fun > nor productive environment. > So we have many cases of classes whose designers thought about whether a sensible definition of equality was needed, and decided that an address/identity-based equality definition was just what they needed, yet they did not want to or could not use the "is" operator? Can you give me an example for such a class (besides type object)? (I.e. a class that does not have __eq__() and __ne__() but whose instances are compared with == or !=) > Ordering is much less frequent, and since we already tried always > ordering things, falling back to type name if necessary, we have > discovered that that is not a good trade-off. So now if one tries to > order things without specifying how it should be done, one gets an > exception. In Python 2, the default ordering implementation on type object uses the identity (address) as the basis for ordering. In Python 3, that was changed to raise an exception. That seems to be in sync with what you are saying. Maybe it would have been possible to also change that for the default equality implementation in Python 3. But it was not changed. As I wrote in another response, we now need to document this properly. 
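For readers following along, the behaviour described in this message can be checked with a minimal sketch like the one below. It assumes Python 3; the `Pawn` class and its attributes are hypothetical, borrowed from the chess example quoted above, and the class deliberately defines no comparison methods.

```python
class Pawn:
    """Hypothetical class that defines neither __eq__ nor ordering methods."""

    def __init__(self, color, square):
        self.color = color
        self.square = square


a = Pawn("black", "e7")
b = Pawn("black", "e7")

# With no __eq__ defined, == falls back to an identity test, so two
# distinct objects compare unequal even when their attributes match.
same_object_equal = (a == a)
distinct_objects_equal = (a == b)
distinct_objects_unequal = (a != b)

# Python 3 provides no default ordering, so < raises TypeError here.
try:
    a < b
    ordering_raised = False
except TypeError:
    ordering_raised = True

print(same_object_equal, distinct_objects_equal,
      distinct_objects_unequal, ordering_raised)
```

Under Python 2 the ordering comparison would not raise, which is the difference between the two major versions noted above.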
From andreas.r.maier at gmx.de Tue Jul 8 01:37:09 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 01:37:09 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB2AC7.2060009@gmx.de> References: <53BB2AC7.2060009@gmx.de> Message-ID: <53BB2F25.3020205@gmx.de> Am 2014-07-07 23:11, schrieb Jan Kaliszewski: > 07.07.2014 18:11, Andreas Maier wrote: >> Am 07.07.2014 17:58, schrieb Xavier Morel: >>> On 2014-07-07, at 13:22 , Andreas Maier wrote: >>> >>>> While discussing Python issue #12067 >>>> (http://bugs.python.org/issue12067#msg222442), I learned that >>>> Python 3.4 implements '==' and '!=' on the object type such that if >>>> no special equality test operations are implemented in derived >>>> classes, there is a default implementation that tests for identity >>>> (as opposed to equality of the values). > [...] >>>> IMHO, that default implementation contradicts the definition that >>>> '==' and '!=' test for equality of the values of an object. > [...] >>>> To me, a sensible default implementation for == on object would be >>>> (in Python): >>>> >>>> if v is w: >>>> return True; >>>> elif type(v) != type(w): >>>> return False >>>> else: >>>> raise ValueError("Equality cannot be determined in default >>>> implementation") >>> >>> Why would comparing two objects of different types return False >> >> Because I think (but I'm not sure) that the type should play a role >> for comparison of values. But maybe that does not embrace duck typing >> sufficiently, and the type should be ignored by default for comparing >> object values. >> >>> but comparing two objects of the same type raise an error? >> >> That I'm sure of: Because the default implementation (after having >> exhausted all possibilities of calling __eq__ and friends) has no way >> to find out whether the values(!!) of the objects are equal. > > IMHO, in Python context, "value" is a very vague term. 
Quite often we > can read it as the very basic (but not the only one) notion of "what > makes objects being equal or not" -- and then saying that "objects are > compared by value" is a tautology. > > In other words, what object's "value" is -- is dependent on its > nature: e.g. the value of a list is what are the values of its > consecutive (indexed) items; the value of a set is based on values of > all its elements without notion of order or repetition; the value of a > number is a set of its abstract mathematical properties that determine > what makes objects being equal, greater, lesser, how particular > arithmetic operations work etc... > > I think, there is no universal notion of "the value of a Python > object". The notion of identity seems to be most generic (every > object has it, event if it does not have any other property) -- and > that's why by default it is used to define the most basic feature of > object's *value*, i.e. "what makes objects being equal or not" (== and > !=). Another possibility would be to raise TypeError but, as Ethan > Furman wrote, it would be impractical (e.g. key-type-heterogenic dicts > or sets would be practically impossible to work with). On the other > hand, the notion of sorting order (< > <= >=) is a much more > specialized object property. On the universal notion of a value in Python: In both 2.x and 3.x, it reads (in 3.1. Objects, values and types): - "Every object has an identity, a type and a value." - "An object's /identity/ never changes once it has been created; .... The /value/ of some objects can change. Objects whose value can change are said to be /mutable/; objects whose value is unchangeable once they are created are called /immutable/." These are clear indications that there is an intention to have separate concepts of identity and value in Python. If an instance of type object can exist but does not have a universal notion of value, it should not allow operations that need a value. 
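The separation of identity and value described in the quoted documentation can be illustrated with a short sketch, assuming Python 3 and using only built-in types. The variable names are illustrative only.

```python
x = [1, 2, 3]
y = [1, 2, 3]

# list defines its own ==, so equality follows the contents (the "value"),
# while "is" still reports two separate objects.
equal_values = (x == y)
same_identity = (x is y)

# Mutating a list changes its value but not its identity.
ident_before = id(x)
x.append(4)
identity_unchanged = (id(x) == ident_before)
still_equal = (x == y)

# Plain object() instances define no value-based ==, so the default
# identity test applies and two distinct instances compare unequal.
plain_a, plain_b = object(), object()
plain_equal = (plain_a == plain_b)

print(equal_values, same_identity, identity_unchanged, still_equal, plain_equal)
```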
I do not really buy into the arguments that try to show how identity and value are somehow the same. They are not, not even in Python. The argument I can absolutely buy into is that the implementation cannot be changed within a major release. So the real question is how we document it. I'll try to summarize in a separate posting. Andy From andreas.r.maier at gmx.de Tue Jul 8 01:37:48 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 01:37:48 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB2A09.4070208@gmx.de> References: <53BB2A09.4070208@gmx.de> Message-ID: <53BB2F4C.8060402@gmx.de> On 2014-07-07 23:31, Rob Cliffe wrote: > > On 07/07/2014 22:11, Jan Kaliszewski wrote: >> [snip] >> >> IMHO, in Python context, "value" is a very vague term. Quite often >> we can read it as the very basic (but not the only one) notion of >> "what makes objects being equal or not" -- and then saying that >> "objects are compared by value" is a tautology. >> >> In other words, what object's "value" is -- is dependent on its >> nature: e.g. the value of a list is what are the values of its >> consecutive (indexed) items; the value of a set is based on values of >> all its elements without notion of order or repetition; the value of >> a number is a set of its abstract mathematical properties that >> determine what makes objects being equal, greater, lesser, how >> particular arithmetic operations work etc... >> >> I think, there is no universal notion of "the value of a Python >> object". The notion of identity seems to be most generic (every >> object has it, even if it does not have any other property) -- and >> that's why by default it is used to define the most basic feature of >> object's *value*, i.e. "what makes objects being equal or not" (== >> and !=). 
Another possibility would be to raise TypeError but, as >> Ethan Furman wrote, it would be impractical (e.g. >> key-type-heterogenic dicts or sets would be practically impossible to >> work with). On the other hand, the notion of sorting order (< > <= >> >=) is a much more specialized object property. > Quite so. > > x, y = object(), object() > print 'Equal:', ' '.join(attr for attr in dir(x) if > getattr(x,attr)==getattr(y,attr)) > print 'Unequal:', ' '.join(attr for attr in dir(x) if > getattr(x,attr)!=getattr(y,attr)) > > Equal: __class__ __doc__ __new__ __subclasshook__ > Unequal: __delattr__ __format__ __getattribute__ __hash__ __init__ > __reduce__ __reduce_ex__ __repr__ __setattr__ __sizeof__ __str__ > > Andreas, what attribute or combination of attributes do you think > should be the "values" of x and y? > Rob Cliffe > Whatever the object's type defines to be the value. Which requires the presence of an __eq__() or __ne__() implementation. I could even live with a default implementation on type object that ANDs the equality of all instance data attributes and class data attributes, but that is not possible because type object does not have a notion of such data attributes. Reverting to using the identity for the value of an instance of type object is somehow helpless. It may make existing code work, but it is not very logical. I could even argue it makes some logical code fail, because while it reliably detects that the same objects are equal, it fails to detect that different objects may also be equal (at least under a sensible definition of value equality). Having said all this: As a few people already wrote, we cannot change the implementation within a major release. So the real question is how we document it. I'll try to summarize in a separate posting. 
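As a rough illustration of the attribute-based equality mentioned above, here is a minimal sketch of how a class designer could opt in to it today. It assumes Python 3; `AttrEqual` and `Point` are hypothetical names, not anything provided by Python, and this is one possible definition rather than a proposed default.

```python
class AttrEqual:
    """Hypothetical mixin: equal when types match and instance attributes match."""

    def __eq__(self, other):
        if self is other:
            return True
        if type(self) is not type(other):
            # Let Python try the reflected operation, then fall back
            # to the default identity comparison.
            return NotImplemented
        return vars(self) == vars(other)

    # Note: defining __eq__ without __hash__ makes instances unhashable
    # in Python 3, so they cannot be used as dict keys or set members.


class Point(AttrEqual):
    def __init__(self, x, y):
        self.x = x
        self.y = y


p, q, r = Point(1, 2), Point(1, 2), Point(1, 3)

p_equals_q = (p == q)      # same type, same attribute values
p_equals_r = (p == r)      # same type, different attribute values
p_equals_int = (p == 1)    # different types

print(p_equals_q, p_equals_r, p_equals_int)
```

The sketch relies on instances having a `__dict__`, which plain `object()` instances do not, matching the point that such a rule cannot live on `object` itself.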
Andy From benjamin at python.org Tue Jul 8 01:49:40 2014 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 07 Jul 2014 16:49:40 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB2EF9.80002@gmx.de> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> Message-ID: <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> On Mon, Jul 7, 2014, at 16:36, Andreas Maier wrote: > Am 2014-07-07 19:43, schrieb Ethan Furman: > > On 07/07/2014 09:56 AM, Andreas Maier wrote: > >> Am 07.07.2014 17:55, schrieb Ethan Furman: > >>> On 07/07/2014 04:22 AM, Andreas Maier wrote: > >>>> > >>>> Where is the discrepancy between the documentation of == and its > >>>> default implementation on object documented? > >>> > >>> There's seems to be no discrepancy (at least, you have not shown it), > >> > >> The documentation states consistently that == tests the equality of > >> the value of an object. The default implementation > >> of == in both 2.x and 3.x tests the object identity. Is that not a > >> discrepancy? > > > > One could say that the value of an object is the object itself. Since > > different objects are different, then they are not equal. > > > >>> but to answer the question about why the default equals operation is an > >>> identity test: > >>> > >>> - all objects should be equal to themselves (there is only one that > >>> isn't, and it's weird) > >> > >> I agree. But that is not a reason to conclude that different objects > >> (as per their identity) should be unequal. Which is > >> what the default implementation does. > > > > Python cannot know which values are important in an equality test, and > > which are not. So it refuses to guess. 
> > > Well, one could argue that using the address of an object for its value > equality test is pretty close to guessing, considering that given a > sensible definition of value equality, objects of different identity can > very well be equal but will always be considered unequal based on the > address. Probably the best argument for the behavior is that "x is y" should imply "x == y", which preludes raising an exception. No such invariant is desired for ordering, so default implementations of < and > are not provided in Python 3. From andreas.r.maier at gmx.de Tue Jul 8 01:53:06 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 01:53:06 +0200 Subject: [Python-Dev] == on object tests identity in 3.x - summary Message-ID: <53BB32E2.40805@gmx.de> Thanks to all who responded. In absence of class-specific equality test methods, the default implementations revert to use the identity (=address) of the object as a basis for the test, in both Python 2 and Python 3. In absence of specific ordering test methods, the default implementations revert to use the identity (=address) of the object as a basis for the test, in Python 2. In Python 3, an exception is raised in that case. The bottom line of the discussion seems to be that this behavior is intentional, and a lot of code depends on it. We still need to figure out how to document this. Options could be: 1. We define that the default for the value of an object is its identity. That allows to describe the behavior of the equality test without special casing such objects, but it does not work for ordering. Also, I have difficulties stating what constitutes that default case, because it can really only be explained by referring to the presence or absence of the class-specific equality test and ordering test methods. 2. 
We don't say anything about the default value of an object, and describe the behavior of the equality test and ordering test, which both need to cover the case that the object does not have the respective test methods. It seems to me that only option 2 really works. Comments and further options welcome. Andy From andreas.r.maier at gmx.de Tue Jul 8 01:55:55 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 01:55:55 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> Message-ID: <53BB338B.2080401@gmx.de> Am 2014-07-08 01:49, schrieb Benjamin Peterson: > On Mon, Jul 7, 2014, at 16:36, Andreas Maier wrote: >> Am 2014-07-07 19:43, schrieb Ethan Furman: >>> On 07/07/2014 09:56 AM, Andreas Maier wrote: >>>> Am 07.07.2014 17:55, schrieb Ethan Furman: >>>>> On 07/07/2014 04:22 AM, Andreas Maier wrote: >>>>>> Where is the discrepancy between the documentation of == and its >>>>>> default implementation on object documented? >>>>> There's seems to be no discrepancy (at least, you have not shown it), >>>> The documentation states consistently that == tests the equality of >>>> the value of an object. The default implementation >>>> of == in both 2.x and 3.x tests the object identity. Is that not a >>>> discrepancy? >>> One could say that the value of an object is the object itself. Since >>> different objects are different, then they are not equal. >>> >>>>> but to answer the question about why the default equals operation is an >>>>> identity test: >>>>> >>>>> - all objects should be equal to themselves (there is only one that >>>>> isn't, and it's weird) >>>> I agree. 
But that is not a reason to conclude that different objects >>>> (as per their identity) should be unequal. Which is >>>> what the default implementation does. >>> Python cannot know which values are important in an equality test, and >>> which are not. So it refuses to guess. >>> >> Well, one could argue that using the address of an object for its value >> equality test is pretty close to guessing, considering that given a >> sensible definition of value equality, objects of different identity can >> very well be equal but will always be considered unequal based on the >> address. > Probably the best argument for the behavior is that "x is y" should > imply "x == y", which preludes raising an exception. No such invariant > is desired for ordering, so default implementations of < and > are not > provided in Python 3. I agree that "x is y" should imply "x == y". The problem of the default implementation is that "x is not y" implies "x != y" and that may or may not be true under a sensible definition of equality. From andreas.r.maier at gmx.de Tue Jul 8 02:12:14 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 02:12:14 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BAC638.7030704@stoneleaf.us> References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com> <53BABCF2.50607@gmx.de> <53BAC638.7030704@stoneleaf.us> Message-ID: <53BB375E.8010904@gmx.de> Am 2014-07-07 18:09, schrieb Ethan Furman: > Just because two instances from the same object have the same value > does not mean they are equal. For a real-life example, look at > twins: biologically identical, yet not equal. I think they *are* equal in Python if they have the same value, by definition, because somewhere the Python docs state that equality compares the object's values. 
The reality though is that value is more vague than equality test (as it was already pointed out in this thread): A class designer can directly implement what equality means to the class, but he or she cannot implement an accessor method for the value. The value plays a role only indirectly as part of equality and ordering tests. Andy From ethan at stoneleaf.us Tue Jul 8 01:50:57 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 16:50:57 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB2EF9.80002@gmx.de> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> Message-ID: <53BB3261.6080705@stoneleaf.us> On 07/07/2014 04:36 PM, Andreas Maier wrote: > Am 2014-07-07 19:43, schrieb Ethan Furman: >> >> Python cannot know which values are important in an equality test, and which are not. So it refuses to guess. > > Well, one could argue that using the address of an object for its value equality test is pretty close to guessing, > considering that given a sensible definition of value equality, objects of different identity can very well be equal but > will always be considered unequal based on the address. And what would be this 'sensible definition'? > So we have many cases of classes whose designers thought about whether a sensible definition of equality was needed, and > decided that an address/identity-based equality definition was just what they needed, yet they did not want to or could > not use the "is" operator? 1) The address of the object is irrelevant. While that is what CPython uses, it is not what every Python uses. 2) The 'is' operator is specialized, and should only rarely be needed. If equals is what you mean, use '=='. 3) If Python forced us to write our own __eq__ /for every single class/ what would happen? 
Well, I suspect quite a few would make their own 'object' to inherit from, and would have the fallback of __eq__ meaning object identity. Practicality beats purity. > Can you give me an example for such a class (besides type object)? (I.e. a class that does not have __eq__() and > __ne__() but whose instances are compared with == or !=) I never add __eq__ to my classes until I come upon a place where I need to check if two instances of those classes are 'equal', for whatever I need equal to mean in that case. >> Ordering is much less frequent, and since we already tried always ordering things, falling back to type name if >> necessary, we have discovered that that is not a good trade-off. So now if one tries to order things without >> specifying how it should be done, one gets an exception. > > In Python 2, the default ordering implementation on type object uses the identity (address) as the basis for ordering. > In Python 3, that was changed to raise an exception. That seems to be in sync with what you are saying. > > Maybe it would have been possible to also change that for the default equality implementation in Python 3. But it was > not changed. As I wrote in another response, we now need to document this properly. Doc patches are gratefully accepted. :) -- ~Ethan~ From ethan at stoneleaf.us Tue Jul 8 01:52:17 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 16:52:17 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> Message-ID: <53BB32B1.2090300@stoneleaf.us> On 07/07/2014 04:49 PM, Benjamin Peterson wrote: > > Probably the best argument for the behavior is that "x is y" should > imply "x == y", which preludes raising an exception. 
No such invariant > is desired for ordering, so default implementations of < and > are not > provided in Python 3. Nice. This bit should definitely make it into the doc patch if not already in the docs. -- ~Ethan~ From ethan at stoneleaf.us Tue Jul 8 02:22:16 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 17:22:16 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB375E.8010904@gmx.de> References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com> <53BABCF2.50607@gmx.de> <53BAC638.7030704@stoneleaf.us> <53BB375E.8010904@gmx.de> Message-ID: <53BB39B8.20707@stoneleaf.us> On 07/07/2014 05:12 PM, Andreas Maier wrote: > Am 2014-07-07 18:09, schrieb Ethan Furman: >> >> Just because two instances from the same object have the same value does not mean they are equal. For a real-life >> example, look at twins: biologically identical, yet not equal. > > I think they *are* equal in Python if they have the same value, by definition, because somewhere the Python docs state > that equality compares the object's values. And is personality of no value, then? > The reality though is that value is more vague than equality test (as it was already pointed out in this thread): A > class designer can directly implement what equality means to the class, but he or she cannot implement an accessor > method for the value. The value plays a role only indirectly as part of equality and ordering tests. Not sure what you mean by this. -- ~Ethan~ From stephen at xemacs.org Tue Jul 8 03:44:40 2014 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 08 Jul 2014 10:44:40 +0900 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB338B.2080401@gmx.de> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> <53BB338B.2080401@gmx.de> Message-ID: <87fvictyif.fsf@uwakimon.sk.tsukuba.ac.jp> Andreas Maier writes: > The problem of the default implementation is that "x is not y" > implies "x != y" and that may or may not be true under a sensible > definition of equality. I noticed this a long time ago and just decided it was covered by "consenting adults". That is, if the "sensible definition" of x == y is such that it can be true simultaneously with x != y, it's the programmer's responsibility to notice that, and to provide an implementation. But there's no issue that lack of an explicit implementation of comparison causes a program to have ambiguous meaning. I also consider that for "every object has a value" to make sense as a description of Python, that value must be representable by an object. The obvious default representation for the value of any object is the object itself! Now, for this purpose you don't need a "canonical representation" of an object's value. In particular, equality comparisons need not explicitly construct a representative object. Some do, some don't, I would suppose. For example, in comparing an integer with a float, I would convert the integer to float and compare, but in comparing float and complex I would check the complex for x.im == 0.0, and if true, return the value of x.re == y. I'm not sure how you interpret "value" to find the behavior of Python (the default comparison) problematic. I suspect you'd have a hard time coming up with an interpretation consistent with Python's object orientation. 
That said, it's probably worth documenting, but I don't know how much of the above should be introduced into the documentation. Steve From andreas.r.maier at gmx.de Tue Jul 8 03:18:16 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 03:18:16 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB3261.6080705@stoneleaf.us> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <53BB3261.6080705@stoneleaf.us> Message-ID: <53BB46D8.6040101@gmx.de> Am 2014-07-08 01:50, schrieb Ethan Furman: > On 07/07/2014 04:36 PM, Andreas Maier wrote: >> Am 2014-07-07 19:43, schrieb Ethan Furman: >>> >>> Python cannot know which values are important in an equality test, >>> and which are not. So it refuses to guess. >> >> Well, one could argue that using the address of an object for its >> value equality test is pretty close to guessing, >> considering that given a sensible definition of value equality, >> objects of different identity can very well be equal but >> will always be considered unequal based on the address. > > And what would be this 'sensible definition'? One that only a class designer can define. That's why I argued for raising an exception if that is not defined. But as I stated elsewhere in this thread: It is as it is, and we need to document it. > >> So we have many cases of classes whose designers thought about >> whether a sensible definition of equality was needed, and >> decided that an address/identity-based equality definition was just >> what they needed, yet they did not want to or could >> not use the "is" operator? > > 1) The address of the object is irrelevant. While that is what > CPython uses, it is not what every Python uses. > > 2) The 'is' operator is specialized, and should only rarely be > needed. If equals is what you mean, use '=='. 
> > 3) If Python forced us to write our own __eq__ /for every single > class/ what would happen? Well, I suspect quite a few would make > their own 'object' to inherit from, and would have the fallback of > __eq__ meaning object identity. Practicality beats purity. > > >> Can you give me an example for such a class (besides type object)? >> (I.e. a class that does not have __eq__() and >> __ne__() but whose instances are compared with == or !=) > > I never add __eq__ to my classes until I come upon a place where I > need to check if two instances of those classes are 'equal', for > whatever I need equal to mean in that case. With that strategy, you would not be hurt if the default implementation raised an exception in case the two objects are not identical. ;-) >>> Ordering is much less frequent, and since we already tried always >>> ordering things, falling back to type name if >>> necessary, we have discovered that that is not a good trade-off. So >>> now if one tries to order things without >>> specifying how it should be done, one gets an exception. >> >> In Python 2, the default ordering implementation on type object uses >> the identity (address) as the basis for ordering. >> In Python 3, that was changed to raise an exception. That seems to be >> in sync with what you are saying. >> >> Maybe it would have been possible to also change that for the default >> equality implementation in Python 3. But it was >> not changed. As I wrote in another response, we now need to document >> this properly. > > Doc patches are gratefully accepted. :) Understood. I will be working on it. 
:-) Andy From andreas.r.maier at gmx.de Tue Jul 8 03:29:34 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 03:29:34 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB39B8.20707@stoneleaf.us> References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com> <53BABCF2.50607@gmx.de> <53BAC638.7030704@stoneleaf.us> <53BB375E.8010904@gmx.de> <53BB39B8.20707@stoneleaf.us> Message-ID: <53BB497E.4020005@gmx.de> Am 2014-07-08 02:22, schrieb Ethan Furman: > On 07/07/2014 05:12 PM, Andreas Maier wrote: >> Am 2014-07-07 18:09, schrieb Ethan Furman: >>> >>> Just because two instances from the same object have the same value >>> does not mean they are equal. For a real-life >>> example, look at twins: biologically identical, yet not equal. >> >> I think they *are* equal in Python if they have the same value, by >> definition, because somewhere the Python docs state >> that equality compares the object's values. > > And is personality of no value, then? I guess you are pulling my leg, Ethan ... ;-) But again, for a definition of equality between instances of a Python class representing twins, one has to decide what attributes of the twins are supposed to be part of that. If the designer of the class decides that just the biology atributes are part of equality, fine. If he or she decides that personality attributes are additionally part of equality, also fine. >> The reality though is that value is more vague than equality test (as >> it was already pointed out in this thread): A >> class designer can directly implement what equality means to the >> class, but he or she cannot implement an accessor >> method for the value. The value plays a role only indirectly as part >> of equality and ordering tests. > > Not sure what you mean by this. Equality has a precise implementation (and hence definition) in Python; value does not. 
So to argue that value and equality can be different, is moot in a way, because it is not clear in Python what the value of an object is. Andy From stephen at xemacs.org Tue Jul 8 03:51:51 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 08 Jul 2014 10:51:51 +0900 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB375E.8010904@gmx.de> References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com> <53BABCF2.50607@gmx.de> <53BAC638.7030704@stoneleaf.us> <53BB375E.8010904@gmx.de> Message-ID: <87egxwty6g.fsf@uwakimon.sk.tsukuba.ac.jp> Andreas Maier writes: > A class designer can directly implement what equality means to the > class, but he or she cannot implement an accessor method for the > value. Of course she can! What you mean to say, I think, is that Python does not insist on an accessor method for the value. Ie, there is no dunder method __value__ on instances of class object. From steve at pearwood.info Tue Jul 8 03:58:33 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 8 Jul 2014 11:58:33 +1000 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB32B1.2090300@stoneleaf.us> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> <53BB32B1.2090300@stoneleaf.us> Message-ID: <20140708015833.GD13014@ando> On Mon, Jul 07, 2014 at 04:52:17PM -0700, Ethan Furman wrote: > On 07/07/2014 04:49 PM, Benjamin Peterson wrote: > > > >Probably the best argument for the behavior is that "x is y" should > >imply "x == y", which preludes raising an exception. No such invariant > >is desired for ordering, so default implementations of < and > are not > >provided in Python 3. > > Nice. This bit should definitely make it into the doc patch if not already > in the docs. 
However, saying this should not preclude classes where this is not the case, e.g. IEEE-754 NANs. I would not like this wording (which otherwise is very nice) to be used in the future to force reflexivity on object equality. https://en.wikipedia.org/wiki/Reflexive_relation To try to cut off arguments: - Yes, it is fine to have the default implementation of __eq__ assume reflexivity. - Yes, it is fine for standard library containers (lists, dicts, etc.) to assume reflexivity of their items. - I'm fully aware that some people think the non-reflexivity of NANs is logically nonsensical and a mistake. I do not agree with them. - I'm not looking to change anything here, the current behaviour is fine, I just want to ensure that an otherwise admirable doc change does not get interpreted in the future in a way that prevents classes from defining __eq__ to be non-reflexive. -- Steven From rob.cliffe at btinternet.com Tue Jul 8 03:59:30 2014 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Tue, 08 Jul 2014 02:59:30 +0100 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB2F25.3020205@gmx.de> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> Message-ID: <53BB5082.500@btinternet.com> On 08/07/2014 00:37, Andreas Maier wrote: > [...] > Am 2014-07-07 23:11, schrieb Jan Kaliszewski: >> >> IMHO, in Python context, "value" is a very vague term. Quite often >> we can read it as the very basic (but not the only one) notion of >> "what makes objects being equal or not" -- and then saying that >> "objects are compared by value" is a tautology. >> >> In other words, what object's "value" is -- is dependent on its >> nature: e.g. 
the value of a list is what are the values of its >> consecutive (indexed) items; the value of a set is based on values of >> all its elements without notion of order or repetition; the value of >> a number is a set of its abstract mathematical properties that >> determine what makes objects being equal, greater, lesser, how >> particular arithmetic operations work etc... >> >> I think, there is no universal notion of "the value of a Python >> object". The notion of identity seems to be most generic (every >> object has it, event if it does not have any other property) -- and >> that's why by default it is used to define the most basic feature of >> object's *value*, i.e. "what makes objects being equal or not" (== >> and !=). Another possibility would be to raise TypeError but, as >> Ethan Furman wrote, it would be impractical (e.g. >> key-type-heterogenic dicts or sets would be practically impossible to >> work with). On the other hand, the notion of sorting order (< > <= >> >=) is a much more specialized object property. > +1. See below. > On the universal notion of a value in Python: In both 2.x and 3.x, it > reads (in 3.1. Objects, values and types): > - "*Every object has an identity, a type and a value.*" Hm, is that *really* true? Every object has an identity and a type, sure. Every *variable* has a value, which is an object (an instance of some class). (I think? :-) ) But ISTM that the notion of the value of an *object* exists more in our minds than in Python. We say that number and string objects have a value because the concepts of number and string, including how to compare them, are intuitive for us, and these objects by design reflect our concepts with some degree of fidelity. Ditto for lists, dictionaries and sets which are only slightly less intuitive. If I came across an int object and had no concept of what an integer number was, how would I know what its "value" is supposed to be? 
If I'm given an int object, "i", say, and pretend I don't know what an integer is, I see that len(dir(i)) == 64 # Python 2.7 (and there may be attributes that dir doesn't show). How can I know from this bewildering list of 64 attributes (say they were all written in Swahili) that I can obtain the "real" (pun not intended) "value" with i.real or possibly i.numerator or i.__str__() or maybe somewhere else? ISTM "value" is a convention between humans, not something intrinsic to a class definition. Or at best something that is implied by the implementation of the comparison (or other) operators in the class. And can the following *objects* (class instances) be said to have a (obvious) value? obj1 = object() def obj2(): pass obj3 = (x for x in range(3)) obj4 = xrange(4) And is there any sensible way of comparing two such similar objects, e.g. obj3 = (x for x in range(3)) obj3a = (x for x in range(3)) except by id? Well, possibly in some cases. You might define two functions as equal if their code objects are identical (I'm outside my competence here, so please no-one correct me if I've got the technical detail wrong). But I don't see how you can compare two generators (other than by id) except by calling them both destructively (possibly an infinite number of times, and hoping that neither has unpredictable behaviour, side effects, etc.). As has already been said (more or less) in this thread, if you want to be able to compare any two objects of the same type, and not by id, you probably end up with a circular definition of "value" as "that (function of an object's attributes) which is compared". Which is ultimately an implementation decision for each type, not anything intrinsic to the type. So it makes sense to consistently fall back on id when nothing else obvious suggests itself. > - "An object's /identity/ never changes once it has been created; .... > The /value/ of some objects can change. 
Objects whose value can change > are said to be /mutable/; objects whose value is unchangeable once > they are created are called /immutable/." ISTM it needs to be explicitly documented for each class what the "value" of an instance is intended to be. Oh, I'm being pedantic here, sure. But I wonder if enforcing it would lead to more clarity of thought (maybe even the realisation that some objects don't have a value?? :-) ). > > These are clear indications that there is an intention to have > separate concepts of identity and value in Python. If an instance of > type object can exist but does not have a universal notion of value, > it should not allow operations that need a value. As Jan says, this would make comparing container objects a pain. Apologies if this message is a bit behind the times. There have been about 10 contributions since I started composing this! Best wishes, Rob Cliffe [...] -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Jul 8 04:15:27 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 8 Jul 2014 12:15:27 +1000 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB5082.500@btinternet.com> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> Message-ID: On Tue, Jul 8, 2014 at 11:59 AM, Rob Cliffe wrote: > If I came across an int object and had no concept of what an integer number > was, how would I know what its "value" is supposed to be? The value of an integer is the number it represents. In CPython, it's entirely possible to have multiple integer objects (ie objects with unique identities) with the same value, although AIUI there are Pythons for which that's not the case. The value of a float, Fraction, Decimal, or complex is also the number it represents, so when you compare 1==1.0, the answer is that they have the same value. 
They can't possibly have the same identity (every object has a single
type), but they have the same value. But what *is* that value? It's
not something that can be independently recognized, because casting to
a different type might change the value:

>>> i = 2**53+1
>>> f = float(i)
>>> i == f
False
>>> f == int(f)
True

Ergo the comparison of a float to an int cannot be done by casting the
int to float, nor by casting the float to int; it has to be done by
comparing the abstract numbers represented. Those are the objects'
values.

But what's the value of a sentinel object?

_SENTINEL = object()
def f(x, y=_SENTINEL):
    do_something_with(x)
    if y is not _SENTINEL:
        do_something_with(y)

I'd say this is a reasonable argument for the default object value to
be identity.

ChrisA

From steve at pearwood.info  Tue Jul  8 04:32:34 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 8 Jul 2014 12:32:34 +1000
Subject: [Python-Dev] == on object tests identity in 3.x - summary
In-Reply-To: <53BB32E2.40805@gmx.de>
References: <53BB32E2.40805@gmx.de>
Message-ID: <20140708023234.GE13014@ando>

On Tue, Jul 08, 2014 at 01:53:06AM +0200, Andreas Maier wrote:
> Thanks to all who responded.
>
> In absence of class-specific equality test methods, the default
> implementations revert to use the identity (=address) of the object as a
> basis for the test, in both Python 2 and Python 3.

Scrub out the "= address" part. Python does not require that objects
even have an address, that is not part of the language definition. (If
I simulate a Python interpreter in my head, what is the address of the
objects?) CPython happens to use the address of objects as their
identity, but that is an implementation-specific trick, not a language
guarantee, and it is documented as such. Neither IronPython nor Jython
use the address as ID.

> In absence of specific ordering test methods, the default
> implementations revert to use the identity (=address) of the object as a
> basis for the test, in Python 2.

I don't think that is correct. This is using Python 2.7:

py> a = (1, 2)
py> b = "Hello World!"
py> id(a) < id(b)
True
py> a < b
False

And just to be sure that neither a nor b is controlling this:

py> a.__lt__(b)
NotImplemented
py> b.__gt__(a)
NotImplemented

So the identity of the instances a and b is not used for <, although
the identity of their types may be:

py> id(type(a)) < id(type(b))
False

Using the identity of the instances would be silly, since that would
mean that sorting a list of mixed types would depend on the items'
history, not their values.

> In Python 3, an exception is raised in that case.

I don't think the ordering methods are terribly relevant to the
behaviour of equals.

> The bottom line of the discussion seems to be that this behavior is
> intentional, and a lot of code depends on it.
>
> We still need to figure out how to document this. Options could be:

I'm not sure it needs to be documented other than to say that the
default object.__eq__ compares by identity. Everything else is, in my
opinion, over-thinking it.

> 1. We define that the default for the value of an object is its
> identity. That allows to describe the behavior of the equality test
> without special casing such objects, but it does not work for ordering.

Why does it need to work for ordering? Not all values define ordering
relations.

Unlike type and identity, "value" does not have a single concrete
definition, it depends on the class designer. In the case of object,
the value of an object instance is itself, i.e. its identity. I don't
think we need more than that.

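For anyone who wants to check the defaults discussed above on a current interpreter, here is a minimal sketch. It assumes Python 3 (under Python 2.7 the `<` comparison returns a result instead of raising), and the class name `Plain` is made up purely for illustration.

```python
class Plain:
    """Hypothetical class defining neither __eq__ nor any ordering method."""


a = Plain()
b = Plain()

# Default equality inherited from object: identity decides the answer.
same_object_equal = (a == a)
distinct_objects_equal = (a == b)

# Python 3 provides no default ordering, so < raises TypeError.
try:
    a < b
    ordering_supported = True
except TypeError:
    ordering_supported = False

print(same_object_equal, distinct_objects_equal, ordering_supported)
```

The asymmetry is the point made in the thread: "x is y" implying "x == y" needs a default `__eq__`, while nothing comparable is wanted for ordering.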
-- 
Steven

From ethan at stoneleaf.us  Tue Jul  8 04:25:58 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 07 Jul 2014 19:25:58 -0700
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <20140708015833.GD13014@ando>
References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us>
 <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us>
 <53BB2EF9.80002@gmx.de>
 <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com>
 <53BB32B1.2090300@stoneleaf.us> <20140708015833.GD13014@ando>
Message-ID: <53BB56B6.8030306@stoneleaf.us>

On 07/07/2014 06:58 PM, Steven D'Aprano wrote:
> On Mon, Jul 07, 2014 at 04:52:17PM -0700, Ethan Furman wrote:
>> On 07/07/2014 04:49 PM, Benjamin Peterson wrote:
>>>
>>> Probably the best argument for the behavior is that "x is y" should
>>> imply "x == y", which precludes raising an exception. No such invariant
>>> is desired for ordering, so default implementations of < and > are not
>>> provided in Python 3.
>>
>> Nice. This bit should definitely make it into the doc patch if not already
>> in the docs.
>
> However, saying this should not preclude classes where this is not the
> case, e.g. IEEE-754 NANs. I would not like this wording (which otherwise
> is very nice) to be used in the future to force reflexivity on object
> equality.
>
> https://en.wikipedia.org/wiki/Reflexive_relation
>
> To try to cut off arguments:
>
> - Yes, it is fine to have the default implementation of __eq__
>   assume reflexivity.
>
> - Yes, it is fine for standard library containers (lists, dicts,
>   etc.) to assume reflexivity of their items.
>
> - I'm fully aware that some people think the non-reflexivity of
>   NANs is logically nonsensical and a mistake. I do not agree
>   with them.
>
> - I'm not looking to change anything here, the current behaviour
>   is fine, I just want to ensure that an otherwise admirable doc
>   change does not get interpreted in the future in a way that
>   prevents classes from defining __eq__ to be non-reflexive.

+1

From ethan at stoneleaf.us  Tue Jul  8 04:29:17 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 07 Jul 2014 19:29:17 -0700
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BB46D8.6040101@gmx.de>
References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us>
 <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us>
 <53BB2EF9.80002@gmx.de> <53BB3261.6080705@stoneleaf.us>
 <53BB46D8.6040101@gmx.de>
Message-ID: <53BB577D.4040208@stoneleaf.us>

On 07/07/2014 06:18 PM, Andreas Maier wrote:
> On 2014-07-08 01:50, Ethan Furman wrote:
>>
>> I never add __eq__ to my classes until I come upon a place where I need
>> to check if two instances of those classes are 'equal', for whatever I
>> need equal to mean in that case.
>
> With that strategy, you would not be hurt if the default implementation
> raised an exception in case the two objects are not identical. ;-)

Yes, I would. Not identical means not equal until I say otherwise.
Raising an exception instead of returning False (for __eq__) would be
horrible.

--
~Ethan~

From steve at pearwood.info  Tue Jul  8 05:12:02 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 8 Jul 2014 13:12:02 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BB5082.500@btinternet.com>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com>
Message-ID: <20140708031202.GF13014@ando>

On Tue, Jul 08, 2014 at 02:59:30AM +0100, Rob Cliffe wrote:

> >- "*Every object has an identity, a type and a value.*"
>
> Hm, is that *really* true?

Yes. It's pretty much true by definition: objects are *defined* to have
an identity, type and value, even if that value is abstract rather than
concrete.

> Every object has an identity and a type, sure.
> Every *variable* has a value, which is an object (an instance of some
> class). (I think? :-) )

I don't think so.
Variables can be undefined, which means they don't have a value:

py> del x
py> print x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

> But ISTM that the notion of the value of an *object* exists more in our
> minds than in Python.

Pretty much. How could it be otherwise? Human beings define the
semantics of objects, that is, their value, not Python.

[...]
> If I came across an int object and had no concept of what an integer
> number was, how would I know what its "value" is supposed to be?

You couldn't, any more than you would know what the value of a Watzit
object was if you knew nothing about Watzits. The value of an object is
intimately tied to its semantics, what the object represents and what it
is intended to be used for. In general, we can say nothing about the
value of an object until we've read the documentation for the object.

But we can be confident that the object has *some* value, otherwise what
would be the point of it? In some cases, that value might be nothing
more than its identity, but that's okay.

I think the problem we're having here is that some people are looking
for a concrete definition of what the value of an object is, but there
isn't one.

[...]
> And can the following *objects* (class instances) be said to have a
> (obvious) value?
>     obj1 = object()
>     def obj2(): pass
>     obj3 = (x for x in range(3))
>     obj4 = xrange(4)

The value as understood by a human reader, as opposed to the value as
assumed by Python, is not necessarily the same. As far as Python is
concerned, the value of all four objects is the object itself, i.e. its
identity. (For avoidance of doubt, not its id(), which is just a
number.)
A human reader could infer more than Python: - the second object is a "do nothing" function; - the third object is a lazy sequence (0, 1, 2); - the fourth object is a lazy sequence (0, 1, 2, 3); but since the class designer didn't deem it important enough, or practical enough, to implement an __eq__ method that takes those things into account, *for the purposes of equality* (but perhaps not other purposes) we say that the value is just the object itself, its identity. > And is there any sensible way of comparing two such similar objects, e.g. > obj3 = (x for x in range(3)) > obj3a = (x for x in range(3)) > except by id? In principle, one might peer into the two generators and note that they perform exactly the same computations on exactly the same input, and therefore should be deemed to have the same value. But since that's hard, and "exactly the same" is not always well-defined, Python doesn't try to be too clever and just uses a simpler idea: the value is the object itself. > Well, possibly in some cases. You might define two functions as equal > if their code objects are identical (I'm outside my competence here, so > please no-one correct me if I've got the technical detail wrong). But I > don't see how you can compare two generators (other than by id) except > by calling them both destructively (possibly an infinite number of > times, and hoping that neither has unpredictable behaviour, side > effects, etc.). Generator objects have code objects as well. py> x = (a for a in (1, 2)) py> x.gi_code

 at 0xb7ee39f8, file "", line 1>
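
For what it's worth, here is a rough Python 3 sketch of why sharing a code object would not be enough to call two generators equal. The helper name count_to is made up for illustration; the point is only that two generators can run the very same code and still produce different values.

```python
def count_to(n):
    # Hypothetical helper: every call returns a new generator object
    # compiled from the same generator expression.
    return (i for i in range(n))

g2 = count_to(2)
g3 = count_to(3)

shares_code = g2.gi_code is g3.gi_code   # one code object serves both
compares_equal = g2 == g3                # default ==, i.e. identity

# Consuming the generators is the only way to see that they differ.
values_2 = list(g2)
values_3 = list(g3)
```

So gi_code tells you what a generator will run, not what it will yield, and finding that out uses the generator up.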

> >- "An object's /identity/ never changes once it has been created; .... 
> >The /value/ of some objects can change. Objects whose value can change 
> >are said to be /mutable/; objects whose value is unchangeable once 
> >they are created are called /immutable/."
>
> ISTM it needs to be explicitly documented for each class what the 
> "value" of an instance is intended to be.

Why? What value (pun intended) is there in adding an explicit statement 
of value to every single class?

"The value of a str is the str's sequence of characters."
"The value of a list is the list's sequence of items."
"The value of an int is the int's numeric value."
"The value of a float is the float's numeric value, or in the case of 
 INFs and NANs, that they are an INF or NAN."
"The value of a complex number is the ordered pair of its real and 
 imaginary components."
"The value of a re MatchObject is the MatchObject itself."

I don't see any benefit to forcing all classes to explicitly document 
this sort of thing. It's nearly always redundant and unnecessary.



-- 
Steven

From rosuav at gmail.com  Tue Jul  8 05:31:46 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 8 Jul 2014 13:31:46 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <20140708031202.GF13014@ando>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
Message-ID: 

On Tue, Jul 8, 2014 at 1:12 PM, Steven D'Aprano  wrote:
> Why? What value (pun intended) is there in adding an explicit statement
> of value to every single class?
>
> "The value of a str is the str's sequence of characters."
> "The value of a list is the list's sequence of items."
> "The value of an int is the int's numeric value."
> "The value of a float is the float's numeric value, or in the case of
>  INFs and NANs, that they are an INF or NAN."
> "The value of a complex number is the ordered pair of its real and
>  imaginary components."
> "The value of a re MatchObject is the MatchObject itself."
>
> I don't see any benefit to forcing all classes to explicitly document
> this sort of thing. It's nearly always redundant and unnecessary.

It's important where it's not obvious. For instance, two lists with
the same items are equal, two tuples with the same items are equal,
but a list and a tuple with the same items aren't. Doesn't mean it
necessarily has to be documented, though.
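
A quick sketch of the cases I mean, using only builtins (the variable names are just for illustration):

```python
same_list = [1, 2, 3] == [1, 2, 3]        # two lists, same items
same_tuple = (1, 2, 3) == (1, 2, 3)       # two tuples, same items
list_vs_tuple = [1, 2, 3] == (1, 2, 3)    # same items, different types

# Not every cross-type comparison is unequal, which is why it can be
# non-obvious: a set and a frozenset with the same members do compare equal.
set_vs_frozenset = {1, 2} == frozenset({1, 2})
```

The first two and the last are True and the list/tuple one is False, so "same contents" alone doesn't tell you what == will say.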

ChrisA

From stephen at xemacs.org  Tue Jul  8 05:34:33 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 08 Jul 2014 12:34:33 +0900
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BB3261.6080705@stoneleaf.us>
References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us>
 <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us>
 <53BB2EF9.80002@gmx.de> <53BB3261.6080705@stoneleaf.us>
Message-ID: <87bnt0ttfa.fsf@uwakimon.sk.tsukuba.ac.jp>

Ethan Furman writes:

 > And what would be this 'sensible definition' [of value equality]?

I think that's the wrong question.  I suppose Andreas's point is that
when the programmer doesn't provide a definition, there is no such
thing as a "sensible definition" to default to.  I disagree, but given
that as the point of discussion, asking what the definition is, is moot.

 > 2) The 'is' operator is specialized, and should only rarely be
 >    needed.

Nitpick: Except that it's the preferred way to express identity with
singletons, AFAIK.  ("if x is None: ...", not "if x == None: ...".)
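
A minimal sketch of why "is" is the safer spelling here, assuming a deliberately odd (and hypothetical) class whose __eq__ claims equality with everything:

```python
class AlwaysEqual:
    """Hypothetical class with an over-eager __eq__."""

    def __eq__(self, other):
        return True


x = AlwaysEqual()

equal_to_none = (x == None)   # the class's __eq__ decides this
is_none = (x is None)         # identity cannot be overridden
```

The == test is fooled by the class, while the identity test answers the question actually being asked.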


From rob.cliffe at btinternet.com  Tue Jul  8 06:02:39 2014
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Tue, 08 Jul 2014 05:02:39 +0100
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <20140708031202.GF13014@ando>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
Message-ID: <53BB6D5F.1010800@btinternet.com>


On 08/07/2014 04:12, Steven D'Aprano wrote:
> On Tue, Jul 08, 2014 at 02:59:30AM +0100, Rob Cliffe wrote:
>
>>> - "*Every object has an identity, a type and a value.*"
>> Hm, is that *really* true?
> Yes. It's pretty much true by definition: objects are *defined* to have
> an identity, type and value, even if that value is abstract rather than
> concrete.
Except that in your last paragraph you imply that an explicit 
*definition* of the value is normally not in the docs.
>
>
>> Every object has an identity and a type, sure.
>> Every *variable* has a value, which is an object (an instance of some
>> class).  (I think? :-) )
> I don't think so. Variables can be undefined, which means they don't
> have a value:
>
> py> del x
> py> print x
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> NameError: name 'x' is not defined
I was aware of that but I considered that a deleted variable no longer 
existed.  Not that it's important.
>
>
>> But ISTM that the notion of the value of an *object* exists more in our
>> minds than in Python.
> Pretty much. How could it be otherwise? Human beings define the
> semantics of objects, that is, their value, not Python.
>
>
> [...]
>> If I came across an int object and had no concept of what an integer
>> number was, how would I know what its "value" is supposed to be?
> You couldn't, any more than you would know what the value of a Watzit
> object was if you knew nothing about Watzits. The value of an object is
> intimately tied to its semantics, what the object represents and what it
> is intended to be used for. In general, we can say nothing about the
> value of an object until we've read the documentation for the object.
>
> But we can be confident that the object has *some* value, otherwise what
> would be the point of it? In some cases, that value might be nothing
> more than its identity, but that's okay.
>
> I think the problem we're having here is that some people are looking
> for a concrete definition of what the value of an object is, but there
> isn't one.
>
>
> [...]
>> And can the following *objects* (class instances) be said to have a
>> (obvious) value?
>>      obj1 = object()
>>      def obj2(): pass
>>      obj3 = (x for x in range(3))
>>      obj4 = xrange(4)
> The value as understood by a human reader, as opposed to the value as
> assumed by Python, is not necessarily the same. As far as Python is
> concerned, the value of all four objects is the object itself, i.e. its
> identity.
Is this mentioned in the docs?  I couldn't find it in a quick look 
through the 2.7.8 language reference.

> (For avoidance of doubt, not its id(), which is just a
> number.)
>
> A human reader could infer more than Python:
>
> - the second object is a "do nothing" function;
> - the third object is a lazy sequence (0, 1, 2);
> - the fourth object is a lazy sequence (0, 1, 2, 3);
>
> but since the class designer didn't deem it important enough, or
> practical enough, to implement an __eq__ method that takes those things
> into account, *for the purposes of equality* (but perhaps not other
> purposes) we say that the value is just the object itself, its identity.
>
>
>
>> And is there any sensible way of comparing two such similar objects, e.g.
>>      obj3  = (x for x in range(3))
>>      obj3a = (x for x in range(3))
>> except by id?
> In principle, one might peer into the two generators and note that they
> perform exactly the same computations on exactly the same input, and
> therefore should be deemed to have the same value. But since that's
> hard, and "exactly the same" is not always well-defined, Python doesn't
> try to be too clever and just uses a simpler idea: the value is the
> object itself.
Sure, I wasn't suggesting it was a sensible thing to do (quite the 
opposite), just playing devil's advocate.
>
>
>> Well, possibly in some cases.  You might define two functions as equal
>> if their code objects are identical (I'm outside my competence here, so
>> please no-one correct me if I've got the technical detail wrong).  But I
>> don't see how you can compare two generators (other than by id) except
>> by calling them both destructively (possibly an infinite number of
>> times, and hoping that neither has unpredictable behaviour, side
>> effects, etc.).
> Generator objects have code objects as well.
>
> py> x = (a for a in (1, 2))
> py> x.gi_code
> <code object <genexpr> at 0xb7ee39f8, file "<stdin>", line 1>
>
>>> - "An object's /identity/ never changes once it has been created; ....
>>> The /value/ of some objects can change. Objects whose value can change
>>> are said to be /mutable/; objects whose value is unchangeable once
>>> they are created are called /immutable/."
>> ISTM it needs to be explicitly documented for each class what the
>> "value" of an instance is intended to be.
> Why? What value (pun intended) is there in adding an explicit statement
> of value to every single class?
It troubles me a bit that "value" seems to be a fuzzy concept - it has 
an obvious meaning for some types (int, float, list etc.) but for 
callable objects you tell me that their value is the object itself, but 
I can't find it in the docs.  (Is the same true for module objects?)
Apart from anything else:

"Objects whose value can change
are said to be mutable"

How can we say if an object is mutable if we don't know what its value is?
Are callables non-mutable?  (Presumably?)
What about modules?  (Their *attributes* can be changed.)
Or are these questions considered stupid and/or irrelevant?

>
> "The value of a str is the str's sequence of characters."
> "The value of a list is the list's sequence of items."
> "The value of an int is the int's numeric value."
> "The value of a float is the float's numeric value, or in the case of
>   INFs and NANs, that they are an INF or NAN."
> "The value of a complex number is the ordered pair of its real and
>   imaginary components."
> "The value of a re MatchObject is the MatchObject itself."
>
> I don't see any benefit to forcing all classes to explicitly document
> this sort of thing. It's nearly always redundant and unnecessary.
>
"nearly always" yes, but there might be one or two cases where it would 
help.  Sorry, I don't have an example at present.

Thanks for a very full answer, Steven.
Rob Cliffe


From ethan at stoneleaf.us  Tue Jul  8 05:47:23 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 07 Jul 2014 20:47:23 -0700
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <87bnt0ttfa.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us>
 <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de>
 <53BB3261.6080705@stoneleaf.us> <87bnt0ttfa.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <53BB69CB.6040407@stoneleaf.us>

On 07/07/2014 08:34 PM, Stephen J. Turnbull wrote:
> Ethan Furman writes:
>
>> And what would be this 'sensible definition' [of value equality]?
>
> I think that's the wrong question.  I suppose Andreas's point is that
> when the programmer doesn't provide a definition, there is no such
> thing as a "sensible definition" to default to.  I disagree, but given
> that as the point of discussion, asking what the definition is, is moot.

He eventually made that point, but until he did I thought he meant that there was such a sensible default definition, he 
just wasn't sharing what he thought it might be with us.


>> 2) The 'is' operator is specialized, and should only rarely be
>>    needed.
>
> Nitpick: Except that it's the preferred way to express identity with
> singletons, AFAIK.  ("if x is None: ...", not "if x == None: ...".)

Not a nit at all, at least in my code -- the number of times I use '==' far outweighs the number of times I use 'is'. 
Thus, 'is' is rare.

(Now, of course, I'll have to go measure that assertion and probably find out I am wrong :/ ).

--
~Ethan~

From ncoghlan at gmail.com  Tue Jul  8 06:58:50 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 7 Jul 2014 21:58:50 -0700
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To: 
References: 
 
 
Message-ID: 

On 7 Jul 2014 10:47, "Guido van Rossum"  wrote:
>
> It would still be nice to know who "the appropriate persons" are. Too
> much of our infrastructure seems to be maintained by house elves or the ITA.

I volunteered to be the board's liaison to the infrastructure team, and
getting more visibility around what the infrastructure *is* and how it's
monitored and supported is going to be part of that. That will serve a
couple of key purposes:

- making the points of escalation clearer if anything breaks or needs
improvement (although "infrastructure at python.org" is a good default choice)
- making the current "todo" list of the infrastructure team more visible
(both to calibrate resolution time expectations and to provide potential
contributors an idea of what's involved)

Noah has already set up http://status.python.org/ to track service status,
I can see about getting buildbot.python.org added to the list.

Cheers,
Nick.

>
>
> On Sun, Jul 6, 2014 at 11:33 PM, Terry Reedy  wrote:
>>
>> On 7/6/2014 7:54 PM, Ned Deily wrote:
>>>
>>> As of the moment, buildbot.python.org seems to be down again.
>>
>>
>> Several hours later, back up.
>>
>>
>> > Where is the best place to report problems like this?
>>
>> We should have, if not already, an automatic system to detect down
>> servers and report (email) to appropriate persons.
>>
>> --
>> Terry Jan Reedy
>>
>>
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> https://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe:
>> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>

From ncoghlan at gmail.com  Tue Jul  8 07:23:35 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 7 Jul 2014 22:23:35 -0700
Subject: [Python-Dev] == on object tests identity in 3.x - summary
In-Reply-To: <53BB32E2.40805@gmx.de>
References: <53BB32E2.40805@gmx.de>
Message-ID: 

On 7 Jul 2014 19:22, "Andreas Maier"  wrote:
>
> Thanks to all who responded.
>
> In absence of class-specific equality test methods, the default
> implementations revert to use the identity (=address) of the object as a
> basis for the test, in both Python 2 and Python 3.
>
> In absence of specific ordering test methods, the default implementations
> revert to use the identity (=address) of the object as a basis for the
> test, in Python 2. In Python 3, an exception is raised in that case.

In Python 2, it orders by type, and only then by id (which happens to be
the address in CPython).

>
> The bottom line of the discussion seems to be that this behavior is
> intentional, and a lot of code depends on it.
>
> We still need to figure out how to document this. Options could be:
>
> 1. We define that the default for the value of an object is its identity.
> That allows to describe the behavior of the equality test without special
> casing such objects, but it does not work for ordering. Also, I have
> difficulties stating what constitutes that default case, because it can
> really only be explained by referring to the presence or absence of the
> class-specific equality test and ordering test methods.
>
> 2. We don't say anything about the default value of an object, and
> describe the behavior of the equality test and ordering test, which both
> need to cover the case that the object does not have the respective test
> methods.

The behaviour of Python 3's type system is fully covered by equality
defaulting to comparing by identity, and ordering comparisons having to be
defined explicitly. The docs at
https://docs.python.org/3/reference/expressions.html#not-in could likely be
clarified, but they do cover this (they just cover a lot about the builtins
at the same time).
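
As a small illustration of those two rules in Python 3 (Plain is a throwaway class made up for this sketch):

```python
class Plain:
    """Hypothetical class that defines no comparison methods at all."""


a, b = Plain(), Plain()

equal_to_self = (a == a)      # identity: an object equals itself
equal_to_other = (a == b)     # distinct objects are unequal
not_equal = (a != b)

# Ordering is not defined by default, so Python 3 refuses to guess.
try:
    a < b
except TypeError as exc:
    ordering_error = exc
else:
    ordering_error = None
```

Equality falls back to identity, and the ordering comparison raises TypeError instead of returning an arbitrary answer.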

> It seems to me that only option 2 really works.

Indeed, and that's the version already documented.

Regards,
Nick.

>
>
> Comments and further options welcome.
>
> Andy
>

From stephen at xemacs.org  Tue Jul  8 09:01:00 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 08 Jul 2014 16:01:00 +0900
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BB6D5F.1010800@btinternet.com>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
 <53BB6D5F.1010800@btinternet.com>
Message-ID: <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>

Rob Cliffe writes:

 > > Why? What value (pun intended) is there in adding an explicit statement
 > > of value to every single class?

 > It troubles me a bit that "value" seems to be a fuzzy concept - it has 
 > an obvious meaning for some types (int, float, list etc.) but for 
 > callable objects you tell me that their value is the object itself,

Value is *abstract* and implicit, but not fuzzy: it's what you compare
when you test for equality.  It's abstract in the sense that "inside
of Python" an object's value has to be an object (everything is an
object).  Now, the question is "do we need a canonical representation
of objects' values?"  Ie, do we need a mapping from every object
conceivable within Python to a specific object that is its value?
Since Python generally allows, even prefers, duck-typing, the answer
presumably is "no".  (Maybe you can think of Python programs you'd
like to write where the answer is "yes", but I don't have any
examples.)  And in fact there is no such mapping in Python.

So the answer I propose is that an object's value needs a
representation in Python, but that representation doesn't need to be
unique.  Any object is a representation of its own value, and if you
need two different objects to be equal to each other, you must define
their __eq__ methods to produce that result.

This (the fact that any object represents its value, and so can be
used as "the" standard of comparison for that value) is why it's so
important that equality be reflexive, symmetric, and transitive, and
why we really want to be careful about creating objects like NaN whose
definition is "my value isn't a value", and therefore "a = float('NaN');
a == a" evaluates to False.
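
To make the NaN case concrete, here is a short sketch (Python 3, variable names arbitrary). It also shows that list containment and list equality try identity before ==, so the non-reflexive object still "finds itself" in a container:

```python
import math

nan = float("nan")

reflexive = (nan == nan)        # False under IEEE 754 comparison rules
found = nan in [nan]            # containment tries identity before ==
same_list = [nan] == [nan]      # element comparison also short-cuts on identity
detected = math.isnan(nan)      # the reliable way to test for NaN
```

The containers behave as if equality were reflexive even though the float itself says otherwise, which is one visible consequence of breaking the rule.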

I agree with Steven d'A that this rule is not part of the language
definition and shouldn't be, but it's the rule of thumb I find hardest
to imagine *ever* wanting to break in my own code (although I sort of
understand why the IEEE 754 committee found they had to).

 > How can we say if an object is mutable if we don't know what its
 > value is?

Mutability is a different question.  You can define a class whose
instances have mutable attributes but nonetheless all compare
equal regardless of the contents of those attributes.

OTOH, the test for mutability is to try to mutate it.  If that doesn't
raise, it's mutable.
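
A rough sketch of both points; the class Blob and the helper accepts_item_assignment are invented for illustration, and the helper only probes one kind of mutation (item assignment), so it is not a general test:

```python
class Blob:
    """Hypothetical class: mutable payload, but all instances compare equal."""

    def __init__(self, payload):
        self.payload = payload

    def __eq__(self, other):
        return isinstance(other, Blob)

    def __hash__(self):
        return hash(Blob)


a, b = Blob([1]), Blob([2])
equal_before = (a == b)
a.payload.append(99)          # mutate an attribute
equal_after = (a == b)        # equality ignores the payload entirely


def accepts_item_assignment(obj):
    """Try to mutate obj in place; report whether that raised."""
    try:
        obj[0] = obj[0]
    except TypeError:
        return False
    return True


list_mutable = accepts_item_assignment([1])
tuple_mutable = accepts_item_assignment((1,))
```

Here the Blob instances change state without their "value" (as seen by ==) changing, while the try-it-and-see probe separates list from tuple.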

Steve

From rosuav at gmail.com  Tue Jul  8 09:09:27 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 8 Jul 2014 17:09:27 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
 <53BB6D5F.1010800@btinternet.com>
 <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

On Tue, Jul 8, 2014 at 5:01 PM, Stephen J. Turnbull  wrote:
> I agree with Steven d'A that this rule is not part of the language
> definition and shouldn't be, but it's the rule of thumb I find hardest
> to imagine *ever* wanting to break in my own code (although I sort of
> understand why the IEEE 754 committee found they had to).

The reason NaN isn't equal to itself is because there are X bit
patterns representing NaN, but an infinite number of possible
non-numbers that could result from a calculation. Is
float("inf")-float("inf") equal to float("inf")/float("inf")? There
are three ways NaN equality could have been defined:

1) All NaNs are equal, as if NaN is some kind of "special number".
2) NaNs are equal if they have the exact same bit pattern, and unequal else.
3) All NaNs are unequal, even if they have the same bit pattern.

The first option is very dangerous, because it'll mean that "NaN
pollution" can actually result in unexpected equality. The second
looks fine - a NaN is equal to itself, for instance - but it suffers
from the pigeonhole problem, in that eventually you'll have two
numbers which resulted from different calculations and happen to have
the same bit pattern. The third is what IEEE went with. It's the
sanest option.
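
A small sketch of the question above, in Python 3. The helper bits() is just for inspection; whether the two NaNs happen to share a bit pattern depends on the platform, so the sketch makes no claim about it:

```python
import math
import struct

inf = float("inf")
a = inf - inf     # one way of producing a NaN
b = inf / inf     # a different calculation, also NaN


def bits(x):
    """Return the raw IEEE 754 double encoding of x as a hex string."""
    return struct.pack(">d", x).hex()


both_nan = math.isnan(a) and math.isnan(b)
cross_equal = (a == b)    # option 3: unequal whatever the bits are
self_equal = (a == a)     # and unequal even to itself
patterns = (bits(a), bits(b))   # platform-dependent, for inspection only
```

Whatever patterns contains, both comparisons come out False, which is the behaviour option 3 describes.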

ChrisA

From donald at stufft.io  Tue Jul  8 09:33:32 2014
From: donald at stufft.io (Donald Stufft)
Date: Tue, 8 Jul 2014 03:33:32 -0400
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To: 
References: 
 
 
 
Message-ID: <62321D60-1197-47A5-B455-6E5200DD52F7@stufft.io>


On Jul 8, 2014, at 12:58 AM, Nick Coghlan  wrote:

> 
> On 7 Jul 2014 10:47, "Guido van Rossum"  wrote:
> >
> > It would still be nice to know who "the appropriate persons" are. Too much of our infrastructure seems to be maintained by house elves or the ITA.
> 
> I volunteered to be the board's liaison to the infrastructure team, and getting more visibility around what the infrastructure *is* and how it's monitored and supported is going to be part of that. That will serve a couple of key purposes:
> 
> - making the points of escalation clearer if anything breaks or needs improvement (although "infrastructure at python.org" is a good default choice)
> - making the current "todo" list of the infrastructure team more visible (both to calibrate resolution time expectations and to provide potential contributors an idea of what's involved)
> 
> Noah has already set up http://status.python.org/ to track service status, I can see about getting buildbot.python.org added to the list.
> 
> Cheers,
> Nick.
> 
> 

We (the infrastructure team) were actually looking earlier at
buildbot.python.org and we're not entirely sure who "owns" buildbot.python.org.
Unfortunately a lot of the *.python.org services are in a similar state where
there is no clear owner. Generally we've not wanted to just step in and take
over for fear of stepping on someone's toes but it appears that perhaps
buildbot.p.o has no owner?

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From stephen at xemacs.org  Tue Jul  8 09:53:50 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 08 Jul 2014 16:53:50 +0900
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: 
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
 <53BB6D5F.1010800@btinternet.com>
 <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>
 
Message-ID: <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp>

Chris Angelico writes:

 > The reason NaN isn't equal to itself is because there are X bit
 > patterns representing NaN, but an infinite number of possible
 > non-numbers that could result from a calculation.

I understand that.  But you're missing at least two alternatives that
involve raising on some calculations involving NaN, as well as the
fact that forcing inequality of two NaNs produced by equivalent
calculations is arguably just as wrong as allowing equality of two
NaNs produced by the different calculations.  That's where things get
fuzzy for me -- in Python I would expect that preserving invariants
would be more important than computational efficiency, but evidently
it's not.  I assume that I would have a better grasp on why Python
chose to go this way rather than that if I understood IEEE 754 better.


From rosuav at gmail.com  Tue Jul  8 09:59:11 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 8 Jul 2014 17:59:11 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
 <53BB6D5F.1010800@btinternet.com>
 <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>
 
 <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

On Tue, Jul 8, 2014 at 5:53 PM, Stephen J. Turnbull  wrote:
> But you're missing at least two alternatives that
> involve raising on some calculations involving NaN, as well as the
> fact that forcing inequality of two NaNs produced by equivalent
> calculations is arguably just as wrong as allowing equality of two
> NaNs produced by the different calculations.

This is off-topic for this thread, but still...

The trouble is that your "arguably just as wrong" is an
indistinguishable case. If you don't want two different calculations'
NaNs to *ever* compare equal, the only solution is to have all NaNs
compare unequal - otherwise, two calculations might happen to produce
the same bitpattern, as there are only a finite number of them
available.

> That's where things get
> fuzzy for me -- in Python I would expect that preserving invariants
> would be more important than computational efficiency, but evidently
> it's not.

What invariant is being violated for efficiency? As I see it, it's one
possible invariant (things should be equal to themselves) coming up
against another possible invariant (one way of generating NaN is
unequal to any other way of generating NaN).

Raising an exception is, of course, the purpose of signalling NaNs
rather than quiet NaNs, which is a separate consideration from how
they compare.

ChrisA

From benhoyt at gmail.com  Tue Jul  8 15:52:18 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 8 Jul 2014 09:52:18 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
Message-ID: 

Hi folks,

After some very good python-dev feedback on my first version of PEP
471, I've updated the PEP to clarify a few things and added various
"Rejected ideas" subsections. Here's a link to the new version (I've
also copied the full text below):

http://legacy.python.org/dev/peps/pep-0471/ -- new PEP as HTML
http://hg.python.org/peps/rev/0da4736c27e8 -- changes

Specifically, I've made these changes (not an exhaustive list):

* Clarified wording in several places, for example "Linux and OS X" ->
"POSIX-based systems"
* Added a new "Notes on exception handling" section
* Added a thorough "Rejected ideas" section with the various ideas
that have been discussed previously and rejected for various reasons
* Added a description of the .full_name attribute, which folks seemed
to generally agree is a good idea
* Removed the "open issues" section, as the three open issues have
either been included (full_name) or rejected (windows_wildcard)

One known error in the PEP is that the "Notes" sections should be
top-level sections, not be subheadings of "Examples". If someone would
like to give me ("benhoyt") commit access to the peps repo, I can fix
this and any other issues that come up.

I'd love to see this finalized! If you're going to comment with
suggestions to change the API, please ensure you've first read the
"rejected ideas" sections in the PEP as well as the relevant
python-dev discussion (linked to in the PEP).

Thanks,
Ben


PEP: 471
Title: os.scandir() function -- a better and faster directory iterator
Version: $Revision$
Last-Modified: $Date$
Author: Ben Hoyt 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-May-2014
Python-Version: 3.5
Post-History: 27-Jun-2014, 8-Jul-2014


Abstract
========

This PEP proposes including a new directory iteration function,
``os.scandir()``, in the standard library. This new function adds
useful functionality and increases the speed of ``os.walk()`` by 2-10
times (depending on the platform and file system) by significantly
reducing the number of times ``stat()`` needs to be called.


Rationale
=========

Python's built-in ``os.walk()`` is significantly slower than it needs
to be, because -- in addition to calling ``os.listdir()`` on each
directory -- it executes the ``stat()`` system call or
``GetFileAttributes()`` on each file to determine whether the entry is
a directory or not.

But the underlying system calls -- ``FindFirstFile`` /
``FindNextFile`` on Windows and ``readdir`` on POSIX systems --
already tell you whether the files returned are directories or not, so
no further system calls are needed. Further, the Windows system calls
return all the information for a ``stat_result`` object, such as file
size and last modification time.

In short, you can reduce the number of system calls required for a
tree function like ``os.walk()`` from approximately 2N to N, where N
is the total number of files and directories in the tree. (And because
directory trees are usually wider than they are deep, it's often much
better than this.)

In practice, removing all those extra system calls makes ``os.walk()``
about **8-9 times as fast on Windows**, and about **2-3 times as fast
on POSIX systems**. So we're not talking about micro-optimizations.
See more `benchmarks here`_.

.. _`benchmarks here`: https://github.com/benhoyt/scandir#benchmarks

Somewhat relatedly, many people (see Python `Issue 11406`_) are also
keen on a version of ``os.listdir()`` that yields filenames as it
iterates instead of returning them as one big list. This improves
memory efficiency for iterating very large directories.

So, as well as providing a ``scandir()`` iterator function for calling
directly, Python's existing ``os.walk()`` function could be sped up a
huge amount.

.. _`Issue 11406`: http://bugs.python.org/issue11406


Implementation
==============

The implementation of this proposal was written by Ben Hoyt (initial
version) and Tim Golden (who helped a lot with the C extension
module). It lives on GitHub at `benhoyt/scandir`_.

.. _`benhoyt/scandir`: https://github.com/benhoyt/scandir

Note that this module has been used and tested (see "Use in the wild"
section in this PEP), so it's more than a proof-of-concept. However,
it is marked as beta software and is not extensively battle-tested.
It will need some cleanup and more thorough testing before going into
the standard library, as well as integration into ``posixmodule.c``.



Specifics of proposal
=====================

Specifically, this PEP proposes adding a single function to the ``os``
module in the standard library, ``scandir``, that takes a single,
optional string as its argument::

    scandir(path='.') -> generator of DirEntry objects

Like ``listdir``, ``scandir`` calls the operating system's directory
iteration system calls to get the names of the files in the ``path``
directory, but it's different from ``listdir`` in two ways:

* Instead of returning bare filename strings, it returns lightweight
  ``DirEntry`` objects that hold the filename string and provide
  simple methods that allow access to the additional data the
  operating system returned.

* It returns a generator instead of a list, so that ``scandir`` acts
  as a true iterator instead of returning the full list immediately.

``scandir()`` yields a ``DirEntry`` object for each file and directory
in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'``
pseudo-directories are skipped, and the entries are yielded in
system-dependent order. Each ``DirEntry`` object has the following
attributes and methods:

* ``name``: the entry's filename, relative to the ``path`` argument
  (corresponds to the return values of ``os.listdir``)

* ``full_name``: the entry's full path name -- the equivalent of
  ``os.path.join(path, entry.name)``

* ``is_dir()``: like ``os.path.isdir()``, but much cheaper -- it never
  requires a system call on Windows, and usually doesn't on POSIX
  systems

* ``is_file()``: like ``os.path.isfile()``, but much cheaper -- it
  never requires a system call on Windows, and usually doesn't on
  POSIX systems

* ``is_symlink()``: like ``os.path.islink()``, but much cheaper -- it
  never requires a system call on Windows, and usually doesn't on
  POSIX systems

* ``lstat()``: like ``os.lstat()``, but much cheaper on some systems
  -- it only requires a system call on POSIX systems

The ``is_X`` methods may perform a ``stat()`` call under certain
conditions (for example, on certain file systems on POSIX systems),
and therefore possibly raise ``OSError``. The ``lstat()`` method will
call ``stat()`` on POSIX systems and therefore also possibly raise
``OSError``. See the "Notes on exception handling" section for more
details.

The ``DirEntry`` attribute and method names were chosen to be the same
as those in the new ``pathlib`` module for consistency.

Like the other functions in the ``os`` module, ``scandir()`` accepts
either a bytes or str object for the ``path`` parameter, and returns
the ``DirEntry.name`` and ``DirEntry.full_name`` attributes with the
same type as ``path``. However, it is *strongly recommended* to use
the str type, as this ensures cross-platform support for Unicode
filenames.


Examples
========

Below is a good usage pattern for ``scandir``. This is in fact almost
exactly how the scandir module's faster ``os.walk()`` implementation
uses it::

    dirs = []
    non_dirs = []
    for entry in os.scandir(path):
        if entry.is_dir():
            dirs.append(entry)
        else:
            non_dirs.append(entry)

The above ``os.walk()``-like code will be significantly faster with
scandir than ``os.listdir()`` and ``os.path.isdir()`` on both Windows
and POSIX systems.

Or, for getting the total size of files in a directory tree, showing
use of the ``DirEntry.lstat()`` method and ``DirEntry.full_name``
attribute::

    def get_tree_size(path):
        """Return total size of files in path and subdirs."""
        total = 0
        for entry in os.scandir(path):
            if entry.is_dir():
                total += get_tree_size(entry.full_name)
            else:
                total += entry.lstat().st_size
        return total

Note that ``get_tree_size()`` will get a huge speed boost on Windows,
because no extra stat calls are needed, but on POSIX systems the size
information is not returned by the directory iteration functions, so
this function won't gain anything there.
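
For readers who want to run the example, here is a minimal sketch. It
assumes Python 3.5 or later, where ``os.scandir()`` is available and
the names differ from this draft: the full path is ``entry.path``
rather than ``full_name``, and the stat method is
``entry.stat(follow_symlinks=False)`` rather than ``lstat()``. Passing
``follow_symlinks=False`` to ``is_dir()`` is a choice made for this
sketch so that symlinked directories are not recursed into.

```python
import os


def get_tree_size(path):
    """Return total size of files in path and subdirs."""
    total = 0
    for entry in os.scandir(path):
        # Do not follow symlinks, so a link to a directory is not recursed into.
        if entry.is_dir(follow_symlinks=False):
            total += get_tree_size(entry.path)
        else:
            # Equivalent of the draft's entry.lstat().st_size
            total += entry.stat(follow_symlinks=False).st_size
    return total
```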


Notes on caching
----------------

The ``DirEntry`` objects are relatively dumb -- the ``name`` and
``full_name`` attributes are obviously always cached, and the ``is_X``
and ``lstat`` methods cache their values (immediately on Windows via
``FindNextFile``, and on first use on POSIX systems via a ``stat``
call) and never refetch from the system.

For this reason, ``DirEntry`` objects are intended to be used and
thrown away after iteration, not stored in long-lived data structures
and the methods called again and again.

If developers want "refresh" behaviour (for example, for watching a
file's size change), they can simply use ``pathlib.Path`` objects,
or call the regular ``os.lstat()`` or ``os.path.getsize()`` functions
which get fresh data from the operating system every call.
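
The cache-once behaviour described above can be illustrated with a
small pure-Python stand-in. This is only a sketch: ``CachingEntry``
and ``refreshed_size()`` are hypothetical names, not part of the
proposal, and the class mimics the POSIX case (stat on first use)
rather than the Windows case.

```python
import os


class CachingEntry:
    """Hypothetical stand-in for a DirEntry with POSIX-style caching."""

    def __init__(self, dirpath, name):
        self.name = name
        self.full_name = os.path.join(dirpath, name)
        self._lstat = None  # filled in by the first lstat() call

    def lstat(self):
        # The first call asks the OS; later calls reuse the cached result.
        if self._lstat is None:
            self._lstat = os.lstat(self.full_name)
        return self._lstat

    def refreshed_size(self):
        # "Refresh" behaviour: bypass the cache and ask the OS again.
        return os.lstat(self.full_name).st_size
```

If the file grows after the first ``lstat()`` call, ``lstat().st_size``
keeps reporting the old size while ``refreshed_size()`` reports the
new one.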


Notes on exception handling
---------------------------

``DirEntry.is_X()`` and ``DirEntry.lstat()`` are explicitly methods
rather than attributes or properties, to make it clear that they may
not be cheap operations, and they may do a system call. As a result,
these methods may raise ``OSError``.

For example, ``DirEntry.lstat()`` will always make a system call on
POSIX-based systems, and the ``DirEntry.is_X()`` methods will make a
``stat()`` system call on such systems if ``readdir()`` returns a
``d_type`` with a value of ``DT_UNKNOWN``, which can occur under
certain conditions or on certain file systems.

For this reason, when a user requires fine-grained error handling,
it's good to catch ``OSError`` around these method calls and then
handle as appropriate.

For example, below is a version of the ``get_tree_size()`` example
shown above, but with basic error handling added::

    def get_tree_size(path):
        """Return total size of files in path and subdirs. If
        is_dir() or lstat() fails, print an error message to stderr
        and assume zero size (for example, file has been deleted).
        """
        total = 0
        for entry in os.scandir(path):
            try:
                is_dir = entry.is_dir()
            except OSError as error:
                print('Error calling is_dir():', error, file=sys.stderr)
                continue
            if is_dir:
                total += get_tree_size(entry.full_name)
            else:
                try:
                    total += entry.lstat().st_size
                except OSError as error:
                    print('Error calling lstat():', error, file=sys.stderr)
        return total


Support
=======

The scandir module on GitHub has been forked and used quite a bit (see
"Use in the wild" in this PEP), but there's also been a fair bit of
direct support for a scandir-like function from core developers and
others on the python-dev and python-ideas mailing lists. A sampling:

* **python-dev**: a good number of +1's and very few negatives for
  scandir and PEP 471 on `this June 2014 python-dev thread
  `_

* **Nick Coghlan**, a core Python developer: "I've had the local Red
  Hat release engineering team express their displeasure at having to
  stat every file in a network mounted directory tree for info that is
  present in the dirent structure, so a definite +1 to os.scandir from
  me, so long as it makes that info available."
  [`source1 `_]

* **Tim Golden**, a core Python developer, supports scandir enough to
  have spent time refactoring and significantly improving scandir's C
  extension module.
  [`source2 `_]

* **Christian Heimes**, a core Python developer: "+1 for something
  like yielddir()"
  [`source3 `_]
  and "Indeed! I'd like to see the feature in 3.4 so I can remove my
  own hack from our code base."
  [`source4 `_]

* **Gregory P. Smith**, a core Python developer: "As 3.4beta1 happens
  tonight, this isn't going to make 3.4 so i'm bumping this to 3.5.
  I really like the proposed design outlined above."
  [`source5 `_]

* **Guido van Rossum** on the possibility of adding scandir to Python
  3.5 (as it was too late for 3.4): "The ship has likewise sailed for
  adding scandir() (whether to os or pathlib). By all means experiment
  and get it ready for consideration for 3.5, but I don't want to add
  it to 3.4."
  [`source6 `_]

Support for this PEP itself (meta-support?) was given by Nick Coghlan
on python-dev: "A PEP reviewing all this for 3.5 and proposing a
specific os.scandir API would be a good thing."
[`source7 `_]


Use in the wild
===============

To date, the ``scandir`` implementation is definitely useful, but has
been clearly marked "beta", so it's uncertain how much use of it there
is in the wild. Ben Hoyt has had several reports from people using it.
For example:

* Chris F: "I am processing some pretty large directories and was half
  expecting to have to modify getdents. So thanks for saving me the
  effort." [via personal email]

* bschollnick: "I wanted to let you know about this, since I am using
  Scandir as a building block for this code. Here's a good example of
  scandir making a radical performance improvement over os.listdir."
  [`source8 `_]

* Avram L: "I'm testing our scandir for a project I'm working on.
  Seems pretty solid, so first thing, just want to say nice work!"
  [via personal email]

Others have `requested a PyPI package`_ for it, which has been
created. See `PyPI package`_.

.. _`requested a PyPI package`: https://github.com/benhoyt/scandir/issues/12
.. _`PyPI package`: https://pypi.python.org/pypi/scandir

GitHub stats don't mean too much, but scandir does have several
watchers, issues, forks, etc. Here's the run-down of the stats as of
July 7, 2014:

* Watchers: 17
* Stars: 57
* Forks: 20
* Issues: 4 open, 26 closed

**However, the much larger point is this:** if this PEP is accepted,
``os.walk()`` can easily be reimplemented using ``scandir`` rather
than ``listdir`` and ``stat``, increasing the speed of ``os.walk()``
very significantly. There are thousands of developers, scripts, and
production code that would benefit from this large speedup of
``os.walk()``. For example, on GitHub, there are almost as many uses
of ``os.walk`` (194,000) as there are of ``os.mkdir`` (230,000).


Rejected ideas
==============


Naming
------

The only other real contender for this function's name was
``iterdir()``. However, ``iterX()`` functions in Python (mostly found
in Python 2) tend to be simple iterator equivalents of their
non-iterator counterparts. For example, ``dict.iterkeys()`` is just an
iterator version of ``dict.keys()``, but the objects returned are
identical. In ``scandir()``'s case, however, the return values are
quite different objects (``DirEntry`` objects vs filename strings), so
this should probably be reflected by a difference in name -- hence
``scandir()``.

See some `relevant discussion on python-dev
`_.


Wildcard support
----------------

``FindFirstFile``/``FindNextFile`` on Windows support passing a
"wildcard" like ``*.jpg``, so at first folks (this PEP's author
included) felt it would be a good idea to include a
``windows_wildcard`` keyword argument to the ``scandir`` function so
users could pass this in.

However, on further thought and discussion it was decided that this
would be a bad idea, *unless it could be made cross-platform* (a
``pattern`` keyword argument or similar). This seems easy enough at
first -- just use the OS wildcard support on Windows, and something
like ``fnmatch`` or ``re`` afterwards on POSIX-based systems.

Unfortunately the exact Windows wildcard matching rules aren't really
documented anywhere by Microsoft, and they're quite quirky (see this
`blog post
`_),
meaning it's very problematic to emulate using ``fnmatch`` or regexes.

So the consensus was that Windows wildcard support was a bad idea.
It would be possible to add at a later date if there's a
cross-platform way to achieve it, but not for the initial version.

Read more on `this Nov 2012 python-ideas thread
`_
and this `June 2014 python-dev thread on PEP 471
`_.


DirEntry attributes being properties
------------------------------------

In some ways it would be nicer for the ``DirEntry`` ``is_X()`` and
``lstat()`` to be properties instead of methods, to indicate they're
very cheap or free. However, this isn't quite the case, as ``lstat()``
will require an OS call on POSIX-based systems but not on Windows.
Even ``is_dir()`` and friends may perform an OS call on POSIX-based
systems if the ``dirent.d_type`` value is ``DT_UNKNOWN`` (on certain
file systems).

Also, people would expect the attribute access ``entry.is_dir`` to
only ever raise ``AttributeError``, not ``OSError`` in the case it
makes a system call under the covers. Calling code would have to have
a ``try``/``except`` around what looks like a simple attribute access,
and so it's much better to make them *methods*.

See `this May 2013 python-dev thread
`_
where this PEP's author makes this case and there's agreement from
core developers.


DirEntry fields being "static" attribute-only objects
-----------------------------------------------------

In `this July 2014 python-dev message
`_,
Paul Moore suggested a solution that was a "thin wrapper round the OS
feature", where the ``DirEntry`` object had only static attributes:
``name``, ``full_name``, and ``is_X``, with the ``st_X`` attributes
only present on Windows. The idea was to use this simpler, lower-level
function as a building block for higher-level functions.

At first there was general agreement that simplifying in this way was
a good thing. However, there were two problems with this approach.
First, the assumption is that the ``is_dir`` and similar attributes are
always present on POSIX, which isn't the case (if ``d_type`` is not
present or is ``DT_UNKNOWN``). Second, it's a much harder-to-use API
in practice, as even the ``is_dir`` attributes aren't always present
on POSIX, and would need to be tested with ``hasattr()`` and then
``os.stat()`` called if they weren't present.

See `this July 2014 python-dev response
`_
from this PEP's author detailing why this option is a non-ideal
solution, and the subsequent reply from Paul Moore voicing agreement.


DirEntry fields being static with an ensure_lstat option
--------------------------------------------------------

Another seemingly simpler and attractive option was suggested by
Nick Coghlan in this `June 2014 python-dev message
`_:
make ``DirEntry.is_X`` and ``DirEntry.lstat_result`` properties, and
populate ``DirEntry.lstat_result`` at iteration time, but only if
the new argument ``ensure_lstat=True`` was specified on the
``scandir()`` call.

This does have the advantage over the above in that you can easily get
the stat result from ``scandir()`` if you need it. However, it has the
serious disadvantage that fine-grained error handling is messy,
because ``stat()`` will be called (and hence potentially raise
``OSError``) during iteration, leading to a rather ugly, hand-made
iteration loop::

    it = os.scandir(path)
    while True:
        try:
            entry = next(it)
        except OSError as error:
            handle_error(path, error)
        except StopIteration:
            break

Or it means that ``scandir()`` would have to accept an ``onerror``
argument -- a function to call when ``stat()`` errors occur during
iteration. This seems to this PEP's author neither as direct nor as
Pythonic as ``try``/``except`` around a ``DirEntry.lstat()`` call.
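
The hand-made loop could at least be written once and reused, by
wrapping the iteration in a generator. Below is a minimal sketch;
``iter_entries``, its ``onerror`` callback and the ``FlakyIterator``
stand-in are all hypothetical names. It assumes the underlying
iterator can keep going after raising, which holds for a hand-written
iterator class like the stand-in but not for a plain generator
function (a generator is finished once it raises).

```python
def iter_entries(it, onerror):
    """Yield items from iterator `it`, routing OSError to onerror(error)."""
    it = iter(it)
    while True:
        try:
            entry = next(it)
        except StopIteration:
            return
        except OSError as error:
            onerror(error)
            continue
        yield entry


class FlakyIterator:
    """Stand-in for a scandir iterator whose __next__ may raise OSError."""

    def __init__(self, names, bad):
        self._names = iter(names)
        self._bad = set(bad)

    def __iter__(self):
        return self

    def __next__(self):
        name = next(self._names)  # StopIteration ends the iteration
        if name in self._bad:
            raise OSError("stat failed for " + name)
        return name
```

With this wrapper the caller is back to an ordinary ``for`` loop, at
the cost of handling errors in a callback instead of a local
``try``/``except``.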

See `Ben Hoyt's July 2014 reply
`_
to the discussion summarizing this and detailing why he thinks the
original PEP 471 proposal is "the right one" after all.


Return values being (name, stat_result) two-tuples
--------------------------------------------------

Initially this PEP's author proposed this concept as a function called
``iterdir_stat()`` which yielded two-tuples of (name, stat_result).
This does have the advantage that there are no new types introduced.
However, the ``stat_result`` is only partially filled on POSIX-based
systems (most fields set to ``None`` and other quirks), so they're not
really ``stat_result`` objects at all, and this would have to be
thoroughly documented as different from ``os.stat()``.

Also, Python has good support for proper objects with attributes and
methods, which makes for a saner and simpler API than two-tuples. It
also makes the ``DirEntry`` objects more extensible and future-proof
as operating systems add functionality and we want to include this in
``DirEntry``.

See also some previous discussion:

* `May 2013 python-dev thread
  `_
  where Nick Coghlan makes the original case for a ``DirEntry``-style
  object.

* `June 2014 python-dev thread
  `_
  where Nick Coghlan makes (another) good case against the two-tuple
  approach.


Return values being overloaded stat_result objects
--------------------------------------------------

Another alternative discussed was making the return values
overloaded ``stat_result`` objects with ``name`` and ``full_name``
attributes. However, apart from this being a strange (and strained!)
kind of overloading, this has the same problems mentioned above --
most of the ``stat_result`` information is not fetched by
``readdir()`` on POSIX systems, only (part of) the ``st_mode`` value.


Return values being pathlib.Path objects
----------------------------------------

With Antoine Pitrou's new standard library ``pathlib`` module, it
at first seems like a great idea for ``scandir()`` to return instances
of ``pathlib.Path``. However, ``pathlib.Path``'s ``is_X()`` and
``lstat()`` functions are explicitly not cached, whereas ``scandir``
has to cache them by design, because it's (often) returning values
from the original directory iteration system call.

And if the ``pathlib.Path`` instances returned by ``scandir`` cached
lstat values, but the ordinary ``pathlib.Path`` objects explicitly
don't, that would be more than a little confusing.

Guido van Rossum explicitly rejected ``pathlib.Path`` caching lstat in
the context of scandir `here
`_,
making ``pathlib.Path`` objects a bad choice for scandir return
values.


Possible improvements
=====================

There are many possible improvements one could make to scandir, but
here is a short list of some this PEP's author has in mind:

* scandir could potentially be further sped up by calling ``readdir``
  / ``FindNextFile`` say 50 times per ``Py_BEGIN_ALLOW_THREADS`` block
  so that it stays in the C extension module for longer, and may be
  somewhat faster as a result. This approach hasn't been tested, but
  was suggested on Issue 11406 by Antoine Pitrou.
  [`source9 `_]

* scandir could use a free list to avoid the cost of memory allocation
  for each iteration -- a short free list of 10 or maybe even 1 may help.
  Suggested by Victor Stinner on a `python-dev thread on June 27`_.

.. _`python-dev thread on June 27`:
https://mail.python.org/pipermail/python-dev/2014-June/135232.html


Previous discussion
===================

* `Original thread Ben Hoyt started on python-ideas`_ about speeding
  up ``os.walk()``

* Python `Issue 11406`_, which includes the original proposal for a
  scandir-like function

* `Further thread Ben Hoyt started on python-dev`_ that refined the
  ``scandir()`` API, including Nick Coghlan's suggestion of scandir
  yielding ``DirEntry``-like objects

* `Another thread Ben Hoyt started on python-dev`_ to discuss the
  interaction between scandir and the new ``pathlib`` module

* `Final thread Ben Hoyt started on python-dev`_ to discuss the first
  version of this PEP, with extensive discussion about the API.

* `Question on StackOverflow`_ about why ``os.walk()`` is slow and
  pointers on how to fix it (this inspired the author of this PEP
  early on)

* `BetterWalk`_, this PEP's author's previous attempt at this, on
  which the scandir code is based

.. _`Original thread Ben Hoyt started on python-ideas`:
https://mail.python.org/pipermail/python-ideas/2012-November/017770.html
.. _`Further thread Ben Hoyt started on python-dev`:
https://mail.python.org/pipermail/python-dev/2013-May/126119.html
.. _`Another thread Ben Hoyt started on python-dev`:
https://mail.python.org/pipermail/python-dev/2013-November/130572.html
.. _`Final thread Ben Hoyt started on python-dev`:
https://mail.python.org/pipermail/python-dev/2014-June/135215.html
.. _`Question on StackOverflow`:
http://stackoverflow.com/questions/2485719/very-quickly-getting-total-size-of-folder
.. _`BetterWalk`: https://github.com/benhoyt/betterwalk


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From guido at python.org  Tue Jul  8 16:48:39 2014
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Jul 2014 07:48:39 -0700
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To: <62321D60-1197-47A5-B455-6E5200DD52F7@stufft.io>
References: 
 
 
 
 <62321D60-1197-47A5-B455-6E5200DD52F7@stufft.io>
Message-ID: 

May the true owner of buildbot.python.org stand up!

(But I do think there may well not be anyone who feels they own it. And
that's a problem for its long term viability.)

Generally speaking, as an organization we should set up a process for
managing ownership of *all* infrastructure in a uniform way. I don't mean
to say that we need to manage all infrastructure uniformly, just that we
need to have a process for identifying and contacting the owner(s) for each
piece of infrastructure, as well as collecting other information that
people besides the owners might need to know. You can use a wiki page for
that list for all I care, but have a process for what belongs there,
how/when to update it, and even an owner for the wiki page! Stuff like this
shouldn't be just in a few people's heads (even if they are board members)
nor should it be in a file in a repo that nobody has ever heard of.


On Tue, Jul 8, 2014 at 12:33 AM, Donald Stufft  wrote:

>
> On Jul 8, 2014, at 12:58 AM, Nick Coghlan  wrote:
>
>
> On 7 Jul 2014 10:47, "Guido van Rossum"  wrote:
> >
> > It would still be nice to know who "the appropriate persons" are. Too
> much of our infrastructure seems to be maintained by house elves or the ITA.
>
> I volunteered to be the board's liaison to the infrastructure team, and
> getting more visibility around what the infrastructure *is* and how it's
> monitored and supported is going to be part of that. That will serve a
> couple of key purposes:
>
> - making the points of escalation clearer if anything breaks or needs
> improvement (although "infrastructure at python.org" is a good default
> choice)
> - making the current "todo" list of the infrastructure team more visible
> (both to calibrate resolution time expectations and to provide potential
> contributors an idea of what's involved)
>
> Noah has already set up http://status.python.org/ to track service
> status, I can see about getting buildbot.python.org added to the list.
>
> Cheers,
> Nick.
>
>
> We (the infrastructure team) were actually looking earlier about
> buildbot.python.org and we're not entirely sure who "owns"
> buildbot.python.org.
> Unfortunately a lot of the *.python.org services are in a similar state
> where
> there is no clear owner. Generally we've not wanted to just step in and
> take
> over for fear of stepping on someones toes but it appears that perhaps
> buildbot.p.o has no owner?
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372
> DCFA
>
>


-- 
--Guido van Rossum (python.org/~guido)

From victor.stinner at gmail.com  Tue Jul  8 17:13:08 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 8 Jul 2014 17:13:08 +0200
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
Message-ID: 

Hi,

2014-07-08 15:52 GMT+02:00 Ben Hoyt :
> After some very good python-dev feedback on my first version of PEP
> 471, I've updated the PEP to clarify a few things and added various
> "Rejected ideas" subsections. Here's a link to the new version (I've
> also copied the full text below):

Thanks, the new PEP looks better.

> * Removed the "open issues" section, as the three open issues have
> either been included (full_name) or rejected (windows_wildcard)

I remember a pending question on python-dev:

- Martin von Loewis asked if the scandir generator would have send()
and close() methods as any Python generator. I didn't see a reply on
the mailing list (nor in the PEP).

> One known error in the PEP is that the "Notes" sections should be
> top-level sections, not be subheadings of "Examples". If someone would
> like to give me ("benhoyt") commit access to the peps repo, I can fix
> this and any other issues that come up.

Or just send me your new PEP ;-)

> Notes on caching
> ----------------
>
> The ``DirEntry`` objects are relatively dumb -- the ``name`` and
> ``full_name`` attributes are obviously always cached, and the ``is_X``
> and ``lstat`` methods cache their values (immediately on Windows via
> ``FindNextFile``, and on first use on POSIX systems via a ``stat``
> call) and never refetch from the system.

It is not clear to me which methods share the cache.

On UNIX, is_dir() and is_file() call os.stat(); whereas lstat() and
is_symlink() call os.lstat().

If os.stat() says that the file is not a symlink, I guess that you can
use os.stat() result for lstat() and is_symlink() methods?

In the worst case, if the path is a symlink, would it be possible that
os.stat() and os.lstat() become "inconsistent" if the symlink is
modified between the two calls? If yes, I don't think that it's an
issue, it's just good to know it.

For symlinks, does readdir() return the status of the linked file or of the symlink?
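
The stat/lstat distinction behind these questions can be seen with a
small sketch. It assumes a platform and account that are allowed to
create symlinks; describe() is a hypothetical helper name, and the
sketch only shows os.stat() versus os.lstat(), not what readdir()
itself reports.

```python
import os
import stat
import tempfile


def describe(path):
    """Return (target_is_dir, path_itself_is_symlink) for path."""
    followed = os.stat(path)       # follows symlinks
    not_followed = os.lstat(path)  # describes the link itself
    return stat.S_ISDIR(followed.st_mode), stat.S_ISLNK(not_followed.st_mode)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "real_dir")
    link = os.path.join(tmp, "link_to_dir")
    os.mkdir(target)
    os.symlink(target, link)
    # For a symlink to a directory both answers are true at once.
    result = describe(link)
```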

Victor

From 2014 at jmunch.dk  Tue Jul  8 16:58:33 2014
From: 2014 at jmunch.dk (Anders J. Munch)
Date: Tue, 08 Jul 2014 16:58:33 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: 
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
 <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>
 
 <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp>
 
Message-ID: <53BC0719.1070705@jmunch.dk>

Chris Angelico wrote:
>
> This is off-topic for this thread, but still...
>
> The trouble is that your "arguably just as wrong" is an
> indistinguishable case. If you don't want two different calculations'
> NaNs to *ever* compare equal, the only solution is to have all NaNs
> compare unequal
For two NaNs computed differently to compare equal is no worse than 2+2 
comparing equal to 1+3.  You're comparing values, not their history.

You've prompted me to get a rant on the subject off my chest: I just posted an 
article on NaN comparisons to python-list.

regards, Anders


From janzert at janzert.com  Tue Jul  8 17:44:54 2014
From: janzert at janzert.com (Janzert)
Date: Tue, 08 Jul 2014 11:44:54 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
Message-ID: 

On 7/8/2014 9:52 AM, Ben Hoyt wrote:
> DirEntry fields being "static" attribute-only objects
> -----------------------------------------------------
>
> In `this July 2014 python-dev message
> `_,
> Paul Moore suggested a solution that was a "thin wrapper round the OS
> feature", where the ``DirEntry`` object had only static attributes:
> ``name``, ``full_name``, and ``is_X``, with the ``st_X`` attributes
> only present on Windows. The idea was to use this simpler, lower-level
> function as a building block for higher-level functions.
>
> At first there was general agreement that simplifying in this way was
> a good thing. However, there were two problems with this approach.
> First, the assumption is the ``is_dir`` and similar attributes are
> always present on POSIX, which isn't the case (if ``d_type`` is not
> present or is ``DT_UNKNOWN``). Second, it's a much harder-to-use API
> in practice, as even the ``is_dir`` attributes aren't always present
> on POSIX, and would need to be tested with ``hasattr()`` and then
> ``os.stat()`` called if they weren't present.
>

Only exposing what the OS provides for free will make the API too 
difficult to use in the common case. But is there a nice way to expand 
the API that will allow the user who is trying to avoid extra expense 
know what information is already available?

Even if the initial version doesn't have a way to check what information 
is there for free, ensuring there is a clean way to add this in the 
future would be really nice.

Janzert


From steve at pearwood.info  Tue Jul  8 18:57:45 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 9 Jul 2014 02:57:45 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
 <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>
 
 <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20140708165745.GJ13014@ando>

On Tue, Jul 08, 2014 at 04:53:50PM +0900, Stephen J. Turnbull wrote:
> Chris Angelico writes:
> 
>  > The reason NaN isn't equal to itself is because there are X bit
>  > patterns representing NaN, but an infinite number of possible
>  > non-numbers that could result from a calculation.
> 
> I understand that.  But you're missing at least two alternatives that
> involve raising on some calculations involving NaN, as well as the
> fact that forcing inequality of two NaNs produced by equivalent
> calculations is arguably just as wrong as allowing equality of two
> NaNs produced by the different calculations.  

I don't think so. Floating point == represents *numeric* equality, not
(for example) equality in the sense of "All Men Are Created Equal". Not
even numeric equality in the most general sense, but specifically in the
sense of (approximately) real-valued numbers, so it's an extremely 
precise definition of "equal", not fuzzy in any way.

In an earlier post, you suggested that NANs don't have a value, or that 
they have a value which is not a value. I don't think that's a good way 
to look at it. I think the obvious way to think of it is that NAN's 
value is Not A Number, exactly like it says on the box. Now, if 
something is not a number, obviously you cannot compare it numerically:

    "Considered as numbers, is the sound of rain on a tin roof
     numerically equal to the sight of a baby smiling?"

Some might argue that the only valid answer to this question is "Mu",

https://en.wikipedia.org/wiki/Mu_%28negative%29#.22Unasking.22_the_question

but if we're forced to give a Yes/No True/False answer, then clearly
False is the only sensible answer. No, Virginia, Santa Claus is not the 
same number as Santa Claus.

To put it another way, if x is not a number, then x != y for all 
possible values of y -- including x.

[Disclaimer: despite the name, IEEE-754 arguably does not intend NANs to 
be Not A Number in the sense that Santa Claus is not a number, but more 
like "it's some number, but it's impossible to tell which". However, 
despite that, the standard specifies behaviour which is best thought of 
in terms of the Santa Claus model.]



> That's where things get
> fuzzy for me -- in Python I would expect that preserving invariants
> would be more important than computational efficiency, but evidently
> it's not.  

I'm not sure what you're referring to here. Is it that containers such 
as lists and dicts are permitted to optimize equality tests with 
identity tests for speed?

py> NAN = float('NAN')
py> a = [1, 2, NAN, 4]
py> NAN in a  # identity is checked before equality
True
py> any(x == NAN for x in a)
False


When this came up for discussion last time, the clear consensus was that 
this is reasonable behaviour. NANs and other such "weird" objects are 
too rare and too specialised for built-in classes to carry the burden of 
having to allow for them. If you want a "NAN-aware list", you can make 
one yourself.
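
A minimal sketch of what such a "NAN-aware list" might look like.
EqualityOnlyList is a hypothetical name, and only the membership test
is overridden; index(), count() and remove() would need the same
treatment in a fuller version.

```python
class EqualityOnlyList(list):
    """Hypothetical list whose membership test ignores object identity."""

    def __contains__(self, item):
        # The built-in list accepts an element if it is the same object
        # or compares equal; this version relies on == alone.
        return any(x == item for x in self)


NAN = float("nan")
plain = [1, 2, NAN, 4]
aware = EqualityOnlyList(plain)

plain_has_nan = NAN in plain  # identity shortcut applies
aware_has_nan = NAN in aware  # equality only, and NAN != NAN
```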


> I assume that I would have a better grasp on why Python
> chose to go this way rather than that if I understood IEEE 754 better.

See the answer by Stephen Canon here:

http://stackoverflow.com/questions/1565164/

[quote]

It is not possible to specify a fixed-size arithmetic type that 
satisfies all of the properties of real arithmetic that we know and 
love. The 754 committee has to decide to bend or break some of them. 
This is guided by some pretty simple principles:

    When we can, we match the behavior of real arithmetic.
    When we can't, we try to make the violations as predictable and as 
    easy to diagnose as possible.

[end quote]


In particular, reflexivity for NANs was dropped for a number of reasons, 
some stronger than others:

- One of the weaker reasons for NAN non-reflexivity is that it preserved
  the identity x == y <=> x - y == 0. Although that is the cornerstone 
  of real arithmetic, it's violated by IEEE-754 INFs, so violating it
  for NANs is not a big deal either.

- Dropping reflexivity preserves the useful property that NANs compare 
  unequal to everything.

- Practicality beats purity: dropping reflexivity allowed programmers
  to identify NANs without waiting years or decades for programming 
  languages to implement isnan() functions. E.g. before Python had 
  math.isnan(), I made my own:

  def isnan(x):
      return isinstance(x, float) and x != x

- Keeping reflexivity for NANs would have implied some pretty nasty
  things, e.g. if log(-3) == log(-5), then -3 == -5.
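
A quick sketch of the first and third points, using nothing beyond the 
built-in float type and the math module. The variable names are only for 
illustration:

```python
import math

INF = float('inf')
NAN = float('nan')

# INF compares equal to itself, yet INF - INF is a NAN rather than 0,
# so "x == y <=> x - y == 0" already fails for INFs.
inf_equal = (INF == INF)
inf_diff_is_zero = (INF - INF == 0)

# For a NAN both sides are False, so the equivalence holds.
nan_equal = (NAN == NAN)
nan_diff_is_zero = (NAN - NAN == 0)


def isnan(x):
    # Non-reflexivity is what makes this one-liner work.
    return isinstance(x, float) and x != x
```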


Basically, and I realise that many people disagree with their decision 
(notably Bertrand Meyer of Eiffel fame, and our own Mark Dickinson), the 
IEEE-754 committee led by William Kahan decided that the problems caused 
by having NANs compare unequal to themselves were much less than the 
problems that would have been caused without it.



-- 
Steven

From steve at pearwood.info  Tue Jul  8 19:00:46 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 9 Jul 2014 03:00:46 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BC0719.1070705@jmunch.dk>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
 <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>
 
 <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp>
 
 <53BC0719.1070705@jmunch.dk>
Message-ID: <20140708170046.GK13014@ando>

On Tue, Jul 08, 2014 at 04:58:33PM +0200, Anders J. Munch wrote:

> For two NaNs computed differently to compare equal is no worse than 2+2 
> comparing equal to 1+3.  You're comparing values, not their history.

a = -23
b = -42
if log(a) == log(b):
    print "a == b"


-- 
Steven

From rosuav at gmail.com  Tue Jul  8 19:13:00 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 9 Jul 2014 03:13:00 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <20140708170046.GK13014@ando>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
 <53BB6D5F.1010800@btinternet.com>
 <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>
 
 <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp>
 
 <53BC0719.1070705@jmunch.dk> <20140708170046.GK13014@ando>
Message-ID: 

On Wed, Jul 9, 2014 at 3:00 AM, Steven D'Aprano  wrote:
> On Tue, Jul 08, 2014 at 04:58:33PM +0200, Anders J. Munch wrote:
>
>> For two NaNs computed differently to compare equal is no worse than 2+2
>> comparing equal to 1+3.  You're comparing values, not their history.
>
> a = -23
> b = -42
> if log(a) == log(b):
>     print "a == b"

That could also happen from rounding error, though.

>>> a = 2.0**52
>>> b = a+1.0
>>> a == b
False
>>> log(a) == log(b)
True

Any time you do any operation on numbers that are close together but
not equal, you run the risk of getting results that, in
finite-precision floating point, are deemed equal, even though
mathematically they shouldn't be (two unequal numbers MUST have
unequal logarithms).

ChrisA

From python at mrabarnett.plus.com  Tue Jul  8 19:33:31 2014
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 08 Jul 2014 18:33:31 +0100
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <20140708165745.GJ13014@ando>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
 <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>
 
 <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp> <20140708165745.GJ13014@ando>
Message-ID: <53BC2B6B.3080209@mrabarnett.plus.com>

On 2014-07-08 17:57, Steven D'Aprano wrote:
[snip]
>
> In particular, reflexivity for NANs was dropped for a number of reasons,
> some stronger than others:
>
> - One of the weaker reasons for NAN non-reflexivity is that it preserved
>    the identity x == y <=> x - y == 0. Although that is the cornerstone
>    of real arithmetic, it's violated by IEEE-754 INFs, so violating it
>    for NANs is not a big deal either.
>
> - Dropping reflexivity preserves the useful property that NANs compare
>    unequal to everything.
>
> - Practicality beats purity: dropping reflexivity allowed programmers
>    to identify NANs without waiting years or decades for programming
>    languages to implement isnan() functions. E.g. before Python had
>    math.isnan(), I made my own:
>
>    def isnan(x):
>        return isinstance(x, float) and x != x
>
> - Keeping reflexivity for NANs would have implied some pretty nasty
>    things, e.g. if log(-3) == log(-5), then -3 == -5.
>
The log of a negative number is a complex number.
>
> Basically, and I realise that many people disagree with their decision
> (notably Bertrand Meyer of Eiffel fame, and our own Mark Dickinson), the
> IEEE-754 committee led by William Kahan decided that the problems caused
> by having NANs compare unequal to themselves were much less than the
> problems that would have been caused without it.
>


From benhoyt at gmail.com  Tue Jul  8 20:03:00 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 8 Jul 2014 14:03:00 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
Message-ID: 

> I remember a pending question on python-dev:
>
> - Martin von Loewis asked if the scandir generator would have send()
> and close() methods as any Python generator. I didn't see a reply on
> the mailing list (nor in the PEP).

Good call. Looks like you're referring to this message:
https://mail.python.org/pipermail/python-dev/2014-July/135324.html

I'm not actually familiar with the purpose of .close() and
.send()/.throw() on generators. Do you typically call these functions
manually, or are they called automatically by the generator protocol?

> It is not clear to me which methods share the cache.
>
> On UNIX, is_dir() and is_file() call os.stat(); whereas lstat() and
> is_symlink() call os.lstat().
>
> If os.stat() says that the file is not a symlink, I guess that you can
> use os.stat() result for lstat() and is_symlink() methods?
>
> In the worst case, if the path is a symlink, would it be possible that
> os.stat() and os.lstat() become "inconsistent" if the symlink is
> modified between the two calls? If yes, I don't think that it's an
> issue, it's just good to know it.
>
> For symlinks, readdir() returns the status of the linked file or of the symlink?

I think you're misunderstanding is_dir() and is_file(), as these don't
actually call os.stat(). All DirEntry methods either call nothing or
os.lstat() to get the stat info on the entry itself (not the
destination of the symlink). In light of this, I don't think what
you're describing above is an issue.

-Ben

From benhoyt at gmail.com  Tue Jul  8 20:05:53 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 8 Jul 2014 14:05:53 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
Message-ID: 

> Only exposing what the OS provides for free will make the API too difficult
> to use in the common case. But is there a nice way to expand the API that
> will allow the user who is trying to avoid extra expense know what
> information is already available?
>
> Even if the initial version doesn't have a way to check what information is
> there for free, ensuring there is a clean way to add this in the future
> would be really nice.

We could easily add ".had_type" and ".had_lstat" properties (not sure
on the names), that would be true if the is_X information and lstat
information was fetched, respectively. Basically both would always be
True on Windows, but on POSIX only had_type would be True, and only if
d_type is present and != DT_UNKNOWN.

I don't feel this is actually necessary, but it's not hard to add.

Thoughts?

-Ben

From ethan at stoneleaf.us  Tue Jul  8 21:02:56 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 08 Jul 2014 12:02:56 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
Message-ID: <53BC4060.5090805@stoneleaf.us>

On 07/08/2014 11:05 AM, Ben Hoyt wrote:
>> Only exposing what the OS provides for free will make the API too difficult
>> to use in the common case. But is there a nice way to expand the API that
>> will allow the user who is trying to avoid extra expense know what
>> information is already available?
>>
>> Even if the initial version doesn't have a way to check what information is
>> there for free, ensuring there is a clean way to add this in the future
>> would be really nice.
>
> We could easily add ".had_type" and ".had_lstat" properties (not sure
> on the names), that would be true if the is_X information and lstat
> information was fetched, respectively. Basically both would always be
> True on Windows, but on POSIX only had_type would be True, and only if
> d_type is present and != DT_UNKNOWN.
>
> I don't feel this is actually necessary, but it's not hard to add.
>
> Thoughts?

Better to just have the attributes be None if they were not fetched.  None is better than hasattr anyway, at least in 
the respect of not having to catch exceptions to function properly.

--
~Ethan~

From benhoyt at gmail.com  Tue Jul  8 21:34:26 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 8 Jul 2014 15:34:26 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BC4060.5090805@stoneleaf.us>
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
Message-ID: 

> Better to just have the attributes be None if they were not fetched.  None
> is better than hasattr anyway, at least in the respect of not having to
> catch exceptions to function properly.

The thing is, is_dir() and lstat() are not attributes (for a good
reason). Please read the relevant "Rejected ideas" sections and let us
know what you think. :-)

-Ben

From victor.stinner at gmail.com  Tue Jul  8 21:55:59 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 8 Jul 2014 21:55:59 +0200
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
Message-ID: 

On Tuesday, 8 July 2014, Ben Hoyt wrote:

>
> > It is not clear to me which methods share the cache.
> >
> > On UNIX, is_dir() and is_file() call os.stat(); whereas lstat() and
> > is_symlink() call os.lstat().
> >
> > If os.stat() says that the file is not a symlink, I guess that you can
> > use os.stat() result for lstat() and is_symlink() methods?
> >
> > In the worst case, if the path is a symlink, would it be possible that
> > os.stat() and os.lstat() become "inconsistent" if the symlink is
> > modified between the two calls? If yes, I don't think that it's an
> > issue, it's just good to know it.
> >
> > For symlinks, readdir() returns the status of the linked file or of the
> symlink?
>
> I think you're misunderstanding is_dir() and is_file(), as these don't
> actually call os.stat(). All DirEntry methods either call nothing or
> os.lstat() to get the stat info on the entry itself (not the
> destination of the symlink).


Oh. Extract of your PEP: "is_dir(): like os.path.isdir(), but much cheaper".

genericpath.isdir() and genericpath.isfile() use os.stat(), whereas
posixpath.islink() uses os.lstat().

Is it a mistake in the PEP?

> In light of this, I don't think what you're describing above is an issue.

I'm not saying that there is an issue, I'm just trying to understand.

Victor

From benhoyt at gmail.com  Tue Jul  8 22:09:36 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 8 Jul 2014 16:09:36 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
Message-ID: 

>> I think you're misunderstanding is_dir() and is_file(), as these don't
>> actually call os.stat(). All DirEntry methods either call nothing or
>> os.lstat() to get the stat info on the entry itself (not the
>> destination of the symlink).
>
>
> Oh. Extract of your PEP: "is_dir(): like os.path.isdir(), but much cheaper".
>
> genericpath.isdir() and genericpath.isfile() use os.stat(), whereas
> posixpath.islink() uses os.lstat().
>
> Is it a mistake in the PEP?

Ah, you're dead right -- this is basically a bug in the PEP, as
DirEntry.is_dir() is not like os.path.isdir() in that it is based on
the entry itself (like lstat), not following the link.

I'll improve the wording here and update the PEP.
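
To make the difference concrete, here's a small sketch using only calls
that already exist in os and os.path (no scandir involved). The helper
name is_dir_entry_itself is made up for illustration, and the example
assumes a platform where os.symlink() works without extra privileges:

```python
import os
import stat
import tempfile


def is_dir_entry_itself(path):
    """True only if the entry itself is a directory (symlinks are not followed)."""
    return stat.S_ISDIR(os.lstat(path).st_mode)


with tempfile.TemporaryDirectory() as tmp:
    real_dir = os.path.join(tmp, 'real')
    link = os.path.join(tmp, 'link')
    os.mkdir(real_dir)
    os.symlink(real_dir, link)

    follows_link = os.path.isdir(link)         # based on os.stat()
    entry_itself = is_dir_entry_itself(link)   # based on os.lstat()
    link_is_symlink = os.path.islink(link)     # based on os.lstat()
```

For a symlink pointing at a directory, the stat-based test says "directory"
while the lstat-based test says "not a directory, but a symlink".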

-Ben

From ethan at stoneleaf.us  Tue Jul  8 22:22:33 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 08 Jul 2014 13:22:33 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
Message-ID: <53BC5309.6000605@stoneleaf.us>

On 07/08/2014 12:34 PM, Ben Hoyt wrote:
>>
>> Better to just have the attributes be None if they were not fetched.  None
>> is better than hasattr anyway, at least in the respect of not having to
>> catch exceptions to function properly.
>
> The thing is, is_dir() and lstat() are not attributes (for a good
> reason). Please read the relevant "Rejected ideas" sections and let us
> know what you think. :-)

I did better than that -- I read the whole thing!  ;)

-1 on the PEP's implementation.

Just like an attribute does not imply a system call, having a method named 'is_dir' /does/ imply a system call, and not 
having one can be just as misleading.

If we have this:

     size = 0
     for entry in scandir('/some/path'):
         size += entry.st_size

   - on Windows, this should Just Work (if I have the names correct ;)
   - on Posix, etc., this should fail noisily with either an AttributeError
     ('entry' has no 'st_size') or a TypeError (cannot add None)

and the solution is equally simple:

     for entry in scandir('/some/path', stat=True):

   - if not Windows, perform a stat call at the same time

Now, of course, we might get errors.  I am not a big fan of wrapping everything in try/except, particularly when we 
already have a model to follow -- os.walk:

     for entry in scandir('/some/path', stat=True, onerror=record_and_skip):

If we don't care if an error crashes the script, leave off onerror.
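
For reference, this is roughly how the existing os.walk precedent behaves. 
The callback name record_and_skip and the deliberately missing directory 
are just for illustration:

```python
import os

errors = []


def record_and_skip(exc):
    # os.walk() hands the callback the OSError instance and then carries on.
    errors.append(exc)


missing = os.path.join(os.getcwd(), 'no-such-directory-for-this-sketch')

# The directory cannot be listed, so nothing is yielded and the error
# ends up in the list instead of propagating.
walked = list(os.walk(missing, onerror=record_and_skip))
```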

If we don't need st_size and friends, leave off stat=True.

If we get better performance on Windows instead of Linux, that's okay.

scandir is going into os because it may not behave the same on every platform.  Heck, even some non-os modules 
(multiprocessing comes to mind) do not behave the same on every platform.

I think caching the attributes for DirEntry is fine, but let's do it as a snapshot of that moment in time, not name now, 
and attributes in 30 minutes when we finally get to you because we had a lot of processing/files ahead of you (you being 
a DirEntry ;) .

--
~Ethan~

From ethan at stoneleaf.us  Tue Jul  8 23:05:22 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 08 Jul 2014 14:05:22 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BC5309.6000605@stoneleaf.us>
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
Message-ID: <53BC5D12.30105@stoneleaf.us>

On 07/08/2014 01:22 PM, Ethan Furman wrote:
>
> I think caching the attributes for DirEntry is fine, but let's do it as a snapshot of that moment in time, not name now,
> and attributes in 30 minutes when we finally get to you because we had a lot of processing/files ahead of you (you being
> a DirEntry ;) .

This bit is wrong, I think, since scandir is a generator -- there wouldn't be much time passing between the direntry 
call and the stat call in any case.  Hopefully my other points still hold.

--
~Ethan~

From benhoyt at gmail.com  Wed Jul  9 03:08:03 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 8 Jul 2014 21:08:03 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BC5309.6000605@stoneleaf.us>
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
Message-ID: 

> I did better than that -- I read the whole thing!  ;)

Thanks. :-)

> -1 on the PEP's implementation.
>
> Just like an attribute does not imply a system call, having a
> method named 'is_dir' /does/ imply a system call, and not
> having one can be just as misleading.

Why does a method imply a system call? os.path.join() and str.lower()
don't make system calls. Isn't it just a matter of clear
documentation? Anyway -- less philosophical discussion below.

> If we have this:
>
>     size = 0
>     for entry in scandir('/some/path'):
>         size += entry.st_size
>
>   - on Windows, this should Just Work (if I have the names correct ;)
>   - on Posix, etc., this should fail noisily with either an AttributeError
>     ('entry' has no 'st_size') or a TypeError (cannot add None)
>
> and the solution is equally simple:
>
>     for entry in scandir('/some/path', stat=True):
>
>   - if not Windows, perform a stat call at the same time

I'm not totally opposed to this, which is basically a combination of
Nick Coghlan's and Paul Moore's recent proposals mentioned in the PEP.
However, as discussed on python-dev, there are some edge cases it
doesn't handle very well, and it's messier to handle errors (requires
onerror as you mention below).

I presume you're suggesting that is_dir/is_file/is_symlink should be
regular attributes, and accessing them should never do a system call.
But what if the system doesn't support d_type (eg: Solaris) or the
d_type value is DT_UNKNOWN (can happen on Linux, OS X, BSD)? The
options are:

1) scandir() would always call lstat() in the case of missing/unknown
d_type. If so, scandir() is actually more expensive than listdir(),
and as a result it's no longer safe to implement listdir in terms of
scandir:

def listdir(path='.'):
    return [e.name for e in scandir(path)]

2) Or would it be better to have another flag like scandir(path,
type=True) to ensure the is_X type info is fetched? This is explicit,
but also getting kind of unwieldy.

3) A third option is for the is_X attributes to be absent in this case
(hasattr tests required, and the user would do the lstat manually).
But as I noted on python-dev recently, you basically always want is_X,
so this leads to unwieldy code that's twice as long as it needs
to be. See here:
https://mail.python.org/pipermail/python-dev/2014-July/135312.html

4) I gather in your proposal above, scandir will call lstat() if
stat=True? Except where does it put the values? Surely it should
return an existing stat_result object, rather than stuffing everything
onto the DirEntry, or throwing away some values on Linux? In this
case, I'd prefer Nick Coghlan's approach of ensure_lstat and a
.stat_result attribute. However, this still has the "what if d_type is
missing or DT_UNKNOWN" issue.

It seems to me that making is_X() methods handles this exact scenario
-- methods are so you don't have to do the dirty work.

So yes, the real world is messy due to missing is_X values, but I
think it's worth getting this right, and is_X() methods can do this
while keeping the API simple and cross-platform.

> Now, of course, we might get errors.  I am not a big fan of wrapping everything in try/except, particularly when we already have a model to follow -- os.walk:

I don't mind the onerror too much if we went with this kind of
approach. It's not quite as nice as a standard try/except around the
method call, but it's definitely workable and has a precedent with
os.walk().

It seems a bit like we're going around in circles here, and I think we
have all the information and options available to us, so I'm going to
SUMMARIZE.

We have a choice before us, a fork in the road. :-) We can choose one
of these options for the scandir API:

1) The current PEP 471 approach. This solves the issue with d_type
being missing or DT_UNKNOWN, it doesn't require onerror, and it's a
really tidy API that doesn't explode with AttributeErrors if you write
code on Windows (without thinking too hard) and then move to Linux. I
think all of these points are important -- the cross-platform one not
the least, because we want to make it easy, even *trivial*, for people
to write cross-platform code.

For reference, here's what get_tree_size() looks like with this
approach, not including error handling with try/except:

def get_tree_size(path):
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir():
            total += get_tree_size(entry.full_name)
        else:
            total += entry.lstat().st_size
    return total

2) Nick Coghlan's model of only fetching the lstat value if
ensure_lstat=True, and including an onerror callback for error
handling when scandir calls lstat internally. However, as described,
we'd also need an ensure_type=True option, so that scandir() isn't way
slower than listdir() if you actually don't want the is_X values and
d_type is missing/unknown.

For reference, here's what get_tree_size() looks like with this
approach, not including error handling with onerror:

def get_tree_size(path):
    total = 0
    for entry in os.scandir(path, ensure_type=True, ensure_lstat=True):
        if entry.is_dir:
            total += get_tree_size(entry.full_name)
        else:
            total += entry.lstat_result.st_size
    return total
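
As for what try/except error handling in the style of approach #1 might
look like, here's a rough sketch. It's written against os.listdir() and
os.lstat() so that it runs without scandir. The function name, the
logging calls and the choice to count unreadable entries as size 0 are
all assumptions for illustration, not part of the PEP:

```python
import logging
import os
import stat
import tempfile

log = logging.getLogger(__name__)


def get_tree_size_logged(path):
    """Total size of files under path; entries that can't be read count as 0."""
    total = 0
    try:
        names = os.listdir(path)
    except OSError as exc:
        log.warning('cannot list %s: %s', path, exc)
        return 0
    for name in names:
        full_name = os.path.join(path, name)
        try:
            st = os.lstat(full_name)
        except OSError as exc:
            log.warning('cannot stat %s: %s', full_name, exc)
            continue
        if stat.S_ISDIR(st.st_mode):
            total += get_tree_size_logged(full_name)
        else:
            total += st.st_size
    return total


# Small demonstration on a throwaway tree: 3 + 5 bytes in two files.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'sub'))
    with open(os.path.join(tmp, 'a.bin'), 'wb') as f:
        f.write(b'abc')
    with open(os.path.join(tmp, 'sub', 'b.bin'), 'wb') as f:
        f.write(b'12345')
    demo_total = get_tree_size_logged(tmp)
    missing_total = get_tree_size_logged(os.path.join(tmp, 'missing'))
```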

I'm fairly strongly in favour of approach #1, but I wouldn't die if
everyone else thinks the benefits of #2 outweigh the somewhat less
nice API.

Comments and votes, please!

-Ben

From steve at pearwood.info  Wed Jul  9 03:22:42 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 9 Jul 2014 11:22:42 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BC2B6B.3080209@mrabarnett.plus.com>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
 <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>
 
 <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp> <20140708165745.GJ13014@ando>
 <53BC2B6B.3080209@mrabarnett.plus.com>
Message-ID: <20140709012242.GL13014@ando>

On Tue, Jul 08, 2014 at 06:33:31PM +0100, MRAB wrote:

> The log of a negative number is a complex number.

Only in complex arithmetic. In real arithmetic, the log of a negative 
number isn't a number at all.
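
Python draws the same line between its math and cmath modules, which makes 
for a reasonable illustration (note that math.log() raises an exception 
here rather than returning a NAN):

```python
import cmath
import math

# Real arithmetic: math.log() refuses a negative argument outright.
try:
    real_result = math.log(-1.0)
except ValueError as exc:
    real_result = exc

# Complex arithmetic: the principal value of log(-1) is pi*j.
complex_result = cmath.log(-1.0)
```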

-- 
Steven

From ethan at stoneleaf.us  Wed Jul  9 03:31:55 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 08 Jul 2014 18:31:55 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
Message-ID: <53BC9B8B.40509@stoneleaf.us>

On 07/08/2014 06:08 PM, Ben Hoyt wrote:
>>
>> Just like an attribute does not imply a system call, having a
>> method named 'is_dir' /does/ imply a system call, and not
>> having one can be just as misleading.
>
> Why does a method imply a system call? os.path.join() and str.lower()
> don't make system calls. Isn't it just a matter of clear
> documentation? Anyway -- less philosophical discussion below.

In this case because the names are exactly the same as the os versions which /do/ make a system call.


> I presume you're suggesting that is_dir/is_file/is_symlink should be
> regular attributes, and accessing them should never do a system call.
> But what if the system doesn't support d_type (eg: Solaris) or the
> d_type value is DT_UNKNOWN (can happen on Linux, OS X, BSD)? The
> options are:

So if I'm finally understanding the root problem here:

   - listdir returns a list of strings, one for each filename and one for
     each directory, and keeps no other O/S supplied info.

   - os.walk, which uses listdir, then needs to go back to the O/S and
     refetch the thrown-away information

   - so it's slow.

The solution:

   - have scandir /not/ throw away the O/S supplied info

and the new problem:

   - not all O/Ses provide the same (or any) extra info about the
     directory entries

Have I got that right?

If so, I still like the attribute idea better (surprise!), we just need to revisit the 'ensure_lstat' (or whatever it's 
called) parameter:  instead of a true/false value, it could have a scale:

   - 0 = whatever the O/S gives us

   - 1 = at least the is_dir/is_file (whatever the other normal one is),
         and if the O/S doesn't give it to us for free then call lstat

   - 2 = we want it all -- call lstat if necessary on this platform

After all, the programmer should know up front how much of the extra info will be needed for the work that is trying to 
be done.
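
A rough emulation of those three levels, built on os.listdir() (which gives 
us nothing for free) purely to show the shape of the idea. The Entry tuple, 
the scan() name and the use of None for "not fetched" are all assumptions 
for illustration, not a proposed implementation:

```python
import os
import stat
import tempfile
from collections import namedtuple

# Hypothetical stand-in for DirEntry with plain attributes.
Entry = namedtuple('Entry', 'name is_dir lstat_result')


def scan(path, ensure_lstat=0):
    """Yield Entry tuples, fetching as much as the requested level demands."""
    for name in os.listdir(path):
        if ensure_lstat == 0:
            # Level 0: only what the O/S handed over, here just the name.
            yield Entry(name, None, None)
            continue
        st = os.lstat(os.path.join(path, name))
        is_dir = stat.S_ISDIR(st.st_mode)
        # Level 1 guarantees the type; level 2 also keeps the lstat result.
        yield Entry(name, is_dir, st if ensure_lstat >= 2 else None)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'sub'))
    with open(os.path.join(tmp, 'data.bin'), 'wb') as f:
        f.write(b'12345')
    level0 = {e.name: e for e in scan(tmp, 0)}
    level1 = {e.name: e for e in scan(tmp, 1)}
    level2 = {e.name: e for e in scan(tmp, 2)}
```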


> We have a choice before us, a fork in the road. :-) We can choose one
> of these options for the scandir API:
>
> 1) The current PEP 471 approach. This solves the issue with d_type
> being missing or DT_UNKNOWN, it doesn't require onerror, and it's a
> really tidy API that doesn't explode with AttributeErrors if you write
> code on Windows (without thinking too hard) and then move to Linux. I
> think all of these points are important -- the cross-platform one not
> the least, because we want to make it easy, even *trivial*, for people
> to write cross-platform code.

Yes, but we don't want a function that sucks equally on all platforms.  ;)


> 2) Nick Coghlan's model of only fetching the lstat value if
> ensure_lstat=True, and including an onerror callback for error
> handling when scandir calls lstat internally. However, as described,
> we'd also need an ensure_type=True option, so that scandir() isn't way
> slower than listdir() if you actually don't want the is_X values and
> d_type is missing/unknown.

With the multi-level version of 'ensure_lstat' we do not need an extra 'ensure_type'.

For reference, here's what get_tree_size() looks like with this approach, not including error handling with onerror:

   def get_tree_size(path):
        total = 0
        for entry in os.scandir(path, ensure_lstat=1):
            if entry.is_dir:
                total += get_tree_size(entry.full_name)
            else:
                total += entry.lstat_result.st_size
        return total

And if we added the onerror here it would be a line fragment, as opposed to the extra four lines (at least) for the 
try/except in the first example (which I cut).


Finally:

Thank you for writing scandir, and this PEP.  Excellent work.

Oh, and +1 for option 2, slightly modified.  :)

--
~Ethan~

From raymond.hettinger at gmail.com  Wed Jul  9 03:48:17 2014
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Tue, 8 Jul 2014 18:48:17 -0700
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BB2F25.3020205@gmx.de>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
Message-ID: <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com>


On Jul 7, 2014, at 4:37 PM, Andreas Maier  wrote:

> I do not really buy into the arguments that try to show how identity and value are somehow the same. They are not, not even in Python.
> 
> The argument I can absolutely buy into is that the implementation cannot be changed within a major release. So the real question is how we document it.

Once every few years, someone discovers IEEE-754, learns that NaNs
aren't supposed to be equal to themselves and becomes inspired
to open an old debate about whether to wreck Python in an effort
to make the world safe for NaNs.  And somewhere along the way,
people forget that practicality beats purity.

Here are a few thoughts on the subject that may or may not add
a little clarity ;-)

* Python already has IEEE-754 compliant NaNs:

       assert float('NaN') != float('NaN')

* Python already has the ability to filter-out NaNs:

       [x for x in container if not math.isnan(x)]

* In the numeric world, the most common use of NaNs is for
  missing data (much like we usually use None).  The property
  of not being equal to itself is primarily useful in
  low level code optimized to run a calculation to completion
  without running frequent checks for invalid results
  (much like #N/A is used in MS Excel).

* Python also lets containers establish their own invariants
  to establish correctness, improve performance, and make it
  possible to reason about our programs:

           for x in c:
               assert x in c

* Containers like dicts and sets have always used the rule
  that identity-implies equality.  That is central to their
  implementation.  In particular, the check of interned
  string keys relies on identity to bypass a slow
  character-by-character comparison to verify equality.

* Traditionally, a relation R is considered an equality
  relation if it is reflexive, symmetric, and transitive:

      R(x, x) -> True
      R(x, y) -> R(y, x)
      R(x, y) ^ R(y, z) -> R(x, z)

* Knowingly or not, programs tend to assume that all of those
  hold.  Test suites in particular assume that if you put
  something in a container that assertIn() will pass.

* Here are some examples of cases where non-reflexive objects
  would jeopardize the pragmatism of being able to reason
  about the correctness of programs:

      s = SomeSet()
      s.add(x)
      assert x in s

      s.remove(x)        # See collections.abc.MutableSet.remove
      assert not s

      s.clear()          # See collections.abc.MutableSet.clear
      assert not s

* What the above code does is up to the implementer of the
  container.  If you use the Set ABC, you can choose to
  implement __contains__() and discard() to use straight
  equality or identity-implies equality.  Nothing prevents
  you from making containers that are hard to reason about.

* The builtin containers make the choice for identity-implies
  equality so that it is easier to build fast, correct code.
  For the most part, this has worked out great (dictionaries
  in particular have had identity checks built in for almost
  twenty years).

* Years ago, there was a debate about whether to add an __is__()
  method to allow overriding the is-operator.  The push for the
  change was the "pure" notion that "all operators should be
  customizable".  However, the idea was rejected based on the
  "practical" notions that it would wreck our ability to reason
  about code, that it would slow down all code that used identity checks,
  that library modules (ours and third-party) already made
  deep assumptions about what "is" means, and that people would
  shoot themselves in the foot with hard to find bugs.
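
A short sketch of what identity-implies-equality means in practice for
the built-in containers, using only float('nan') and plain literals
(the variable names are just for illustration):

```python
nan = float('nan')

# The very same object is found in every built-in container, because
# identity is checked before equality.
in_list = nan in [nan]
in_set = nan in {nan}
in_dict = nan in {nan: 'value'}

# A different NaN object is not found in the list: it fails the identity
# check, and NaN == NaN is False.
other_in_list = float('nan') in [nan]

# The loop invariant shown earlier holds even when the container
# holds a NaN.
c = [1.0, nan, 'text']
invariant_holds = all(x in c for x in c)
```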

Personally, I see no need to make the same mistake by removing
the identity-implies-equality rule from the built-in containers.
There's no need to upset the apple cart for nearly zero benefit.

IMO, the proposed quest for purity is misguided.
There are many practical reasons to let the builtin
containers continue to work as they do now.


Raymond 

From stephen at xemacs.org  Wed Jul  9 06:21:11 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 09 Jul 2014 13:21:11 +0900
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <20140708165745.GJ13014@ando>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
 <53BB6D5F.1010800@btinternet.com>
 <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>
 
 <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp>
 <20140708165745.GJ13014@ando>
Message-ID: <87y4w3rwlk.fsf@uwakimon.sk.tsukuba.ac.jp>

Steven D'Aprano writes:

 > I don't think so. Floating point == represents *numeric* equality,

There is no such thing as floating point == in Python.  You can apply
== to two floating point numbers, but == (at the language level)
handles any two numbers, as well as pairs of things that aren't
numbers in the Python language.  So it's a design decision to include
NaNs at all, and another design decision to follow IEEE in giving them
behavior that violates the definition of equivalence relation for ==.

 > In an early post, you suggested that NANs don't have a value, or that 
 > they have a value which is not a value. I don't think that's a good way 
 > to look at it. I think the obvious way to think of it is that NAN's 
 > value is Not A Number, exactly like it says on the box. Now, if 
 > something is not a number, obviously you cannot compare it numerically:

And if Python can't do something you ask it to do, it raises an
exception.  Why should this be different?  Obviously, it's a question of
expedience.

 > I'm not sure what you're referring to here. Is it that containers such 
 > as lists and dicts are permitted to optimize equality tests with 
 > identity tests for speed?

No, when I say I'm fuzzy I'm referring to the fact that although I
understand the logical rationale for IEEE 754 NaN behavior, I don't
really understand the ins and outs well enough to judge for myself
whether it's a good idea for Python to follow that model and turn ==
into something that is not an equivalence relation.

I'm not going to argue for a change, I just want to know where I stand.

 > Basically, and I realise that many people disagree with their decision 
 > (notably Bertrand Meyer of Eiffel fame, and our own Mark
 > Dickinson),

Indeed.  So "it's the standard" does not mean there is a consensus of
experts.  I'm willing to delegate to a consensus of expert opinion,
but not when some prominent local expert(s) disagree -- then I'd like
to understand well enough to come to my own conclusions.


From p.f.moore at gmail.com  Wed Jul  9 09:13:10 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 9 Jul 2014 08:13:10 +0100
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
Message-ID: 

On 9 July 2014 02:08, Ben Hoyt  wrote:
> Comments and votes, please!

+1 on option 1 (current PEP approach) at the moment, but I would like
to see how the error handling would look (suppose the function logs
files that can't be statted, and assumes a size of 0 for them). The
idea of a multi-level ensure_lstat isn't unreasonable, either, and
that helps option 2.

The biggest issue *I* see with option 2 is that people won't remember
to add the ensure_XXX argument, and that will result in more code that
seems to work but fails cross-platform. Unless scandir deliberately
fails if you use an attribute that you haven't "ensured", but that
would be really unfriendly...

Paul

From benhoyt at gmail.com  Wed Jul  9 14:48:04 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Wed, 9 Jul 2014 08:48:04 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BC9B8B.40509@stoneleaf.us>
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
Message-ID: 

> In this case because the names are exactly the same as the os versions which
> /do/ make a system call.

Fair enough.

> So if I'm finally understanding the root problem here:
>
>   - listdir returns a list of strings, one for each filename and one for
>     each directory, and keeps no other O/S supplied info.
>
>   - os.walk, which uses listdir, then needs to go back to the O/S and
>     refetch the thrown-away information
>
>   - so it's slow.
> ...
> and the new problem:
>
>   - not all O/Ses provide the same (or any) extra info about the
>     directory entries
>
> Have I got that right?

Yes, that's exactly right.

> If so, I still like the attribute idea better (surprise!), we just need to
> revisit the 'ensure_lstat' (or whatever it's called) parameter:  instead of
> a true/false value, it could have a scale:
>
>   - 0 = whatever the O/S gives us
>
>   - 1 = at least the is_dir/is_file (whatever the other normal one is),
>         and if the O/S doesn't give it to us for free then call lstat
>
>   - 2 = we want it all -- call lstat if necessary on this platform
>
> After all, the programmer should know up front how much of the extra info
> will be needed for the work that is trying to be done.

Yeah, I think this is a good idea to make option #2 a bit nicer. I
don't like the magic constants, and using constants like
os.SCANDIR_LSTAT is annoying, so how about using strings? I also
suggest calling the parameter "info" (because it determines what info
is returned), so you'd do scandir(path, info='type') if you need just
the is_X type information.

I also think it's nice to have a way for power users to "just return
what the OS gives us". However, I think making this the default is a
bad idea, as it's just asking for cross-platform bugs (and it's easy
to prevent).

Paul Moore basically agrees with this in his reply yesterday, though I
disagree with him that it would be unfriendly to fail hard unless you asked
for the info -- quite the opposite, Linux users would think it very
unfriendly when your code broke because you didn't ask for the info.
:-)

So how about tweaking option #2 a tiny bit more to this:

def scandir(path='.', info=None, onerror=None): ...

* if info is None (the default), only the .name and .full_name
attributes are present
* if info is 'type', scandir ensures the is_dir/is_file/is_symlink
attributes are present and either True or False
* if info is 'lstat', scandir additionally ensures a .lstat is present
and is a full stat_result object
* if info is 'os', scandir returns the attributes the OS provides
(everything on Windows, only is_X -- most of the time -- on POSIX)

* if onerror is not None and errors occur during any internal lstat()
call, onerror(exc) is called with the OSError exception object

Further point -- because the is_dir/is_file/is_symlink attributes are
booleans, it would be very bad for them to be present but None if you
didn't ask for (or the OS didn't return) the type information. Because
then entry.is_dir would be None, "if entry.is_dir:" would be falsy, and
your code would think it wasn't a directory, when actually you don't
know. For this reason, all
attributes should fail with AttributeError if not fetched.
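
To make the None-versus-AttributeError point concrete, here is a rough
sketch. SketchEntry is a hypothetical stand-in written for this
illustration, not the proposed DirEntry: it simply stores whichever
attributes were "fetched" and has nothing else.

```python
class SketchEntry:
    """Hypothetical stand-in for a DirEntry under option #2."""

    def __init__(self, name, **fetched):
        self.name = name
        # Only attributes that were actually fetched exist on the object.
        for key, value in fetched.items():
            setattr(self, key, value)


typed = SketchEntry("src", is_dir=True)   # as if info='type' was requested
bare = SketchEntry("src")                 # as if info=None was requested

# If is_dir were present but None, "if entry.is_dir:" would silently take
# the "not a directory" branch, because None is falsy.
none_reads_as_not_dir = not None

# With the attribute missing entirely, the mistake is loud instead.
try:
    bare.is_dir
except AttributeError:
    failed_loudly = True
else:
    failed_loudly = False

print(typed.is_dir, none_reads_as_not_dir, failed_loudly)
```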

> Thank you for writing scandir, and this PEP.  Excellent work.

Thanks!

> Oh, and +1 for option 2, slightly modified.  :)

With the above tweaks, I'm getting closer to being 50/50. It's
probably 60% #1 and 40% #2 for me now. :-)

Okay folks -- please respond: option #1 as per the current PEP 471, or
option #2 with Ethan's multi-level thing tweaks as per the above?

-Ben

From victor.stinner at gmail.com  Wed Jul  9 15:05:05 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 9 Jul 2014 15:05:05 +0200
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
Message-ID: 

2014-07-08 22:09 GMT+02:00 Ben Hoyt :
>>> I think you're misunderstanding is_dir() and is_file(), as these don't
>>> actually call os.stat(). All DirEntry methods either call nothing or
>>> os.lstat() to get the stat info on the entry itself (not the
>>> destination of the symlink).
>>
>>
>> Oh. Extract of your PEP: "is_dir(): like os.path.isdir(), but much cheaper".
>>
>> genericpath.isdir() and genericpath.isfile() use os.stat(), whereas
>> posixpath.islink() uses os.lstat().
>>
>> Is it a mistake in the PEP?
>
> Ah, you're dead right -- this is basically a bug in the PEP, as
> DirEntry.is_dir() is not like os.path.isdir() in that it is based on
> the entry itself (like lstat), not following the link.
>
> I'll improve the wording here and update the PEP.

Ok, so it means that your example grouping files per type, files and
directories, is also wrong. Or at least, it behaves differently than
os.walk(). You should put symbolic links to directories in the "dirs"
list too.

if entry.is_dir():   # is_dir() checks os.lstat()
    dirs.append(entry)
elif entry.is_symlink() and os.path.isdir(entry):   # isdir() checks os.stat()
    dirs.append(entry)
else:
    non_dirs.append(entry)
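
For comparison, a sketch against os.scandir() as it eventually shipped in
the standard library, which differs from the draft discussed here:
DirEntry.is_dir() there takes a follow_symlinks keyword (default True), so
the two behaviours are one flag apart. split_entries is a name invented
for this example, and the with-statement form assumes Python 3.6 or later.

```python
import os


def split_entries(path, follow_symlinks=True):
    """Return (dirs, non_dirs) as sorted lists of entry names."""
    dirs, non_dirs = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=True behaves like os.path.isdir() (stat);
            # follow_symlinks=False looks at the entry itself (lstat).
            if entry.is_dir(follow_symlinks=follow_symlinks):
                dirs.append(entry.name)
            else:
                non_dirs.append(entry.name)
    return sorted(dirs), sorted(non_dirs)
```

With follow_symlinks=True a symlink to a directory lands in dirs, as
os.walk() does; with follow_symlinks=False it lands in non_dirs.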

Victor

From benhoyt at gmail.com  Wed Jul  9 15:12:24 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Wed, 9 Jul 2014 09:12:24 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
Message-ID: 

> Ok, so it means that your example grouping files per type, files and
> directories, is also wrong. Or at least, it behaves differently than
> os.walk(). You should put symbolic links to directories in the "dirs"
> list too.
>
> if entry.is_dir():   # is_dir() checks os.lstat()
>     dirs.append(entry)
> elif entry.is_symlink() and os.path.isdir(entry):   # isdir() checks os.stat()
>     dirs.append(entry)
> else:
>     non_dirs.append(entry)

Yes, good call. I believe I'm doing this wrong in the scandir.py
os.walk() implementation too -- hence this open issue:
https://github.com/benhoyt/scandir/issues/4

-Ben

From p.f.moore at gmail.com  Wed Jul  9 15:12:34 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 9 Jul 2014 14:12:34 +0100
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
Message-ID: 

On 9 July 2014 13:48, Ben Hoyt  wrote:
> Okay folks -- please respond: option #1 as per the current PEP 471, or
> option #2 with Ethan's multi-level thing tweaks as per the above?

I'm probably about 50/50 at the moment. What will swing it for me is
likely error handling, so let's try both approaches with some error
handling:

Rules are that we calculate the total size of all files in a tree (as
returned from lstat), with files that fail to stat being logged and
their size assumed to be 0.

Option 1:

def get_tree_size(path):
    total = 0
    for entry in os.scandir(path):
        try:
            isdir = entry.is_dir()
        except OSError:
            logger.warn("Cannot stat {}".format(entry.full_name))
            continue
        if isdir:
            total += get_tree_size(entry.full_name)
        else:
            try:
                total += entry.lstat().st_size
            except OSError:
                logger.warn("Cannot stat {}".format(entry.full_name))
    return total

Option 2:
def log_err(exc):
    logger.warn("Cannot stat {}".format(exc.filename))

def get_tree_size(path):
    total = 0
    for entry in os.scandir(path, info='lstat', onerror=log_err):
        if entry.is_dir:
            total += get_tree_size(entry.full_name)
        else:
            total += entry.lstat.st_size
    return total

On this basis, #2 wins. However, I'm slightly uncomfortable using the
filename attribute of the exception in the logging, as there is
nothing in the docs saying that this will give a full pathname. I'd
hate to see "Unable to stat __init__.py"!!!

So maybe the onerror function should also receive the DirEntry object
- which will only have the name and full_name attributes, but that's
all that is needed.

OK, looks like option #2 is now my preferred option. My gut instinct
still rebels over an API that deliberately throws information away in
the default case, even though there is now an option to ask it to keep
that information, but I see the logic and can learn to live with it.
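
For reference, the same rules (sum lstat sizes, log and count 0 for
anything that cannot be statted) can be sketched against os.scandir() as
it later shipped, which is closer to option 1 than to option 2. This is
only an illustration under that assumption: the attribute is entry.path
rather than full_name, and the logger name is arbitrary.

```python
import logging
import os

logger = logging.getLogger(__name__)


def get_tree_size(path):
    """Return the total lstat size of all non-directory entries under path."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    # Recurse into real directories only, never via symlinks.
                    total += get_tree_size(entry.path)
                else:
                    total += entry.stat(follow_symlinks=False).st_size
            except OSError:
                # Entries (or subdirectories) that cannot be read count as 0.
                logger.warning("Cannot access %s", entry.path)
    return total
```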

Paul

From antoine at python.org  Wed Jul  9 15:21:26 2014
From: antoine at python.org (Antoine Pitrou)
Date: Wed, 09 Jul 2014 09:21:26 -0400
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <87y4w3rwlk.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando>
 <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp>
 
 <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp> <20140708165745.GJ13014@ando>
 <87y4w3rwlk.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

Le 09/07/2014 00:21, Stephen J. Turnbull a écrit :
> Steven D'Aprano writes:
>
>   > I don't think so. Floating point == represents *numeric* equality,
>
> There is no such thing as floating point == in Python.  You can apply
> == to two floating point numbers, but == (at the language level)
> handles any two numbers, as well as pairs of things that aren't
> numbers in the Python language.

This is becoming pointless hair-splitting.

 >>> float.__eq__(1.0, 2.0)
False
 >>> float.__eq__(1.0, 2)
False
 >>> float.__eq__(1.0, 1.0+0J)
NotImplemented
 >>> float.__eq__(1, 2)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: descriptor '__eq__' requires a 'float' object but received a 'int'


Please direct any further discussion of this to python-ideas.



From benhoyt at gmail.com  Wed Jul  9 15:22:41 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Wed, 9 Jul 2014 09:22:41 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
Message-ID: 

> Option 2:
> def log_err(exc):
>     logger.warn("Cannot stat {}".format(exc.filename))
>
> def get_tree_size(path):
>     total = 0
>     for entry in os.scandir(path, info='lstat', onerror=log_err):
>         if entry.is_dir:
>             total += get_tree_size(entry.full_name)
>         else:
>             total += entry.lstat.st_size
>     return total
>
> On this basis, #2 wins.

That's a pretty nice comparison, and you're right, onerror handling is
nicer here.

> However, I'm slightly uncomfortable using the
> filename attribute of the exception in the logging, as there is
> nothing in the docs saying that this will give a full pathname. I'd
> hate to see "Unable to stat __init__.py"!!!

Huh, you're right. I think this should be documented in os.walk() too.
I think it should be the full filename (is it currently?).

> So maybe the onerror function should also receive the DirEntry object
> - which will only have the name and full_name attributes, but that's
> all that is needed.

That's an interesting idea -- though enough of a deviation from
os.walk()'s onerror that I'm uncomfortable with it -- I'd rather just
document that the onerror exception .filename is the full path name.

One issue with option #2 that I just realized -- does scandir yield
the entry at all if there's a stat error? It can't really, because the
caller will expect the .lstat attribute to be set (assuming he asked
for type='lstat') but it won't be. Is effectively removing these
entries just because the stat failed a problem? I kind of think it is.
If so, is there a way to solve it with option #2?

> OK, looks like option #2 is now my preferred option. My gut instinct
> still rebels over an API that deliberately throws information away in
> the default case, even though there is now an option to ask it to keep
> that information, but I see the logic and can learn to live with it.

In terms of throwing away info "in the default case" -- it's simply a
case of getting what you ask for. :-) Worst case, you'll write your
code and test it, it'll fail hard on any system, you'll fix it
immediately, and then it'll work on any system.

-Ben

From p.f.moore at gmail.com  Wed Jul  9 15:30:32 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 9 Jul 2014 14:30:32 +0100
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
Message-ID: 

On 9 July 2014 14:22, Ben Hoyt  wrote:
>> So maybe the onerror function should also receive the DirEntry object
>> - which will only have the name and full_name attributes, but that's
>> all that is needed.
>
> That's an interesting idea -- though enough of a deviation from
> os.walk()'s onerror that I'm uncomfortable with it -- I'd rather just
> document that the onerror exception .filename is the full path name.

But the onerror exception will come from the lstat call, so it'll be a
raw OSError (unless scandir modifies it, which may be what you're
thinking of). And if so, aren't we at the mercy of what the OS module
gives us? That's why I said we can't guarantee it. I looked at the
documentation of OSError (in "Built In Exceptions"), and all it says
is "the filename" (unqualified). I'd expect that to be "whatever got
passed to the underlying OS API" - which may well be an absolute
pathname if we're lucky, but who knows? (I'd actually prefer it if
OSError guaranteed a full pathname, but that's a bigger issue...)
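
A quick way to see what .filename holds today is to provoke a failure.
In this sketch (failing_name and the path are made up for the example)
the exception carries the path exactly as it was passed to os.lstat(),
so it is only a full pathname if the caller supplied one.

```python
import os


def failing_name(path):
    """Return exc.filename from a failed os.lstat(), or None on success."""
    try:
        os.lstat(path)
    except OSError as exc:
        return exc.filename
    return None


relative = os.path.join("no-such-dir-for-sketch", "__init__.py")

# The relative path comes back unchanged; nothing makes it absolute.
print(failing_name(relative))

# Passing an absolute path is what yields an absolute .filename.
print(failing_name(os.path.abspath(relative)))
```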

Paul

From ethan at stoneleaf.us  Wed Jul  9 15:17:40 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 06:17:40 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
Message-ID: <53BD40F4.8020009@stoneleaf.us>

On 07/09/2014 05:48 AM, Ben Hoyt wrote:
>
> So how about tweaking option #2 a tiny bit more to this:
>
> def scandir(path='.', info=None, onerror=None): ...
>
> * if info is None (the default), only the .name and .full_name
> attributes are present
> * if info is 'type', scandir ensures the is_dir/is_file/is_symlink
> attributes are present and either True or False
> * if info is 'lstat', scandir additionally ensures a .lstat is present
> and is a full stat_result object
> * if info is 'os', scandir returns the attributes the OS provides
> (everything on Windows, only is_X -- most of the time -- on POSIX)

I would rather have the default for info be 'os': cross-platform is good, but there is no reason to force it on some 
poor script that is meant to run on a local machine and will never leave it.


> * if onerror is not None and errors occur during any internal lstat()
> call, onerror(exc) is called with the OSError exception object

As Paul mentioned, 'onerror(exc, DirEntry)' would be better.


> Further point -- because the is_dir/is_file/is_symlink attributes are
> booleans, it would be very bad for them to be present but None if you
> didn't ask for (or the OS didn't return) the type information. Because
> then "if entry.is_dir:" would be None and your code would think it
> wasn't a directory, when actually you don't know. For this reason, all
> attributes should fail with AttributeError if not fetched.

Fair point, and agreed.

--
~Ethan~

From ethan at stoneleaf.us  Wed Jul  9 15:41:04 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 06:41:04 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
Message-ID: <53BD4670.9080100@stoneleaf.us>

On 07/09/2014 06:22 AM, Ben Hoyt wrote:
>
> One issue with option #2 that I just realized -- does scandir yield the entry at all if there's a stat error? It
> can't really, because the caller will expect the .lstat attribute to be set (assuming he asked for type='lstat') but
> it won't be. Is effectively removing these entries just because the stat failed a problem? I kind of think it is. If
> so, is there a way to solve it with option #2?

Leave it up to the onerror handler.  If it returns None, skip yielding the entry, otherwise yield whatever it returned
-- which also means the error handler should be able to set fields on the DirEntry:

   def log_err(exc, entry):
       logger.warn("Cannot stat {}".format(exc.filename))
       entry.lstat.st_size = 0
       return True

   def get_tree_size(path):
       total = 0
       for entry in os.scandir(path, info='lstat', onerror=log_err):
           if entry.is_dir:
               total += get_tree_size(entry.full_name)
           else:
               total += entry.lstat.st_size
       return total

This particular example doesn't benefit much from the addition, but this way we don't have to guess what the programmer 
wants or needs to do in the case of failure.
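
The "leave it up to the handler" idea can also be approximated outside
scandir itself with a small generator. The sketch below is written
against os.scandir() as later shipped; scan_with_onerror and its
(exc, entry) handler signature are assumptions made for this example,
not an existing API. Instead of mutating the entry, the handler returns
a replacement stat result, or None to skip the entry.

```python
import os


def scan_with_onerror(path, onerror=None):
    """Yield (entry, stat_result) pairs for the entries in path.

    Hypothetical wrapper, not part of os: if lstat-ing an entry fails,
    onerror(exc, entry) is called; returning None skips the entry, and
    any other return value is yielded in place of the stat result.
    """
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError as exc:
                if onerror is None:
                    raise           # no handler: let the error propagate
                st = onerror(exc, entry)
                if st is None:
                    continue        # handler asked to skip this entry
            yield entry, st
```

A caller that only wants logging would pass a handler that logs and
returns None; one that wants a size of 0 would return a placeholder
stat result instead.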

--
~Ethan~

From ethan at stoneleaf.us  Wed Jul  9 16:41:11 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 07:41:11 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BD4670.9080100@stoneleaf.us>
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
Message-ID: <53BD5487.3060608@stoneleaf.us>

On 07/09/2014 06:41 AM, Ethan Furman wrote:
>
> Leave it up to the onerror handler.  If it returns None, skip yielding the entry, otherwise yield whatever it returned
> -- which also means the error handler should be able to set fields on the DirEntry:
>
>    def log_err(exc, entry):
>        logger.warn("Cannot stat {}".format(exc.filename))
>        entry.lstat.st_size = 0
>        return True

Blah.  Okay, either return the DirEntry (possibly modified), or have the log_err return entry instead of True.  (Now 
where is that caffeine??)

--
~Ethan~

From walter at livinglogic.de  Wed Jul  9 16:41:44 2014
From: walter at livinglogic.de (Walter Dörwald)
Date: Wed, 09 Jul 2014 16:41:44 +0200
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
Message-ID: 

On 8 Jul 2014, at 15:52, Ben Hoyt wrote:

> Hi folks,
>
> After some very good python-dev feedback on my first version of PEP
> 471, I've updated the PEP to clarify a few things and added various
> "Rejected ideas" subsections. Here's a link to the new version (I've
> also copied the full text below):
>
> http://legacy.python.org/dev/peps/pep-0471/ -- new PEP as HTML
> http://hg.python.org/peps/rev/0da4736c27e8 -- changes
>
> [...]
> Rejected ideas
> ==============
>
> [...]
> Return values being pathlib.Path objects
> ----------------------------------------
>
> With Antoine Pitrou's new standard library ``pathlib`` module, it
> at first seems like a great idea for ``scandir()`` to return instances
> of ``pathlib.Path``. However, ``pathlib.Path``'s ``is_X()`` and
> ``lstat()`` functions are explicitly not cached, whereas ``scandir``
> has to cache them by design, because it's (often) returning values
> from the original directory iteration system call.
>
> And if the ``pathlib.Path`` instances returned by ``scandir`` cached
> lstat values, but the ordinary ``pathlib.Path`` objects explicitly
> don't, that would be more than a little confusing.
>
> Guido van Rossum explicitly rejected ``pathlib.Path`` caching lstat in
> the context of scandir `here
> `_,
> making ``pathlib.Path`` objects a bad choice for scandir return
> values.

Can we at least make sure that attributes of DirEntry that have the same 
meaning as attributes of pathlib.Path have the same name?

> [...]

Servus,
    Walter

From victor.stinner at gmail.com  Wed Jul  9 17:05:33 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 9 Jul 2014 17:05:33 +0200
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
Message-ID: 

2014-07-09 15:12 GMT+02:00 Ben Hoyt :
>> Ok, so it means that your example grouping files per type, files and
>> directories, is also wrong. Or at least, it behaves differently than
>> os.walk(). You should put symbolic links to directories in the "dirs"
>> list too.
>>
>> if entry.is_dir():   # is_dir() checks os.lstat()
>>     dirs.append(entry)
>> elif entry.is_symlink() and os.path.isdir(entry):   # isdir() checks os.stat()
>>     dirs.append(entry)
>> else:
>>     non_dirs.append(entry)
>
> Yes, good call. I believe I'm doing this wrong in the scandir.py
> os.walk() implementation too -- hence this open issue:
> https://github.com/benhoyt/scandir/issues/4

The PEP says that DirEntry should mimic pathlib.Path, so I think that
DirEntry.is_dir() should work as os.path.isdir(): if the entry is a
symbolic link, you should follow the symlink to get the status of the
linked file with os.stat().

"entry.is_dir() or (entry.is_symlink() and os.path.isdir(entry))"
looks wrong: why would you have to check is_dir() and isdir()?
Duplicating this check is error prone and not convenient.

Pseudo-code:
---
class DirEntry:
    def __init__(self, lstat=None, d_type=None):
        self._stat = None
        self._lstat = lstat
        self._d_type = d_type

    def stat(self):
        if self._stat is None:
            self._stat = os.stat(self.full_name)
        return self._stat

    def lstat(self):
        if self._lstat is None:
            self._lstat = os.lstat(self.full_name)
        return self._lstat

    def is_dir(self):
        if self._d_type is not None:
            if self._d_type == DT_DIR:
                return True
            if self._d_type != DT_LNK:
                return False
        else:
            lstat = self.lstat()
            if stat.S_ISDIR(lstat.st_mode):
                return True
            if not stat.S_ISLNK(lstat.st_mode):
                return False
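        # Symlink: follow it. Use a local name other than "stat" so the
        # stat module is not shadowed inside this method.
        st = self.stat()
        return stat.S_ISDIR(st.st_mode)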
---

DirEntry would be created with lstat (Windows) or d_type (Linux)
prefilled. is_dir() would only need to call os.stat() once for
symbolic links.

With this code, it becomes even more obvious why is_dir() is a method
and not a property ;-)

Victor

From p.f.moore at gmail.com  Wed Jul  9 17:26:48 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 9 Jul 2014 16:26:48 +0100
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
Message-ID: 

On 9 July 2014 16:05, Victor Stinner  wrote:
> The PEP says that DirEntry should mimic pathlib.Path, so I think that
> DirEntry.is_dir() should work as os.path.isdir(): if the entry is a
> symbolic link, you should follow the symlink to get the status of the
> linked file with os.stat().

Would this not "break" the tree size script being discussed in the
other thread, as it would follow links and include linked directories
in the "size" of the tree?

As a Windows user with only a superficial understanding of how
symlinks should behave, I don't have any intuition as to what the
"right" answer should be. But I would say that the tree size code
we've been debating over there (which recurses if is_dir is true and
adds in st_size otherwise) should do whatever people would expect of a
function with that name, as it's a perfect example of something a
Windows user might write and expect it to work cross-platform. If it
doesn't much of the worrying over making sure scandir's API is
"cross-platform by default" is probably being wasted :-)

(Obviously the get_tree_size function could be modified if needed, but
that's missing the point I'm trying to make :-))

Paul

From benhoyt at gmail.com  Wed Jul  9 17:29:21 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Wed, 9 Jul 2014 11:29:21 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
 
Message-ID: 

>> The PEP says that DirEntry should mimic pathlib.Path, so I think that
>> DirEntry.is_dir() should work as os.path.isdir(): if the entry is a
>> symbolic link, you should follow the symlink to get the status of the
>> linked file with os.stat().
>
> Would this not "break" the tree size script being discussed in the
> other thread, as it would follow links and include linked directories
> in the "size" of the tree?

Yeah, I agree. Victor -- I don't think the DirEntry is_X() methods (or
attributes) should mimic the link-following os.path.isdir() at all.
You want the type of the entry, not the type of the source.

Otherwise, as Paul says, you are essentially forced to follow links,
and os.walk(followlinks=False), which is the default, can't do the
right thing.

-Ben

From benhoyt at gmail.com  Wed Jul  9 17:35:26 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Wed, 9 Jul 2014 11:35:26 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BD4670.9080100@stoneleaf.us>
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
Message-ID: 

>> One issue with option #2 that I just realized -- does scandir yield the
>> entry at all if there's a stat error? It
>> can't really, because the caller will expect the .lstat attribute to be
>> set (assuming he asked for type='lstat') but
>>
>> it won't be. Is effectively removing these entries just because the stat
>> failed a problem? I kind of think it is. If
>> so, is there a way to solve it with option #2?
>
>
> Leave it up to the onerror handler.  If it returns None, skip yielding the
> entry, otherwise yield whatever it returned
> -- which also means the error handler should be able to set fields on the
> DirEntry:
>
>   def log_err(exc, entry):
>       logger.warn("Cannot stat {}".format(exc.filename))
>       entry.lstat.st_size = 0
>       return True

This is an interesting idea, but it's just getting more and more
complex, and I'm guessing that being able to change the attributes of
DirEntry will make the C implementation more complex.

Also, I'm not sure it's very workable. For log_err above, you'd
actually have to do something like this, right?

def log_err(exc, entry):
    logger.warn("Cannot stat {}".format(exc.filename))
    entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
    return entry

Unless there's another simple way around this issue, I'm back to
loving the simplicity of option #1, which avoids this whole question.

-Ben

From p.f.moore at gmail.com  Wed Jul  9 19:10:29 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 9 Jul 2014 18:10:29 +0100
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
Message-ID: 

On 9 July 2014 14:22, Ben Hoyt  wrote:
> One issue with option #2 that I just realized -- does scandir yield
> the entry at all if there's a stat error? It can't really, because the
> caller will expect the .lstat attribute to be set (assuming he asked
> for type='lstat') but it won't be. Is effectively removing these
> entries just because the stat failed a problem? I kind of think it is.
> If so, is there a way to solve it with option #2?

So the issue is that you need to do a stat but it failed. You have
"whatever the OS gave you", but can't get anything more. This is only
an issue on POSIX, where the original OS call doesn't give you
everything, so it's fine, those POSIX people can just learn to cope
with their broken OS, right? :-)

More seriously, why not just return a DirEntry that says it's a file
with a stat entry that's all zeroes? That seems pretty harmless. And
the onerror function will be called, so if it is inappropriate the
application can do something. Maybe it's worth letting onerror return
a boolean that says whether to skip the entry, but that's as far as
I'd bother going.

It's a close call, but I think option #2 still wins (just) for me.

Paul

From ethan at stoneleaf.us  Wed Jul  9 18:35:04 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 09:35:04 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
Message-ID: <53BD6F38.7090000@stoneleaf.us>

On 07/09/2014 08:35 AM, Ben Hoyt wrote:
>>> One issue with option #2 that I just realized -- does scandir yield the
>>> entry at all if there's a stat error? It
>>> can't really, because the caller will expect the .lstat attribute to be
>>> set (assuming he asked for type='lstat') but
>>>
>>> it won't be. Is effectively removing these entries just because the stat
>>> failed a problem? I kind of think it is. If
>>> so, is there a way to solve it with option #2?
>>
>>
>> Leave it up to the onerror handler.  If it returns None, skip yielding the
>> entry, otherwise yield whatever it returned
>> -- which also means the error handler should be able to set fields on the
>> DirEntry:
>>
>>    def log_err(exc, entry):
>>        logger.warn("Cannot stat {}".format(exc.filename))
>>        entry.lstat.st_size = 0
>>        return True
>
> This is an interesting idea, but it's just getting more and more
> complex, and I'm guessing that being able to change the attributes of
> DirEntry will make the C implementation more complex.
>
> Also, I'm not sure it's very workable. For log_err above, you'd
> actually have to do something like this, right?
>
> def log_err(exc, entry):
>      logger.warn("Cannot stat {}".format(exc.filename))
>      entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
>      return entry

I would imagine we would provide a helper function:

   def stat_result(st_size=0, st_atime=0, st_mtime=0, ...):
       return os.stat_result((st_size, st_atime, st_mtime, ...))

and then in onerror

       entry.lstat = stat_result()
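
A runnable sketch of such a helper, assuming the hypothetical name make_stat_result and the standard ten-field order of os.stat_result (st_mode, st_ino, st_dev, st_nlink, st_uid, st_gid, st_size, st_atime, st_mtime, st_ctime); it is an illustration, not a proposed API:

```python
import os

def make_stat_result(st_mode=0, st_ino=0, st_dev=0, st_nlink=0, st_uid=0,
                     st_gid=0, st_size=0, st_atime=0, st_mtime=0, st_ctime=0):
    """Build an os.stat_result from keywords; unspecified fields are 0."""
    # os.stat_result takes a sequence of at least ten items in this order.
    return os.stat_result((st_mode, st_ino, st_dev, st_nlink, st_uid,
                           st_gid, st_size, st_atime, st_mtime, st_ctime))

# An all-zero placeholder, and one with only the size filled in.
placeholder = make_stat_result()
sized = make_stat_result(st_size=123)
```

Keyword arguments keep the caller from having to remember the tuple positions, which is the main thing the helper buys.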


> Unless there's another simple way around this issue, I'm back to
> loving the simplicity of option #1, which avoids this whole question.

Too simple is just as bad as too complex, and properly handling errors is rarely a simple task.  Either we provide a 
clean way to deal with errors in the API, or we force every user everywhere to come up with their own system.

Also, just because we provide it doesn't force people to use it, but if we don't provide it then people cannot use it.

To summarize the choice I think we are looking at:

   1) We provide a very basic tool that many will have to write wrappers
      around to get the desired behavior (choice 1)

   2) We provide a more advanced tool that, in many cases, can be used
      as-is, and is also fairly easy to extend to handle odd situations
      (choice 2)

More specifically, if we go with choice 1 (no built-in error handling, no mutable DirEntry), how would I implement 
choice 2?  Would I have to write my own CustomDirEntry object?

--
~Ethan~

From p.f.moore at gmail.com  Wed Jul  9 20:04:09 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 9 Jul 2014 19:04:09 +0100
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BD6F38.7090000@stoneleaf.us>
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
Message-ID: 

On 9 July 2014 17:35, Ethan Furman  wrote:
> More specifically, if we go with choice 1 (no built-in error handling, no
> mutable DirEntry), how would I implement choice 2?  Would I have to write my
> own CustomDirEntry object?

Having built-in error handling is, I think, a key point. That's where
#1 really falls down.

But a mutable DirEntry and/or letting onerror manipulate the result is
a lot more than just having a hook for being notified of errors. That
seems to me to be a step too far, in the current context.
Specifically, the tree size example doesn't need it.

Do you have a compelling use case that needs a mutable DirEntry? It
feels like YAGNI to me.

Paul

From ethan at stoneleaf.us  Wed Jul  9 19:38:38 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 10:38:38 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 
Message-ID: <53BD7E1E.6020700@stoneleaf.us>

On 07/09/2014 10:10 AM, Paul Moore wrote:
> On 9 July 2014 14:22, Ben Hoyt  wrote:
>> One issue with option #2 that I just realized -- does scandir yield
>> the entry at all if there's a stat error? It can't really, because the
>> caller will expect the .lstat attribute to be set (assuming he asked
>> for type='lstat') but it won't be. Is effectively removing these
>> entries just because the stat failed a problem? I kind of think it is.
>> If so, is there a way to solve it with option #2?
>
> So the issue is that you need to do a stat but it failed. You have
> "whatever the OS gave you", but can't get anything more. This is only
> an issue on POSIX, where the original OS call doesn't give you
> everything, so it's fine, those POSIX people can just learn to cope
> with their broken OS, right? :-)

LOL


> More seriously, why not just return a DirEntry that says it's a file
> with a stat entry that's all zeroes? That seems pretty harmless. And
> the onerror function will be called, so if it is inappropriate the
> application can do something. Maybe it's worth letting onerror return
> a boolean that says whether to skip the entry, but that's as far as
> I'd bother going.

I could live with this -- we could enhance it in the future fairly easily if we needed to.

--
~Ethan~

From ethan at stoneleaf.us  Wed Jul  9 20:29:50 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 11:29:50 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
Message-ID: <53BD8A1E.6090804@stoneleaf.us>

On 07/09/2014 11:04 AM, Paul Moore wrote:
> On 9 July 2014 17:35, Ethan Furman  wrote:
>> More specifically, if we go with choice 1 (no built-in error handling, no
>> mutable DirEntry), how would I implement choice 2?  Would I have to write my
>> own CustomDirEntry object?
>
> Having built-in error handling is, I think, a key point. That's where
> #1 really falls down.
>
> But a mutable DirEntry and/or letting onerror manipulate the result is
> a lot more than just having a hook for being notified of errors. That
> seems to me to be a step too far, in the current context.
> Specifically, the tree size example doesn't need it.
>
> Do you have a compelling use case that needs a mutable DirEntry? It
> feels like YAGNI to me.

Not at this point.  As I indicated in my reply to your response, as long as we have the onerror machinery now we can 
tweak it later if real-world use shows it would be beneficial.

--
~Ethan~

From benhoyt at gmail.com  Wed Jul  9 21:03:20 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Wed, 9 Jul 2014 15:03:20 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BD6F38.7090000@stoneleaf.us>
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
Message-ID: 

This is just getting way too complex ... further thoughts below.

>> This is an interesting idea, but it's just getting more and more
>> complex, and I'm guessing that being able to change the attributes of
>> DirEntry will make the C implementation more complex.
>>
>> Also, I'm not sure it's very workable. For log_err above, you'd
>> actually have to do something like this, right?
>>
>> def log_err(exc, entry):
>>      logger.warn("Cannot stat {}".format(exc.filename))
>>      entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
>>      return entry
>
>
> I would imagine we would provide a helper function:
>
>   def stat_result(st_size=0, st_atime=0, st_mtime=0, ...):
>       return os.stat_result((st_size, st_atime, st_mtime, ...))
>
> and then in onerror
>
>       entry.lstat = stat_result()
>
>> Unless there's another simple way around this issue, I'm back to
>> loving the simplicity of option #1, which avoids this whole question.
>
>
> Too simple is just as bad as too complex, and properly handling errors is
> rarely a simple task.  Either we provide a clean way to deal with errors in
> the API, or we force every user everywhere to come up with their own system.
>
> Also, just because we provide it doesn't force people to use it, but if we
> don't provide it then people cannot use it.

So here are the ways in which option #2 is now more complicated than option #1:

1) it has an additional "info" argument, the values of which have to
be documented ('os', 'type', 'lstat', and what each one means)
2) it has an additional "onerror" argument, the signature of which and
fairly complicated return value is non-obvious and has to be
documented
3) it requires user modification of the DirEntry object, which needs
documentation, and is potentially hard to implement
4) because the DirEntry object now allows modification, you need a
stat_result() helper function to help you build your own stat values

I'm afraid points 3 and 4 here add way too much complexity.

Remind me why all this is better than the PEP 471 approach again? It
handles all of these problems, is very direct, and uses built-in
Python constructs (method calls and try/except error handling).

And it's also simple to document -- much simpler than the above 4
things, which could be a couple of pages in the docs. Here's the doc
required for the PEP 471 approach:

"Note about caching and error handling: The is_X() and lstat()
functions may perform an lstat() on first call if the OS didn't
already fetch this data when reading the directory. So if you need
fine-grained error handling, catch OSError exceptions around these
method calls. After the first call, the is_X() and lstat() functions
cache the value on the DirEntry."
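
A minimal sketch of that try/except style, assuming the os.scandir() API as it later shipped in Python 3.5, where the lstat() method discussed here is spelled stat(follow_symlinks=False). The function name tree_size follows the example mentioned in this thread; the error messages are placeholders:

```python
import os

def tree_size(path):
    """Return the total size in bytes of the files under path."""
    total = 0
    for entry in os.scandir(path):
        try:
            # May need a system call if the OS did not supply the type.
            is_dir = entry.is_dir(follow_symlinks=False)
        except OSError as exc:
            print('Error calling is_dir():', exc)
            continue
        if is_dir:
            total += tree_size(entry.path)
        else:
            try:
                total += entry.stat(follow_symlinks=False).st_size
            except OSError as exc:
                print('Error calling stat():', exc)
    return total
```

Each call that can touch the file system has its own handler, so the caller decides per call whether to skip, log, or re-raise.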

-Ben

From ethan at stoneleaf.us  Wed Jul  9 21:17:43 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 12:17:43 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
Message-ID: <53BD9557.80709@stoneleaf.us>

On 07/09/2014 12:03 PM, Ben Hoyt wrote:
>
> So here are the ways in which option #2 is now more complicated than option #1:
>
> 1) it has an additional "info" argument, the values of which have to
> be documented ('os', 'type', 'lstat', and what each one means)
> 2) it has an additional "onerror" argument, the signature of which and
> fairly complicated return value is non-obvious and has to be
> documented
> 3) it requires user modification of the DirEntry object, which needs
> documentation, and is potentially hard to implement
> 4) because the DirEntry object now allows modification, you need a
> stat_result() helper function to help you build your own stat values
>
> I'm afraid points 3 and 4 here add way too much complexity.

I'm okay with dropping 3 and 4, and having onerror simply return True to yield the entry, and False/None
to skip it.  That should make implementation much easier, and documentation not too strenuous either.
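
A rough sketch of those semantics as a pure-Python wrapper around the os.scandir() that later shipped. The name scandir_checked and the stat_func parameter are hypothetical, added only so the failure path can be exercised; a true return from onerror yields the entry with None in place of its stat result, anything else skips it:

```python
import os

def scandir_checked(path, onerror=None, stat_func=None):
    """Yield (entry, stat_result) pairs, consulting onerror on stat failure."""
    if stat_func is None:
        stat_func = lambda entry: entry.stat(follow_symlinks=False)
    with os.scandir(path) as it:
        for entry in it:
            try:
                st = stat_func(entry)
            except OSError as exc:
                # True -> still yield the entry (without stat data);
                # False/None, or no handler at all -> skip it.
                if onerror is not None and onerror(exc, entry):
                    yield entry, None
                continue
            yield entry, st
```

All error policy lives in the single onerror callable, and the loop body that consumes the pairs stays free of try/except blocks.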

--
~Ethan~

From benhoyt at gmail.com  Wed Jul  9 21:59:39 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Wed, 9 Jul 2014 15:59:39 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BD9557.80709@stoneleaf.us>
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
Message-ID: 

>> 1) it has an additional "info" argument, the values of which have to
>> be documented ('os', 'type', 'lstat', and what each one means)
>> 2) it has an additional "onerror" argument, the signature of which and
>> fairly complicated return value is non-obvious and has to be
>> documented
>> 3) it requires user modification of the DirEntry object, which needs
>> documentation, and is potentially hard to implement
>> 4) because the DirEntry object now allows modification, you need a
>> stat_result() helper function to help you build your own stat values
>>
>> I'm afraid points 3 and 4 here add way too much complexity.
>
>
> I'm okay with dropping 3 and 4, and having onerror simply return True to
> yield the entry, and False/None to skip it.  That should make
> implementation much easier, and documentation not too strenuous either.

That's definitely better in terms of complexity.

Other python-devers, please chime in with your thoughts or votes.

-Ben

From victor.stinner at gmail.com  Wed Jul  9 22:24:19 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 9 Jul 2014 22:24:19 +0200
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
Message-ID: 

2014-07-09 21:59 GMT+02:00 Ben Hoyt :
> Other python-devers, please chime in with your thoughts or votes.

Sorry, I didn't follow the whole discussion. IMO DirEntry must use
methods and you should neither expose nor document which information is
already provided by the OS. DirEntry should be a best-effort black-box
object providing an API similar to pathlib.Path. is_dir() may be fast?
Fine, but don't say it in the documentation because Python must remain
portable and you should not write code specific to one
platform.

is_dir(), is_file(), is_symlink() and lstat() can fail like any other
Python function; there is no need to give them a custom error handler.
If you care, just use a very standard try/except.

I'm also against ensure_lstat=True or similar ideas that fetch all the
data transparently in the generator. The behaviour would be too
different depending on the OS, and usually you don't need all the
information. It also raises errors in the generator, which is
unusual and difficult to handle (I don't like the onerror
callback).

Example where you may sometimes need is_dir(), but not always
---
for entry in os.scandir(path):
  if ignore_entry(entry.name):
     # this entry is not interesting, lstat_result is useless here
     continue
  if entry.is_dir():  # fetch required data if needed
     continue
  ...
---

I don't understand why you are all focused on handling os.stat() and
os.lstat() errors. See for example the os.walk() function which is an
old function (python 2.6!): it doesn't catch errors on isdir(), even if
it has an onerror parameter... It only handles errors on listdir().
IMO errors on os.stat() and os.lstat() are very rare under very
specific conditions. The most common case is that you can get the
status if you can list files.
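
A small illustration of the os.walk() behaviour referred to above, assuming only the documented contract that onerror is called with the OSError raised while listing a directory. The callback name remember is hypothetical:

```python
import os
import tempfile

errors = []

def remember(exc):
    # os.walk passes the OSError raised while listing a directory.
    errors.append(exc)

with tempfile.TemporaryDirectory() as top:
    # A path that does not exist: listing it fails, so onerror fires once.
    missing = os.path.join(top, 'missing')
    walked = list(os.walk(missing, onerror=remember))
```

Without the onerror argument the same call would silently produce nothing, since os.walk() ignores listing errors by default.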

Victor

From p.f.moore at gmail.com  Wed Jul  9 22:57:57 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 9 Jul 2014 21:57:57 +0100
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
Message-ID: 

On 9 July 2014 21:24, Victor Stinner  wrote:
> Example where you may sometimes need is_dir(), but not always
> ---
> for entry in os.scandir(path):
>   if ignore_entry(entry.name):
>      # this entry is not interesting, lstat_result is useless here
>      continue
>   if entry.is_dir():  # fetch required data if needed
>      continue
>   ...
> ---

That is an extremely good point, and articulates why I've always been
a bit uncomfortable with the whole ensure_stat idea.

> I don't understand why you are all focused on handling os.stat() and
> os.lstat() errors. See for example the os.walk() function which is an
> old function (python 2.6!): it doesn't catch errors on isdir(), even if
> it has an onerror parameter... It only handles errors on listdir().
> IMO errors on os.stat() and os.lstat() are very rare under very
> specific conditions. The most common case is that you can get the
> status if you can list files.

Personally, I'm only focused on it as a response to others feeling
it's important. I'm on Windows, where there are no extra stat calls,
so all *I* care about is having an API that deals with the use cases
others are concerned about without making it too hard for me to use it
on Windows where I don't have to worry about all this.

If POSIX users come to a consensus that error handling doesn't need
special treatment, I'm more than happy to go back to the PEP version.
(Much as previously happened with the race condition debate).

Paul

From ethan at stoneleaf.us  Wed Jul  9 23:28:07 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 14:28:07 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 
Message-ID: <53BDB3E7.5030004@stoneleaf.us>

On 07/09/2014 01:57 PM, Paul Moore wrote:
> On 9 July 2014 21:24, Victor Stinner wrote:
>>
>> Example where you may sometimes need is_dir(), but not always
>> ---
>> for entry in os.scandir(path):
>>    if ignore_entry(entry.name):
>>       # this entry is not interesting, lstat_result is useless here
>>       continue
>>    if entry.is_dir():  # fetch required data if needed
>>       continue
>>    ...
>
> That is an extremely good point, and articulates why I've always been
> a bit uncomfortable with the whole ensure_stat idea.

On a system which did not supply is_dir automatically I would write that as:

   for entry in os.scandir(path):  # info defaults to 'os', which is basically None in this case
       if ignore_entry(entry.name):
           continue
       if os.path.isdir(entry.full_name):
           # do something interesting

Not hard to read or understand, no time wasted in unnecessary lstat calls.

--
~Ethan~

From ethan at stoneleaf.us  Wed Jul  9 22:44:12 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 13:44:12 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
Message-ID: <53BDA99C.3020101@stoneleaf.us>

On 07/09/2014 01:24 PM, Victor Stinner wrote:
>
> Sorry, I didn't follow the whole discussion. IMO DirEntry must use
> methods and you should neither expose nor document which information is
> already provided by the OS. DirEntry should be a best-effort black-box
> object providing an API similar to pathlib.Path. is_dir() may be fast?
> Fine, but don't say it in the documentation because Python must remain
> portable and you should not write code specific to one
> platform.

Okay, so using that logic we should head over to the os module and remove:

ctermid, getenv, getegid, geteuid, getgid, getgrouplist, getgroups, getpgid, getpgrp, getpriority, PRIO_PROCESS, 
PRIO_PGRP, PRIO_USER, getresuid, getresgid, getuid, initgroups, putenv, setegid, seteuid, setgid, setgroups, 
setpriority, setregid, setresgid, setresuid, setreuid, getsid, setsid, setuid, unsetenv, fchmod, fchown, fdatasync,
fpathconf, fstatvfs, ftruncate, lockf, F_LOCK, F_TLOCK, F_ULOCK, F_TEST, O_DSYNC, O_RSYNC, O_SYNC, O_NDELAY, O_NONBLOCK, 
O_NOCTTY, O_SHLOCK, O_EXLOCK, O_CLOEXEC, O_BINARY, O_NOINHERIT, O_SHORT_LIVED, O_TEMPORARY, O_RANDOM, O_SEQUENTIAL, 
O_TEXT, ...

Okay, I'm tired of typing, but that list is not even half-way through the os page, and those are all methods or 
attributes that are not available on either Windows or Unix or some flavors of Unix.

Oh, and all those upper-case attributes?  Yup, documented.  And when we don't document it ourselves we often refer 
readers to their system documentation because Python does not, in fact, return exactly the same results on all platforms 
-- particularly when calling into the OS.

--
~Ethan~

From benhoyt at gmail.com  Wed Jul  9 23:33:12 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Wed, 9 Jul 2014 17:33:12 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BDB3E7.5030004@stoneleaf.us>
References: 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 
 <53BDB3E7.5030004@stoneleaf.us>
Message-ID: 

> On a system which did not supply is_dir automatically I would write that as:
>
>   for entry in os.scandir(path):  # info defaults to 'os', which is
> basically None in this case
>       if ignore_entry(entry.name):
>           continue
>       if os.path.isdir(entry.full_name):
>           # do something interesting
>
> Not hard to read or understand, no time wasted in unnecessary lstat calls.

No, but how do you know whether you're on "a system which did not
supply is_dir automatically"? The above is not cross-platform, or at
least, not efficient cross-platform, which defeats the whole point of
scandir -- the above is no better than listdir().
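
For comparison, a sketch of the method-call version of the same loop, assuming the os.scandir() API that later shipped in Python 3.5. The function name subdirs and its ignore predicate are hypothetical; the same code runs unchanged on every platform, and is_dir() only falls back to a system call when the directory listing did not already supply the entry type:

```python
import os

def subdirs(path, ignore=lambda name: name.startswith('.')):
    """Return the sorted names of subdirectories of path, skipping ignored names."""
    found = []
    with os.scandir(path) as it:
        for entry in it:
            if ignore(entry.name):
                continue            # no type or stat data needed here
            if entry.is_dir():      # typically answered from the listing itself
                found.append(entry.name)
    return sorted(found)
```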

-Ben

From benhoyt at gmail.com  Wed Jul  9 23:42:07 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Wed, 9 Jul 2014 17:42:07 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BDA99C.3020101@stoneleaf.us>
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 <53BDA99C.3020101@stoneleaf.us>
Message-ID: 

I really don't understand why you *want* a worse, much less cross-platform API?

> Okay, so using that logic we should head over to the os module and remove:
>
> ctermid, getenv, getegid...
>
> Okay, I'm tired of typing, but that list is not even half-way through the os
> page, and those are all methods or attributes that are not available on
> either Windows or Unix or some flavors of Unix.

True, but is this really the precedent we want to *aim for*? listdir() is
cross-platform, and it's relatively easy to make scandir()
cross-platform, so why not?

> Oh, and all those upper-case attributes?  Yup, documented.  And when we
> don't document it ourselves we often refer readers to their system
> documentation because Python does not, in fact, return exactly the same
> results on all platforms -- particularly when calling into the OS.

But again, why a worse, less cross-platform API when a simple,
cross-platform one is a method call away?

-Ben

From victor.stinner at gmail.com  Wed Jul  9 23:38:26 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 9 Jul 2014 23:38:26 +0200
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BDA99C.3020101@stoneleaf.us>
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 <53BDA99C.3020101@stoneleaf.us>
Message-ID: 

2014-07-09 22:44 GMT+02:00 Ethan Furman :
> On 07/09/2014 01:24 PM, Victor Stinner wrote:
>> Sorry, I didn't follow the whole discussion. IMO DirEntry must use
>> methods and you should neither expose nor document which information is
>> already provided by the OS. DirEntry should be a best-effort black-box
>> object providing an API similar to pathlib.Path. is_dir() may be fast?
>> Fine, but don't say it in the documentation because Python must remain
>> portable and you should not write code specific to one
>> platform.
>
>
> Okay, so using that logic we should head over to the os module and remove: (...)

My comment was specific to PEP 471 and the design of the DirEntry class.

Victor

From ethan at stoneleaf.us  Thu Jul 10 00:12:18 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 15:12:18 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 <53BDA99C.3020101@stoneleaf.us>
 
Message-ID: <53BDBE42.7050609@stoneleaf.us>

On 07/09/2014 02:42 PM, Ben Hoyt wrote:
>>
>> Okay, so using that [no platform specific] logic we should head over to the os module and remove:
>>
>> ctermid, getenv, getegid...
>>
>> Okay, I'm tired of typing, but that list is not even half-way through the os
>> page, and those are all methods or attributes that are not available on
>> either Windows or Unix or some flavors of Unix.
>
> True, but is this really the precedent we want to *aim for*? listdir() is
> cross-platform,

and listdir has serious performance issues, which is why you developed scandir.

>> Oh, and all those [snipped] upper-case attributes?  Yup, documented.  And when we
>> don't document it ourselves we often refer readers to their system
>> documentation because Python does not, in fact, return exactly the same
>> results on all platforms -- particularly when calling into the OS.
>
> But again, why a worse, less cross-platform API when a simple,
> cross-platform one is a method call away?

For the same reason we don't use code that makes threaded behavior better, but kills the single-threaded application.

If the programmer would rather have consistency across all platforms than performance on the one being used,
`info='lstat'` is the option to use.

I like the 'onerror' API better primarily because it gives a single point to deal with the errors.  This has at least a 
couple advantages:

   - less duplication of code: in the tree_size example, the error
     handling is duplicated twice

   - readability: with the error handling in a separate routine, one
     does not have to jump around the try/except blocks looking for
     what happens if there are no errors

--
~Ethan~

From ethan at stoneleaf.us  Thu Jul 10 00:15:49 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 15:15:49 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 <53BDA99C.3020101@stoneleaf.us>
 
Message-ID: <53BDBF15.7020505@stoneleaf.us>

On 07/09/2014 02:38 PM, Victor Stinner wrote:
> 2014-07-09 22:44 GMT+02:00 Ethan Furman:
>> On 07/09/2014 01:24 PM, Victor Stinner wrote:
>>>
>>> [...] Python must remain
>>> portable and you should not write code specific to one specific
>>> platform.
>>
>>
>> Okay, so using that logic we should head over to the os module and remove: (...)
>
> My comment was specific to PEP 471 and the design of the DirEntry class.

And my comment was to the point of there being methods/attributes/return values that /do/ vary by platform, and /are/ 
documented as such.  Even stat itself is not the same on Windows as posix.

--
~Ethan~

From ethan at stoneleaf.us  Thu Jul 10 00:50:28 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 15:50:28 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 
 <53BDB3E7.5030004@stoneleaf.us>
 
Message-ID: <53BDC734.9050901@stoneleaf.us>

On 07/09/2014 02:33 PM, Ben Hoyt wrote:
>>
>> On a system which did not supply is_dir automatically I would write that as:
>>
>>    for entry in os.scandir(path):
>>        if ignore_entry(entry.name):
>>            continue
>>        if os.path.isdir(entry.full_name):
>>            # do something interesting
>>
>> Not hard to read or understand, no time wasted in unnecessary lstat calls.
>
> No, but how do you know whether you're on "a system which did not
> supply is_dir automatically"? The above is not cross-platform, or at
> least, not efficient cross-platform, which defeats the whole point of
> scandir -- the above is no better than listdir().

Hit a directory with 100,000 entries and you'll change your mind.  ;)

Okay, so the issue is you /want/ to write an efficient, cross-platform routine...

hrmmm.....

thinking........

Okay, marry the two ideas together:

   scandir(path, info=None, onerror=None)
       """
       Return a generator that returns one directory entry at a time in a DirEntry object
       info:  None --> DirEntries will have whatever attributes the O/S provides
              'type'  --> DirEntries will already have at least the file/dir distinction
              'stat'  --> DirEntries will also already have stat information
       """

   DirEntry.is_dir()
      Return True if this is a directory-type entry; may call os.lstat if the cache is empty.

   DirEntry.is_file()
      Return True if this is a file-type entry; may call os.lstat if the cache is empty.

   DirEntry.is_symlink()
      Return True if this is a symbolic link; may call os.lstat if the cache is empty.

   DirEntry.stat
      Return the stat info for this link; may call os.lstat if the cache is empty.


This way both paradigms are supported.
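
A pure-Python sketch of how the combined design above might behave, assuming hypothetical names (CachedEntry, scan) and building on os.listdir() plus os.lstat() rather than any OS-level directory data. Here stat is written as a method for symmetry with is_dir() and friends, and only the info=None and info='stat' cases are modelled:

```python
import os
import stat

class CachedEntry:
    """Hypothetical stand-in for DirEntry: methods fill a cache on demand."""

    def __init__(self, dirpath, name):
        self.name = name
        self.path = os.path.join(dirpath, name)
        self._lstat = None                  # empty cache

    def stat(self):
        if self._lstat is None:             # first call: ask the OS
            self._lstat = os.lstat(self.path)
        return self._lstat

    def is_dir(self):
        return stat.S_ISDIR(self.stat().st_mode)

    def is_file(self):
        return stat.S_ISREG(self.stat().st_mode)

    def is_symlink(self):
        return stat.S_ISLNK(self.stat().st_mode)


def scan(path, info=None):
    """Yield CachedEntry objects; info='stat' fills the cache up front."""
    for name in os.listdir(path):
        entry = CachedEntry(path, name)
        if info == 'stat':
            entry.stat()    # an OSError here surfaces during iteration
        yield entry
```

With info='stat' the cached values reflect the moment of the scan; with the default they reflect whenever a method is first called, which is the trade-off the thread keeps returning to.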

--
~Ethan~

From python at mrabarnett.plus.com  Thu Jul 10 01:22:21 2014
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 10 Jul 2014 00:22:21 +0100
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BDC734.9050901@stoneleaf.us>
References: 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 
 <53BDB3E7.5030004@stoneleaf.us>
 
 <53BDC734.9050901@stoneleaf.us>
Message-ID: <53BDCEAD.8070809@mrabarnett.plus.com>

On 2014-07-09 23:50, Ethan Furman wrote:
> On 07/09/2014 02:33 PM, Ben Hoyt wrote:
>>>
>>> On a system which did not supply is_dir automatically I would write that as:
>>>
>>>    for entry in os.scandir(path):
>>>        if ignore_entry(entry.name):
>>>            continue
>>>        if os.path.isdir(entry.full_name):
>>>            # do something interesting
>>>
>>> Not hard to read or understand, no time wasted in unnecessary lstat calls.
>>
>> No, but how do you know whether you're on "a system which did not
>> supply is_dir automatically"? The above is not cross-platform, or at
>> least, not efficient cross-platform, which defeats the whole point of
>> scandir -- the above is no better than listdir().
>
> Hit a directory with 100,000 entries and you'll change your mind.  ;)
>
> Okay, so the issue is you /want/ to write an efficient, cross-platform routine...
>
> hrmmm.....
>
> thinking........
>
> Okay, marry the two ideas together:
>
>     scandir(path, info=None, onerror=None)
>         """
>         Return a generator that returns one directory entry at a time in a DirEntry object

Should that be "that yields one directory entry at a time"?

>         info:  None --> DirEntries will have whatever attributes the O/S provides
>                'type'  --> DirEntries will already have at least the file/dir distinction
>                'stat'  --> DirEntries will also already have stat information
>         """
>
>     DirEntry.is_dir()
>        Return True if this is a directory-type entry; may call os.lstat if the cache is empty.
>
>     DirEntry.is_file()
>        Return True if this is a file-type entry; may call os.lstat if the cache is empty.
>
>     DirEntry.is_symlink()
>        Return True if this is a symbolic link; may call os.lstat if the cache is empty.
>
>     DirEntry.stat
>        Return the stat info for this link; may call os.lstat if the cache is empty.
>
Why is "is_dir", et al, functions, but "stat" not a function?

>
> This way both paradigms are supported.
>


From ethan at stoneleaf.us  Thu Jul 10 01:26:01 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 16:26:01 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BDCEAD.8070809@mrabarnett.plus.com>
References: 
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 
 <53BDB3E7.5030004@stoneleaf.us>
 
 <53BDC734.9050901@stoneleaf.us> <53BDCEAD.8070809@mrabarnett.plus.com>
Message-ID: <53BDCF89.5070007@stoneleaf.us>

On 07/09/2014 04:22 PM, MRAB wrote:
> On 2014-07-09 23:50, Ethan Furman wrote:
>>
>> Okay, marry the two ideas together:
>>
>>     scandir(path, info=None, onerror=None)
>>         """
>>         Return a generator that returns one directory entry at a time in a DirEntry object
>
> Should that be "that yields one directory entry at a time"?

Yes, thanks.

>>         info:  None --> DirEntries will have whatever attributes the O/S provides
>>                'type'  --> DirEntries will already have at least the file/dir distinction
>>                'stat'  --> DirEntries will also already have stat information
>>         """
>>
>>     DirEntry.is_dir()
>>        Return True if this is a directory-type entry; may call os.lstat if the cache is empty.
>>
>>     DirEntry.is_file()
>>        Return True if this is a file-type entry; may call os.lstat if the cache is empty.
>>
>>     DirEntry.is_symlink()
>>        Return True if this is a symbolic link; may call os.lstat if the cache is empty.
>>
>>     DirEntry.stat
>>        Return the stat info for this link; may call os.lstat if the cache is empty.
>
> Why is "is_dir", et al, functions, but "stat" not a function?

Good point.  Make stat a function as well.

--
~Ethan~

From victor.stinner at gmail.com  Thu Jul 10 02:15:58 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 10 Jul 2014 02:15:58 +0200
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
 
 
Message-ID: 

2014-07-09 17:29 GMT+02:00 Ben Hoyt :
>> Would this not "break" the tree size script being discussed in the
>> other thread, as it would follow links and include linked directories
>> in the "size" of the tree?

The get_tree_size() function in the PEP would use: "if not
entry.is_symlink() and entry.is_dir():".

Note: First I wrote "if entry.is_dir() and not entry.is_symlink():",
but this syntax is slower on Linux because is_dir() has to call
lstat().

Adding an optional keyword to DirEntry.is_dir() would allow writing
"if entry.is_dir(follow_symlink=False)", but it looks like a micro
optimization and, as I said, I prefer to stick to the pathlib.Path API
(which was already heavily discussed in its PEP). Anyway, this case is
rare (I explain why below), so we should not worry too much about it.
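
For concreteness, the check order described above can be sketched as
follows. This is only a sketch, not the PEP's exact text: it assumes
the os.scandir() API as it later shipped (Python 3.6 or newer for the
with-statement), where the entry's path attribute is called "path"
rather than "full_name".

```python
import os


def get_tree_size(path):
    """Return the total size in bytes of the files under path.

    Symlinks are never followed: a link counts for its own size.
    """
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # is_symlink() is tested first; when the directory listing
            # already carries the entry type, neither call usually
            # needs a system call.
            if not entry.is_symlink() and entry.is_dir():
                total += get_tree_size(entry.path)
            else:
                # follow_symlinks=False gives lstat() semantics.
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Calling get_tree_size(".") sums the current directory tree; a symlink
to a directory contributes only the size of the link itself.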

> Yeah, I agree. Victor -- I don't think the DirEntry is_X() methods (or
> attributes) should mimic the link-following os.path.isdir() at all.
> You want the type of the entry, not the type of the source.

On UNIX, a symlink to a directory is expected to behave like a
directory. For example, in a file browser, you should enter in the
linked directory when you click on a symlink to a directory.

There are only a few cases where you want to handle symlinks
differently: archive (ex: tar), compute the size of a directory (ex:
du does not follow symlinks by default, du -L follows them), remove a
directory.

You should do a short poll in the Python stdlib and on the Internet to
check what is the most common check.

Examples of the Python stdlib:

- zipfile: listdir + os.path.isdir
- pkgutil: listdir + os.path.isdir
- unittest.loader: listdir + os.path.isdir and os.path.isfile
- http.server: listdir + os.path.isdir, it also uses os.path.islink: "
Append / for directories or @ for symbolic links "
- idlelib.GrepDialog: listdir + os.path.isdir
- compileall: listdir + os.path.isdir and "os.path.isdir(fullname) and
not os.path.islink(fullname)" <= don't follow symlinks to directories
- shutil (copytree): listdir + os.path.isdir + os.path.islink
- shutil (rmtree): listdir + os.lstat() + stat.S_ISDIR(mode) <= don't
follow symlinks to directories
- mailbox: listdir + os.path.isdir
- tabnanny: listdir + os.path.isdir
- os.walk: listdir + os.path.isdir + os.path.islink <= don't follow
symlinks to directories by default, but the behaviour is configurable
... but symlinks to directories are added to the "dirs" list (not all
symlinks, only symlinks to directories)
- setup.py: listdir + os.path.isfile

In this list of 12 examples, only compileall, shutil.rmtree and
os.walk check if entries are symlinks. compileall starts by checking
"if not os.path.isdir(fullname):" which follows symlinks. os.walk()
starts by checking "if os.path.isdir(name):" which follows symlinks. I
consider that only one case out of 12 (8.3%) doesn't follow symlinks.

If entry.is_dir() doesn't follow symlinks, the other 91.7% will need
to be modified to use "if entry.is_dir() or (entry.is_symlink() and
os.path.isdir(entry.full_name)):" to keep the same behaviour :-(
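
As an illustration of that workaround, here is a minimal sketch of a
helper that rebuilds the link-following answer from link-unaware entry
information. The helper name is hypothetical, and the code assumes the
os.scandir() API as it later shipped, where is_dir() accepts a
follow_symlinks keyword and the path attribute is named "path".

```python
import os


def is_dir_following_links(entry):
    """Emulate os.path.isdir() on top of link-unaware entry information."""
    # Real directories are answered from the entry itself.
    if entry.is_dir(follow_symlinks=False):
        return True
    # Only symlinks need the extra, link-following check.
    return entry.is_symlink() and os.path.isdir(entry.path)
```

Every caller that wants the os.path.isdir() behaviour would have to go
through something like this, which is the cost being argued about here.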

> Otherwise, as Paul says, you are essentially forced to follow links,
> and os.walk(followlinks=False), which is the default, can't do the
> right thing.

os.walk() and get_tree_size() are good users of scandir(), but they
are recursive functions. It means that you may handle symlinks
differently, os.walk() gives the choice to follow or not symlinks for
example.

Recursive functions are rare. The most common case is to list files of
a single directory and then filter files depending on various filters
(is a file? is a directory? match the file name? ...). In such a use
case, you don't "care" about symlinks (you want to follow them).

Victor

From victor.stinner at gmail.com  Thu Jul 10 02:23:17 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 10 Jul 2014 02:23:17 +0200
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
 
Message-ID: 

2014-07-09 17:26 GMT+02:00 Paul Moore :
> On 9 July 2014 16:05, Victor Stinner  wrote:
>> The PEP says that DirEntry should mimic pathlib.Path, so I think that
>> DirEntry.is_dir() should work as os.path.isdir(): if the entry is a
>> symbolic link, you should follow the symlink to get the status of the
>> linked file with os.stat().
>
> (...)
> As a Windows user with only a superficial understanding of how
> symlinks should behave, (...)

FYI Windows also supports symbolic links since Windows Vista. The
feature is unknown because it is restricted to the administrator
account. Try the "mklink" command in a terminal (cmd.exe) ;-)
http://en.wikipedia.org/wiki/NTFS_symbolic_link

... To be honest, I never created a symlink on Windows. But since it
is supported, you need to know about it to write your Windows code
correctly.

(It's unrelated to "LNK" files.)

Victor

From Nikolaus at rath.org  Thu Jul 10 02:25:54 2014
From: Nikolaus at rath.org (Nikolaus Rath)
Date: Wed, 09 Jul 2014 17:25:54 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
 (Ben Hoyt's message of "Wed, 9 Jul 2014 15:03:20 -0400")
References: 
 
 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
Message-ID: <87pphe2h65.fsf@vostro.rath.org>

Ben Hoyt  writes:
> So here's the ways in which option #2 is now more complicated than option #1:
>
> 1) it has an additional "info" argument, the values of which have to
> be documented ('os', 'type', 'lstat', and what each one means)
> 2) it has an additional "onerror" argument, the signature of which and
> fairly complicated return value is non-obvious and has to be
> documented
> 3) it requires user modification of the DirEntry object, which needs
> documentation, and is potentially hard to implement
> 4) because the DirEntry object now allows modification, you need a
> stat_result() helper function to help you build your own stat values
>
> I'm afraid points 3 and 4 here add way too much complexity.

Points 3 and 4 are not required to go with option #2; option #2 merely
allows points 3 and 4 to be implemented at some point in the future if
that turns out to be desirable.


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             "Time flies like an arrow, fruit flies like a Banana."

From ethan at stoneleaf.us  Thu Jul 10 02:38:11 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 09 Jul 2014 17:38:11 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
 
 
 
Message-ID: <53BDE073.2030208@stoneleaf.us>

On 07/09/2014 05:15 PM, Victor Stinner wrote:
> 2014-07-09 17:29 GMT+02:00 Ben Hoyt :
>>> Would this not "break" the tree size script being discussed in the
>>> other thread, as it would follow links and include linked directories
>>> in the "size" of the tree?
>
> The get_tree_size() function in the PEP would use: "if not
> entry.is_symlink() and entry.is_dir():".
>
> Note: First I wrote "if entry.is_dir() and not entry.is_symlink():",
> but this syntax is slower on Linux because is_dir() has to call
> lstat().

Wouldn't it only have to call lstat if the entry was, in fact, a link?


> There are only a few cases where you want to handle symlinks
> differently: archive (ex: tar), compute the size of a directory (ex:
> du does not follow symlinks by default, du -L follows them), remove a
> directory.

I agree with Victor here.  If the entry is a link I would want to know if it was a link to a directory or a link to a 
file.  If I care about not following sym links I can check is_symlink() (or whatever it's called).

--
~Ethan~

From victor.stinner at gmail.com  Thu Jul 10 02:57:00 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 10 Jul 2014 02:57:00 +0200
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
Message-ID: 

Oh, since I'm proposing to add a new stat() method to DirEntry, we can
optimize it. stat() can reuse lstat() result if the file is not a
symlink. It simplifies is_dir().

New pseudo-code:
---
import os
import stat

# d_type values from <dirent.h> (numeric values as on Linux)
DT_UNKNOWN, DT_DIR, DT_LNK = 0, 4, 10


class DirEntry:
    def __init__(self, path, name, lstat=None, d_type=None):
        self.name = name
        self.full_name = os.path.join(path, name)
        # lstat is known on Windows
        self._lstat = lstat
        if lstat is not None and not stat.S_ISLNK(lstat.st_mode):
            # On Windows, stat() only calls os.stat() for symlinks
            self._stat = lstat
        else:
            self._stat = None
        # d_type is known on UNIX
        if d_type != DT_UNKNOWN:
            self._d_type = d_type
        else:
            # DT_UNKNOWN is not a very useful information :-p
            self._d_type = None

    def stat(self):
        if self._stat is None:
            self._stat = os.stat(self.full_name)
        return self._stat

    def lstat(self):
        if self._lstat is None:
            self._lstat = os.lstat(self.full_name)
            if self._stat is None and not stat.S_ISLNK(self._lstat.st_mode):
                # not a symlink: stat() and lstat() give the same result
                self._stat = self._lstat
        return self._lstat

    def is_dir(self):
        if self._d_type is not None:
            if self._d_type == DT_DIR:
                return True
            if self._d_type != DT_LNK:
                return False
        else:
            # local names must not shadow the stat module
            lstat_result = self.lstat()
            if stat.S_ISDIR(lstat_result.st_mode):
                return True
        # if lstat() was already called, stat() will only call
        # os.stat() for a symlink
        stat_result = self.stat()
        return stat.S_ISDIR(stat_result.st_mode)
---

The extra caching rules are complex, which is why I suggest not documenting them.

In short: is_dir() only needs an extra syscall for symlinks, for other
file types it does not need any syscall.

Victor

From timothy.c.delaney at gmail.com  Thu Jul 10 02:58:57 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Thu, 10 Jul 2014 10:58:57 +1000
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
 
 
Message-ID: 

On 10 July 2014 10:23, Victor Stinner  wrote:

> 2014-07-09 17:26 GMT+02:00 Paul Moore :
> > On 9 July 2014 16:05, Victor Stinner  wrote:
> >> The PEP says that DirEntry should mimic pathlib.Path, so I think that
> >> DirEntry.is_dir() should work as os.path.isdir(): if the entry is a
> >> symbolic link, you should follow the symlink to get the status of the
> >> linked file with os.stat().
> >
> > (...)
> > As a Windows user with only a superficial understanding of how
> > symlinks should behave, (...)
>
> FYI Windows also supports symbolic links since Windows Vista. The
> feature is unknown because it is restricted to the administrator
> account. Try the "mklink" command in a terminal (cmd.exe) ;-)
> http://en.wikipedia.org/wiki/NTFS_symbolic_link
>
> ... To be honest, I never created a symlink on Windows. But since it
> is supported, you need to know it to write correctly your Windows
> code.
>

Personally, I create them all the time on Windows - mainly via the Link
Shell Extension <
http://schinagl.priv.at/nt/hardlinkshellext/linkshellextension.html>. It's
the easiest way to ensure that my directory structures are as I want them
whilst not worrying about where the files really are e.g. code on SSD,
GB+-sized data files on rusty metal; symlinks make it look like it's the
same directory structure. The same thing can be done with junctions if you're
only dealing with directories, but symlinks work with files as well.

I work cross-platform, and have a mild preference for option #2 with
similar semantics on all platforms.

Tim Delaney

From 4kir4.1i at gmail.com  Thu Jul 10 04:28:09 2014
From: 4kir4.1i at gmail.com (Akira Li)
Date: Thu, 10 Jul 2014 06:28:09 +0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
References: 
Message-ID: <87mwcic5hi.fsf@gmail.com>

Ben Hoyt  writes:
...
> ``scandir()`` yields a ``DirEntry`` object for each file and directory
> in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'``
> pseudo-directories are skipped, and the entries are yielded in
> system-dependent order. Each ``DirEntry`` object has the following
> attributes and methods:
>
> * ``name``: the entry's filename, relative to the ``path`` argument
>   (corresponds to the return values of ``os.listdir``)
>
> * ``full_name``: the entry's full path name -- the equivalent of
>   ``os.path.join(path, entry.name)``

I suggest renaming .full_name -> .path

.full_name might be misleading e.g., it implies that .full_name ==
abspath(.full_name) that might be false. The .path name has no such
associations.

The semantics of the .path attribute are defined by these assertions::

    for entry in os.scandir(topdir):
        #NOTE: assume os.path.normpath(topdir) is not called to create .path
        assert entry.path == os.path.join(topdir, entry.name)
        assert entry.name == os.path.basename(entry.path)
        assert entry.name == os.path.relpath(entry.path, start=topdir)
        assert os.path.dirname(entry.path) == topdir
        assert (entry.path != os.path.abspath(entry.path) or
                os.path.isabs(topdir)) # it is absolute only if topdir is
        assert (entry.path != os.path.realpath(entry.path) or
                topdir == os.path.realpath(topdir)) # symlinks are not resolved
        assert (entry.path != os.path.normcase(entry.path) or
                topdir == os.path.normcase(topdir)) # no case-folding,
                                                    # unlike PureWindowsPath


...
> * ``is_dir()``: like ``os.path.isdir()``, but much cheaper -- it never
>   requires a system call on Windows, and usually doesn't on POSIX
>   systems

I suggest documenting the implicit follow_symlinks parameter for .is_X methods.

Note: lstat == partial(stat, follow_symlinks=False).

In particular, .is_dir() should probably use follow_symlinks=True by
default as suggested by Victor Stinner *if .is_dir() does it on Windows*

MSDN says: GetFileAttributes() does not follow symlinks.

os.path.isdir docs imply follow_symlinks=True: "both islink() and
isdir() can be true for the same path."


...
> Like the other functions in the ``os`` module, ``scandir()`` accepts
> either a bytes or str object for the ``path`` parameter, and returns
> the ``DirEntry.name`` and ``DirEntry.full_name`` attributes with the
> same type as ``path``. However, it is *strongly recommended* to use
> the str type, as this ensures cross-platform support for Unicode
> filenames.

Document when {e.name for e in os.scandir(path)} != set(os.listdir(path))
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

e.g., path can be an open file descriptor in os.listdir(path) since
Python 3.3 but the PEP doesn't mention it explicitly.

It has been discussed already e.g.,
https://mail.python.org/pipermail/python-dev/2014-July/135296.html

PEP 471 should explicitly reject the support for specifying a file
descriptor so that code using os.scandir may assume that the
entry.path (.full_name) attribute is always present (no exceptions due
to a failure to read /proc/self/fd/NNN or an error while calling
fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see
http://stackoverflow.com/q/1188757 ).

Reject explicitly in PEP 471 the support for dir_fd parameter
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

aka the support for paths relative to directory descriptors.

Note: it is a *different* (but related) issue.


...
> Notes on exception handling
> ---------------------------
>
> ``DirEntry.is_X()`` and ``DirEntry.lstat()`` are explicitly methods
> rather than attributes or properties, to make it clear that they may
> not be cheap operations, and they may do a system call. As a result,
> these methods may raise ``OSError``.
>
> For example, ``DirEntry.lstat()`` will always make a system call on
> POSIX-based systems, and the ``DirEntry.is_X()`` methods will make a
> ``stat()`` system call on such systems if ``readdir()`` returns a
> ``d_type`` with a value of ``DT_UNKNOWN``, which can occur under
> certain conditions or on certain file systems.
>
> For this reason, when a user requires fine-grained error handling,
> it's good to catch ``OSError`` around these method calls and then
> handle as appropriate.
>

I suggest documenting that next(os.scandir()) may raise OSError

e.g., on POSIX it may happen due to an OS error in opendir/readdir/closedir

Also, document whether os.scandir() itself may raise OSError (whether
opendir or other OS functions may be called before the first yield).
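
To make the two failure points concrete, here is a minimal sketch. It
assumes the CPython behaviour of os.scandir() as it later shipped
(Python 3.6 or newer), where the directory is opened when os.scandir()
is called; list_names is a hypothetical helper, not part of any API.

```python
import os


def list_names(path):
    """Return (sorted entry names, first OSError seen or None)."""
    try:
        # In CPython the directory is opened here, so a missing or
        # unreadable path fails before any iteration happens.
        entries = os.scandir(path)
    except OSError as exc:
        return [], exc
    names = []
    with entries:
        try:
            for entry in entries:  # each step may read from the OS
                names.append(entry.name)
        except OSError as exc:
            return sorted(names), exc
    return sorted(names), None
```

Keeping the two try blocks separate lets a caller tell "could not open
the directory" apart from "reading the directory failed part-way".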


...
os.scandir() should allow the explicit cleanup
++++++++++++++++++++++++++++++++++++++++++++++

::

    with closing(os.scandir()) as entries:
        for _ in entries:
            break

entries.close() is called, which frees the resources if necessary, to
*avoid relying on garbage-collection for managing file descriptors*
(check whether this is consistent with the .close() method from the
generator protocol; e.g., it might already be called on exit from the
loop, whether or not an exception happens, without requiring the
with-statement (I don't know)). *It should be possible to limit the
resource life-time on non-refcounting Python implementations.*

os.scandir() object may support the context manager protocol explicitly::

    with os.scandir() as entries:
        for _ in entries:
            break

``.__exit__`` method may just call ``.close`` method.
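
A runnable sketch of both spellings, assuming the os.scandir() that
later shipped in Python 3.6 or newer, whose iterator has a close()
method and supports the with-statement. The two helper names are
hypothetical.

```python
import os
from contextlib import closing


def first_name_with_closing(path):
    """Stop after the first entry; closing() calls entries.close()."""
    with closing(os.scandir(path)) as entries:
        for entry in entries:
            return entry.name
    return None


def first_name_with_context_manager(path):
    """Same idea, using the iterator's own context manager support."""
    with os.scandir(path) as entries:
        for entry in entries:
            return entry.name
    return None
```

In both cases the directory handle is released when the with-block is
left, even though the iteration was abandoned early.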


...
> Rejected ideas
> ==============
>
>
> Naming
> ------
>
> The only other real contender for this function's name was
> ``iterdir()``. However, ``iterX()`` functions in Python (mostly found
> in Python 2) tend to be simple iterator equivalents of their
> non-iterator counterparts. For example, ``dict.iterkeys()`` is just an
> iterator version of ``dict.keys()``, but the objects returned are
> identical. In ``scandir()``'s case, however, the return values are
> quite different objects (``DirEntry`` objects vs filename strings), so
> this should probably be reflected by a difference in name -- hence
> ``scandir()``.
>
> See some `relevant discussion on python-dev
> `_.
>

- os.scandir() name is inconsistent with the pathlib module.
  pathlib.Path has `.iterdir() method
  `_
  that generates Path instances i.e., the argument that iterdir()
  should return strings is not valid

- os.scandir() name conflicts with POSIX. POSIX already has `scandir()
  function
  `_
  Most functions in the os module are thin-wrappers of their
  corresponding POSIX analogs

In principle, POSIX scandir(path, &entries, sel, compar) is emulated
using::

    entries = sorted(filter(sel, os.scandir(path)),
                     key=cmp_to_key(compar))

so that the above code snippet could be provided in the docs. We may
say that os.scandir is a pythonic analog of the POSIX function and
therefore there is no conflict even if os.scandir doesn't use POSIX
scandir function in its implementation. If we can't say it then a
*different name/module should be used to allow adding POSIX-compatible
os.scandir() in the future*.
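
A slightly fuller sketch of that emulation, under stated assumptions:
posix_style_scandir and alphasort are hypothetical helpers written for
illustration, the comparison is a plain string comparison rather than
the locale-aware one POSIX alphasort() uses, and the with-statement
needs the os.scandir() of Python 3.6 or newer.

```python
import os
from functools import cmp_to_key


def alphasort(a, b):
    """Three-way comparison of two entries by name (simplified)."""
    return (a.name > b.name) - (a.name < b.name)


def posix_style_scandir(path, sel=None, compar=None):
    """Return a list of DirEntry objects, filtered by sel, sorted by compar."""
    with os.scandir(path) as it:
        # sel plays the role of the POSIX filter callback: keep the
        # entry when it returns a true value.
        entries = [e for e in it if sel is None or sel(e)]
    if compar is not None:
        entries.sort(key=cmp_to_key(compar))
    return entries
```

For example, posix_style_scandir(".", sel=lambda e: e.name.endswith(".py"),
compar=alphasort) would return the matching entries in name order.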


--
Akira


From ncoghlan at gmail.com  Thu Jul 10 06:02:01 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 9 Jul 2014 23:02:01 -0500
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BDBE42.7050609@stoneleaf.us>
References: 
 <53BC4060.5090805@stoneleaf.us>
 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 <53BDA99C.3020101@stoneleaf.us>
 
 <53BDBE42.7050609@stoneleaf.us>
Message-ID: 

On 9 Jul 2014 17:14, "Ethan Furman"  wrote:
>
> On 07/09/2014 02:42 PM, Ben Hoyt wrote:
>>>
>>>
>>> Okay, so using that [no platform specific] logic we should head over to
>>> the os module and remove:
>>>
>>> ctermid, getenv, getegid...
>>>
>>> Okay, I'm tired of typing, but that list is not even half-way through
>>> the os page, and those are all methods or attributes that are not
>>> available on either Windows or Unix or some flavors of Unix.
>>
>> True, but is this really the precedent we want to *aim for*? listdir() is
>> cross-platform,
>
> and listdir has serious performance issues, which is why you developed
> scandir.
>
>>> Oh, and all those [snipped] upper-case attributes?  Yup, documented.
>>> And when we don't document it ourselves we often refer readers to their
>>> system documentation because Python does not, in fact, return exactly
>>> the same results on all platforms -- particularly when calling into
>>> the OS.
>>
>> But again, why a worse, less cross-platform API when a simple,
>> cross-platform one is a method call away?
>
> For the same reason we don't use code that makes threaded behavior
> better, but kills the single thread application.
>
> If the programmer would rather have consistency on all platforms rather
> than performance on the one being used, `info='lstat'` is the option to use.
>
> I like the 'onerror' API better primarily because it gives a single point
> to deal with the errors.  This has at least a couple advantages:
>
>   - less duplication of code: in the tree_size example, the error
>     handling is duplicated twice
>
>   - readability: with the error handling in a separate routine, one
>     does not have to jump around the try/except blocks looking for
>     what happens if there are no errors

The "onerror" approach can also deal with readdir failing, which the PEP
currently glosses over.

I'm somewhat inclined towards the current approach in the PEP, but I'd like
to see an explanation of two aspects:

1. How a scandir variant with an 'onerror' option could be implemented
given the version in the PEP

2. How the existing scandir module handles the 'onerror' parameter to its
directory walking function
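
On the first point, one possible shape is a thin generator wrapper.
This is a sketch only: scandir_onerror is a hypothetical name, the
callback signature (a single OSError argument) is an assumption, and
the code relies on the os.scandir() of Python 3.6 or newer. It says
nothing about how the existing scandir module does it.

```python
import os


def scandir_onerror(path, onerror=None):
    """Yield directory entries, reporting any OSError to onerror."""
    try:
        entries = os.scandir(path)
    except OSError as exc:
        # Opening the directory failed: report and yield nothing.
        if onerror is not None:
            onerror(exc)
        return
    with entries:
        while True:
            try:
                entry = next(entries)
            except StopIteration:
                return
            except OSError as exc:
                # A failure while reading the directory ends the scan.
                if onerror is not None:
                    onerror(exc)
                return
            yield entry
```

A caller can pass errors.append (for some list named errors) to collect
failures in one place instead of wrapping the loop in try/except.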

Regards,
Nick.

>
> --
> ~Ethan~
>

From p.f.moore at gmail.com  Thu Jul 10 09:04:53 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 10 Jul 2014 08:04:53 +0100
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
 
 
Message-ID: 

On 10 July 2014 01:23, Victor Stinner  wrote:
>> As a Windows user with only a superficial understanding of how
>> symlinks should behave, (...)
>
> FYI Windows also supports symbolic links since Windows Vista. The
> feature is unknown because it is restricted to the administrator
> account. Try the "mklink" command in a terminal (cmd.exe) ;-)
> http://en.wikipedia.org/wiki/NTFS_symbolic_link
>
> ... To be honest, I never created a symlink on Windows. But since it
> is supported, you need to know it to write correctly your Windows
> code.

I know how symlinks *do* behave, and I know how Windows supports them.
What I meant was that, because Windows typically makes little use of
symlinks, I have little or no intuition of what feels natural to
people using an OS where symlinks are common.

As someone (Tim?) pointed out later in the thread,
FindFirstFile/FindNextFile doesn't follow symlinks by default (and nor
do the dirent entries on Unix). So whether or not it's "natural", the
"free" functionality provided by the OS is that of lstat, not that of
stat. Presumably because it's possible to build symlink-following code
on top of non-following code, but not the other way around.

Paul

From timothy.c.delaney at gmail.com  Thu Jul 10 09:35:19 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Thu, 10 Jul 2014 17:35:19 +1000
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
 
 
 
Message-ID: 

On 10 July 2014 17:04, Paul Moore  wrote:

> On 10 July 2014 01:23, Victor Stinner  wrote:
> >> As a Windows user with only a superficial understanding of how
> >> symlinks should behave, (...)
> >
> > FYI Windows also supports symbolic links since Windows Vista. The
> > feature is unknown because it is restricted to the administrator
> > account. Try the "mklink" command in a terminal (cmd.exe) ;-)
> > http://en.wikipedia.org/wiki/NTFS_symbolic_link
> >
> > ... To be honest, I never created a symlink on Windows. But since it
> > is supported, you need to know it to write correctly your Windows
> > code.
>
> I know how symlinks *do* behave, and I know how Windows supports them.
> What I meant was that, because Windows typically makes little use of
> symlinks, I have little or no intuition of what feels natural to
> people using an OS where symlinks are common.
>
> As someone (Tim?) pointed out later in the thread,
> FindFirstFile/FindNextFile doesn't follow symlinks by default (and nor
> do the dirent entries on Unix).


It wasn't me (I didn't even see it - lost in the noise).


> So whether or not it's "natural", the
> "free" functionality provided by the OS is that of lstat, not that of
> stat. Presumably because it's possible to build symlink-following code
> on top of non-following code, but not the other way around.
>

For most uses the most natural thing is to follow symlinks (e.g. opening a
symlink in a text editor should open the target). However, I think not
following symlinks by default is the better approach for exactly the reason
Paul has noted above.

Tim Delaney

From martin at v.loewis.de  Thu Jul 10 09:41:10 2014
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 10 Jul 2014 09:41:10 +0200
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To: 
References: 
 
 
 
 <62321D60-1197-47A5-B455-6E5200DD52F7@stufft.io>
 
Message-ID: <53BE4396.8010409@v.loewis.de>

Am 08.07.14 16:48, schrieb Guido van Rossum:
> May the true owner of buildbot.python.org 
> stand up!

Well, I think that's me (at least by my definition of "true owner").
I requested that the machine be set up, and I deployed the software
that is running the service (it was also me who originally introduced
buildbot to the Python project).

On the other hand, I'm not at all "in charge" of that infrastructure
piece. I haven't logged into the machine in many months, and it's
Antoine who currently maintains its configuration. So I don't want to
be pinged when the machine is down.

> (But I do think there may well not be anyone who feels they own it. And
> that's a problem for its long term viability.)

I don't think that's actually the case for "ownership". But then, I also
think that ownership is not a very important concept for pydotorg. Most
owners will likely agree that they lose their right to have a say in it
when they stop maintaining the piece that they own.

> Generally speaking, as an organization we should set up a process for
> managing ownership of *all* infrastructure in a uniform way. I don't
> mean to say that we need to manage all infrastructure uniformly, just
> that we need to have a process for identifying and contacting the
> owner(s) for each piece of infrastructure, as well as collecting other
> information that people besides the owners might need to know. You can
> use a wiki page for that list for all I care, but have a process for
> what belongs there, how/when to update it, and even an owner for the
> wiki page!

Unfortunately, that plan keeps failing. Everybody agrees that such a
list would be useful, so everybody makes their own list. I was
maintaining such a list in the Python wiki for some time, until a
board member decided that a publicly visible inventory is not
appropriate, and it must be a password-protected wiki - where I now
keep forgetting where the wiki is, in the first place, let alone
remembering how to log in.

Regards,
Martin


From victor.stinner at gmail.com  Thu Jul 10 10:37:19 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Thu, 10 Jul 2014 10:37:19 +0200
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
 
 
 
Message-ID: 

2014-07-10 9:04 GMT+02:00 Paul Moore :
> As someone (Tim?) pointed out later in the thread,
> FindFirstFile/FindNextFile doesn't follow symlinks by default (and nor
> do the dirent entries on Unix). So whether or not it's "natural", the
> "free" functionality provided by the OS is that of lstat, not that of
> stat. Presumably because it's possible to build symlink-following code
> on top of non-following code, but not the other way around.

DirEntry methods will remain free (no syscall) for directories and
regular files. One extra syscall will be needed only for symlinks,
which are more rare than other file types (for example, you wrote "
Windows typically makes little use of symlinks").

See my pseudo-code:
https://mail.python.org/pipermail/python-dev/2014-July/135439.html

On Windows, the _lstat and _stat attributes will be filled directly in the
constructor for regular files and directories.

Victor

From ncoghlan at gmail.com  Thu Jul 10 15:58:57 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Jul 2014 08:58:57 -0500
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
 
 
 
 
Message-ID: 

On 10 Jul 2014 03:39, "Victor Stinner"  wrote:
>
> 2014-07-10 9:04 GMT+02:00 Paul Moore :
> > As someone (Tim?) pointed out later in the thread,
> > FindFirstFile/FindNextFile doesn't follow symlinks by default (and nor
> > do the dirent entries on Unix). So whether or not it's "natural", the
> > "free" functionality provided by the OS is that of lstat, not that of
> > stat. Presumably because it's possible to build symlink-following code
> > on top of non-following code, but not the other way around.
>
> DirEntry methods will remain free (no syscall) for directories and
> regular files. One extra syscall will be needed only for symlinks,
> which are more rare than other file types (for example, you wrote "
> Windows typically makes little use of symlinks").

The info we want for scandir is that of the *link itself*. That makes it
easy to implement things like the "followlinks" flag of os.walk. The *far
end* of the link isn't relevant at this level.

The docs just need to be clear that DirEntry objects always match lstat(),
never stat().

Cheers,
Nick.

>
> See my pseudo-code:
> https://mail.python.org/pipermail/python-dev/2014-July/135439.html
>
> On Windows, _lstat and _stat attributes will be filled directly in the
> constructor for regular files and directories.
>
> Victor

From benhoyt at gmail.com  Thu Jul 10 16:19:28 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Thu, 10 Jul 2014 10:19:28 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
 
 
 
 
 
Message-ID: 

>> DirEntry methods will remain free (no syscall) for directories and
>> regular files. One extra syscall will be needed only for symlinks,
>> which are more rare than other file types (for example, you wrote "
>> Windows typically makes little use of symlinks").
>
> The info we want for scandir is that of the *link itself*. That makes it
> easy to implement things like the "followlinks" flag of os.walk. The *far
> end* of the link isn't relevant at this level.
>
> The docs just need to be clear that DirEntry objects always match lstat(),
> never stat().

Yeah, I agree with this. It makes the function (and documentation and
implementation) quite a lot simpler to understand.

scandir() is a lowish-level function which deals with the directory
entries themselves, and mirrors both Windows FindNextFile and POSIX
readdir() in that. If the user wants follow-links behaviour, they can
easily call os.stat() themselves. If this is clearly documented that
seems much simpler to me (and it also seems implicit to me in the fact
that you're calling is_dir() on the *entry*).

Otherwise we might as well go down the route of -- the objects
returned are just like pathlib.Path(), but with stat() and lstat()
cached on first use.

-Ben
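
The follow-links point above can be sketched with the os.scandir() API as
it eventually shipped in Python 3.5 (the "with" form needs 3.6+). This is
only an illustration, not code from the thread: the helper name classify()
is hypothetical, and follow_symlinks=False is passed explicitly so that the
entry checks describe the link itself, while the far end of a link is
inspected with a separate os.stat() call.

```python
import os
import stat


def classify(path):
    """Map each entry name in `path` to (kind_of_entry, target_is_dir)."""
    result = {}
    with os.scandir(path) as entries:
        for entry in entries:
            # lstat-like view: describes the entry itself, never the link target.
            if entry.is_symlink():
                kind = "symlink"
            elif entry.is_dir(follow_symlinks=False):
                kind = "dir"
            elif entry.is_file(follow_symlinks=False):
                kind = "file"
            else:
                kind = "other"

            # Follow-links behaviour is opt-in: an explicit os.stat() call.
            try:
                target_is_dir = stat.S_ISDIR(os.stat(entry.path).st_mode)
            except OSError:
                target_is_dir = None  # e.g. a dangling symlink
            result[entry.name] = (kind, target_is_dir)
    return result
```

A caller that wants os.walk-style "followlinks" handling would look at the
second element of each tuple; a caller that only cares about the directory
entries themselves never pays for the extra os.stat() result.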

From ethan at stoneleaf.us  Thu Jul 10 19:53:45 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 10 Jul 2014 10:53:45 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 
 
 
 
 
 
 
 
 
 
 
 
Message-ID: <53BED329.5020005@stoneleaf.us>

On 07/10/2014 06:58 AM, Nick Coghlan wrote:
>
> The info we want for scandir is that of the *link itself*. That makes it
> easy to implement things like the "followlinks" flag of os.walk. The
>  *far end* of the link isn't relevant at this level.

This also mirrors listdir, correct?  scandir is simply* returning something smarter than a string.

> The docs just need to be clear that DirEntry objects always match lstat(), never stat().

Agreed.

--
~Ethan~

* As well as being a less resource-intensive generator.  :)

From breamoreboy at yahoo.co.uk  Thu Jul 10 20:59:11 2014
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Thu, 10 Jul 2014 19:59:11 +0100
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
Message-ID: 

I'm just curious as to why there are 54 open issues after both of these 
PEPs have been accepted and 384 is listed as finished.  Did we hit some 
unforeseen technical problem which stalled development?

For these and any other open issues if you need some Windows testing 
doing please feel free to put me on the nosy list and ask for a test run.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com



From brett at python.org  Thu Jul 10 21:59:37 2014
From: brett at python.org (Brett Cannon)
Date: Thu, 10 Jul 2014 19:59:37 +0000
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
References: 
Message-ID: 

[for those that don't know, 3121 is extension module init/finalization and
384 is the stable ABI]

On Thu Jul 10 2014 at 3:47:03 PM, Mark Lawrence 
wrote:

> I'm just curious as to why there are 54 open issues after both of these
> PEPs have been accepted and 384 is listed as finished.  Did we hit some
> unforeseen technical problem which stalled development?
>

No, the PEPs were fine and were accepted properly. A huge portion of the
open issues are from Robin Schreiber who as part of GSoC 2012 --
https://www.google-melange.com/gsoc/project/details/google/gsoc2012/robin_hood/5668600916475904
-- went through and updated the stdlib to follow the new practices
introduced in the two PEPs. Not sure if there was some policy decision made
that updating the code wasn't worth it or people simply didn't get around
to applying the patches.

-Brett


>
> For these and any other open issues if you need some Windows testing
> doing please feel free to put me on the nosy list and ask for a test run.
>
> --
> My fellow Pythonistas, ask not what our language can do for you, ask
> what you can do for our language.
>
> Mark Lawrence
>

From guido at python.org  Thu Jul 10 22:08:55 2014
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 Jul 2014 13:08:55 -0700
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
In-Reply-To: 
References: 
 
Message-ID: 

I don't know the details, but I suspect that was the result of my general
guideline "don't start projects cleaning up lots of stdlib code just to
satisfy some new style rule or just to use a new API" -- which came from
hard-won experience where such a cleanup project introduced some new bugs
that weren't found by review nor by tests. Though that was admittedly a
long time ago. Still, such a project can really sap reviewer resources for
relatively little benefit.


On Thu, Jul 10, 2014 at 12:59 PM, Brett Cannon  wrote:

> [for those that don't know, 3121 is extension module init/finalization and
> 384 is the stable ABI]
>
>
> On Thu Jul 10 2014 at 3:47:03 PM, Mark Lawrence 
> wrote:
>
>> I'm just curious as to why there are 54 open issues after both of these
>> PEPs have been accepted and 384 is listed as finished.  Did we hit some
>> unforeseen technical problem which stalled development?
>>
>
> No, the PEPs were fine and were accepted properly. A huge portion of the
> open issues are from Robin Schreiber who as part of GSoC 2012 --
> https://www.google-melange.com/gsoc/project/details/google/gsoc2012/robin_hood/5668600916475904
> -- went through and updated the stdlib to follow the new practices
> introduced in the two PEPs. Not sure if there was some policy decision made
> that updating the code wasn't worth it or people simply didn't get around
> to applying the patches.
>
> -Brett
>
>
>>
>> For these and any other open issues if you need some Windows testing
>> doing please feel free to put me on the nosy list and ask for a test run.
>>
>> --
>> My fellow Pythonistas, ask not what our language can do for you, ask
>> what you can do for our language.
>>
>> Mark Lawrence
>>


-- 
--Guido van Rossum (python.org/~guido)

From alexander.belopolsky at gmail.com  Fri Jul 11 01:57:39 2014
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 10 Jul 2014 19:57:39 -0400
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
In-Reply-To: 
References: 
Message-ID: 

On Thu, Jul 10, 2014 at 2:59 PM, Mark Lawrence 
wrote:

> I'm just curious as to why there are 54 open issues after both of these
> PEPs have been accepted and 384 is listed as finished.  Did we hit some
> unforeseen technical problem which stalled development?


I tried to bring some sanity to that effort by opening a "meta issue":

http://bugs.python.org/issue15787

My enthusiasm, however, vanished after I reviewed the refactoring for the
datetime module:

http://bugs.python.org/issue15390

My main objections are to following PEP 384 (Stable ABI) within stdlib
modules.  I see little benefit for the stdlib (which is shipped fresh with
every new version of Python) from following those guidelines.

From ethan at stoneleaf.us  Fri Jul 11 02:31:09 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 10 Jul 2014 17:31:09 -0700
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
In-Reply-To: 
References: 
 
Message-ID: <53BF304D.3030901@stoneleaf.us>

On 07/10/2014 04:57 PM, Alexander Belopolsky wrote:
> On Thu, Jul 10, 2014 at 2:59 PM, Mark Lawrence wrote:
>>
>> I'm just curious as to why there are 54 open issues after both of
>> these PEPs have been accepted and 384 is listed as finished.  Did
>>  we hit some unforeseen technical problem which stalled development?
>
> I tried to bring some sanity to that effort by opening a "meta issue":
>
> http://bugs.python.org/issue15787
>
> My enthusiasm, however, vanished after I reviewed the refactoring for the datetime module:
>
> http://bugs.python.org/issue15390
>
> My main objections are to following PEP 384  (Stable ABI) within stdlib
> modules.  I see little benefit for the stdlib (which is shipped fresh with every new version of Python) from following
> those guidelines.

If we aren't going to implement the changes (and I agree there's little value for the stdlib to do so), let's mark the 
issues as "won't fix" and close them.

And thanks, Mark, for bringing it up.

--
~Ethan~

From ethan at stoneleaf.us  Fri Jul 11 05:26:05 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 10 Jul 2014 20:26:05 -0700
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 <53BDA99C.3020101@stoneleaf.us>
 
 <53BDBE42.7050609@stoneleaf.us>
 
Message-ID: <53BF594D.9060007@stoneleaf.us>

On 07/09/2014 09:02 PM, Nick Coghlan wrote:
> On 9 Jul 2014 17:14, "Ethan Furman" wrote:
>>
>> I like the 'onerror' API better primarily because it gives a single
>> point to deal with the errors. [...]
>
> The "onerror" approach can also deal with readdir failing, which the
>  PEP currently glosses over.

Do we want this, though?  I can see an error handler for individual entries, but if one of the *dir commands fails that 
would seem to be fairly catastrophic.

> I'm somewhat inclined towards the current approach in the PEP, but I'd like to see an explanation of two aspects:
>
> 1. How a scandir variant with an 'onerror' option could be implemented given the version in the PEP

Here's a stab at it:

     def scandir_error(path, info=None, onerror=None):
         for entry in scandir(path):
             if info == 'type':
                 try:
                     entry.is_dir()
                 except OSError as exc:
                     if onerror is None:
                         raise
                     if not onerror(exc, entry):
                         continue
             elif info == 'lstat':
                 try:
                     entry.lstat()
                 except OSError as exc:
                     if onerror is None:
                         raise
                     if not onerror(exc, entry):
                         continue
             yield entry

Here it is again with an attempt to deal with opendir/readdir/closedir exceptions:

     def scandir_error(path, info=None, onerror=None):
         entries = scandir(path)
         try:
             entry = next(entries)
         except StopIteration:
             # empty directory: end the generator
             return
         except Exception as exc:
             if onerror is None:
                 raise
             if not onerror(exc, 'what else here?'):
                 # what do we do on False?
                 # what do we do on True?
                 pass  # placeholder: behaviour still undecided
         else:
             for entry in scandir(path):
                 if info == 'type':
                     try:
                         entry.is_dir()
                     except OSError as exc:
                         if onerror is None:
                             raise
                         if not onerror(exc, entry):
                             continue
                 elif info == 'lstat':
                     try:
                         entry.lstat()
                     except OSError as exc:
                         if onerror is None:
                             raise
                         if not onerror(exc, entry):
                             continue
                 yield entry


> 2. How the existing scandir module handles the 'onerror' parameter to its directory walking function

Here's the first third of it from the repo:

     def walk(top, topdown=True, onerror=None, followlinks=False):
         """Like os.walk(), but faster, as it uses scandir() internally."""
         # Determine which are files and which are directories
         dirs = []
         nondirs = []
         try:
             for entry in scandir(top):
                 if entry.is_dir():
                     dirs.append(entry)
                 else:
                     nondirs.append(entry)
         except OSError as error:
             if onerror is not None:
                 onerror(error)
             return
         ...

--
~Ethan~

From benhoyt at gmail.com  Fri Jul 11 13:12:59 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Fri, 11 Jul 2014 07:12:59 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: <53BF594D.9060007@stoneleaf.us>
References: 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 <53BDA99C.3020101@stoneleaf.us>
 
 <53BDBE42.7050609@stoneleaf.us>
 
 <53BF594D.9060007@stoneleaf.us>
Message-ID: 

[replying to python-dev this time]

>> The "onerror" approach can also deal with readdir failing, which the
>>  PEP currently glosses over.
>
>
> Do we want this, though?  I can see an error handler for individual entries,
> but if one of the *dir commands fails that would seem to be fairly
> catastrophic.

Very much agreed that this isn't necessary for just readdir/FindNext
errors. We've never had this level of detail before -- if listdir()
fails half way through (very unlikely) it just bombs with OSError and
you get no entries at all.

If you really really want this (again very unlikely), you can always
call next() directly and catch OSError around that call.

-Ben
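
A minimal sketch of the "call next() directly" approach mentioned above,
written against os.scandir() as it later shipped in Python 3.5 (the
context-manager form needs 3.6+). The function name scandir_onerror() and
the single-argument onerror callback are assumptions made for illustration;
neither comes from the PEP.

```python
import os


def scandir_onerror(path, onerror=None):
    """Yield entries from `path`; report iteration errors to `onerror`."""
    try:
        # In CPython, failures to open the directory are raised by this call.
        entries = os.scandir(path)
    except OSError as exc:
        if onerror is None:
            raise
        onerror(exc)
        return
    with entries:
        while True:
            try:
                # Failures while reading further entries are raised here.
                entry = next(entries)
            except StopIteration:
                return
            except OSError as exc:
                if onerror is None:
                    raise
                onerror(exc)
                return  # stop: nothing sensible to resume from
            yield entry
```

With onerror=None the generator behaves like plain iteration and simply
lets OSError propagate, which matches the listdir() behaviour described
above.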

From stefan at bytereef.org  Fri Jul 11 13:46:27 2014
From: stefan at bytereef.org (Stefan Krah)
Date: Fri, 11 Jul 2014 13:46:27 +0200
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
In-Reply-To: 
References: 
 
Message-ID: <20140711114627.GA27927@sleipnir.bytereef.org>

Brett Cannon  wrote:
> No, the PEPs were fine and were accepted properly. A huge portion of the open
> issues are from Robin Schreiber who as part of GSoC 2012 --
> https://www.google-melange.com/gsoc/project/details/google/gsoc2012/robin_hood/5668600916475904
> -- went through and updated the stdlib to follow the new
> practices introduced in the two PEPs. Not sure if there was some policy
> decision made that updating the code wasn't worth it or people simply didn't
> get around to applying the patches.

Due to the frequent state lookups there is a performance problem though,
which is quite significant for _decimal.  Otherwise I think I would have
implemented the changes already.

http://bugs.python.org/issue15722


I think for speed sensitive applications it may be an idea to create
a new C function (METH_STATE flag) which gets the state passed in by
ceval.

Other than that, looking up the state inside the module but caching it (like
it's done for the _decimal context) also has reasonable performance.



Also I hit the same issues that Eli mentioned here a while ago:

https://mail.python.org/pipermail/python-dev/2013-August/127862.html



Stefan Krah



From status at bugs.python.org  Fri Jul 11 18:07:43 2014
From: status at bugs.python.org (Python tracker)
Date: Fri, 11 Jul 2014 18:07:43 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20140711160743.59D7856A3B@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2014-07-04 - 2014-07-11)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    4588 (-15)
  closed 29141 (+55)
  total  33729 (+40)

Open issues with patches: 2151 


Issues opened (24)
==================

#21918: Convert test_tools to directory
http://bugs.python.org/issue21918  opened by serhiy.storchaka

#21919: Changing cls.__bases__ must ensure proper metaclass inheritanc
http://bugs.python.org/issue21919  opened by abusalimov

#21922: PyLong: use GMP
http://bugs.python.org/issue21922  opened by h.venev

#21925: ResouceWarning sometimes doesn't display
http://bugs.python.org/issue21925  opened by msmhrt

#21927: BOM appears in stdin when using Powershell
http://bugs.python.org/issue21927  opened by jason.coombs

#21928: Incorrect reference to partial() in functools.wraps documentat
http://bugs.python.org/issue21928  opened by Dustin.Oprea

#21929: Rounding properly
http://bugs.python.org/issue21929  opened by jeroen1225

#21931: Nonsense errors reported by msilib.FCICreate for bad argument
http://bugs.python.org/issue21931  opened by Jeffrey.Armstrong

#21933: Allow the user to change font sizes with the text pane of turt
http://bugs.python.org/issue21933  opened by Lita.Cho

#21934: OpenBSD has no /dev/full device
http://bugs.python.org/issue21934  opened by Daniel.Dickman

#21935: Implement AUTH command in smtpd.
http://bugs.python.org/issue21935  opened by zvyn

#21937: IDLE interactive window doesn't display unsaved-indicator
http://bugs.python.org/issue21937  opened by rhettinger

#21939: IDLE - Test Percolator
http://bugs.python.org/issue21939  opened by sahutd

#21941: Clean up turtle TPen class
http://bugs.python.org/issue21941  opened by ingrid

#21944: Allow copying of CodecInfo objects
http://bugs.python.org/issue21944  opened by lehmannro

#21946: 'python -u' yields trailing carriage return '\r'  (Python2 for
http://bugs.python.org/issue21946  opened by msp

#21947: `Dis` module doesn't know how to disassemble generators
http://bugs.python.org/issue21947  opened by hakril

#21949: Document the Py_SIZE() macro.
http://bugs.python.org/issue21949  opened by gregory.p.smith

#21951: tcl test change crashes AIX
http://bugs.python.org/issue21951  opened by David.Edelsohn

#21952: fnmatch.py can appear in tracemalloc diffs
http://bugs.python.org/issue21952  opened by pitrou

#21953: pythonrun.c does not check std streams the same as fileio.c
http://bugs.python.org/issue21953  opened by steve.dower

#21955: ceval.c: implement fast path for integers with a single digit
http://bugs.python.org/issue21955  opened by haypo

#21956: Doc files deleted from repo are not deleted from docs.python.o
http://bugs.python.org/issue21956  opened by brandon-rhodes

#21957: ASCII Formfeed (FF) & ASCII Vertical Tab (VT) Have Hexadecimal
http://bugs.python.org/issue21957  opened by Zero



Most recent 15 issues with no replies (15)
==========================================

#21957: ASCII Formfeed (FF) & ASCII Vertical Tab (VT) Have Hexadecimal
http://bugs.python.org/issue21957

#21955: ceval.c: implement fast path for integers with a single digit
http://bugs.python.org/issue21955

#21951: tcl test change crashes AIX
http://bugs.python.org/issue21951

#21949: Document the Py_SIZE() macro.
http://bugs.python.org/issue21949

#21944: Allow copying of CodecInfo objects
http://bugs.python.org/issue21944

#21941: Clean up turtle TPen class
http://bugs.python.org/issue21941

#21937: IDLE interactive window doesn't display unsaved-indicator
http://bugs.python.org/issue21937

#21935: Implement AUTH command in smtpd.
http://bugs.python.org/issue21935

#21933: Allow the user to change font sizes with the text pane of turt
http://bugs.python.org/issue21933

#21931: Nonsense errors reported by msilib.FCICreate for bad argument
http://bugs.python.org/issue21931

#21928: Incorrect reference to partial() in functools.wraps documentat
http://bugs.python.org/issue21928

#21919: Changing cls.__bases__ must ensure proper metaclass inheritanc
http://bugs.python.org/issue21919

#21916: Create unit tests for turtle textonly
http://bugs.python.org/issue21916

#21909: PyLong_FromString drops const
http://bugs.python.org/issue21909

#21899: Futures are not marked as completed
http://bugs.python.org/issue21899



Most recent 15 issues waiting for review (15)
=============================================

#21953: pythonrun.c does not check std streams the same as fileio.c
http://bugs.python.org/issue21953

#21947: `Dis` module doesn't know how to disassemble generators
http://bugs.python.org/issue21947

#21944: Allow copying of CodecInfo objects
http://bugs.python.org/issue21944

#21941: Clean up turtle TPen class
http://bugs.python.org/issue21941

#21939: IDLE - Test Percolator
http://bugs.python.org/issue21939

#21935: Implement AUTH command in smtpd.
http://bugs.python.org/issue21935

#21934: OpenBSD has no /dev/full device
http://bugs.python.org/issue21934

#21925: ResouceWarning sometimes doesn't display
http://bugs.python.org/issue21925

#21922: PyLong: use GMP
http://bugs.python.org/issue21922

#21918: Convert test_tools to directory
http://bugs.python.org/issue21918

#21916: Create unit tests for turtle textonly
http://bugs.python.org/issue21916

#21914: Create unit tests for Turtle guionly
http://bugs.python.org/issue21914

#21907: Update Windows build batch scripts
http://bugs.python.org/issue21907

#21906: Tools\Scripts\md5sum.py doesn't work in Python 3.x
http://bugs.python.org/issue21906

#21905: RuntimeError in pickle.whichmodule  when sys.modules if mutate
http://bugs.python.org/issue21905



Top 10 most discussed issues (10)
=================================

#21597: Allow turtledemo code pane to get wider.
http://bugs.python.org/issue21597  26 msgs

#21922: PyLong: use GMP
http://bugs.python.org/issue21922  15 msgs

#21907: Update Windows build batch scripts
http://bugs.python.org/issue21907  11 msgs

#10289: Document magic methods called by built-in functions
http://bugs.python.org/issue10289   6 msgs

#21323: CGI HTTP server not running scripts from subdirectories
http://bugs.python.org/issue21323   6 msgs

#21765: Idle: make 3.x HyperParser work with non-ascii identifiers.
http://bugs.python.org/issue21765   5 msgs

#21880: IDLE: Ability to run 3rd party code checkers
http://bugs.python.org/issue21880   5 msgs

#21925: ResouceWarning sometimes doesn't display
http://bugs.python.org/issue21925   5 msgs

#21927: BOM appears in stdin when using Powershell
http://bugs.python.org/issue21927   5 msgs

#8231: Unable to run IDLE without write-access to home directory
http://bugs.python.org/issue8231   4 msgs



Issues closed (49)
==================

#5712: tkinter - askopenfilenames returns string instead of tuple in 
http://bugs.python.org/issue5712  closed by serhiy.storchaka

#9554: test_argparse.py: use new unittest features
http://bugs.python.org/issue9554  closed by berker.peksag

#9745: MSVC .pdb files not created by python 2.7 distutils
http://bugs.python.org/issue9745  closed by berker.peksag

#9822: windows batch files are dependent on cmd current directory
http://bugs.python.org/issue9822  closed by zach.ware

#9973: Sometimes buildbot fails to cleanup working copy
http://bugs.python.org/issue9973  closed by zach.ware

#10722: IDLE's subprocess didnit make connection ..... Python 2.7
http://bugs.python.org/issue10722  closed by terry.reedy

#11259: asynchat does not check if terminator is negative integer
http://bugs.python.org/issue11259  closed by haypo

#12523: 'str' object has no attribute 'more' [/usr/lib/python3.2/async
http://bugs.python.org/issue12523  closed by haypo

#14121: add a convenience C-API function for unpacking iterables
http://bugs.python.org/issue14121  closed by scoder

#15105: curses: wrong indentation
http://bugs.python.org/issue15105  closed by ned.deily

#17755: test_builtin assumes LANG=C
http://bugs.python.org/issue17755  closed by ned.deily

#18887: test_multiprocessing.test_connection failure on Python 2.7
http://bugs.python.org/issue18887  closed by neologix

#19279: UTF-7 decoder can produce inconsistent Unicode string
http://bugs.python.org/issue19279  closed by serhiy.storchaka

#19283: Need support to avoid Windows CRT compatibility issue.
http://bugs.python.org/issue19283  closed by loewis

#19593: Use specific asserts in importlib tests
http://bugs.python.org/issue19593  closed by serhiy.storchaka

#19650: test_multiprocessing_spawn.test_mymanager_context() crashed wi
http://bugs.python.org/issue19650  closed by haypo

#20639: pathlib.PurePath.with_suffix() does not allow removing the suf
http://bugs.python.org/issue20639  closed by pitrou

#21365: asyncio.Task reference misses the most important fact about it
http://bugs.python.org/issue21365  closed by haypo

#21437: document that asyncio.ProactorEventLoop doesn't support SSL
http://bugs.python.org/issue21437  closed by haypo

#21646: Add tests for turtle.ScrolledCanvas
http://bugs.python.org/issue21646  closed by ingrid

#21680: asyncio: document event loops
http://bugs.python.org/issue21680  closed by haypo

#21707: modulefinder uses wrong CodeType signature in .replace_paths_i
http://bugs.python.org/issue21707  closed by berker.peksag

#21714: Path.with_name can construct invalid paths
http://bugs.python.org/issue21714  closed by pitrou

#21732: SubprocessTestsMixin.test_subprocess_terminate() hangs on "AMD
http://bugs.python.org/issue21732  closed by haypo

#21743: Create tests for RawTurtleScreen
http://bugs.python.org/issue21743  closed by Lita.Cho

#21754: Add tests for turtle.TurtleScreenBase
http://bugs.python.org/issue21754  closed by ingrid

#21803: Remove macro indirections in complexobject
http://bugs.python.org/issue21803  closed by pitrou

#21806: Add tests for turtle.TPen class
http://bugs.python.org/issue21806  closed by ingrid

#21844: Fix HTMLParser in unicodeless build
http://bugs.python.org/issue21844  closed by ezio.melotti

#21881: python cannot parse tcl value
http://bugs.python.org/issue21881  closed by serhiy.storchaka

#21886: asyncio: Future.set_result() called on cancelled Future raises
http://bugs.python.org/issue21886  closed by python-dev

#21897: frame.f_locals causes segfault on Python >=3.4.1
http://bugs.python.org/issue21897  closed by pitrou

#21911: "IndexError: tuple index out of range" should include the requ
http://bugs.python.org/issue21911  closed by ezio.melotti

#21920: Fixed missing colon in the docs
http://bugs.python.org/issue21920  closed by berker.peksag

#21921: Example in asyncio event throws resource usage warning
http://bugs.python.org/issue21921  closed by python-dev

#21923: distutils.sysconfig.customize_compiler will try to read variab
http://bugs.python.org/issue21923  closed by ned.deily

#21924: Cannot import anything that imports tokenize from script calle
http://bugs.python.org/issue21924  closed by ned.deily

#21926: Bundle C++ compiler with Python on Windows
http://bugs.python.org/issue21926  closed by loewis

#21930: new assert raises syntax proposal
http://bugs.python.org/issue21930  closed by ezio.melotti

#21932: os.read() must use Py_ssize_t for the size parameter
http://bugs.python.org/issue21932  closed by haypo

#21936: test_future_exception_never_retrieved() of test_asyncio fails 
http://bugs.python.org/issue21936  closed by haypo

#21938: Py_XDECREF statement in gen_iternext()
http://bugs.python.org/issue21938  closed by pitrou

#21940: IDLE - Test WidgetRedirector
http://bugs.python.org/issue21940  closed by terry.reedy

#21942: pydoc source not displayed in browser on Windows
http://bugs.python.org/issue21942  closed by zach.ware

#21943: To duplicate a list has biyective properties, not inyective on
http://bugs.python.org/issue21943  closed by mark.dickinson

#21945: Wrong grammar in documentation
http://bugs.python.org/issue21945  closed by ezio.melotti

#21948: Documentation Typo
http://bugs.python.org/issue21948  closed by berker.peksag

#21950: import sqlite3 not running
http://bugs.python.org/issue21950  closed by alexganwd

#21954: str(b'text') returns "b'text'" in interpreter
http://bugs.python.org/issue21954  closed by ned.deily

From andreas.r.maier at gmx.de  Fri Jul 11 16:04:35 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Fri, 11 Jul 2014 16:04:35 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com>
Message-ID: <53BFEEF3.2060101@gmx.de>

Am 09.07.2014 03:48, schrieb Raymond Hettinger:
>
> On Jul 7, 2014, at 4:37 PM, Andreas Maier  wrote:
>
>> I do not really buy into the arguments that try to show how identity and value are somehow the same. They are not, not even in Python.
>>
>> The argument I can absolutely buy into is that the implementation cannot be changed within a major release. So the real question is how we document it.
>
> Once every few years, someone discovers IEEE-754, learns that NaNs
> aren't supposed to be equal to themselves and becomes inspired
> to open an old debate about whether to wreck Python in an effort
> to make the world safe for NaNs.  And somewhere along the way,
> people forget that practicality beats purity.
>
> Here are a few thoughts on the subject that may or may not add
> a little clarity ;-)
>
> * Python already has IEEE-754 compliant NaNs:
>
>         assert float('NaN') != float('NaN')
>
> * Python already has the ability to filter-out NaNs:
>
>         [x for x in container if not math.isnan(x)]
>
> * In the numeric world, the most common use of NaNs is for
>    missing data (much like we usually use None).  The property
>    of not being equal to itself is primarily useful in
>    low level code optimized to run a calculation to completion
>    without running frequent checks for invalid results
>    (much like #N/A is used in MS Excel).
>
> * Python also lets containers establish their own invariants
>    to establish correctness, improve performance, and make it
>    possible to reason about our programs:
>
>             for x in c:
>                 assert x in c
>
> * Containers like dicts and sets have always used the rule
>    that identity-implies equality.  That is central to their
>    implementation.  In particular, the check of interned
>    string keys relies on identity to bypass a slow
>    character-by-character comparison to verify equality.
>
> * Traditionally, a relation R is considered an equality
>    relation if it is reflexive, symmetric, and transitive:
>
>        R(x, x) -> True
>        R(x, y) -> R(y, x)
>        R(x, y) ^ R(y, z) -> R(x, z)
>
> * Knowingly or not, programs tend to assume that all of those
>    hold.  Test suites in particular assume that if you put
>    something in a container that assertIn() will pass.
>
> * Here are some examples of cases where non-reflexive objects
>    would jeopardize the pragmatism of being able to reason
>    about the correctness of programs:
>
>        s = SomeSet()
>        s.add(x)
>        assert x in s
>
>        s.remove(x)        # See collections.abc.Set.remove
>        assert not s
>
>        s.clear()          # See collections.abc.Set.clear
>        assert not s
>
> * What the above code does is up to the implementer of the
>    container.  If you use the Set ABC, you can choose to
>    implement __contains__() and discard() to use straight
>    equality or identity-implies equality.  Nothing prevents
>    you from making containers that are hard to reason about.
>
> * The builtin containers make the choice for identity-implies
>    equality so that it is easier to build fast, correct code.
>    For the most part, this has worked out great (dictionaries
>    in particular have had identity checks built in for almost
>    twenty years).
>
> * Years ago, there was a debate about whether to add an __is__()
>    method to allow overriding the is-operator.  The push for the
>    change was the "pure" notion that "all operators should be
>    customizable".  However, the idea was rejected based on the
>    "practical" notions that it would wreck our ability to reason
>    about code, it would slow down all code that used identity checks,
>    that library modules (ours and third-party) already made
>    deep assumptions about what "is" means, and that people would
>    shoot themselves in the foot with hard to find bugs.
>
> Personally, I see no need to make the same mistake by removing
> the identity-implies-equality rule from the built-in containers.
> There's no need to upset the apple cart for nearly zero benefit.

Containers delegate the equal comparison on the container to their 
elements; they do not apply identity-based comparison to their elements. 
At least that is the externally visible behavior.

Only the default comparison behavior implemented on type object follows 
the identity-implies-equality rule.

As part of my doc patch, I will upload an extension to the 
test_compare.py test suite, which tests all built-in containers with 
values whose order differs from the identity order, and it shows that the 
value order and equality wins over identity, if implemented.

>
> IMO, the proposed quest for purity is misguided.
> There are many practical reasons to let the builtin
> containers continue to work as they do now.

As I said, I can accept compatibility reasons. I can also accept the
argument brought up by Benjamin about the desire for the
identity-implies-equality rule as a default, with no corresponding rule
for order comparison (and I added both to the doc patch).

Andy


From andreas.r.maier at gmx.de  Fri Jul 11 16:10:47 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Fri, 11 Jul 2014 16:10:47 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BB69CB.6040407@stoneleaf.us>
References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us>
 <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de>
 <53BB3261.6080705@stoneleaf.us> <87bnt0ttfa.fsf@uwakimon.sk.tsukuba.ac.jp>
 <53BB69CB.6040407@stoneleaf.us>
Message-ID: <53BFF067.7060602@gmx.de>

On 08.07.2014 05:47, Ethan Furman wrote:
> On 07/07/2014 08:34 PM, Stephen J. Turnbull wrote:
>> Ethan Furman writes:
>>
>>> And what would be this 'sensible definition' [of value equality]?
>>
>> I think that's the wrong question.  I suppose Andreas's point is that
>> when the programmer doesn't provide a definition, there is no such
>> thing as a "sensible definition" to default to.  I disagree, but given
>> that as the point of discussion, asking what the definition is, is moot.
>
> He eventually made that point, but until he did I thought he meant that
> there was such a sensible default definition, he just wasn't sharing
> what he thought it might be with us.

My main point is that a sensible definition is up to the class designer,
so (given complete freedom) I would prefer an exception as the default. But
that cannot be changed at this point, and maybe never will be. And I don't
intend to stir up that discussion again.

I dropped my other point about a better default comparison (i.e. one
with a result, not an exception). It is not easy to define one except
for types such as sequences or integral types, and they in fact
have defined their own customizations for comparison.

Bottom line: I'm fine with just a doc patch, and a testcase improvement :-)

Andy


From andreas.r.maier at gmx.de  Fri Jul 11 16:23:59 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Fri, 11 Jul 2014 16:23:59 +0200
Subject: [Python-Dev] == on object tests identity in 3.x - uploaded doc patch
In-Reply-To: <53BFA64A.4080807@stoneleaf.us>
References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us>
 <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de>
 <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com>
 <53BB32B1.2090300@stoneleaf.us> <20140708015833.GD13014@ando>
 <53BB56B6.8030306@stoneleaf.us> <53BFA590.7000509@gmx.de>
 <53BFA64A.4080807@stoneleaf.us>
Message-ID: <53BFF37F.8000507@gmx.de>

On 11.07.2014 10:54, Ethan Furman wrote:
> On 07/11/2014 01:51 AM, Andreas Maier wrote:
>> I like the motivation provided by Benjamin and will work it into the
>> doc patch for issue #12067. The NaN special case
>> will also stay in.
>
> Cool -- you should nosy myself, D'Aprano, and Benjamin (at least) on
> that issue.

Done.

Plus, I have uploaded a patch (v8) to issue #12067, which hopefully
reflects everything that was said (to the extent it was related to
comparisons).

Andy


From ethan at stoneleaf.us  Fri Jul 11 22:54:40 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 11 Jul 2014 13:54:40 -0700
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BFEEF3.2060101@gmx.de>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de>
Message-ID: <53C04F10.8070509@stoneleaf.us>

On 07/11/2014 07:04 AM, Andreas Maier wrote:
> On 09.07.2014 03:48, Raymond Hettinger wrote:
>>
>> Personally, I see no need to make the same mistake by removing
>> the identity-implies-equality rule from the built-in containers.
>> There's no need to upset the apple cart for nearly zero benefit.
>
> Containers delegate the equal comparison on the container to their elements; they do not apply identity-based comparison
> to their elements. At least that is the externally visible behavior.

If that were true, then [NaN] == [NaN] would be False, and it is not.

Here is the externally visible behavior:

Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
--> NaN = float('nan')
--> NaN == NaN
False
--> [NaN] == [NaN]
True
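
The same identity shortcut shows up in containment tests. A minimal sketch
(not part of the session above; the variable names are only illustrative):

```python
import math

nan = float('nan')
items = [nan]

# The very same object is found in the list even though nan != nan,
# because the container checks identity before equality.
same_object_found = nan in items

# A *different* NaN object is not found: the identity check fails and
# == returns False as well.
other_nan_found = float('nan') in items

# Filtering NaNs explicitly with math.isnan() works regardless of identity.
filtered = [x for x in items if not math.isnan(x)]
```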

--
~Ethan~

From nad at acm.org  Sat Jul 12 03:04:14 2014
From: nad at acm.org (Ned Deily)
Date: Fri, 11 Jul 2014 18:04:14 -0700
Subject: [Python-Dev] buildbot.python.org down again?
References: 
 
 
 
 <62321D60-1197-47A5-B455-6E5200DD52F7@stufft.io>
Message-ID: 

In article <62321D60-1197-47A5-B455-6E5200DD52F7 at stufft.io>,
 Donald Stufft  wrote:
> On Jul 8, 2014, at 12:58 AM, Nick Coghlan  wrote:
> > On 7 Jul 2014 10:47, "Guido van Rossum"  wrote:
> > > It would still be nice to know who "the appropriate persons" are. Too 
> > > much of our infrastructure seems to be maintained by house elves or the 
> > > ITA.
> > I volunteered to be the board's liaison to the infrastructure team, and 
> > getting more visibility around what the infrastructure *is* and how it's 
> > monitored and supported is going to be part of that. That will serve a 
> > couple of key purposes:
> > - making the points of escalation clearer if anything breaks or needs 
> > improvement (although "infrastructure at python.org" is a good default choice)
> > - making the current "todo" list of the infrastructure team more visible 
> > (both to calibrate resolution time expectations and to provide potential 
> > contributors an idea of what's involved)
> > Noah has already set up http://status.python.org/ to track service status, 
> > I can see about getting buildbot.python.org added to the list.
> We (the infrastructure team) were actually looking earlier about
> buildbot.python.org and we're not entirely sure who "owns" 
> buildbot.python.org.
> Unfortunately a lot of the *.python.org services are in a similar state where
> there is no clear owner. Generally we've not wanted to just step in and take
> over for fear of stepping on someone's toes but it appears that perhaps
> buildbot.p.o has no owner?

In parallel to this discussion, I ran into Noah at a meeting the other 
day and we talked a bit about buildbot.python.org.  As Donald noted, it 
sounds like he and the infrastructure team are willing to add it to the 
list of machines they monitor and reboot, as long as they wouldn't be 
expected to administer the buildbot master itself.  I checked with 
Antoine and Martin and they are agreeable with that.  So I think there 
is general agreement that the infrastructure team can take on uptime 
monitoring and rebooting of buildbot.python.org and that Antoine/Martin 
would be the primary/secondary contacts/owners for other administrative 
issues.  Martin would also be happy if the infrastructure team could 
handle installing routine security fixes as well.  I'll leave it to the 
interested parties to discuss it further among themselves.

-- 
 Ned Deily,
 nad at acm.org


From eliben at gmail.com  Sat Jul 12 15:15:31 2014
From: eliben at gmail.com (Eli Bendersky)
Date: Sat, 12 Jul 2014 06:15:31 -0700
Subject: [Python-Dev] Semi-official read-only Github mirror of the
	CPython Mercurial repository
In-Reply-To: 
References: 
Message-ID: 

Just a quick update on this. I've finally found time to set up a VPS of my
own at DigitalOcean, and I'm moving the cronjob for updating the Github
mirrors to it. This lets me ramp up the update frequency. For now I'll set
it to every 4 hours, but in the future I may make it even more frequent.
Hopefully this will not overrun my bandwidth allocation :)

The CPython mirror (https://github.com/python/cpython) has been pretty
popular so far, with over 70 forks.

Eli



On Mon, Sep 30, 2013 at 6:09 AM, Eli Bendersky  wrote:

> Hi all,
>
> https://github.com/python/cpython is now live as a semi-official, *read
> only* Github mirror of the CPython Mercurial repository. Let me know if you
> have any problems/concerns.
>
> I still haven't decided how often to update it (considering either just N
> times a day, or maybe use a Hg hook for batching). Suggestions are welcome.
>
> The methodology I used to create it is via hg-fast-export. I also tried to
> pack and gc the git repo as much as possible before the initial Github push
> - it went down from almost ~2GB to ~200MB (so this is the size of a fresh
> clone right now).
>
> Eli
>
> P.S. thanks Jesse for the keys to https://github.com/python
>
>
>
>

From ncoghlan at gmail.com  Sat Jul 12 17:07:03 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Jul 2014 10:07:03 -0500
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 <53BDA99C.3020101@stoneleaf.us>
 
 <53BDBE42.7050609@stoneleaf.us>
 
 <53BF594D.9060007@stoneleaf.us>
 
Message-ID: 

On 11 Jul 2014 12:46, "Ben Hoyt"  wrote:
>
> [replying to python-dev this time]
>
> >> The "onerror" approach can also deal with readdir failing, which the
> >>  PEP currently glosses over.
> >
> >
> > Do we want this, though?  I can see an error handler for individual entries,
> > but if one of the *dir commands fails that would seem to be fairly
> > catastrophic.
>
> Very much agreed that this isn't necessary for just readdir/FindNext
> errors. We've never had this level of detail before -- if listdir()
> fails half way through (very unlikely) it just bombs with OSError and
> you get no entries at all.
>
> If you really really want this (again very unlikely), you can always
> call next() directly and catch OSError around that call.

Agreed - I think the PEP should point this out explicitly, and show that
the approach it takes offers a lot of flexibility in error handling, from
"just let it fail", to a single try/except around the whole loop, to
try/except just around the operations that might call lstat(), to try/except
around the individual iteration steps.
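
As a rough illustration of the finer-grained end of that range, here is a
sketch written against os.scandir() as it later shipped in the standard
library (context-manager support needs Python 3.6 or newer). The helper name
entry_sizes and the choice to skip or stop on errors are assumptions made for
the example, not something the PEP prescribes:

```python
import os

def entry_sizes(path):
    """Return {name: size in bytes} for the entries in path.

    Errors from opening the directory are simply allowed to propagate
    ("just let it fail"); errors from individual steps are handled locally.
    """
    sizes = {}
    with os.scandir(path) as it:
        while True:
            try:
                entry = next(it)      # a failing directory read surfaces here
            except StopIteration:
                break
            except OSError:
                break                 # give up on the rest of this directory
            try:
                # stat() may need a system call, so it can raise OSError too.
                sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
            except OSError:
                continue              # skip only this entry
    return sizes
```

Wrapping a plain "for entry in os.scandir(path)" loop in one try/except is the
coarser alternative when per-entry recovery is not needed.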

os.walk remains the higher level API that most code should be using, and
that has to retain the current listdir based behaviour (any error = ignore
all entries in that directory) for backwards compatibility reasons.

Cheers,
Nick.

>
> -Ben
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com

From geertj at gmail.com  Sat Jul 12 11:12:37 2014
From: geertj at gmail.com (Geert Jansen)
Date: Sat, 12 Jul 2014 11:12:37 +0200
Subject: [Python-Dev] Memory BIO for _ssl
In-Reply-To: 
References: 
 
Message-ID: 

On Mon, Jul 7, 2014 at 1:49 AM, Antoine Pitrou  wrote:

> On 05/07/2014 14:04, Geert Jansen wrote:
>
>> Since I need this for my Gruvi async framework, I want to volunteer to
>> write a patch. It should be useful as well to Py3K's asyncio and other
>> async frameworks. It would be good to get some feedback before I start
>> on this.
>
> Thanks for volunteering! This would be a very welcome addition.

I have a first patch and submitted it as issue #21965

http://bugs.python.org/issue21965

I've incorporated your feedback.

Regards,
Geert

From ncoghlan at gmail.com  Sat Jul 12 17:19:56 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Jul 2014 10:19:56 -0500
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
In-Reply-To: 
References: 
 
Message-ID: 

On 10 Jul 2014 19:59, "Alexander Belopolsky" wrote:
>
>
> On Thu, Jul 10, 2014 at 2:59 PM, Mark Lawrence wrote:
>>
>> I'm just curious as to why there are 54 open issues after both of these
>> PEPs have been accepted and 384 is listed as finished.  Did we hit some
>> unforeseen technical problem which stalled development?
>
>
> I tried to bring some sanity to that effort by opening a "meta issue":
>
> http://bugs.python.org/issue15787
>
> My enthusiasm, however, vanished after I reviewed the refactoring for the
> datetime module:
>
> http://bugs.python.org/issue15390
>
> My main objections are to following PEP 384 (Stable ABI) within stdlib
> modules.  I see little benefit for the stdlib (which is shipped fresh with
> every new version of Python) from following those guidelines.

The main downside of "do as we say, not as we do" in this case is that we
miss out on the feedback loop of what the stable ABI is like to *use*. For
example, the docs problem, where it's hard to tell whether an API is part
of the stable ABI or not, or the performance problem Stefan mentions.

Using the stable ABI for standard library extensions also serves to
decouple them further from the internal details of the CPython runtime,
making it more likely they will be able to run correctly on alternative
interpreters (since emulating or otherwise supporting the limited API is
easier than supporting the whole thing).

Cheers,
Nick.


From alexander.belopolsky at gmail.com  Sat Jul 12 19:00:18 2014
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sat, 12 Jul 2014 13:00:18 -0400
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
In-Reply-To: 
References: 
 
 
Message-ID: 

On Sat, Jul 12, 2014 at 11:19 AM, Nick Coghlan  wrote:

> The main downside of "do as we say, not as we do" in this case is that we
> miss out on the feedback loop of what the stable ABI is like to *use*.


A good start for improving the situation would be to convert the extension
module templates that we ship with the Python source:

http://bugs.python.org/issue15848 (xxsubtype module)
http://bugs.python.org/issue15849 (xx module)

From jaraco at jaraco.com  Sun Jul 13 16:04:17 2014
From: jaraco at jaraco.com (Jason R. Coombs)
Date: Sun, 13 Jul 2014 14:04:17 +0000
Subject: [Python-Dev] Another case for frozendict
Message-ID: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>

I repeatedly run into situations where a frozendict would be useful, and every time I do, I go searching and find the (unfortunately rejected) PEP-416. I'd just like to share another case where having a frozendict in the stdlib would be useful to me.

I was interacting with a database and had a list of results from 206 queries:

>>> res = [db.cases.remove({'_id': doc['_id']}) for doc in fives]
>>> len(res)
206

I can see that the results are the same for the first two queries.

>>> res[0]
{'n': 1, 'err': None, 'ok': 1.0}
>>> res[1]
{'n': 1, 'err': None, 'ok': 1.0}

So I'd like to test to see if that's the case, so I try to construct a 'set' on the results, which in theory would give me a list of unique results:

>>> set(res)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

I can't do that because dict is unhashable. That's reasonable, and if I had a frozen dict, I could easily work around this limitation and accomplish what I need.

>>> set(map(frozendict, res))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'frozendict' is not defined

PEP-416 mentions a MappingProxyType, but that's no help.

>>> res_ex = list(map(types.MappingProxyType, res))
>>> set(res_ex)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'mappingproxy'

I can achieve what I need by constructing a set on the 'items' of the dict.

>>> set(tuple(doc.items()) for doc in res)
{(('n', 1), ('err', None), ('ok', 1.0))}

But that syntax would be nicer if the result had the same representation as the input (mapping instead of tuple of pairs). A frozendict would have readily enabled the desirable behavior.

Although hashability is mentioned in the PEP under constraints, there are many use-cases that fall out of the ability to hash a dict, such as the one described above, which are not mentioned at all in use-cases for the PEP.

If there's ever any interest in reviving that PEP, I'm in favor of its implementation.
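
For illustration only, here is a minimal sketch of the kind of type meant
here. FrozenDict is a made-up class built on collections.abc.Mapping; it is
not the PEP 416 implementation and not a stdlib type, and it assumes all
values are themselves hashable:

```python
from collections.abc import Mapping

class FrozenDict(Mapping):
    """Hypothetical read-only, hashable mapping (illustrative only)."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        self._hash = None

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Order-independent hash over the items, computed once.
        if self._hash is None:
            self._hash = hash(frozenset(self._data.items()))
        return self._hash

    def __repr__(self):
        return 'FrozenDict(%r)' % (self._data,)

res = [{'n': 1, 'err': None, 'ok': 1.0}, {'n': 1, 'err': None, 'ok': 1.0}]
unique = set(map(FrozenDict, res))
```

With something like this, the set keeps a mapping representation for each
unique result instead of a tuple of pairs.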

From victor.stinner at gmail.com  Sun Jul 13 16:13:14 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sun, 13 Jul 2014 16:13:14 +0200
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
Message-ID: 

The PEP has been rejected, but the MappingProxyType is now public:

$ ./python
Python 3.5.0a0 (default:5af54ed3af02, Jul 12 2014, 03:13:04)
>>> d={1:2}
>>> import types
>>> d = types.MappingProxyType(d)
>>> d
mappingproxy({1: 2})
>>> d[1]
2
>>> d[1] = 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'mappingproxy' object does not support item assignment

Victor

From rosuav at gmail.com  Sun Jul 13 16:22:57 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 14 Jul 2014 00:22:57 +1000
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
Message-ID: 

On Mon, Jul 14, 2014 at 12:04 AM, Jason R. Coombs  wrote:
> I can achieve what I need by constructing a set on the 'items' of the dict.
>
>>>> set(tuple(doc.items()) for doc in res)
>
> {(('n', 1), ('err', None), ('ok', 1.0))}

This is flawed; the tuple-of-tuples depends on iteration order, which
may vary. It should be a frozenset of those tuples, not a tuple. Which
strengthens your case; it's that easy to get it wrong in the absence
of an actual frozendict.
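
A small sketch of the difference, using made-up sample data shaped like the
results quoted above. It assumes a Python version where dicts preserve
insertion order, so the two orderings are reproducible:

```python
res = [
    {'n': 1, 'err': None, 'ok': 1.0},
    {'ok': 1.0, 'n': 1, 'err': None},   # same mapping, different item order
]

# Tuples of items are order-sensitive, so equal dicts can yield distinct keys.
by_tuple = {tuple(doc.items()) for doc in res}

# Frozensets of items ignore order, so equal dicts collapse to one key.
by_frozenset = {frozenset(doc.items()) for doc in res}
```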

ChrisA

From andreas.r.maier at gmx.de  Sun Jul 13 17:13:20 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Sun, 13 Jul 2014 17:13:20 +0200
Subject: [Python-Dev] == on object tests identity in 3.x - list delegation
	to members?
In-Reply-To: <53C04F10.8070509@stoneleaf.us>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de>
 <53C04F10.8070509@stoneleaf.us>
Message-ID: <53C2A210.80902@gmx.de>

On 11.07.2014 22:54, Ethan Furman wrote:
> On 07/11/2014 07:04 AM, Andreas Maier wrote:
>> On 09.07.2014 03:48, Raymond Hettinger wrote:
>>>
>>> Personally, I see no need to make the same mistake by removing
>>> the identity-implies-equality rule from the built-in containers.
>>> There's no need to upset the apple cart for nearly zero benefit.
>>
>> Containers delegate the equal comparison on the container to their
>> elements; they do not apply identity-based comparison
>> to their elements. At least that is the externally visible behavior.
>
> If that were true, then [NaN] == [NaN] would be False, and it is not.
>
> Here is the externally visible behavior:
>
> Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20)
> [GCC 4.7.3] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> --> NaN = float('nan')
> --> NaN == NaN
> False
> --> [NaN] == [NaN]
> True

Ouch, that hurts ;-)

First, the delegation of sequence equality to element equality is not
something I came up with for my doc patch. It has always been in
section 5.9, "Comparisons", of the Language Reference (copied from Python 3.4):

"Tuples and lists are compared lexicographically using comparison of 
corresponding elements. This means that to compare equal, each element 
must compare equal and the two sequences must be of the same type and 
have the same length."

Second, if not by delegation to equality of its elements, how would the
equality of sequences be defined otherwise?

But your test is definitely worth having a closer look at. I have 
broadened the test somewhat and that brings up further questions. Here 
is the test output, and a discussion of the results (test program 
try_eq.py and its output test_eq.out are attached to issue #12067):

Test #1: Different equal int objects:

   obj1: type=<class 'int'>, str=257, id=39305936
   obj2: type=<class 'int'>, str=257, id=39306160

   a) obj1 is obj2: False
   b) obj1 == obj2: True
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
   f) obj1 == obj2: True

Discussion:

Case 1.c) can be interpreted to mean that the list delegates its == to the
== on its elements. It cannot be interpreted as delegating to identity
comparison. That is consistent with how everyone (I hope ;-) would
expect int objects to behave, or lists or dicts of them.

The motivation for case f) is explained further down, it has to do with 
caching.

Test #2: Same int object:

   obj1: type=<class 'int'>, str=257, id=39305936
   obj2: type=<class 'int'>, str=257, id=39305936

   a) obj1 is obj2: True
   b) obj1 == obj2: True
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
   f) obj1 == obj2: True

-> No surprises (I hope).

Test #3: Different equal float objects:

   obj1: type=<class 'float'>, str=257.0, id=5734664
   obj2: type=<class 'float'>, str=257.0, id=5734640

   a) obj1 is obj2: False
   b) obj1 == obj2: True
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
   f) obj1 == obj2: True

Discussion:

I added this test only to show that float NaN is a special case, and 
that this test for float objects - that are not NaN - behaves like test 
#1 for int objects.

Test #4: Same float object:

   obj1: type=<class 'float'>, str=257.0, id=5734664
   obj2: type=<class 'float'>, str=257.0, id=5734664

   a) obj1 is obj2: True
   b) obj1 == obj2: True
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
   f) obj1 == obj2: True

-> Same as test #2, hopefully no surprises.

Test #5: Different float NaN objects:

   obj1: type=<class 'float'>, str=nan, id=5734784
   obj2: type=<class 'float'>, str=nan, id=5734976

   a) obj1 is obj2: False
   b) obj1 == obj2: False
   c) [obj1] == [obj2]: False
   d) {obj1:'v'} == {obj2:'v'}: False
   e) {'k':obj1} == {'k':obj2}: False
   f) obj1 == obj2: False

Discussion:

Here, the list behaves as I would expect under the rule that it 
delegates equality to its elements. Case c) allows that interpretation. 
However, an interpretation based on identity would also be possible.

Test #6: Same float NaN object:

   obj1: type=<class 'float'>, str=nan, id=5734784
   obj2: type=<class 'float'>, str=nan, id=5734784

   a) obj1 is obj2: True
   b) obj1 == obj2: False
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
   f) obj1 == obj2: False

Discussion (this is Ethan's example):

Case 6.b) shows the special behavior of float NaN that is documented: a 
float NaN object is the same as itself but unequal to itself.

Case 6.c) is the surprising case. It could be interpreted in two ways 
(at least that's what I found):

1) The comparison is based on identity of the float objects. But that is 
inconsistent with test #4. And why would the list special-case NaN 
comparison in such a way that it ends up being inconsistent with the 
special definition of NaN (outside of the list)?

2) The list does not always delegate to element equality, but attempts
to optimize if the objects are the same (same identity). We will see
later that this is what happens. Further, when comparing float NaNs of the
same identity, the list implementation forgot to special-case NaNs, which
would be a bug, IMHO. I did not analyze the C implementation, so this is
all speculation based upon externally visible behavior.

Test #7: Different objects (with equal x) of class C
    (C.__eq__() implemented with equality of x,
     C.__ne__() returning NotImplemented):

   obj1: type=, str=C(256), id=39406504
   obj2: type=, str=C(256), id=39406616

   a) obj1 is obj2: False
C.__eq__(): self=39406504, other=39406616, returning True
   b) obj1 == obj2: True
C.__eq__(): self=39406504, other=39406616, returning True
   c) [obj1] == [obj2]: True
C.__eq__(): self=39406616, other=39406504, returning True
   d) {obj1:'v'} == {obj2:'v'}: True
C.__eq__(): self=39406504, other=39406616, returning True
   e) {'k':obj1} == {'k':obj2}: True
C.__eq__(): self=39406504, other=39406616, returning True
   f) obj1 == obj2: True

The __eq__() and __ne__() implementations each print a debug message. 
The __ne__() is only defined to verify that it is not invoked, and that 
the inherited default __ne__() does not chime in.

Discussion:

Here we see that the list equality comparison does invoke the element 
equality. However, the picture becomes more complex further down.

Test #8: Same object of class C
    (C.__eq__() implemented with equality of x,
     C.__ne__() returning NotImplemented):

   obj1: type=, str=C(256), id=39406504
   obj2: type=, str=C(256), id=39406504

   a) obj1 is obj2: True
C.__eq__(): self=39406504, other=39406504, returning True
   b) obj1 == obj2: True
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
C.__eq__(): self=39406504, other=39406504, returning True
   f) obj1 == obj2: True

Discussion:

The == on the class C objects in case 8.b) invokes __eq__(), even though
the objects are the same object. This can be explained by the desire in
Python that classes should be able to be non-reflexive if needed, as
float NaN is, for example.

Now, the list equality in case 8.c) is interesting. The list equality 
does not invoke element equality. Even though object equality in case 
8.b) did not assume reflexivity and invoked the __eq__() method, the 
list seems to assume reflexivity and seems to go by object identity.

The only other potential explanation (that I found) would be that some 
aspects of the comparison behavior are cached. That's why I added the 
cases f), which show that caching for comparison results does not happen 
(the __eq__() method is invoked again).

So we are back to discussing why element equality does not assume 
reflexivity, but list equality does. IMHO, that is another bug, or maybe 
the same one.

Test #9: Different objects (with equal x) of class D
    (D.__eq__() implemented with inequality of x,
     D.__ne__() returning NotImplemented):

   obj1: type=, str=C(256), id=39407064
   obj2: type=, str=C(256), id=39406952

   a) obj1 is obj2: False
D.__eq__(): self=39407064, other=39406952, returning False
   b) obj1 == obj2: False
D.__eq__(): self=39407064, other=39406952, returning False
   c) [obj1] == [obj2]: False
D.__eq__(): self=39406952, other=39407064, returning False
   d) {obj1:'v'} == {obj2:'v'}: False
D.__eq__(): self=39407064, other=39406952, returning False
   e) {'k':obj1} == {'k':obj2}: False
D.__eq__(): self=39407064, other=39406952, returning False
   f) obj1 == obj2: False

Discussion:

Class D implements __eq__() by != on the data attribute. This test does 
not really show any surprises, and is consistent with the theory that 
list comparison delegates to element comparison. This is really just a 
preparation for the next test, that uses the same object of this class.

Test #10: Same object of class D
    (D.__eq__() implemented with inequality of x,
     D.__ne__() returning NotImplemented):

   obj1: type=, str=C(256), id=39407064
   obj2: type=, str=C(256), id=39407064

   a) obj1 is obj2: True
D.__eq__(): self=39407064, other=39407064, returning False
   b) obj1 == obj2: False
   c) [obj1] == [obj2]: True
   d) {obj1:'v'} == {obj2:'v'}: True
   e) {'k':obj1} == {'k':obj2}: True
D.__eq__(): self=39407064, other=39407064, returning False
   f) obj1 == obj2: False

Discussion:

The inequality-based implementation of __eq__() explains case 10.b). It 
is surprising (to me) that the list comparison in case 10.c) returns 
True. If one compares that to case 9.c), one could believe that the 
identities of the objects are used for both cases. But why would the 
list not respect the result of __eq__() if it is implemented?

This behavior at least seems to be consistent with the surprise of case 6.c).

In order to not just rely on the external behavior, I started digging 
into the C implementation. For list equality comparison, I started at 
list_richcompare() which uses PyObject_RichCompareBool(), which 
shortcuts its result based on identity comparison, and thus enforces 
reflexivity.

The comment on line 714 in object.c in PyObject_RichCompareBool() also 
confirms that:

   /* Quick result when objects are the same.
      Guarantees that identity implies equality. */

IMHO, we need to discuss whether we are serious with the direction that 
was claimed earlier in this thread, that reflexivity (i.e. identity 
implies equality) should be decided upon by the classes and not by the 
Python language. As I see it, we have some pieces of code that enforce 
reflexivity, and some that don't.
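
For reference, the identity shortcut described above can be modelled in a
few lines of Python. This is only a rough sketch of the behaviour for the
== case, not a translation of the C code; the names rich_compare_bool_eq
and D are made up for this illustration.

```python
def rich_compare_bool_eq(v, w):
    """Rough Python model of PyObject_RichCompareBool(v, w, Py_EQ)."""
    if v is w:
        # Quick result when objects are the same:
        # identity implies equality, __eq__ is never consulted.
        return True
    return bool(v == w)


class D:
    """Hypothetical class whose __eq__ is deliberately non-reflexive."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return self.x != other.x


obj = D(256)
print(obj == obj)                      # False: plain == calls D.__eq__
print(rich_compare_bool_eq(obj, obj))  # True: the identity shortcut wins
print([obj] == [obj])                  # True: list comparison matches the model
```

The plain == operator goes straight to __eq__(), while the list comparison
takes the shortcut, which is the difference between cases 10.b) and 10.c).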

Andy

From steve at pearwood.info  Sun Jul 13 18:23:03 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 14 Jul 2014 02:23:03 +1000
Subject: [Python-Dev] == on object tests identity in 3.x - list
	delegation to members?
In-Reply-To: <53C2A210.80902@gmx.de>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de>
 <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de>
Message-ID: <20140713162249.GP5705@ando>

On Sun, Jul 13, 2014 at 05:13:20PM +0200, Andreas Maier wrote:

> Second, if not by delegation to equality of its elements, how would the 
> equality of sequences defined otherwise?

Wow. I'm impressed by the amount of detailed effort you've put into 
investigating this. (Too much detail to absorb, I'm afraid.) But perhaps 
you might have just asked on the python-list at python.org mailing list, or 
here, where we would have told you the answer:

    list __eq__ first checks element identity before going on
    to check element equality.


If you can read C, you might like to check the list source code:

http://hg.python.org/cpython/file/22e5a85ba840/Objects/listobject.c

but if I'm reading it correctly, list.__eq__ conceptually looks 
something like this:

def __eq__(self, other):
    if not isinstance(other, list):
        return NotImplemented
    if len(other) != len(self):
        return False
    for a, b in zip(self, other):
        if not (a is b or a == b):
            return False
    return True

(The actual code is a bit more complex than that, since there is a 
single function, list_richcompare, which handles all the rich 
comparisons.)

The critical test is PyObject_RichCompareBool here:

http://hg.python.org/cpython/file/22e5a85ba840/Objects/object.c

which explicitly says:

    /* Quick result when objects are the same.
       Guarantees that identity implies equality. */


[...]
> I added this test only to show that float NaN is a special case,

NANs are not a special case. List __eq__ treats all object types 
identically (pun intended):

py> class X:
...     def __eq__(self, other): return False
...
py> x = X()
py> x == x
False
py> [x] == [X()]
False
py> [x] == [x]
True


[...]
> Case 6.c) is the surprising case. It could be interpreted in two ways 
> (at least that's what I found):
> 
> 1) The comparison is based on identity of the float objects. But that is 
> inconsistent with test #4. And why would the list special-case NaN 
> comparison in such a way that it ends up being inconsistent with the 
> special definition of NaN (outside of the list)?

It doesn't. NANs are not special cased in any way.

This was discussed to death some time ago, both on python-dev and 
python-ideas. If you're interested, you can start here:

https://mail.python.org/pipermail/python-list/2012-October/633992.html

which is in the middle of one of the threads, but at least it gets you 
to the right time period.


> 2) The list does not always delegate to element equality, but attempts 
> to optimize if the objects are the same (same identity).

Right! It's not just lists -- I believe that tuples, dicts and sets 
behave the same way.


> We will see 
> later that that happens. Further, when comparing float NaNs of the same 
> identity, the list implementation forgot to special-case NaNs. Which 
> would be a bug, IMHO.

"Forgot"? I don't think the behaviour of list comparisons is an 
accident.

NAN equality is non-reflexive. Very few other things are the same. It 
would be seriously weird if alist == alist could return False. You'll 
note that the IEEE-754 standard has nothing to say about the behaviour 
of Python lists containing NANs, so we're free to pick whatever 
behaviour makes the most sense for Python, and that is to minimise the 
"Gotcha!" factor.

NANs are a gotcha to anyone who doesn't know IEEE-754, and possibly even 
some who do. I will go to the barricades to fight to keep the 
non-reflexivity of NANs *in isolation*, but I believe that Python has 
made the right decision to treat lists containing NANs the same as 
everything else.

NAN == NAN  # obeys IEEE-754 semantics and returns False

[NAN] == [NAN]  # obeys standard expectation that equality is reflexive

This behaviour is not a bug, it is a feature. As far as I am concerned, 
this only needs documenting. If anyone needs list equality to honour the 
special behaviour of NANs, write a subclass or an equal() function.
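
Such an equal() function could look roughly like the sketch below. The
name strict_equal is only an illustration, not an existing function; it
compares element by element with == alone and never looks at identity.

```python
def strict_equal(seq1, seq2):
    """Compare two sequences element by element using only ==."""
    if len(seq1) != len(seq2):
        return False
    return all(a == b for a, b in zip(seq1, seq2))


NAN = float("nan")

print([NAN] == [NAN])               # True: list equality checks identity first
print(strict_equal([NAN], [NAN]))   # False: NaN != NaN is honoured
print(strict_equal([1, 2], [1, 2])) # True
```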



-- 
Steven

From rosuav at gmail.com  Sun Jul 13 18:34:20 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 14 Jul 2014 02:34:20 +1000
Subject: [Python-Dev] == on object tests identity in 3.x - list
 delegation to members?
In-Reply-To: <20140713162249.GP5705@ando>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com>
 <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us>
 <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando>
Message-ID: 

On Mon, Jul 14, 2014 at 2:23 AM, Steven D'Aprano  wrote:
>> We will see
>> later that that happens. Further, when comparing float NaNs of the same
>> identity, the list implementation forgot to special-case NaNs. Which
>> would be a bug, IMHO.
>
> "Forgot"? I don't think the behaviour of list comparisons is an
> accident.

Well, "forgot" is on the basis that the identity check is intended to
be a mere optimization. If that were the case ("don't actually call
__eq__ when you reckon it'll return True"), then yes, failing to
special-case NaN would be a bug. But since it's intended behaviour, as
explained further down, it's not a bug and not the result of
forgetfulness.

ChrisA

From wizzat at gmail.com  Sun Jul 13 18:50:53 2014
From: wizzat at gmail.com (Mark Roberts)
Date: Sun, 13 Jul 2014 09:50:53 -0700
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
Message-ID: 

I find it handy to use named tuple as my database mapping type.  It allows you to perform this behavior seamlessly.
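
A minimal sketch of that approach, assuming every result has the same
three keys as the dicts in the quoted message (the name Result is made up
for the example):

```python
from collections import namedtuple

# Field names taken from the result dicts shown in the quoted message.
Result = namedtuple("Result", ["n", "err", "ok"])

res = [
    {"n": 1, "err": None, "ok": 1.0},
    {"n": 1, "err": None, "ok": 1.0},
]

# namedtuples are hashable as long as their fields are, so set() works.
unique = set(Result(**doc) for doc in res)
print(unique)
```

This only works when the keys are fixed and known in advance.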

-Mark

> On Jul 13, 2014, at 7:04, "Jason R. Coombs"  wrote:
> 
> I repeatedly run into situations where a frozendict would be useful, and every time I do, I go searching and find the (unfortunately rejected) PEP-416. I'd just like to share another case where having a frozendict in the stdlib would be useful to me.
>  
> I was interacting with a database and had a list of results from 206 queries:
>  
> >>> res = [db.cases.remove({'_id': doc['_id']}) for doc in fives]
> >>> len(res)
> 206
>  
> I can see that the results are the same for the first two queries.
>  
> >>> res[0]
> {'n': 1, 'err': None, 'ok': 1.0}
> >>> res[1]
> {'n': 1, 'err': None, 'ok': 1.0}
>  
> So I'd like to test to see if that's the case, so I try to construct a 'set' on the results, which in theory would give me a list of unique results:
>  
> >>> set(res)
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: unhashable type: 'dict'
>  
> I can't do that because dict is unhashable. That's reasonable, and if I had a frozen dict, I could easily work around this limitation and accomplish what I need.
>  
> >>> set(map(frozendict, res))
> Traceback (most recent call last):
>   File "", line 1, in 
> NameError: name 'frozendict' is not defined
>  
> PEP-416 mentions a MappingProxyType, but that's no help.
>  
> >>> res_ex = list(map(types.MappingProxyType, res))
> >>> set(res_ex)
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: unhashable type: 'mappingproxy'
>  
> I can achieve what I need by constructing a set on the 'items' of the dict.
>  
> >>> set(tuple(doc.items()) for doc in res)
> {(('n', 1), ('err', None), ('ok', 1.0))}
>  
> But that syntax would be nicer if the result had the same representation as the input (mapping instead of tuple of pairs). A frozendict would have readily enabled the desirable behavior.
>  
> Although hashability is mentioned in the PEP under constraints, there are many use-cases that fall out of the ability to hash a dict, such as the one described above, which are not mentioned at all in use-cases for the PEP.
>  
> If there's ever any interest in reviving that PEP, I'm in favor of its implementation.

From ncoghlan at gmail.com  Sun Jul 13 20:11:58 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Jul 2014 13:11:58 -0500
Subject: [Python-Dev] == on object tests identity in 3.x - list
 delegation to members?
In-Reply-To: 
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com>
 <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us>
 <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando>
 
Message-ID: 

On 13 July 2014 11:34, Chris Angelico  wrote:
> On Mon, Jul 14, 2014 at 2:23 AM, Steven D'Aprano  wrote:
>>> We will see
>>> later that that happens. Further, when comparing float NaNs of the same
>>> identity, the list implementation forgot to special-case NaNs. Which
>>> would be a bug, IMHO.
>>
>> "Forgot"? I don't think the behaviour of list comparisons is an
>> accident.
>
> Well, "forgot" is on the basis that the identity check is intended to
> be a mere optimization. If that were the case ("don't actually call
> __eq__ when you reckon it'll return True"), then yes, failing to
> special-case NaN would be a bug. But since it's intended behaviour, as
> explained further down, it's not a bug and not the result of
> forgetfulness.

Right, it's not a mere optimisation - it's the only way to get
containers to behave sensibly. Otherwise we'd end up with nonsense
like:

>>> x = float("nan")
>>> x in [x]
False

That currently returns True because of the identity check - it would
return False if we delegated the check to float.__eq__ because the
defined IEEE754 behaviour for NaN's breaks the mathematical definition
of an equivalence relation as a transitive, reflexive and symmetric
relation. (It breaks it for *good reasons*, but we still need to
figure out a way of dealing with the impedance mismatch between the
definition of floats and the definition of container invariants like
"assert x in [x]")

The current approach means that the lack of reflexivity of NaN's stays
confined to floats and similar types - it doesn't leak out and infect
the behaviour of the container types.
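
A few assertions illustrate which container invariants the identity check
keeps intact. This is a sketch based on current CPython behaviour; the
variable name x is arbitrary.

```python
x = float("nan")

assert x != x                    # IEEE 754: NaN is not equal to itself
assert x in [x]                  # list containment
assert x in (x,)                 # tuple containment
assert x in {x}                  # set containment
assert [x].index(x) == 0         # lookup by value still finds the object
assert [x].count(x) == 1
assert {x: "v"}[x] == "v"        # dict key lookup

# A *different* NaN object gets no help from the identity check.
assert float("nan") not in [x]
```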

What we've never figured out is a good place to *document* it. I
thought there was an open bug for that, but I can't find it right now.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From rosuav at gmail.com  Sun Jul 13 20:16:11 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 14 Jul 2014 04:16:11 +1000
Subject: [Python-Dev] == on object tests identity in 3.x - list
 delegation to members?
In-Reply-To: 
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com>
 <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us>
 <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando>
 
 
Message-ID: 

On Mon, Jul 14, 2014 at 4:11 AM, Nick Coghlan  wrote:
> What we've never figured out is a good place to *document* it. I
> thought there was an open bug for that, but I can't find it right now.

Yeah. The Py3 docs explain why "x in [x]" is True, but I haven't found
a parallel explanation of sequence equality.

ChrisA

From ncoghlan at gmail.com  Sun Jul 13 20:23:42 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Jul 2014 13:23:42 -0500
Subject: [Python-Dev] == on object tests identity in 3.x - list
 delegation to members?
In-Reply-To: 
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com>
 <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us>
 <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando>
 
 
 
Message-ID: 

On 13 July 2014 13:16, Chris Angelico  wrote:
> On Mon, Jul 14, 2014 at 4:11 AM, Nick Coghlan  wrote:
>> What we've never figured out is a good place to *document* it. I
>> thought there was an open bug for that, but I can't find it right now.
>
> Yeah. The Py3 docs explain why "x in [x]" is True, but I haven't found
> a parallel explanation of sequence equality.

We might need to expand the tables of sequence operations to cover
equality and inequality checks - those are currently missing.

Cheers,
Nick.

>
> ChrisA



-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From dw+python-dev at hmmz.org  Sun Jul 13 20:43:28 2014
From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org)
Date: Sun, 13 Jul 2014 18:43:28 +0000
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
Message-ID: <20140713184328.GA6345@k2>

On Sun, Jul 13, 2014 at 02:04:17PM +0000, Jason R. Coombs wrote:

> PEP-416 mentions a MappingProxyType, but that's no help.

Well, it kind of is. By combining MappingProxyType and UserDict the
desired effect can be achieved concisely:

    import collections
    import types

    class frozendict(collections.UserDict):
        def __init__(self, d, **kw):
            if d:
                d = d.copy()
                d.update(kw)
            else:
                d = kw
            self.data = types.MappingProxyType(d)

        _h = None
        def __hash__(self):
            if self._h is None:
                self._h = sum(map(hash, self.data.items()))
            return self._h

        def __repr__(self):
            return repr(dict(self))


> Although hashability is mentioned in the PEP under constraints, there are many
> use-cases that fall out of the ability to hash a dict, such as the one
> described above, which are not mentioned at all in use-cases for the PEP.

> If there's ever any interest in reviving that PEP, I'm in favor of its
> implementation.

In its previous form, the PEP seemed more focused on some false
optimization capabilities of a read-only type, rather than as here, the
far more interesting hashability properties. It might warrant a fresh
PEP to more thoroughly investigate this angle.


David

From dw+python-dev at hmmz.org  Sun Jul 13 20:50:18 2014
From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org)
Date: Sun, 13 Jul 2014 18:50:18 +0000
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: <20140713184328.GA6345@k2>
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
 <20140713184328.GA6345@k2>
Message-ID: <20140713185018.GB6345@k2>

On Sun, Jul 13, 2014 at 06:43:28PM +0000, dw+python-dev at hmmz.org wrote:

>             if d:
>                 d = d.copy()

To cope with iterables, "d = d.copy()" should have read "d = dict(d)".
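
With that correction applied, the whole class can be sketched as follows.
This is a lightly tidied variant of the class from the previous message,
not a reference implementation; the default d=None and the sample data at
the end are additions for the example.

```python
import collections
import types


class frozendict(collections.UserDict):
    """Read-only, hashable mapping built on MappingProxyType."""

    _h = None  # cached hash, computed on first use

    def __init__(self, d=None, **kw):
        # dict(d) accepts a mapping or an iterable of key/value pairs
        # and takes a private copy in both cases.
        d = dict(d) if d else {}
        d.update(kw)
        self.data = types.MappingProxyType(d)

    def __hash__(self):
        # Order-independent hash; requires all values to be hashable.
        if self._h is None:
            self._h = sum(map(hash, self.data.items()))
        return self._h

    def __repr__(self):
        return repr(dict(self))


res = [
    {"n": 1, "err": None, "ok": 1.0},
    {"n": 1, "err": None, "ok": 1.0},
]
print(set(map(frozendict, res)))
```

Writes go through UserDict to the underlying mappingproxy, which rejects
them with a TypeError, so the instance stays unchanged after construction.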


David

From ncoghlan at gmail.com  Sun Jul 13 21:09:25 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Jul 2014 14:09:25 -0500
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: <20140713184328.GA6345@k2>
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
 <20140713184328.GA6345@k2>
Message-ID: 

On 13 July 2014 13:43,   wrote:
> In its previous form, the PEP seemed more focused on some false
> optimization capabilities of a read-only type, rather than as here, the
> far more interesting hashability properties. It might warrant a fresh
> PEP to more thoroughly investigate this angle.

Right, the use case would be "frozendict as a simple alternative to a
full class definition", but even less structured than namedtuple in
that the keys may vary as well. That difference means that frozendict
applies more cleanly to semi-structured data manipulated as
dictionaries (think stuff deserialised from JSON) than namedtuple
does.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From marko at pacujo.net  Sun Jul 13 21:54:02 2014
From: marko at pacujo.net (Marko Rauhamaa)
Date: Sun, 13 Jul 2014 22:54:02 +0300
Subject: [Python-Dev] == on object tests identity in 3.x - list
	delegation to members?
In-Reply-To: 
 (Nick Coghlan's message of "Sun, 13 Jul 2014 13:11:58 -0500")
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com>
 <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us>
 <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando>
 
 
Message-ID: <8738e56nmt.fsf@elektro.pacujo.net>

Nick Coghlan :

> Right, it's not a mere optimisation - it's the only way to get
> containers to behave sensibly. Otherwise we'd end up with nonsense
> like:
>
>>>> x = float("nan")
>>>> x in [x]
> False

Why is that nonsense? I mean, why is it any more nonsense than

   >>> x == x
   False

Anyway, personally, I'm perfectly "happy" to live with the choices of
past generations, regardless of whether they were good or not. What you
absolutely don't want to do is "correct" the choices of past generations.


Marko

From 4kir4.1i at gmail.com  Sun Jul 13 22:05:27 2014
From: 4kir4.1i at gmail.com (Akira Li)
Date: Mon, 14 Jul 2014 00:05:27 +0400
Subject: [Python-Dev] == on object tests identity in 3.x - list
	delegation to members?
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com>
 <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us>
 <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando>
 
 
Message-ID: <87ion1owhk.fsf@gmail.com>

Nick Coghlan  writes:
...
> definition of floats and the definition of container invariants like
> "assert x in [x]")
>
> The current approach means that the lack of reflexivity of NaN's stays
> confined to floats and similar types - it doesn't leak out and infect
> the behaviour of the container types.
>
> What we've never figured out is a good place to *document* it. I
> thought there was an open bug for that, but I can't find it right now.

There was a related issue, "Tuple comparisons with NaNs are broken"
http://bugs.python.org/issue21873
but it was closed as "not a bug" even though the corresponding behavior is
*not documented* anywhere.


--
Akira


From benhoyt at gmail.com  Mon Jul 14 02:12:16 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Sun, 13 Jul 2014 20:12:16 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To: 
References: 
 <53BC5309.6000605@stoneleaf.us>
 
 <53BC9B8B.40509@stoneleaf.us>
 
 
 
 <53BD4670.9080100@stoneleaf.us>
 
 <53BD6F38.7090000@stoneleaf.us>
 
 <53BD9557.80709@stoneleaf.us>
 
 
 <53BDA99C.3020101@stoneleaf.us>
 
 <53BDBE42.7050609@stoneleaf.us>
 
 <53BF594D.9060007@stoneleaf.us>
 
 
Message-ID: 

>> Very much agreed that this isn't necessary for just readdir/FindNext
>> errors. We've never had this level of detail before -- if listdir()
>> fails half way through (very unlikely) it just bombs with OSError and
>> you get no entries at all.
>>
>> If you really really want this (again very unlikely), you can always
>> call next() directly and catch OSError around that call.
>
> Agreed - I think the PEP should point this out explicitly, and show that the
> approach it takes offers a lot of flexibility in error handling from "just
> let it fail", to a single try/catch around the whole loop, to try/catch just
> around the operations that might call lstat(), to try/catch around the
> individual iteration steps.

Good point. It'd be good to mention this explicitly in the PEP and
have another example or two of the different levels of error
handling.
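
As a rough illustration of those levels, here is a sketch written against
os.scandir() as it was later released in Python 3.5, which may differ in
detail from the draft discussed in this thread. The function names are
made up for the example.

```python
import os


def list_names(path):
    """Level 1: just let it fail; any OSError propagates to the caller."""
    return sorted(entry.name for entry in os.scandir(path))


def file_sizes(path):
    """Level 2: try/except only around the call that may hit the file system."""
    sizes = {}
    for entry in os.scandir(path):
        try:
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
        except OSError:
            sizes[entry.name] = None  # entry vanished or is unreadable
    return sizes


def careful_entries(path):
    """Level 3: call next() by hand so each iteration step is guarded."""
    it = os.scandir(path)
    while True:
        try:
            entry = next(it)
        except StopIteration:
            return
        except OSError as exc:
            print("giving up on", path, "-", exc)
            return
        yield entry
```

A single try/except around the whole loop in list_names() would be the
level in between "just let it fail" and the per-entry handling.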

> os.walk remains the higher level API that most code should be using, and
> that has to retain the current listdir based behaviour (any error = ignore
> all entries in that directory) for backwards compatibility reasons.

Yes, definitely.

-Ben

From benhoyt at gmail.com  Mon Jul 14 02:33:16 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Sun, 13 Jul 2014 20:33:16 -0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
Message-ID: 

Hi folks,

Thanks Victor, Nick, Ethan, and others for continued discussion on the
scandir PEP 471 (most recent thread starts at
https://mail.python.org/pipermail/python-dev/2014-July/135377.html).

Just an aside ... I was reminded again recently why scandir() matters:
a scandir user emailed me the other day, saying "I used scandir to
dump the contents of a network dir in under 15 seconds. 13 root dirs,
60,000 files in the structure. This will replace some old VBA code
embedded in a spreadsheet that was taking 15-20 minutes to do the
exact same thing." I asked if he could run scandir's benchmark.py on
his directory tree, and here's what it printed out:

C:\Python34\scandir-master>benchmark.py "\\my\network\directory"
Using fast C version of scandir
Priming the system's cache...
Benchmarking walks on \\my\network\directory, repeat 1/3...
Benchmarking walks on \\my\network\directory, repeat 2/3...
Benchmarking walks on \\my\network\directory, repeat 3/3...
os.walk took 8739.851s, scandir.walk took 129.500s -- 67.5x as fast

That's right -- os.walk() with scandir was almost 70x as fast as the
current version! Admittedly this is a network file system, but that's
still a real and important use case. It really pays not to throw away
information the OS gives you for free. :-)

On the recent python-dev thread, Victor especially made some well
thought out suggestions. It seems to me there's general agreement that
the basic API in PEP 471 is good (with Ethan not a fan at first, but
it seems he's on board after further discussion :-).

That said, I think there's basically one thing remaining to decide:
whether or not to have DirEntry.is_dir() and .is_file() follow
symlinks by default. I think Victor made a pretty good case that:

(a) following links is usually what you want
(b) that's the precedent set by the similar functions os.path.isdir()
and pathlib.Path.is_dir(), so to do otherwise would be confusing
(c) with the non-link-following version, if you wanted to follow links
you'd have to say something like "if (entry.is_symlink() and
os.path.isdir(entry.full_name)) or entry.is_dir()" instead of just "if
entry.is_dir()"
(d) it's error prone to have to do (c), as I found out recently when I
had a bug in my implementation of os.walk() with scandir -- I had a
bug due to getting this exact test wrong

If we go with Victor's link-following .is_dir() and .is_file(), then
we probably need to add his suggestion of a follow_symlinks=False
parameter (defaults to True). Either that or you have to say
"stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit
less nice.
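
To make the two behaviours concrete, here is a small sketch using the API
as it was eventually released in Python 3.5, where is_dir() takes a
keyword-only follow_symlinks argument defaulting to True. That is the
released behaviour, not necessarily the draft being discussed here, and
dir_flags is a made-up helper name.

```python
import os


def dir_flags(path):
    """Map each entry name to (is_dir following links, is_dir not following)."""
    return {
        entry.name: (entry.is_dir(), entry.is_dir(follow_symlinks=False))
        for entry in os.scandir(path)
    }
```

For a directory containing a real subdirectory and a symlink pointing at
it, the real directory reports (True, True) and the link (True, False).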

As a KISS enthusiast, I admit I'm still somewhat partial to the
DirEntry methods just returning (non-link following) info about the
*directory entry* itself. However, I can definitely see the
error-proneness of that, and the advantages given the points above. So
I guess I'm on the fence.

Given the above arguments for symlink-following is_dir()/is_file()
methods (have I missed any, Victor?), what do others think?

I'd be very keen to come to a consensus on this, so that I can make
some final updates to the PEP and see about getting it accepted and/or
implemented. :-)

-Ben

From timothy.c.delaney at gmail.com  Mon Jul 14 02:52:42 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Mon, 14 Jul 2014 10:52:42 +1000
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
Message-ID: 

On 14 July 2014 10:33, Ben Hoyt  wrote:

> If we go with Victor's link-following .is_dir() and .is_file(), then
> we probably need to add his suggestion of a follow_symlinks=False
> parameter (defaults to True). Either that or you have to say
> "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit
> less nice.
>

Absolutely agreed that follow_symlinks is the way to go, disagree on the
default value.


> Given the above arguments for symlink-following is_dir()/is_file()
> methods (have I missed any, Victor?), what do others think?
>

I would say whichever way you go, someone will assume the opposite. IMO not
following symlinks by default is safer. If you follow symlinks by default
then everyone has the following issues:

1. Crossing filesystems (including onto network filesystems);

2. Recursive directory structures (symlink to a parent directory);

3. Symlinks to non-existent files/directories;

4. Symlink to an absolutely huge directory somewhere else (very annoying if
you just wanted to do a directory sizer ...).

If follow_symlinks=False by default, only those who opt-in have to deal
with the above.
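
A directory sizer that avoids all four issues by never following symlinks
could be sketched like this, using os.scandir() as released in Python 3.5
(the draft API under discussion may differ; tree_size is an illustrative
name):

```python
import os


def tree_size(path):
    """Total size of regular files under path, never following symlinks."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            # Real subdirectory: recurse. Symlinks to directories are skipped,
            # so loops and links to huge trees elsewhere cannot be entered.
            total += tree_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_size
    return total
```

Symlinks match neither branch, so broken links are ignored as well.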

Tim Delaney

From ncoghlan at gmail.com  Mon Jul 14 04:17:33 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Jul 2014 21:17:33 -0500
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
Message-ID: 

On 13 Jul 2014 20:54, "Tim Delaney"  wrote:
>
> On 14 July 2014 10:33, Ben Hoyt  wrote:
>>
>>
>>
>> If we go with Victor's link-following .is_dir() and .is_file(), then
>> we probably need to add his suggestion of a follow_symlinks=False
>> parameter (defaults to True). Either that or you have to say
>> "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit
>> less nice.
>
>
> Absolutely agreed that follow_symlinks is the way to go, disagree on the
> default value.
>
>>
>> Given the above arguments for symlink-following is_dir()/is_file()
>> methods (have I missed any, Victor?), what do others think?
>
>
> I would say whichever way you go, someone will assume the opposite. IMO
> not following symlinks by default is safer. If you follow symlinks by
> default then everyone has the following issues:
>
> 1. Crossing filesystems (including onto network filesystems);
>
> 2. Recursive directory structures (symlink to a parent directory);
>
> 3. Symlinks to non-existent files/directories;
>
> 4. Symlink to an absolutely huge directory somewhere else (very annoying
> if you just wanted to do a directory sizer ...).
>
> If follow_symlinks=False by default, only those who opt-in have to deal
> with the above.

Or the ever popular symlink to "." (or a directory higher in the tree).

I think os.walk() is a good source of inspiration here: call the flag
"followlink" and default it to False.

Cheers,
Nick.

>
> Tim Delaney
>

From timothy.c.delaney at gmail.com  Mon Jul 14 04:29:12 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Mon, 14 Jul 2014 12:29:12 +1000
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
Message-ID: 

On 14 July 2014 12:17, Nick Coghlan  wrote:
>
> I think os.walk() is a good source of inspiration here: call the flag
> "followlink" and default it to False.
>
Actually, that's "followlinks", and I'd forgotten that os.walk() defaulted
to not follow - definitely behaviour to match IMO :)

Tim Delaney

From ethan at stoneleaf.us  Mon Jul 14 04:55:37 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 13 Jul 2014 19:55:37 -0700
Subject: [Python-Dev] == on object tests identity in 3.x - list
	delegation to members?
In-Reply-To: <53C2A210.80902@gmx.de>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de>
 <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de>
Message-ID: <53C346A9.3050200@stoneleaf.us>

On 07/13/2014 08:13 AM, Andreas Maier wrote:
> Am 11.07.2014 22:54, schrieb Ethan Furman:
>>
>> Here is the externally visible behavior:
>>
>> Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20)
>> [GCC 4.7.3] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>> --> NaN = float('nan')
>> --> NaN == NaN
>> False
>> --> [NaN] == [NaN]
>> True
>
> Ouch, that hurts ;-)

Yeah, I've been bitten enough times that now I try to always test code before I post.  ;)


> Test #8: Same object of class C
>     (C.__eq__() implemented with equality of x,
>      C.__ne__() returning NotImplemented):
>
>    obj1: type=, str=C(256), id=39406504
>    obj2: type=, str=C(256), id=39406504
>
>    a) obj1 is obj2: True
> C.__eq__(): self=39406504, other=39406504, returning True

This is interesting/weird/odd -- why is __eq__ being called for an 'is' test?

--- test_eq.py ----------------------------
class TestEqTrue:
     def __eq__(self, other):
         print('Test.__eq__ returning True')
         return True

class TestEqFalse:
     def __eq__(self, other):
         print('Test.__eq__ returning False')
         return False

tet = TestEqTrue()
print(tet is tet)
print(tet in [tet])

tef = TestEqFalse()
print(tef is tef)
print(tef in [tef])
-------------------------------------------

When I run this all I get is four Trues, never any messages about being in __eq__.

How did you get that result?

--
~Ethan~

From ethan at stoneleaf.us  Mon Jul 14 06:52:37 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 13 Jul 2014 21:52:37 -0700
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
Message-ID: <53C36215.2080206@stoneleaf.us>

On 07/13/2014 05:33 PM, Ben Hoyt wrote:
>
> On the recent python-dev thread, Victor especially made some well
> thought out suggestions. It seems to me there's general agreement that
> the basic API in PEP 471 is good (with Ethan not a fan at first, but
> it seems he's on board after further discussion :-).

I would still like to have 'info' and 'onerror' added to the basic API, but I agree that having methods and caching on 
first lookup is good.


> That said, I think there's basically one thing remaining to decide:
> whether or not to have DirEntry.is_dir() and .is_file() follow
> symlinks by default.

We should have a flag for that, and default it to False:

   scandir(path, *, followlinks=False, info=None, onerror=None)

--
~Ethan~

From ethan at stoneleaf.us  Mon Jul 14 07:51:04 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sun, 13 Jul 2014 22:51:04 -0700
Subject: [Python-Dev] == on object tests identity in 3.x - list
	delegation to members?
In-Reply-To: <53C36BBA.8010406@gmx.de>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de>
 <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de>
 <53C346A9.3050200@stoneleaf.us> <53C36BBA.8010406@gmx.de>
Message-ID: <53C36FC8.3000707@stoneleaf.us>

On 07/13/2014 10:33 PM, Andreas Maier wrote:
> On 14.07.2014 04:55, Ethan Furman wrote:
>> On 07/13/2014 08:13 AM, Andreas Maier wrote:
>>> Test #8: Same object of class C
>>>     (C.__eq__() implemented with equality of x,
>>>      C.__ne__() returning NotImplemented):
>>>
>>>    obj1: type=, str=C(256), id=39406504
>>>    obj2: type=, str=C(256), id=39406504
>>>
>>>    a) obj1 is obj2: True
>>> C.__eq__(): self=39406504, other=39406504, returning True
>>
>> This is interesting/weird/odd -- why is __eq__ being called for an 'is'
>> test?
>
> The debug messages are printed before the result is printed. So this is the debug message for the next case, 8.b).

Ah, whew!  That's a relief.

> Sorry for not explaining it.

Had I been reading more closely I would (hopefully) have noticed that, but I was headed out the door at the time.

--
~Ethan~

From victor.stinner at gmail.com  Mon Jul 14 10:18:31 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 14 Jul 2014 10:18:31 +0200
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
Message-ID: 

2014-07-14 2:33 GMT+02:00 Ben Hoyt :
> If we go with Victor's link-following .is_dir() and .is_file(), then
> we probably need to add his suggestion of a follow_symlinks=False
> parameter (defaults to True). Either that or you have to say
> "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit
> less nice.

You forgot one of my arguments: we must have exactly the same API as
os.path.isdir() and pathlib.Path.is_dir(), because it would be very
confusing (a source of bugs) to have different behaviour.

Since these functions don't have any parameter (there is no such
follow_symlink(s) parameter), I'm opposed to the idea of adding such a
parameter.

If you really want to add a follow_symlink optional parameter, IMO you
should modify all os.path.is*() functions and all pathlib.Path.is*()
methods to add it there too. Maybe if nobody asked for this feature
before, it's because it's not useful in practice. You can simply test
explicitly is_symlink() before checking is_dir().

Well, let's imagine DirEntry.is_dir() does not follow symlinks. How do
you test is_dir() and follow symlinks?
"stat.S_ISDIR(entry.stat().st_mode)" ? You have to import the stat
module, and use the ugly C macro S_ISDIR().

Victor

From victor.stinner at gmail.com  Mon Jul 14 10:25:48 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 14 Jul 2014 10:25:48 +0200
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
Message-ID: 

2014-07-14 4:17 GMT+02:00 Nick Coghlan :
> Or the ever popular symlink to "." (or a directory higher in the tree).

"." and ".." are explicitly ignored by os.listdir() and os.scandir().

> I think os.walk() is a good source of inspiration here: call the flag
> "followlink" and default it to False.

IMO os.walk() specifically is not a good example. It includes symlinks
to directories in the dirs list but then does not descend into them; it
is a recursive function, and it has an optional followlinks parameter
(default: False).
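
The os.walk() behaviour described here can be checked with a small script. This is only a sketch: it assumes a platform where os.symlink() works without special privileges, and the helper name demo_walk and the directory names real, sub and link are invented for the illustration.

```python
import os
import tempfile


def demo_walk():
    """Build top/real/sub plus top/link -> top/real, then walk it both ways."""
    results = {}
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "real", "sub"))
        os.symlink(os.path.join(top, "real"), os.path.join(top, "link"))
        for follow in (False, True):
            listed, visited = [], []
            for dirpath, dirnames, _files in os.walk(top, followlinks=follow):
                if dirpath == top:
                    # What os.walk() reports as subdirectories of the top level.
                    listed = sorted(dirnames)
                visited.append(os.path.relpath(dirpath, top))
            results[follow] = (listed, sorted(visited))
    return results


for follow, (listed, visited) in demo_walk().items():
    print("followlinks=%s listed=%s visited=%s" % (follow, listed, visited))
```

With followlinks=False the symlink still shows up among the listed directory names, but nothing below it is visited; with followlinks=True the walk descends through it as well.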

Moreover, in 92% of cases, functions using os.listdir() and
os.path.isdir() *follow* symlinks:
https://mail.python.org/pipermail/python-dev/2014-July/135435.html

Victor

From victor.stinner at gmail.com  Mon Jul 14 10:31:00 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 14 Jul 2014 10:31:00 +0200
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: <53C36215.2080206@stoneleaf.us>
References: 
 <53C36215.2080206@stoneleaf.us>
Message-ID: 

2014-07-14 6:52 GMT+02:00 Ethan Furman :
> We should have a flag for that, and default it to False:
>
>   scandir(path, *, followlinks=False, info=None, onerror=None)

What happens to name and full_name with followlinks=True? Do they
contain the name in the directory (name of the symlink) or name of the
linked file?

So it means that is_dir() may or may not follow symlinks depending how
the object was built?

Victor

From benhoyt at gmail.com  Mon Jul 14 14:27:39 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Mon, 14 Jul 2014 08:27:39 -0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
Message-ID: 

First, just to clarify a couple of points.

> You forgot one of my arguments: we must have exactly the same API as
> os.path.isdir() and pathlib.Path.is_dir(), because it would be very
> confusing (a source of bugs) to have different behaviour.

Actually, I specifically included that argument. It's item (b) in the
list in my original message yesterday. :-)

> Since these functions don't have any parameter (there is no such
> follow_symlink(s) parameter), I'm opposed to the idea of adding such a
> parameter.
>
> If you really want to add a follow_symlink optional parameter, IMO you
> should modify all os.path.is*() functions and all pathlib.Path.is*()
> methods to add it there too. Maybe if nobody asked for this feature
> before, it's because it's not useful in practice. You can simply test
> explicitly is_symlink() before checking is_dir().

Yeah, this is fair enough.

> Well, let's imagine DirEntry.is_dir() does not follow symlinks. How do
> you test is_dir() and follow symlinks?
> "stat.S_ISDIR(entry.stat().st_mode)" ? You have to import the stat
> module, and use the ugly C macro S_ISDIR().

No, you don't actually need stat/S_ISDIR in that case -- if
DirEntry.is_dir() does not follow symlinks, you just say:

entry.is_symlink() and os.path.isdir(entry.full_name)

Or for the full test:

(entry.is_symlink() and os.path.isdir(entry.full_name)) or entry.is_dir()

On the other hand, if DirEntry.is_dir() does follow symlinks per your
proposal, then to do is_dir without following symlinks you need to use
DirEntry.lstat() like so:

stat.S_ISDIR(entry.lstat().st_mode)

So from this perspective it's somewhat nicer to have DirEntry.is_X()
not follow links and use DirEntry.is_symlink() and os.path.isX() to
supplement that if you want to follow links.

I think Victor has a good point that 92% of the stdlib calls that use
listdir and isX do follow links.

However, I think Tim Delaney makes some good points above about the
(not so) safety of scandir following symlinks by default -- symlinks
to network file systems, nonexistent files, or huge directory trees. In
that light, this kind of thing should be opt-*in*.

I guess I'm still slightly on the DirEntry-does-not-follow-links side
of the fence, due to the fact that it's a method on the *directory
entry* object, due to simplicity of implementation, and due to Tim
Delaney's "it should be safe by default" point above.

However, we're *almost* bikeshedding at this point, and I think we
just need to pick one way or the other. It's straightforward to
implement one in terms of the other in each case.
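
To make "one in terms of the other" concrete, here is a sketch written against os.scandir() as it later shipped in Python 3.5, not against the draft API discussed in this thread. The differences are assumptions to keep in mind: the shipped DirEntry has a path attribute rather than full_name, stat(follow_symlinks=False) rather than lstat(), and is_dir() accepts a follow_symlinks keyword. The helper names and the file names real, link and plain.txt are invented for the example, and a platform where os.symlink() works is assumed.

```python
import os
import stat
import tempfile


def is_dir_following(entry):
    """Follow links, using only a non-following is_dir() plus is_symlink()."""
    return (entry.is_symlink() and os.path.isdir(entry.path)) or entry.is_dir(
        follow_symlinks=False
    )


def is_dir_not_following(entry):
    """Do not follow links, using the lstat()-style result and S_ISDIR()."""
    return stat.S_ISDIR(entry.stat(follow_symlinks=False).st_mode)


def demo_entries():
    """Return {name: (following, not_following)} for a dir, a link to it, and a file."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "real"))
        os.symlink(os.path.join(top, "real"), os.path.join(top, "link"))
        with open(os.path.join(top, "plain.txt"), "w"):
            pass
        with os.scandir(top) as entries:
            return {
                e.name: (is_dir_following(e), is_dir_not_following(e))
                for e in entries
            }


for name, flags in sorted(demo_entries().items()):
    print(name, flags)
```

Only the symlink to a directory gets different answers from the two helpers, which is the whole of the disagreement about the default.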

-Ben

From andreas.r.maier at gmx.de  Mon Jul 14 07:33:46 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Mon, 14 Jul 2014 07:33:46 +0200
Subject: [Python-Dev] == on object tests identity in 3.x - list delegation
	to members?
In-Reply-To: <53C346A9.3050200@stoneleaf.us>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de>
 <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de>
 <53C346A9.3050200@stoneleaf.us>
Message-ID: <53C36BBA.8010406@gmx.de>

On 14.07.2014 04:55, Ethan Furman wrote:
> On 07/13/2014 08:13 AM, Andreas Maier wrote:
>> Test #8: Same object of class C
>>     (C.__eq__() implemented with equality of x,
>>      C.__ne__() returning NotImplemented):
>>
>>    obj1: type=, str=C(256), id=39406504
>>    obj2: type=, str=C(256), id=39406504
>>
>>    a) obj1 is obj2: True
>> C.__eq__(): self=39406504, other=39406504, returning True
>
> This is interesting/weird/odd -- why is __eq__ being called for an 'is'
> test?

The debug messages are printed before the result is printed. So this is 
the debug message for the next case, 8.b).

Sorry for not explaining it.
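
A minimal sketch of why the output appears in that order (the class and the report() helper are made up for the illustration and are not the original test script): the arguments to print() are evaluated before print() writes anything, so a debug message emitted inside __eq__ lands above the line that reports the comparison result.

```python
class C:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        result = self.x == other.x
        # Debug output, printed while the comparison is being evaluated.
        print("C.__eq__(): returning", result)
        return result


def report(obj1, obj2):
    # Each argument is evaluated before print() writes its line, so the
    # __eq__ debug message appears right after "a)" and above "b)".
    print("a) obj1 is obj2:", obj1 is obj2)
    print("b) obj1 == obj2:", obj1 == obj2)


obj = C(256)
report(obj, obj)
```

The 'is' test itself never calls __eq__; the debug line printed after "a)" belongs to the "b)" comparison.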

Andy


From 4kir4.1i at gmail.com  Mon Jul 14 07:51:24 2014
From: 4kir4.1i at gmail.com (Akira Li)
Date: Mon, 14 Jul 2014 09:51:24 +0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
References: 
 
 
Message-ID: <87a98cpjxf.fsf@gmail.com>

Nick Coghlan  writes:

> On 13 Jul 2014 20:54, "Tim Delaney"  wrote:
>>
>> On 14 July 2014 10:33, Ben Hoyt  wrote:
>>>
>>>
>>>
>>> If we go with Victor's link-following .is_dir() and .is_file(), then
>>> we probably need to add his suggestion of a follow_symlinks=False
>>> parameter (defaults to True). Either that or you have to say
>>> "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit
>>> less nice.
>>
>>
>> Absolutely agreed that follow_symlinks is the way to go, disagree on the
> default value.
>>
>>>
>>> Given the above arguments for symlink-following is_dir()/is_file()
>>> methods (have I missed any, Victor?), what do others think?
>>
>>
>> I would say whichever way you go, someone will assume the opposite. IMO
> not following symlinks by default is safer. If you follow symlinks by
> default then everyone has the following issues:
>>
>> 1. Crossing filesystems (including onto network filesystems);
>>
>> 2. Recursive directory structures (symlink to a parent directory);
>>
>> 3. Symlinks to non-existent files/directories;
>>
>> 4. Symlink to an absolutely huge directory somewhere else (very annoying
> if you just wanted to do a directory sizer ...).
>>
>> If follow_symlinks=False by default, only those who opt-in have to deal
> with the above.
>
> Or the ever popular symlink to "." (or a directory higher in the tree).
>
> I think os.walk() is a good source of inspiration here: call the flag
> "followlink" and default it to False.
>

Let's not multiply entities beyond necessity.

There is a well-defined *follow_symlinks* parameter:
https://docs.python.org/3/library/os.html#follow-symlinks
e.g., os.access, os.chown, os.link, os.stat, os.utime and many other
functions in the os module support the follow_symlinks parameter; see
os.supports_follow_symlinks.

os.walk is an exception that uses *followlinks*. That might be because it
is an old function; e.g., the newer os.fwalk uses follow_symlinks.

------------------------------------------------------------

As has been said: os.path.isdir and pathlib.Path.is_dir in Python,
File.directory? in Ruby, System.Directory.doesDirectoryExist in Haskell,
and `test -d` in the shell all follow symlinks, i.e., follow_symlinks=True
as the default is more familiar for an .is_dir method.

`cd path` in shell, os.chdir(path), `ls path`, os.listdir(path), and
os.scandir(path) itself follow symlinks (even on Windows:
http://bugs.python.org/issue13772 ). GUI file managers such as
`nautilus` also treat symlinks to directories as directories -- you may
click on them to open corresponding directories.

Only *recursive* functions such as os.walk and os.fwalk do not follow
symlinks by default, to avoid symlink loops. Note: this is consistent
with coreutils commands: `cp` follows symlinks for non-recursive
actions, whereas the inherently recursive `du` utility doesn't follow
symlinks by default.

follow_symlinks=True as the default for the DirEntry.is_dir method helps
avoid easy-to-introduce bugs when replacing old
os.listdir/os.path.isdir code or writing new code using the same
mental model.
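
For readers unfamiliar with the existing convention, a small sketch of how follow_symlinks already behaves on os.stat(). The helper names classify and demo_classify and the names target and link are invented here, and a platform where os.symlink() works is assumed.

```python
import os
import stat
import tempfile


def classify(path):
    """Return (is a directory when links are followed, is itself a symlink)."""
    followed = os.stat(path)  # follow_symlinks defaults to True
    not_followed = os.stat(path, follow_symlinks=False)  # like os.lstat(path)
    return stat.S_ISDIR(followed.st_mode), stat.S_ISLNK(not_followed.st_mode)


def demo_classify():
    """Classify a real directory and a symlink pointing at it."""
    with tempfile.TemporaryDirectory() as top:
        target = os.path.join(top, "target")
        link = os.path.join(top, "link")
        os.mkdir(target)
        os.symlink(target, link)
        return classify(target), classify(link)


print(demo_classify())
# Whether os.stat honours follow_symlinks on this platform:
print(os.stat in os.supports_follow_symlinks)
```

A symlink to a directory counts as a directory when links are followed, which is the os.path.isdir() mental model referred to above.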


--
Akira


From tisdall at gmail.com  Mon Jul 14 15:57:06 2014
From: tisdall at gmail.com (Tim Tisdall)
Date: Mon, 14 Jul 2014 09:57:06 -0400
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
Message-ID: 

I was interested in providing patches for the socket module to add
Bluetooth 4.0 support.  I couldn't find any details on how to provide
contributions to the Python project, though...  Is there some online
documentation with guidelines on how to contribute?  Should I just provide
a patch to this mailing list?

Also, is there a method to test changes against all the different *nix
variations?  Is Bluez the standard across the different *nix variations?

-Tim

From martin at v.loewis.de  Mon Jul 14 17:21:25 2014
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 14 Jul 2014 17:21:25 +0200
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
In-Reply-To: 
References: 
 
 
Message-ID: <53C3F575.9010602@v.loewis.de>

On 12.07.14 17:19, Nick Coghlan wrote:
> Using the stable ABI for standard library extensions also serves to
> decouple them further from the internal details of the CPython runtime,
> making it more likely they will be able to run correctly on alternative
> interpreters (since emulating or otherwise supporting the limited API is
> easier than supporting the whole thing).

There are two features to be gained for the standard library from that:

A. with proper module shutdown support, it will be possible to release
   objects that are currently held in C global/static variables, as the
   C global variables will go away. This, in turn, is a step forward in
   the desire to allow for proper leak-free interpreter shutdown, and
   in the desire to base interpreter shutdown on GC.

B. with proper use of heap types (instead of the static type objects),
   support for the multiple-interpreter feature will be improved, since
   type objects will be per-interpreter, instead of being global. This,
   in turn, is desirable since otherwise state changes can leak from
   one interpreter to the other.

So I still maintain that the change is desirable even for the standard
library.

Regards,
Martin




From g.rodola at gmail.com  Mon Jul 14 17:32:42 2014
From: g.rodola at gmail.com (Giampaolo Rodola')
Date: Mon, 14 Jul 2014 17:32:42 +0200
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
In-Reply-To: 
References: 
Message-ID: 

On Mon, Jul 14, 2014 at 3:57 PM, Tim Tisdall  wrote:

> I was interested in providing patches for the socket module to add
> Bluetooth 4.0 support.  I couldn't find any details on how to provide
> contributions to the Python project, though...  Is there some online
> documentation with guidelines on how to contribute?  Should I just provide
> a patch to this mailing list?
>
> Also, is there a method to test changes against all the different *nix
> variations?  Is Bluez the standard across the different *nix variations?
>
> -Tim
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/g.rodola%40gmail.com
>
>
Hello there,
you can take a look at:
https://docs.python.org/devguide/#contributing
Patches must be submitted on the Python bug tracker:
http://bugs.python.org/

-- 
Giampaolo - http://grodola.blogspot.com

From skip at pobox.com  Mon Jul 14 17:30:04 2014
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 14 Jul 2014 10:30:04 -0500
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
In-Reply-To: 
References: 
Message-ID: 

On Mon, Jul 14, 2014 at 8:57 AM, Tim Tisdall  wrote:
> Is there some online documentation with guidelines on how to contribute?

http://lmgtfy.com/?q=contribute+to+python

Skip

From brett at python.org  Mon Jul 14 17:41:57 2014
From: brett at python.org (Brett Cannon)
Date: Mon, 14 Jul 2014 15:41:57 +0000
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
References: 
 
 
 <53C3F575.9010602@v.loewis.de>
Message-ID: 

On Mon Jul 14 2014 at 11:27:34 AM, "Martin v. Löwis"
wrote:

> On 12.07.14 17:19, Nick Coghlan wrote:
> > Using the stable ABI for standard library extensions also serves to
> > decouple them further from the internal details of the CPython runtime,
> > making it more likely they will be able to run correctly on alternative
> > interpreters (since emulating or otherwise supporting the limited API is
> > easier than supporting the whole thing).
>
> There are two features to be gained for the standard library from that:
>
> A. with proper module shutdown support, it will be possible to release
>    objects that are currently held in C global/static variables, as the
>    C global variables will go away. This, in turn, is a step forward in
>    the desire to allow for proper leak-free interpreter shutdown, and
>    in the desire to base interpreter shutdown on GC.
>
> B. with proper use of heap types (instead of the static type objects),
>    support for the multiple-interpreter feature will be improved, since
>    type objects will be per-interpreter, instead of being global. This,
>    in turn, is desirable since otherwise state changes can leak from
>    one interpreter to the other.
>
> So I still maintain that the change is desirable even for the standard
> library.
>

I agree for PEP 3121, which is the initialization/finalization work. The
stable ABI is not necessary. So maybe we should re-examine the patches and
accept the bits that clean up init/finalization and leave out any
ABI-related changes.

From brian at python.org  Mon Jul 14 17:53:47 2014
From: brian at python.org (Brian Curtin)
Date: Mon, 14 Jul 2014 10:53:47 -0500
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
In-Reply-To: 
References: 
 
Message-ID: 

On Mon, Jul 14, 2014 at 10:30 AM, Skip Montanaro  wrote:

> On Mon, Jul 14, 2014 at 8:57 AM, Tim Tisdall  wrote:
> > Is there some online documentation with guidelines on how to contribute?
>
> http://lmgtfy.com/?q=contribute+to+python


This response is unacceptable.

Tim: check out https://docs.python.org/devguide/ and perhaps look at the
core-mentorship[0] mailing list while coming up with your first
contributions. It's a good first step to getting some guidance on the
process and getting some eyes on your early patches.

[0] https://mail.python.org/mailman/listinfo/core-mentorship/

From skip at pobox.com  Mon Jul 14 18:09:55 2014
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 14 Jul 2014 11:09:55 -0500
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
In-Reply-To: 
References: 
 
 
Message-ID: 

On Mon, Jul 14, 2014 at 10:53 AM, Brian Curtin  wrote:
>> > Is there some online documentation with guidelines on how to contribute?
>>
>> http://lmgtfy.com/?q=contribute+to+python
>
>
> This response is unacceptable.

Tim and I already discussed this offline. I admitted to being in a bit
of a snarky mood today, and he seems to have accepted my post in good
natured fashion. I should have at least added a smiley to my post. I
will refrain from attempts at unadorned levity in the future.

As penance, Tim or Brian, if you are in or near Chicago, look me
up. I'd be happy to buy y'all a beer.

Skip

From ethan at stoneleaf.us  Mon Jul 14 18:16:22 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 14 Jul 2014 09:16:22 -0700
Subject: [Python-Dev] Python Job Board
Message-ID: <53C40256.3020101@stoneleaf.us>

has now been dead for five months.

--
~Ethan~

From hasan.diwan at gmail.com  Mon Jul 14 18:20:36 2014
From: hasan.diwan at gmail.com (Hasan Diwan)
Date: Mon, 14 Jul 2014 09:20:36 -0700
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
In-Reply-To: 
References: 
 
 
 
Message-ID: 

Would http://lmbtfy.com/?q=contribute+to+python# be more or less
acceptable? -- H


On 14 July 2014 09:09, Skip Montanaro  wrote:

> On Mon, Jul 14, 2014 at 10:53 AM, Brian Curtin  wrote:
> >> > Is there some online documentation with guidelines on how to
> contribute?
> >>
> >> http://lmgtfy.com/?q=contribute+to+python
> >
> >
> > This response is unacceptable.
>
> Tim and I already discussed this offline. I admitted to being in a bit
> of a snarky mood today, and he seems to have accepted my post in good
> natured fashion. I should have at least added a smiley to my post. I
> will refrain from attempts at unadorned levity in the future.
>
> As penance, Tim or Brian, if you are in or near Chicago, look me
> up. I'd be happy to buy y'all a beer.
>
> Skip



-- 
Sent from my mobile device
Envoyé de mon portable

From tisdall at gmail.com  Mon Jul 14 17:57:06 2014
From: tisdall at gmail.com (Tim Tisdall)
Date: Mon, 14 Jul 2014 11:57:06 -0400
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
In-Reply-To: 
References: 
 
 
Message-ID: 

Naw, I'd accept that response.  I think I searched on Friday, but forgot
about finding that.  :)   There's enough traffic on a mailing list without
useless noise.

Thanks for all the responses.


On Mon, Jul 14, 2014 at 11:53 AM, Brian Curtin  wrote:

> On Mon, Jul 14, 2014 at 10:30 AM, Skip Montanaro  wrote:
>
>> On Mon, Jul 14, 2014 at 8:57 AM, Tim Tisdall  wrote:
>> > Is there some online documentation with guidelines on how to contribute?
>>
>> http://lmgtfy.com/?q=contribute+to+python
>
>
> This response is unacceptable.
>
> Tim: check out https://docs.python.org/devguide/ and perhaps look at the
> core-mentorship[0] mailing list while coming up with your first
> contributions. It's a good first step to getting some guidance on the
> process and getting some eyes on your early patches.
>
> [0] https://mail.python.org/mailman/listinfo/core-mentorship/
>

From brett at python.org  Mon Jul 14 18:59:29 2014
From: brett at python.org (Brett Cannon)
Date: Mon, 14 Jul 2014 16:59:29 +0000
Subject: [Python-Dev] Python Job Board
References: <53C40256.3020101@stoneleaf.us>
Message-ID: 

On Mon Jul 14 2014 at 12:17:03 PM, Ethan Furman  wrote:

> has now been dead for five months.
>

This is the wrong place to ask about this. It falls under the purview of
the web site who you can email at webmaster@ or submit an issue at
https://github.com/python/pythondotorg . But I know from PSF status reports
that it's being actively rewritten and fixed to make it manageable for more
than one person to run easily.

From skip at pobox.com  Mon Jul 14 19:43:24 2014
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 14 Jul 2014 12:43:24 -0500
Subject: [Python-Dev] Python Job Board
In-Reply-To: 
References: <53C40256.3020101@stoneleaf.us>
 
Message-ID: 

On Mon, Jul 14, 2014 at 11:59 AM, Brett Cannon  wrote:
> This is the wrong place to ask about this. It falls under the purview of the
> web site who you can email at webmaster@ or submit an issue at
> https://github.com/python/pythondotorg . But I know from PSF status reports
> that it's being actively rewritten and fixed to make it manageable for more
> than one person to run easily.

Agree with that. I originally skipped this post because I'm pretty
sure MAL (who is heavily involved with the rewrite effort) still hangs
out here. I will modify Brett's admonition a bit though. A better
place to comment about the job board (and perhaps volunteer to help
with the current effort) is jobs at python.org.

Skip

From alexander.belopolsky at gmail.com  Mon Jul 14 20:10:16 2014
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 14 Jul 2014 14:10:16 -0400
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
In-Reply-To: 
References: 
 
 
 <53C3F575.9010602@v.loewis.de>
 
Message-ID: 

On Mon, Jul 14, 2014 at 11:41 AM, Brett Cannon  wrote:

> So maybe we should re-examine the patches and accept the bits that clean
> up init/finalization and leave out any ABI-related changes.


This is precisely what I suggested two years ago.

http://bugs.python.org/issue15390#msg170249

I am not against ABI-related changes in principle, but I think these
changes should be carefully considered on a case by case basis and not
applied wholesale.

From ethan at stoneleaf.us  Mon Jul 14 20:24:55 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 14 Jul 2014 11:24:55 -0700
Subject: [Python-Dev] Python Job Board
In-Reply-To: 
References: <53C40256.3020101@stoneleaf.us>
 
 
Message-ID: <53C42077.9070408@stoneleaf.us>

On 07/14/2014 10:43 AM, Skip Montanaro wrote:
> On Mon, Jul 14, 2014 at 11:59 AM, Brett Cannon wrote:
>>
>> This is the wrong place to ask about this. It falls under the purview of the
>> web site who you can email at webmaster@ or submit an issue at
>> https://github.com/python/pythondotorg . But I know from PSF status reports
>> that it's being actively rewritten and fixed to make it manageable for more
>> than one person to run easily.
>
> Agree with that. I originally skipped this post because I'm pretty
> sure MAL (who is heavily involved with the rewrite effort) still hangs
> out here. I will modify Brett's admonition a bit though. A better
> place to comment about the job board (and perhaps volunteer to help
> with the current effort) is jobs at python.org.

Mostly just hoping to raise awareness in case anybody here is able/willing to pitch in.

--
~Ethan~

From tjreedy at udel.edu  Mon Jul 14 22:42:25 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 14 Jul 2014 16:42:25 -0400
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
In-Reply-To: 
References: 
Message-ID: 

On 7/14/2014 9:57 AM, Tim Tisdall wrote:

2 questions not answered yet.

> Also, is there a method to test changes against all the different *nix
> variations?

We have a set of buildbots.
https://www.python.org/dev/buildbot/

> Is Bluez the standard across the different *nix variations?

No idea.

-- 
Terry Jan Reedy


From hasan.diwan at gmail.com  Mon Jul 14 22:46:06 2014
From: hasan.diwan at gmail.com (Hasan Diwan)
Date: Mon, 14 Jul 2014 13:46:06 -0700
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
In-Reply-To: 
References: 
 
Message-ID: 

Tim,
Are  you aware of https://code.google.com/p/pybluez/ ? -- H


On 14 July 2014 13:42, Terry Reedy  wrote:

> On 7/14/2014 9:57 AM, Tim Tisdall wrote:
>
> 2 questions not answered yet.
>
>
>  Also, is there a method to test changes against all the different *nix
>> variations?
>>
>
> We have a set of buildbots.
> https://www.python.org/dev/buildbot/
>
>
>  Is Bluez the standard across the different *nix variations?
>>
>
> No idea.
>
> --
> Terry Jan Reedy
>
>



-- 
Sent from my mobile device
Envoyé de mon portable

From rdmurray at bitdance.com  Mon Jul 14 23:30:56 2014
From: rdmurray at bitdance.com (R. David Murray)
Date: Mon, 14 Jul 2014 17:30:56 -0400
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
In-Reply-To: 
References: 
 
Message-ID: <20140714213056.7A3C1250DFD@webabinitio.net>

On Mon, 14 Jul 2014 16:42:25 -0400, Terry Reedy  wrote:
> On 7/14/2014 9:57 AM, Tim Tisdall wrote:
> 
> 2 questions not answered yet.
> 
> > Also, is there a method to test changes against all the different *nix
> > variations?
> 
> We have a set of buildbots.
> https://www.python.org/dev/buildbot/
> 
> > Is Bluez the standard across the different *nix variations?
> 
> No idea.

It would be really nice to answer that and the related testing questions.
The socket module has bluetooth support, but there are no tests.
An effort to write some was started at the Bloomberg sprint last month,
but nothing has been posted to the issue yet:

    http://bugs.python.org/issue7687

Is Bluetooth 4.0 something different from what the socket module already
has?
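
As a starting point for such tests, here is a hedged sketch of feature detection only. It opens no sockets and says nothing about Bluetooth 4.0 or LE. The constant names are the ones documented for the socket module; which of them exist depends on the platform and on how Python was built. The names BLUETOOTH_NAMES, bluetooth_names_present and BluetoothConstantsTest are invented for the example.

```python
import socket
import unittest

# Constant names documented for the socket module's Bluetooth support; whether
# each one exists depends on the platform and on how Python was built.
BLUETOOTH_NAMES = ("AF_BLUETOOTH", "BTPROTO_L2CAP", "BTPROTO_RFCOMM",
                   "BTPROTO_HCI", "BTPROTO_SCO")


def bluetooth_names_present():
    """Map each constant name to whether this build of socket defines it."""
    return {name: hasattr(socket, name) for name in BLUETOOTH_NAMES}


@unittest.skipUnless(hasattr(socket, "AF_BLUETOOTH"),
                     "socket was built without Bluetooth support")
class BluetoothConstantsTest(unittest.TestCase):
    def test_address_family_is_int(self):
        # Runs only where the address family constant exists at all.
        self.assertIsInstance(socket.AF_BLUETOOTH, int)


print(bluetooth_names_present())
```

Guarding the test class with skipUnless keeps it harmless on buildbots without Bluetooth support, where it is reported as skipped rather than failed.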

--David

From tisdall at gmail.com  Tue Jul 15 01:08:43 2014
From: tisdall at gmail.com (Tim Tisdall)
Date: Mon, 14 Jul 2014 19:08:43 -0400
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
In-Reply-To: 
References: 
 
 
Message-ID: 

Quite aware.  I'm pretty sure it has no 4.x LE capabilities.

Last I checked it seemed like a dead project, but there seems to be some
activity there now.
On Jul 14, 2014 4:47 PM, "Hasan Diwan"  wrote:

> Tim,
> Are  you aware of https://code.google.com/p/pybluez/ ? -- H
>
>
> On 14 July 2014 13:42, Terry Reedy  wrote:
>
>> On 7/14/2014 9:57 AM, Tim Tisdall wrote:
>>
>> 2 questions not answered yet.
>>
>>
>>  Also, is there a method to test changes against all the different *nix
>>> variations?
>>>
>>
>> We have a set of buildbots.
>> https://www.python.org/dev/buildbot/
>>
>>
>>  Is Bluez the standard across the different *nix variations?
>>>
>>
>> No idea.
>>
>> --
>> Terry Jan Reedy
>>
>>
>
>
>
> --
> Sent from my mobile device
> Envoyé de mon portable
>

From tisdall at gmail.com  Tue Jul 15 01:13:32 2014
From: tisdall at gmail.com (Tim Tisdall)
Date: Mon, 14 Jul 2014 19:13:32 -0400
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
In-Reply-To: <20140714213056.7A3C1250DFD@webabinitio.net>
References: 
 
 <20140714213056.7A3C1250DFD@webabinitio.net>
Message-ID: 

The major change is to the Bluetooth address struct.  It now has an added
value for the distinction between "public" and "random" 4.x addresses.
Also some added constants to open LE connections.
On Jul 14, 2014 5:32 PM, "R. David Murray"  wrote:

> On Mon, 14 Jul 2014 16:42:25 -0400, Terry Reedy  wrote:
> > On 7/14/2014 9:57 AM, Tim Tisdall wrote:
> >
> > 2 questions not answered yet.
> >
> > > Also, is there a method to test changes against all the different *nix
> > > variations?
> >
> > We have a set of buildbots.
> > https://www.python.org/dev/buildbot/
> >
> > > Is Bluez the standard across the different *nix variations?
> >
> > No idea.
>
> It would be really nice to answer that and the related testing questions.
> The socket module has bluetooth support, but there are no tests.
> An effort to write some was started at the Bloomberg sprint last month,
> but nothing has been posted to the issue yet:
>
>     http://bugs.python.org/issue7687
>
> Is Bluetooth 4.0 something different from what the socket module already
> has?
>
> --David

From wes.turner at gmail.com  Tue Jul 15 03:01:19 2014
From: wes.turner at gmail.com (Wes Turner)
Date: Mon, 14 Jul 2014 20:01:19 -0500
Subject: [Python-Dev] Python Job Board
In-Reply-To: <53C42077.9070408@stoneleaf.us>
References: <53C40256.3020101@stoneleaf.us>
 
 
 <53C42077.9070408@stoneleaf.us>
Message-ID: 

From http://www.reddit.com/r/Python/comments/17c69p/i_was_told_by_a_friend_that_learning_python_for/c84bswd
:

>* http://www.python.org/community/jobs/
>* https://jobs.github.com/positions?description=python
>* http://careers.joelonsoftware.com/jobs?searchTerm=python
>* http://www.linkedin.com/jsearch?keywords=python
>* http://www.indeed.com/q-Python-jobs.html
>* http://www.simplyhired.com/a/jobs/list/q-python
>* http://seeker.dice.com/jobsearch/servlet/JobSearch?op=300&FREE_TEXT=python
>* http://careers.stackoverflow.com/jobs/tag/python
>* http://www.pythonjobs.com/
>* http://www.djangojobs.org/

--
Wes Turner


On Mon, Jul 14, 2014 at 1:24 PM, Ethan Furman  wrote:
> On 07/14/2014 10:43 AM, Skip Montanaro wrote:
>
>> On Mon, Jul 14, 2014 at 11:59 AM, Brett Cannon wrote:
>>>
>>>
>>> This is the wrong place to ask about this. It falls under the purview of
>>> the
>>> web site who you can email at webmaster@ or submit an issue at
>>> https://github.com/python/pythondotorg . But I know from PSF status
>>> reports
>>> that it's being actively rewritten and fixed to make it manageable for
>>> more
>>> than one person to run easily.
>>
>>
>> Agree with that. I originally skipped this post because I'm pretty
>> sure MAL (who is heavily involved with the rewrite effort) still hangs
>> out here. I will modify Brett's admonition a bit though. A better
>> place to comment about the job board (and perhaps volunteer to help
>> with the current effort) is jobs at python.org.
>
>
> Mostly just hoping to raise awareness in case anybody here is able/willing
> to pitch in.
>
> --
> ~Ethan~
>

From benhoyt at gmail.com  Tue Jul 15 04:48:41 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Mon, 14 Jul 2014 22:48:41 -0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: <87a98cpjxf.fsf@gmail.com>
References: 
 
 
 <87a98cpjxf.fsf@gmail.com>
Message-ID: 

> Let's not multiply entities beyond necessity.
>
> There is well-defined *follow_symlinks* parameter
> https://docs.python.org/3/library/os.html#follow-symlinks
> e.g., os.access, os.chown, os.link, os.stat, os.utime and many other
> functions in os module support follow_symlinks parameter, see
> os.supports_follow_symlinks.

Huh, interesting. I didn't know os.stat() had a follow_symlinks
parameter -- when False, it's equivalent to lstat. If DirEntry has a
.stat(follow_symlinks=True) method, we don't actually need lstat().
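
A small sketch of the equivalence described here, assuming a platform where
os.symlink() is available (on Windows it may need extra privileges). The
file and link names are made up for the illustration:

```python
import os
import tempfile

# Build a throwaway directory holding one regular file and one symlink to it.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target.txt")
    link = os.path.join(tmp, "link")
    with open(target, "w") as f:
        f.write("hello")
    os.symlink(target, link)

    followed = os.stat(link)                              # describes target.txt
    not_followed = os.stat(link, follow_symlinks=False)   # describes the link itself
    via_lstat = os.lstat(link)

    # follow_symlinks=False gives the same answer as lstat().
    same_as_lstat = (not_followed.st_ino == via_lstat.st_ino
                     and not_followed.st_mode == via_lstat.st_mode)
    # The default (follow_symlinks=True) resolves to the target.
    follows_to_target = followed.st_ino == os.stat(target).st_ino
```

Both flags end up True, which is why a single stat(follow_symlinks=...)
method can stand in for a separate lstat().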

> os.walk is an exception that uses *followlinks*. It might be because it
> is an old function e.g., newer os.fwalk uses follow_symlinks.

Yes, I'm sure that's correct. Today it'd be called follow_symlinks,
but obviously one can't change os.walk() anymore.

> Only *recursive* functions such as os.walk, os.fwalk do not follow
> symlinks by default, to avoid symlink loops. [...]
>
> follow_symlinks=True as default for DirEntry.is_dir method allows to
> avoid easy-to-introduce bugs while replacing old
> os.listdir/os.path.isdir code or writing a new code using the same
> mental model.

I think these are good points, especially that of porting existing
listdir()/os.path.isdir() code and avoiding bugs. As I mentioned, I
was really on the fence about the link-following thing, but if it's a
tiny bit harder to implement but it avoids bugs (and I already had a
bug with this when implementing os.walk), that's a worthwhile
trade-off.
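
To make the porting argument concrete, here is a rough sketch comparing the
old listdir()/isdir() pattern with os.scandir() as it was eventually released
(Python 3.5; the context-manager form needs 3.6+). The helper names and the
directory layout are invented for the example:

```python
import os
import tempfile


def subdirs_listdir(path):
    """Old pattern: one extra stat() call per entry via os.path.isdir()."""
    return sorted(name for name in os.listdir(path)
                  if os.path.isdir(os.path.join(path, name)))


def subdirs_scandir(path):
    """Same result with os.scandir(); is_dir() follows symlinks by default."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "real"))
    open(os.path.join(tmp, "file.txt"), "w").close()
    # A symlink to a directory counts as a directory under both patterns.
    os.symlink(os.path.join(tmp, "real"), os.path.join(tmp, "linked"))

    old = subdirs_listdir(tmp)
    new = subdirs_scandir(tmp)
```

Because is_dir() follows symlinks by default, a mechanical port keeps
reporting "linked" as a directory, exactly as os.path.isdir() did.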

In light of that, I propose I update the PEP to basically follow
Victor's model of is_X() and stat() following symlinks by default, and
allowing you to specify follow_symlinks=False if you want something
other than that.

Victor had one other question:

> What happens to name and full_name with followlinks=True?
> Do they contain the name in the directory (name of the symlink)
> or name of the linked file?

I would say they should contain the name and full path of the entry --
the symlink, NOT the linked file. They kind of have to, right,
otherwise they'd have to be method calls that potentially call the
system.

In any case, here's the modified proposal:

scandir(path='.') -> generator of DirEntry objects, which have:

* name: name as per listdir()
* full_name: full path name (not necessarily absolute), equivalent of
os.path.join(path, entry.name)
* is_dir(follow_symlinks=True): like os.path.isdir(entry.full_name),
but free in most cases; cached per entry
* is_file(follow_symlinks=True): like os.path.isfile(entry.full_name),
but free in most cases; cached per entry
* is_symlink(): like os.path.islink(), but free in most cases; cached per entry
* stat(follow_symlinks=True): like os.stat(entry.full_name,
follow_symlinks=follow_symlinks); cached per entry

The above may not be quite perfect, but it's good, and I think there's
been enough bike-shedding on the API. :-)

So please speak now or forever hold your peace. :-) I intend to update
the PEP to reflect this and make a few other clarifications in the
next few days.

-Ben

From ethan at stoneleaf.us  Tue Jul 15 04:57:30 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 14 Jul 2014 19:57:30 -0700
Subject: [Python-Dev] Python Job Board
In-Reply-To: 
References: <53C40256.3020101@stoneleaf.us>
 
 
 <53C42077.9070408@stoneleaf.us>
 
Message-ID: <53C4989A.7040203@stoneleaf.us>

On 07/14/2014 06:01 PM, Wes Turner wrote:
>  From http://www.reddit.com/r/Python/comments/17c69p/i_was_told_by_a_friend_that_learning_python_for/c84bswd
> :
>
>> * http://www.python.org/community/jobs/
>> * https://jobs.github.com/positions?description=python
>> * http://careers.joelonsoftware.com/jobs?searchTerm=python
>> * http://www.linkedin.com/jsearch?keywords=python
>> * http://www.indeed.com/q-Python-jobs.html
>> * http://www.simplyhired.com/a/jobs/list/q-python
>> * http://seeker.dice.com/jobsearch/servlet/JobSearch?op=300&FREE_TEXT=python
>> * http://careers.stackoverflow.com/jobs/tag/python
>> * http://www.pythonjobs.com/
>> * http://www.djangojobs.org/

Nice, thanks!

--
~Ethan~

From ethan at stoneleaf.us  Tue Jul 15 05:00:51 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 14 Jul 2014 20:00:51 -0700
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
 <87a98cpjxf.fsf@gmail.com>
 
Message-ID: <53C49963.30509@stoneleaf.us>

On 07/14/2014 07:48 PM, Ben Hoyt wrote:
>
> In any case, here's the modified proposal:
>
> scandir(path='.') -> generator of DirEntry objects, which have:
>
> * name: name as per listdir()
> * full_name: full path name (not necessarily absolute), equivalent of
> os.path.join(path, entry.name)
> * is_dir(follow_symlinks=True): like os.path.isdir(entry.full_name),
> but free in most cases; cached per entry
> * is_file(follow_symlinks=True): like os.path.isfile(entry.full_name),
> but free in most cases; cached per entry
> * is_symlink(): like os.path.islink(), but free in most cases; cached per entry
> * stat(follow_symlinks=True): like os.stat(entry.full_name,
> follow_symlinks=follow_symlinks); cached per entry
>
> The above may not be quite perfect, but it's good, and I think there's
> been enough bike-shedding on the API. :-)

Looks doable.  Just make sure the cached entries reflect the 'follow_symlinks' setting -- so a symlink could end up with 
both an lstat cached entry and a stat cached entry.

--
~Ethan~

From victor.stinner at gmail.com  Tue Jul 15 08:25:52 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 15 Jul 2014 08:25:52 +0200
Subject: [Python-Dev]  Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
 <87a98cpjxf.fsf@gmail.com>
 
Message-ID: 

On Tuesday, 15 July 2014, Ben Hoyt wrote:
>
>
> Victor had one other question:
>
> > What happens to name and full_name with followlinks=True?
> > Do they contain the name in the directory (name of the symlink)
> > or name of the linked file?
>
> I would say they should contain the name and full path of the entry --
> the symlink, NOT the linked file. They kind of have to, right,
> otherwise they'd have to be method calls that potentially call the
> system.
>

Sorry, I don't remember who but someone proposed to add the follow_symlinks
parameter in scandir()  directly. If the parameter is added to methods,
there is no such issue.

I like the compromise of adding an optional follow_symlinks to is_xxx() and
stat() method. No need for .lstat().

Again: remove any guarantee about the cache in the definitions of methods,
instead copy the doc from os.path and os. Add a global remark saying that
most methods don't need any syscall in general, except for symlinks (with
follow_symlinks=True).

Victor

From ncoghlan at gmail.com  Tue Jul 15 13:09:12 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 15 Jul 2014 06:09:12 -0500
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
In-Reply-To: 
References: 
 
 
 <53C3F575.9010602@v.loewis.de>
 
Message-ID: 

On 14 Jul 2014 11:41, "Brett Cannon"  wrote:
>
>
> I agree for PEP 3121, which is the initialization/finalization work. The
> stable ABI is not necessary. So maybe we should re-examine the patches and
> accept the bits that clean up init/finalization and leave out any
> ABI-related changes.

Martin's right about improving the subinterpreter support - every type
declaration we move from a static struct to the dynamic type creation API
is one that isn't shared between subinterpreters any more.

That argument is potentially valid even for *builtin* modules and types,
not just those in extension modules.

Cheers,
Nick.

From ncoghlan at gmail.com  Tue Jul 15 13:24:14 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 15 Jul 2014 06:24:14 -0500
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
 <87a98cpjxf.fsf@gmail.com>
 
Message-ID: 

On 14 Jul 2014 22:50, "Ben Hoyt"  wrote:
>
> In light of that, I propose I update the PEP to basically follow
> Victor's model of is_X() and stat() following symlinks by default, and
> allowing you to specify follow_symlinks=False if you want something
> other than that.
>
> Victor had one other question:
>
> > What happens to name and full_name with followlinks=True?
> > Do they contain the name in the directory (name of the symlink)
> > or name of the linked file?
>
> I would say they should contain the name and full path of the entry --
> the symlink, NOT the linked file. They kind of have to, right,
> otherwise they'd have to be method calls that potentially call the
> system.

It would be worth explicitly pointing out "os.readlink(entry.full_name)" in
the docs as the way to get the target of a symlink entry.

Alternatively, it may be worth including a readlink() method directly on
the entry objects. (That can easily be added later though, so no need for
it in the initial proposal).
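
As an illustration of the readlink() suggestion, here is a minimal sketch
using os.scandir() as later released, where the attribute proposed in this
thread as full_name is spelled path. The file names are hypothetical, and
os.symlink() is assumed to be available:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.txt")
    open(target, "w").close()
    os.symlink(target, os.path.join(tmp, "alias"))

    targets = {}
    with os.scandir(tmp) as entries:
        for entry in entries:
            if entry.is_symlink():
                # entry.name and entry.path describe the link itself;
                # the target has to be asked for explicitly.
                targets[entry.name] = os.readlink(entry.path)
```

The entry for "alias" keeps the link's own name, and os.readlink() returns
the path that was stored in the link.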

>
> In any case, here's the modified proposal:
>
> scandir(path='.') -> generator of DirEntry objects, which have:
>
> * name: name as per listdir()
> * full_name: full path name (not necessarily absolute), equivalent of
> os.path.join(path, entry.name)
> * is_dir(follow_symlinks=True): like os.path.isdir(entry.full_name),
> but free in most cases; cached per entry
> * is_file(follow_symlinks=True): like os.path.isfile(entry.full_name),
> but free in most cases; cached per entry
> * is_symlink(): like os.path.islink(), but free in most cases; cached per entry
> * stat(follow_symlinks=True): like os.stat(entry.full_name,
> follow_symlinks=follow_symlinks); cached per entry
>
> The above may not be quite perfect, but it's good, and I think there's
> been enough bike-shedding on the API. :-)

+1, sounds good to me (and I like having the caching guarantees listed -
helps make it clear how DirEntry differs from pathlib.Path)

Cheers,
Nick.

>
> So please speak now or forever hold your peace. :-) I intend to update
> the PEP to reflect this and make a few other clarifications in the
> next few days.
>
> -Ben

From benhoyt at gmail.com  Tue Jul 15 14:01:16 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 15 Jul 2014 08:01:16 -0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: <53C49963.30509@stoneleaf.us>
References: 
 
 
 <87a98cpjxf.fsf@gmail.com>
 
 <53C49963.30509@stoneleaf.us>
Message-ID: 

> Looks doable.  Just make sure the cached entries reflect the
> 'follow_symlinks' setting -- so a symlink could end up with both an lstat
> cached entry and a stat cached entry.

Yes, good point -- basically the functions will use the _stat cache if
follow_symlinks=True, otherwise the _lstat cache. If the entry is not
a symlink (the usual case), they'll be the same value.
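
One way the two caches could be arranged is sketched below. This is only a
pure-Python illustration under assumed names (DirEntrySketch, _stat, _lstat);
it is not the scandir implementation:

```python
import os
import tempfile
from stat import S_ISLNK


class DirEntrySketch:
    """Hypothetical stand-in for the proposed DirEntry, with two stat caches."""

    def __init__(self, directory, name):
        self.name = name
        self.full_name = os.path.join(directory, name)
        self._stat = None    # cached result for follow_symlinks=True
        self._lstat = None   # cached result for follow_symlinks=False

    def stat(self, *, follow_symlinks=True):
        if self._lstat is None:
            self._lstat = os.lstat(self.full_name)
        if not follow_symlinks:
            return self._lstat
        if self._stat is None:
            if S_ISLNK(self._lstat.st_mode):
                # Only symlinks need a second system call.
                self._stat = os.stat(self.full_name)
            else:
                # Not a symlink: both caches hold the same value.
                self._stat = self._lstat
        return self._stat


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "plain.txt"), "w").close()
    os.symlink(os.path.join(tmp, "plain.txt"), os.path.join(tmp, "link"))

    plain = DirEntrySketch(tmp, "plain.txt")
    link = DirEntrySketch(tmp, "link")

    plain_shares_cache = plain.stat() is plain.stat(follow_symlinks=False)
    link_has_two_results = link.stat() is not link.stat(follow_symlinks=False)
    link_result_is_cached = link.stat() is link.stat()
```

For the regular file both calls return the very same object, while the
symlink ends up with one cached lstat result and one cached stat result.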

-Ben

From benhoyt at gmail.com  Tue Jul 15 14:05:55 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 15 Jul 2014 08:05:55 -0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
 <87a98cpjxf.fsf@gmail.com>
 
 
Message-ID: 

> Sorry, I don't remember who but someone proposed to add the follow_symlinks
> parameter in scandir()  directly. If the parameter is added to methods,
> there is no such issue.

Yeah, I think having the DirEntry methods do different things
depending on how scandir() was called is a really bad idea. It seems
you're agreeing with this?

> Again: remove any guarantee about the cache in the definitions of methods,
> instead copy the doc from os.path and os. Add a global remark saying that
> most methods don't need any syscall in general, except for symlinks (with
> follow_symlinks=True).

I'm not sure I follow this -- surely it *has* to be documented that
the values of DirEntry.is_X() and DirEntry.stat() are cached per
entry, in contrast to os.path.isX()/os.stat()?

I don't mind a global remark about not needing syscalls, but I do
think it makes sense to make it explicit -- that is_X() almost never
need syscalls, whereas stat() does only on POSIX but is free on
Windows (except for symlinks).

-Ben

From benhoyt at gmail.com  Tue Jul 15 14:19:35 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 15 Jul 2014 08:19:35 -0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: <87zjgbni64.fsf@gmail.com>
References: 
 
 
 <87a98cpjxf.fsf@gmail.com>
 
 <87zjgbni64.fsf@gmail.com>
Message-ID: 

> I'd *keep DirEntry.lstat() method* regardless of existence of
> .stat(*, follow_symlinks=True) method (despite the slight violation of
> DRY principle) for readability. `dir_entry.lstat().st_mode` is more
> consice than `dir_entry.stat(follow_symlinks=False).st_mode` and the
> meaning of lstat is well-established -- get (symbolic link) status [2].

The meaning of lstat() is well-established, so I don't mind this. But
I don't think it's necessary, either. My thought would be that in new
code/functions we should kind of prescribe best-practices rather than
leave the options open. Yes, it's a few more characters, but
"follow_symlinks=False" is a lot clearer than "l" to describe this
behaviour, especially for non-Linux hackers.

> I suggest *renaming .full_name -> .path* due to reasons outlined in [1].
>
> [1]: https://mail.python.org/pipermail/python-dev/2014-July/135441.html

Hmmm, perhaps. You suggest .full_name implies it's the absolute path,
which isn't true. I don't mind .path, but it kind of sounds like "the
Path object associated with this entry". I think "full_name" is fine
-- it's not "abs_name".

> follow_symlinks (if added) should be *keyword-only parameter* because
> `dir_entry.is_dir(False)` is unreadable (it is not clear at a glance
> what `False` means in this case).

Agreed follow_symlinks should be a keyword-only parameter (as it is in
os.stat() in Python 3).
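
A short sketch of what keyword-only means in practice. The is_dir() function
below is a hypothetical free-standing helper, not the DirEntry method; the
bare * in its signature is the only point of the example:

```python
import os
import stat


def is_dir(path, *, follow_symlinks=True):
    """Hypothetical helper; the bare * makes follow_symlinks keyword-only."""
    try:
        mode = os.stat(path, follow_symlinks=follow_symlinks).st_mode
    except FileNotFoundError:
        return False
    return stat.S_ISDIR(mode)


# Readable: the meaning of False is spelled out at the call site.
ok = is_dir(".", follow_symlinks=False)

# Unreadable, and rejected: a bare positional False raises TypeError.
try:
    is_dir(".", False)
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```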

> Exceptions are part of the public API. pathlib is inconsistent with
> os.path here e.g., os.path.isdir() ignores all OS errors raised by
> the stat() call but the corresponding pathlib call ignores only broken
> symlinks (non-existent entries).
>
> The cherry-picking of which stat errors to silence (implicitly) seems
> worse than either silencing the errors (like os.path.isdir does) or
> allowing them to propagate.

Hmmm, you're right there's a subtle difference here. I think the
os.path.isdir() behaviour could mask real errors, and the pathlib
behaviour is more correct. pathlib's behaviour is not implicit though
-- it's clearly documented in the docs:
https://docs.python.org/3/library/pathlib.html#pathlib.Path.is_dir

> Returning False instead of raising OSError in is_dir() method simplifies
> the usage greatly without (much) negative consequences. It is a *rare*
> case when silencing errors could be more practical.

I think is_X() *should* fail if there are permissions errors or other
fatal errors. Whether or not they should fail if the file doesn't
exist (unlikely to happen anyway) or on a broken symlink is a
different question, but there's a good precedent with the existing
os/pathlib functions there.
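
The "missing entry" and "broken symlink" cases can be checked directly. The
sketch below only covers those two cases, where os.path and pathlib agree;
permission errors are not demonstrated because they are hard to trigger
portably. Path names are invented, and os.symlink() is assumed to work:

```python
import os
import stat
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "no-such-entry")
    broken = os.path.join(tmp, "broken-link")
    os.symlink(missing, broken)          # link whose target does not exist

    # The is-a-directory helpers answer False rather than raising.
    isdir_missing = os.path.isdir(missing)
    isdir_broken = os.path.isdir(broken)
    path_missing = Path(missing).is_dir()
    path_broken = Path(broken).is_dir()

    # The underlying stat() call does raise for the same path.
    try:
        os.stat(missing)
    except FileNotFoundError:
        stat_raised = True
    else:
        stat_raised = False

    # The broken link itself still exists and can be inspected with lstat().
    broken_is_link = stat.S_ISLNK(os.lstat(broken).st_mode)
```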

-Ben

From p.f.moore at gmail.com  Tue Jul 15 14:31:16 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 15 Jul 2014 13:31:16 +0100
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
 <87a98cpjxf.fsf@gmail.com>
 
 <87zjgbni64.fsf@gmail.com>
 
Message-ID: 

On 15 July 2014 13:19, Ben Hoyt  wrote:
> Hmmm, perhaps. You suggest .full_name implies it's the absolute path,
> which isn't true. I don't mind .path, but it kind of sounds like "the
> Path object associated with this entry". I think "full_name" is fine
> -- it's not "abs_name".

Interesting. I hadn't really thought about it, but I might have
assumed full_name was absolute. However, now I see that it's "only as
absolute as the directory argument to scandir is". Having said that, I
don't think that full_name *implies* that, just that it's a possible
mistake people could make. I agree that "path" could be seen as
implying a Path object.

My preference would be to retain the name full_name, but just make it
explicit in the documentation that it is based on the directory name
argument.

Paul

From ethan at stoneleaf.us  Tue Jul 15 18:41:40 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 15 Jul 2014 09:41:40 -0700
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
 <87a98cpjxf.fsf@gmail.com>
 
 
Message-ID: <53C559C4.20708@stoneleaf.us>

On 07/14/2014 11:25 PM, Victor Stinner wrote:
>
> Again: remove any guarantee about the cache in the definitions of methods,
> instead copy the doc from os.path and os. Add a global remark saying that
>  most methods don't need any syscall in general, except for symlinks (with
>  follow_symlinks=True).

I don't understand what you're saying here.  The fact that DirEntry.is_xxx will use cached values *must* be documented,
or our users will waste huge amounts of time trying to figure out why an unknowingly cached value is no longer matching 
the current status.

~Ethan~

From rowen at uw.edu  Wed Jul 16 01:48:48 2014
From: rowen at uw.edu (Russell E. Owen)
Date: Tue, 15 Jul 2014 16:48:48 -0700
Subject: [Python-Dev] Another case for frozendict
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
 
Message-ID: 

In article 
,
 Chris Angelico  wrote:

> On Mon, Jul 14, 2014 at 12:04 AM, Jason R. Coombs  wrote:
> > I can achieve what I need by constructing a set on the 'items' of the dict.
> >
> >>>> set(tuple(doc.items()) for doc in res)
> >
> > {(('n', 1), ('err', None), ('ok', 1.0))}
> 
> This is flawed; the tuple-of-tuples depends on iteration order, which
> may vary. It should be a frozenset of those tuples, not a tuple. Which
> strengthens your case; it's that easy to get it wrong in the absence
> of an actual frozendict.
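
The ordering problem described in the quoted text can be reproduced with two
dicts that compare equal but were built in a different order. This sketch
assumes a Python version where dicts preserve insertion order (3.7+); the
sample data is taken from the quoted output:

```python
a = {"n": 1, "err": None, "ok": 1.0}
b = {"ok": 1.0, "n": 1, "err": None}     # same content, different order

dicts_equal = a == b

# tuple(items) bakes the iteration order into the key, so the two
# equal dicts produce two distinct set members.
as_tuples = {tuple(d.items()) for d in (a, b)}

# frozenset(items) ignores order, so the duplicates collapse.
as_frozensets = {frozenset(d.items()) for d in (a, b)}
```

Here len(as_tuples) is 2 while len(as_frozensets) is 1, even though the two
dicts are equal. Note that the frozenset approach still requires every value
to be hashable.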

I would love to see frozendict in python.

I find myself using dicts for translation tables, usually tables that 
should not be modified. Documentation usually suffices to get that idea 
across, but it's not ideal.

frozendict would also be handy as a default value for function 
arguments. In that case documentation isn't enough and one has to resort 
to using a default value of None and then changing it in the function 
body.
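
For readers unfamiliar with the workaround mentioned here, this is a minimal
sketch of the problem and of the None-default idiom. The function names are
made up for the illustration:

```python
def count_bad(word, seen={}):
    """Mutable default: the same dict object is reused on every call."""
    seen[word] = seen.get(word, 0) + 1
    return seen


def count_good(word, seen=None):
    """None default: a fresh dict is created inside the function body."""
    if seen is None:
        seen = {}
    seen[word] = seen.get(word, 0) + 1
    return seen


first_bad = dict(count_bad("a"))     # snapshot after the first call
second_bad = dict(count_bad("b"))    # state has leaked from the first call

first_good = count_good("a")
second_good = count_good("b")        # independent of the first call
```

An immutable mapping as the default would make the first form safe, since
the function could not modify it by accident.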

I like frozendict because I feel it is expressive and adds some safety. 

-- Russell


From python at mrabarnett.plus.com  Wed Jul 16 04:27:23 2014
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 16 Jul 2014 03:27:23 +0100
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: 
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
 
 
Message-ID: <53C5E30B.6060509@mrabarnett.plus.com>

On 2014-07-16 00:48, Russell E. Owen wrote:
> In article
> ,
>   Chris Angelico  wrote:
>
>> On Mon, Jul 14, 2014 at 12:04 AM, Jason R. Coombs  wrote:
>> > I can achieve what I need by constructing a set on the 'items' of the dict.
>> >
>> >>>> set(tuple(doc.items()) for doc in res)
>> >
>> > {(('n', 1), ('err', None), ('ok', 1.0))}
>>
>> This is flawed; the tuple-of-tuples depends on iteration order, which
>> may vary. It should be a frozenset of those tuples, not a tuple. Which
>> strengthens your case; it's that easy to get it wrong in the absence
>> of an actual frozendict.
>
> I would love to see frozendict in python.
>
> I find myself using dicts for translation tables, usually tables that
> should not be modified. Documentation usually suffices to get that idea
> across, but it's not ideal.
>
> frozendict would also be handy as a default value for function
> arguments. In that case documentation isn't enough and one has to resort
> to using a default value of None and then changing it in the function
> body.
>
> I like frozendict because I feel it is expressive and adds some safety.
>
Here's another use-case.

Using the 're' module:

 >>> import re
 >>> # Make a regex.
... p = re.compile(r'(?P<first>\w+)\s+(?P<second>\w+)')
 >>>
 >>> # What are the named groups?
... p.groupindex
{'first': 1, 'second': 2}
 >>>
 >>> # Perform a match.
... m = p.match('FIRST SECOND')
 >>> m.groupdict()
{'first': 'FIRST', 'second': 'SECOND'}
 >>>
 >>> # Try modifying the pattern object.
... p.groupindex['JUNK'] = 'foobar'
 >>>
 >>> # What are the named groups now?
... p.groupindex
{'first': 1, 'second': 2, 'JUNK': 'foobar'}
 >>>
 >>> # And the match object?
... m.groupdict()
Traceback (most recent call last):
   File "<stdin>", line 2, in <module>
IndexError: no such group

It can't find a named group called 'JUNK'.

And with a bit more tinkering it's possible to crash Python. (I'll
leave that as an exercise for the reader! :-))

The 'regex' module, on the other hand, rebuilds the dict each time:

 >>> import regex
 >>> # Make a regex.
... p = regex.compile(r'(?P<first>\w+)\s+(?P<second>\w+)')
 >>>
 >>> # What are the named groups?
... p.groupindex
{'second': 2, 'first': 1}
 >>>
 >>> # Perform a match.
... m = p.match('FIRST SECOND')
 >>> m.groupdict()
{'second': 'SECOND', 'first': 'FIRST'}
 >>>
 >>> # Try modifying the regex.
... p.groupindex['JUNK'] = 'foobar'
 >>>
 >>> # What are the named groups now?
... p.groupindex
{'second': 2, 'first': 1}
 >>>
 >>> # And the match object?
... m.groupdict()
{'second': 'SECOND', 'first': 'FIRST'}

Using a frozendict instead would be a nicer solution.


From cs at zip.com.au  Wed Jul 16 05:40:00 2014
From: cs at zip.com.au (Cameron Simpson)
Date: Wed, 16 Jul 2014 13:40:00 +1000
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
Message-ID: <20140716034000.GA41444@cskk.homeip.net>

I was going to stay out of this one...

On 14Jul2014 10:25, Victor Stinner  wrote:
>2014-07-14 4:17 GMT+02:00 Nick Coghlan :
>> Or the ever popular symlink to "." (or a directory higher in the tree).
>
>"." and ".." are explicitly ignored by os.listdir() and os.scandir().
>
>> I think os.walk() is a good source of inspiration here: call the flag
>> "followlink" and default it to False.

I also think followslinks should be spelt like os.walk, and also default to 
False.

>IMO the specific function os.walk() is not a good example. It includes
>symlinks to directories in the dirs list and then it does not follow
>symlink,

I agree that is a bad mix.

>it is a recursive function and has a followlinks optional
>parameter (default: False).

Which I think is desirable.

>Moreover, in 92% of cases, functions using os.listdir() and
>os.path.isdir() *follow* symlinks:
>https://mail.python.org/pipermail/python-dev/2014-July/135435.html

Sigh.

This is a historic artifact, a convenience, and a side effect of bringing symlinks 
into UNIX in the first place.

The objective was that symlinks should largely be transparent to users for 
naive operation. So the UNIX calls open/cd/listdir all follow symlinks so that 
things work transparently and a million C programs do not break. 

However, so do chmod/chgrp/chown, for the same reasons and with generally less 
desirable effects.

Conversely, the find command, for example, does not follow symlinks and this is 
generally a good thing. "ls" is the same. Like os.walk, they are for inspecting 
stuff, and shouldn't indirect unless asked.

I think following symlinks, especially for something like os.walk and 
os.scandir, should default to False. I DO NOT want to quietly wander to remote 
parts of the file space because someone has stuck a symlink somewhere 
unfortunate, lurking like a little bomb (or perhaps trapdoor, waiting to suck 
me down into an unexpected dark place).

It is also slower to follow symlinks by default.

I am also against flag parameters that default to True, on the whole; they are 
a failure of ergonomic design. Leaving off a flag should usually be like 
setting it to False. A missing flag is an "off" flag.

For these reasons (and others I have not yet thought through:-) I am voting for 
a:

   followlinks=False

optional parameter.

If you want to follow links, it is hardly difficult.

Cheers,
Cameron Simpson 

Our job is to make the questions so painful that the only way to make the
pain go away is by thinking.    - Fred Friendly

From rdmurray at bitdance.com  Wed Jul 16 15:37:55 2014
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 16 Jul 2014 09:37:55 -0400
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: <53C5E30B.6060509@mrabarnett.plus.com>
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
 
 
 <53C5E30B.6060509@mrabarnett.plus.com>
Message-ID: <20140716133755.C0A61250DEF@webabinitio.net>

On Wed, 16 Jul 2014 03:27:23 +0100, MRAB  wrote:
> Here's another use-case.
> 
> Using the 're' module:
> 
>  >>> import re
>  >>> # Make a regex.
> ... p = re.compile(r'(?P<first>\w+)\s+(?P<second>\w+)')
>  >>>
>  >>> # What are the named groups?
> ... p.groupindex
> {'first': 1, 'second': 2}
>  >>>
>  >>> # Perform a match.
> ... m = p.match('FIRST SECOND')
>  >>> m.groupdict()
> {'first': 'FIRST', 'second': 'SECOND'}
>  >>>
>  >>> # Try modifying the pattern object.
> ... p.groupindex['JUNK'] = 'foobar'
>  >>>
>  >>> # What are the named groups now?
> ... p.groupindex
> {'first': 1, 'second': 2, 'JUNK': 'foobar'}
>  >>>
>  >>> # And the match object?
> ... m.groupdict()
> Traceback (most recent call last):
>    File "<stdin>", line 2, in <module>
> IndexError: no such group
> 
> It can't find a named group called 'JUNK'.

IMO, preventing someone from shooting themselves in the foot by modifying
something they shouldn't modify according to the API is not a Python
use case ("consenting adults").

> And with a bit more tinkering it's possible to crash Python. (I'll
> leave that as an exercise for the reader! :-))

Preventing a Python program from being able to crash the interpreter,
that's a use case :)

--David

From rdmurray at bitdance.com  Wed Jul 16 15:47:59 2014
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 16 Jul 2014 09:47:59 -0400
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: <53C5E30B.6060509@mrabarnett.plus.com>
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
 
 
 <53C5E30B.6060509@mrabarnett.plus.com>
Message-ID: <20140716134802.9ED8DB14086@webabinitio.net>

On Wed, 16 Jul 2014 03:27:23 +0100, MRAB  wrote:
>  >>> # Try modifying the pattern object.
> ... p.groupindex['JUNK'] = 'foobar'
>  >>>
>  >>> # What are the named groups now?
> ... p.groupindex
> {'first': 1, 'second': 2, 'JUNK': 'foobar'}
>  >>>
>  >>> # And the match object?
> ... m.groupdict()
> Traceback (most recent call last):
>    File "<stdin>", line 2, in <module>
> IndexError: no such group
> 
> It can't find a named group called 'JUNK'.

After I hit send on my previous message, I thought more about your
example.  One of the issues here is that modifying the dict breaks an
invariant of the API.  I have a similar situation in the email module,
and I used the same solution you did in regex: always return a new dict.
It would be nice to be able to return a frozendict instead of having the
overhead of building a new dict on each call.  That by itself might not
be enough reason.  But, if the user wants to use the data in modified form
elsewhere, they would then have to construct a new regular dict out of it,
making the decision to vary the data from what matches the state of the
object it came from an explicit one.  That seems to fit the Python zen
("explicit is better than implicit").

So I'm changing my mind, and do consider this a valid use case, even
absent the crash.

--David

From rdmurray at bitdance.com  Wed Jul 16 16:24:45 2014
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 16 Jul 2014 10:24:45 -0400
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: <20140716140429.GA14503@k2>
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
 
 
 <53C5E30B.6060509@mrabarnett.plus.com>
 <20140716134802.9ED8DB14086@webabinitio.net> <20140716140429.GA14503@k2>
Message-ID: <20140716142445.7F4BB250D0C@webabinitio.net>

On Wed, 16 Jul 2014 14:04:29 -0000, dw+python-dev at hmmz.org wrote:
> On Wed, Jul 16, 2014 at 09:47:59AM -0400, R. David Murray wrote:
> 
> > It would be nice to be able to return a frozendict instead of having the
> > overhead of building a new dict on each call.
> 
> There already is an in-between available both to Python and C:
> PyDictProxy_New() / types.MappingProxyType. It's a one line change in
> each case to return a temporary intermediary, using something like (C):
>     Py_INCREF(self->dict);
>     return self->dict;
> 
> To
>     return PyDictProxy_New(self->dict);
> 
> Or Python:
>     return self.dct
> 
> To
>     return types.MappingProxyType(self.dct)
> 
> Which is cheaper than a copy, and avoids having to audit every use of
> self->dict to ensure the semantics required for a "frozendict" are
> respected, i.e. no mutation occurs after the dict becomes visible to the
> user, and potentially has __hash__ called.
> 
> 
> > That by itself might not be enough reason.  But, if the user wants to
> > use the data in modified form elsewhere, they would then have to
> > construct a new regular dict out of it, making the decision to vary
> > the data from what matches the state of the object it came from an
> > explicit one.  That seems to fit the Python zen ("explicit is better
> > than implicit").
> > 
> > So I'm changing my mind, and do consider this a valid use case, even
> > absent the crash.
> 
> Avoiding crashes seems a better use for a read-only proxy, rather than a
> hashable immutable type.

Good point.  MappingProxyType wasn't yet exposed when I wrote that email
code.

--David

From ericsnowcurrently at gmail.com  Wed Jul 16 16:27:51 2014
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 16 Jul 2014 08:27:51 -0600
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: <20140716134802.9ED8DB14086@webabinitio.net>
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
 
 
 <53C5E30B.6060509@mrabarnett.plus.com>
 <20140716134802.9ED8DB14086@webabinitio.net>
Message-ID: 

On Wed, Jul 16, 2014 at 7:47 AM, R. David Murray  wrote:
> After I hit send on my previous message, I thought more about your
> example.  One of the issues here is that modifying the dict breaks an
> invariant of the API.  I have a similar situation in the email module,
> and I used the same solution you did in regex: always return a new dict.
> It would be nice to be able to return a frozendict instead of having the
> overhead of building a new dict on each call.  That by itself might not
> be enough reason.  But, if the user wants to use the data in modified form
> elsewhere, they would then have to construct a new regular dict out of it,
> making the decision to vary the data from what matches the state of the
> object it came from an explicit one.  That seems to fit the Python zen
> ("explicit is better than implicit").
>
> So I'm changing my mind, and do consider this a valid use case, even
> absent the crash.

+1

A simple implementation is pretty straightforward:

class FrozenDict(Mapping):
    def __init__(self, *args, **kwargs):
        self._map = dict(*args, **kwargs)
        self._hash = ...
    def __hash__(self):
        return self._hash
    def __len__(self):
        return len(self._map)
    def __iter__(self):
        yield from self._map
    def __getitem__(self, key):
        return self._map[key]

This is actually something I've used before on a number of occasions.
Having it in the stdlib would be nice (though that alone is not
sufficient for inclusion :)).  If there is a valid use case for a
frozen dict type in other stdlib modules, I'd consider that a pretty
good justification for adding it.
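
For anyone who wants to run the class above, here is a self-contained
version. The hash computation is only one possible stand-in for the elided
"..." (a frozenset of the items, which assumes every value is hashable);
the original may well have used something else.

```python
from collections.abc import Mapping


class FrozenDict(Mapping):
    """Read-only, hashable mapping (illustrative sketch only)."""

    def __init__(self, *args, **kwargs):
        self._map = dict(*args, **kwargs)
        # Assumption: an order-independent hash over the items.
        # Raises TypeError here if any value is unhashable.
        self._hash = hash(frozenset(self._map.items()))

    def __hash__(self):
        return self._hash

    def __len__(self):
        return len(self._map)

    def __iter__(self):
        return iter(self._map)

    def __getitem__(self, key):
        return self._map[key]


fd = FrozenDict(a=1, b=2)
cache = {fd: "cached"}                      # usable as a dict key
lookup = cache[FrozenDict(b=2, a=1)]        # equal contents, equal hash

try:
    fd["c"] = 3                             # Mapping defines no __setitem__
    assignment_rejected = False
except TypeError:
    assignment_rejected = True
```

Equality comes for free from the Mapping ABC, which compares the items of
both mappings, so two instances with the same contents are equal and hash
alike.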

Incidentally, collections.abc.Mapping is the only one of the 6
container ABCs that does not have a concrete implementation (not
counting types.MappingProxyType which is only a proxy).

-eric

From andreas.r.maier at gmx.de  Wed Jul 16 13:39:55 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Wed, 16 Jul 2014 13:39:55 +0200
Subject: [Python-Dev] == on object tests identity in 3.x - list delegation
	to members?
In-Reply-To: <20140713162249.GP5705@ando>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de>
 <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de>
 <20140713162249.GP5705@ando>
Message-ID: <53C6648B.5000404@gmx.de>

On 13.07.2014 18:23, Steven D'Aprano wrote:
> On Sun, Jul 13, 2014 at 05:13:20PM +0200, Andreas Maier wrote:
>
>> Second, if not by delegation to equality of its elements, how would the
>> equality of sequences be defined otherwise?
>
> Wow. I'm impressed by the amount of detailed effort you've put into
> investigating this. (Too much detail to absorb, I'm afraid.) But perhaps
> you might have just asked on the python-list at python.org mailing list, or
> here, where we would have told you the answer:
>
>      list __eq__ first checks element identity before going on
>      to check element equality.

I apologize for not asking. It seems I was looking at the trees 
(behaviors of specific cases) without seeing the wood (identity goes first).

> If you can read C, you might like to check the list source code:
>
> http://hg.python.org/cpython/file/22e5a85ba840/Objects/listobject.c

I can read (and write) C fluently, but (1) I don't have a build 
environment on my Windows system so I cannot debug it, and (2) I find it 
hard to judge from just looking at the C code which C function is 
invoked when the Python code enters the C code.
(Quoting Raymond H. from his blog: "Unless you know where to look, 
searching the source for an answer can be a time consuming intellectual 
investment.")

So thanks for clarifying this.

I guess I am arriving (slowly and still partly reluctantly, and I'm not 
alone with that feeling, it seems ...) at the bottom line of all this, 
which is that reflexivity is an important goal in Python, that 
self-written non-reflexive classes are not intended nor well supported, 
and that the non-reflexive NaN is considered an exception that cannot be 
expected to be treated consistently non-reflexive.
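
A short sketch of what this means in practice (nothing assumed beyond the
builtin float and list types): the identity shortcut in list comparison and
containment hides NaN's non-reflexivity when the very same object is
involved, but not when two distinct NaN objects are compared.

```python
nan = float("nan")

# NaN itself is not reflexive under ==.
nan_equals_itself = (nan == nan)

# Same object on both sides: the identity check succeeds before == is tried.
same_object_lists_equal = ([nan] == [nan])
nan_in_own_list = (nan in [nan])

# Two distinct NaN objects: identity fails, == fails, so the lists differ.
distinct_nan_lists_equal = ([float("nan")] == [float("nan")])

print(nan_equals_itself, same_object_lists_equal,
      nan_in_own_list, distinct_nan_lists_equal)
```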

> This was discussed to death some time ago, both on python-dev and
> python-ideas. If you're interested, you can start here:
>
> https://mail.python.org/pipermail/python-list/2012-October/633992.html
>
> which is in the middle of one of the threads, but at least it gets you
> to the right time period.

I read a number of posts in that thread by now. Sorry for not reading it 
earlier, but the mailing list archive just does not lend itself to 
searching the past. Of course, one can google it ;-)

Andy

From andreas.r.maier at gmx.de  Wed Jul 16 13:40:03 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Wed, 16 Jul 2014 13:40:03 +0200
Subject: [Python-Dev] == on object tests identity in 3.x - list delegation
	to members?
In-Reply-To: <87ion1owhk.fsf@gmail.com>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de>
 <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de>
 <20140713162249.GP5705@ando>
 
 
 <87ion1owhk.fsf@gmail.com>
Message-ID: <53C66493.5040904@gmx.de>

On 13.07.2014 22:05, Akira Li wrote:
> Nick Coghlan  writes:
> ...
>> definition of floats and the definition of container invariants like
>> "assert x in [x]")
>>
>> The current approach means that the lack of reflexivity of NaN's stays
>> confined to floats and similar types - it doesn't leak out and infect
>> the behaviour of the container types.
>>
>> What we've never figured out is a good place to *document* it. I
>> thought there was an open bug for that, but I can't find it right now.
>
> There was a related issue "Tuple comparisons with NaNs are broken"
> http://bugs.python.org/issue21873
> but it was closed as "not a bug" even though the corresponding behavior is
> *not documented* anywhere.

I currently know about these two issues related to fixing the docs:

http://bugs.python.org/11945 - about NaN values in containers
http://bugs.python.org/12067 - comparisons

I am working on the latter, currently. The patch only targets the 
comparisons chapter in the Language Reference, there is another 
comparisons chapter in the Library Reference, and one in the Tutorial.

I will need to update the patch to issue 12067 as a result of this 
discussion.

Andy


From dw+python-dev at hmmz.org  Wed Jul 16 16:04:29 2014
From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org)
Date: Wed, 16 Jul 2014 14:04:29 +0000
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: <20140716134802.9ED8DB14086@webabinitio.net>
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
 
 
 <53C5E30B.6060509@mrabarnett.plus.com>
 <20140716134802.9ED8DB14086@webabinitio.net>
Message-ID: <20140716140429.GA14503@k2>

On Wed, Jul 16, 2014 at 09:47:59AM -0400, R. David Murray wrote:

> It would be nice to be able to return a frozendict instead of having the
> overhead of building a new dict on each call.

There already is an in-between available both to Python and C:
PyDictProxy_New() / types.MappingProxyType. It's a one line change in
each case to return a temporary intermediary, using something like (C):
    Py_INCREF(self->dict);
    return self->dict;

To
    return PyDictProxy_New(self->dict);

Or Python:
    return self.dct

To
    return types.MappingProxyType(self.dct)

Which is cheaper than a copy, and avoids having to audit every use of
self->dict to ensure the semantics required for a "frozendict" are
respected, i.e. no mutation occurs after the dict becomes visible to the
user, and potentially has __hash__ called.
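
To make the proxy behaviour concrete, here is a minimal Python sketch. The
Holder class and its attribute names are made up for illustration; only
types.MappingProxyType is the real stdlib API. It shows that the proxy
rejects writes from the caller while still reflecting changes the owner
makes to the underlying dict.

```python
import types


class Holder:
    """Hypothetical owner of a dict that callers must not modify."""

    def __init__(self, data):
        self.dct = dict(data)

    @property
    def mapping(self):
        # Wrap rather than copy: the proxy is a read-only view of self.dct.
        return types.MappingProxyType(self.dct)


holder = Holder({"a": 1})
view = holder.mapping

try:
    view["b"] = 2          # callers cannot mutate through the proxy
    write_rejected = False
except TypeError:
    write_rejected = True

holder.dct["b"] = 2        # the owner still can, and the view sees it
live_value = view["b"]
```

The view is live rather than frozen, which is why it protects against
accidental mutation by the caller but is not a substitute for a hashable
immutable mapping.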


> That by itself might not be enough reason.  But, if the user wants to
> use the data in modified form elsewhere, they would then have to
> construct a new regular dict out of it, making the decision to vary
> the data from what matches the state of the object it came from an
> explicit one.  That seems to fit the Python zen ("explicit is better
> than implicit").
> 
> So I'm changing my mind, and do consider this a valid use case, even
> absent the crash.

Avoiding crashes seems a better use for a read-only proxy, rather than a
hashable immutable type.


David

From andreas.r.maier at gmx.de  Wed Jul 16 17:24:16 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Wed, 16 Jul 2014 17:24:16 +0200
Subject: [Python-Dev] == on object tests identity in 3.x - uploaded patch v9
In-Reply-To: <53C66493.5040904@gmx.de>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de>
 <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de>
 <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de>
 <20140713162249.GP5705@ando>
 
 
 <87ion1owhk.fsf@gmail.com> <53C66493.5040904@gmx.de>
Message-ID: <53C69920.3050808@gmx.de>

On 16.07.2014 13:40, Andreas Maier wrote:
> On 13.07.2014 22:05, Akira Li wrote:
>> Nick Coghlan  writes:
>> ...
>>
>> There was a related issue "Tuple comparisons with NaNs are broken"
>> http://bugs.python.org/issue21873
>> but it was closed as "not a bug" even though the corresponding behavior is
>> *not documented* anywhere.
>
> I currently know about these two issues related to fixing the docs:
>
> http://bugs.python.org/11945 - about NaN values in containers
> http://bugs.python.org/12067 - comparisons
>
> I am working on the latter, currently. The patch only targets the
> comparisons chapter in the Language Reference, there is another
> comparisons chapter in the Library Reference, and one in the Tutorial.
>
> I will need to update the patch to issue 12067 as a result of this
> discussion.

I have uploaded v9 of the patch to issue 12067; it should address the 
recent discussion (plus Mark's review comment on the issue itself).

Please review.

Andy


From jeanpierreda at gmail.com  Wed Jul 16 19:10:07 2014
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Wed, 16 Jul 2014 10:10:07 -0700
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: <20140716133755.C0A61250DEF@webabinitio.net>
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
 
 
 <53C5E30B.6060509@mrabarnett.plus.com>
 <20140716133755.C0A61250DEF@webabinitio.net>
Message-ID: 

On Wed, Jul 16, 2014 at 6:37 AM, R. David Murray  wrote:
> IMO, preventing someone from shooting themselves in the foot by modifying
> something they shouldn't modify according to the API is not a Python
> use case ("consenting adults").

Then why have immutable objects at all? Why do you have to put tuples
and frozensets inside sets, instead of lists and sets? Compare with
Java, which really is "consenting adults" here -- you can add a
mutable object to a set, just don't mutate it, or you might not be
able to find it in the set again.

Several people seem to act as if the Pythonic way is to not allow for
any sort of immutable types at all. ISTM people are trying to
retroactively claim some standard of Pythonicity that never existed.
Python can and does protect you from shooting yourself in the foot by
making objects immutable. Or do you have another explanation for the
proliferation of immutable types, and the inability to add mutable
types to sets and dicts?
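
A small sketch of both halves of that point. The Point class is
hypothetical and deliberately badly behaved: its hash depends on mutable
state, which is the situation Java permits for its collections.

```python
class Point:
    """Hypothetical mutable class whose hash depends on mutable state."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x

    def __hash__(self):
        return hash(self.x)


# Builtin mutable containers simply refuse to be hashed.
try:
    {[1, 2]}
    list_rejected = False
except TypeError:
    list_rejected = True

# A user-defined class can opt in to hashing, hazard included.
p = Point(1)
points = {p}
found_before = p in points   # looked up under the hash it was stored with
p.x = 2                      # mutate after insertion: the hash changes
found_after = p in points    # looked up under the new hash, so it is missed
```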

Using a frozendict to protect and enforce an invariant in the re
module is entirely reasonable. So is creating a new dict each time.
The intermediate -- reusing a mutable dict and failing in
incomprehensible ways if you mutate it, and potentially even crashing
due to memory safety issues -- is not Pythonic at all.

-- Devin

From rdmurray at bitdance.com  Wed Jul 16 19:17:11 2014
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 16 Jul 2014 13:17:11 -0400
Subject: [Python-Dev] Another case for frozendict
In-Reply-To: 
References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com>
 
 
 <53C5E30B.6060509@mrabarnett.plus.com>
 <20140716133755.C0A61250DEF@webabinitio.net>
 
Message-ID: <20140716171712.1A9B4250DF6@webabinitio.net>

On Wed, 16 Jul 2014 10:10:07 -0700, Devin Jeanpierre  wrote:
> On Wed, Jul 16, 2014 at 6:37 AM, R. David Murray  wrote:
> > IMO, preventing someone from shooting themselves in the foot by modifying
> > something they shouldn't modify according to the API is not a Python
> > use case ("consenting adults").
> 
> Then why have immutable objects at all? Why do you have to put tuples
> and frozensets inside sets, instead of lists and sets? Compare with
> Java, which really is "consenting adults" here -- you can add a
> mutable object to a set, just don't mutate it, or you might not be
> able to find it in the set again.
> 
> Several people seem to act as if the Pythonic way is to not allow for
> any sort of immutable types at all. ISTM people are trying to
> retroactively claim some standard of Pythonicity that never existed.
> Python can and does protect you from shooting yourself in the foot by
> making objects immutable. Or do you have another explanation for the
> proliferation of immutable types, and the inability to add mutable
> types to sets and dicts?
> 
> Using a frozendict to protect and enforce an invariant in the re
> module is entirely reasonable. So is creating a new dict each time.
> The intermediate -- reusing a mutable dict and failing in
> incomprehensible ways if you mutate it, and potentially even crashing
> due to memory safety issues -- is not Pythonic at all.

You'll note I ended up agreeing with you there: when mutation breaks an
invariant of the object it came from, that's an issue.  Which would be
the case if you could use mutable objects as keys.

--David

From kmike84 at gmail.com  Wed Jul 16 23:44:23 2014
From: kmike84 at gmail.com (Mikhail Korobov)
Date: Thu, 17 Jul 2014 03:44:23 +0600
Subject: [Python-Dev] cStringIO vs io.BytesIO
Message-ID: 

Hi,

cStringIO was removed from Python 3. It seems the suggested replacement is
io.BytesIO. But there is a problem: cStringIO.StringIO(b'data') didn't copy
the data while io.BytesIO(b'data') makes a copy (even if the data is not
modified later).

This means io.BytesIO is not suited well to cases when you want to get a
readonly file-like interface for existing byte strings. Isn't it one of the
main io.BytesIO use cases? Wrapping bytes in cStringIO.StringIO used to be
almost free, but this is not true for io.BytesIO.

So making code 3.x compatible by ditching cStringIO can cause serious
performance/memory regressions. One can change the code to build the data
using BytesIO (without creating bytes objects in the first place), but that
is not always possible or convenient.

I believe this problem affects tornado (
https://github.com/tornadoweb/tornado/issues/1110), Scrapy (this is how I
became aware of this issue), NLTK (anecdotal evidence - I tried to port
some hairy NLTK module to io.BytesIO, it became many times slower) and
maybe pretty much every IO-related project ported to Python 3.x (django,
werkzeug and frameworks based on it, requests - they all wrap user data in
BytesIO, and this may cause slowdowns and up to 2x memory usage in
Python 3.x).

Do you know if there is a workaround? Maybe there is some stdlib part that I'm
missing, or a module on PyPI? It is not that hard to write one's own wrapper
that won't do copies (or to port [c]StringIO to 3.x), but I wonder if there
is an existing solution or plans to fix it in Python itself - this BytesIO
use case looks quite important.

From dw+python-dev at hmmz.org  Thu Jul 17 01:07:54 2014
From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org)
Date: Wed, 16 Jul 2014 23:07:54 +0000
Subject: [Python-Dev] cStringIO vs io.BytesIO
In-Reply-To: 
References: 
Message-ID: <20140716230754.GA22619@k2>

On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote:

> So making code 3.x compatible by ditching cStringIO can cause serious
> performance/memory regressions. One can change the code to build the data
> using BytesIO (without creating bytes objects in the first place), but that is
> not always possible or convenient.
> 
> I believe this problem affects tornado (https://github.com/tornadoweb/tornado/
> Do you know if there is a workaround? Maybe there is some stdlib part that I'm
> missing, or a module on PyPI? It is not that hard to write one's own wrapper that
> won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an
> existing solution or plans to fix it in Python itself - this BytesIO use case
> looks quite important.

Regarding a fix, the problem seems mostly that the StringI/StringO
specializations were removed, and the new implementation is basically
just a StringO.

At a small cost to memory, it is easy to add a Py_buffer source and
flags variable to the bytesio struct, with the buffers initially set up
for reading, and if a mutation method is called, check for a
copy-on-write flag, duplicate the source object into private memory,
then continue operating as it does now.
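
The idea can be modelled in pure Python, which may be easier to follow than
the C. This is only a sketch of the copy-on-write state machine described
above, not the patch itself: CowBytesIO and all of its members are
hypothetical names, and it assumes the initial value is a bytes-like object
with one-byte items.

```python
class CowBytesIO:
    """Hypothetical copy-on-write byte stream; not the real io.BytesIO."""

    def __init__(self, initial=b""):
        # Borrow the caller's buffer through a memoryview: no copy is made.
        self._view = memoryview(initial)
        self._private = None  # becomes a bytearray once we own a copy
        self._pos = 0

    @property
    def shared(self):
        """True while we are still reading the caller's buffer directly."""
        return self._private is None

    def _buffer(self):
        return self._view if self._private is None else self._private

    def _unshare(self):
        # First mutation: duplicate the borrowed bytes, then drop the borrow.
        if self._private is None:
            self._private = bytearray(self._view)
            self._view.release()
            self._view = None

    def read(self, size=-1):
        buf = self._buffer()
        start = min(self._pos, len(buf))
        if size is None or size < 0:
            end = len(buf)
        else:
            end = min(len(buf), start + size)
        self._pos = max(self._pos, end)
        return bytes(buf[start:end])

    def write(self, data):
        self._unshare()
        if self._pos > len(self._private):
            # Zero-fill any gap left by seeking past the end.
            self._private.extend(bytes(self._pos - len(self._private)))
        end = self._pos + len(data)
        self._private[self._pos:end] = data
        self._pos = end
        return len(data)

    def seek(self, pos):
        self._pos = pos
        return pos

    def getvalue(self):
        return bytes(self._buffer())


source = b"hello world"
stream = CowBytesIO(source)
head = stream.read(5)             # served straight from `source`
shared_after_read = stream.shared
stream.write(b"_")                # triggers the one-off private copy
shared_after_write = stream.shared
result = stream.getvalue()
```

Construction and reads never copy the source; the cost of duplicating it is
paid once, and only by streams that are actually written to.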

Attached is a (rough) patch implementing this, feel free to try it with
hg tip.

    [23:03:44 k2!124 cpython] cat i.py
    import io
    buf = b'x' * (1048576 * 16)
    def x():
        io.BytesIO(buf)

    [23:03:51 k2!125 cpython] ./python -m timeit  -s 'import i' 'i.x()'
    100 loops, best of 3: 2.9 msec per loop

    [23:03:57 k2!126 cpython] ./python-cow -m timeit  -s 'import i' 'i.x()'
    1000000 loops, best of 3: 0.364 usec per loop


David



diff --git a/Modules/_io/bytesio.c b/Modules/_io/bytesio.c
--- a/Modules/_io/bytesio.c
+++ b/Modules/_io/bytesio.c
@@ -2,6 +2,12 @@
 #include "structmember.h"       /* for offsetof() */
 #include "_iomodule.h"
 
+enum io_flags {
+    /* initvalue describes a borrowed buffer we cannot modify and must later
+     * release */
+    IO_SHARED = 1
+};
+
 typedef struct {
     PyObject_HEAD
     char *buf;
@@ -11,6 +17,10 @@
     PyObject *dict;
     PyObject *weakreflist;
     Py_ssize_t exports;
+    Py_buffer initvalue;
+    /* If IO_SHARED, indicates PyBuffer_release(initvalue) required, and that
+     * we don't own buf. */
+    enum io_flags flags;
 } bytesio;
 
 typedef struct {
@@ -33,6 +43,47 @@
         return NULL; \
     }
 
+/* Unshare our buffer in preparation for writing, in the case that an
+ * initvalue object was provided, and we're currently borrowing its buffer.
+ * size indicates the total reserved buffer size allocated as part of
+ * unsharing, to avoid a potentially redundant allocation in the subsequent
+ * mutation.
+ */
+static int
+unshare(bytesio *self, size_t size)
+{
+    Py_ssize_t new_size = size;
+    Py_ssize_t copy_size = size;
+    char *new_buf;
+
+    /* Do nothing if buffer wasn't shared */
+    if (! (self->flags & IO_SHARED)) {
+        return 0;
+    }
+
+    /* If hint provided, adjust our new buffer size and truncate the amount of
+     * source buffer we copy as necessary. */
+    if (size > copy_size) {
+        copy_size = size;
+    }
+
+    /* Allocate or fail. */
+    new_buf = (char *)PyMem_Malloc(new_size);
+    if (new_buf == NULL) {
+        PyErr_NoMemory();
+        return -1;
+    }
+
+    /* Copy the (possibly now truncated) source string to the new buffer, and
+     * forget any reference used to keep the source buffer alive. */
+    memcpy(new_buf, self->buf, copy_size);
+    PyBuffer_Release(&self->initvalue);
+    self->flags &= ~IO_SHARED;
+    self->buf = new_buf;
+    self->buf_size = new_size;
+    self->string_size = (Py_ssize_t) copy_size;
+    return 0;
+}
 
 /* Internal routine to get a line from the buffer of a BytesIO
    object. Returns the length between the current position to the
@@ -125,11 +176,18 @@
 static Py_ssize_t
 write_bytes(bytesio *self, const char *bytes, Py_ssize_t len)
 {
+    size_t desired;
+
     assert(self->buf != NULL);
     assert(self->pos >= 0);
     assert(len >= 0);
 
-    if ((size_t)self->pos + len > self->buf_size) {
+    desired = (size_t)self->pos + len;
+    if (unshare(self, desired)) {
+        return -1;
+    }
+
+    if (desired > self->buf_size) {
         if (resize_buffer(self, (size_t)self->pos + len) < 0)
             return -1;
     }
@@ -502,6 +560,10 @@
         return NULL;
     }
 
+    if (unshare(self, size)) {
+        return NULL;
+    }
+
     if (size < self->string_size) {
         self->string_size = size;
         if (resize_buffer(self, size) < 0)
@@ -655,10 +717,13 @@
 static PyObject *
 bytesio_close(bytesio *self)
 {
-    if (self->buf != NULL) {
+    if (self->flags & IO_SHARED) {
+        PyBuffer_Release(&self->initvalue);
+        self->flags &= ~IO_SHARED;
+    } else if (self->buf != NULL) {
         PyMem_Free(self->buf);
-        self->buf = NULL;
     }
+    self->buf = NULL;
     Py_RETURN_NONE;
 }
 
@@ -788,10 +853,17 @@
                         "deallocated BytesIO object has exported buffers");
         PyErr_Print();
     }
-    if (self->buf != NULL) {
+
+    if (self->flags & IO_SHARED) {
+        /* We borrowed buf from another object */
+        PyBuffer_Release(&self->initvalue);
+        self->flags &= ~IO_SHARED;
+    } else if (self->buf != NULL) {
+        /* We owned buf */
         PyMem_Free(self->buf);
-        self->buf = NULL;
     }
+    self->buf = NULL;
+
     Py_CLEAR(self->dict);
     if (self->weakreflist != NULL)
         PyObject_ClearWeakRefs((PyObject *) self);
@@ -811,12 +883,6 @@
     /* tp_alloc initializes all the fields to zero. So we don't have to
        initialize them here. */
 
-    self->buf = (char *)PyMem_Malloc(0);
-    if (self->buf == NULL) {
-        Py_DECREF(self);
-        return PyErr_NoMemory();
-    }
-
     return (PyObject *)self;
 }
 
@@ -834,13 +900,32 @@
     self->string_size = 0;
     self->pos = 0;
 
+    /* Release any previous initvalue. */
+    if (self->flags & IO_SHARED) {
+        PyBuffer_Release(&self->initvalue);
+        self->buf = NULL;
+        self->flags &= ~IO_SHARED;
+    }
+
     if (initvalue && initvalue != Py_None) {
-        PyObject *res;
-        res = bytesio_write(self, initvalue);
-        if (res == NULL)
+        Py_buffer *buf = &self->initvalue;
+        if (PyObject_GetBuffer(initvalue, buf, PyBUF_CONTIG_RO) < 0) {
             return -1;
-        Py_DECREF(res);
-        self->pos = 0;
+        }
+        self->buf = self->initvalue.buf;
+        self->buf_size = (size_t)self->initvalue.len;
+        self->string_size = self->initvalue.len;
+        self->flags |= IO_SHARED;
+    }
+
+    /* If no initvalue provided, prepare a private buffer now. */
+    if (self->buf == NULL) {
+        self->buf = (char *)PyMem_Malloc(0);
+        if (self->buf == NULL) {
+            Py_DECREF(self);
+            PyErr_NoMemory();
+            return -1;
+        }
     }
 
     return 0;

From dw+python-dev at hmmz.org  Thu Jul 17 02:18:21 2014
From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org)
Date: Thu, 17 Jul 2014 00:18:21 +0000
Subject: [Python-Dev] cStringIO vs io.BytesIO
In-Reply-To: <20140716230754.GA22619@k2>
References: 
 <20140716230754.GA22619@k2>
Message-ID: <20140717001821.GA25779@k2>

It's worth noting that a natural extension of this is to do something very
similar on the write side: instead of generating a temporary private
heap allocation, generate (and freely resize) a private PyBytes object
until it is exposed to the user, at which point _getvalue() returns it
and converts it into an IO_SHARED buffer.

That way another copy is avoided in the common case of building a
string, calling getvalue() once, then discarding the IO object.


David

On Wed, Jul 16, 2014 at 11:07:54PM +0000, dw+python-dev at hmmz.org wrote:
> On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote:
> 
> > So making code 3.x compatible by ditching cStringIO can cause serious
> > performance/memory regressions. One can change the code to build the data
> > using BytesIO (without creating bytes objects in the first place), but that is
> > not always possible or convenient.
> > 
> > I believe this problem affects tornado (https://github.com/tornadoweb/tornado/
> > Do you know if there is a workaround? Maybe there is some stdlib part that I'm
> > missing, or a module on PyPI? It is not that hard to write one's own wrapper that
> > won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an
> > existing solution or plans to fix it in Python itself - this BytesIO use case
> > looks quite important.
> 
> Regarding a fix, the problem seems mostly that the StringI/StringO
> specializations were removed, and the new implementation is basically
> just a StringO.
> 
> At a small cost to memory, it is easy to add a Py_buffer source and
> flags variable to the bytesio struct, with the buffers initially set up
> for reading, and if a mutation method is called, check for a
> copy-on-write flag, duplicate the source object into private memory,
> then continue operating as it does now.
> 
> Attached is a (rough) patch implementing this, feel free to try it with
> hg tip.
> 
>     [23:03:44 k2!124 cpython] cat i.py
>     import io
>     buf = b'x' * (1048576 * 16)
>     def x():
>         io.BytesIO(buf)
> 
>     [23:03:51 k2!125 cpython] ./python -m timeit  -s 'import i' 'i.x()'
>     100 loops, best of 3: 2.9 msec per loop
> 
>     [23:03:57 k2!126 cpython] ./python-cow -m timeit  -s 'import i' 'i.x()'
>     1000000 loops, best of 3: 0.364 usec per loop
> 
> 
> David
> 
> 
> 
> diff --git a/Modules/_io/bytesio.c b/Modules/_io/bytesio.c
> --- a/Modules/_io/bytesio.c
> +++ b/Modules/_io/bytesio.c
> @@ -2,6 +2,12 @@
>  #include "structmember.h"       /* for offsetof() */
>  #include "_iomodule.h"
>  
> +enum io_flags {
> +    /* initvalue describes a borrowed buffer we cannot modify and must later
> +     * release */
> +    IO_SHARED = 1
> +};
> +
>  typedef struct {
>      PyObject_HEAD
>      char *buf;
> @@ -11,6 +17,10 @@
>      PyObject *dict;
>      PyObject *weakreflist;
>      Py_ssize_t exports;
> +    Py_buffer initvalue;
> +    /* If IO_SHARED, indicates PyBuffer_release(initvalue) required, and that
> +     * we don't own buf. */
> +    enum io_flags flags;
>  } bytesio;
>  
>  typedef struct {
> @@ -33,6 +43,47 @@
>          return NULL; \
>      }
>  
> +/* Unshare our buffer in preparation for writing, in the case that an
> + * initvalue object was provided, and we're currently borrowing its buffer.
> + * size indicates the total reserved buffer size allocated as part of
> + * unsharing, to avoid a potentially redundant allocation in the subsequent
> + * mutation.
> + */
> +static int
> +unshare(bytesio *self, size_t size)
> +{
> +    Py_ssize_t new_size = size;
> +    Py_ssize_t copy_size = size;
> +    char *new_buf;
> +
> +    /* Do nothing if buffer wasn't shared */
> +    if (! (self->flags & IO_SHARED)) {
> +        return 0;
> +    }
> +
> +    /* If hint provided, adjust our new buffer size and truncate the amount of
> +     * source buffer we copy as necessary. */
> +    if (size > copy_size) {
> +        copy_size = size;
> +    }
> +
> +    /* Allocate or fail. */
> +    new_buf = (char *)PyMem_Malloc(new_size);
> +    if (new_buf == NULL) {
> +        PyErr_NoMemory();
> +        return -1;
> +    }
> +
> +    /* Copy the (possibly now truncated) source string to the new buffer, and
> +     * forget any reference used to keep the source buffer alive. */
> +    memcpy(new_buf, self->buf, copy_size);
> +    PyBuffer_Release(&self->initvalue);
> +    self->flags &= ~IO_SHARED;
> +    self->buf = new_buf;
> +    self->buf_size = new_size;
> +    self->string_size = (Py_ssize_t) copy_size;
> +    return 0;
> +}
>  
>  /* Internal routine to get a line from the buffer of a BytesIO
>     object. Returns the length between the current position to the
> @@ -125,11 +176,18 @@
>  static Py_ssize_t
>  write_bytes(bytesio *self, const char *bytes, Py_ssize_t len)
>  {
> +    size_t desired;
> +
>      assert(self->buf != NULL);
>      assert(self->pos >= 0);
>      assert(len >= 0);
>  
> -    if ((size_t)self->pos + len > self->buf_size) {
> +    desired = (size_t)self->pos + len;
> +    if (unshare(self, desired)) {
> +        return -1;
> +    }
> +
> +    if (desired > self->buf_size) {
>          if (resize_buffer(self, (size_t)self->pos + len) < 0)
>              return -1;
>      }
> @@ -502,6 +560,10 @@
>          return NULL;
>      }
>  
> +    if (unshare(self, size)) {
> +        return NULL;
> +    }
> +
>      if (size < self->string_size) {
>          self->string_size = size;
>          if (resize_buffer(self, size) < 0)
> @@ -655,10 +717,13 @@
>  static PyObject *
>  bytesio_close(bytesio *self)
>  {
> -    if (self->buf != NULL) {
> +    if (self->flags & IO_SHARED) {
> +        PyBuffer_Release(&self->initvalue);
> +        self->flags &= ~IO_SHARED;
> +    } else if (self->buf != NULL) {
>          PyMem_Free(self->buf);
> -        self->buf = NULL;
>      }
> +    self->buf = NULL;
>      Py_RETURN_NONE;
>  }
>  
> @@ -788,10 +853,17 @@
>                          "deallocated BytesIO object has exported buffers");
>          PyErr_Print();
>      }
> -    if (self->buf != NULL) {
> +
> +    if (self->flags & IO_SHARED) {
> +        /* We borrowed buf from another object */
> +        PyBuffer_Release(&self->initvalue);
> +        self->flags &= ~IO_SHARED;
> +    } else if (self->buf != NULL) {
> +        /* We owned buf */
>          PyMem_Free(self->buf);
> -        self->buf = NULL;
>      }
> +    self->buf = NULL;
> +
>      Py_CLEAR(self->dict);
>      if (self->weakreflist != NULL)
>          PyObject_ClearWeakRefs((PyObject *) self);
> @@ -811,12 +883,6 @@
>      /* tp_alloc initializes all the fields to zero. So we don't have to
>         initialize them here. */
>  
> -    self->buf = (char *)PyMem_Malloc(0);
> -    if (self->buf == NULL) {
> -        Py_DECREF(self);
> -        return PyErr_NoMemory();
> -    }
> -
>      return (PyObject *)self;
>  }
>  
> @@ -834,13 +900,32 @@
>      self->string_size = 0;
>      self->pos = 0;
>  
> +    /* Release any previous initvalue. */
> +    if (self->flags & IO_SHARED) {
> +        PyBuffer_Release(&self->initvalue);
> +        self->buf = NULL;
> +        self->flags &= ~IO_SHARED;
> +    }
> +
>      if (initvalue && initvalue != Py_None) {
> -        PyObject *res;
> -        res = bytesio_write(self, initvalue);
> -        if (res == NULL)
> +        Py_buffer *buf = &self->initvalue;
> +        if (PyObject_GetBuffer(initvalue, buf, PyBUF_CONTIG_RO) < 0) {
>              return -1;
> -        Py_DECREF(res);
> -        self->pos = 0;
> +        }
> +        self->buf = self->initvalue.buf;
> +        self->buf_size = (size_t)self->initvalue.len;
> +        self->string_size = self->initvalue.len;
> +        self->flags |= IO_SHARED;
> +    }
> +
> +    /* If no initvalue provided, prepare a private buffer now. */
> +    if (self->buf == NULL) {
> +        self->buf = (char *)PyMem_Malloc(0);
> +        if (self->buf == NULL) {
> +            Py_DECREF(self);
> +            PyErr_NoMemory();
> +            return -1;
> +        }
>      }
>  
>      return 0;

From ncoghlan at gmail.com  Thu Jul 17 03:28:16 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 16 Jul 2014 21:28:16 -0400
Subject: [Python-Dev] cStringIO vs io.BytesIO
In-Reply-To: <20140716230754.GA22619@k2>
References: 
 <20140716230754.GA22619@k2>
Message-ID: 

On 16 Jul 2014 20:00,  wrote:
> On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote:
> > I believe this problem affects tornado (https://github.com/tornadoweb/tornado/
> > Do you know if there a workaround? Maybe there is some stdlib part that I'm
> > missing, or a module on PyPI? It is not that hard to write an own wrapper that
> > won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an
> > existing solution or plans to fix it in Python itself - this BytesIO use case
> > looks quite important.
>
> Regarding a fix, the problem seems mostly that the StringI/StringO
> specializations were removed, and the new implementation is basically
> just a StringO.

Right, I don't think there's a major philosophy change here, just a missing
optimisation that could be restored in 3.5.
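
Until then, one possible shape for the "own wrapper that won't do
copies" mentioned above is a read-only stream over the existing buffer.
This is only a sketch under assumptions: ReadOnlyBytesReader is a
made-up name, not a stdlib class, and it relies on io.RawIOBase deriving
read() from readinto(). Each read still copies the requested chunk out;
only the up-front copy of the whole initial value is avoided.

```python
import io


class ReadOnlyBytesReader(io.RawIOBase):
    """Hypothetical read-only stream over an existing bytes-like object.

    The initial value is wrapped in a memoryview instead of being copied
    into a private buffer.
    """

    def __init__(self, data):
        super().__init__()
        self._view = memoryview(data)
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Copy at most len(b) bytes from the shared view into the caller's buffer.
        chunk = self._view[self._pos:self._pos + len(b)]
        n = len(chunk)
        b[:n] = chunk
        self._pos += n
        return n


reader = ReadOnlyBytesReader(b"hello world")
first = reader.read(5)   # read() is provided by RawIOBase on top of readinto()
rest = reader.read()     # reads until the end of the buffer
```

Wrapping such an object in io.BufferedReader would add readline() and
friends, at the cost of the buffered layer's own small internal buffer.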

Cheers,
Nick.

From antoine at python.org  Thu Jul 17 03:51:27 2014
From: antoine at python.org (Antoine Pitrou)
Date: Wed, 16 Jul 2014 21:51:27 -0400
Subject: [Python-Dev] cStringIO vs io.BytesIO
In-Reply-To: <20140716230754.GA22619@k2>
References: 
 <20140716230754.GA22619@k2>
Message-ID: 



Hi,

On 16/07/2014 19:07, dw+python-dev at hmmz.org wrote:
>
> Attached is a (rough) patch implementing this, feel free to try it with
> hg tip.

Thanks for your work. Please post any patch to http://bugs.python.org

Regards

Antoine.



From kmike84 at gmail.com  Thu Jul 17 20:24:17 2014
From: kmike84 at gmail.com (Mikhail Korobov)
Date: Fri, 18 Jul 2014 00:24:17 +0600
Subject: [Python-Dev] cStringIO vs io.BytesIO
In-Reply-To: 
References: 
 <20140716230754.GA22619@k2>
 
Message-ID: 

That was an impressively fast draft patch!



2014-07-17 7:28 GMT+06:00 Nick Coghlan :

>
> On 16 Jul 2014 20:00,  wrote:
> > On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote:
> > > I believe this problem affects tornado (https://github.com/tornadoweb/tornado/
> > > Do you know if there a workaround? Maybe there is some stdlib part that I'm
> > > missing, or a module on PyPI? It is not that hard to write an own wrapper that
> > > won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an
> > > existing solution or plans to fix it in Python itself - this BytesIO use case
> > > looks quite important.
> >
> > Regarding a fix, the problem seems mostly that the StringI/StringO
> > specializations were removed, and the new implementation is basically
> > just a StringO.
>
> Right, I don't think there's a major philosophy change here, just a
> missing optimisation that could be restored in 3.5.
>
> Cheers,
> Nick.
>

From status at bugs.python.org  Fri Jul 18 18:07:59 2014
From: status at bugs.python.org (Python tracker)
Date: Fri, 18 Jul 2014 18:07:59 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20140718160759.5064A56A70@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2014-07-11 - 2014-07-18)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    4589 ( +1)
  closed 29188 (+47)
  total  33777 (+48)

Open issues with patches: 2154 


Issues opened (36)
==================

#21044: tarfile does not handle file .name being an int
http://bugs.python.org/issue21044  reopened by zach.ware

#21946: 'python -u' yields trailing carriage return '\r'  (Python2 for
http://bugs.python.org/issue21946  reopened by haypo

#21950: import sqlite3 not running after configure --prefix=/alt/path;
http://bugs.python.org/issue21950  reopened by r.david.murray

#21958: Allow python 2.7 to compile with Visual Studio 2013
http://bugs.python.org/issue21958  opened by Zachary.Turner

#21960: Better path handling in Idle find in files
http://bugs.python.org/issue21960  opened by terry.reedy

#21961: Add What's New for Idle.
http://bugs.python.org/issue21961  opened by terry.reedy

#21962: No timeout for asyncio.Event.wait() or asyncio.Condition.wait(
http://bugs.python.org/issue21962  opened by ajaborsk

#21963: 2.7.8 backport of Issue1856 (avoid daemon thread problems at s
http://bugs.python.org/issue21963  opened by ned.deily

#21964: inconsistency in list-generator comprehension with yield(-from
http://bugs.python.org/issue21964  opened by hakril

#21965: Add support for Memory BIO to _ssl
http://bugs.python.org/issue21965  opened by geertj

#21967: Interpreter crash upon accessing frame.f_restricted of a frame
http://bugs.python.org/issue21967  opened by anselm.kruis

#21969: WindowsPath constructor does not check for invalid characters
http://bugs.python.org/issue21969  opened by Antony.Lee

#21970: Broken code for handling file://host in urllib.request.FileHan
http://bugs.python.org/issue21970  opened by vadmium

#21971: Index and update turtledemo doc.
http://bugs.python.org/issue21971  opened by terry.reedy

#21972: Bugs in the lexer and parser documentation
http://bugs.python.org/issue21972  opened by François-René.Rideau

#21973: Idle should not quit on corrupted user config files
http://bugs.python.org/issue21973  opened by Tomk

#21975: Using pickled/unpickled sqlite3.Row results in segfault rather
http://bugs.python.org/issue21975  opened by Elizacat

#21976: Fix test_ssl.py to handle LibreSSL versioning appropriately
http://bugs.python.org/issue21976  opened by worr

#21980: Implement `logging.LogRecord.__repr__`
http://bugs.python.org/issue21980  opened by cool-RR

#21983: segfault in ctypes.cast
http://bugs.python.org/issue21983  opened by Anthony.LaTorre

#21986: Pickleability of code objects is inconsistent
http://bugs.python.org/issue21986  opened by ppperry

#21987: TarFile.getmember on directory requires trailing slash iff ove
http://bugs.python.org/issue21987  opened by moloney

#21989: Missing (optional) argument `start` and `end` in documentation
http://bugs.python.org/issue21989  opened by SylvainDe

#21990: saxutils defines an inner class where a normal one would do
http://bugs.python.org/issue21990  opened by alex

#21991: The new email API should use MappingProxyType instead of retur
http://bugs.python.org/issue21991  opened by r.david.murray

#21992: New AST node Else() should be introduced
http://bugs.python.org/issue21992  opened by Igor.Bronshteyn

#21995: Idle: pseudofiles have no buffer attribute.
http://bugs.python.org/issue21995  opened by terry.reedy

#21996: gettarinfo method does not handle files without text string na
http://bugs.python.org/issue21996  opened by vadmium

#21997: Pdb.set_trace debugging does not end correctly in IDLE
http://bugs.python.org/issue21997  opened by ppperry

#21998: asyncio: a new self-pipe should be created in the child proces
http://bugs.python.org/issue21998  opened by haypo

#21999: shlex: bug in posix more handling of empty strings
http://bugs.python.org/issue21999  opened by isoschiz

#22000: cross type comparisons clarification
http://bugs.python.org/issue22000  opened by Jim.Jewett

#22001: containers "same" does not always mean "__eq__".
http://bugs.python.org/issue22001  opened by Jim.Jewett

#22002: Make full use of test discovery in test subpackages
http://bugs.python.org/issue22002  opened by zach.ware

#22003: BytesIO copy-on-write
http://bugs.python.org/issue22003  opened by dw

#22005: datetime.__setstate__ fails decoding python2 pickle
http://bugs.python.org/issue22005  opened by eddygeek



Most recent 15 issues with no replies (15)
==========================================

#22005: datetime.__setstate__ fails decoding python2 pickle
http://bugs.python.org/issue22005

#22000: cross type comparisons clarification
http://bugs.python.org/issue22000

#21999: shlex: bug in posix more handling of empty strings
http://bugs.python.org/issue21999

#21998: asyncio: a new self-pipe should be created in the child proces
http://bugs.python.org/issue21998

#21997: Pdb.set_trace debugging does not end correctly in IDLE
http://bugs.python.org/issue21997

#21996: gettarinfo method does not handle files without text string na
http://bugs.python.org/issue21996

#21995: Idle: pseudofiles have no buffer attribute.
http://bugs.python.org/issue21995

#21992: New AST node Else() should be introduced
http://bugs.python.org/issue21992

#21991: The new email API should use MappingProxyType instead of retur
http://bugs.python.org/issue21991

#21990: saxutils defines an inner class where a normal one would do
http://bugs.python.org/issue21990

#21989: Missing (optional) argument `start` and `end` in documentation
http://bugs.python.org/issue21989

#21971: Index and update turtledemo doc.
http://bugs.python.org/issue21971

#21967: Interpreter crash upon accessing frame.f_restricted of a frame
http://bugs.python.org/issue21967

#21965: Add support for Memory BIO to _ssl
http://bugs.python.org/issue21965

#21960: Better path handling in Idle find in files
http://bugs.python.org/issue21960



Most recent 15 issues waiting for review (15)
=============================================

#22003: BytesIO copy-on-write
http://bugs.python.org/issue22003

#22002: Make full use of test discovery in test subpackages
http://bugs.python.org/issue22002

#21999: shlex: bug in posix more handling of empty strings
http://bugs.python.org/issue21999

#21990: saxutils defines an inner class where a normal one would do
http://bugs.python.org/issue21990

#21989: Missing (optional) argument `start` and `end` in documentation
http://bugs.python.org/issue21989

#21986: Pickleability of code objects is inconsistent
http://bugs.python.org/issue21986

#21976: Fix test_ssl.py to handle LibreSSL versioning appropriately
http://bugs.python.org/issue21976

#21975: Using pickled/unpickled sqlite3.Row results in segfault rather
http://bugs.python.org/issue21975

#21965: Add support for Memory BIO to _ssl
http://bugs.python.org/issue21965

#21958: Allow python 2.7 to compile with Visual Studio 2013
http://bugs.python.org/issue21958

#21955: ceval.c: implement fast path for integers with a single digit
http://bugs.python.org/issue21955

#21951: tcl test change crashes AIX
http://bugs.python.org/issue21951

#21947: `Dis` module doesn't know how to disassemble generators
http://bugs.python.org/issue21947

#21944: Allow copying of CodecInfo objects
http://bugs.python.org/issue21944

#21941: Clean up turtle TPen class
http://bugs.python.org/issue21941



Top 10 most discussed issues (10)
=================================

#21645: asyncio: Race condition in signal handling on FreeBSD
http://bugs.python.org/issue21645  16 msgs

#15443: datetime module has no support for nanoseconds
http://bugs.python.org/issue15443  14 msgs

#21815: imaplib truncates some untagged responses
http://bugs.python.org/issue21815  14 msgs

#21935: Implement AUTH command in smtpd.
http://bugs.python.org/issue21935  11 msgs

#21955: ceval.c: implement fast path for integers with a single digit
http://bugs.python.org/issue21955  10 msgs

#21975: Using pickled/unpickled sqlite3.Row results in segfault rather
http://bugs.python.org/issue21975   9 msgs

#21986: Pickleability of code objects is inconsistent
http://bugs.python.org/issue21986   9 msgs

#21927: BOM appears in stdin when using Powershell
http://bugs.python.org/issue21927   8 msgs

#1598: unexpected response in imaplib
http://bugs.python.org/issue1598   7 msgs

#18320: python installation is broken if prefix is overridden on an in
http://bugs.python.org/issue18320   7 msgs



Issues closed (43)
==================

#8849: python.exe problem with cvxopt
http://bugs.python.org/issue8849  closed by r.david.murray

#9390: Error in sys.excepthook on windows when redirecting output of 
http://bugs.python.org/issue9390  closed by zach.ware

#14714: PEP 414 tokenizing hook does not preserve tabs
http://bugs.python.org/issue14714  closed by aronacher

#15962: Windows STDIN/STDOUT Redirection is actually FIXED
http://bugs.python.org/issue15962  closed by terry.reedy

#16178: atexit._run_exitfuncs should be a public API
http://bugs.python.org/issue16178  closed by rhettinger

#16237: bdist_rpm SPEC files created with distutils may be distro spec
http://bugs.python.org/issue16237  closed by ncoghlan

#16382: Better warnings exception for bad category
http://bugs.python.org/issue16382  closed by berker.peksag

#16859: tarfile.TarInfo.fromtarfile does not check read() return value
http://bugs.python.org/issue16859  closed by lars.gustaebel

#16895: Batch file to mimic 'make' on Windows
http://bugs.python.org/issue16895  closed by zach.ware

#17308: Dialog.py crashes when putty Window resized
http://bugs.python.org/issue17308  closed by berker.peksag

#18144: FD leak in urllib2
http://bugs.python.org/issue18144  closed by serhiy.storchaka

#18974: Use argparse in the diff script
http://bugs.python.org/issue18974  closed by serhiy.storchaka

#19076: Pdb.do_break calls error with obsolete file kwarg
http://bugs.python.org/issue19076  closed by berker.peksag

#19355: Initial modernization of OpenWatcom support
http://bugs.python.org/issue19355  closed by Jeffrey.Armstrong

#20451: os.exec* mangles argv on windows (splits on spaces, etc)
http://bugs.python.org/issue20451  closed by rhettinger

#21059: idle_test.test_warning failure
http://bugs.python.org/issue21059  closed by zach.ware

#21163: asyncio doesn't warn if a task is destroyed during its executi
http://bugs.python.org/issue21163  closed by haypo

#21247: test_asyncio: test_subprocess_send_signal hangs on Fedora buil
http://bugs.python.org/issue21247  closed by haypo

#21323: CGI HTTP server not running scripts from subdirectories
http://bugs.python.org/issue21323  closed by ned.deily

#21599: Argument transport in attach and detach method in Server class
http://bugs.python.org/issue21599  closed by haypo

#21655: Write Unit Test for Vec2 and TNavigator class in the Turtle Mo
http://bugs.python.org/issue21655  closed by Lita.Cho

#21765: Idle: make 3.x HyperParser work with non-ascii identifiers.
http://bugs.python.org/issue21765  closed by terry.reedy

#21899: Futures are not marked as completed
http://bugs.python.org/issue21899  closed by Sebastian.Kreft.Deezer

#21906: Tools\Scripts\md5sum.py doesn't work in Python 3.x
http://bugs.python.org/issue21906  closed by berker.peksag

#21913: threading.Condition.wait() is not interruptible in Python 2.7
http://bugs.python.org/issue21913  closed by neologix

#21918: Convert test_tools to directory
http://bugs.python.org/issue21918  closed by zach.ware

#21953: pythonrun.c does not check std streams the same as fileio.c
http://bugs.python.org/issue21953  closed by steve.dower

#21957: ASCII Formfeed (FF) & ASCII Vertical Tab (VT) Have Hexadecimal
http://bugs.python.org/issue21957  closed by ned.deily

#21959: msi product code for 2.7.8150 not in Tools/msi/uuids.py
http://bugs.python.org/issue21959  closed by r.david.murray

#21966: InteractiveConsole does not support -q option
http://bugs.python.org/issue21966  closed by belopolsky

#21968: 'abort' object is not callable
http://bugs.python.org/issue21968  closed by Apple Grew

#21974: Typo in "Set" in PEP 289
http://bugs.python.org/issue21974  closed by rhettinger

#21977: In the re's token example OP and SKIP regexes can be improved
http://bugs.python.org/issue21977  closed by rhettinger

#21978: Support index access on OrderedDict views (e.g. o.keys()[7])
http://bugs.python.org/issue21978  closed by rhettinger

#21979: SyntaxError not raised on "0xaor 1"
http://bugs.python.org/issue21979  closed by mark.dickinson

#21981: Idle problem
http://bugs.python.org/issue21981  closed by eric.smith

#21982: Idle configDialog: fix regression and add minimal unittest
http://bugs.python.org/issue21982  closed by terry.reedy

#21984: list(itertools.repeat(1)) causes the system to hang
http://bugs.python.org/issue21984  closed by rhettinger

#21985: test_asyncio prints some junk
http://bugs.python.org/issue21985  closed by haypo

#21988: Decrease iterating overhead in timeit
http://bugs.python.org/issue21988  closed by gvanrossum

#21993: counterintuitive behavior of list.index with boolean values
http://bugs.python.org/issue21993  closed by ezio.melotti

#21994: Syntax error in the ssl module documentation
http://bugs.python.org/issue21994  closed by berker.peksag

#22004: io documentation refers to newline as newlines
http://bugs.python.org/issue22004  closed by python-dev

From techtonik at gmail.com  Sun Jul 20 16:34:27 2014
From: techtonik at gmail.com (anatoly techtonik)
Date: Sun, 20 Jul 2014 17:34:27 +0300
Subject: [Python-Dev] subprocess research - max limit for piped output
Message-ID: 

I am trying to figure out the maximum size
of piped input in subprocess.check_output().

I hit a limit of about 500Mb, after which
Python exits with MemoryError without any
additional details.

I have only 2.76Gb of memory used out of 8Gb,
so which limit do I hit?

1. subprocess output read buffer
2. Python limit on size of variable
3. some OS limit on output pipes

Testcase attached.


C:\discovery\interface\subprocess>py dead.py
Testing size: 520Mb
..truncating to 545259520
..
Traceback (most recent call last):
  File "dead.py", line 66, in 
    backticks(r'type largefile')
  File "dead.py", line 36, in backticks
    output = subprocess.check_output(command, shell=True)
  File "C:\Python27\lib\subprocess.py", line 567, in check_output
    output, unused_err = process.communicate()
  File "C:\Python27\lib\subprocess.py", line 791, in communicate
    stdout = _eintr_retry_call(self.stdout.read)
  File "C:\Python27\lib\subprocess.py", line 476, in _eintr_retry_call
    return func(*args)
MemoryError
The process tried to write to a nonexistent pipe.

-- 
anatoly t.
-------------- next part --------------
import subprocess

# --- replacing shell backticks ---
# https://docs.python.org/2/library/subprocess.html#replacing-bin-sh-shell-backquote
#   output=`mycmd myarg`
#   output = check_output(["mycmd", "myarg"])
# not true, because mycmd is not passed to shell
try:
    pass #output = subprocess.check_output(["mycmd", "myarg"], shell=True)
except OSError as ex:
    # command not found.
    # it is impossible to catch output here, but shell outputs
    # message to stderr, which backticks doesn't catch either
    output = ''
except subprocess.CalledProcessError as ex:
    output = ex.output
# ^ information about error condition is lost
# ^ output in case of OSError is lost

# ux notes:
# - `mycmd myarg` > ["mycmd", "myarg"]
# - `` is invisible
#   subprocess.check_output is hardly rememberable
# - exception checking is excessive and not needed
#   (common pattern is to check return code)


def backticks(command):
   '''
   - no return code
   - no stderr capture
   '''
   try:
       # this doesn't escape shell patterns, such as:
       # ^ (windows cmd.exe shell)
       output = subprocess.check_output(command, shell=True)
   except OSError as ex:
       # command not found.
       # it is impossible to catch output here, but shell outputs
       # message to stderr, which backticks doesn't catch either
       output = ''
   except subprocess.CalledProcessError as ex:
       output = ex.output
   return output


import os
for size in range(520, 600, 2):
    print("Testing size: %sMb" % size)
    #cursize = os.path.getsize("largefile")
    with open("largefile", "ab") as data:
        data.seek(0, 2)
        cursize = data.tell()
        #print(cursize)
        limit = size*1024**2
        if cursize > limit:
            print('..truncating to %s' % limit)
            data.truncate(limit)
        else:
            print('..extending to %s' % limit)
            while cursize < limit:
                toadd = min(100, limit-cursize)
                data.write(b'1'*(toadd-1)+b'\n')
                cursize += toadd
    print("..")
    backticks(r'type largefile')
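
For comparison, here is a sketch of reading the child's output in
fixed-size chunks instead of collecting it all with check_output(). It
does not answer which limit is being hit; it only avoids building one
huge string in the parent. The name count_output_bytes and the 64 KiB
chunk size are illustrative choices, and the child command is a small
stand-in for `type largefile`.

```python
import subprocess
import sys


def count_output_bytes(command, chunk_size=64 * 1024):
    """Run command and count its stdout bytes without keeping them all in memory."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    total = 0
    while True:
        chunk = process.stdout.read(chunk_size)
        if not chunk:          # empty read means the child closed its stdout
            break
        total += len(chunk)    # a real consumer would process the chunk here
    process.stdout.close()
    returncode = process.wait()
    return returncode, total


# Stand-in child process: writes 1000 lines of 99 characters plus a newline.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(('1' * 99 + '\\n') * 1000)"]
returncode, total = count_output_bytes(child)
```

On Windows the byte count can be higher than 100000 because the child's
text-mode stdout may write '\r\n' for each newline.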


From antoine at python.org  Sun Jul 20 18:50:06 2014
From: antoine at python.org (Antoine Pitrou)
Date: Sun, 20 Jul 2014 12:50:06 -0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
Message-ID: 



Hi,

> Thanks Victor, Nick, Ethan, and others for continued discussion on the
> scandir PEP 471 (most recent thread starts at
> https://mail.python.org/pipermail/python-dev/2014-July/135377.html).

Have you tried modifying importlib's _bootstrap.py to use scandir() 
instead of listdir() + stat()?

Regards

Antoine.



From benhoyt at gmail.com  Sun Jul 20 23:34:19 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Sun, 20 Jul 2014 17:34:19 -0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
Message-ID: 

> Have you tried modifying importlib's _bootstrap.py to use scandir() instead
> of listdir() + stat()?

No, I haven't -- I'm not familiar with that code. What does
_bootstrap.py do -- does it do a lot of listdir calls and stat-ing of
many files?

-Ben

From brett at python.org  Mon Jul 21 00:35:48 2014
From: brett at python.org (Brett Cannon)
Date: Sun, 20 Jul 2014 22:35:48 +0000
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
References: 
 
 
Message-ID: 

Oh yes. :) The file Antoine is referring to is the implementation of import.

On Sun, Jul 20, 2014, 17:34 Ben Hoyt  wrote:

> > Have you tried modifying importlib's _bootstrap.py to use scandir()
> instead
> > of listdir() + stat()?
>
> No, I haven't -- I'm not familiar with that code. What does
> _bootstrap.py do -- does it do a lot of listdir calls and stat-ing of
> many files?
>
> -Ben

From antoine at python.org  Mon Jul 21 01:45:28 2014
From: antoine at python.org (Antoine Pitrou)
Date: Sun, 20 Jul 2014 19:45:28 -0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
Message-ID: 

On 20/07/2014 17:34, Ben Hoyt wrote:
>> Have you tried modifying importlib's _bootstrap.py to use scandir() instead
>> of listdir() + stat()?
>
> No, I haven't -- I'm not familiar with that code. What does
> _bootstrap.py do -- does it do a lot of listdir calls and stat-ing of
> many files?

Quite a bit, although that should be dampened in recent 3.x versions, 
thanks to the caching of directory contents.

Even though there is tangible performance improvement from scandir(), it 
would be useful to find out if the API fits well.

Regards

Antoine.



From benhoyt at gmail.com  Mon Jul 21 17:32:05 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Mon, 21 Jul 2014 11:32:05 -0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
 
Message-ID: 

> Even though there is tangible performance improvement from scandir(), it
> would be useful to find out if the API fits well.

Got it -- I see where you're coming from now. I'll take a quick look
(hopefully later this week).

-Ben

From victor.stinner at gmail.com  Mon Jul 21 17:57:12 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 21 Jul 2014 17:57:12 +0200
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
Message-ID: 

Hi,

2014-07-20 18:50 GMT+02:00 Antoine Pitrou :
> Have you tried modifying importlib's _bootstrap.py to use scandir() instead
> of listdir() + stat()?

IMO the current os.scandir() API does not fit importlib requirements.
importlib usually wants fresh data, whereas DirEntry cache cannot be
invalidated. It's probably possible to cache some os.stat() result in
importlib, but it looks like it requires a non trivial refactoring of
the code. I don't know importlib enough to suggest how to change it.

There are many open issues related to stat() in importlib; I found these:

http://bugs.python.org/issue14604
http://bugs.python.org/issue14067
http://bugs.python.org/issue19216

Closed issues:

http://bugs.python.org/issue17330
http://bugs.python.org/issue18810


By the way, DirEntry constructor is not documented in the PEP. Should
we document it? It might be a way to "invalidate the cache":

entry = DirEntry(os.path.dirname(entry.path), entry.name)

Maybe it is an abuse of the API. A clear_cache() method would be less
ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry
for a long time?

Another question: should we expose DirEntry type directly in the os
namespace? (os.DirEntry)

Victor

From Steve.Dower at microsoft.com  Mon Jul 21 18:11:45 2014
From: Steve.Dower at microsoft.com (Steve Dower)
Date: Mon, 21 Jul 2014 16:11:45 +0000
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
Message-ID: <5a4f4fb5c98347258ad1ed1c754d922f@DM2PR0301MB0734.namprd03.prod.outlook.com>

Victor Stinner wrote:
> 2014-07-20 18:50 GMT+02:00 Antoine Pitrou :
>> Have you tried modifying importlib's _bootstrap.py to use scandir() 
>> instead of listdir() + stat()?
>
> IMO the current os.scandir() API does not fit importlib requirements.
> importlib usually wants fresh data, whereas DirEntry cache cannot be
> invalidated. It's probably possible to cache some os.stat() result in
> importlib, but it looks like it requires a non trivial refactoring of
> the code. I don't know importlib enough to suggest how to change it.

The data is completely fresh at the time it is obtained, which is identical to using stat(). There will always be a race-condition between looking and doing, which is why we still use exception handling on actions.

> By the way, DirEntry constructor is not documented in the PEP. Should
> we document it? It might be a way to "invalidate the cache":
>
> entry = DirEntry(os.path.dirname(entry.path), entry.name)
>
> Maybe it is an abuse of the API. A clear_cache() method would be less
> ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry
> for a long time?

DirEntry is a convenient way to return a tuple without returning a tuple, that's all. If you want up to date info, call os.stat() and pass in the path. This should just be a better (and ideally transparent) substitute for os.listdir() in every single context.

Personally I'd make it a string subclass and put one-shot properties on it (i.e. call/cache stat() on first access where we don't already know the answer), which I think is close enough to where it's landed that I'm happy. (As far as bikeshedding goes, I prefer "_DirEntry" and no docs :) )
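
To make the "string subclass with one-shot properties" idea concrete, here is a rough sketch. LazyEntry and stat_result are hypothetical names used only for illustration; this is not how DirEntry is specified in the PEP.

```python
import os


class LazyEntry(str):
    """Hypothetical str subclass that stats its path once, on first access."""

    @property
    def stat_result(self):
        try:
            return self._stat           # already known: no system call
        except AttributeError:
            self._stat = os.stat(self)  # first access: call and cache
            return self._stat


entry = LazyEntry(os.curdir)
first = entry.stat_result    # performs os.stat()
second = entry.stat_result   # served from the cached value
```

Because the object is still a str, it can be passed anywhere a path string is accepted, which is what would make it a transparent substitute for os.listdir() results.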

Cheers,
Steve

From benhoyt at gmail.com  Mon Jul 21 18:48:50 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Mon, 21 Jul 2014 12:48:50 -0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
Message-ID: 

Thanks for an initial look into this, Victor.

> IMO the current os.scandir() API does not fit importlib requirements.
> importlib usually wants fresh data, whereas DirEntry cache cannot be
> invalidated. It's probably possible to cache some os.stat() result in
> importlib, but it looks like it requires a non trivial refactoring of
> the code. I don't know importlib enough to suggest how to change it.

Yes, with importlib already doing its own caching (somewhat
complicated, as the open and closed issues show), I get the feeling it
wouldn't be a good fit. Note that I'm not saying we wouldn't use it if
we were implementing importlib from scratch.

> By the way, DirEntry constructor is not documented in the PEP. Should
> we document it? It might be a way to "invalidate the cache":

I would prefer not to, just to keep things simple. Similar to creating
os.stat_result() objects ... you can kind of do it (see scandir.py),
but it's not recommended or even documented. The entire purpose of
DirEntry objects is so scandir can produce them, not for general use.

> entry = DirEntry(os.path.dirname(entry.path), entry.name)
>
> Maybe it is an abuse of the API. A clear_cache() method would be less
> ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry
> for a long time?
>
> Another question: should we expose DirEntry type directly in the os
> namespace? (os.DirEntry)

Again, I'd rather not expose this. It's quite system-specific (see the
different system versions in scandir.py), and trying to combine this,
make it consistent, and document it would be a bit of a pain, and also
possibly prevent future modifications (because then the parts of the
implementation would be set in stone).

I'm not really opposed to a clear_cache() method -- basically it'd set
_lstat and _stat and _d_type to None internally. However, I'd prefer
to keep it as is, and as the PEP says:

If developers want "refresh" behaviour (for example, for watching a
file's size change), they can simply use pathlib.Path objects, or call
the regular os.stat() or os.path.getsize() functions which get fresh
data from the operating system every call.
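
A small sketch of that difference, assuming scandir is available as
os.scandir() with the DirEntry API described in the PEP (entry.stat()
and entry.path). The file name and sizes are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"abc")

    entry = list(os.scandir(tmp))[0]
    cached_size = entry.stat().st_size         # result is now cached on the entry

    with open(path, "ab") as f:
        f.write(b"def")                        # file grows after the scan

    stale_size = entry.stat().st_size          # still the cached value
    fresh_size = os.stat(entry.path).st_size   # asks the operating system again
```

So code that needs "refresh" behaviour keeps the path and calls
os.stat() itself, rather than holding on to the DirEntry.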

-Ben

From matsjoyce at gmail.com  Mon Jul 21 21:26:14 2014
From: matsjoyce at gmail.com (matsjoyce)
Date: Mon, 21 Jul 2014 19:26:14 +0000 (UTC)
Subject: [Python-Dev] Reviving restricted mode?
References: 
 <200902231657.52201.victor.stinner@haypocalc.com>
 
Message-ID: 

Sorry about being a bit late on this front (just 5 years...), but I've 
extended tav's jail to module level, and added the niceties. Its goal is 
similar to that of rexec: stopping IO, but not crashes. It is currently at 
https://github.com/matsjoyce/sandypython, and it has instructions for its 
use. I've bashed it with all the exploits I've found online, and it's still 
holding, so I thought the public might like a go.


From victor.stinner at gmail.com  Mon Jul 21 21:36:09 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Mon, 21 Jul 2014 21:36:09 +0200
Subject: [Python-Dev] Reviving restricted mode?
In-Reply-To: 
References: 
 <200902231657.52201.victor.stinner@haypocalc.com>
 
 
Message-ID: 

Hi,

2014-07-21 21:26 GMT+02:00 matsjoyce :
> Sorry about being a bit late on this front (just 5 years...), but I've
> extended tav's jail to module level, and added the niceties. Its goal is
> similar to that of rexec: stopping IO, but not crashes. It is currently at
> https://github.com/matsjoyce/sandypython, and it has instructions for its
> use. I've bashed it with all the exploits I've found online, and it's still
> holding, so I thought the public might like a go.

I wrote this project, started from tav's jail:
https://github.com/haypo/pysandbox/

I gave up because I now consider that pysandbox is broken by design.
Please read the LWN article:
https://lwn.net/Articles/574215/

Don't hesitate to ask more specific questions.

Victor

From ncoghlan at gmail.com  Mon Jul 21 23:37:09 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 22 Jul 2014 07:37:09 +1000
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: <5a4f4fb5c98347258ad1ed1c754d922f@DM2PR0301MB0734.namprd03.prod.outlook.com>
References: 
 
 
 <5a4f4fb5c98347258ad1ed1c754d922f@DM2PR0301MB0734.namprd03.prod.outlook.com>
Message-ID: 

On 22 Jul 2014 02:46, "Steve Dower"  wrote:
>
> Personally I'd make it a string subclass and put one-shot properties on
> it (i.e. call/cache stat() on first access where we don't already know the
> answer), which I think is close enough to where it's landed that I'm happy.
> (As far as bikeshedding goes, I prefer "_DirEntry" and no docs :) )

+1 for "_DirEntry" as the name in the implementation, and documenting its
behaviour under "scandir" rather than as a standalone object.

Only -0 for full documentation as a standalone class, though.

Cheers,
Nick.

>
> Cheers,
> Steve

From victor.stinner at gmail.com  Tue Jul 22 00:26:02 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 22 Jul 2014 00:26:02 +0200
Subject: [Python-Dev] PEP 471 "scandir" accepted
Message-ID: 

Hi,

I privately asked Guido van Rossum whether I could be the BDFL-delegate
for PEP 471, and he agreed. I accept the latest version of the PEP:

    http://legacy.python.org/dev/peps/pep-0471/

I consider that PEP 471 "scandir" was discussed enough to collect
all possible options (variations of the API) and that the main flaws
have been detected. Ben Hoyt modified his PEP to list all these options
and give the advantages and drawbacks of each. Great job Ben :-)
Thanks to all the developers who contributed to the threads on the
python-dev mailing list!

The new version of the PEP has an optional "follow_symlinks" parameter
which is True by default. IMO this API better fits the common case of
listing the content of a single directory, and it's now simple to not
follow symlinks when implementing a recursive function like os.walk().
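
As a rough illustration only: the sketch below assumes the
follow_symlinks keyword lands on DirEntry.is_dir() as the PEP describes,
and it is written against os.scandir() as found in current Python 3
(including its use as a context manager). The subdirs() helper and the
"real"/"link" names are made up for the demo, and os.symlink() is assumed
to be permitted (POSIX).

```python
import os
import tempfile


def subdirs(path, follow_symlinks=True):
    """Return the sorted names of directory entries directly under path."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries
                      if entry.is_dir(follow_symlinks=follow_symlinks))


# Demo: one real directory plus a symlink pointing at it.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "real"))
    os.symlink(os.path.join(tmp, "real"), os.path.join(tmp, "link"))
    followed = subdirs(tmp)                             # symlink counts as a dir
    not_followed = subdirs(tmp, follow_symlinks=False)  # symlink is skipped
```

A recursive walker would use the follow_symlinks=False form to decide
which entries to descend into.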

The PEP also explicitly mentions that os.walk() will be modified to
benefit from the new os.scandir() function.

I'm happy because the final API is very close to os.path functions and
pathlib.Path methods. Python stays consistent, which is a great power
of this language!

The PEP is accepted. It's time to review the implementation ;-) The
current code can be found at:

   https://github.com/benhoyt/scandir

(I don't think Ben has updated his implementation for the latest
version of the PEP yet.)

Victor

From victor.stinner at gmail.com  Tue Jul 22 00:39:26 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 22 Jul 2014 00:39:26 +0200
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
 
Message-ID: 

2014-07-21 18:48 GMT+02:00 Ben Hoyt :
>> By the way, DirEntry constructor is not documented in the PEP. Should
>> we document it? It might be a way to "invalidate the cache":
>
> I would prefer not to, just to keep things simple. Similar to creating
> os.stat_result() objects ... you can kind of do it (see scandir.py),
> but it's not recommended or even documented. The entire purpose of
> DirEntry objects is so scandir can produce them, not for general use.
>
>> entry = DirEntry(os.path.dirname(entry.path), entry.name)
>>
>> Maybe it is an abuse of the API. A clear_cache() method would be less
>> ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry
>> for a long time?
>>
>> Another question: should we expose DirEntry type directly in the os
>> namespace? (os.DirEntry)
>
> Again, I'd rather not expose this. It's quite system-specific (see the
> different system versions in scandir.py), and trying to combine this,
> make it consistent, and document it would be a bit of a pain, and also
> possibly prevent future modifications (because then the parts of the
> implementation would be set in stone).

We should mimic os.stat() and os.stat_result: os.stat_result symbol
exists in the os namespace, but the type constructor is not
documented. No need for extra protection like not adding the type in
the os module, or adding a "_" prefix to the name.

By the way, it's possible to serialize a stat_result with pickle.
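
A quick way to check this, as a minimal sketch (the variable names are
only for illustration):

```python
import os
import pickle

# Round-trip a stat_result through pickle and compare with the original.
original = os.stat(".")
restored = pickle.loads(pickle.dumps(original))
```

The restored object is again an os.stat_result and compares equal to the
original.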

See also my issue "Enhance doc of os.stat_result":
http://bugs.python.org/issue21813

> I'm not really opposed to a clear_cache() method -- basically it'd set
> _lstat and _stat and _d_type to None internally. However, I'd prefer
> to keep it as is, and as the PEP says: (...)

Ok, agreed.

Victor

From benhoyt at gmail.com  Tue Jul 22 04:27:09 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Mon, 21 Jul 2014 22:27:09 -0400
Subject: [Python-Dev] PEP 471 "scandir" accepted
In-Reply-To: 
References: 
Message-ID: 

> I asked privately Guido van Rossum if I can be the BDFL-delegate for
> the PEP 471 and he agreed. I accept the latest version of the PEP:
>
>     http://legacy.python.org/dev/peps/pep-0471/

Thank you!

> The PEP also explicitly mentions that os.walk() will be modified to
> benefit of the new os.scandir() function.

Yes, this was a good suggestion to include that explicitly -- in
actual fact, speeding up os.walk() was my main goal initially.

> The PEP is accepted.

Superb. Could you please update the PEP with the Resolution and
BDFL-Delegate fields?

> It's time to review the implementation ;-) The current code can be found at:
>
>    https://github.com/benhoyt/scandir
>
> (I don't think that Ben already updated his implementation for the
> latest version of the PEP.)

I have actually updated my GitHub repo for the current PEP (did this
last Saturday). However, there are still a few open issues, the main
one is that my scandir.py module doesn't handle the bytes/str thing
properly.

I intend to work on the CPython implementation over the next few
weeks. However, a couple of thoughts up-front:

I think if I were doing this from scratch I'd reimplement listdir() in
Python as "return [e.name for e in scandir(path)]". However, I'm not
sure this is a good idea, as I don't really want listdir() to suddenly
use more memory and perform slightly *worse* due to the extra DirEntry
object allocations.

So my basic plan is to have an internal helper function in
posixmodule.c that either yields DirEntry objects or strings. And then
listdir() would simply be defined something like "return
list(_scandir(path, yield_strings=True))" in C or in Python.

My reasoning is that then there'll be much less (if any) code
duplication between scandir() and listdir().
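
In pure Python the shape of that idea would be roughly as follows. This
is only a sketch: _scandir() and its yield_strings flag are the
hypothetical helper described above, not an existing API, and here it is
layered on top of os.scandir() from current Python 3 rather than written
in C.

```python
import os


def _scandir(path=".", yield_strings=False):
    """Hypothetical shared helper: yield names or DirEntry objects."""
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name if yield_strings else entry


def listdir(path="."):
    """listdir() expressed through the shared helper."""
    return list(_scandir(path, yield_strings=True))


def scandir(path="."):
    """scandir() expressed through the same helper."""
    return _scandir(path)


names = listdir(".")
entries = list(scandir("."))
```

Note that this Python version still allocates a DirEntry per name; the
point of doing it in C would be to skip that allocation when only
strings are wanted.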

Does this sound like a reasonable approach?

-Ben

From benhoyt at gmail.com  Tue Jul 22 04:32:10 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Mon, 21 Jul 2014 22:32:10 -0400
Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()
In-Reply-To: 
References: 
 
 
 
 
Message-ID: 

> We should mimic os.stat() and os.stat_result: os.stat_result symbol
> exists in the os namespace, but the type constructor is not
> documented. No need for extra protection like not adding the type in
> the os module, or adding a "_" prefix to the name.

Yeah, that works for me.

> By the way, it's possible to serialize a stat_result with pickle.

That makes sense, as stat_result is basically just a tuple and a bit
extra. I wonder if it should be possible to pickle DirEntry objects?
I'm thinking possibly not. If so, would it cache the stat or file type
info?

-Ben

From victor.stinner at gmail.com  Tue Jul 22 09:39:17 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 22 Jul 2014 09:39:17 +0200
Subject: [Python-Dev] PEP 471 "scandir" accepted
In-Reply-To: 
References: 
 
Message-ID: 

Modifying os.listdir() to use os.scandir() is not part of the PEP; you should
not do that. If you worry about performance, try to implement my free list
idea.

You may modify the C code of listdir() to share as much code as possible. I
mean you can implement your idea in C.

Victor

From 4kir4.1i at gmail.com  Tue Jul 22 09:33:41 2014
From: 4kir4.1i at gmail.com (Akira Li)
Date: Tue, 22 Jul 2014 11:33:41 +0400
Subject: [Python-Dev] PEP 471 "scandir" accepted
References: 
 
Message-ID: <87r41donje.fsf@gmail.com>

Ben Hoyt  writes:

> I think if I were doing this from scratch I'd reimplement listdir() in
> Python as "return [e.name for e in scandir(path)]".
...
> So my basic plan is to have an internal helper function in
> posixmodule.c that either yields DirEntry objects or strings. And then
> listdir() would simply be defined something like "return
> list(_scandir(path, yield_strings=True))" in C or in Python.
>
> My reasoning is that then there'll be much less (if any) code
> duplication between scandir() and listdir().
>
> Does this sound like a reasonable approach?

Note: listdir() accepts an integer path (an open file descriptor that
refers to a directory) that is passed to fdopendir() on POSIX [4] i.e.,
*you can't use scandir() to replace listdir() in this case* (as I've
already mentioned in [1]). See the corresponding tests from [2].

[1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html
[2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html

From os.listdir() docs [3]:

> This function can also support specifying a file descriptor; the file
> descriptor must refer to a directory.

[3] https://docs.python.org/3.4/library/os.html#os.listdir
[4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736


--
Akira


From benhoyt at gmail.com  Tue Jul 22 17:52:45 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 22 Jul 2014 11:52:45 -0400
Subject: [Python-Dev] PEP 471 "scandir" accepted
In-Reply-To: <87r41donje.fsf@gmail.com>
References: 
 
 <87r41donje.fsf@gmail.com>
Message-ID: 

> Note: listdir() accepts an integer path (an open file descriptor that
> refers to a directory) that is passed to fdopendir() on POSIX [4] i.e.,
> *you can't use scandir() to replace listdir() in this case* (as I've
> already mentioned in [1]). See the corresponding tests from [2].
>
> [1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html
> [2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html
>
> From os.listdir() docs [3]:
>
>> This function can also support specifying a file descriptor; the file
>> descriptor must refer to a directory.
>
> [3] https://docs.python.org/3.4/library/os.html#os.listdir
> [4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736

Fair point.

Yes, I hadn't realized listdir supported dir_fd (must have been
looking at 2.x docs), though you've pointed it out at [1] above, and I
guess I wasn't thinking about implementation at the time.

It would be easy enough (I think) to have the helper function support
both, but raise an error in the scandir() function if the type of path
is an integer.

However, given that we have to support this for listdir() anyway, I
think it's worth reconsidering whether scandir()'s directory argument
can be an integer FD. Given that listdir() already supports it, it
will almost certainly be asked for later anyway for someone who's
porting some listdir code that uses an FD. Thoughts, Victor?

-Ben

From victor.stinner at gmail.com  Tue Jul 22 18:16:14 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 22 Jul 2014 18:16:14 +0200
Subject: [Python-Dev] PEP 471 "scandir" accepted
In-Reply-To: 
References: 
 
 <87r41donje.fsf@gmail.com>
 
Message-ID: 

2014-07-22 17:52 GMT+02:00 Ben Hoyt :
> However, given that we have to support this for listdir() anyway, I
> think it's worth reconsidering whether scandir()'s directory argument
> can be an integer FD. Given that listdir() already supports it, it
> will almost certainly be asked for later anyway for someone who's
> porting some listdir code that uses an FD. Thoughts, Victor?

Please focus on what was accepted in the PEP. We should first test
os.scandir(). In a few months, with more feedback, we can consider
extending os.scandir() to support a file descriptor. There are
several issues which should be discussed and decided before implementing
it (e.g. handling the lifetime of the directory file descriptor).

Victor

From ncoghlan at gmail.com  Tue Jul 22 22:57:18 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 23 Jul 2014 06:57:18 +1000
Subject: [Python-Dev] PEP 471 "scandir" accepted
In-Reply-To: 
References: 
 
 <87r41donje.fsf@gmail.com>
 
 
Message-ID: 

On 23 Jul 2014 02:18, "Victor Stinner"  wrote:
>
> 2014-07-22 17:52 GMT+02:00 Ben Hoyt :
> > However, given that we have to support this for listdir() anyway, I
> > think it's worth reconsidering whether scandir()'s directory argument
> > can be an integer FD. Given that listdir() already supports it, it
> > will almost certainly be asked for later anyway for someone who's
> > porting some listdir code that uses an FD. Thoughts, Victor?
>
> Please focus on what was accepted in the PEP. We should first test
> os.scandir(). In a few months, with better feedbacks, we can consider
> extending os.scandir() to support a file descriptor. There are
> different issues which should be discussed and decided to implement it
> (ex: handle the lifetime of the directory file descriptor).

As Victor suggests, getting the core version working and incorporated first
is a good way to go. Future enhancements (like accepting a file descriptor)
and refactorings (like eliminating the code duplication with listdir) don't
need to (and hence shouldn't) go into the initial patch.

Cheers,
Nick.

>
> Victor

From alex.gaynor at gmail.com  Tue Jul 22 23:03:36 2014
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Tue, 22 Jul 2014 21:03:36 +0000 (UTC)
Subject: [Python-Dev] [PEP466] SSLSockets, and sockets,
	_socketobjects oh my!
Message-ID: 

Hi all,

I've been happily working on the SSL module backports for Python2 (pursuant to
PEP466), and I've hit something of a snag:

In python3, the SSLSocket keeps a weak reference to the underlying socket,
rather than a strong reference, as Python2 uses.

Unfortunately, due to the way sockets work in Python2, this doesn't work:

On Python2, _socketobject composes around _real_socket from the _socket module,
whereas on Python3, it subclasses _socket.socket. Since you now have a Python-
level class, you can weak reference it.
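
For illustration, a minimal Python 3 sketch of what "you can weak
reference it" means here. It assumes CPython's reference counting for
the final check and that creating an unconnected socket is allowed.

```python
import socket
import weakref

sock = socket.socket()
ref = weakref.ref(sock)            # works: socket.socket is a Python-level class
alive_before = ref() is sock

sock.close()
del sock                           # drop the only strong reference
alive_after = ref() is not None    # on CPython the object is gone by now
```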

The question is:

a) Should we backport weak referencing _socket.sockets (changing the structure
   of the module seems overly invasive, albeit completely backwards
   compatible)?
b) Does anyone know why weak references are used in the first place? The commit
   message just alludes to fixing a leak with no reference to an issue.

Anyone who's interested in the state of the branch can see it at:
github.com/alex/cpython on the backport-ssl branch. Note that many many tests
are still failing, and you'll need to apply the patch from
http://bugs.python.org/issue22023 to get it to work.

Thanks,
Alex

PS: Any help in getting http://bugs.python.org/issue22023 landed would be very
much appreciated.


From benhoyt at gmail.com  Tue Jul 22 23:07:37 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 22 Jul 2014 17:07:37 -0400
Subject: [Python-Dev] PEP 471 "scandir" accepted
In-Reply-To: 
References: 
 
 <87r41donje.fsf@gmail.com>
 
 
 
Message-ID: 

Makes sense, thanks. -Ben

On Tue, Jul 22, 2014 at 4:57 PM, Nick Coghlan  wrote:
>
> On 23 Jul 2014 02:18, "Victor Stinner"  wrote:
>>
>> 2014-07-22 17:52 GMT+02:00 Ben Hoyt :
>> > However, given that we have to support this for listdir() anyway, I
>> > think it's worth reconsidering whether scandir()'s directory argument
>> > can be an integer FD. Given that listdir() already supports it, it
>> > will almost certainly be asked for later anyway for someone who's
>> > porting some listdir code that uses an FD. Thoughts, Victor?
>>
>> Please focus on what was accepted in the PEP. We should first test
>> os.scandir(). In a few months, with better feedbacks, we can consider
>> extending os.scandir() to support a file descriptor. There are
>> different issues which should be discussed and decided to implement it
>> (ex: handle the lifetime of the directory file descriptor).
>
> As Victor suggests, getting the core version working and incorporated first
> is a good way to go. Future enhancements (like accepting a file descriptor)
> and refactorings (like eliminating the code duplication with listdir) don't
> need to (and hence shouldn't) go into the initial patch.
>
> Cheers,
> Nick.
>
>>
>> Victor
>
>

From antoine at python.org  Tue Jul 22 23:25:27 2014
From: antoine at python.org (Antoine Pitrou)
Date: Tue, 22 Jul 2014 17:25:27 -0400
Subject: [Python-Dev] [PEP466] SSLSockets, and sockets,
	_socketobjects oh my!
In-Reply-To: 
References: 
Message-ID: 

On 22/07/2014 17:03, Alex Gaynor wrote:
>
> The question is:
>
> a) Should we backport weak referencing _socket.sockets (changing the structure
>     of the module seems overly invasive, albeit completely backwards
>     compatible)?
> b) Does anyone know why weak references are used in the first place? The commit
>     message just alludes to fixing a leak with no reference to an issue.

Because:
- the SSLSocket has a strong reference to the ssl object (self._sslobj)
- self._sslobj having a strong reference to the SSLSocket would mean 
both would only get destroyed on a GC collection

I assume that's what "leak" means here :-)
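
The effect can be sketched with two stand-in classes. Wrapper and
LowLevel below are hypothetical and only mimic the SSLSocket / ssl
object relationship; the check relies on CPython's reference counting.

```python
import gc
import weakref


class LowLevel:
    """Stand-in for the ssl object: points back at its owner."""
    def __init__(self, owner, use_weakref):
        self.owner = weakref.ref(owner) if use_weakref else owner


class Wrapper:
    """Stand-in for SSLSocket: holds a strong reference to LowLevel."""
    def __init__(self, use_weakref):
        self.low_level = LowLevel(self, use_weakref)


def freed_without_gc(use_weakref):
    """Return True if dropping the last name frees the pair immediately."""
    was_enabled = gc.isenabled()
    gc.disable()                    # keep the cycle collector out of the way
    try:
        wrapper = Wrapper(use_weakref)
        probe = weakref.ref(wrapper)
        del wrapper
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()                # clean up any cycle left behind


weak_backref_freed = freed_without_gc(use_weakref=True)
strong_backref_freed = freed_without_gc(use_weakref=False)
```

With a weak back-reference the pair is released as soon as the last name
goes away; with a strong one it lingers until a GC pass.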

As for 2.x, I don't see why you couldn't just continue using a strong 
reference.

Regards

Antoine.



From ncoghlan at gmail.com  Tue Jul 22 23:44:54 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 23 Jul 2014 07:44:54 +1000
Subject: [Python-Dev] [PEP466] SSLSockets, and sockets,
	_socketobjects oh my!
In-Reply-To: 
References: 
 
Message-ID: 

On 23 Jul 2014 07:28, "Antoine Pitrou"  wrote:
>
> On 22/07/2014 17:03, Alex Gaynor wrote:
>
>>
>> The question is:
>>
>> a) Should we backport weak referencing _socket.sockets (changing the
>>    structure of the module seems overly invasive, albeit completely
>>    backwards compatible)?
>> b) Does anyone know why weak references are used in the first place? The
>>    commit message just alludes to fixing a leak with no reference to an
>>    issue.
>
>
> Because:
> - the SSLSocket has a strong reference to the ssl object (self._sslobj)
> - self._sslobj having a strong reference to the SSLSocket would mean both
>   would only get destroyed on a GC collection
>
> I assume that's what "leak" means here :-)
>
> As for 2.x, I don't see why you couldn't just continue using a strong
> reference.

As Antoine says, if the cycle already exists in Python 2 (and it sounds
like it does), we can just skip backporting the weak reference change.

I'll also give the Fedora Python list a heads up about your repo to see if
anyone there can help you with the backport.

Cheers,
Nick.

>
> Regards
>
> Antoine.
>
>
>

From victor.stinner at gmail.com  Tue Jul 22 23:57:53 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 22 Jul 2014 23:57:53 +0200
Subject: [Python-Dev] PEP 471 "scandir" accepted
In-Reply-To: 
References: 
 
Message-ID: 

2014-07-22 4:27 GMT+02:00 Ben Hoyt :
>> The PEP is accepted.
>
> Superb. Could you please update the PEP with the Resolution and
> BDFL-Delegate fields?

Done.

Victor

From antoine at python.org  Wed Jul 23 01:00:18 2014
From: antoine at python.org (Antoine Pitrou)
Date: Tue, 22 Jul 2014 19:00:18 -0400
Subject: [Python-Dev] [PEP466] SSLSockets, and sockets,
	_socketobjects oh my!
In-Reply-To: 
References: 
 
 
Message-ID: 

On 22/07/2014 17:44, Nick Coghlan wrote:
>
> > As for 2.x, I don't see why you couldn't just continue using a strong
> > reference.
>
> As Antoine says, if the cycle already exists in Python 2 (and it sounds
> like it does), we can just skip backporting the weak reference change.

No, IIRC there shouldn't be a cycle. It's just complicated in a 
different way than 3.x :-)

Regards

Antoine.



From 4kir4.1i at gmail.com  Wed Jul 23 01:21:14 2014
From: 4kir4.1i at gmail.com (Akira Li)
Date: Wed, 23 Jul 2014 03:21:14 +0400
Subject: [Python-Dev] PEP 471 "scandir" accepted
In-Reply-To: 
 (Ben Hoyt's message of "Tue, 22 Jul 2014 11:52:45 -0400")
References: 
 
 <87r41donje.fsf@gmail.com>
 
Message-ID: <871ttdnfo5.fsf@gmail.com>

Ben Hoyt  writes:

>> Note: listdir() accepts an integer path (an open file descriptor that
>> refers to a directory) that is passed to fdopendir() on POSIX [4] i.e.,
>> *you can't use scandir() to replace listdir() in this case* (as I've
>> already mentioned in [1]). See the corresponding tests from [2].
>>
>> [1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html
>> [2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html
>>
>> From os.listdir() docs [3]:
>>
>>> This function can also support specifying a file descriptor; the file
>>> descriptor must refer to a directory.
>>
>> [3] https://docs.python.org/3.4/library/os.html#os.listdir
>> [4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736
>
> Fair point.
>
> Yes, I hadn't realized listdir supported dir_fd (must have been
> looking at 2.x docs), though you've pointed it out at [1] above. and I
> guess I wasn't thinking about implementation at the time.

FYI, dir_fd is related but *different*: compare "specifying a file
descriptor" [1] vs. "paths relative to directory descriptors" [2].

"NOTE: os.supports_fd and os.supports_dir_fd are different sets." [3]:

  >>> import os
  >>> os.listdir in os.supports_fd
  True
  >>> os.listdir in os.supports_dir_fd
  False


[1] https://docs.python.org/3/library/os.html#path-fd
[2] https://docs.python.org/3/library/os.html#dir-fd
[3] https://mail.python.org/pipermail/python-dev/2014-July/135296.html

To be clear: *listdir() does not support dir_fd* though it can be
emulated using os.open(dir_fd=..).

You can safely ignore the rest of the e-mail until you want to implement
path-fd [1] support for os.scandir() in several months.

Here's a code example that demonstrates both path-fd [1] and dir-fd [2]:

  import contextlib
  import os

  with contextlib.ExitStack() as stack:
      dir_fd = os.open('/etc', os.O_RDONLY)
      stack.callback(os.close, dir_fd)
      fd = os.open('init.d', os.O_RDONLY, dir_fd=dir_fd) # dir-fd [2]
      stack.callback(os.close, fd)
      print("\n".join(os.listdir(fd))) # path-fd [1]

It is the same as os.listdir('/etc/init.d') unless '/etc' is symlinked
to refer to another directory after the first os.open('/etc',..)
call. See also, os.fwalk(dir_fd=..) [4]

[4] https://docs.python.org/3/library/os.html#os.fwalk

> However, given that we have to support this for listdir() anyway, I
> think it's worth reconsidering whether scandir()'s directory argument
> can be an integer FD.

What is entry.path in this case? If input directory is a file descriptor
(an integer) then os.path.join(directory, entry.name) won't work.
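
A tiny sketch of the failure, using current Python 3. The number 3 is
just a stand-in for a descriptor; nothing is opened.

```python
import os

fd_like = 3  # hypothetical descriptor number; never actually opened here

try:
    os.path.join(fd_like, "passwd")
    join_error = None
except TypeError as exc:
    join_error = exc   # os.path.join() only accepts str, bytes or path-like
```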

"PEP 471 should explicitly reject the support for specifying a file
descriptor so that a code that uses os.scandir may assume that
entry.path attribute is always present (no exceptions due
to a failure to read /proc/self/fd/NNN or an error while calling
fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see
http://stackoverflow.com/q/1188757 )." [5]

[5] https://mail.python.org/pipermail/python-dev/2014-July/135441.html

On the other hand os.fwalk() [4] that supports both path-fd [1] and
dir-fd [2] could be implemented without entry.path property if
os.scandir() supports just path-fd [1]. os.fwalk() provides a safe way
to traverse a directory tree without symlink races e.g., [6]:

  def get_tree_size(directory):
      """Return total size of files in directory and subdirs."""
      return sum(entry.lstat().st_size
                 for root, dirs, files, rootfd in fwalk(directory)
                 for entry in files)

[6] http://legacy.python.org/dev/peps/pep-0471/#examples

where fwalk() is the exact copy of os.fwalk() except that it uses
_fwalk() which is defined in terms of scandir():

  import os

  # adapt os._fwalk() to use scandir() instead of os.listdir()
  def _fwalk(topfd, toppath, topdown, onerror, follow_symlinks):
      # Note: This uses O(depth of the directory tree) file descriptors:
      # if necessary, it can be adapted to only require O(1) FDs, see
      # http://bugs.python.org/issue13734

      entries = scandir(topfd)
      dirs, nondirs = [], []
      for entry in entries: #XXX call onerror on OSError on next() and return?
          # report symlinks to directories as directories (like os.walk)
          #  but no recursion into symlinked subdirectories unless
          #  follow_symlinks is true

          # add dangling symlinks as nondirs (DirEntry.is_dir() doesn't
          #  raise on broken links)
          try:
              (dirs if entry.is_dir() else nondirs).append(entry)
          except FileNotFoundError:
              continue # ignore disappeared files

      if topdown:
          yield toppath, dirs, nondirs, topfd

      for entry in dirs:
          try:
              orig_st = entry.stat(follow_symlinks=follow_symlinks)
              #XXX O_DIRECTORY, O_CLOEXEC, [? O_NOCTTY, O_SEARCH ?]
              dirfd = os.open(entry.name, os.O_RDONLY, dir_fd=topfd)
          except OSError as err:
              if onerror is not None:
                  onerror(err)
              return
          try:
              if follow_symlinks or os.path.samestat(orig_st, os.stat(dirfd)):
                  dirpath = os.path.join(toppath, entry.name) # entry.path
                  yield from _fwalk(dirfd, dirpath, topdown, onerror,
                                    follow_symlinks)
          finally:
              os.close(dirfd) # or use with entry.opendir() as dirfd: ...

      if not topdown:
          yield toppath, dirs, nondirs, topfd


i.e., if os.scandir() supports specifying file descriptors [1] then it
is relatively straightforward to define os.fwalk() in terms of it. Would
scandir() provide the same performance benefits as for os.walk()?

entry.stat() can be implemented without entry.path when entry._directory
(or whatever other DirEntry's attribute that stores the first parameter
to os.scandir(fd)) is an open file descriptor that refers to a directory:

  def stat(self, *, follow_symlinks=True):
      return os.stat(self.name, #NOTE: ignore caching
          follow_symlinks=follow_symlinks, dir_fd=self._directory)
  lstat = lambda self: self.stat(follow_symlinks=False)


--
Akira

From antoine at python.org  Wed Jul 23 03:23:16 2014
From: antoine at python.org (Antoine Pitrou)
Date: Tue, 22 Jul 2014 21:23:16 -0400
Subject: [Python-Dev] PEP 471 "scandir" accepted
In-Reply-To: 
References: 
Message-ID: 

On 21/07/2014 18:26, Victor Stinner wrote:
>
> I'm happy because the final API is very close to os.path functions and
> pathlib.Path methods. Python stays consistent, which is a great power
> of this language!

By the way, http://bugs.python.org/issue19767 could benefit too.

Regards

Antoine.



From alex.gaynor at gmail.com  Wed Jul 23 21:36:07 2014
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Wed, 23 Jul 2014 19:36:07 +0000 (UTC)
Subject: [Python-Dev] [PEP466] SSLSockets, and sockets,
	_socketobjects oh my!
References: 
 
 
 
Message-ID: 

Antoine Pitrou <antoine at python.org> writes:
> No, IIRC there shouldn't be a cycle. It's just complicated in a 
> different way than 3.x 
> 
> Regards
> 
> Antoine.
> 

Indeed, you're right, this is just differently convoluted so no leak (not that
I would call "collected by a normal GC" a leak :-)).

That said, I've hit another issue, with SNI callbacks. The first argument to an
SNI callback is the socket. The callback is set up by some C code, which right
now has access to only the _socket.socket object, not the ssl.SSLSocket object,
which is what the public API needs there.

Possible solutions are:

* Pass the SSLObject *in addition* to the _socket.socket object to the C code.
  This generates some additional divergence from the Python3 code, but is
  probably basically straightforward.
* Try to refactor the socket code in the same way as Python3 did, so we can
  pass *only* the SSLObject here. This is some nasty scope creep for PEP466,
  but would make the overall _ssl.c diff smaller.
* Some super sweet and simple thing I haven't thought of yet.

Thoughts?

By way of a general status update, the only failing tests left are this, and a
few things about SSLError's str(), so this will hopefully be ready to upload
any day now for review.

Cheers,
Alex

PS: Please review and merge http://bugs.python.org/issue22023 :-)


From antoine at python.org  Wed Jul 23 23:02:26 2014
From: antoine at python.org (Antoine Pitrou)
Date: Wed, 23 Jul 2014 17:02:26 -0400
Subject: [Python-Dev] [PEP466] SSLSockets, and sockets,
	_socketobjects oh my!
In-Reply-To: 
References: 
 
 
  
Message-ID: 

On 23/07/2014 15:36, Alex Gaynor wrote:
>
> That said, I've hit another issue, with SNI callbacks. The first argument to an
> SNI callback is the socket. The callback is set up by some C code, which right
> now has access to only the _socket.socket object, not the ssl.SSLSocket object,
> which is what the public API needs there.
>
> Possible solutions are:
>
> * Pass the SSLObject *in addition* to the _socket.socket object to the C code.
>    This generates some additional divergence from the Python3 code, but is
>    probably basically straightforward.

You mean for use with SSL_set_app_data?




From alex.gaynor at gmail.com  Wed Jul 23 23:10:39 2014
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Wed, 23 Jul 2014 21:10:39 +0000 (UTC)
Subject: [Python-Dev]
	=?utf-8?q?=5BPEP466=5D_SSLSockets=2C_and_sockets=2C?=
	=?utf-8?q?=09=5Fsocketobjects_oh_my!?=
References: 
 
 
  
 
Message-ID: 

Antoine Pitrou <antoine at python.org> writes:

> 
> You mean for use with SSL_set_app_data?

Yes, if you look in ``_servername_callback``, you can see where it uses 
``SSL_get_app_data`` and then reads ``ssl->Socket``, which is supposed to be 
the same object that's returned by ``context.wrap_socket()``. 

Alex



From ncoghlan at gmail.com  Thu Jul 24 00:06:26 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 24 Jul 2014 08:06:26 +1000
Subject: [Python-Dev] [PEP466] SSLSockets, and sockets,
	_socketobjects oh my!
In-Reply-To: 
References: 
 
 
 
 
Message-ID: 

On 24 Jul 2014 05:37, "Alex Gaynor"  wrote:
>
> Possible solutions are:
>
> * Pass the SSLObject *in addition* to the _socket.socket object to the C
>   code. This generates some additional divergence from the Python3 code,
>   but is probably basically straightforward.
> * Try to refactor the socket code in the same way as Python3 did, so we
>   can pass *only* the SSLObject here. This is some nasty scope creep for
>   PEP466, but would make the overall _ssl.c diff smaller.
> * Some super sweet and simple thing I haven't thought of yet.
>
> Thoughts?

Wearing my "risk management" hat, option 1 sounds significantly more
appealing than option 2 :)

Cheers,
Nick.

From ethan at stoneleaf.us  Thu Jul 24 02:34:13 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 23 Jul 2014 17:34:13 -0700
Subject: [Python-Dev] PEP 471 "scandir" accepted
In-Reply-To: 
References: 
Message-ID: <53D05485.8050406@stoneleaf.us>

On 07/21/2014 03:26 PM, Victor Stinner wrote:
>
> The PEP is accepted.

Thanks, Victor!

Congratulations, Ben!

--
~Ethan~

From phil at riverbankcomputing.com  Thu Jul 24 18:55:15 2014
From: phil at riverbankcomputing.com (Phil Thompson)
Date: Thu, 24 Jul 2014 17:55:15 +0100
Subject: [Python-Dev] Does Zip Importer have to be Special?
Message-ID: 

I have an importer for use in applications that embed an interpreter 
that does a similar job to the Zip importer (except that the storage is 
a C data structure rather than a .zip file). Just like the Zip importer 
I need to import my importer and add it to sys.path_hooks. However the 
earliest opportunity I have to do this is after the Py_Initialize() call 
returns - but this is too late because some parts of the standard 
library have already needed to be imported.
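
For readers unfamiliar with the mechanism, the registration step described
above can be sketched in pure Python roughly as follows. This is only an
illustration under stated assumptions: the in-memory dictionary stands in for
the C data structure, and ``EMBEDDED_SOURCES``, ``MARKER``, ``EmbeddedFinder``
and ``embedded_hook`` are hypothetical names, not part of any real importer.

```python
import importlib.util
import sys

# Hypothetical in-memory "archive": module name -> source text.
EMBEDDED_SOURCES = {"hello_embedded": "GREETING = 'hi from memory'\n"}
MARKER = "<embedded>"  # hypothetical sys.path entry claimed by the hook


class EmbeddedFinder:
    """Acts as both path entry finder and loader for the toy archive."""

    def find_spec(self, fullname, target=None):
        if fullname in EMBEDDED_SOURCES:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        exec(EMBEDDED_SOURCES[module.__name__], module.__dict__)


def embedded_hook(path_entry):
    # A path hook declines entries it does not handle by raising ImportError.
    if path_entry != MARKER:
        raise ImportError(path_entry)
    return EmbeddedFinder()


# This is the step that can only run once the interpreter is up.
sys.path_hooks.insert(0, embedded_hook)
sys.path.insert(0, MARKER)
sys.path_importer_cache.pop(MARKER, None)

import hello_embedded

print(hello_embedded.GREETING)
```

Anything the interpreter imports before these last lines run cannot come from
the custom importer, which is the ordering problem at issue.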

My current workaround is to include a modified version of _bootstrap.py 
as a frozen module that has the necessary steps added to the end of its 
_install() function.

The Zip importer doesn't have this problem because it gets special 
treatment - the call to its equivalent code is hard-coded and happens 
exactly when needed.

What would help is a table of functions that were called where 
_PyImportZip_Init() is currently called. By default the only entry in 
the table would be _PyImportZip_Init. There would be a way of modifying 
the table, either like how PyImport_FrozenModules is handled or how 
Inittab is handled.
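
A toy Python model of that table idea, purely to show the shape of it (the
real thing would be a C array; every name below is invented for the
illustration and the functions only return strings):

```python
def zip_importer_init():
    # Stand-in for _PyImportZip_Init; it only reports that it ran.
    return "zipimport hook installed"


# By default the table holds a single entry.
importer_init_table = [zip_importer_init]


def run_importer_inits(table):
    # Would run at the point where _PyImportZip_Init() is called today.
    return [init() for init in table]


def embedded_importer_init():
    # Hypothetical entry added by an embedding application.
    return "embedded hook installed"


# The application extends the table before the interpreter is initialised.
importer_init_table.append(embedded_importer_init)
results = run_importer_inits(importer_init_table)
print(results)
```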

...or if there is a better solution that I have missed that doesn't 
require a modified _bootstrap.py.

Thanks,
Phil

From brett at python.org  Thu Jul 24 19:48:59 2014
From: brett at python.org (Brett Cannon)
Date: Thu, 24 Jul 2014 17:48:59 +0000
Subject: [Python-Dev] Does Zip Importer have to be Special?
References: 
Message-ID: 

On Thu Jul 24 2014 at 1:07:12 PM, Phil Thompson 
wrote:

> I have an importer for use in applications that embed an interpreter
> that does a similar job to the Zip importer (except that the storage is
> a C data structure rather than a .zip file). Just like the Zip importer
> I need to import my importer and add it to sys.path_hooks. However the
> earliest opportunity I have to do this is after the Py_Initialize() call
> returns - but this is too late because some parts of the standard
> library have already needed to be imported.
>
> My current workaround is to include a modified version of _bootstrap.py
> as a frozen module that has the necessary steps added to the end of its
> _install() function.
>
> The Zip importer doesn't have this problem because it gets special
> treatment - the call to its equivalent code is hard-coded and happens
> exactly when needed.
>
> What would help is a table of functions that were called where
> _PyImportZip_Init() is currently called. By default the only entry in
> the table would be _PyImportZip_Init. There would be a way of modifying
> the table, either like how PyImport_FrozenModules is handled or how
> Inittab is handled.
>
> ...or if there is a better solution that I have missed that doesn't
> require a modified _bootstrap.py.
>

Basically you want a way to specify arguments into
importlib._bootstrap._install() so that sys.path_hooks and sys.meta_path
were configurable instead of hard-coded (it could also be done just past
importlib being installed, but that's a minor detail). Either way there is
technically no reason not to allow for it, just lack of motivation since
this would only come up for people who embed the interpreter AND have a
custom importer which affects loading the stdlib as well (any reason you
can't freeze the stdlib as a solution?).

We could go the route of some static array that people could modify.
Another option would be to allow for the specification of a single function
which is called just prior to importing the rest of the stdlib.

The problem with all of this is you are essentially asking for a hook that
gives your code access to the interpreter state before it is fully
initialized. Zipimport and the various bits of code that get loaded during
startup are special since they are coded to avoid touching anything that
isn't ready to be used. So if we expose something that allows access prior
to full initialization it would have to be documented as having no
guarantees of interpreter state, etc. so we are not held to some API that
makes future improvements difficult.

IOW allowing for easy patching of Python is probably the best option I can
think of. Would tweaking importlib._bootstrap._install() to accept
specified values for sys.meta_path and sys.path_hooks be enough so that you
can change the call site for those functions?
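
To make the question concrete, here is a toy model of what "accepting
specified values" could look like. It is a sketch only: ``install``, ``state``
and the placeholder strings are invented, and it does not reflect the real
function's signature or behaviour.

```python
from types import SimpleNamespace

# Placeholder names standing in for the hard-coded defaults.
DEFAULT_META_PATH = ("builtin importer", "frozen importer", "path finder")
DEFAULT_PATH_HOOKS = ("zip hook", "file hook")


def install(state, meta_path=None, path_hooks=None):
    # Use caller-supplied values when given, otherwise keep the defaults.
    state.meta_path = list(
        DEFAULT_META_PATH if meta_path is None else meta_path)
    state.path_hooks = list(
        DEFAULT_PATH_HOOKS if path_hooks is None else path_hooks)
    return state


default_state = install(SimpleNamespace())
custom_state = install(
    SimpleNamespace(),
    path_hooks=("embedded hook",) + DEFAULT_PATH_HOOKS,
)
print(custom_state.path_hooks)
```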

From phil at riverbankcomputing.com  Thu Jul 24 20:12:13 2014
From: phil at riverbankcomputing.com (Phil Thompson)
Date: Thu, 24 Jul 2014 19:12:13 +0100
Subject: [Python-Dev] Does Zip Importer have to be Special?
In-Reply-To: 
References: 
 
Message-ID: 

On 24/07/2014 6:48 pm, Brett Cannon wrote:
> On Thu Jul 24 2014 at 1:07:12 PM, Phil Thompson 
> 
> wrote:
> 
>> I have an importer for use in applications that embed an interpreter
>> that does a similar job to the Zip importer (except that the storage 
>> is
>> a C data structure rather than a .zip file). Just like the Zip 
>> importer
>> I need to import my importer and add it to sys.path_hooks. However the
>> earliest opportunity I have to do this is after the Py_Initialize() 
>> call
>> returns - but this is too late because some parts of the standard
>> library have already needed to be imported.
>> 
>> My current workaround is to include a modified version of 
>> _bootstrap.py
>> as a frozen module that has the necessary steps added to the end of 
>> its
>> _install() function.
>> 
>> The Zip importer doesn't have this problem because it gets special
>> treatment - the call to its equivalent code is hard-coded and happens
>> exactly when needed.
>> 
>> What would help is a table of functions that were called where
>> _PyImportZip_Init() is currently called. By default the only entry in
>> the table would be _PyImportZip_Init. There would be a way of 
>> modifying
>> the table, either like how PyImport_FrozenModules is handled or how
>> Inittab is handled.
>> 
>> ...or if there is a better solution that I have missed that doesn't
>> require a modified _bootstrap.py.
>> 
> 
> Basically you want a way to specify arguments into
> importlib._bootstrap._install() so that sys.path_hooks and 
> sys.meta_path
> were configurable instead of hard-coded (it could also be done just 
> past
> importlib being installed, but that's a minor detail). Either way there 
> is
> technically no reason not to allow for it, just lack of motivation 
> since
> this would only come up for people who embed the interpreter AND have a
> custom importer which affects loading the stdlib as well (any reason 
> you
> can't freeze the stdlib as a solution?).

Not really. I'd lose the compression my importer implements.

(Are there any problems with freezing packages rather than simple 
modules?)

> We could go the route of some static array that people could modify.
> Another option would be to allow for the specification of a single 
> function
> which is called just prior to importing the rest of the stdlib,
> 
> The problem with all of this is you are essentially asking for a hook 
> to
> let you have code have access to the interpreter state before it is 
> fully
> initialized. Zipimport and the various bits of code that get loaded 
> during
> startup are special since they are coded to avoid touching anything 
> that
> isn't ready to be used. So if we expose something that allows access 
> prior
> to full initialization it would have to be documented as having no
> guarantees of interpreter state, etc. so we are not held to some API 
> that
> makes future improvements difficult.
> 
> IOW allowing for easy patching of Python is probably the best option I 
> can
> think of. Would tweaking importlib._bootstrap._install() to accept
> specified values for sys.meta_path and sys.path_hooks be enough so that 
> you
> can change the call site for those functions?

My importer runs under PathFinder so it needs sys.path as well (and 
doesn't need sys.meta_path).

Phil

From brett at python.org  Thu Jul 24 20:26:21 2014
From: brett at python.org (Brett Cannon)
Date: Thu, 24 Jul 2014 18:26:21 +0000
Subject: [Python-Dev] Does Zip Importer have to be Special?
References: 
 
 
Message-ID: 

On Thu Jul 24 2014 at 2:12:20 PM, Phil Thompson 
wrote:

> On 24/07/2014 6:48 pm, Brett Cannon wrote:
> > On Thu Jul 24 2014 at 1:07:12 PM, Phil Thompson
> > 
> > wrote:
> >
> >> I have an importer for use in applications that embed an interpreter
> >> that does a similar job to the Zip importer (except that the storage
> >> is
> >> a C data structure rather than a .zip file). Just like the Zip
> >> importer
> >> I need to import my importer and add it to sys.path_hooks. However the
> >> earliest opportunity I have to do this is after the Py_Initialize()
> >> call
> >> returns - but this is too late because some parts of the standard
> >> library have already needed to be imported.
> >>
> >> My current workaround is to include a modified version of
> >> _bootstrap.py
> >> as a frozen module that has the necessary steps added to the end of
> >> its
> >> _install() function.
> >>
> >> The Zip importer doesn't have this problem because it gets special
> >> treatment - the call to its equivalent code is hard-coded and happens
> >> exactly when needed.
> >>
> >> What would help is a table of functions that were called where
> >> _PyImportZip_Init() is currently called. By default the only entry in
> >> the table would be _PyImportZip_Init. There would be a way of
> >> modifying
> >> the table, either like how PyImport_FrozenModules is handled or how
> >> Inittab is handled.
> >>
> >> ...or if there is a better solution that I have missed that doesn't
> >> require a modified _bootstrap.py.
> >>
> >
> > Basically you want a way to specify arguments into
> > importlib._bootstrap._install() so that sys.path_hooks and
> > sys.meta_path
> > were configurable instead of hard-coded (it could also be done just
> > past
> > importlib being installed, but that's a minor detail). Either way there
> > is
> > technically no reason not to allow for it, just lack of motivation
> > since
> > this would only come up for people who embed the interpreter AND have a
> > custom importer which affects loading the stdlib as well (any reason
> > you
> > can't freeze the stdlib as a solution?).
>
> Not really. I'd lose the compression my importer implements.
>
> (Are there any problems with freezing packages rather than simple
> modules?)
>

Nope, modules and packages are both supported.


>
> > We could go the route of some static array that people could modify.
> > Another option would be to allow for the specification of a single
> > function
> > which is called just prior to importing the rest of the stdlib,
> >
> > The problem with all of this is you are essentially asking for a hook
> > to
> > let you have code have access to the interpreter state before it is
> > fully
> > initialized. Zipimport and the various bits of code that get loaded
> > during
> > startup are special since they are coded to avoid touching anything
> > that
> > isn't ready to be used. So if we expose something that allows access
> > prior
> > to full initialization it would have to be documented as having no
> > guarantees of interpreter state, etc. so we are not held to some API
> > that
> > makes future improvements difficult.
> >
> > IOW allowing for easy patching of Python is probably the best option I
> > can
> > think of. Would tweaking importlib._bootstrap._install() to accept
> > specified values for sys.meta_path and sys.path_hooks be enough so that
> > you
> > can change the call site for those functions?
>
> My importer runs under PathFinder so it needs sys.path as well (and
> doesn't need sys.meta_path).
>

sys.path can be set via PYTHONPATH, etc. so that shouldn't be as much of an
issue.
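
A quick way to see the effect, and how it disappears when the environment is
ignored, is to compare a child interpreter run with and without the -E option
(the command-line counterpart of Py_IgnoreEnvironmentFlag). This is a sketch;
the temporary directory and the helper name ``path_seen`` are just for the
demonstration.

```python
import os
import subprocess
import sys
import tempfile


def path_seen(extra_dir, *interpreter_args):
    # Ask a child interpreter whether extra_dir ended up on its sys.path.
    check = "import sys; print({!r} in sys.path)".format(extra_dir)
    env = dict(os.environ, PYTHONPATH=extra_dir)
    out = subprocess.check_output(
        [sys.executable, *interpreter_args, "-c", check], env=env)
    return out.decode().strip()


with tempfile.TemporaryDirectory() as extra:
    with_env = path_seen(extra)            # PYTHONPATH is honoured
    ignoring_env = path_seen(extra, "-E")  # PYTHON* variables are ignored

print(with_env, ignoring_env)
```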

From ncoghlan at gmail.com  Thu Jul 24 22:42:39 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Jul 2014 06:42:39 +1000
Subject: [Python-Dev] Does Zip Importer have to be Special?
In-Reply-To: 
References: 
 
Message-ID: 

On 25 Jul 2014 03:51, "Brett Cannon"  wrote:

> The problem with all of this is you are essentially asking for a hook to
> let you have code have access to the interpreter state before it is fully
> initialized. Zipimport and the various bits of code that get loaded during
> startup are special since they are coded to avoid touching anything that
> isn't ready to be used. So if we expose something that allows access prior
> to full initialization it would have to be documented as having no
> guarantees of interpreter state, etc. so we are not held to some API that
> makes future improvements difficult.

Note that this is *exactly* the problem PEP 432 is designed to handle:
separating the configuration of the core interpreter from the configuration
of the operating system interfaces, so the latter can run relatively
normally (at least compared to today).

As you say, though it's a niche problem compared to something like
packaging, which is why it got bumped down my personal priority list. I
haven't even got back to the first preparatory step I identified which is
to separate out our main functions to a separate "Programs" directory so
it's easier to distinguish "embeds Python" sections of the code from the
more typical "is part of Python" and "extends Python" code.

> IOW allowing for easy patching of Python is probably the best option I
> can think of.

Yeah, that sounds reasonable - IIRC, Christian ended up going with a
similar "make it patch friendly" approach for the hashing changes, rather
than going overboard with configuration options.

Cheers,
Nick.

From phil at riverbankcomputing.com  Fri Jul 25 11:33:41 2014
From: phil at riverbankcomputing.com (Phil Thompson)
Date: Fri, 25 Jul 2014 10:33:41 +0100
Subject: [Python-Dev] Does Zip Importer have to be Special?
In-Reply-To: 
References: 
 
 
Message-ID: 

On 24/07/2014 9:42 pm, Nick Coghlan wrote:
> On 25 Jul 2014 03:51, "Brett Cannon"  wrote:
> 
>> The problem with all of this is you are essentially asking for a hook to
>> let you have code have access to the interpreter state before it is fully
>> initialized. Zipimport and the various bits of code that get loaded during
>> startup are special since they are coded to avoid touching anything that
>> isn't ready to be used. So if we expose something that allows access prior
>> to full initialization it would have to be documented as having no
>> guarantees of interpreter state, etc. so we are not held to some API that
>> makes future improvements difficult.
> 
> Note that this is *exactly* the problem PEP 432 is designed to handle:
> separating the configuration of the core interpreter from the 
> configuration
> of the operating system interfaces, so the latter can run relatively
> normally (at least compared to today).

The implementation of PEP 432 would be great.

> As you say, though it's a niche problem compared to something like
> packaging, which is why it got bumped down my personal priority list. I
> haven't even got back to the first preparatory step I identified which 
> is
> to separate out our main functions to a separate "Programs" directory 
> so
> it's easier to distinguish "embeds Python" sections of the code from 
> the
> more typical "is part of Python" and "extends Python" code.

Is there any way for somebody you don't trust :) to be able to help move 
it forward?

Phil

From phil at riverbankcomputing.com  Fri Jul 25 11:36:18 2014
From: phil at riverbankcomputing.com (Phil Thompson)
Date: Fri, 25 Jul 2014 10:36:18 +0100
Subject: [Python-Dev] Does Zip Importer have to be Special?
In-Reply-To: 
References: 
 
 
 
Message-ID: <43d9658bcad1ed2e82f89314bfdd9fcd@www.riverbankcomputing.com>

On 24/07/2014 7:26 pm, Brett Cannon wrote:
> On Thu Jul 24 2014 at 2:12:20 PM, Phil Thompson 
> 
> wrote:
> 
>> On 24/07/2014 6:48 pm, Brett Cannon wrote:
>> > IOW allowing for easy patching of Python is probably the best option I
>> > can
>> > think of. Would tweaking importlib._bootstrap._install() to accept
>> > specified values for sys.meta_path and sys.path_hooks be enough so that
>> > you
>> > can change the call site for those functions?
>> 
>> My importer runs under PathFinder so it needs sys.path as well (and
>> doesn't need sys.meta_path).
> 
> sys.path can be set via PYTHONPATH, etc. so that shouldn't be as much 
> of an
> issue.

I prefer to have Py_IgnoreEnvironmentFlag set.

Also I'm not clear at what point I would import my custom importer?

Phil

From ncoghlan at gmail.com  Fri Jul 25 14:30:54 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Jul 2014 22:30:54 +1000
Subject: [Python-Dev] Does Zip Importer have to be Special?
In-Reply-To: 
References: 
 
 
 
Message-ID: 

On 25 July 2014 19:33, Phil Thompson  wrote:
> On 24/07/2014 9:42 pm, Nick Coghlan wrote:
>> As you say, though it's a niche problem compared to something like
>> packaging, which is why it got bumped down my personal priority list. I
>> haven't even got back to the first preparatory step I identified which is
>> to separate out our main functions to a separate "Programs" directory so
>> it's easier to distinguish "embeds Python" sections of the code from the
>> more typical "is part of Python" and "extends Python" code.
>
>
> Is there any way for somebody you don't trust :) to be able to help move it
> forward?

This thread prompted me to finally commit one of the smaller pieces of
preparatory refactoring, moving the 3 applications we have that embed
the CPython runtime out to a separate directory:
http://bugs.python.org/issue18093 (that seems like a trivial change,
but I found it made a surprisingly big difference when trying to keep
the various moving parts of the initialisation sequence straight in my
head)

The other preparatory refactoring would be to split the monster
pythonrun.c file in 2, by creating a separate "lifecycle.c" file. In
my original PEP 432 branch I split it into 3 (pythonrun.c,
bootstrap.c, shutdown.c) but that's actually quite an intrusive change
- you end up having to expose a lot of otherwise static variables to the
linker so the startup and shutdown code can both see them. Splitting
in two should achieve most of the same benefits (i.e. separating the
lifecycle management of the interpreter itself from the normal runtime
operation code) without having to expose so much additional
information to the linker (and hence change the names to include the
_Py prefix).

The origin of those refactorings is the fact that attempting to merge
the default branch into my PEP 432 development branch
(https://bitbucket.org/ncoghlan/cpython_sandbox/branch/pep432_modular_bootstrap)
was generally a pain due to the merge conflicts around the structural
changes. Doing the structural refactorings *first* makes it more
feasible to work on the patch and do regular merges in from default.
Since these are areas that aren't likely to change in a maintenance
release, the risk of merge conflicts when merging forward from 3.4 to
default is low even with code moved around on default. By contrast, I
regularly hit significant problems when trying to merge from default
to the feature branch.

The existing feature branch is dated enough now (more than 18 months
since the last commit!) that I wouldn't try to use it directly.
Instead, I'd recommend starting a new clone based on the GitHub or
BitBucket mirror (according to version control system and hosting
service preference), and then use the current PEP draft and my old
feature branch as a point of reference for starting another
implementation attempt. (You may also be able to find some interested
collaborators on http://bugs.python.org/issue13533, as I suspect PEP
432 is a prerequisite to resolving their issues as well)

Cheers,
Nick.

P.S. I'm also starting to think that PEP 432 may pave the way for a
locale independent startup sequence, which would let us offer a "-X
utf8" option to tell the interpreter to ignore the OS locale settings
entirely when deciding which encodings to use for various things. That
would be a possible future enhancement rather than something to pursue
in the initial implementation, though.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From status at bugs.python.org  Fri Jul 25 18:07:56 2014
From: status at bugs.python.org (Python tracker)
Date: Fri, 25 Jul 2014 18:07:56 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20140725160756.508F2568DE@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2014-07-18 - 2014-07-25)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    4591 ( +2)
  closed 29248 (+60)
  total  33839 (+62)

Open issues with patches: 2160 


Issues opened (42)
==================

#19884: Importing readline produces erroneous output
http://bugs.python.org/issue19884  reopened by haypo

#22010: Idle: better management of Shell window output
http://bugs.python.org/issue22010  opened by terry.reedy

#22011: test_os extended attribute setxattr tests can fail with ENOSPC
http://bugs.python.org/issue22011  opened by Hibou57

#22012: struct.unpack('?', '\x02') returns (False,) on Mac OSX
http://bugs.python.org/issue22012  opened by wayedt

#22013: Add at least minimal support for thread groups
http://bugs.python.org/issue22013  opened by rhettinger

#22014: Add summary table for OS exception <-> errno mapping
http://bugs.python.org/issue22014  opened by ncoghlan

#22016: Add a new 'surrogatereplace' output only error handler
http://bugs.python.org/issue22016  opened by ncoghlan

#22018: Add a new signal.set_wakeup_socket() function
http://bugs.python.org/issue22018  opened by haypo

#22021: shutil.make_archive()  root_dir do not work
http://bugs.python.org/issue22021  opened by DemoHT

#22023: PyUnicode_FromFormat is broken on python 2
http://bugs.python.org/issue22023  opened by alex

#22024: Add to shutil the ability to wait until files are definitely d
http://bugs.python.org/issue22024  opened by zach.ware

#22025: webbrowser.get(command_line) does not support Windows-style pa
http://bugs.python.org/issue22025  opened by dan.oreilly

#22027: RFC 6531 (SMTPUTF8) support in smtplib
http://bugs.python.org/issue22027  opened by zvyn

#22028: Python 3.4.1 Installer ended prematurely (Windows msi)
http://bugs.python.org/issue22028  opened by DieInSente

#22029: argparse - CSS white-space: like control for individual text b
http://bugs.python.org/issue22029  opened by paul.j3

#22033: Subclass friendly reprs
http://bugs.python.org/issue22033  opened by serhiy.storchaka

#22034: posixpath.join() and bytearray
http://bugs.python.org/issue22034  opened by serhiy.storchaka

#22035: Fatal error in dbm.gdbm
http://bugs.python.org/issue22035  opened by serhiy.storchaka

#22038: Implement atomic operations on non-x86 platforms
http://bugs.python.org/issue22038  opened by Vitor.de.Lima

#22039: PyObject_SetAttr doesn't mention value = NULL
http://bugs.python.org/issue22039  opened by pitrou

#22041: http POST request with python 3.3 through web proxy
http://bugs.python.org/issue22041  opened by AlexMJ

#22042: signal.set_wakeup_fd(fd): set the fd to non-blocking mode
http://bugs.python.org/issue22042  opened by haypo

#22043: Use a monotonic clock to compute timeouts
http://bugs.python.org/issue22043  opened by haypo

#22044: Premature Py_DECREF while generating a TypeError in call_tzinf
http://bugs.python.org/issue22044  opened by Knio

#22045: Python make issue
http://bugs.python.org/issue22045  opened by skerr

#22046: ZipFile.read() should mention that it might throw NotImplement
http://bugs.python.org/issue22046  opened by detly

#22047: argparse improperly prints mutually exclusive options when the
http://bugs.python.org/issue22047  opened by Sam.Kerr

#22049: argparse: type= doesn't honor nargs > 1
http://bugs.python.org/issue22049  opened by Chris.Bruner

#22051: Turtledemo: stop reloading demos
http://bugs.python.org/issue22051  opened by terry.reedy

#22052: Comparison operators called in reverse order for subclasses wi
http://bugs.python.org/issue22052  opened by mark.dickinson

#22054: Add os.get_blocking() and os.set_blocking() functions
http://bugs.python.org/issue22054  opened by haypo

#22057: The doc say all globals are copied on eval(), but only __built
http://bugs.python.org/issue22057  opened by amishne

#22058: datetime.datetime() should accept a datetime.date as init para
http://bugs.python.org/issue22058  opened by facundobatista

#22059: incorrect type conversion from str to bytes in asynchat module
http://bugs.python.org/issue22059  opened by hoxily

#22060: Clean up ctypes.test, use unittest test discovery
http://bugs.python.org/issue22060  opened by zach.ware

#22062: Fix pathlib.Path.(r)glob doc glitches.
http://bugs.python.org/issue22062  opened by terry.reedy

#22063: asyncio: sock_xxx() methods of event loops should make the soc
http://bugs.python.org/issue22063  opened by haypo

#22064: Misleading message from 2to3 when skipping optional fixers
http://bugs.python.org/issue22064  opened by ncoghlan

#22065: Update turtledemo menu creation
http://bugs.python.org/issue22065  opened by terry.reedy

#22066: subprocess.communicate() does not receive full output from the
http://bugs.python.org/issue22066  opened by juj

#22067: time_test fails after strptime()
http://bugs.python.org/issue22067  opened by serhiy.storchaka

#22068: test_gc fails after test_idle
http://bugs.python.org/issue22068  opened by serhiy.storchaka



Most recent 15 issues with no replies (15)
==========================================

#22067: time_test fails after strptime()
http://bugs.python.org/issue22067

#22066: subprocess.communicate() does not receive full output from the
http://bugs.python.org/issue22066

#22064: Misleading message from 2to3 when skipping optional fixers
http://bugs.python.org/issue22064

#22060: Clean up ctypes.test, use unittest test discovery
http://bugs.python.org/issue22060

#22057: The doc say all globals are copied on eval(), but only __built
http://bugs.python.org/issue22057

#22051: Turtledemo: stop reloading demos
http://bugs.python.org/issue22051

#22046: ZipFile.read() should mention that it might throw NotImplement
http://bugs.python.org/issue22046

#22045: Python make issue
http://bugs.python.org/issue22045

#22039: PyObject_SetAttr doesn't mention value = NULL
http://bugs.python.org/issue22039

#22035: Fatal error in dbm.gdbm
http://bugs.python.org/issue22035

#22034: posixpath.join() and bytearray
http://bugs.python.org/issue22034

#22033: Subclass friendly reprs
http://bugs.python.org/issue22033

#22027: RFC 6531 (SMTPUTF8) support in smtplib
http://bugs.python.org/issue22027

#22024: Add to shutil the ability to wait until files are definitely d
http://bugs.python.org/issue22024

#22016: Add a new 'surrogatereplace' output only error handler
http://bugs.python.org/issue22016



Most recent 15 issues waiting for review (15)
=============================================

#22068: test_gc fails after test_idle
http://bugs.python.org/issue22068

#22065: Update turtledemo menu creation
http://bugs.python.org/issue22065

#22060: Clean up ctypes.test, use unittest test discovery
http://bugs.python.org/issue22060

#22054: Add os.get_blocking() and os.set_blocking() functions
http://bugs.python.org/issue22054

#22051: Turtledemo: stop reloading demos
http://bugs.python.org/issue22051

#22044: Premature Py_DECREF while generating a TypeError in call_tzinf
http://bugs.python.org/issue22044

#22043: Use a monotonic clock to compute timeouts
http://bugs.python.org/issue22043

#22042: signal.set_wakeup_fd(fd): set the fd to non-blocking mode
http://bugs.python.org/issue22042

#22041: http POST request with python 3.3 through web proxy
http://bugs.python.org/issue22041

#22038: Implement atomic operations on non-x86 platforms
http://bugs.python.org/issue22038

#22035: Fatal error in dbm.gdbm
http://bugs.python.org/issue22035

#22034: posixpath.join() and bytearray
http://bugs.python.org/issue22034

#22033: Subclass friendly reprs
http://bugs.python.org/issue22033

#22029: argparse - CSS white-space: like control for individual text b
http://bugs.python.org/issue22029

#22027: RFC 6531 (SMTPUTF8) support in smtplib
http://bugs.python.org/issue22027



Top 10 most discussed issues (10)
=================================

#22018: Add a new signal.set_wakeup_socket() function
http://bugs.python.org/issue22018  35 msgs

#22003: BytesIO copy-on-write
http://bugs.python.org/issue22003  18 msgs

#21933: Allow the user to change font sizes with the text pane of turt
http://bugs.python.org/issue21933  16 msgs

#22012: struct.unpack('?', '\x02') returns (False,) on Mac OSX
http://bugs.python.org/issue22012  10 msgs

#1602: windows console doesn't print or input Unicode
http://bugs.python.org/issue1602   9 msgs

#22041: http POST request with python 3.3 through web proxy
http://bugs.python.org/issue22041   8 msgs

#22058: datetime.datetime() should accept a datetime.date as init para
http://bugs.python.org/issue22058   8 msgs

#18643: add a fallback socketpair() implementation in test.support
http://bugs.python.org/issue18643   7 msgs

#19884: Importing readline produces erroneous output
http://bugs.python.org/issue19884   7 msgs

#22013: Add at least minimal support for thread groups
http://bugs.python.org/issue22013   7 msgs



Issues closed (60)
==================

#1049450: Solaris: EINTR exception in select/socket calls in telnetlib
http://bugs.python.org/issue1049450  closed by haypo

#4350: Remove dead code from Tkinter.py
http://bugs.python.org/issue4350  closed by serhiy.storchaka

#5718: Problem compiling ffi part of build on AIX 5.3.
http://bugs.python.org/issue5718  closed by skrah

#6167: Tkinter.Scrollbar: the activate method needs to return a value
http://bugs.python.org/issue6167  closed by serhiy.storchaka

#11266: asyncore does not handle EINTR in recv, send, connect, accept,
http://bugs.python.org/issue11266  closed by haypo

#11945: Adopt and document consistent semantics for handling NaN value
http://bugs.python.org/issue11945  closed by rhettinger

#12184: socketserver.ForkingMixin collect_children routine needs to co
http://bugs.python.org/issue12184  closed by neologix

#12801: C realpath not used by os.path.realpath
http://bugs.python.org/issue12801  closed by haypo

#15275: isinstance is called a more times that needed in ntpath
http://bugs.python.org/issue15275  closed by serhiy.storchaka

#15759: "make suspicious" doesn't display instructions in case of fail
http://bugs.python.org/issue15759  closed by serhiy.storchaka

#15982: asyncore.dispatcher does not handle windows socket error code 
http://bugs.python.org/issue15982  closed by haypo

#16133: asyncore.dispatcher.recv doesn't handle EAGAIN / EWOULDBLOCK
http://bugs.python.org/issue16133  closed by haypo

#16494: Add a method on importlib.SourceLoader for creating bytecode f
http://bugs.python.org/issue16494  closed by brett.cannon

#16547: IDLE raises an exception in tkinter after fresh file's text ha
http://bugs.python.org/issue16547  closed by serhiy.storchaka

#17210: documentation of PyUnicode_Format() states wrong argument type
http://bugs.python.org/issue17210  closed by python-dev

#17391: _cursesmodule Fails to Build on GCC 2.95 (static)
http://bugs.python.org/issue17391  closed by neologix

#17709: http://docs.python.org/2.7/objects.inv doesn't support :func:`
http://bugs.python.org/issue17709  closed by asvetlov

#18093: Move main functions to a separate Programs directory
http://bugs.python.org/issue18093  closed by ncoghlan

#18132: buttons in turtledemo disappear on small screens
http://bugs.python.org/issue18132  closed by terry.reedy

#18168: plistlib output self-sorted dictionary
http://bugs.python.org/issue18168  closed by serhiy.storchaka

#18392: Doc: PyObject_Malloc() is not documented
http://bugs.python.org/issue18392  closed by zach.ware

#18436: Add mapping of symbol to function to operator module
http://bugs.python.org/issue18436  closed by zach.ware

#19629: support.rmtree fails on symlinks under Windows
http://bugs.python.org/issue19629  closed by berker.peksag

#21035: Python's HTTP server implementations hangs after 16.343 reques
http://bugs.python.org/issue21035  closed by neologix

#21500: Make use of the "load_tests" protocol in test_importlib packag
http://bugs.python.org/issue21500  closed by zach.ware

#21566: make use of the new default socket.listen() backlog argument
http://bugs.python.org/issue21566  closed by neologix

#21597: Allow turtledemo code pane to get wider.
http://bugs.python.org/issue21597  closed by terry.reedy

#21645: asyncio: Race condition in signal handling on FreeBSD
http://bugs.python.org/issue21645  closed by haypo

#21665: 2.7.7 ttk widgets not themed
http://bugs.python.org/issue21665  closed by python-dev

#21772: platform.uname() not EINTR safe
http://bugs.python.org/issue21772  closed by neologix

#21813: Enhance doc of os.stat_result
http://bugs.python.org/issue21813  closed by haypo

#21868: Tbuffer in turtle allows negative size
http://bugs.python.org/issue21868  closed by rhettinger

#21882: turtledemo modules imported by test___all__ cause side effects
http://bugs.python.org/issue21882  closed by terry.reedy

#21888: plistlib.FMT_BINARY behavior doesn't send required dict parame
http://bugs.python.org/issue21888  closed by serhiy.storchaka

#21901: test_selectors.PollSelectorTestCase.test_above_fd_setsize repo
http://bugs.python.org/issue21901  closed by neologix

#21947: `Dis` module doesn't know how to disassemble generators
http://bugs.python.org/issue21947  closed by ncoghlan

#21976: Fix test_ssl.py to handle LibreSSL versioning appropriately
http://bugs.python.org/issue21976  closed by pitrou

#21989: Missing (optional) argument `start` and `end` in documentation
http://bugs.python.org/issue21989  closed by r.david.murray

#22002: Make full use of test discovery in test subpackages
http://bugs.python.org/issue22002  closed by python-dev

#22006: thread module documentation erroneously(?) states not all buil
http://bugs.python.org/issue22006  closed by mark.dickinson

#22007: sys.stdout.write on Python 2.7 is not EINTR safe
http://bugs.python.org/issue22007  closed by neologix

#22008: Symtable's syntax warning should contain the word "because"
http://bugs.python.org/issue22008  closed by python-dev

#22009: pdb.set_trace() crashes with UnicodeDecodeError when binary da
http://bugs.python.org/issue22009  closed by ned.deily

#22015: C signal handler doesn't save/restore errno
http://bugs.python.org/issue22015  closed by haypo

#22017: Bad reference counting in the _warnings module
http://bugs.python.org/issue22017  closed by python-dev

#22019: ntpath.join() error with Chinese character Path
http://bugs.python.org/issue22019  closed by ezio.melotti

#22020: tutorial 9.10. Generators statement error
http://bugs.python.org/issue22020  closed by ezio.melotti

#22022: test_pathlib: shutil.rmtree() sporadic failures on Windows
http://bugs.python.org/issue22022  closed by zach.ware

#22026: 2.7.8 ttk Button text display problem
http://bugs.python.org/issue22026  closed by zach.ware

#22030: Use calloc in set resizing
http://bugs.python.org/issue22030  closed by rhettinger

#22031: Hexadecimal id in reprs
http://bugs.python.org/issue22031  closed by serhiy.storchaka

#22032: Use __qualname__ together with __module__
http://bugs.python.org/issue22032  closed by serhiy.storchaka

#22036: Obsolete reference to stringobject in comment
http://bugs.python.org/issue22036  closed by python-dev

#22037: Poor grammar in asyncio TCP echo client example
http://bugs.python.org/issue22037  closed by asvetlov

#22040: Add a "force" parameter to shutil.rmtree
http://bugs.python.org/issue22040  closed by r.david.murray

#22048: Add weighted random choice to random package
http://bugs.python.org/issue22048  closed by mark.dickinson

#22050: argparse: read nargs > 1 options from file doesn't work
http://bugs.python.org/issue22050  closed by r.david.murray

#22053: turtledemo: clean up start and stop, fix warning
http://bugs.python.org/issue22053  closed by terry.reedy

#22055: Incomplete sentence in asyncio BaseEventLoop doc
http://bugs.python.org/issue22055  closed by asvetlov

#22061: Restore deleted tkinter functions with deprecaton dummies.
http://bugs.python.org/issue22061  closed by serhiy.storchaka

From khannaagrim at gmail.com  Tue Jul 29 17:11:22 2014
From: khannaagrim at gmail.com (agrim khanna)
Date: Tue, 29 Jul 2014 20:41:22 +0530
Subject: [Python-Dev] Contribute to Python.org
Message-ID: 

Respected Sir,

I am Agrim Khanna, an undergraduate student at IIIT Allahabad, India. I wanted
to contribute to python.org but didn't know how to start. I have elementary
knowledge of the Python language.

Could you please help me on the same.

Yours Sincerely,
Agrim Khanna
IIIT-Allahabad

From victor.stinner at gmail.com  Tue Jul 29 17:40:01 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 29 Jul 2014 17:40:01 +0200
Subject: [Python-Dev] Contribute to Python.org
In-Reply-To: 
References: 
Message-ID: 

Hi,

You should read the Python Developer Guide:

https://docs.python.org/devguide/

You can also join the core mentorship mailing list:

http://pythonmentors.com/

Welcome!

Victor

2014-07-29 17:11 GMT+02:00 agrim khanna :
> Respected Sir,
>
> I am Agrim Khanna, an undergraduate student at IIIT Allahabad, India. I wanted
> to contribute to python.org but didn't know how to start. I have elementary
> knowledge of the Python language.
>
> Could you please help me on the same.
>
> Yours Sincerely,
> Agrim Khanna
> IIIT-Allahabad
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com
>

From khannaagrim at gmail.com  Tue Jul 29 22:44:53 2014
From: khannaagrim at gmail.com (agrim khanna)
Date: Wed, 30 Jul 2014 02:14:53 +0530
Subject: [Python-Dev] Contribute to Python.org
Message-ID: 

Respected Sir/Madam,

I have installed the setup on my machine and have compiled and run it as
well. I was unable to figure out how to make a patch and how to find a
suitable bug for me to fix. I request you to guide me in the same.

Yours Sincerely,
Agrim Khanna
IIIT-Allahabad, India

From brett at python.org  Tue Jul 29 22:55:54 2014
From: brett at python.org (Brett Cannon)
Date: Tue, 29 Jul 2014 20:55:54 +0000
Subject: [Python-Dev] Contribute to Python.org
References: 
Message-ID: 

On Tue Jul 29 2014 at 4:52:14 PM agrim khanna  wrote:

> Respected Sir/Madam,
>
> I have installed the setup on my machine and have compiled and run it as
> well. I was unable to figure out how to make a patch and how to find a
> suitable bug for me to fix. I request you to guide me in the same.
>

How to make a patch is in the devguide which was sent to you in your last
email: https://docs.python.org/devguide/patch.html

Finding issues is also covered in the devguide, and you can also
ask for help on the core-mentorship mailing list (also in the last email
sent to you: http://pythonmentors.com/).


>
> Yours Sincerely,
> Agrim Khanna
> IIIT-Allahabad, India

From storchaka at gmail.com  Wed Jul 30 05:59:15 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 30 Jul 2014 06:59:15 +0300
Subject: [Python-Dev] cpython: Issue #22003: When initialized from a
 bytes object, io.BytesIO() now
In-Reply-To: <3hNDzH5WHWz7Ljk@mail.python.org>
References: <3hNDzH5WHWz7Ljk@mail.python.org>
Message-ID: 

30.07.14 02:45, antoine.pitrou wrote:
> http://hg.python.org/cpython/rev/79a5fbe2c78f
> changeset:   91935:79a5fbe2c78f
> parent:      91933:fbd104359ef8
> user:        Antoine Pitrou 
> date:        Tue Jul 29 19:41:11 2014 -0400
> summary:
>    Issue #22003: When initialized from a bytes object, io.BytesIO() now
> defers making a copy until it is mutated, improving performance and
> memory use on some use cases.
>
> Patch by David Wilson.

Did you compare this with issue #15381 [1]?

[1] http://bugs.python.org/issue15381


From storchaka at gmail.com  Wed Jul 30 08:11:24 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 30 Jul 2014 09:11:24 +0300
Subject: [Python-Dev] cpython: Issue #22003: When initialized from a
 bytes object, io.BytesIO() now
In-Reply-To: 
References: <3hNDzH5WHWz7Ljk@mail.python.org> 
Message-ID: 

30.07.14 06:59, Serhiy Storchaka wrote:
> 30.07.14 02:45, antoine.pitrou wrote:
>> http://hg.python.org/cpython/rev/79a5fbe2c78f
>> changeset:   91935:79a5fbe2c78f
>> parent:      91933:fbd104359ef8
>> user:        Antoine Pitrou 
>> date:        Tue Jul 29 19:41:11 2014 -0400
>> summary:
>>    Issue #22003: When initialized from a bytes object, io.BytesIO() now
>> defers making a copy until it is mutated, improving performance and
>> memory use on some use cases.
>>
>> Patch by David Wilson.
>
> Did you compare this with issue #15381 [1]?
>
> [1] http://bugs.python.org/issue15381

Using the microbenchmark from issue22003:

$ cat i.py
import io
word = b'word'
line = (word * int(79/len(word))) + b'\n'
ar = line * int((4 * 1048576) / len(line))
def readlines():
     return len(list(io.BytesIO(ar)))
print('lines: %s' % (readlines(),))
$ ./python -m timeit -s 'import i' 'i.readlines()'

Before patch: 10 loops, best of 3: 46.9 msec per loop
After issue22003 patch: 10 loops, best of 3: 36.4 msec per loop
After issue15381 patch: 10 loops, best of 3: 27.6 msec per loop
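
For reference, the relative speedups implied by these timings can be worked out directly: roughly 1.29x for the issue22003 patch and 1.70x for the issue15381 patch. The snippet below is only arithmetic on the three numbers above; the variable names are made up for illustration and it does not rerun the benchmark.

```python
# Timings reported above, in milliseconds per loop (best of 3).
timings_ms = {
    "before": 46.9,
    "issue22003": 36.4,
    "issue15381": 27.6,
}

baseline = timings_ms["before"]

# Speedup factor relative to the unpatched build.
speedups = {name: baseline / t for name, t in timings_ms.items()}

for name, t in timings_ms.items():
    saved = 100.0 * (baseline - t) / baseline
    print("%-11s %5.1f msec  %.2fx  (%.0f%% less time)"
          % (name, t, speedups[name], saved))
```

A single microbenchmark on one machine says little about other workloads, so these ratios should be read as indicative only.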



From ncoghlan at gmail.com  Wed Jul 30 13:46:15 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Jul 2014 21:46:15 +1000
Subject: [Python-Dev] Contribute to Python.org
In-Reply-To: 
References: 
 
Message-ID: 

On 30 July 2014 01:40, Victor Stinner  wrote:
> Hi,
>
> You should read the Python Developer Guide:
>
> https://docs.python.org/devguide/
>
> You can also join the core mentorship mailing list:
>
> http://pythonmentors.com/

For python.org *itself* (as in, the Django application now powering
the site), the contribution process is not yet as clear, but the code
and issue tracker are at https://github.com/python/pythondotorg and
https://mail.python.org/mailman/listinfo/pydotorg-www is the relevant
mailing list.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From antoine at python.org  Wed Jul 30 15:59:48 2014
From: antoine at python.org (Antoine Pitrou)
Date: Wed, 30 Jul 2014 09:59:48 -0400
Subject: [Python-Dev] cpython: Issue #22003: When initialized from a
 bytes object, io.BytesIO() now
In-Reply-To: 
References: <3hNDzH5WHWz7Ljk@mail.python.org> 
 
Message-ID: 


On 30/07/2014 02:11, Serhiy Storchaka wrote:
> 30.07.14 06:59, Serhiy Storchaka wrote:
>> 30.07.14 02:45, antoine.pitrou wrote:
>>> http://hg.python.org/cpython/rev/79a5fbe2c78f
>>> changeset:   91935:79a5fbe2c78f
>>> parent:      91933:fbd104359ef8
>>> user:        Antoine Pitrou 
>>> date:        Tue Jul 29 19:41:11 2014 -0400
>>> summary:
>>>    Issue #22003: When initialized from a bytes object, io.BytesIO() now
>>> defers making a copy until it is mutated, improving performance and
>>> memory use on some use cases.
>>>
>>> Patch by David Wilson.
>>
>> Did you compare this with issue #15381 [1]?

Not really, but David's patch is simple enough and does a good job of 
accelerating the read-only BytesIO case.

> $ ./python -m timeit -s 'import i' 'i.readlines()'
>
> Before patch: 10 loops, best of 3: 46.9 msec per loop
> After issue22003 patch: 10 loops, best of 3: 36.4 msec per loop
> After issue15381 patch: 10 loops, best of 3: 27.6 msec per loop

I'm surprised your patch does better here. Any idea why?

Regards

Antoine.



From dw+python-dev at python.org  Wed Jul 30 11:46:30 2014
From: dw+python-dev at python.org (dw+python-dev at python.org)
Date: Wed, 30 Jul 2014 09:46:30 +0000
Subject: [Python-Dev] cpython: Issue #22003: When initialized from a
 bytes object, io.BytesIO() now
In-Reply-To: 
References: <3hNDzH5WHWz7Ljk@mail.python.org> 
 
Message-ID: <20140730094630.GA786@k2>

Hi Serhiy,

At least conceptually, 15381 seems the better approach, but getting a
correct implementation may take more iterations than the (IMHO) simpler
change in 22003. For my tastes, the current 15381 implementation seems a
little too magical in relying on Py_REFCNT() as the sole indication that
a PyBytes can be mutated.

For the sake of haste, 22003 only addresses the specific regression
introduced in Python 3.x BytesIO, compared to 2.x StringI, where 3.x
lacked an equivalent no-copies specialization.
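
To make the idea from the commit message concrete ("defers making a copy until it is mutated"), here is a toy pure-Python sketch of a copy-on-write buffer. It is not the C code from either patch; the class `LazyCopyBuffer` and its method names are hypothetical and exist only for illustration.

```python
class LazyCopyBuffer:
    """Toy copy-on-write buffer: borrow the bytes object until the first write."""

    def __init__(self, initial=b""):
        self._shared = initial   # immutable bytes, borrowed (no copy yet)
        self._private = None     # bytearray, allocated lazily on first mutation

    def _unshare(self):
        # Make a private, mutable copy the first time a write is requested.
        if self._private is None:
            self._private = bytearray(self._shared)
            self._shared = None

    @property
    def is_shared(self):
        return self._private is None

    def getvalue(self):
        if self._private is None:
            return self._shared      # still sharing: hand back the same object
        return bytes(self._private)

    def write_at(self, pos, data):
        self._unshare()
        self._private[pos:pos + len(data)] = data
        return len(data)


payload = b"word" * 4
buf = LazyCopyBuffer(payload)
assert buf.getvalue() is payload     # read-only use: no copy was made
buf.write_at(0, b"WORD")
assert payload == b"word" * 4        # the original bytes object is untouched
print(buf.getvalue())
```

Read-only users never pay for a copy in this sketch; the cost is moved to the first mutation, which is the trade-off both issues are about.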


David

From martin at v.loewis.de  Wed Jul 30 20:03:35 2014
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 30 Jul 2014 20:03:35 +0200
Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module
In-Reply-To: 
References: 
Message-ID: <53D93377.90301@v.loewis.de>

Am 14.07.14 15:57, schrieb Tim Tisdall:
> Also, is there a method to test changes against all the different *nix
> variations?  Is Bluez the standard across the different *nix variations?

Perhaps not the answer you expected, but: Python uses autoconf for
feature testing. You can be certain that the API *will* vary across
system vendors. For example, FreeBSD apparently uses ng_hci(4):

http://www.unix.com/man-page/freebsd/4/ng_hci/

If you add features, all you need to do is make sure that Python continues
to compile when the platform feature is not present. People using the
other systems are then free to contribute support for their platforms.
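
On the Python side, the result of that build-time feature testing shows up as names that are simply present or absent in the module. A minimal sketch of how calling code could check for them at run time follows; the helper names `has_bluetooth` and `bluetooth_constants` are hypothetical, while the constants listed are the ones the socket module documents for Bluetooth.

```python
import socket


def bluetooth_constants():
    """Return the Bluetooth-related names this build of the socket module exposes."""
    wanted = ("AF_BLUETOOTH", "BTPROTO_L2CAP", "BTPROTO_RFCOMM",
              "BTPROTO_HCI", "BTPROTO_SCO")
    return [name for name in wanted if hasattr(socket, name)]


def has_bluetooth():
    # AF_BLUETOOTH is only defined when the build found platform support.
    return hasattr(socket, "AF_BLUETOOTH")


print("Bluetooth support compiled in:", has_bluetooth())
print("Available constants:", bluetooth_constants())
```

Code written this way keeps working on platforms where the feature is missing, which mirrors the requirement that Python itself must still compile there.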

Regards,
Martin


From storchaka at gmail.com  Wed Jul 30 21:48:52 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 30 Jul 2014 22:48:52 +0300
Subject: [Python-Dev] cpython: Issue #22003: When initialized from a
 bytes object, io.BytesIO() now
In-Reply-To: 
References: <3hNDzH5WHWz7Ljk@mail.python.org> 
  
Message-ID: 

30.07.14 16:59, Antoine Pitrou wrote:
>
> On 30/07/2014 02:11, Serhiy Storchaka wrote:
>> 30.07.14 06:59, Serhiy Storchaka wrote:
>>> 30.07.14 02:45, antoine.pitrou wrote:
>>>> http://hg.python.org/cpython/rev/79a5fbe2c78f
>>>> changeset:   91935:79a5fbe2c78f
>>>> parent:      91933:fbd104359ef8
>>>> user:        Antoine Pitrou 
>>>> date:        Tue Jul 29 19:41:11 2014 -0400
>>>> summary:
>>>>    Issue #22003: When initialized from a bytes object, io.BytesIO() now
>>>> defers making a copy until it is mutated, improving performance and
>>>> memory use on some use cases.
>>>>
>>>> Patch by David Wilson.
>>>
>>> Did you compare this with issue #15381 [1]?
>
> Not really, but David's patch is simple enough and does a good job of
> accelerating the read-only BytesIO case.

Ignoring tests and comments, my patch adds/removes/modifies about 200
lines, and David's patch about 150 lines of code. But its __sizeof__
does not look correct, and correcting it requires changing about 50 lines.
In sum, the complexity of both patches is about equal.

>> $ ./python -m timeit -s 'import i' 'i.readlines()'
>>
>> Before patch: 10 loops, best of 3: 46.9 msec per loop
>> After issue22003 patch: 10 loops, best of 3: 36.4 msec per loop
>> After issue15381 patch: 10 loops, best of 3: 27.6 msec per loop
>
> I'm surprised your patch does better here. Any idea why?

I haven't looked at David's patch too closely yet. But my patch includes
an optimization for end-of-line scanning.



From zachary.ware+pydev at gmail.com  Wed Jul 30 22:11:51 2014
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Wed, 30 Jul 2014 15:11:51 -0500
Subject: [Python-Dev] [Python-checkins] cpython: Issue #22003: When
 initialized from a bytes object, io.BytesIO() now
In-Reply-To: <3hNDzH5WHWz7Ljk@mail.python.org>
References: <3hNDzH5WHWz7Ljk@mail.python.org>
Message-ID: 

I'd like to point out a couple of compiler warnings on Windows:

On Tue, Jul 29, 2014 at 6:45 PM, antoine.pitrou
 wrote:
> diff --git a/Modules/_io/bytesio.c b/Modules/_io/bytesio.c
> --- a/Modules/_io/bytesio.c
> +++ b/Modules/_io/bytesio.c
> @@ -33,6 +37,45 @@
>          return NULL; \
>      }
>
> +/* Ensure we have a buffer suitable for writing, in the case that an initvalue
> + * object was provided, and we're currently borrowing its buffer. `size'
> + * indicates the new buffer size allocated as part of unsharing, to avoid a
> + * redundant reallocation caused by any subsequent mutation. `truncate'
> + * indicates whether truncation should occur if `size` < self->string_size.
> + *
> + * Do nothing if the buffer wasn't shared. Returns 0 on success, or sets an
> + * exception and returns -1 on failure. Existing state is preserved on failure.
> + */
> +static int
> +unshare(bytesio *self, size_t preferred_size, int truncate)
> +{
> +    if (self->initvalue) {
> +        Py_ssize_t copy_size;
> +        char *new_buf;
> +
> +        if((! truncate) && preferred_size < self->string_size) {

..\Modules\_io\bytesio.c(56): warning C4018: '<' : signed/unsigned mismatch

> +            preferred_size = self->string_size;
> +        }
> +
> +        new_buf = (char *)PyMem_Malloc(preferred_size);
> +        if (new_buf == NULL) {
> +            PyErr_NoMemory();
> +            return -1;
> +        }
> +
> +        copy_size = self->string_size;
> +        if (copy_size > preferred_size) {

..\Modules\_io\bytesio.c(67): warning C4018: '>' : signed/unsigned mismatch

> +            copy_size = preferred_size;
> +        }
> +
> +        memcpy(new_buf, self->buf, copy_size);
> +        Py_CLEAR(self->initvalue);
> +        self->buf = new_buf;
> +        self->buf_size = preferred_size;
> +        self->string_size = (Py_ssize_t) copy_size;
> +    }
> +    return 0;
> +}
>
>  /* Internal routine to get a line from the buffer of a BytesIO
>     object. Returns the length between the current position to the

-- 
Zach

From antoine at python.org  Wed Jul 30 23:23:25 2014
From: antoine at python.org (Antoine Pitrou)
Date: Wed, 30 Jul 2014 17:23:25 -0400
Subject: [Python-Dev] cpython: Issue #22003: When initialized from a
 bytes object, io.BytesIO() now
In-Reply-To: 
References: <3hNDzH5WHWz7Ljk@mail.python.org> 
  
 
Message-ID: 

On 30/07/2014 15:48, Serhiy Storchaka wrote:
>
> Ignoring tests and comments, my patch adds/removes/modifies about 200
> lines, and David's patch about 150 lines of code. But its __sizeof__
> does not look correct, and correcting it requires changing about 50 lines.
> In sum, the complexity of both patches is about equal.

I meant that David's approach is conceptually simpler, which makes it 
easier to review.
Regardless, there is no exclusive-OR here: if you can improve over the
current version, there's no reason not to consider it.

> I haven't looked at David's patch too closely yet. But my patch includes
> an optimization for end-of-line scanning.

Ahah, unrelated stuff :-)




From storchaka at gmail.com  Thu Jul 31 16:09:41 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 31 Jul 2014 17:09:41 +0300
Subject: [Python-Dev] cpython: Issue #22003: When initialized from a
 bytes object, io.BytesIO() now
In-Reply-To: 
References: <3hNDzH5WHWz7Ljk@mail.python.org> 
  
  
Message-ID: 

31.07.14 00:23, Antoine Pitrou wrote:
> On 30/07/2014 15:48, Serhiy Storchaka wrote:
> I meant that David's approach is conceptually simpler, which makes it
> easier to review.
> Regardless, there is no exclusive-OR here: if you can improve over the
> current version, there's no reason not to consider it.

Unfortunately the implementations have nothing in common.
Conceptually, David arrived in his last patch at the same idea as in
issue15381, but with a different and less general implementation. To apply
my patch you first need to roll back the issue22003 changes (except tests).