From timothy.c.delaney at gmail.com Tue Jul 1 00:07:23 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Tue, 1 Jul 2014 08:07:23 +1000
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To:
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>
Message-ID:

On 1 July 2014 03:05, Ben Hoyt wrote:

> > So, here's my alternative proposal: add an "ensure_lstat" flag to
> > scandir() itself, and don't have *any* methods on DirEntry, only
> > attributes.
> ...
> > Most importantly, *regardless of platform*, the cached stat result (if
> > not None) would reflect the state of the entry at the time the
> > directory was scanned, rather than at some arbitrary later point in
> > time when lstat() was first called on the DirEntry object.

I'm torn between whether I'd prefer the stat fields to be populated on Windows if ensure_lstat=False or not. There are good arguments each way, but overall I'm inclining towards having it consistent with POSIX - don't populate them unless ensure_lstat=True.

+0 for stat fields to be None on all platforms unless ensure_lstat=True.

> Yeah, I quite like this. It does make the caching more explicit and
> consistent. It's slightly annoying that it's less like pathlib.Path
> now, but DirEntry was never pathlib.Path anyway, so maybe it doesn't
> matter. The differences in naming may highlight the difference in
> caching, so maybe it's a good thing.

See my comments below on .fullname.

> Two further questions from me:
>
> 1) How does error handling work? Now os.stat() will/may be called
> during iteration, so in __next__. But it's hard to catch errors because
> you don't call __next__ explicitly. Is this a problem? How do other
> iterators that make system calls or raise errors handle this?

I think it just needs to be documented that iterating may throw the same exceptions as os.lstat().
It's a little trickier if you don't want the scope of your exception to be too broad, but you can always wrap the iteration in a generator to catch and handle the exceptions you care about, and allow the rest to propagate.

    import os

    def scandir_accessible(path='.'):
        # Yield entries, skipping any that raise PermissionError while
        # iterating; every other exception still propagates.
        gen = os.scandir(path)
        while True:
            try:
                entry = next(gen)
            except StopIteration:
                # Iterator exhausted. Return explicitly: a StopIteration
                # escaping a generator becomes RuntimeError (PEP 479).
                return
            except PermissionError:
                continue
            yield entry

> 2) There's still the open question in the PEP of whether to include a
> way to access the full path. This is cheap to build, it has to be
> built anyway on POSIX systems, and it's quite useful for further
> operations on the file. I think the best way to handle this is a
> .fullname or .full_name attribute as suggested elsewhere. Thoughts?

+1 for .fullname. The earlier suggestion to have __str__ return the name is killed, I think, by the fact that .fullname could be bytes. It would be nice if pathlib.Path objects were enhanced to take a DirEntry and use the .fullname automatically, but you could always call Path(direntry.fullname).

Tim Delaney

From ethan at stoneleaf.us Tue Jul 1 00:38:45 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 30 Jun 2014 15:38:45 -0700
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To:
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>
Message-ID: <53B1E6F5.2040905@stoneleaf.us>

On 06/30/2014 03:07 PM, Tim Delaney wrote:
> On 1 July 2014 03:05, Ben Hoyt wrote:
>>
>> So, here's my alternative proposal: add an "ensure_lstat" flag to
>> scandir() itself, and don't have *any* methods on DirEntry, only
>> attributes.
>> ...
>> Most importantly, *regardless of platform*, the cached stat result (if
>> not None) would reflect the state of the entry at the time the
>> directory was scanned, rather than at some arbitrary later point in
>> time when lstat() was first called on the DirEntry object.
>
> I'm torn between whether I'd prefer the stat fields to be populated
> on Windows if ensure_lstat=False or not. There are good arguments each
> way, but overall I'm inclining towards having it consistent with POSIX
> - don't populate them unless ensure_lstat=True.
>
> +0 for stat fields to be None on all platforms unless ensure_lstat=True.

If a Windows user just needs the free info, why should s/he have to pay the price of a full stat call? I see no reason to hold the Windows side back and not take advantage of what it has available. There are plenty of posix calls that Windows is not able to use, after all.

--
~Ethan~

From timothy.c.delaney at gmail.com Tue Jul 1 01:15:59 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Tue, 1 Jul 2014 09:15:59 +1000
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To: <53B1E6F5.2040905@stoneleaf.us>
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> <53B1E6F5.2040905@stoneleaf.us>
Message-ID:

On 1 July 2014 08:38, Ethan Furman wrote:

> On 06/30/2014 03:07 PM, Tim Delaney wrote:
>
>> I'm torn between whether I'd prefer the stat fields to be populated
>> on Windows if ensure_lstat=False or not. There are good arguments each
>> way, but overall I'm inclining towards having it consistent with POSIX
>> - don't populate them unless ensure_lstat=True.
>>
>> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>>
>
> If a Windows user just needs the free info, why should s/he have to pay
> the price of a full stat call? I see no reason to hold the Windows side
> back and not take advantage of what it has available. There are plenty of
> posix calls that Windows is not able to use, after all.
>

On Windows, ensure_lstat would be either a NOP (if the fields are always populated), or it would simply determine whether the fields get populated. No extra stat call. On POSIX, it's the difference between making an extra stat call or not.
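A minimal sketch of the ensure_lstat semantics being debated, emulated in pure Python on top of os.listdir() and os.lstat(). The names Entry, scandir_snapshot, fullname and lstat_result are illustrative assumptions modelled on the proposal, not an existing API. It follows Tim's +0 variant (stat fields are None on every platform unless ensure_lstat=True); because it always goes through os.lstat(), it does not reuse the data Windows already returns while listing the directory.

```python
import os
from typing import Iterator, NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical stand-in for the proposed attribute-only DirEntry."""
    name: str
    fullname: str
    lstat_result: Optional[os.stat_result]


def scandir_snapshot(path: str = '.', ensure_lstat: bool = False) -> Iterator[Entry]:
    """Emulate the proposed ensure_lstat flag (hypothetical API).

    lstat_result is None for every entry unless ensure_lstat=True, in
    which case os.lstat() runs during iteration, right after listing.
    """
    for name in os.listdir(path):
        fullname = os.path.join(path, name)
        st = os.lstat(fullname) if ensure_lstat else None
        yield Entry(name, fullname, st)
```

With ensure_lstat=True the os.lstat() call happens inside the iteration, so an OSError for one entry (for example FileNotFoundError if it was deleted after being listed) is raised from next() rather than from an explicit call in the caller's code.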
Tim Delaney

From jeanpierreda at gmail.com Tue Jul 1 01:25:49 2014
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Mon, 30 Jun 2014 16:25:49 -0700
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To:
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>
Message-ID:

On Mon, Jun 30, 2014 at 3:07 PM, Tim Delaney wrote:
> On 1 July 2014 03:05, Ben Hoyt wrote:
>>
>> > So, here's my alternative proposal: add an "ensure_lstat" flag to
>> > scandir() itself, and don't have *any* methods on DirEntry, only
>> > attributes.
>> ...
>>
>> > Most importantly, *regardless of platform*, the cached stat result (if
>> > not None) would reflect the state of the entry at the time the
>> > directory was scanned, rather than at some arbitrary later point in
>> > time when lstat() was first called on the DirEntry object.
>
>
> I'm torn between whether I'd prefer the stat fields to be populated on
> Windows if ensure_lstat=False or not. There are good arguments each way, but
> overall I'm inclining towards having it consistent with POSIX - don't
> populate them unless ensure_lstat=True.
>
> +0 for stat fields to be None on all platforms unless ensure_lstat=True.

This won't work well if lstat info is only needed for some entries. Is that a common use-case? It was mentioned earlier in the thread.
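For the case Devin describes, where lstat information is needed for only some entries, one option is a per-entry cache that calls os.lstat() at most once and only when asked, which is roughly what Eric's get-or-call-stat method later in the thread amounts to. A minimal sketch; LazyEntry, lazy_scandir and the lstat_calls counter are hypothetical names used for illustration:

```python
import os
from typing import Iterator, Optional


class LazyEntry:
    """Hypothetical directory entry whose lstat() result is fetched lazily.

    os.lstat() runs on the first call only; later calls return the cached
    result, which therefore reflects the file at the time of that first call.
    """

    def __init__(self, directory: str, name: str) -> None:
        self.name = name
        self.fullname = os.path.join(directory, name)
        self._lstat: Optional[os.stat_result] = None
        self.lstat_calls = 0  # number of real os.lstat() calls, for illustration

    def lstat(self) -> os.stat_result:
        if self._lstat is None:
            self.lstat_calls += 1
            self._lstat = os.lstat(self.fullname)
        return self._lstat


def lazy_scandir(path: str = '.') -> Iterator[LazyEntry]:
    # Listing the directory makes no per-entry stat calls at all.
    for name in os.listdir(path):
        yield LazyEntry(path, name)
```

A filter such as e.name.endswith('.log') and e.lstat().st_size > 1000000 then pays for a stat call only on the .log entries, because "and" short-circuits. The trade-off is the one Ben raises: the cached result reflects the file at the time of the first lstat() call, not the time of the directory scan.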
--
Devin

From ethan at stoneleaf.us Tue Jul 1 01:45:18 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 30 Jun 2014 16:45:18 -0700
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To:
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> <53B1E6F5.2040905@stoneleaf.us>
Message-ID: <53B1F68E.5000908@stoneleaf.us>

On 06/30/2014 04:15 PM, Tim Delaney wrote:
> On 1 July 2014 08:38, Ethan Furman wrote:
>> On 06/30/2014 03:07 PM, Tim Delaney wrote:
>>>
>>> I'm torn between whether I'd prefer the stat fields to be populated
>>> on Windows if ensure_lstat=False or not. There are good arguments each
>>> way, but overall I'm inclining towards having it consistent with POSIX
>>> - don't populate them unless ensure_lstat=True.
>>>
>>> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>>
>> If a Windows user just needs the free info, why should s/he have to pay
>> the price of a full stat call? I see no reason to hold the Windows side
>> back and not take advantage of what it has available. There are plenty
>> of posix calls that Windows is not able to use, after all.
>
> On Windows ensure_lstat would either be either a NOP (if the fields are
> always populated), or it simply determines if the fields get populated.
> No extra stat call.

I suppose the exact behavior is still under discussion, as there are only two or three fields one gets "for free" on Windows (I think...), whereas an os.stat call would get everything available for the platform.

> On POSIX it's the difference between an extra stat call or not.

Agreed on this part.

Still, no reason to slow down the Windows side by throwing away info unnecessarily -- that's why this PEP exists, after all.
--
~Ethan~

From benhoyt at gmail.com Tue Jul 1 03:28:00 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Mon, 30 Jun 2014 21:28:00 -0400
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To: <53B1F68E.5000908@stoneleaf.us>
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> <53B1E6F5.2040905@stoneleaf.us> <53B1F68E.5000908@stoneleaf.us>
Message-ID:

> I suppose the exact behavior is still under discussion, as there are only
> two or three fields one gets "for free" on Windows (I think...), where as an
> os.stat call would get everything available for the platform.

No, Windows is nice enough to give you all the same stat_result fields during scandir (via FindFirstFile/FindNextFile) as a regular os.stat().

-Ben

From v+python at g.nevcal.com Tue Jul 1 04:04:43 2014
From: v+python at g.nevcal.com (Glenn Linderman)
Date: Mon, 30 Jun 2014 19:04:43 -0700
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To:
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando>
Message-ID: <53B2173B.1010709@g.nevcal.com>

On 6/30/2014 4:25 PM, Devin Jeanpierre wrote:
> On Mon, Jun 30, 2014 at 3:07 PM, Tim Delaney
> wrote:
>> On 1 July 2014 03:05, Ben Hoyt wrote:
>>>> So, here's my alternative proposal: add an "ensure_lstat" flag to
>>>> scandir() itself, and don't have *any* methods on DirEntry, only
>>>> attributes.
>>> ...
>>>
>>>> Most importantly, *regardless of platform*, the cached stat result (if
>>>> not None) would reflect the state of the entry at the time the
>>>> directory was scanned, rather than at some arbitrary later point in
>>>> time when lstat() was first called on the DirEntry object.
>>
>> I'm torn between whether I'd prefer the stat fields to be populated on
>> Windows if ensure_lstat=False or not.
>> There are good arguments each way, but
>> overall I'm inclining towards having it consistent with POSIX - don't
>> populate them unless ensure_lstat=True.
>>
>> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
> This won't work well if lstat info is only needed for some entries. Is
> that a common use-case? It was mentioned earlier in the thread.

If it is, use ensure_lstat=False, and use the proposed (by me) .refresh() API to update the data for those that need it.

From jeanpierreda at gmail.com Tue Jul 1 04:17:00 2014
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Mon, 30 Jun 2014 19:17:00 -0700
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To: <53B2173B.1010709@g.nevcal.com>
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> <53B2173B.1010709@g.nevcal.com>
Message-ID:

The proposal I was replying to was that:

- There is no .refresh()
- ensure_lstat=False means no OS has populated attributes
- ensure_lstat=True means every OS has populated attributes

Even if we add a .refresh(), the latter two items mean that you can't avoid doing extra work (either too much on Windows, or too much on Linux) if you want only a subset of the files' lstat info.

--
Devin

P.S. Your mail client's quoting breaks my mail client (Gmail)'s quoting.

On Mon, Jun 30, 2014 at 7:04 PM, Glenn Linderman wrote:
> On 6/30/2014 4:25 PM, Devin Jeanpierre wrote:
>
> On Mon, Jun 30, 2014 at 3:07 PM, Tim Delaney
> wrote:
>
> On 1 July 2014 03:05, Ben Hoyt wrote:
>
> So, here's my alternative proposal: add an "ensure_lstat" flag to
> scandir() itself, and don't have *any* methods on DirEntry, only
> attributes.
>
> ...
>
> Most importantly, *regardless of platform*, the cached stat result (if
> not None) would reflect the state of the entry at the time the
> directory was scanned, rather than at some arbitrary later point in
> time when lstat() was first called on the DirEntry object.
>
> I'm torn between whether I'd prefer the stat fields to be populated on
> Windows if ensure_lstat=False or not. There are good arguments each way, but
> overall I'm inclining towards having it consistent with POSIX - don't
> populate them unless ensure_lstat=True.
>
> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>
> This won't work well if lstat info is only needed for some entries. Is
> that a common use-case? It was mentioned earlier in the thread.
>
>
> If it is, use ensure_lstat=False, and use the proposed (by me) .refresh()
> API to update the data for those that need it.
>

From ncoghlan at gmail.com Tue Jul 1 04:17:44 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 1 Jul 2014 12:17:44 +1000
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To: <53B2173B.1010709@g.nevcal.com>
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> <53B2173B.1010709@g.nevcal.com>
Message-ID:

On 30 Jun 2014 19:13, "Glenn Linderman" wrote:
>
>
> If it is, use ensure_lstat=False, and use the proposed (by me) .refresh() API to update the data for those that need it.

I'm -1 on a refresh API for DirEntry - just use pathlib in that case.

Cheers,
Nick.
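Nick's alternative works because pathlib.Path.stat() does not cache anything: every call asks the file system again. A minimal sketch of getting fresh data for an entry by name; fresh_size is a hypothetical helper:

```python
from pathlib import Path


def fresh_size(directory: str, name: str) -> int:
    # Path.stat() calls os.stat() every time, so the result always reflects
    # the file's current state rather than its state at directory-scan time.
    return Path(directory, name).stat().st_size
```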
From ethan at stoneleaf.us Tue Jul 1 03:44:57 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 30 Jun 2014 18:44:57 -0700
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To:
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> <53B1E6F5.2040905@stoneleaf.us> <53B1F68E.5000908@stoneleaf.us>
Message-ID: <53B21299.1020006@stoneleaf.us>

On 06/30/2014 06:28 PM, Ben Hoyt wrote:
>> I suppose the exact behavior is still under discussion, as there are only
>> two or three fields one gets "for free" on Windows (I think...), where as an
>> os.stat call would get everything available for the platform.
>
> No, Windows is nice enough to give you all the same stat_result fields
> during scandir (via FindFirstFile/FindNextFile) as a regular
> os.stat().

Very nice. Even less reason then to throw it away. :)

--
~Ethan~

From eric at trueblade.com Tue Jul 1 04:59:33 2014
From: eric at trueblade.com (Eric V. Smith)
Date: Mon, 30 Jun 2014 22:59:33 -0400
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To:
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> <53B2173B.1010709@g.nevcal.com>
Message-ID: <53B22415.1080801@trueblade.com>

On 6/30/2014 10:17 PM, Nick Coghlan wrote:
>
> On 30 Jun 2014 19:13, "Glenn Linderman"
> wrote:
>>
>>
>> If it is, use ensure_lstat=False, and use the proposed (by me)
> .refresh() API to update the data for those that need it.
>
> I'm -1 on a refresh API for DirEntry - just use pathlib in that case.
I'm not sure refresh() is the best name, but I think a "get_stat_info_from_direntry_or_call_stat()" (hah!) makes sense. If you really need the stat info, then you can write simple code like:

    for entry in os.scandir(path):
        mtime = entry.get_stat_info_from_direntry_or_call_stat().st_mtime

And it won't call stat() any more times than needed: once per file on POSIX, zero times per file on Windows.

Without an API like this, you'll need a check in the application code on whether or not to call stat().

Eric.

From tjreedy at udel.edu Tue Jul 1 06:35:24 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 01 Jul 2014 00:35:24 -0400
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To: <53B21299.1020006@stoneleaf.us>
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> <53B1E6F5.2040905@stoneleaf.us> <53B1F68E.5000908@stoneleaf.us> <53B21299.1020006@stoneleaf.us>
Message-ID:

On 6/30/2014 9:44 PM, Ethan Furman wrote:
> On 06/30/2014 06:28 PM, Ben Hoyt wrote:
>>> I suppose the exact behavior is still under discussion, as there are
>>> only
>>> two or three fields one gets "for free" on Windows (I think...),
>>> where as an
>>> os.stat call would get everything available for the platform.
>>
>> No, Windows is nice enough to give you all the same stat_result fields
>> during scandir (via FindFirstFile/FindNextFile) as a regular
>> os.stat().
>
> Very nice. Even less reason then to throw it away. :)

I agree.
--
Terry Jan Reedy

From victor.stinner at gmail.com Tue Jul 1 08:55:12 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 1 Jul 2014 08:55:12 +0200
Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator
In-Reply-To: <53B2173B.1010709@g.nevcal.com>
References: <53AD4B13.8070100@sotecware.net> <20140629105235.GM13014@ando> <53B2173B.1010709@g.nevcal.com>
Message-ID:

2014-07-01 4:04 GMT+02:00 Glenn Linderman :
>> +0 for stat fields to be None on all platforms unless ensure_lstat=True.
>
> This won't work well if lstat info is only needed for some entries. Is
> that a common use-case? It was mentioned earlier in the thread.
>
> If it is, use ensure_lstat=False, and use the proposed (by me) .refresh()
> API to update the data for those that need it.

We should make DirEntry as simple as possible.

In Python, the classic behaviour is to not define an attribute if it's not available on a platform. For example, stat().st_file_attributes is only available on Windows.

I don't like the idea of the ensure_lstat parameter because os.scandir would have to make two system calls, and it makes it harder to tell which syscall failed (readdir or lstat).

If you need lstat on UNIX, write:

    if hasattr(entry, 'lstat_result'):
        size = entry.lstat_result.st_size
    else:
        size = os.lstat(entry.fullname()).st_size

Victor

From victor.stinner at gmail.com Tue Jul 1 09:44:02 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 1 Jul 2014 09:44:02 +0200
Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
Message-ID:

Hi,

IMO we must decide whether or not scandir() must support file descriptors. It's an important decision which has an important impact on the API.

To support scandir(fd), the minimum is to store dir_fd in DirEntry: dir_fd would be None for scandir(str).

scandir(fd) must not close the file descriptor; that should be done by the caller.
Handling the lifetime of the file descriptor is a difficult problem; it's better to let the user decide how to handle it.

There is the problem of the limit of open file descriptors, usually 1024 but it can be lower. It *can* be an issue for very deep file hierarchies.

If we choose to support scandir(fd), it's probably safer not to use scandir(fd) by default in os.walk() (use scandir(str) instead), and to wait until the feature is well tested, corner cases are well known, etc.

The second step is to enhance pathlib.Path to support an optional file descriptor. Path already has methods on filenames like chmod(), exists(), rename(), etc.

Example:

    fd = os.open(path, os.O_DIRECTORY)
    try:
        for entry in os.scandir(fd):
            # ... use entry to benefit from the entry cache: is_dir(), lstat_result ...
            path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
            # ... use path which uses dir_fd ...
    finally:
        os.close(fd)

Problem: if the path object is stored somewhere and used after the loop, Path methods will fail because dir_fd was closed. It's even worse if a new directory uses the same file descriptor :-/ (security issue, or at least tricky bugs!)

Victor

From victor.stinner at gmail.com Tue Jul 1 09:48:49 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 1 Jul 2014 09:48:49 +0200
Subject: [Python-Dev] My summary of the scandir (PEP 471)
Message-ID:

Hi,

@Ben: it's time to update your PEP to complete it with this discussion! IMO DirEntry must be as simple as possible and portable:

- os.scandir(str)
- DirEntry.lstat_result object only available on Windows, same result as os.lstat()
- DirEntry.fullname(): os.path.join(directory, DirEntry.name), where directory would be a hidden attribute of DirEntry

Notes:

- DirEntry.lstat_result is better than DirEntry.lstat() because it makes it explicit that lstat_result is only computed once. When I call DirEntry.lstat(), I expect to get the current status of the file, not the cached one.
It's also hard to explain (document) that DirEntry.lstat() may or may not make a system call. Don't do that; use DirEntry.lstat_result.

- I don't think that we should support scandir(bytes). If you really want to support os.scandir(bytes), it must raise an error on Windows since bytes filenames are already deprecated. It wouldn't make sense to add a new function with a deprecated feature. Since we have PEP 383 (surrogateescape), it's better to advise using Unicode on all platforms. Almost all Python functions are able to encode Unicode filenames back automatically. Use os.fsencode() to encode manually if needed.

- We might choose not to define a DirEntry.fullname() method, since the directory name is usually well known. OK, but every time I use os.listdir(), I write os.path.join(directory, name) because in some cases I want the full path. Example:

      interesting = []
      for name in os.listdir(path):
          fullpath = os.path.join(path, name)
          if os.path.isdir(fullpath):
              continue
          if ... test on the file ...:
              # I need the full path here, not the relative path
              # (ex: my own recursive "scandir"/"walk" function)
              interesting.append(fullpath)

- It must not be possible to "refresh" a DirEntry object. Call os.stat(entry.fullname()) or pathlib.Path(entry.fullname()) to get fresh data. DirEntry is only computed once, that's all. It's well defined.

- No Windows wildcard: you wrote that the feature has many corner cases, and it's only available on Windows. It's easy to combine scandir with fnmatch.

Victor

From benhoyt at gmail.com Tue Jul 1 14:26:15 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 1 Jul 2014 08:26:15 -0400
Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
In-Reply-To:
References:
Message-ID:

Thanks, Victor.

I don't have any experience with dir_fd handling, so unfortunately can't really comment here.

What advantages does it bring?
I notice that even os.listdir() on Python 3.4 doesn't have anything related to file descriptors, so I'd be in favour of not including support. We can always add it later.

-Ben

On Tue, Jul 1, 2014 at 3:44 AM, Victor Stinner wrote:
> Hi,
>
> IMO we must decide if scandir() must support or not file descriptor.
> It's an important decision which has an important impact on the API.
>
>
> To support scandir(fd), the minimum is to store dir_fd in DirEntry:
> dir_fd would be None for scandir(str).
>
>
> scandir(fd) must not close the file descriptor, it should be done by
> the caller. Handling the lifetime of the file descriptor is a
> difficult problem, it's better to let the user decide how to handle
> it.
>
> There is the problem of the limit of open file descriptors, usually
> 1024 but it can be lower. It *can* be an issue for very deep file
> hierarchy.
>
> If we choose to support scandir(fd), it's probably safer to not use
> scandir(fd) by default in os.walk() (use scandir(str) instead), wait
> until the feature is well tested, corner cases are well known, etc.
>
>
> The second step is to enhance pathlib.Path to support an optional file
> descriptor. Path already has methods on filenames like chmod(),
> exists(), rename(), etc.
>
>
> Example:
>
> fd = os.open(path, os.O_DIRECTORY)
> try:
>     for entry in os.scandir(fd):
>         # ... use entry to benefit of entry cache: is_dir(), lstat_result ...
>         path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
>         # ... use path which uses dir_fd ...
> finally:
>     os.close(fd)
>
> Problem: if the path object is stored somewhere and use after the
> loop, Path methods will fail because dir_fd was closed. It's even
> worse if a new directory uses the same file descriptor :-/ (security
> issue, or at least tricky bugs!)
>
> Victor

From victor.stinner at gmail.com Tue Jul 1 15:01:26 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 1 Jul 2014 15:01:26 +0200
Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
In-Reply-To:
References:
Message-ID:

2014-07-01 14:26 GMT+02:00 Ben Hoyt :
> Thanks, Victor.
>
> I don't have any experience with dir_fd handling, so unfortunately
> can't really comment here.
>
> What advantages does it bring? I notice that even os.listdir() on
> Python 3.4 doesn't have anything related to file descriptors, so I'd
> be in favour of not including support.

See https://docs.python.org/dev/library/os.html#dir-fd

The idea is to make sure that you get files from the same directory. Problems occur when a directory is moved or a symlink is modified. Example:

- you're browsing /tmp/copy as root (!); /tmp/copy/passwd is owned by the www user (website)
- you would like to remove the file "passwd": call unlink("/tmp/copy/passwd")
- ... but just before that, an attacker replaces the /tmp/copy directory with a symlink to /etc
- you will remove /etc/passwd instead of /tmp/copy/passwd, oh oh

Using unlink("passwd", dir_fd=tmp_copy_fd), you don't have this issue. You are sure that you are working in the /tmp/copy directory.

You can imagine a lot of other scenarios to overwrite files and read sensitive files.

Fortunately, the Linux rm command knows the unlinkat() syscall ;-)

    haypo at selma$ mkdir -p a/b/c
    haypo at selma$ strace -e unlinkat rm -rf a
    unlinkat(5, "c", AT_REMOVEDIR) = 0
    unlinkat(4, "b", AT_REMOVEDIR) = 0
    unlinkat(AT_FDCWD, "a", AT_REMOVEDIR) = 0
    +++ exited with 0 +++

We should implement a similar thing in shutil.rmtree().

See also os.fwalk(), which is a version of os.walk() providing dir_fd.
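The same protection is already reachable from Python on POSIX systems: os.unlink() accepts dir_fd where unlinkat() is available (see os.supports_dir_fd), and os.listdir() has accepted an open directory descriptor since Python 3.3 (see os.supports_fd). A minimal sketch of an fd-relative cleanup in the spirit of the rm trace above; remove_suffix_in_dir is a hypothetical helper, and it assumes every name ending in the suffix is a regular file:

```python
import os


def remove_suffix_in_dir(directory: str, suffix: str) -> list:
    """Delete files ending in `suffix`, resolving every name relative to one
    open directory descriptor (POSIX-only sketch of the unlinkat() idea)."""
    if os.unlink not in os.supports_dir_fd or os.listdir not in os.supports_fd:
        raise NotImplementedError("fd-relative calls are unavailable here")
    fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    removed = []
    try:
        for name in os.listdir(fd):           # list the directory we opened
            if name.endswith(suffix):
                os.unlink(name, dir_fd=fd)    # unlinkat(fd, name, 0)
                removed.append(name)
    finally:
        os.close(fd)                          # the caller-side close Victor describes
    return sorted(removed)
```

Because the directory is opened once and every later call is relative to that descriptor, swapping the directory path for a symlink after os.open() cannot redirect the deletions. os.fwalk() applies the same idea to a whole tree.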
> We can always add it later.

I would prefer to discuss that right now. My proposition is to accept an int for scandir() and copy the int into DirEntry.dir_fd. It's not that complex :-)

The enhancement of the pathlib module can be done later. By the way, I know that Antoine Pitrou wanted to implement file descriptors in pathlib, but the feature was rejected or at least delayed.

Victor

From benhoyt at gmail.com Tue Jul 1 15:00:32 2014
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 1 Jul 2014 09:00:32 -0400
Subject: [Python-Dev] My summary of the scandir (PEP 471)
In-Reply-To:
References:
Message-ID:

Thanks for spinning this off to (hopefully) finish the discussion. I agree it's nearly time to update the PEP.

> @Ben: it's time to update your PEP to complete it with this
> discussion! IMO DirEntry must be as simple as possible and portable:
>
> - os.scandir(str)
> - DirEntry.lstat_result object only available on Windows, same result
> than os.lstat()
> - DirEntry.fullname(): os.path.join(directory, DirEntry.name), where
> directory would be an hidden attribute of DirEntry

I'm quite strongly against this, and I think it's actually the worst of both worlds. It is not as good an API because:

(a) it doesn't call stat for you (on POSIX), so you have to check an attribute and call scandir manually if you need it, turning what should be one line of code into four.
Your proposal above was kind of how I had it originally, where you had to do extra tests and call scandir manually if you needed it (see https://mail.python.org/pipermail/python-dev/2013-May/126119.html)

(b) the .lstat_result attribute is available on Windows but not on POSIX, meaning it's very easy for Windows developers to write code that will run and work fine on Windows, but then break horribly on POSIX; I think it'd be better if it broke hard on Windows to make writing cross-platform code easy.

The two alternatives are:

1) the original proposal in the current version of PEP 471, where DirEntry has an .lstat() method which calls stat() on POSIX but is free on Windows

2) Nick Coghlan's proposal on the previous thread (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) suggesting an ensure_lstat keyword param to scandir if you need the lstat_result value

I would make one small tweak to Nick Coghlan's proposal to make writing cross-platform code easier. Instead of .lstat_result being None sometimes (on POSIX), have it None always unless you specify ensure_lstat=True. (Actually, call it get_lstat=True to kind of make this more obvious.) Per (b) above, this means Windows developers wouldn't accidentally write code which failed on POSIX systems -- it'd fail fast on Windows too if you accessed .lstat_result without specifying get_lstat=True.

I'm still unsure which of these I like better. I think #1's API is slightly nicer without the ensure_lstat parameter, and error handling of the stat() is more explicit. But #2 always fetches the stat info at the same time as the dir entry info, so eliminates the problem of having the file info change between scandir iteration and the .lstat() call.

I'm leaning towards preferring #2 (Nick's proposal) because it solves or gets around the caching issue. My one concern is error handling. Is it an issue if scandir's __next__ can raise an OSError either from the readdir() call or the call to stat()?
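For comparison, a minimal sketch of the other style, in which the caller makes the lstat() call itself and can wrap each call in its own try/except, so readdir errors and per-entry stat errors stay separate. os.listdir() stands in for the proposed iterator, and sizes_by_name is a hypothetical helper:

```python
import os


def sizes_by_name(path: str = '.') -> dict:
    """Map entry names to lstat() sizes, skipping entries that disappear
    between the directory listing and the lstat() call."""
    sizes = {}
    for name in os.listdir(path):  # an OSError here comes from reading the directory
        try:
            st = os.lstat(os.path.join(path, name))  # per-entry system call
        except FileNotFoundError:
            continue  # deleted after it was listed: a genuine race
        sizes[name] = st.st_size
    return sizes
```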
My thinking is probably not. In practice, would it ever really happen that readdir() would succeed but an os.stat() immediately after would fail? I guess it could if the file is deleted, but then if it were deleted a microsecond earlier the readdir() would fail anyway, or not? Or does readdir give you a consistent, "snap-shotted" view on things?

The one other thing I'm not quite sure about with Nick's proposal is the name .lstat_result, as it's long. I can see why he suggested that, as .lstat sounds like a verb, but maybe that's okay? If we can have .is_dir and .is_file as attributes, my thinking is an .lstat attribute is fine too. I don't feel too strongly though.

> - I don't think that we should support scandir(bytes). If you really
> want to support os.scandir(bytes), it must raise an error on Windows
> since bytes filename are already deprecated. It wouldn't make sense to
> add new function with a deprecated feature. Since we have the PEP 383
> (surrogateescape), it's better to advice to use Unicode on all
> platforms. Almost all Python functions are able to encode back Unicode
> filename automatically. Use os.fsencode() to encode manually if needd.

Really, are bytes filenames deprecated? I think maybe they should be, as they don't work on Windows :-), but the latest Python "os" docs (https://docs.python.org/3.5/library/os.html) still say that all functions that accept path names accept either str or bytes, and return a value of the same type where necessary. So I think scandir() should do the same thing.

> - We may not define a DirEntry.fullname() method: the directory name
> is usually well known. Ok, but every time that I use os.listdir(), I
> write os.path.join(directory, name) because in some cases I want the
> full path.

Agreed. I use this a lot too. However, I'd prefer a .fullname attribute rather than a method, as it's free/cheap to compute and doesn't require OS calls.
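Ben's point that a full name needs no system call is easy to see in a sketch where fullname is a read-only property computed with os.path.join(). Since os.listdir() returns names of the same type as its argument, the result is str for a str path and bytes for a bytes path, matching the str-or-bytes rule he cites from the os docs. NamedEntry and named_entries are hypothetical names:

```python
import os


class NamedEntry:
    """Hypothetical entry exposing .fullname as an attribute-style property.

    Computing it is a single os.path.join(); no system call is made.
    """

    __slots__ = ('_directory', 'name')

    def __init__(self, directory, name):
        self._directory = directory
        self.name = name

    @property
    def fullname(self):
        return os.path.join(self._directory, self.name)


def named_entries(path):
    # os.listdir() returns names of the same type as `path` (str or bytes),
    # so fullname is str for str paths and bytes for bytes paths.
    for name in os.listdir(path):
        yield NamedEntry(path, name)
```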
Out of interest, why do we have .is_dir and .stat_result but .fullname rather than .full_name? .fullname seems reasonable to me, but maybe consistency is a good thing here? > - It must not be possible to "refresh" a DirEntry object. Call > os.stat(entry.fullname()) or pathlib.Path(entry.fullname()) to get > fresh data. DirEntry is only computed once, that's all. It's well > defined. I agree refresh() is not needed -- just use os.stat() or pathlib. > - No Windows wildcard, you wrote that the feature has many corner > cases, and it's only available on Windows. It's easy to combine > scandir with fnmatch. Agreed. -Ben From victor.stinner at gmail.com Tue Jul 1 16:28:10 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 1 Jul 2014 16:28:10 +0200 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: Message-ID: 2014-07-01 15:00 GMT+02:00 Ben Hoyt : > (a) it doesn't call stat for you (on POSIX), so you have to check an > attribute and call scandir manually if you need it, Yes, and that's something common when you use the os module. For example, don't try to call os.fork(), os.getgid() or os.fchmod() on Windows :-) Closer to your PEP, the following OS attributes are only available on UNIX: st_blocks, st_blksize, st_rdev, st_flags; and st_file_attributes is only available on Windows. I don't think that using lstat_result is a common need when browsing a directory. In most cases, you only need is_dir() and the name attribute. > 1) the original proposal in the current version of PEP 471, where > DirEntry has an .lstat() method which calls stat() on POSIX but is > free on Windows On UNIX, does it mean that .lstat() calls os.lstat() at the first call, and then always returns the same result? It would be different from os.lstat() and pathlib.Path.stat() :-( I would prefer to have the same behaviour as pathlib and os (you know, the well known consistency of Python stdlib).
As I wrote, I expect a function call to always retrieve the new status. > 2) Nick Coghlan's proposal on the previous thread > (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) > suggesting an ensure_lstat keyword param to scandir if you need the > lstat_result value I don't like this idea because it makes error handling more complex. The syntax to catch exceptions on an iterator is verbose (while: try: next() except ...). Whereas calling os.lstat(entry.fullname()) is explicit and it's easy to surround it with try/except. > .lstat_result being None sometimes (on POSIX), Don't do that, it's not how Python handles portability. We use hasattr(). > would it ever really happen that readdir() would succeed but an os.stat() immediately after would fail? Yes, it can happen. The filesystem is system-wide and shared by all users. The file can be deleted. > Really, are bytes filenames deprecated? Yes, in all functions of the os module since Python 3.3. I'm sure because I implemented the deprecation :-) Try open(b'test.txt', 'w') on Windows with python -Werror. > I think maybe they should be, as they don't work on Windows :-) Windows has an API dedicated to bytes filenames, the ANSI API. But this API has annoying bugs: it replaces unencodable characters with question marks, and there is no option to be notified of the encoding error. Different users complained about that. It was decided to not change Python since Python is a light wrapper over the kernel system calls. But bytes filenames are now deprecated to advise users to use the native type for filenames on Windows: Unicode! > but the latest Python "os" docs > (https://docs.python.org/3.5/library/os.html) still say that all > functions that accept path names accept either str or bytes, Maybe I forgot to update the documentation :-( > So I think scandir() should do the same thing. You may support scandir(bytes) on Windows but you will need to emit a deprecation warning too (deprecation warnings are silent by default).
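Victor's preferred pattern is an explicit, fresh os.lstat() call per entry, each wrapped in its own try/except. It might look like the minimal sketch below. This is illustrative, not code from the thread. It assumes the os.scandir API as later shipped (Python 3.6+ for the `with` form), with `entry.path` standing in for the proposed `fullname`. `entry_sizes` is a hypothetical name.

```python
import os

def entry_sizes(directory):
    """Map each entry name to its lstat() size, skipping entries that vanish."""
    sizes = {}
    with os.scandir(directory) as it:
        for entry in it:
            try:
                # A separate, explicit lstat() per entry: always fresh data,
                # and the error handling is scoped to this one call.
                st = os.lstat(entry.path)
            except FileNotFoundError:
                # The entry was deleted between readdir() and lstat().
                continue
            sizes[entry.name] = st.st_size
    return sizes
```

The trade-off is the one Victor describes. The try/except is easy to scope, but every entry costs an extra system call, even on Windows where the data was already returned by the directory scan.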
Victor From j.wielicki at sotecware.net Tue Jul 1 16:59:13 2014 From: j.wielicki at sotecware.net (Jonas Wielicki) Date: Tue, 01 Jul 2014 16:59:13 +0200 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: Message-ID: <53B2CCC1.3000409@sotecware.net> On 01.07.2014 15:00, Ben Hoyt wrote: > I'm leaning towards preferring #2 (Nick's proposal) because it solves > or gets around the caching issue. My one concern is error handling. Is > it an issue if scandir's __next__ can raise an OSError either from the > readdir() call or the call to stat()? My thinking is probably not. In > practice, would it ever really happen that readdir() would succeed but > an os.stat() immediately after would fail? I guess it could if the > file is deleted, but then if it were deleted a microsecond earlier the > readdir() would fail anyway, or not? Or does readdir give you a > consistent, "snap-shotted" view on things? No need for a microsecond-timed deletion -- a directory with +r but without +x will allow you to list the entries, but stat calls on the files will fail with EPERM: $ ls -l drwxr--r--. 2 root root 60 1. Jul 16:52 test $ sudo ls -l test total 0 -rw-r--r--. 1 root root 0 1. Jul 16:52 foo $ ls test ls: cannot access test/foo: Permission denied total 0 -????????? ? ? ? ? ? foo $ stat test/foo stat: cannot stat ‘test/foo’: Permission denied I had the idea to treat a failing lstat() inside scandir() as if the entry wasn't found at all, but in this context, this seems wrong too. regards, jwi From techtonik at gmail.com Tue Jul 1 07:16:52 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Tue, 1 Jul 2014 08:16:52 +0300 Subject: [Python-Dev] Excess help() output Message-ID: Hi, The help() output is confusing for beginners: >>> class B(object): ... pass ...
>>> help(B) Help on class B in module __main__: class B(__builtin__.object) | Data descriptors defined here: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined) Is it possible to remove this section from help output? Why is it here at all? >>> dir(B) ['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__'] -- anatoly t. From benhoyt at gmail.com Tue Jul 1 17:30:37 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 1 Jul 2014 11:30:37 -0400 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: <53B2CCC1.3000409@sotecware.net> References: <53B2CCC1.3000409@sotecware.net> Message-ID: > No need for a microsecond-timed deletion -- a directory with +r but > without +x will allow you to list the entries, but stat calls on the > files will fail with EPERM: Ah -- very good to know, thanks. This definitely points me in the direction of wanting better control over error handling. Speaking of errors, and thinking of handling errors during iteration -- in what cases (if any) would an individual readdir fail if the opendir succeeded? 
-Ben From ncoghlan at gmail.com Tue Jul 1 17:33:06 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Jul 2014 01:33:06 +1000 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: <53B2CCC1.3000409@sotecware.net> References: <53B2CCC1.3000409@sotecware.net> Message-ID: On 1 Jul 2014 07:31, "Victor Stinner" wrote: > > 2014-07-01 15:00 GMT+02:00 Ben Hoyt : > > 2) Nick Coghlan's proposal on the previous thread > > (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) > > suggesting an ensure_lstat keyword param to scandir if you need the > > lstat_result value > > I don't like this idea because it makes error handling more complex. > The syntax to catch exceptions on an iterator is verbose (while: try: > next() except ...). Actually, we may need to copy the os.walk API and accept an "onerror" callback as a scandir argument. Regardless of whether or not we have "ensure_lstat", the iteration step could fail, so I don't believe we can just transfer the existing approach of catching exceptions from the listdir call. > Whereas calling os.lstat(entry.fullname()) is explicit and it's easy > to surround it with try/except. > > > > .lstat_result being None sometimes (on POSIX), > > Don't do that, it's not how Python handles portability. We use hasattr(). That's not true in general - we do either, depending on context. With the addition of an os.walk style onerror callback, I'm still in favour of a "get_lstat" flag (tweaked as Ben suggests to always be None unless requested, so Windows code is less likely to be inadvertently non-portable) > > would it ever really happen that readdir() would succeed but an os.stat() immediately after would fail? > > Yes, it can happen. The filesystem is system-wide and shared by all > users. The file can be deleted. 
We need per-iteration error handling for the readdir call anyway, so I think an onerror callback is a better option than dropping the ability to easily obtain full stat information as part of the iteration. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From benhoyt at gmail.com Tue Jul 1 17:42:25 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 1 Jul 2014 11:42:25 -0400 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: <53B2CCC1.3000409@sotecware.net> Message-ID: > We need per-iteration error handling for the readdir call anyway, so I think > an onerror callback is a better option than dropping the ability to easily > obtain full stat information as part of the iteration. I don't mind the idea of an "onerror" callback, but it's adding complexity. Putting aside the question of caching/timing for a second and assuming .lstat() as per the current PEP 471, do we really need per-iteration error handling for readdir()? When would that actually fail in practice? -Ben From ethan at stoneleaf.us Tue Jul 1 17:34:20 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 01 Jul 2014 08:34:20 -0700 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: <53B2CCC1.3000409@sotecware.net> References: <53B2CCC1.3000409@sotecware.net> Message-ID: <53B2D4FC.2090601@stoneleaf.us> On 07/01/2014 07:59 AM, Jonas Wielicki wrote: > > I had the idea to treat a failing lstat() inside scandir() as if the > entry wasn?t found at all, but in this context, this seems wrong too. Well, os.walk supports passing in an error handler -- perhaps scandir should as well. 
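To make the onerror idea concrete, here is a minimal sketch of an os.walk-style `onerror` hook layered on top of os.scandir. It assumes os.scandir as later shipped, where the iterator is also a context manager from Python 3.6. The `scandir_onerror` name and its stop-on-first-error behaviour are assumptions for illustration. Neither is part of PEP 471.

```python
import os

def scandir_onerror(path=".", onerror=None):
    """Yield DirEntry objects; report OSErrors to onerror instead of raising.

    Hypothetical wrapper modelled on os.walk()'s onerror parameter. Errors
    from opening the directory and from advancing the iterator are passed to
    onerror (if given) and end the iteration; without onerror they propagate.
    """
    try:
        it = os.scandir(path)
    except OSError as exc:  # e.g. missing directory, no read permission
        if onerror is None:
            raise
        onerror(exc)
        return
    with it:  # make sure the underlying directory handle is closed
        while True:
            try:
                entry = next(it)
            except StopIteration:
                return
            except OSError as exc:  # a readdir()-level failure mid-iteration
                if onerror is None:
                    raise
                onerror(exc)
                return
            yield entry
```

Stopping after the first iteration error is a deliberately conservative choice. Whether a directory stream can usefully continue after a failed readdir() is exactly the open question in this part of the thread.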
-- ~Ethan~ From janzert at janzert.com Tue Jul 1 18:06:58 2014 From: janzert at janzert.com (Janzert) Date: Tue, 01 Jul 2014 12:06:58 -0400 Subject: [Python-Dev] PEP 471 -- os.scandir() function -- a better and faster directory iterator In-Reply-To: References: Message-ID: On 6/26/2014 6:59 PM, Ben Hoyt wrote: > Rationale > ========= > > Python's built-in ``os.walk()`` is significantly slower than it needs > to be, because -- in addition to calling ``os.listdir()`` on each > directory -- it executes the system call ``os.stat()`` or > ``GetFileAttributes()`` on each file to determine whether the entry is > a directory or not. > > But the underlying system calls -- ``FindFirstFile`` / > ``FindNextFile`` on Windows and ``readdir`` on Linux and OS X -- > already tell you whether the files returned are directories or not, so > no further system calls are needed. In short, you can reduce the > number of system calls from approximately 2N to N, where N is the > total number of files and directories in the tree. (And because > directory trees are usually much wider than they are deep, it's often > much better than this.) > One of the major reasons for this seems to be efficiently using information that is already available from the OS "for free". Unfortunately it seems that the current API and most of the leading alternate proposals hide from the user what information is actually there "free" and what is going to incur an extra cost. I would prefer an API that simply gives whatever came for free from the OS and then let the user decide if the extra expense is worth the extra information. Maybe that stat information was only going to be used for an informational log that can be skipped if it's going to incur extra expense? 
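In the API that eventually shipped (Python 3.5+), the outcome was roughly along these lines. `DirEntry.is_dir()` usually needs no extra system call, and `DirEntry.stat()` is computed at most once and then cached on the entry. That caching is also the behavioural difference from os.stat() that Victor raised earlier in the thread. The small sketch below shows a cached result next to a fresh one. `cached_vs_fresh_size` is a hypothetical helper, and the `with` form needs Python 3.6+.

```python
import os

def cached_vs_fresh_size(directory, name, extra=b"more data"):
    """Return (cached_size, fresh_size) for one file after appending to it."""
    with os.scandir(directory) as it:
        entry = next(e for e in it if e.name == name)
    # First call: on POSIX this performs the stat(); on Windows the data
    # usually comes from the directory scan itself. Either way it is cached.
    entry.stat()
    with open(entry.path, "ab") as f:
        f.write(extra)                   # grow the file after the first stat
    cached = entry.stat().st_size        # same DirEntry, cached result
    fresh = os.stat(entry.path).st_size  # new system call, current size
    return cached, fresh
```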
Janzert From 4kir4.1i at gmail.com Tue Jul 1 17:58:03 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Tue, 01 Jul 2014 19:58:03 +0400 Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None) References: Message-ID: <87d2dpm5pw.fsf@gmail.com> Ben Hoyt writes: > Thanks, Victor. > > I don't have any experience with dir_fd handling, so unfortunately > can't really comment here. > > What advantages does it bring? I notice that even os.listdir() on > Python 3.4 doesn't have anything related to file descriptors, so I'd > be in favour of not including support. We can always add it later. > > -Ben FYI, os.listdir does support file descriptors in Python 3.3+ try: >>> import os >>> os.listdir(os.open('.', os.O_RDONLY)) NOTE: os.supports_fd and os.supports_dir_fd are different sets. See also, https://mail.python.org/pipermail/python-dev/2014-June/135265.html -- Akira P.S. Please, don't put your answer on top of the message you are replying to. > > On Tue, Jul 1, 2014 at 3:44 AM, Victor Stinner wrote: >> Hi, >> >> IMO we must decide if scandir() must support or not file descriptor. >> It's an important decision which has an important impact on the API. >> >> >> To support scandir(fd), the minimum is to store dir_fd in DirEntry: >> dir_fd would be None for scandir(str). >> >> >> scandir(fd) must not close the file descriptor, it should be done by >> the caller. Handling the lifetime of the file descriptor is a >> difficult problem, it's better to let the user decide how to handle >> it. >> >> There is the problem of the limit of open file descriptors, usually >> 1024 but it can be lower. It *can* be an issue for very deep file >> hierarchy. >> >> If we choose to support scandir(fd), it's probably safer to not use >> scandir(fd) by default in os.walk() (use scandir(str) instead), wait >> until the feature is well tested, corner cases are well known, etc. >> >> >> The second step is to enhance pathlib.Path to support an optional file >> descriptor. 
>> Path already has methods on filenames like chmod(), >> exists(), rename(), etc. >> >> >> Example: >>
>> fd = os.open(path, os.O_DIRECTORY)
>> try:
>>     for entry in os.scandir(fd):
>>         # ... use entry to benefit of entry cache: is_dir(), lstat_result ...
>>         path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
>>         # ... use path which uses dir_fd ...
>> finally:
>>     os.close(fd)
>>
>> Problem: if the path object is stored somewhere and used after the >> loop, Path methods will fail because dir_fd was closed. It's even >> worse if a new directory uses the same file descriptor :-/ (security >> issue, or at least tricky bugs!) >> >> Victor >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/benhoyt%40gmail.com From ncoghlan at gmail.com Tue Jul 1 18:50:48 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 1 Jul 2014 09:50:48 -0700 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: <53B2CCC1.3000409@sotecware.net> Message-ID: On 1 July 2014 08:42, Ben Hoyt wrote: >> We need per-iteration error handling for the readdir call anyway, so I think >> an onerror callback is a better option than dropping the ability to easily >> obtain full stat information as part of the iteration. > > I don't mind the idea of an "onerror" callback, but it's adding > complexity. Putting aside the question of caching/timing for a second > and assuming .lstat() as per the current PEP 471, do we really need > per-iteration error handling for readdir()? When would that actually > fail in practice? An NFS mount dropping the connection or a USB key being removed are the first that come to mind, but I expect there are others.
I find it's generally better to just assume that any system call may fail for obscure reasons and put the infrastructure in place to deal with it rather than getting ugly, hard-to-track-down bugs later. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From alex.gaynor at gmail.com Tue Jul 1 20:26:27 2014 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Tue, 1 Jul 2014 18:26:27 +0000 (UTC) Subject: [Python-Dev] Network Security Backport Status Message-ID: Hi all, I wanted to bring everyone up to speed on the status of PEP 466, what's been completed, and what's left to do. First the completed stuff: * hmac.compare_digest * hashlib.pbkdf2_hmac are both backported, and I've added support to use them in Django, so users should start seeing these benefits just as soon as we get a Python release into their hands. Now the uncompleted stuff: * Persistent file descriptor for ``os.urandom`` * SSL module It's the SSL module that I'll spend the rest of this email talking about. Backporting the features from the Python3 version of this module has proven more difficult than I had expected. This is primarily because the stdlib took a maintenance strategy that was different from what most Python projects have done for their 2/3 support: multiple independent codebases. I've tried a few different strategies for the backport, none of which has worked: * Copying the ``ssl.py``, ``test_ssl.py``, and ``_ssl.c`` files from Python3 and trying to port all the code. * Copying just ``test_ssl.py`` and then copying individual chunks/functions as necessary to get stuff to pass. * Manually doing stuff. All of these proved to be a massive undertaking, and made it too easy to accidentally introduce breaking changes. I've come up with a new approach, which I believe is most likely to be successful, but I'll need help to implement it. The idea is to find the most recent commit which is a parent of both the ``2.7`` and ``default`` branches.
Then take every single change to an ``ssl`` related file on the ``default`` branch, and attempt to replay it on the ``2.7`` branch. Require manual review on each commit to make sure it compiles, and to ensure it doesn't make any backwards incompatible changes. I think this provides the most iterative and guided approach to getting this done. I can do all the work of reviewing each commit, but I need some help from a mercurial expert to automate the cherry-picking/rebasing of every single commit. What do folks think? Does this approach make sense? Anyone willing to help with the mercurial scripting? Cheers, Alex From ncoghlan at gmail.com Tue Jul 1 21:00:38 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 1 Jul 2014 12:00:38 -0700 Subject: [Python-Dev] Network Security Backport Status In-Reply-To: References: Message-ID: On 1 Jul 2014 11:28, "Alex Gaynor" wrote: > > I've come up with a new approach, which I believe is most likely to be > successful, but I'll need help to implement it. > > The idea is to find the most recent commit which is a parent of both the > ``2.7`` and ``default`` branches. Then take every single change to an ``ssl`` > related file on the ``default`` branch, and attempt to replay it on the ``2.7`` > branch. Require manual review on each commit to make sure it compiles, and to > ensure it doesn't make any backwards incompatible changes. > > I think this provides the most iterative and guided approach to getting this > done. Sounds promising, although it may still have some challenges if the SSL code depends on earlier changes to other code. > I can do all the work of reviewing each commit, but I need some help from a > mercurial expert to automate the cherry-picking/rebasing of every single > commit. > > What do folks think? Does this approach make sense? Anyone willing to help with > the mercurial scripting? 
For the Mercurial part, it's probably worth posing that as a Stack Overflow question: Given two named branches in http://hg.python.org (default and 2.7) and 4 files (Python module, C module, tests, docs): - find the common ancestor - find all the commits affecting those files on default & graft them to 2.7 (with a chance to test and edit each one first) It's just a better environment for asking & answering that kind of question :) Cheers, Nick. > > Cheers, > Alex > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From j.wielicki at sotecware.net Tue Jul 1 20:45:22 2014 From: j.wielicki at sotecware.net (Jonas Wielicki) Date: Tue, 01 Jul 2014 20:45:22 +0200 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: <53B2CCC1.3000409@sotecware.net> Message-ID: <53B301C2.6070206@sotecware.net> On 01.07.2014 17:30, Ben Hoyt wrote: >> No need for a microsecond-timed deletion -- a directory with +r but >> without +x will allow you to list the entries, but stat calls on the >> files will fail with EPERM: > > Ah -- very good to know, thanks. This definitely points me in the > direction of wanting better control over error handling. > > Speaking of errors, and thinking of handling errors during iteration > -- in what cases (if any) would an individual readdir fail if the > opendir succeeded? readdir(3) manpage suggests that readdir can only fail if an invalid directory fd was passed. 
regards, jwi > > -Ben > From antoine at python.org Tue Jul 1 22:54:28 2014 From: antoine at python.org (Antoine Pitrou) Date: Tue, 01 Jul 2014 16:54:28 -0400 Subject: [Python-Dev] Network Security Backport Status In-Reply-To: References: Message-ID: On 01/07/2014 14:26, Alex Gaynor wrote: > > I can do all the work of reviewing each commit, but I need some help from a > mercurial expert to automate the cherry-picking/rebasing of every single > commit. > > What do folks think? Does this approach make sense? Anyone willing to help with > the mercurial scripting? I don't think this makes much sense; Mercurial won't be smarter than you are. I think you'd have a better chance of succeeding by backporting one feature at a time. IMO, you'd first want to backport the _SSLContext base class and SSLContext.wrap_socket(). The latter *will* require some manual coding to adapt to 2.7's different SSLSocket implementation, not just applying patch hunks around. Regards Antoine. From guido at python.org Tue Jul 1 22:59:00 2014 From: guido at python.org (Guido van Rossum) Date: Tue, 1 Jul 2014 13:59:00 -0700 Subject: [Python-Dev] Network Security Backport Status In-Reply-To: References: Message-ID: I have to agree with Antoine -- I don't think there's a shortcut that avoids *someone* actually having to understand the code to the point of being able to recreate the same behavior in the different context (pun not intended) of Python 2. On Tue, Jul 1, 2014 at 1:54 PM, Antoine Pitrou wrote: > On 01/07/2014 14:26, Alex Gaynor wrote: > > >> I can do all the work of reviewing each commit, but I need some help from >> a >> mercurial expert to automate the cherry-picking/rebasing of every single >> commit. >> >> What do folks think? Does this approach make sense? Anyone willing to >> help with >> the mercurial scripting? >> > > I don't think this makes much sense; Mercurial won't be smarter than you > are.
I think you'd have a better chance of succeeding by backporting one > feature at a time. IMO, you'd first want to backport the _SSLContext base > class and SSLContext.wrap_socket(). The latter *will* require some manual > coding to adapt to 2.7's different SSLSocket implementation, not just > applying patch hunks around. > > Regards > > Antoine. > > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > guido%40python.org > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Tue Jul 1 23:20:17 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 1 Jul 2014 22:20:17 +0100 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: Message-ID: On 1 July 2014 14:00, Ben Hoyt wrote: > 2) Nick Coghlan's proposal on the previous thread > (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) > suggesting an ensure_lstat keyword param to scandir if you need the > lstat_result value > > I would make one small tweak to Nick Coghlan's proposal to make > writing cross-platform code easier. Instead of .lstat_result being > None sometimes (on POSIX), have it None always unless you specify > ensure_lstat=True. (Actually, call it get_lstat=True to kind of make > this more obvious.) Per (b) above, this means Windows developers > wouldn't accidentally write code which failed on POSIX systems -- it'd > fail fast on Windows too if you accessed .lstat_result without > specifying get_lstat=True. This is getting very complicated (at least to me, as a Windows user, where the basic idea seems straightforward). It seems to me that the right model is the standard "thin wrapper round the OS feature" that acts as a building block - it's typical of the rest of the os module. 
I think that thin wrapper is needed - even if the various bells and whistles are useful, they can be built on top of a low-level version (whereas the converse is not the case). Typically, such thin wrappers expose POSIX semantics by default, and Windows behaviour follows as closely as possible (see for example stat, where st_ino makes no sense on Windows, but is present). In this case, we're exposing Windows semantics, and POSIX is the one needing to fit the model, but the principle is the same. On that basis, optional attributes (as used in stat results) seem entirely sensible. The documentation for DirEntry could easily be written to parallel that of a stat result:

"""
The return value is an object whose attributes correspond to the data the OS returns about a directory entry:

* name - the object's name
* full_name - the object's full name (including path)
* is_dir - whether the object is a directory
* is_file - whether the object is a plain file
* is_symlink - whether the object is a symbolic link

On Windows, the following attributes are also available:

* st_size - the size, in bytes, of the object (only meaningful for files)
* st_atime - time of last access
* st_mtime - time of last write
* st_ctime - time of creation
* st_file_attributes - Windows file attribute bits (see the FILE_ATTRIBUTE_* constants in the stat module)
"""

That's no harder to understand (or to work with) than the equivalent stat result. The only difference is that the unavailable attributes can still be queried on POSIX; there's just a separate system call involved (with implications in terms of performance, error handling and potential race conditions).
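The existing convention Paul refers to can be seen with os.stat() itself. Platform-specific fields such as `st_blocks` (POSIX) or `st_file_attributes` (Windows) are simply absent on other platforms, so portable code probes for them with hasattr() or getattr(). The sketch below shows this; `optional_stat_fields` and `PLATFORM_FIELDS` are hypothetical names, while the field names themselves are real `os.stat_result` attributes.

```python
import os

# Fields that exist on os.stat_result only on some platforms.
PLATFORM_FIELDS = ("st_blocks", "st_blksize", "st_rdev", "st_flags",
                   "st_file_attributes")

def optional_stat_fields(path):
    """Return {field: value} for the platform-specific fields present here."""
    st = os.lstat(path)
    # hasattr() is the portability check: missing fields are skipped, not None.
    return {name: getattr(st, name) for name in PLATFORM_FIELDS
            if hasattr(st, name)}
```

For example, on Linux this typically reports `st_blocks`, `st_blksize` and `st_rdev`, while on Windows it reports `st_file_attributes`.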
The version of scandir with the ensure_lstat argument is easy to write based on one with optional arguments (I'm playing fast and loose with adding attributes to DirEntry values here, just for the sake of an example - the details are left as an exercise)

    def scandir_ensure(path='.', ensure_lstat=False):
        for entry in os.scandir(path):
            if ensure_lstat and not hasattr(entry, 'st_size'):
                stat_data = os.lstat(entry.full_name)
                entry.st_size = stat_data.st_size
                entry.st_atime = stat_data.st_atime
                entry.st_mtime = stat_data.st_mtime
                entry.st_ctime = stat_data.st_ctime
                # Ignore file_attributes, as we'll never get here on Windows
            yield entry

Variations on how you handle errors in the lstat call, etc., can be added to taste. Please, let's stick to a low-level wrapper round the OS API for the first iteration of this feature. Enhancements can be added later, when real-world usage has proved their value. Paul From v+python at g.nevcal.com Tue Jul 1 23:39:51 2014 From: v+python at g.nevcal.com (Glenn Linderman) Date: Tue, 01 Jul 2014 14:39:51 -0700 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: Message-ID: <53B32AA7.1050305@g.nevcal.com> On 7/1/2014 2:20 PM, Paul Moore wrote: > Please, let's stick to a low-level wrapper round the OS API for the > first iteration of this feature. Enhancements can be added later, when > real-world usage has proved their value. I almost wrote this whole message this morning, but didn't have time. Thanks, Paul, for digging through the details. +1 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ethan at stoneleaf.us Tue Jul 1 23:30:48 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 01 Jul 2014 14:30:48 -0700 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: Message-ID: <53B32888.4020604@stoneleaf.us> On 07/01/2014 02:20 PM, Paul Moore wrote: > > Please, let's stick to a low-level wrapper round the OS API for the > first iteration of this feature. Enhancements can be added later, when > real-world usage has proved their value. +1 From rosuav at gmail.com Wed Jul 2 03:13:56 2014 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Jul 2014 11:13:56 +1000 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: Message-ID: On Wed, Jul 2, 2014 at 7:20 AM, Paul Moore wrote: > I think that thin wrapper is needed - even > if the various bells and whistles are useful, they can be built on top > of a low-level version (whereas the converse is not the case). +1. Make everything as simple as possible (but no simpler). ChrisA From benjamin at python.org Wed Jul 2 07:55:14 2014 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 01 Jul 2014 22:55:14 -0700 Subject: [Python-Dev] [RELEASE] Python 2.7.8 Message-ID: <1404280514.30741.136823729.74EB0C0B@webmail.messagingengine.com> Greetings, I have the distinct privilege of informing you that the latest release of the Python 2.7 series, 2.7.8, has been released and is available for download. 2.7.8 contains several important regression fixes and security changes: - The openssl version bundled in the Windows installer has been updated. - A regression in the mimetypes module on Windows has been fixed. [1] - A possible overflow in the buffer type has been fixed. [2] - A bug in the CGIHTTPServer module which allows arbitrary execution of code in the server root has been patched. [3] - A regression in the handling of UNC paths in os.path.join has been fixed. 
[4] Downloads of 2.7.8 are at https://www.python.org/download/releases/2.7.8/ The full changelog is located at http://hg.python.org/cpython/raw-file/v2.7.8/Misc/NEWS This is a production release. As always, please report bugs to http://bugs.python.org/ Till next time, Benjamin Peterson 2.7 Release Manager (on behalf of all of Python's contributors) [1] http://bugs.python.org/issue21652 [2] http://bugs.python.org/issue21831 [3] http://bugs.python.org/issue21766 [4] http://bugs.python.org/issue21672 From ncoghlan at gmail.com Wed Jul 2 08:35:48 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 1 Jul 2014 23:35:48 -0700 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: Message-ID: On 1 July 2014 14:20, Paul Moore wrote: > On 1 July 2014 14:00, Ben Hoyt wrote: >> 2) Nick Coghlan's proposal on the previous thread >> (https://mail.python.org/pipermail/python-dev/2014-June/135261.html) >> suggesting an ensure_lstat keyword param to scandir if you need the >> lstat_result value >> >> I would make one small tweak to Nick Coghlan's proposal to make >> writing cross-platform code easier. Instead of .lstat_result being >> None sometimes (on POSIX), have it None always unless you specify >> ensure_lstat=True. (Actually, call it get_lstat=True to kind of make >> this more obvious.) Per (b) above, this means Windows developers >> wouldn't accidentally write code which failed on POSIX systems -- it'd >> fail fast on Windows too if you accessed .lstat_result without >> specifying get_lstat=True. > > This is getting very complicated (at least to me, as a Windows user, > where the basic idea seems straightforward). > > It seems to me that the right model is the standard "thin wrapper > round the OS feature" that acts as a building block - it's typical of > the rest of the os module. 
I think that thin wrapper is needed - even > if the various bells and whistles are useful, they can be built on top > of a low-level version (whereas the converse is not the case). > Typically, such thin wrappers expose POSIX semantics by default, and > Windows behaviour follows as closely as possible (see for example > stat, where st_ino makes no sense on Windows, but is present). In this > case, we're exposing Windows semantics, and POSIX is the one needing > to fit the model, but the principle is the same. > > On that basis, optional attributes (as used in stat results) seem > entirely sensible. > > The documentation for DirEntry could easily be written to parallel > that of a stat result:
>
> """
> The return value is an object whose attributes correspond to the data
> the OS returns about a directory entry:
>
> * name - the object's name
> * full_name - the object's full name (including path)
> * is_dir - whether the object is a directory
> * is_file - whether the object is a plain file
> * is_symlink - whether the object is a symbolic link
>
> On Windows, the following attributes are also available:
>
> * st_size - the size, in bytes, of the object (only meaningful for files)
> * st_atime - time of last access
> * st_mtime - time of last write
> * st_ctime - time of creation
> * st_file_attributes - Windows file attribute bits (see the
>   FILE_ATTRIBUTE_* constants in the stat module)
> """
>
> That's no harder to understand (or to work with) than the equivalent > stat result. The only difference is that the unavailable attributes > can be queried on POSIX, there's just a separate system call involved > (with implications in terms of performance, error handling and > potential race conditions).
>
> The version of scandir with the ensure_lstat argument is easy to write > based on one with optional arguments (I'm playing fast and loose with > adding attributes to DirEntry values here, just for the sake of an > example - the details are left as an exercise):
>
> def scandir_ensure(path='.', ensure_lstat=False):
>     for entry in os.scandir(path):
>         if ensure_lstat and not hasattr(entry, 'st_size'):
>             stat_data = os.lstat(entry.full_name)
>             entry.st_size = stat_data.st_size
>             entry.st_atime = stat_data.st_atime
>             entry.st_mtime = stat_data.st_mtime
>             entry.st_ctime = stat_data.st_ctime
>             # Ignore file_attributes, as we'll never get here on Windows
>         yield entry
>
> Variations on how you handle errors in the lstat call, etc, can be > added to taste. > > Please, let's stick to a low-level wrapper round the OS API for the > first iteration of this feature. Enhancements can be added later, when > real-world usage has proved their value. +1 from me - especially if this recipe goes in at least the PEP, and potentially even the docs. I'm also OK with postponing onerror support for the time being - that should be straightforward to add later if we decide we need it. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From j.wielicki at sotecware.net Wed Jul 2 12:25:47 2014 From: j.wielicki at sotecware.net (Jonas Wielicki) Date: Wed, 02 Jul 2014 12:25:47 +0200 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: Message-ID: <53B3DE2B.3050209@sotecware.net> On 01.07.2014 23:20, Paul Moore wrote: > [snip] > Please, let's stick to a low-level wrapper round the OS API for the > first iteration of this feature. Enhancements can be added later, when > real-world usage has proved their value. > > Paul +1 to the whole thing. That's an ingeniously simple solution to the issues we're having here.
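For reference, Paul's recipe above can be adapted to the os.scandir() that eventually shipped in Python 3.5, where the full path is DirEntry.path and lstat() data comes from DirEntry.stat(follow_symlinks=False). This is a minimal sketch, not the PEP's design: it assumes Python 3.6+ (for the context-manager form), yields hypothetical (entry, lstat_result) pairs instead of attaching st_* attributes to the entries, and skips entries that vanish before they can be stat'ed.

```python
import os


def scandir_ensure(path='.', ensure_lstat=False):
    """Yield (entry, lstat_result) pairs for the entries in path.

    Hypothetical helper modelled on the recipe above. lstat_result is None
    unless ensure_lstat is true; pairs are yielded instead of attaching
    st_* attributes to the DirEntry objects.
    """
    with os.scandir(path) as it:  # the context-manager form needs Python 3.6+
        for entry in it:
            lstat_result = None
            if ensure_lstat:
                try:
                    # follow_symlinks=False gives lstat() semantics. On Windows
                    # this normally needs no extra system call; on POSIX it does.
                    lstat_result = entry.stat(follow_symlinks=False)
                except FileNotFoundError:
                    # The entry was removed between readdir() and lstat().
                    continue
            yield entry, lstat_result
```

Catching only FileNotFoundError is a deliberate choice in this sketch. An entry that disappears is silently skipped, while other errors such as PermissionError still propagate to whoever is iterating.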
regards, jwi From cf.natali at gmail.com Wed Jul 2 12:51:43 2014 From: cf.natali at gmail.com (Charles-François Natali) Date: Wed, 2 Jul 2014 11:51:43 +0100 Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None) In-Reply-To: References: Message-ID: 2014-07-01 8:44 GMT+01:00 Victor Stinner : > > IMO we must decide if scandir() must support file descriptors or not. > It's an important decision which has an important impact on the API. I don't think we should support it: it's way too complicated to use, error-prone, and leads to messy APIs. From victor.stinner at gmail.com Wed Jul 2 13:59:26 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 2 Jul 2014 13:59:26 +0200 Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None) In-Reply-To: References: Message-ID: 2014-07-02 12:51 GMT+02:00 Charles-François Natali : > I don't think we should support it: it's way too complicated to use, > error-prone, and leads to messy APIs. Can you please elaborate? Which kind of issue do you see? Handling the lifetime of the directory file descriptor? You don't like the dir_fd parameter of os functions? I don't have an opinion on supporting scandir(int). I asked to discuss it in the PEP directly. Victor From benhoyt at gmail.com Wed Jul 2 14:41:28 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 2 Jul 2014 08:41:28 -0400 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: Message-ID: Thanks for the effort in your response, Paul. I'm all for KISS, but let's just slow down a bit here. > I think that thin wrapper is needed - even > if the various bells and whistles are useful, they can be built on top > of a low-level version (whereas the converse is not the case). Yes, but API design is important. For example, urllib2 has a kind of the "thin wrapper approach", but millions of people use the 3rd-party "requests" library because it's just so much nicer to use.
There are low-level functions in the "os" module, but there are also a lot of higher-level functions (os.walk) and functions that smooth over cross-platform issues (os.stat). Detailed comments below.

> The return value is an object whose attributes correspond to the data
> the OS returns about a directory entry:
>
> * name - the object's name
> * full_name - the object's full name (including path)
> * is_dir - whether the object is a directory
> * is_file - whether the object is a plain file
> * is_symlink - whether the object is a symbolic link
>
> On Windows, the following attributes are also available:
>
> * st_size - the size, in bytes, of the object (only meaningful for files)
> * st_atime - time of last access
> * st_mtime - time of last write
> * st_ctime - time of creation
> * st_file_attributes - Windows file attribute bits (see the
>   FILE_ATTRIBUTE_* constants in the stat module)

Again, this seems like a nice simple idea, but I think it's actually a worst-of-both-worlds solution -- it has a few problems:

1) It's a nasty API to actually write code with. If you try to use it, it gives off a "made only for low-level library authors" rather than "designed for developers" smell.
For example, here's a get_tree_size() function I use written in both versions (original is the PEP 471 version with the addition of .full_name):

def get_tree_size_original(path):
    """Return total size of all files in directory tree at path."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir():
            total += get_tree_size_original(entry.full_name)
        else:
            total += entry.lstat().st_size
    return total

def get_tree_size_new(path):
    """Return total size of all files in directory tree at path."""
    total = 0
    for entry in os.scandir(path):
        if hasattr(entry, 'is_dir') and hasattr(entry, 'st_size'):
            is_dir = entry.is_dir
            size = entry.st_size
        else:
            st = os.lstat(entry.full_name)
            is_dir = stat.S_ISDIR(st.st_mode)
            size = st.st_size
        if is_dir:
            total += get_tree_size_new(entry.full_name)
        else:
            total += size
    return total

I know which version I'd rather write and maintain! It seems to me new users and folks new to Python could easily write the top version, but the bottom is longer, more complicated, and harder to get right. It would also be very easy to write code in a way that works on Windows but bombs hard on POSIX.

2) It seems like your assumption is that is_dir/is_file/is_symlink are always available on POSIX via readdir. This isn't actually the case (this was discussed in the original threads) -- if readdir() returns dirent.d_type as DT_UNKNOWN, then you actually have to call os.stat() anyway to get it. So, as the above definition of get_tree_size_new() shows, you have to use getattr/hasattr on everything: is_dir/is_file/is_symlink as well as the st_* attributes.

3) It's not much different in concept to the PEP 471 version, except that PEP 471 has a built-in .lstat() method, making the user's life much easier. This is the sense in which it's the worst of both worlds -- it's a far less nice API to use, but it still has the same issues with race conditions the original does.
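As a point of comparison, get_tree_size_original() above carries over almost unchanged to the os.scandir() that eventually shipped in Python 3.5. There, .full_name became DirEntry.path and .lstat() became DirEntry.stat(follow_symlinks=False). This is a minimal sketch assuming Python 3.6+ for the context-manager form. The follow_symlinks=False choices (do not descend into symlinked directories, and count a symlink's own size) are my assumptions, not something the thread specifies.

```python
import os


def get_tree_size(path):
    """Return the total size in bytes of all non-directory entries under path."""
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            # follow_symlinks=False: never recurse through a symlink to a
            # directory, which also avoids symlink loops.
            if entry.is_dir(follow_symlinks=False):
                total += get_tree_size(entry.path)
            else:
                # lstat() semantics: a symlink is counted with its own size.
                total += entry.stat(follow_symlinks=False).st_size
    return total
```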
So thinking about this again:

First, based on the +1's to Paul's new solution, I don't think people are too concerned about the race condition issue (attributes being different between the original readdir and the os.stat calls). I think this is probably fair -- if folks care, they can handle it in an application-specific way. So that means Paul's new solution and the original PEP 471 approach are both okay on that score.

Second, comparing PEP 471 to Nick's solution: error handling is much more straight-forward and simple to document with the original PEP 471 approach (just try/catch around the function calls) than with Nick's get_lstat=True approach of doing the stat() if needed inside the iterator. To catch errors with that approach, you'd either have to do a "while True" loop and try/catch around next(it) manually (which is very yucky code), or we'd have to add an onerror callback, which is somewhat less nice to use and harder to document (signature of the callback, exception object, etc).

So given all of the above, I'm fairly strongly in favour of the approach in the original PEP 471 due to its easy-to-use API and straight-forward try/catch approach to error handling. (My second option would be Nick's get_lstat=True with the onerror callback. My third option would be Paul's attribute-only solution, as it's just very hard to use.)

Thoughts?

-Ben From p.f.moore at gmail.com Wed Jul 2 15:48:12 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 2 Jul 2014 14:48:12 +0100 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: Message-ID: tl;dr - I agree with your points and think that the original PEP 471 proposal is fine. The details here are just clarification of why my proposal wasn't just "use PEP 471 as written" in the first place... On 2 July 2014 13:41, Ben Hoyt wrote: > 1) It's a nasty API to actually write code with.
If you try to use it, > it gives off a "made only for low-level library authors" rather than > "designed for developers" smell. For example, here's a get_tree_size() > function I use written in both versions (original is the PEP 471 > version with the addition of .full_name):
>
> def get_tree_size_original(path):
>     """Return total size of all files in directory tree at path."""
>     total = 0
>     for entry in os.scandir(path):
>         if entry.is_dir():
>             total += get_tree_size_original(entry.full_name)
>         else:
>             total += entry.lstat().st_size
>     return total
>
> def get_tree_size_new(path):
>     """Return total size of all files in directory tree at path."""
>     total = 0
>     for entry in os.scandir(path):
>         if hasattr(entry, 'is_dir') and hasattr(entry, 'st_size'):
>             is_dir = entry.is_dir
>             size = entry.st_size
>         else:
>             st = os.lstat(entry.full_name)
>             is_dir = stat.S_ISDIR(st.st_mode)
>             size = st.st_size
>         if is_dir:
>             total += get_tree_size_new(entry.full_name)
>         else:
>             total += size
>     return total
>
> I know which version I'd rather write and maintain!

Fair point. But *only* because is_dir isn't guaranteed to be available. I could debate other aspects of your translation to use my API, but it's not relevant as my proposal was flawed in terms of is_XXX. > It seems to me new > users and folks new to Python could easily write the top version, but > the bottom is longer, more complicated, and harder to get right. Given the is_dir point, agreed. > It > would also be very easy to write code in a way that works on Windows > but bombs hard on POSIX. You may have a point here - my Windows bias may be showing. It's already awfully easy to write code that works on POSIX but bombs hard on Windows (deleting open files, for example) so I find it tempting to think "give them a taste of their own medicine" :-) More seriously, it seems to me that the scandir API is specifically designed to write efficient code on platforms where the OS gives information that allows you to do so.
Warping the API too much to cater for platforms where that isn't possible seems to have the priorities backwards. Making the API not be an accident waiting to happen is fine, though. And let's be careful, too. My position is that it's not too hard to write code that works on Windows, Linux and OS X but you're right you could miss the problem with platforms that don't even support a free is_dir(). It's *easier* to write Windows-only code by mistake, but the fix to cover the "big three" is pretty simple (if not hasattr, lstat). > 2) It seems like your assumption is that is_dir/is_file/is_symlink are > always available on POSIX via readdir. This isn't actually the case > (this was discussed in the original threads) -- if readdir() returns > dirent.d_type as DT_UNKNOWN, then you actually have to call os.stat() > anyway to get it. So, as the above definition of get_tree_size_new() > shows, you have to use getattr/hasattr on everything: > is_dir/is_file/is_symlink as well as the st_* attributes. Ah, the wording in the PEP says "Linux, Windows, OS X". Superficially, that said "everywhere" to me. It might be worth calling out specifically some examples where it's not available without an extra system call, just to make the point explicit. You're right, though, that blows away the simplicity of my proposal. The original PEP 471 seems precisely right to me, in that case. I was only really arguing for attributes because they seem more obviously static than a method call. And personally I don't care about that aspect. > 3) It's not much different in concept to the PEP 471 version, except > that PEP 471 has a built-in .lstat() method, making the user's life > much easier. This is the sense in which it's the worst of both worlds > -- it's a far less nice API to use, but it still has the same issues > with race conditions the original does. Agreed. 
My intent was never to remove the race conditions; I see them as the responsibility of the application to consider (many applications simply won't care, and those that do will likely want a specific solution, not a library-level compromise). > So thinking about this again: > > First, based on the +1's to Paul's new solution, I don't think people > are too concerned about the race condition issue (attributes being > different between the original readdir and the os.stat calls). I think > this is probably fair -- if folks care, they can handle it in an > application-specific way. So that means Paul's new solution and the > original PEP 471 approach are both okay on that score. +1. That was my main point, in actual fact. > Second, comparing PEP 471 to Nick's solution: error handling is much > more straight-forward and simple to document with the original PEP 471 > approach (just try/catch around the function calls) than with Nick's > get_lstat=True approach of doing the stat() if needed inside the > iterator. To catch errors with that approach, you'd either have to do > a "while True" loop and try/catch around next(it) manually (which is > very yucky code), or we'd have to add an onerror callback, which is > somewhat less nice to use and harder to document (signature of the > callback, exception object, etc). Agreed. If my solution had worked, it would have been by isolating a few extra cases where you could guarantee errors won't happen. But actually, errors *can* happen in those cases, on certain systems. So PEP 471 wins on all counts here too. > So given all of the above, I'm fairly strongly in favour of the > approach in the original PEP 471 due to its easy-to-use API and > straight-forward try/catch approach to error handling. (My second > option would be Nick's get_lstat=True with the onerror callback. My > third option would be Paul's attribute-only solution, as it's just > very hard to use.) Agreed.
The solution I proposed isn't just "very hard to use", it's actually wrong. If is_XXX are optional attributes, that's not my solution, and I agree it's *awful*. Paul. PS I'd suggest adding a "Rejected proposals" section to the PEP which mentions the race condition issue and points to this discussion as an indication that people didn't seem to see it as a problem. From benhoyt at gmail.com Wed Jul 2 16:48:50 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 2 Jul 2014 10:48:50 -0400 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: References: Message-ID: Thanks for the clarifications and support. > Ah, the wording in the PEP says "Linux, Windows, OS X". Superficially, > that said "everywhere" to me. It might be worth calling out > specifically some examples where it's not available without an extra > system call, just to make the point explicit. Good call. I'll update the wording in the PEP here and try to call out specific examples of where is_dir() could call os.stat(). Hard-core POSIX people, do you know when readdir() d_type will be DT_UNKNOWN on (for example) Linux or OS X? I suspect this can happen on certain network filesystems, but I'm not sure. > PS I'd suggest adding a "Rejected proposals" section to the PEP which > mentions the race condition issue and points to this discussion as an > indication that people didn't seem to see it as a problem. Definitely agreed. I'll add this, and clarify various other issues in the PEP, and then repost.
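Ben's point that is_dir() may have to call stat() when readdir() reports DT_UNKNOWN can be sketched at the Python level. Python does not expose d_type, so the helper below is hypothetical and takes d_type as an argument. The constant values are the ones from Linux's <dirent.h>, and other platforms may differ. The sketch only illustrates the fallback logic a C implementation of is_dir() needs.

```python
import os
import stat

# d_type values from Linux's <dirent.h>; other platforms may use other numbers.
DT_UNKNOWN = 0
DT_DIR = 4
DT_REG = 8


def is_dir_from_dirent(d_type, full_path):
    """Hypothetical helper: is this readdir() entry a directory (lstat semantics)?"""
    if d_type != DT_UNKNOWN:
        # The filesystem reported the type, so no extra system call is needed.
        return d_type == DT_DIR
    # DT_UNKNOWN: fall back to lstat(), which costs a system call and can raise
    # OSError (for example if the entry was deleted in the meantime).
    return stat.S_ISDIR(os.lstat(full_path).st_mode)
```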
-Ben From cf.natali at gmail.com Wed Jul 2 19:20:41 2014 From: cf.natali at gmail.com (Charles-François Natali) Date: Wed, 2 Jul 2014 18:20:41 +0100 Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None) In-Reply-To: References: Message-ID: > 2014-07-02 12:51 GMT+02:00 Charles-François Natali : >> I don't think we should support it: it's way too complicated to use, >> error-prone, and leads to messy APIs. > > Can you please elaborate? Which kind of issue do you see? Handling the > lifetime of the directory file descriptor? Yes, among other things. You can e.g. have a look at os.fwalk() or shutil._rmtree_safe_fd() to see that using those *properly* is far from being trivial. > You don't like the dir_fd parameter of os functions? Exactly, I think it complicates the API for little benefit (FWIW, no other language I know of exposes them). From Nikolaus at rath.org Wed Jul 2 23:59:01 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Wed, 02 Jul 2014 14:59:01 -0700 Subject: [Python-Dev] My summary of the scandir (PEP 471) In-Reply-To: (Ben Hoyt's message of "Wed, 2 Jul 2014 10:48:50 -0400") References: Message-ID: <877g3vxw0q.fsf@rath.org> Ben Hoyt writes: > Thanks for the clarifications and support. > >> Ah, the wording in the PEP says "Linux, Windows, OS X". Superficially, >> that said "everywhere" to me. It might be worth calling out >> specifically some examples where it's not available without an extra >> system call, just to make the point explicit. > > Good call. I'll update the wording in the PEP here and try to call out > specific examples of where is_dir() could call os.stat(). > > Hard-core POSIX people, do you know when readdir() d_type will be > DT_UNKNOWN on (for example) Linux or OS X? I suspect this can happen > on certain network filesystems, but I'm not sure. Any FUSE file system mounted by some other user and without -o allow_other. For these entries, stat() will fail as well.
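Nikolaus's note that stat() fails for such FUSE entries is where the PEP 471 style of error handling Ben argued for (a try/except around each method call) pays off. This is a minimal sketch using the os.scandir() that shipped in Python 3.5, assuming Python 3.6+ for the context-manager form. The function name and the choice to collect errors instead of raising them are illustrative assumptions.

```python
import os


def list_entries(path):
    """Hypothetical helper: split entries into dirs/files, collecting per-entry errors."""
    dirs, files, errors = [], [], []
    with os.scandir(path) as it:
        for entry in it:
            try:
                # May need a stat() system call (e.g. when d_type is DT_UNKNOWN),
                # so it can raise OSError such as PermissionError. Per the docs,
                # is_dir() catches FileNotFoundError itself and returns False.
                is_dir = entry.is_dir(follow_symlinks=False)
            except OSError as exc:
                errors.append((entry.name, exc))
                continue
            (dirs if is_dir else files).append(entry.name)
    return sorted(dirs), sorted(files), errors
```

Because the stat() happens inside an ordinary method call, the error can be caught right next to the entry that caused it, with no need to wrap next(it) or register an onerror callback.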
Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana." From bcannon at gmail.com Fri Jul 4 15:00:26 2014 From: bcannon at gmail.com (Brett Cannon) Date: Fri, 04 Jul 2014 13:00:26 +0000 Subject: [Python-Dev] [Python-checkins] Daily reference leaks (42917d774476): sum=9 References: Message-ID: Looks like there is an actual leak found by test_io. Any ideas on what may have introduced it? On Fri Jul 04 2014 at 5:01:02 AM, wrote: > results for 42917d774476 on branch "default" > -------------------------------------------- > > test_functools leaked [0, 0, 3] memory blocks, sum=3 > test_io leaked [2, 2, 2] references, sum=6 > > > Command line was: ['./python', '-m', 'test.regrtest', '-uall', '-R', > '3:3:/home/antoine/cpython/refleaks/reflogODkfML', '-x'] > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > https://mail.python.org/mailman/listinfo/python-checkins > From status at bugs.python.org Fri Jul 4 18:07:58 2014 From: status at bugs.python.org (Python tracker) Date: Fri, 4 Jul 2014 18:07:58 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20140704160758.440EB56A6A@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2014-06-27 - 2014-07-04) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 4603 (-40) closed 29086 (+82) total 33689 (+42) Open issues with patches: 2150 Issues opened (34) ================== #8631: subprocess.Popen.communicate(...)
hangs on Windows http://bugs.python.org/issue8631 reopened by terry.reedy #20155: Regression test test_httpservers fails, hangs on Windows http://bugs.python.org/issue20155 reopened by r.david.murray #21876: os.rename(src,dst) does nothing when src and dst files are har http://bugs.python.org/issue21876 opened by Aaron.Swan #21877: External.bat and pcbuild of tkinter do not match. http://bugs.python.org/issue21877 opened by terry.reedy #21878: wsgi.simple_server's wsgi.input read/readline waits forever in http://bugs.python.org/issue21878 opened by rschoon #21879: str.format() gives poor diagnostic on placeholder mismatch http://bugs.python.org/issue21879 opened by roysmith #21880: IDLE: Ability to run 3rd party code checkers http://bugs.python.org/issue21880 opened by sahutd #21881: python cannot parse tcl value http://bugs.python.org/issue21881 opened by schwab #21882: turtledemo modules imported by test___all__ cause side effects http://bugs.python.org/issue21882 opened by ned.deily #21883: relpath: Provide better errors when mixing bytes and strings http://bugs.python.org/issue21883 opened by Matt.Bachmann #21885: shutil.copytree hangs (on copying root directory of a lxc cont http://bugs.python.org/issue21885 opened by krichter #21886: asyncio: Future.set_result() called on cancelled Future raises http://bugs.python.org/issue21886 opened by haypo #21888: plistlib.FMT_BINARY behavior doesn't send required dict parame http://bugs.python.org/issue21888 opened by n8henrie #21889: https://docs.python.org/2/library/multiprocessing.html#process http://bugs.python.org/issue21889 opened by krichter #21890: wsgiref.simple_server sends headers on empty bytes http://bugs.python.org/issue21890 opened by rschoon #21895: signal.pause() doesn't wake up on SIGCHLD in non-main thread http://bugs.python.org/issue21895 opened by bkabrda #21896: Unexpected ConnectionResetError in urllib.request against a va http://bugs.python.org/issue21896 opened by Tymoteusz.Paul #21897: 
frame.f_locals causes segfault on Python >=3.4.1 http://bugs.python.org/issue21897 opened by msmhrt #21898: .hgignore: Missing ignores for Eclipse/pydev http://bugs.python.org/issue21898 opened by andymaier #21899: Futures are not marked as completed http://bugs.python.org/issue21899 opened by Sebastian.Kreft.Deezer #21901: test_selectors.PollSelectorTestCase.test_above_fd_setsize repo http://bugs.python.org/issue21901 opened by r.david.murray #21902: Docstring of math.acosh, asinh, and atanh http://bugs.python.org/issue21902 opened by kdavies4 #21903: ctypes documentation MessageBoxA example produces error http://bugs.python.org/issue21903 opened by Dan.O'Donovan #21905: RuntimeError in pickle.whichmodule when sys.modules if mutate http://bugs.python.org/issue21905 opened by Olivier.Grisel #21906: Tools\Scripts\md5sum.py doesn't work in Python 3.x http://bugs.python.org/issue21906 opened by torrin #21907: Update Windows build batch scripts http://bugs.python.org/issue21907 opened by zach.ware #21909: PyLong_FromString drops const http://bugs.python.org/issue21909 opened by h.venev #21910: File protocol should document if writelines must handle genera http://bugs.python.org/issue21910 opened by JanKanis #21911: "IndexError: tuple index out of range" should include the requ http://bugs.python.org/issue21911 opened by cool-RR #21913: Possible deadlock in threading.Condition.wait() in Python 2.7. 
http://bugs.python.org/issue21913 opened by sangeeth #21914: Create unit tests for Turtle guionly http://bugs.python.org/issue21914 opened by Lita.Cho #21915: telnetlib.Telnet constructor does not match telnetlib.Telnet._ http://bugs.python.org/issue21915 opened by yaneurabeya #21916: Create unit tests for turtle textonly http://bugs.python.org/issue21916 opened by ingrid #21917: Python 2.7.7 Tests fail, and math is faulty http://bugs.python.org/issue21917 opened by repcsike Most recent 15 issues with no replies (15) ========================================== #21916: Create unit tests for turtle textonly http://bugs.python.org/issue21916 #21909: PyLong_FromString drops const http://bugs.python.org/issue21909 #21899: Futures are not marked as completed http://bugs.python.org/issue21899 #21898: .hgignore: Missing ignores for Eclipse/pydev http://bugs.python.org/issue21898 #21889: https://docs.python.org/2/library/multiprocessing.html#process http://bugs.python.org/issue21889 #21885: shutil.copytree hangs (on copying root directory of a lxc cont http://bugs.python.org/issue21885 #21874: test_strptime fails on rhel/centos/fedora systems http://bugs.python.org/issue21874 #21865: Improve invalid category exception for warnings.filterwarnings http://bugs.python.org/issue21865 #21859: Add Python implementation of FileIO http://bugs.python.org/issue21859 #21854: Fix cookielib in unicodeless build http://bugs.python.org/issue21854 #21853: Fix inspect in unicodeless build http://bugs.python.org/issue21853 #21852: Fix optparse in unicodeless build http://bugs.python.org/issue21852 #21851: Fix gettext in unicodeless build http://bugs.python.org/issue21851 #21850: Fix httplib and SimpleHTTPServer in unicodeless build http://bugs.python.org/issue21850 #21847: Fix xmlrpc in unicodeless build http://bugs.python.org/issue21847 Most recent 15 issues waiting for review (15) ============================================= #21916: Create unit tests for turtle textonly 
http://bugs.python.org/issue21916
#21914: Create unit tests for Turtle guionly
http://bugs.python.org/issue21914
#21907: Update Windows build batch scripts
http://bugs.python.org/issue21907
#21906: Tools\Scripts\md5sum.py doesn't work in Python 3.x
http://bugs.python.org/issue21906
#21905: RuntimeError in pickle.whichmodule when sys.modules if mutate
http://bugs.python.org/issue21905
#21903: ctypes documentation MessageBoxA example produces error
http://bugs.python.org/issue21903
#21902: Docstring of math.acosh, asinh, and atanh
http://bugs.python.org/issue21902
#21898: .hgignore: Missing ignores for Eclipse/pydev
http://bugs.python.org/issue21898
#21897: frame.f_locals causes segfault on Python >=3.4.1
http://bugs.python.org/issue21897
#21890: wsgiref.simple_server sends headers on empty bytes
http://bugs.python.org/issue21890
#21883: relpath: Provide better errors when mixing bytes and strings
http://bugs.python.org/issue21883
#21880: IDLE: Ability to run 3rd party code checkers
http://bugs.python.org/issue21880
#21868: Tbuffer in turtle allows negative size
http://bugs.python.org/issue21868
#21865: Improve invalid category exception for warnings.filterwarnings
http://bugs.python.org/issue21865
#21862: cProfile command-line should accept "-m module_name" as an alt
http://bugs.python.org/issue21862

Top 10 most discussed issues (10)
=================================

#21902: Docstring of math.acosh, asinh, and atanh
http://bugs.python.org/issue21902 13 msgs
#21911: "IndexError: tuple index out of range" should include the requ
http://bugs.python.org/issue21911 11 msgs
#12067: Doc: remove errors about mixed-type comparisons.
http://bugs.python.org/issue12067 8 msgs
#20155: Regression test test_httpservers fails, hangs on Windows
http://bugs.python.org/issue20155 8 msgs
#12750: datetime.strftime('%s') should respect tzinfo
http://bugs.python.org/issue12750 7 msgs
#21090: File read silently stops after EIO I/O error
http://bugs.python.org/issue21090 7 msgs
#12420: distutils tests fail if PATH is not defined
http://bugs.python.org/issue12420 6 msgs
#14050: Tutorial, list.sort() and items comparability
http://bugs.python.org/issue14050 6 msgs
#21882: turtledemo modules imported by test___all__ cause side effects
http://bugs.python.org/issue21882 6 msgs
#2571: can cmd.py's API/docs for the use of an alternate stdin be imp
http://bugs.python.org/issue2571 5 msgs

Issues closed (72)
==================

#2057: difflib: add patch capability
http://bugs.python.org/issue2057 closed by terry.reedy
#4899: doctest should support fixtures
http://bugs.python.org/issue4899 closed by terry.reedy
#5207: extend strftime/strptime format for RFC3339 and RFC2822
http://bugs.python.org/issue5207 closed by belopolsky
#5638: test_httpservers fails CGI tests if --enable-shared
http://bugs.python.org/issue5638 closed by ned.deily
#5862: multiprocessing 'using a remote manager' example errors and po
http://bugs.python.org/issue5862 closed by berker.peksag
#5930: Transient error in multiprocessing (test_number_of_objects)
http://bugs.python.org/issue5930 closed by haypo
#6692: asyncore kqueue support
http://bugs.python.org/issue6692 closed by haypo
#7506: multiprocessing.managers.BaseManager.__reduce__ references Bas
http://bugs.python.org/issue7506 closed by berker.peksag
#7885: test_distutils fails if Python built in separate directory
http://bugs.python.org/issue7885 closed by ned.deily
#9860: Building python outside of source directory fails
http://bugs.python.org/issue9860 closed by belopolsky
#10000: mark more tests as CPython specific
http://bugs.python.org/issue10000 closed by rhettinger
#10236: Sporadic failures of test_ssl
http://bugs.python.org/issue10236 closed by ned.deily
#10402: sporadic test_bsddb3 failures
http://bugs.python.org/issue10402 closed by jcea
#10445: _ast py3k : add lineno back to "args" node
http://bugs.python.org/issue10445 closed by Claudiu.Popa
#10941: imaplib: Internaldate2tuple produces wrong result if date is n
http://bugs.python.org/issue10941 closed by r.david.murray
#11273: asyncore creates selec (or poll) on every iteration
http://bugs.python.org/issue11273 closed by haypo
#11279: test_posix and lack of "id -G" support - less noise required?
http://bugs.python.org/issue11279 closed by python-dev
#11389: unittest: no way to control verbosity of doctests from cmd
http://bugs.python.org/issue11389 closed by terry.reedy
#11453: asyncore.file_wrapper should implement __del__ and call close
http://bugs.python.org/issue11453 closed by haypo
#11762: Ast doc: warning and version number
http://bugs.python.org/issue11762 closed by berker.peksag
#12401: unset PYTHON* environment variables when running tests
http://bugs.python.org/issue12401 closed by haypo
#12498: asyncore.dispatcher_with_send, disconnection problem + miss-co
http://bugs.python.org/issue12498 closed by haypo
#12814: Possible intermittent bug in test_array
http://bugs.python.org/issue12814 closed by ned.deily
#12842: Docs: first parameter of tp_richcompare() always has the corre
http://bugs.python.org/issue12842 closed by asvetlov
#12876: Make Test Error : ImportError: No module named _sha256
http://bugs.python.org/issue12876 closed by gregory.p.smith
#13103: copy of an asyncore dispatcher causes infinite recursion
http://bugs.python.org/issue13103 closed by haypo
#13413: time.daylight incorrect behavior in linux glibc
http://bugs.python.org/issue13413 closed by belopolsky
#13689: fix CGI Web Applications with Python link in howto/urllib2
http://bugs.python.org/issue13689 closed by berker.peksag
#13985: Menu.tk_popup : menu doesn't disapear when main window is ico
http://bugs.python.org/issue13985 closed by ned.deily
#14069: In extensions (?...) the lookbehind assertion cannot choose be
http://bugs.python.org/issue14069 closed by ezio.melotti
#14097: Improve the "introduction" page of the tutorial
http://bugs.python.org/issue14097 closed by zach.ware
#14235: test_cmd.py does not correctly call reload()
http://bugs.python.org/issue14235 closed by berker.peksag
#14709: http.client fails sending read()able Object
http://bugs.python.org/issue14709 closed by ned.deily
#15014: smtplib: add support for arbitrary auth methods
http://bugs.python.org/issue15014 closed by r.david.murray
#15549: openssl version in windows builds does not support renegotiati
http://bugs.python.org/issue15549 closed by ned.deily
#15750: test_localtime_daylight_false_dst_true raises OverflowError: m
http://bugs.python.org/issue15750 closed by haypo
#15870: PyType_FromSpec should take metaclass as an argument
http://bugs.python.org/issue15870 closed by belopolsky
#16188: Windows C Runtime Library Mismatch
http://bugs.python.org/issue16188 closed by rlinscheer
#16474: More code coverage for imp module
http://bugs.python.org/issue16474 closed by berker.peksag
#17399: test_multiprocessing hang on Windows, non-sockets
http://bugs.python.org/issue17399 closed by terry.reedy
#18258: Fix test discovery for test_codecmaps*.py
http://bugs.python.org/issue18258 closed by zach.ware
#18592: Idle: test SearchDialogBase.py
http://bugs.python.org/issue18592 closed by terry.reedy
#19024: Document asterisk (*), splat or star operator
http://bugs.python.org/issue19024 closed by terry.reedy
#19870: Backport Cookie fix to 2.7 (httponly / secure flag)
http://bugs.python.org/issue19870 closed by berker.peksag
#20218: Add methods to `pathlib.Path`: `write_text`, `read_text`, `wri
http://bugs.python.org/issue20218 closed by cool-RR
#20961: Fix usages of the note directive in the documentation
http://bugs.python.org/issue20961 closed by berker.peksag
#21046: Document formulas used in statistics
http://bugs.python.org/issue21046 closed by ezio.melotti
#21151: winreg.SetValueEx causes crash if value = None
http://bugs.python.org/issue21151 closed by python-dev
#21447: Intermittent asyncio.open_connection / futures.InvalidStateErr
http://bugs.python.org/issue21447 closed by haypo
#21582: use support.captured_stdx context managers - test_asyncore
http://bugs.python.org/issue21582 closed by python-dev
#21652: Python 2.7.7 regression in mimetypes module on Windows
http://bugs.python.org/issue21652 closed by python-dev
#21679: Prevent extraneous fstat during open()
http://bugs.python.org/issue21679 closed by pitrou
#21755: test_importlib.test_locks fails --without-threads
http://bugs.python.org/issue21755 closed by berker.peksag
#21778: PyBuffer_FillInfo() from 3.3
http://bugs.python.org/issue21778 closed by skrah
#21780: make unicodedata module 64-bit safe
http://bugs.python.org/issue21780 closed by python-dev
#21781: make _ssl module 64-bit clean
http://bugs.python.org/issue21781 closed by haypo
#21811: Anticipate fixes to 3.x and 2.7 for OS X 10.10 Yosemite suppor
http://bugs.python.org/issue21811 closed by ned.deily
#21856: memoryview: test slice clamping
http://bugs.python.org/issue21856 closed by terry.reedy
#21857: assert that functions clearing the current exception are not c
http://bugs.python.org/issue21857 closed by haypo
#21863: Display module names of C functions in cProfile
http://bugs.python.org/issue21863 closed by pitrou
#21871: Python 2.7.7 regression in mimetypes read_windows_registry
http://bugs.python.org/issue21871 closed by python-dev
#21884: turtle regression of issue #21823: "uncaught exception" on "AM
http://bugs.python.org/issue21884 closed by ned.deily
#21887: Python3 can't detect Tcl/Tk 8.6.1
http://bugs.python.org/issue21887 closed by ned.deily
#21891: sysmodule.c, #define terminated with semicolon.
http://bugs.python.org/issue21891 closed by ned.deily
#21892: hashtable.c not using PY_FORMAT_SIZE_T
http://bugs.python.org/issue21892 closed by python-dev
#21893: unicodeobject.c not using PY_FORMAT_SIZE_T
http://bugs.python.org/issue21893 closed by haypo
#21894: ImportError: cannot import name jit
http://bugs.python.org/issue21894 closed by ned.deily
#21900: .hgignore: Missing ignores for downloaded doc build tools
http://bugs.python.org/issue21900 closed by r.david.murray
#21904: Multiple closures accessing the same non-local variable always
http://bugs.python.org/issue21904 closed by r.david.murray
#21908: Grammatical error in 3.4 tutorial
http://bugs.python.org/issue21908 closed by r.david.murray
#21912: Deferred logging may use outdated references
http://bugs.python.org/issue21912 closed by vinay.sajip
#777588: asyncore/Windows: select() doesn't report errors for a non-blo
http://bugs.python.org/issue777588 closed by haypo

From geertj at gmail.com Sat Jul 5 20:04:04 2014
From: geertj at gmail.com (Geert Jansen)
Date: Sat, 5 Jul 2014 20:04:04 +0200
Subject: [Python-Dev] Memory BIO for _ssl
Message-ID:

Hi,

The topic of a memory BIO for the _ssl module in the stdlib was discussed before here:

http://mail.python.org/pipermail/python-ideas/2012-November/017686.html

Since I need this for my Gruvi async framework, I want to volunteer to write a patch. It should be useful as well to Py3K's asyncio and other async frameworks. It would be good to get some feedback before I start on this.

I was thinking of the following approach:

* Add a new type to _ssl: PySSLMemoryBIO
* PySSLMemoryBIO has a public constructor, and at least the following methods: puts(), puts_eof() and gets(). I aligned the terminology with the method names in OpenSSL. puts_eof() does a BIO_set_mem_eof_return(-1).
* All accesses to the memory BIO are non-blocking.
* Update PySSLSocket to add support for SSL_set_bio(). The fact that the memory BIO is non-blocking makes it easier.
  None of the logic in and around check_socket_and_wait_for_timeout(), for example, needs to be changed. For the parts that deal with the socket directly, and that are in the code path for non-blocking IO, I think the preference would be i) try to change the code to use BIO methods that work for both sockets and memory BIOs, and ii) if not possible, special case it.
* At this point the PySSLSocket name is a bit of a misnomer as it does more than sockets. Probably not an issue.
* Add a method _wrap_bio(rbio, wbio, ...) to _SSLContext.
* Expose the low-level methods via the "ssl" module.

Creating an SSLSocket with a memory BIO would work something like this:

    context = SSLContext()
    rbio = ssl.MemoryBIO()
    wbio = ssl.MemoryBIO()
    sslsock = ssl.wrap_bio(rbio, wbio)

To pass SSL data from the network and decrypt it into application level data (and potentially new SSL level data):

    rbio.puts(ssldata)
    appdata = sslsock.read()
    ssldata = wbio.gets()

I currently have a utility class in my async IO framework (gruvi.io) called SslPipe that does the above, but it uses a socketpair instead of a memory BIO, and hence it works with the current _ssl. See here:

https://github.com/geertj/gruvi/blob/master/gruvi/ssl.py#L86

This approach, while fine and very fast on Linux, gives me problems on Windows. It appears that on some older Windows versions, when I write data to one side of an (emulated) socket pair, it takes some time for it to become available at the other side. That breaks the synchronous interface that I need in order for this to work. And I can't fully work around it, as I do not know in all situations whether or not to expect data on the socketpair. A memory BIO should be the right solution to this.

Any feedback?
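For readers coming to this thread later: a memory-BIO API along these lines did land in the standard library in Python 3.5, as ssl.MemoryBIO plus SSLContext.wrap_bio(incoming, outgoing, ...), which returns an ssl.SSLObject. The minimal sketch below mirrors the puts()/read()/gets() flow above using that API's names (write() feeds received bytes in, read() drains bytes to send); it also uses ssl.PROTOCOL_TLS_CLIENT, available from 3.6. The helpers start_client_handshake() and feed() are hypothetical, and certificate verification is switched off purely so the sketch runs without a CA bundle, which real code should not do.

```python
import ssl


def start_client_handshake():
    """Start a TLS client handshake entirely in memory (hypothetical helper)."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Illustration only: skip verification so no CA bundle is needed.
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    incoming = ssl.MemoryBIO()  # ciphertext received from the network goes in here
    outgoing = ssl.MemoryBIO()  # ciphertext to send to the network comes out here
    sslobj = context.wrap_bio(incoming, outgoing, server_side=False)

    try:
        sslobj.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the ClientHello is now in `outgoing`, and the handshake
        # cannot continue until the peer's reply is written to `incoming`.
        pass
    return sslobj, incoming, outgoing, outgoing.read()


def feed(sslobj, incoming, outgoing, ssldata):
    """Push received bytes in; return (decrypted app data, bytes to send)."""
    incoming.write(ssldata)
    chunks = []
    while True:
        try:
            chunk = sslobj.read()  # reads at most 1024 bytes per call by default
        except ssl.SSLWantReadError:
            break  # need more network data before more plaintext is available
        if not chunk:
            break  # peer closed the TLS session cleanly
        chunks.append(chunk)
    return b"".join(chunks), outgoing.read()
```

No socket is involved at any point: the caller shuttles bytes between the network and the two BIOs, so there is no cross-socket latency of the kind described for the socketpair approach.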
Regards,
Geert

From breamoreboy at yahoo.co.uk Sun Jul 6 02:19:02 2014
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Sun, 06 Jul 2014 01:19:02 +0100
Subject: [Python-Dev] Pending issues
Message-ID:

The following is a list of the 18 pending issues on the bug tracker. All have been in this state for at least one month so I'm assuming that they can be closed or they wouldn't have been set to pending in the first place. Can somebody take a look at them with a view to closing them or setting them back to open if needed.

16221 tokenize.untokenize() "compat" mode misses the encoding when using an iterator
15600 expose the finder details used by the FileFinder path hook
12588 test_capi.test_subinterps() failed on OpenBSD (powerpc)
7979 connect_ex returns 103 often
17668 re.split loses characters matching ungrouped parts of a pattern
11204 re module: strange behaviour of space inside {m, n}
14518 Add bcrypt $2a$ to crypt.py
15883 Add Py_errno to work around multiple CRT issue
19919 SSL: test_connect_ex_error fails with EWOULDBLOCK
20026 sqlite: handle correctly invalid isolation_level
18228 AIX locale parsing failure
1602742 itemconfigure returns incorrect text property of text items
19954 test_tk floating point exception on my gentoo box with tk 8.6.1
21084 IDLE can't deal with characters above the range (U+0000-U+FFFF)
20997 Wrong URL fragment identifier in search result
6895 locale._parse_localename fails when localename does not contain encoding information
1669539 Improve Windows os.path.join (ntpath.join) "smart" joining
21231 Issue a python 3 warning when old style classes are defined.

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence

From antoine at python.org Mon Jul 7 01:49:23 2014
From: antoine at python.org (Antoine Pitrou)
Date: Sun, 06 Jul 2014 19:49:23 -0400
Subject: [Python-Dev] Memory BIO for _ssl
In-Reply-To:
References:
Message-ID:

Hi,

On 05/07/2014 14:04, Geert Jansen wrote:
> Since I need this for my Gruvi async framework, I want to volunteer to
> write a patch. It should be useful as well to Py3K's asyncio and other
> async frameworks. It would be good to get some feedback before I start
> on this.

Thanks for volunteering! This would be a very welcome addition. Thoughts:

> I was thinking of the following approach:
>
> * Add a new type to _ssl: PySSLMemoryBIO
> * PySSLMemoryBIO has a public constructor, and at least the following
> methods: puts() puts_eof() and gets(). I aligned the terminology with
> the method names in OpenSSL. puts_eof() does a
> BIO_set_mem_eof_return(-1).

Hmm... I haven't looked in detail, but at least I'd like those to be called read() and write() (and write_eof()), like most other I/O methods in Python. Or if we want to avoid confusion, add an explicit suffix (write_incoming?).

> * All accesses to the memory BIO as non-blocking.

Sounds sensible indeed (otherwise what would they wait for?).

> * Update PySSLSocket to add support for SSL_set_bio(). The fact that
> the memory BIO is non-blocking makes it easier. None of the logic in
> and around check_socket_and_wait_for_timeout() for example needs to be
> changed. For the parts that deal with the socket directly, and that
> are in the code path for non-blocking IO, I think the preference would
> be i) try to change the code to use BIO methods that works for both
> sockets and memory BIOs, and ii) if not possible, special case it.

That sounds good in principle. I don't know enough about memory BIOs to know whether you will have issues doing so :-)

> * At this point the PySSLSocket name is a bit of a misnomer as it
> does more than sockets. Probably not an issue.

Agreed.
> * Add a method _wrap_bio(rbio, wbio, ...) to _SSLContext.
> * Expose the low-level methods via the "ssl" module.
>
> Creating an SSLSocket with a memory BIO would work something like this:
>
> context = SSLContext()
> rbio = ssl.MemoryBIO()
> wbio = ssl.MemoryBIO()
> sslsock = ssl.wrap_bio(rbio, wbio)

The one thing I find confusing is the r(ead)bio / w(rite)bio terminology (because you actually read and write from both). Perhaps incoming and outgoing would be clearer.

Regards

Antoine.

From nad at acm.org Mon Jul 7 01:54:50 2014
From: nad at acm.org (Ned Deily)
Date: Sun, 06 Jul 2014 16:54:50 -0700
Subject: [Python-Dev] buildbot.python.org down again?
Message-ID:

As of the moment, buildbot.python.org seems to be down again. Where is the best place to report problems like this?

--
 Ned Deily,
 nad at acm.org

From tjreedy at udel.edu Mon Jul 7 08:33:04 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 07 Jul 2014 02:33:04 -0400
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To:
References:
Message-ID:

On 7/6/2014 7:54 PM, Ned Deily wrote:
> As of the moment, buildbot.python.org seems to be down again.

Several hours later, back up.

> Where is the best place to report problems like this?

We should have, if not already, an automatic system to detect down servers and report (email) to appropriate persons.

--
Terry Jan Reedy

From martin at v.loewis.de Mon Jul 7 08:39:07 2014
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 07 Jul 2014 08:39:07 +0200
Subject: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
In-Reply-To:
References:
Message-ID: <53BA408B.5050901@v.loewis.de>

On 01.07.14 09:44, Victor Stinner wrote:
> scandir(fd) must not close the file descriptor, it should be done by
> the caller. Handling the lifetime of the file descriptor is a
> difficult problem, it's better to let the user decide how to handle
> it.

This is an open issue still: when is the file descriptor closed?
I think the generator returned from scandir needs to support a .close method that guarantees to close the file descriptor. AFAICT, the pure-Python prototype of scandir already does, but it should be specified in the PEP.

While we are at it: is it intended that the generator will also support the other generator methods, in particular .send and .throw?

Regards,
Martin

From andreas.r.maier at gmx.de Mon Jul 7 13:22:27 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Mon, 07 Jul 2014 13:22:27 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
Message-ID: <53BA82F3.1070403@gmx.de>

While discussing Python issue #12067 (http://bugs.python.org/issue12067#msg222442), I learned that Python 3.4 implements '==' and '!=' on the object type such that if no special equality test operations are implemented in derived classes, there is a default implementation that tests for identity (as opposed to equality of the values).

The relevant code is in function do_richcompare() in Objects/object.c.

IMHO, that default implementation contradicts the definition that '==' and '!=' test for equality of the values of an object.

Python 2.x does not seem to have such a default implementation; == and != raise an exception if attempted on objects that don't implement equality in derived classes.

I'd like to gather comments on this issue, specifically:

-> Can someone please elaborate on what the reason for that is?

-> Where is the discrepancy between the documentation of == and its default implementation on object documented?
To me, a sensible default implementation for == on object would be (in Python):

    if v is w:
        return True
    elif type(v) != type(w):
        return False
    else:
        raise ValueError("Equality cannot be determined in default implementation")

Andy

From benjamin at python.org Mon Jul 7 17:15:47 2014
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 07 Jul 2014 08:15:47 -0700
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BA82F3.1070403@gmx.de>
References: <53BA82F3.1070403@gmx.de>
Message-ID: <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com>

On Mon, Jul 7, 2014, at 04:22, Andreas Maier wrote:
> While discussing Python issue #12067
> (http://bugs.python.org/issue12067#msg222442), I learned that Python 3.4
> implements '==' and '!=' on the object type such that if no special
> equality test operations are implemented in derived classes, there is a
> default implementation that tests for identity (as opposed to equality
> of the values).
>
> The relevant code is in function do_richcompare() in Objects/object.c.
>
> IMHO, that default implementation contradicts the definition that '=='
> and '!=' test for equality of the values of an object.
>
> Python 2.x does not seem to have such a default implementation; == and
> != raise an exception if attempted on objects that don't implement
> equality in derived classes.

Why do you think that?

% python
Python 2.7.6 (default, May 29 2014, 22:22:15)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class x(object): pass
...
>>> class y(object): pass
...
>>> x != y
True
>>> x == y
False

From rosuav at gmail.com Mon Jul 7 17:22:54 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 8 Jul 2014 01:22:54 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com>
References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com>
Message-ID:

On Tue, Jul 8, 2014 at 1:15 AM, Benjamin Peterson wrote:
> Why do you think that?
>
> % python
> Python 2.7.6 (default, May 29 2014, 22:22:15)
> [GCC 4.7.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> class x(object): pass
> ...
>>>> class y(object): pass
> ...
>>>> x != y
> True
>>>> x == y
> False

Your analysis is flawed - you're testing the equality of the types, not of instances. But your conclusion's correct; testing instances does work the same way you're implying:

rosuav at sikorsky:~$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class x(object): pass
...
>>> class y(object): pass
...
>>> x() != y()
True
>>> x() == y()
False
>>> x() == x()
False
>>> z = x()
>>> z == z
True

ChrisA

From guido at python.org Mon Jul 7 17:44:28 2014
From: guido at python.org (Guido van Rossum)
Date: Mon, 7 Jul 2014 08:44:28 -0700
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To:
References:
Message-ID:

It would still be nice to know who "the appropriate persons" are. Too much of our infrastructure seems to be maintained by house elves or the ITA.

On Sun, Jul 6, 2014 at 11:33 PM, Terry Reedy wrote:

> On 7/6/2014 7:54 PM, Ned Deily wrote:
>
>> As of the moment, buildbot.python.org seems to be down again.
>>
>
> Several hours later, back up.
>
>
>
> Where is the best place to report problems like this?
>
> We should have, if not already, an automatic system to detect down servers
> and report (email) to appropriate persons.
>
> --
> Terry Jan Reedy
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> guido%40python.org
>

--
--Guido van Rossum (python.org/~guido)

From benjamin at python.org Mon Jul 7 17:55:50 2014
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 07 Jul 2014 08:55:50 -0700
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To:
References:
Message-ID: <1404748550.13353.138929529.030DAD36@webmail.messagingengine.com>

On Mon, Jul 7, 2014, at 08:44, Guido van Rossum wrote:
> It would still be nice to know who "the appropriate persons" are. Too
> much
> of our infrastructure seems to be maintained by house elves or the ITA.

:) Is ITA "International Trombone Association"?

From andreas.r.maier at gmx.de Mon Jul 7 17:29:54 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Mon, 07 Jul 2014 17:29:54 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com>
References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com>
Message-ID: <53BABCF2.50607@gmx.de>

On 07.07.2014 17:15, Benjamin Peterson wrote:
> On Mon, Jul 7, 2014, at 04:22, Andreas Maier wrote:
>>
>> Python 2.x does not seem to have such a default implementation; == and
>> != raise an exception if attempted on objects that don't implement
>> equality in derived classes.
>
> Why do you think that?

Because I looked at the source code of try_rich_compare() in object.c of the 2.7 stream in the repository.
Now, looking deeper into that module, it turns out there is a whole range of variations of comparison functions, so maybe I looked at the wrong one. Instead of trying to figure out how they are called, it is probably easier to just try it out, as you did. Your example certainly shows that == between instances of type object returns a value.

So the Python 2.7 implementation shows the same discrepancy as Python 3.x regarding the == and != default implementation.

Does anyone know why?

Andy

From python-dev at masklinn.net Mon Jul 7 17:58:39 2014
From: python-dev at masklinn.net (Xavier Morel)
Date: Mon, 7 Jul 2014 17:58:39 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BA82F3.1070403@gmx.de>
References: <53BA82F3.1070403@gmx.de>
Message-ID: <791FBFD1-C906-4E11-9144-3062B78702E8@masklinn.net>

On 2014-07-07, at 13:22 , Andreas Maier wrote:

> While discussing Python issue #12067 (http://bugs.python.org/issue12067#msg222442), I learned that Python 3.4 implements '==' and '!=' on the object type such that if no special equality test operations are implemented in derived classes, there is a default implementation that tests for identity (as opposed to equality of the values).
>
> The relevant code is in function do_richcompare() in Objects/object.c.
>
> IMHO, that default implementation contradicts the definition that '==' and '!=' test for equality of the values of an object.
>
> Python 2.x does not seem to have such a default implementation; == and != raise an exception if attempted on objects that don't implement equality in derived classes.

That's incorrect on two levels:

1. What Terry notes in the bug comments is that because all Python 3 types inherit from object, this can be done as a default __eq__/__ne__; in Python 2 the fallback is encoded in the comparison framework (PyObject_Compare and friends): http://hg.python.org/cpython/file/01ec8bb7187f/Objects/object.c#l756
2. Unless comparison methods are overloaded and throw an error, it will always return either True or False (for comparison operators), never throw.

> I'd like to gather comments on this issue, specifically:
>
> -> Can someone please elaborate what the reason for that is?
>
> -> Where is the discrepancy between the documentation of == and its default implementation on object documented?
>
> To me, a sensible default implementation for == on object would be (in Python):
>
>     if v is w:
>         return True;
>     elif type(v) != type(w):
>         return False
>     else:
>         raise ValueError("Equality cannot be determined in default implementation")

Why would comparing two objects of different types return False but comparing two objects of the same type raise an error?

From andreas.r.maier at gmx.de Mon Jul 7 18:11:07 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Mon, 07 Jul 2014 18:11:07 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <791FBFD1-C906-4E11-9144-3062B78702E8@masklinn.net>
References: <53BA82F3.1070403@gmx.de> <791FBFD1-C906-4E11-9144-3062B78702E8@masklinn.net>
Message-ID: <53BAC69B.70901@gmx.de>

On 07.07.2014 17:58, Xavier Morel wrote:
>
> On 2014-07-07, at 13:22 , Andreas Maier wrote:
>
>> While discussing Python issue #12067 (http://bugs.python.org/issue12067#msg222442), I learned that Python 3.4 implements '==' and '!=' on the object type such that if no special equality test operations are implemented in derived classes, there is a default implementation that tests for identity (as opposed to equality of the values).
>>
>> The relevant code is in function do_richcompare() in Objects/object.c.
>>
>> IMHO, that default implementation contradicts the definition that '==' and '!=' test for equality of the values of an object.
>>
>> Python 2.x does not seem to have such a default implementation; == and != raise an exception if attempted on objects that don't implement equality in derived classes.
>
> That's incorrect on two levels:
>
> 1. What Terry notes in the bug comments is that because all Python 3
> types inherit from object this can be done as a default __eq__/__ne__,
> in Python 2 the fallback is encoded in the comparison framework
> (PyObject_Compare and friends):
> http://hg.python.org/cpython/file/01ec8bb7187f/Objects/object.c#l756
> 2. Unless comparison methods are overloaded and throw an error it will
> always return either True or False (for comparison operator), never throw.

I was incorrect for Python 2.x.

>> I'd like to gather comments on this issue, specifically:
>>
>> -> Can someone please elaborate what the reason for that is?
>>
>> -> Where is the discrepancy between the documentation of == and its default implementation on object documented?
>>
>> To me, a sensible default implementation for == on object would be (in Python):
>>
>>     if v is w:
>>         return True;
>>     elif type(v) != type(w):
>>         return False
>>     else:
>>         raise ValueError("Equality cannot be determined in default implementation")
>
> Why would comparing two objects of different types return False

Because I think (but I'm not sure) that the type should play a role for comparison of values. But maybe that does not embrace duck typing sufficiently, and the type should be ignored by default for comparing object values.

> but comparing two objects of the same type raise an error?

That I'm sure of: Because the default implementation (after having exhausted all possibilities of calling __eq__ and friends) has no way to find out whether the values(!!) of the objects are equal.
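A minimal Python 3 sketch of the behaviour being debated (the classes Plain and Point are made up for illustration): with no __eq__ defined, == falls back to object's identity test. A value-based __eq__ returns NotImplemented for types it does not handle; Python then tries the reflected method and, if that also declines, falls back to identity rather than raising. The ordering operators have no such fallback and raise TypeError.

```python
class Plain:
    """Defines no comparison methods, so object's defaults apply."""


class Point:
    """Value-based equality: two points are equal if their coordinates match."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            # Decline: Python tries other.__eq__(self), then falls back to identity.
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)
    # Defining __eq__ without __hash__ makes instances unhashable in Python 3.


a, b = Plain(), Plain()
print(a == a, a == b, a != b)      # identity-based default
print(Point(1, 2) == Point(1, 2))  # value-based __eq__
print(Point(1, 2) == "1,2")        # both sides decline, identity fallback
try:
    a < b
except TypeError as exc:
    print("ordering:", exc)        # no default ordering between plain objects
```

This is the design Ethan defends in his reply: equality always has an answer, while ordering only exists when a class defines it.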
Andy

From ethan at stoneleaf.us Mon Jul 7 17:55:08 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 07 Jul 2014 08:55:08 -0700
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BA82F3.1070403@gmx.de>
References: <53BA82F3.1070403@gmx.de>
Message-ID: <53BAC2DC.9030600@stoneleaf.us>

On 07/07/2014 04:22 AM, Andreas Maier wrote:
>
> Where is the discrepancy between the documentation of == and its default implementation on object documented?

There seems to be no discrepancy (at least, you have not shown it), but to answer the question about why the default equals operation is an identity test:

- all objects should be equal to themselves (there is only one that isn't, and it's weird)

- equality tests should not, as a general rule, raise exceptions -- they should return True or False

--
~Ethan~

From andreas.r.maier at gmx.de Mon Jul 7 18:56:10 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Mon, 07 Jul 2014 18:56:10 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BAC2DC.9030600@stoneleaf.us>
References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us>
Message-ID: <53BAD12A.20209@gmx.de>

On 07.07.2014 17:55, Ethan Furman wrote:
> On 07/07/2014 04:22 AM, Andreas Maier wrote:
>>
>> Where is the discrepancy between the documentation of == and its
>> default implementation on object documented?
>
> There's seems to be no discrepancy (at least, you have not shown it),

The documentation states consistently that == tests the equality of the value of an object. The default implementation of == in both 2.x and 3.x tests the object identity. Is that not a discrepancy?

> but to answer the question about why the default equals operation is an
> identity test:
>
> - all objects should be equal to themselves (there is only one that
> isn't, and it's weird)

I agree. But that is not a reason to conclude that different objects (as per their identity) should be unequal.
Which is what the default implementation does. > - equality tests should not, as a general rule, raise exceptions -- > they should return True or False Why not? Ordering tests also raise exceptions if ordering is not implemented. Andy From guido at python.org Mon Jul 7 19:22:21 2014 From: guido at python.org (Guido van Rossum) Date: Mon, 7 Jul 2014 10:22:21 -0700 Subject: [Python-Dev] buildbot.python.org down again? In-Reply-To: <1404748550.13353.138929529.030DAD36@webmail.messagingengine.com> References: <1404748550.13353.138929529.030DAD36@webmail.messagingengine.com> Message-ID: It's a reference to Neal Stephenson's Anathem. On Jul 7, 2014 8:55 AM, "Benjamin Peterson" wrote: > On Mon, Jul 7, 2014, at 08:44, Guido van Rossum wrote: > > It would still be nice to know who "the appropriate persons" are. Too > > much > > of our infrastructure seems to be maintained by house elves or the ITA. > > :) Is ITA "International Trombone Association"? > From antoine at python.org Mon Jul 7 19:47:31 2014 From: antoine at python.org (Antoine Pitrou) Date: Mon, 07 Jul 2014 13:47:31 -0400 Subject: [Python-Dev] buildbot.python.org down again? In-Reply-To: References: <1404748550.13353.138929529.030DAD36@webmail.messagingengine.com> Message-ID: Le 07/07/2014 13:22, Guido van Rossum a écrit : > It's a reference to Neal Stephenson's Anathem. According to Google, it doesn't look like he played the trombone, though. Regards Antoine. > > On Jul 7, 2014 8:55 AM, "Benjamin Peterson" > wrote: > > On Mon, Jul 7, 2014, at 08:44, Guido van Rossum wrote: > > It would still be nice to know who "the appropriate persons" are. Too > > much > > of our infrastructure seems to be maintained by house elves or > the ITA. > > :) Is ITA "International Trombone Association"?
> > > From ethan at stoneleaf.us Mon Jul 7 19:43:34 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 10:43:34 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BAD12A.20209@gmx.de> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> Message-ID: <53BADC46.40400@stoneleaf.us> On 07/07/2014 09:56 AM, Andreas Maier wrote: > Am 07.07.2014 17:55, schrieb Ethan Furman: >> On 07/07/2014 04:22 AM, Andreas Maier wrote: >>> >>> Where is the discrepancy between the documentation of == and its >>> default implementation on object documented? >> >> There's seems to be no discrepancy (at least, you have not shown it), > > The documentation states consistently that == tests the equality of the value of an object. The default implementation > of == in both 2.x and 3.x tests the object identity. Is that not a discrepancy? One could say that the value of an object is the object itself. Since different objects are different, then they are not equal. >> but to answer the question about why the default equals operation is an >> identity test: >> >> - all objects should be equal to themselves (there is only one that >> isn't, and it's weird) > > I agree. But that is not a reason to conclude that different objects (as per their identity) should be unequal. Which is > what the default implementation does. Python cannot know which values are important in an equality test, and which are not. So it refuses to guess. Think of a chess board, for example. Are any two black pawns equal? All 16 pawns came from the same Pawn class, the only differences would be in the color and position, but the movement type is the same for all. So equality for a pawn might mean the same color, or it might mean color and position, or it might mean can move to the same position... it's up to the programmer to decide which of the possibilities is the correct one. 
Quite frankly, having equality mean identity in this case also makes a lot of sense. >> - equality tests should not, as a general rule, raise exceptions -- >> they should return True or False > > Why not? Ordering tests also raise exceptions if ordering is not implemented. Besides the pawn example, this is probably a matter of practicality over purity -- equality tests are used extensively throughout Python, and having exceptions raised at possibly any moment would not be a fun or productive environment. Ordering is much less frequent, and since we already tried always ordering things, falling back to type name if necessary, we have discovered that that is not a good trade-off. So now if one tries to order things without specifying how it should be done, one gets an exception. -- ~Ethan~ From tjreedy at udel.edu Mon Jul 7 20:20:42 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 07 Jul 2014 14:20:42 -0400 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BA82F3.1070403@gmx.de> References: <53BA82F3.1070403@gmx.de> Message-ID: On 7/7/2014 7:22 AM, Andreas Maier wrote: > While discussing Python issue #12067 > (http://bugs.python.org/issue12067#msg222442), I learned that Python 3.4 > implements '==' and '!=' on the object type such that if no special > equality test operations are implemented in derived classes, there is a > default implementation that tests for identity (as opposed to equality > of the values). > > The relevant code is in function do_richcompare() in Objects/object.c. > > IMHO, that default implementation contradicts the definition that '==' > and '!=' test for equality of the values of an object. A discrepancy between code and doc can be solved by changing either the code or doc. This is a case where the code should not change (for back compatibility with long-standing behavior, if nothing else) and the doc should.
-- Terry Jan Reedy From francismb at email.de Mon Jul 7 21:01:59 2014 From: francismb at email.de (francis) Date: Mon, 07 Jul 2014 21:01:59 +0200 Subject: [Python-Dev] Tracker Stats In-Reply-To: <20140623201225.0DA80250DE6@webabinitio.net> References: <53A84D41.6070508@email.de> <20140623201225.0DA80250DE6@webabinitio.net> Message-ID: <53BAEEA7.8050408@email.de> On 06/23/2014 10:12 PM, R. David Murray wrote: > The stats graphs are based on the data generated for the > weekly issue report. I have a patched version of that > report that adds the bug/enhancement info. I'll try to dig > it up this week; someone ping me if I forget :) It think > the patch will need to be updated based on Ezio's changes. > ping From ethan at stoneleaf.us Mon Jul 7 21:26:12 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 12:26:12 -0700 Subject: [Python-Dev] Tracker Stats In-Reply-To: <53BAEEA7.8050408@email.de> References: <53A84D41.6070508@email.de> <20140623201225.0DA80250DE6@webabinitio.net> <53BAEEA7.8050408@email.de> Message-ID: <53BAF454.6060304@stoneleaf.us> On 07/07/2014 12:01 PM, francis wrote: > On 06/23/2014 10:12 PM, R. David Murray wrote: > >> The stats graphs are based on the data generated for the >> weekly issue report. I have a patched version of that >> report that adds the bug/enhancement info. I'll try to dig >> it up this week; someone ping me if I forget :) It think >> the patch will need to be updated based on Ezio's changes. 
>> > ping pong From ethan at stoneleaf.us Mon Jul 7 18:09:28 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 09:09:28 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BABCF2.50607@gmx.de> References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com> <53BABCF2.50607@gmx.de> Message-ID: <53BAC638.7030704@stoneleaf.us> On 07/07/2014 08:29 AM, Andreas Maier wrote: > > So the Python 2.7 implementation shows the same discrepancy as Python 3.x regarding the == and != default implementation. Why do you see this as a discrepancy? Just because two instances from the same object have the same value does not mean they are equal. For a real-life example, look at twins: biologically identical, yet not equal. looking-forward-to-the-rebuttal-mega-thread'ly yrs, -- ~Ethan~ From zuo at chopin.edu.pl Mon Jul 7 23:11:03 2014 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Mon, 07 Jul 2014 23:11:03 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BAC69B.70901@gmx.de> References: <53BA82F3.1070403@gmx.de> <791FBFD1-C906-4E11-9144-3062B78702E8@masklinn.net> <53BAC69B.70901@gmx.de> Message-ID: <8564322772978800ae89623d1426b469@chopin.edu.pl> 07.07.2014 18:11, Andreas Maier wrote: > Am 07.07.2014 17:58, schrieb Xavier Morel: >> >> On 2014-07-07, at 13:22 , Andreas Maier >> wrote: >> >>> While discussing Python issue #12067 >>> (http://bugs.python.org/issue12067#msg222442), I learned that Python >>> 3.4 implements '==' and '!=' on the object type such that if no >>> special equality test operations are implemented in derived classes, >>> there is a default implementation that tests for identity (as opposed >>> to equality of the values). [...] >>> IMHO, that default implementation contradicts the definition that >>> '==' and '!=' test for equality of the values of an object. [...]
>>> To me, a sensible default implementation for == on object would be >>> (in Python): >>> >>> if v is w: >>> return True; >>> elif type(v) != type(w): >>> return False >>> else: >>> raise ValueError("Equality cannot be determined in default >>> implementation") >> >> Why would comparing two objects of different types return False > > Because I think (but I'm not sure) that the type should play a role > for comparison of values. But maybe that does not embrace duck typing > sufficiently, and the type should be ignored by default for comparing > object values. > >> but comparing two objects of the same type raise an error? > > That I'm sure of: Because the default implementation (after having > exhausted all possibilities of calling __eq__ and friends) has no way > to find out whether the values(!!) of the objects are equal. IMHO, in Python context, "value" is a very vague term. Quite often we can read it as the very basic (but not the only one) notion of "what makes objects being equal or not" -- and then saying that "objects are compared by value" is a tautology. In other words, what object's "value" is -- is dependent on its nature: e.g. the value of a list is what are the values of its consecutive (indexed) items; the value of a set is based on values of all its elements without notion of order or repetition; the value of a number is a set of its abstract mathematical properties that determine what makes objects being equal, greater, lesser, how particular arithmetic operations work etc... I think, there is no universal notion of "the value of a Python object". The notion of identity seems to be most generic (every object has it, event if it does not have any other property) -- and that's why by default it is used to define the most basic feature of object's *value*, i.e. "what makes objects being equal or not" (== and !=). Another possibility would be to raise TypeError but, as Ethan Furman wrote, it would be impractical (e.g. 
key-type-heterogenic dicts or sets would be practically impossible to work with). On the other hand, the notion of sorting order (< > <= >=) is a much more specialized object property. Cheers. *j From rob.cliffe at btinternet.com Mon Jul 7 23:31:55 2014 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Mon, 07 Jul 2014 22:31:55 +0100 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <8564322772978800ae89623d1426b469@chopin.edu.pl> References: <53BA82F3.1070403@gmx.de> <791FBFD1-C906-4E11-9144-3062B78702E8@masklinn.net> <53BAC69B.70901@gmx.de> <8564322772978800ae89623d1426b469@chopin.edu.pl> Message-ID: <53BB11CB.8020802@btinternet.com> On 07/07/2014 22:11, Jan Kaliszewski wrote: > [snip] > > IMHO, in Python context, "value" is a very vague term. Quite often we > can read it as the very basic (but not the only one) notion of "what > makes objects being equal or not" -- and then saying that "objects are > compared by value" is a tautology. > > In other words, what object's "value" is -- is dependent on its > nature: e.g. the value of a list is what are the values of its > consecutive (indexed) items; the value of a set is based on values of > all its elements without notion of order or repetition; the value of a > number is a set of its abstract mathematical properties that determine > what makes objects being equal, greater, lesser, how particular > arithmetic operations work etc... > > I think, there is no universal notion of "the value of a Python > object". The notion of identity seems to be most generic (every > object has it, event if it does not have any other property) -- and > that's why by default it is used to define the most basic feature of > object's *value*, i.e. "what makes objects being equal or not" (== and > !=). Another possibility would be to raise TypeError but, as Ethan > Furman wrote, it would be impractical (e.g. key-type-heterogenic dicts > or sets would be practically impossible to work with). 
On the other > hand, the notion of sorting order (< > <= >=) is a much more > specialized object property. Quite so. x, y = object(), object() print 'Equal:', ' '.join(attr for attr in dir(x) if getattr(x,attr)==getattr(y,attr)) print 'Unequal:', ' '.join(attr for attr in dir(x) if getattr(x,attr)!=getattr(y,attr)) Equal: __class__ __doc__ __new__ __subclasshook__ Unequal: __delattr__ __format__ __getattribute__ __hash__ __init__ __reduce__ __reduce_ex__ __repr__ __setattr__ __sizeof__ __str__ Andreas, what attribute or combination of attributes do you think should be the "values" of x and y? Rob Cliffe From ezio.melotti at gmail.com Tue Jul 8 00:38:05 2014 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Tue, 8 Jul 2014 01:38:05 +0300 Subject: [Python-Dev] Tracker Stats In-Reply-To: <53BAEEA7.8050408@email.de> References: <53A84D41.6070508@email.de> <20140623201225.0DA80250DE6@webabinitio.net> <53BAEEA7.8050408@email.de> Message-ID: On Mon, Jul 7, 2014 at 10:01 PM, francis wrote: > On 06/23/2014 10:12 PM, R. David Murray wrote: > >> The stats graphs are based on the data generated for the >> weekly issue report. I have a patched version of that >> report that adds the bug/enhancement info. I'll try to dig >> it up this week; someone ping me if I forget :) It think >> the patch will need to be updated based on Ezio's changes. 
>> > ping > If you just want some numbers you can try this: >>> import xmlrpclib >>> x = xmlrpclib.ServerProxy('http://bugs.python.org/xmlrpc', allow_none=True) >>> open_issues = x.filter('issue', None, dict(status=1)) # 1 == open >>> len(open_issues) 4541 >>> len(x.filter('issue', open_issues, dict(type=5))) # behavior 1798 >>> len(x.filter('issue', open_issues, dict(type=6))) # enhancement 1557 >>> len(x.filter('issue', open_issues, dict(type=1))) # crash 122 >>> len(x.filter('issue', open_issues, dict(type=2))) # compile error 141 >>> len(x.filter('issue', open_issues, dict(type=3))) # resource usage 103 >>> len(x.filter('issue', open_issues, dict(type=4))) # security 32 >>> len(x.filter('issue', open_issues, dict(type=7))) # performance 83 Best Regards, Ezio Melotti From andreas.r.maier at gmx.de Tue Jul 8 01:36:25 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 01:36:25 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BADC46.40400@stoneleaf.us> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> Message-ID: <53BB2EF9.80002@gmx.de> Am 2014-07-07 19:43, schrieb Ethan Furman: > On 07/07/2014 09:56 AM, Andreas Maier wrote: >> Am 07.07.2014 17:55, schrieb Ethan Furman: >>> On 07/07/2014 04:22 AM, Andreas Maier wrote: >>>> >>>> Where is the discrepancy between the documentation of == and its >>>> default implementation on object documented? >>> >>> There's seems to be no discrepancy (at least, you have not shown it), >> >> The documentation states consistently that == tests the equality of >> the value of an object. The default implementation >> of == in both 2.x and 3.x tests the object identity. Is that not a >> discrepancy? > > One could say that the value of an object is the object itself. Since > different objects are different, then they are not equal. 
> >>> but to answer the question about why the default equals operation is an >>> identity test: >>> >>> - all objects should be equal to themselves (there is only one that >>> isn't, and it's weird) >> >> I agree. But that is not a reason to conclude that different objects >> (as per their identity) should be unequal. Which is >> what the default implementation does. > > Python cannot know which values are important in an equality test, and > which are not. So it refuses to guess. > Well, one could argue that using the address of an object for its value equality test is pretty close to guessing, considering that given a sensible definition of value equality, objects of different identity can very well be equal but will always be considered unequal based on the address. > Think of a chess board, for example. Are any two black pawns equal? > All 16 pawns came from the same Pawn class, the only differences would > be in the color and position, but the movement type is the same for all. > > So equality for a pawn might mean the same color, or it might mean > color and position, or it might mean can move to the same position... > it's up to the programmer to decide which of the possibilities is the > correct one. Quite frankly, have equality mean identity in this case > also makes a lot of sense. That's why I think equality is only defined once the class designer has defined it. Using the address as a default for equality (that is, in absence of such a designer's definition) may be an easy-to-implement default, but not a very logical or sensible one. > >>> - equality tests should not, as a general rule, raise exceptions -- >>> they should return True or False >> >> Why not? Ordering tests also raise exceptions if ordering is not >> implemented. 
> > Besides the pawn example, this is probably a matter of practicality > over purity -- equality tests are used extensively through-out Python, > and having exceptions raised at possibly any moment would not be a fun > nor productive environment. > So we have many cases of classes whose designers thought about whether a sensible definition of equality was needed, and decided that an address/identity-based equality definition was just what they needed, yet they did not want to or could not use the "is" operator? Can you give me an example for such a class (besides type object)? (I.e. a class that does not have __eq__() and __ne__() but whose instances are compared with == or !=) > Ordering is much less frequent, and since we already tried always > ordering things, falling back to type name if necessary, we have > discovered that that is not a good trade-off. So now if one tries to > order things without specifying how it should be done, one gets an > exception. In Python 2, the default ordering implementation on type object uses the identity (address) as the basis for ordering. In Python 3, that was changed to raise an exception. That seems to be in sync with what you are saying. Maybe it would have been possible to also change that for the default equality implementation in Python 3. But it was not changed. As I wrote in another response, we now need to document this properly. 
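The Python 3 behavior summarized above can be checked with a short sketch (the class `Plain` is invented for illustration). A class without its own comparison methods inherits identity-based `==` and `!=` from `object`, while `<` raises `TypeError`. The NaN lines show the usual example of a value that is not equal to itself (presumably the "one that isn't" mentioned earlier in the thread) and that containment tests check identity before calling `==`.

```python
class Plain:
    """A class that defines no comparison methods of its own."""


a, b = Plain(), Plain()

# Default == and != fall back to identity: an object equals only itself.
print(a == a)  # True
print(a == b)  # False, although neither object carries any state
print(a != b)  # True

# Python 3 provides no default ordering, so < raises TypeError.
try:
    a < b
except TypeError as exc:
    print("ordering not supported:", exc)

# NaN is the well-known exception to "every object equals itself" ...
nan = float("nan")
print(nan == nan)    # False
# ... yet list containment checks identity before ==, so it is still found.
print(nan in [nan])  # True
```

The identity shortcut in containment is one concrete place where the invariant "x is y implies x == y" is relied upon in practice.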
From andreas.r.maier at gmx.de Tue Jul 8 01:37:09 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 01:37:09 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB2AC7.2060009@gmx.de> References: <53BB2AC7.2060009@gmx.de> Message-ID: <53BB2F25.3020205@gmx.de> Am 2014-07-07 23:11, schrieb Jan Kaliszewski: > 07.07.2014 18:11, Andreas Maier wrote: >> Am 07.07.2014 17:58, schrieb Xavier Morel: >>> On 2014-07-07, at 13:22 , Andreas Maier wrote: >>> >>>> While discussing Python issue #12067 >>>> (http://bugs.python.org/issue12067#msg222442), I learned that >>>> Python 3.4 implements '==' and '!=' on the object type such that if >>>> no special equality test operations are implemented in derived >>>> classes, there is a default implementation that tests for identity >>>> (as opposed to equality of the values). > [...] >>>> IMHO, that default implementation contradicts the definition that >>>> '==' and '!=' test for equality of the values of an object. > [...] >>>> To me, a sensible default implementation for == on object would be >>>> (in Python): >>>> >>>> if v is w: >>>> return True; >>>> elif type(v) != type(w): >>>> return False >>>> else: >>>> raise ValueError("Equality cannot be determined in default >>>> implementation") >>> >>> Why would comparing two objects of different types return False >> >> Because I think (but I'm not sure) that the type should play a role >> for comparison of values. But maybe that does not embrace duck typing >> sufficiently, and the type should be ignored by default for comparing >> object values. >> >>> but comparing two objects of the same type raise an error? >> >> That I'm sure of: Because the default implementation (after having >> exhausted all possibilities of calling __eq__ and friends) has no way >> to find out whether the values(!!) of the objects are equal. > > IMHO, in Python context, "value" is a very vague term. 
Quite often we > can read it as the very basic (but not the only one) notion of "what > makes objects being equal or not" -- and then saying that "objects are > compared by value" is a tautology. > > In other words, what object's "value" is -- is dependent on its > nature: e.g. the value of a list is what are the values of its > consecutive (indexed) items; the value of a set is based on values of > all its elements without notion of order or repetition; the value of a > number is a set of its abstract mathematical properties that determine > what makes objects being equal, greater, lesser, how particular > arithmetic operations work etc... > > I think, there is no universal notion of "the value of a Python > object". The notion of identity seems to be most generic (every > object has it, event if it does not have any other property) -- and > that's why by default it is used to define the most basic feature of > object's *value*, i.e. "what makes objects being equal or not" (== and > !=). Another possibility would be to raise TypeError but, as Ethan > Furman wrote, it would be impractical (e.g. key-type-heterogenic dicts > or sets would be practically impossible to work with). On the other > hand, the notion of sorting order (< > <= >=) is a much more > specialized object property. On the universal notion of a value in Python: In both 2.x and 3.x, it reads (in 3.1. Objects, values and types): - "Every object has an identity, a type and a value." - "An object's /identity/ never changes once it has been created; .... The /value/ of some objects can change. Objects whose value can change are said to be /mutable/; objects whose value is unchangeable once they are created are called /immutable/." These are clear indications that there is an intention to have separate concepts of identity and value in Python. If an instance of type object can exist but does not have a universal notion of value, it should not allow operations that need a value. 
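Andreas's earlier proposal (identical objects compare equal, objects of different types compare unequal, anything else raises) can be sketched as a hypothetical mixin; `StrictEq` and `Node` are invented names, not part of any library. The sketch suggests why others in the thread considered such a default impractical: ordinary list operations that rely on `==` start raising as soon as two distinct instances of the same class meet.

```python
class StrictEq:
    """Hypothetical mixin implementing the default proposed in the thread."""

    def __eq__(self, other):
        if self is other:
            return True
        if type(self) is not type(other):
            return False
        raise ValueError("Equality cannot be determined in default implementation")

    # In Python 3, != is derived from __eq__, so no __ne__ is needed here.
    # Defining __eq__ sets __hash__ to None; restore identity hashing so
    # instances can still be stored in sets and used as dict keys.
    __hash__ = object.__hash__


class Node(StrictEq):
    pass


x, y = Node(), Node()
print(x == x)    # True: identity short-circuits
print(x == 42)   # False: different types
print(x in [x])  # True: list containment checks identity first

try:
    print(x in [y])  # distinct objects of the same type: __eq__ raises
except ValueError as exc:
    print("containment failed:", exc)
```

`list.remove()` and `list.index()` would fail in the same way, which is the kind of surprise the identity-based default avoids.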
I do not really buy into the arguments that try to show how identity and value are somehow the same. They are not, not even in Python. The argument I can absolutely buy into is that the implementation cannot be changed within a major release. So the real question is how we document it. I'll try to summarize in a separate posting. Andy From andreas.r.maier at gmx.de Tue Jul 8 01:37:48 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 01:37:48 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB2A09.4070208@gmx.de> References: <53BB2A09.4070208@gmx.de> Message-ID: <53BB2F4C.8060402@gmx.de> Am 2014-07-07 23:31, schrieb Rob Cliffe: > > On 07/07/2014 22:11, Jan Kaliszewski wrote: >> [snip] >> >> IMHO, in Python context, "value" is a very vague term. Quite often >> we can read it as the very basic (but not the only one) notion of >> "what makes objects being equal or not" -- and then saying that >> "objects are compared by value" is a tautology. >> >> In other words, what object's "value" is -- is dependent on its >> nature: e.g. the value of a list is what are the values of its >> consecutive (indexed) items; the value of a set is based on values of >> all its elements without notion of order or repetition; the value of >> a number is a set of its abstract mathematical properties that >> determine what makes objects being equal, greater, lesser, how >> particular arithmetic operations work etc... >> >> I think, there is no universal notion of "the value of a Python >> object". The notion of identity seems to be most generic (every >> object has it, event if it does not have any other property) -- and >> that's why by default it is used to define the most basic feature of >> object's *value*, i.e. "what makes objects being equal or not" (== >> and !=).
Another possibility would be to raise TypeError but, as >> Ethan Furman wrote, it would be impractical (e.g. >> key-type-heterogenic dicts or sets would be practically impossible to >> work with). On the other hand, the notion of sorting order (< > <= >> >=) is a much more specialized object property. > Quite so. > > x, y = object(), object() > print 'Equal:', ' '.join(attr for attr in dir(x) if > getattr(x,attr)==getattr(y,attr)) > print 'Unequal:', ' '.join(attr for attr in dir(x) if > getattr(x,attr)!=getattr(y,attr)) > > Equal: __class__ __doc__ __new__ __subclasshook__ > Unequal: __delattr__ __format__ __getattribute__ __hash__ __init__ > __reduce__ __reduce_ex__ __repr__ __setattr__ __sizeof__ __str__ > > Andreas, what attribute or combination of attributes do you think > should be the "values" of x and y? > Rob Cliffe > Whatever the object's type defines to be the value. Which requires the presence of an __eq__() or __ne__() implementation. I could even live with a default implementation on type object that ANDs the equality of all instance data attributes and class data attributes, but that is not possible because type object does not have a notion of such data attributes. Reverting to using the identity for the value of an instance of type object is somehow helpless. It may make existing code work, but it is not very logical. I could even argue it makes some logical code fail, because while it reliably detects that the same objects are equal, it fails to detect that different objects may also be equal (at least under a sensible definition of value equality). Having said all this: As a few people already wrote, we cannot change the implementation within a major release. So the real question is how we document it. I'll try to summarize in a separate posting. 
Andy From benjamin at python.org Tue Jul 8 01:49:40 2014 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 07 Jul 2014 16:49:40 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB2EF9.80002@gmx.de> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> Message-ID: <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> On Mon, Jul 7, 2014, at 16:36, Andreas Maier wrote: > Am 2014-07-07 19:43, schrieb Ethan Furman: > > On 07/07/2014 09:56 AM, Andreas Maier wrote: > >> Am 07.07.2014 17:55, schrieb Ethan Furman: > >>> On 07/07/2014 04:22 AM, Andreas Maier wrote: > >>>> > >>>> Where is the discrepancy between the documentation of == and its > >>>> default implementation on object documented? > >>> > >>> There's seems to be no discrepancy (at least, you have not shown it), > >> > >> The documentation states consistently that == tests the equality of > >> the value of an object. The default implementation > >> of == in both 2.x and 3.x tests the object identity. Is that not a > >> discrepancy? > > > > One could say that the value of an object is the object itself. Since > > different objects are different, then they are not equal. > > > >>> but to answer the question about why the default equals operation is an > >>> identity test: > >>> > >>> - all objects should be equal to themselves (there is only one that > >>> isn't, and it's weird) > >> > >> I agree. But that is not a reason to conclude that different objects > >> (as per their identity) should be unequal. Which is > >> what the default implementation does. > > > > Python cannot know which values are important in an equality test, and > > which are not. So it refuses to guess. 
> > > Well, one could argue that using the address of an object for its value > equality test is pretty close to guessing, considering that given a > sensible definition of value equality, objects of different identity can > very well be equal but will always be considered unequal based on the > address. Probably the best argument for the behavior is that "x is y" should imply "x == y", which precludes raising an exception. No such invariant is desired for ordering, so default implementations of < and > are not provided in Python 3. From andreas.r.maier at gmx.de Tue Jul 8 01:53:06 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 01:53:06 +0200 Subject: [Python-Dev] == on object tests identity in 3.x - summary Message-ID: <53BB32E2.40805@gmx.de> Thanks to all who responded. In the absence of class-specific equality test methods, the default implementations fall back to using the identity (=address) of the object as a basis for the test, in both Python 2 and Python 3. In the absence of specific ordering test methods, the default implementations fall back to using the identity (=address) of the object as a basis for the test, in Python 2. In Python 3, an exception is raised in that case. The bottom line of the discussion seems to be that this behavior is intentional, and a lot of code depends on it. We still need to figure out how to document this. Options could be: 1. We define that the default for the value of an object is its identity. That allows us to describe the behavior of the equality test without special-casing such objects, but it does not work for ordering. Also, I have difficulties stating what constitutes that default case, because it can really only be explained by referring to the presence or absence of the class-specific equality test and ordering test methods. 2.
We don't say anything about the default value of an object, and describe the behavior of the equality test and ordering test, which both need to cover the case that the object does not have the respective test methods. It seems to me that only option 2 really works. Comments and further options welcome. Andy From andreas.r.maier at gmx.de Tue Jul 8 01:55:55 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 01:55:55 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> Message-ID: <53BB338B.2080401@gmx.de> Am 2014-07-08 01:49, schrieb Benjamin Peterson: > On Mon, Jul 7, 2014, at 16:36, Andreas Maier wrote: >> Am 2014-07-07 19:43, schrieb Ethan Furman: >>> On 07/07/2014 09:56 AM, Andreas Maier wrote: >>>> Am 07.07.2014 17:55, schrieb Ethan Furman: >>>>> On 07/07/2014 04:22 AM, Andreas Maier wrote: >>>>>> Where is the discrepancy between the documentation of == and its >>>>>> default implementation on object documented? >>>>> There's seems to be no discrepancy (at least, you have not shown it), >>>> The documentation states consistently that == tests the equality of >>>> the value of an object. The default implementation >>>> of == in both 2.x and 3.x tests the object identity. Is that not a >>>> discrepancy? >>> One could say that the value of an object is the object itself. Since >>> different objects are different, then they are not equal. >>> >>>>> but to answer the question about why the default equals operation is an >>>>> identity test: >>>>> >>>>> - all objects should be equal to themselves (there is only one that >>>>> isn't, and it's weird) >>>> I agree. 
But that is not a reason to conclude that different objects >>>> (as per their identity) should be unequal. Which is >>>> what the default implementation does. >>> Python cannot know which values are important in an equality test, and >>> which are not. So it refuses to guess. >>> >> Well, one could argue that using the address of an object for its value >> equality test is pretty close to guessing, considering that given a >> sensible definition of value equality, objects of different identity can >> very well be equal but will always be considered unequal based on the >> address. > Probably the best argument for the behavior is that "x is y" should > imply "x == y", which preludes raising an exception. No such invariant > is desired for ordering, so default implementations of < and > are not > provided in Python 3. I agree that "x is y" should imply "x == y". The problem of the default implementation is that "x is not y" implies "x != y" and that may or may not be true under a sensible definition of equality. From andreas.r.maier at gmx.de Tue Jul 8 02:12:14 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 02:12:14 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BAC638.7030704@stoneleaf.us> References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com> <53BABCF2.50607@gmx.de> <53BAC638.7030704@stoneleaf.us> Message-ID: <53BB375E.8010904@gmx.de> Am 2014-07-07 18:09, schrieb Ethan Furman: > Just because two instances from the same object have the same value > does not mean they are equal. For a real-life example, look at > twins: biologically identical, yet not equal. I think they *are* equal in Python if they have the same value, by definition, because somewhere the Python docs state that equality compares the object's values. 
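The twins exchange above can be made concrete with a short Python 3 sketch. The `Twin` class and its `genome`/`personality` attributes are hypothetical and exist only for illustration. The sketch shows that the class designer decides which attributes take part in `==`, and that `__hash__` is kept consistent with that choice.

```python
class Twin:
    """Hypothetical class: the designer chooses which attributes define equality."""

    def __init__(self, genome, personality):
        self.genome = genome
        self.personality = personality

    def __eq__(self, other):
        if not isinstance(other, Twin):
            return NotImplemented  # let the other operand (or identity) decide
        # Design decision: only the biological attribute takes part in equality.
        return self.genome == other.genome

    def __hash__(self):
        # Instances that compare equal must also hash equal.
        return hash(self.genome)


alice = Twin("ACGT", "outgoing")
bella = Twin("ACGT", "reserved")
print(alice == bella)   # True: same genome, personality ignored
print(alice is bella)   # False: still two distinct objects
```

A designer who wanted personality to count would add it to both `__eq__` and `__hash__`. Python itself has no opinion on which choice is right.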
The reality though is that value is more vague than equality test (as it was already pointed out in this thread): A class designer can directly implement what equality means to the class, but he or she cannot implement an accessor method for the value. The value plays a role only indirectly as part of equality and ordering tests. Andy From ethan at stoneleaf.us Tue Jul 8 01:50:57 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 16:50:57 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB2EF9.80002@gmx.de> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> Message-ID: <53BB3261.6080705@stoneleaf.us> On 07/07/2014 04:36 PM, Andreas Maier wrote: > Am 2014-07-07 19:43, schrieb Ethan Furman: >> >> Python cannot know which values are important in an equality test, and which are not. So it refuses to guess. > > Well, one could argue that using the address of an object for its value equality test is pretty close to guessing, > considering that given a sensible definition of value equality, objects of different identity can very well be equal but > will always be considered unequal based on the address. And what would be this 'sensible definition'? > So we have many cases of classes whose designers thought about whether a sensible definition of equality was needed, and > decided that an address/identity-based equality definition was just what they needed, yet they did not want to or could > not use the "is" operator? 1) The address of the object is irrelevant. While that is what CPython uses, it is not what every Python uses. 2) The 'is' operator is specialized, and should only rarely be needed. If equals is what you mean, use '=='. 3) If Python forced us to write our own __eq__ /for every single class/ what would happen? 
Well, I suspect quite a few would make their own 'object' to inherit from, and would have the fallback of __eq__ meaning object identity. Practicality beats purity. > Can you give me an example for such a class (besides type object)? (I.e. a class that does not have __eq__() and > __ne__() but whose instances are compared with == or !=) I never add __eq__ to my classes until I come upon a place where I need to check if two instances of those classes are 'equal', for whatever I need equal to mean in that case. >> Ordering is much less frequent, and since we already tried always ordering things, falling back to type name if >> necessary, we have discovered that that is not a good trade-off. So now if one tries to order things without >> specifying how it should be done, one gets an exception. > > In Python 2, the default ordering implementation on type object uses the identity (address) as the basis for ordering. > In Python 3, that was changed to raise an exception. That seems to be in sync with what you are saying. > > Maybe it would have been possible to also change that for the default equality implementation in Python 3. But it was > not changed. As I wrote in another response, we now need to document this properly. Doc patches are gratefully accepted. :) -- ~Ethan~ From ethan at stoneleaf.us Tue Jul 8 01:52:17 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 16:52:17 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> Message-ID: <53BB32B1.2090300@stoneleaf.us> On 07/07/2014 04:49 PM, Benjamin Peterson wrote: > > Probably the best argument for the behavior is that "x is y" should > imply "x == y", which preludes raising an exception. 
No such invariant > is desired for ordering, so default implementations of < and > are not > provided in Python 3. Nice. This bit should definitely make it into the doc patch if not already in the docs. -- ~Ethan~ From ethan at stoneleaf.us Tue Jul 8 02:22:16 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 17:22:16 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB375E.8010904@gmx.de> References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com> <53BABCF2.50607@gmx.de> <53BAC638.7030704@stoneleaf.us> <53BB375E.8010904@gmx.de> Message-ID: <53BB39B8.20707@stoneleaf.us> On 07/07/2014 05:12 PM, Andreas Maier wrote: > Am 2014-07-07 18:09, schrieb Ethan Furman: >> >> Just because two instances from the same object have the same value does not mean they are equal. For a real-life >> example, look at twins: biologically identical, yet not equal. > > I think they *are* equal in Python if they have the same value, by definition, because somewhere the Python docs state > that equality compares the object's values. And is personality of no value, then? > The reality though is that value is more vague than equality test (as it was already pointed out in this thread): A > class designer can directly implement what equality means to the class, but he or she cannot implement an accessor > method for the value. The value plays a role only indirectly as part of equality and ordering tests. Not sure what you mean by this. -- ~Ethan~ From stephen at xemacs.org Tue Jul 8 03:44:40 2014 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 08 Jul 2014 10:44:40 +0900 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB338B.2080401@gmx.de> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> <53BB338B.2080401@gmx.de> Message-ID: <87fvictyif.fsf@uwakimon.sk.tsukuba.ac.jp> Andreas Maier writes: > The problem of the default implementation is that "x is not y" > implies "x != y" and that may or may not be true under a sensible > definition of equality. I noticed this a long time ago and just decided it was covered by "consenting adults". That is, if the "sensible definition" of x == y is such that it can be true simultaneously with x != y, it's the programmer's responsibility to notice that, and to provide an implementation. But there's no issue that lack of an explicit implementation of comparison causes a program to have ambiguous meaning. I also consider that for "every object has a value" to make sense as a description of Python, that value must be representable by an object. The obvious default representation for the value of any object is the object itself! Now, for this purpose you don't need a "canonical representation" of an object's value. In particular, equality comparisons need not explicitly construct a representative object. Some do, some don't, I would suppose. For example, in comparing an integer with a float, I would convert the integer to float and compare, but in comparing float and complex I would check the complex for x.im == 0.0, and if true, return the value of x.re == y. I'm not sure how you interpret "value" to find the behavior of Python (the default comparison) problematic. I suspect you'd have a hard time coming up with an interpretation consistent with Python's object orientation. 
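Stephen's int/float/complex description illustrates one possible strategy; it is not a description of CPython's implementation. Real complex numbers expose `.real` and `.imag`. A quick Python 3 check shows that the built-in types agree on ordinary values. It also shows that an int is not simply converted to float before comparing: `2**53 + 1` rounds to `2**53` as a float, yet still compares unequal to that float.

```python
# Values of different numeric types compare equal when the numbers are equal.
print(1 == 1.0)               # True
print(complex(2, 0) == 2.0)   # True: imaginary part is zero, real parts match
print(complex(2, 1) == 2.0)   # False

# 2**53 + 1 cannot be represented exactly as a float, so float() rounds it.
i = 2**53 + 1
f = float(i)
print(f == 2**53)             # True: the conversion rounded to 2**53
print(i == f)                 # False: int/float comparison is exact
```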
That said, it's probably worth documenting, but I don't know how much of the above should be introduced into the documentation. Steve From andreas.r.maier at gmx.de Tue Jul 8 03:18:16 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 03:18:16 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB3261.6080705@stoneleaf.us> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <53BB3261.6080705@stoneleaf.us> Message-ID: <53BB46D8.6040101@gmx.de> Am 2014-07-08 01:50, schrieb Ethan Furman: > On 07/07/2014 04:36 PM, Andreas Maier wrote: >> Am 2014-07-07 19:43, schrieb Ethan Furman: >>> >>> Python cannot know which values are important in an equality test, >>> and which are not. So it refuses to guess. >> >> Well, one could argue that using the address of an object for its >> value equality test is pretty close to guessing, >> considering that given a sensible definition of value equality, >> objects of different identity can very well be equal but >> will always be considered unequal based on the address. > > And what would be this 'sensible definition'? One that only a class designer can define. That's why I argued for raising an exception if that is not defined. But as I stated elsewhere in this thread: It is as it is, and we need to document it. > >> So we have many cases of classes whose designers thought about >> whether a sensible definition of equality was needed, and >> decided that an address/identity-based equality definition was just >> what they needed, yet they did not want to or could >> not use the "is" operator? > > 1) The address of the object is irrelevant. While that is what > CPython uses, it is not what every Python uses. > > 2) The 'is' operator is specialized, and should only rarely be > needed. If equals is what you mean, use '=='. 
> > 3) If Python forced us to write our own __eq__ /for every single > class/ what would happen? Well, I suspect quite a few would make > their own 'object' to inherit from, and would have the fallback of > __eq__ meaning object identity. Practicality beats purity. > > >> Can you give me an example for such a class (besides type object)? >> (I.e. a class that does not have __eq__() and >> __ne__() but whose instances are compared with == or !=) > > I never add __eq__ to my classes until I come upon a place where I > need to check if two instances of those classes are 'equal', for > whatever I need equal to mean in that case. With that strategy, you would not be hurt if the default implementation raised an exception in case the two objects are not identical. ;-) >>> Ordering is much less frequent, and since we already tried always >>> ordering things, falling back to type name if >>> necessary, we have discovered that that is not a good trade-off. So >>> now if one tries to order things without >>> specifying how it should be done, one gets an exception. >> >> In Python 2, the default ordering implementation on type object uses >> the identity (address) as the basis for ordering. >> In Python 3, that was changed to raise an exception. That seems to be >> in sync with what you are saying. >> >> Maybe it would have been possible to also change that for the default >> equality implementation in Python 3. But it was >> not changed. As I wrote in another response, we now need to document >> this properly. > > Doc patches are gratefully accepted. :) Understood. I will be working on it. 
:-) Andy From andreas.r.maier at gmx.de Tue Jul 8 03:29:34 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Tue, 08 Jul 2014 03:29:34 +0200 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB39B8.20707@stoneleaf.us> References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com> <53BABCF2.50607@gmx.de> <53BAC638.7030704@stoneleaf.us> <53BB375E.8010904@gmx.de> <53BB39B8.20707@stoneleaf.us> Message-ID: <53BB497E.4020005@gmx.de> Am 2014-07-08 02:22, schrieb Ethan Furman: > On 07/07/2014 05:12 PM, Andreas Maier wrote: >> Am 2014-07-07 18:09, schrieb Ethan Furman: >>> >>> Just because two instances from the same object have the same value >>> does not mean they are equal. For a real-life >>> example, look at twins: biologically identical, yet not equal. >> >> I think they *are* equal in Python if they have the same value, by >> definition, because somewhere the Python docs state >> that equality compares the object's values. > > And is personality of no value, then? I guess you are pulling my leg, Ethan ... ;-) But again, for a definition of equality between instances of a Python class representing twins, one has to decide what attributes of the twins are supposed to be part of that. If the designer of the class decides that just the biology attributes are part of equality, fine. If he or she decides that personality attributes are additionally part of equality, also fine. >> The reality though is that value is more vague than equality test (as >> it was already pointed out in this thread): A >> class designer can directly implement what equality means to the >> class, but he or she cannot implement an accessor >> method for the value. The value plays a role only indirectly as part >> of equality and ordering tests. > > Not sure what you mean by this. Equality has a precise implementation (and hence definition) in Python; value does not.
So to argue that value and equality can be different, is moot in a way, because it is not clear in Python what the value of an object is. Andy From stephen at xemacs.org Tue Jul 8 03:51:51 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 08 Jul 2014 10:51:51 +0900 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB375E.8010904@gmx.de> References: <53BA82F3.1070403@gmx.de> <1404746147.31528.138910801.7FAD5364@webmail.messagingengine.com> <53BABCF2.50607@gmx.de> <53BAC638.7030704@stoneleaf.us> <53BB375E.8010904@gmx.de> Message-ID: <87egxwty6g.fsf@uwakimon.sk.tsukuba.ac.jp> Andreas Maier writes: > A class designer can directly implement what equality means to the > class, but he or she cannot implement an accessor method for the > value. Of course she can! What you mean to say, I think, is that Python does not insist on an accessor method for the value. Ie, there is no dunder method __value__ on instances of class object. From steve at pearwood.info Tue Jul 8 03:58:33 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 8 Jul 2014 11:58:33 +1000 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB32B1.2090300@stoneleaf.us> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> <53BB32B1.2090300@stoneleaf.us> Message-ID: <20140708015833.GD13014@ando> On Mon, Jul 07, 2014 at 04:52:17PM -0700, Ethan Furman wrote: > On 07/07/2014 04:49 PM, Benjamin Peterson wrote: > > > >Probably the best argument for the behavior is that "x is y" should > >imply "x == y", which preludes raising an exception. No such invariant > >is desired for ordering, so default implementations of < and > are not > >provided in Python 3. > > Nice. This bit should definitely make it into the doc patch if not already > in the docs. 
However, saying this should not preclude classes where this is not the case, e.g. IEEE-754 NANs. I would not like this wording (which otherwise is very nice) to be used in the future to force reflexivity on object equality. https://en.wikipedia.org/wiki/Reflexive_relation To try to cut off arguments: - Yes, it is fine to have the default implementation of __eq__ assume reflexivity. - Yes, it is fine for standard library containers (lists, dicts, etc.) to assume reflexivity of their items. - I'm fully aware that some people think the non-reflexivity of NANs is logically nonsensical and a mistake. I do not agree with them. - I'm not looking to change anything here, the current behaviour is fine, I just want to ensure that an otherwise admirable doc change does not get interpreted in the future in a way that prevents classes from defining __eq__ to be non-reflexive. -- Steven From rob.cliffe at btinternet.com Tue Jul 8 03:59:30 2014 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Tue, 08 Jul 2014 02:59:30 +0100 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB2F25.3020205@gmx.de> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> Message-ID: <53BB5082.500@btinternet.com> On 08/07/2014 00:37, Andreas Maier wrote: > [...] > Am 2014-07-07 23:11, schrieb Jan Kaliszewski: >> >> IMHO, in Python context, "value" is a very vague term. Quite often >> we can read it as the very basic (but not the only one) notion of >> "what makes objects being equal or not" -- and then saying that >> "objects are compared by value" is a tautology. >> >> In other words, what object's "value" is -- is dependent on its >> nature: e.g. 
the value of a list is what are the values of its >> consecutive (indexed) items; the value of a set is based on values of >> all its elements without notion of order or repetition; the value of >> a number is a set of its abstract mathematical properties that >> determine what makes objects being equal, greater, lesser, how >> particular arithmetic operations work etc... >> >> I think, there is no universal notion of "the value of a Python >> object". The notion of identity seems to be most generic (every >> object has it, event if it does not have any other property) -- and >> that's why by default it is used to define the most basic feature of >> object's *value*, i.e. "what makes objects being equal or not" (== >> and !=). Another possibility would be to raise TypeError but, as >> Ethan Furman wrote, it would be impractical (e.g. >> key-type-heterogenic dicts or sets would be practically impossible to >> work with). On the other hand, the notion of sorting order (< > <= >> >=) is a much more specialized object property. > +1. See below. > On the universal notion of a value in Python: In both 2.x and 3.x, it > reads (in 3.1. Objects, values and types): > - "*Every object has an identity, a type and a value.*" Hm, is that *really* true? Every object has an identity and a type, sure. Every *variable* has a value, which is an object (an instance of some class). (I think? :-) ) But ISTM that the notion of the value of an *object* exists more in our minds than in Python. We say that number and string objects have a value because the concepts of number and string, including how to compare them, are intuitive for us, and these objects by design reflect our concepts with some degree of fidelity. Ditto for lists, dictionaries and sets which are only slightly less intuitive. If I came across an int object and had no concept of what an integer number was, how would I know what its "value" is supposed to be? 
If I'm given an int object, "i", say, and pretend I don't know what an integer is, I see that len(dir(i)) == 64 # Python 2.7 (and there may be attributes that dir doesn't show). How can I know from this bewildering list of 64 attributes (say they were all written in Swahili) that I can obtain the "real" (pun not intended) "value" with i.real or possibly i.numerator or i.__str__() or maybe somewhere else? ISTM "value" is a convention between humans, not something intrinsic to a class definition. Or at best something that is implied by the implementation of the comparison (or other) operators in the class. And can the following *objects* (class instances) be said to have a (obvious) value? obj1 = object() def obj2(): pass obj3 = (x for x in range(3)) obj4 = xrange(4) And is there any sensible way of comparing two such similar objects, e.g. obj3 = (x for x in range(3)) obj3a = (x for x in range(3)) except by id? Well, possibly in some cases. You might define two functions as equal if their code objects are identical (I'm outside my competence here, so please no-one correct me if I've got the technical detail wrong). But I don't see how you can compare two generators (other than by id) except by calling them both destructively (possibly an infinite number of times, and hoping that neither has unpredictable behaviour, side effects, etc.). As has already been said (more or less) in this thread, if you want to be able to compare any two objects of the same type, and not by id, you probably end up with a circular definition of "value" as "that (function of an object's attributes) which is compared". Which is ultimately an implementation decision for each type, not anything intrinsic to the type. So it makes sense to consistently fall back on id when nothing else obvious suggests itself. > - "An object's /identity/ never changes once it has been created; .... > The /value/ of some objects can change. 
Objects whose value can change > are said to be /mutable/; objects whose value is unchangeable once > they are created are called /immutable/." ISTM it needs to be explicitly documented for each class what the "value" of an instance is intended to be. Oh, I'm being pedantic here, sure. But I wonder if enforcing it would lead to more clarity of thought (maybe even the realisation that some objects don't have a value?? :-) ). > > These are clear indications that there is an intention to have > separate concepts of identity and value in Python. If an instance of > type object can exist but does not have a universal notion of value, > it should not allow operations that need a value. As Jan says, this would make comparing container objects a pain. Apologies if this message is a bit behind the times. There have been about 10 contributions since I started composing this! Best wishes, Rob Cliffe [...] From rosuav at gmail.com Tue Jul 8 04:15:27 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 8 Jul 2014 12:15:27 +1000 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB5082.500@btinternet.com> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> Message-ID: On Tue, Jul 8, 2014 at 11:59 AM, Rob Cliffe wrote: > If I came across an int object and had no concept of what an integer number > was, how would I know what its "value" is supposed to be? The value of an integer is the number it represents. In CPython, it's entirely possible to have multiple integer objects (ie objects with unique identities) with the same value, although AIUI there are Pythons for which that's not the case. The value of a float, Fraction, Decimal, or complex is also the number it represents, so when you compare 1==1.0, the answer is that they have the same value.
They can't possibly have the same identity (every object has a single type), but they have the same value. But what *is* that value? It's not something that can be independently recognized, because casting to a different type might change the value: >>> i = 2**53+1 >>> f = float(i) >>> i == f False >>> f == int(f) True Ergo the comparison of a float to an int cannot be done by casting the int to float, nor by casting the float to int; it has to be done by comparing the abstract numbers represented. Those are the objects' values. But what's the value of a sentinel object? _SENTINEL = object() def f(x, y=_SENTINEL): do_something_with(x) if y is not _SENTINEL: do_something_with(y) I'd say this is a reasonable argument for the default object value to be identity. ChrisA From steve at pearwood.info Tue Jul 8 04:32:34 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 8 Jul 2014 12:32:34 +1000 Subject: [Python-Dev] == on object tests identity in 3.x - summary In-Reply-To: <53BB32E2.40805@gmx.de> References: <53BB32E2.40805@gmx.de> Message-ID: <20140708023234.GE13014@ando> On Tue, Jul 08, 2014 at 01:53:06AM +0200, Andreas Maier wrote: > Thanks to all who responded. > > In absence of class-specific equality test methods, the default > implementations revert to use the identity (=address) of the object as a > basis for the test, in both Python 2 and Python 3. Scrub out the "= address" part. Python does not require that objects even have an address, that is not part of the language definition. (If I simulate a Python interpreter in my head, what is the address of the objects?) CPython happens to use the address of objects as their identity, but that is an implementation-specific trick, not a language guarantee, and it is documented as such. Neither IronPython nor Jython use the address as ID. > In absence of specific ordering test methods, the default > implementations revert to use the identity (=address) of the object as a > basis for the test, in Python 2. 
I don't think that is correct. This is using Python 2.7: py> a = (1, 2) py> b = "Hello World!" py> id(a) < id(b) True py> a < b False And just to be sure that neither a nor b are controlling this: py> a.__lt__(b) NotImplemented py> b.__gt__(a) NotImplemented So the identity of the instances a and b are not used for < , although the identity of their types may be: py> id(type(a)) < id(type(b)) False Using the identity of the instances would be silly, since that would mean that sorting a list of mixed types would depend on the items' history, not their values. > In Python 3, an exception is raised in that case. I don't think the ordering methods are terribly relevant to the behaviour of equals. > The bottom line of the discussion seems to be that this behavior is > intentional, and a lot of code depends on it. > > We still need to figure out how to document this. Options could be: I'm not sure it needs to be documented other than to say that the default object.__eq__ compares by identity. Everything else is, in my opinion, over-thinking it. > 1. We define that the default for the value of an object is its > identity. That allows to describe the behavior of the equality test > without special casing such objects, but it does not work for ordering. Why does it need to work for ordering? Not all values define ordering relations. Unlike type and identity, "value" does not have a single concrete definition, it depends on the class designer. In the case of object, the value of an object instance is itself, i.e. its identity. I don't think we need more than that. 
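Steven's summary can be checked with a minimal Python 3 sketch. `Plain` is a hypothetical class that defines no comparison methods of its own. The NaN lines illustrate the non-reflexive case he raised earlier in the thread. The exact `TypeError` message varies between Python versions, so it is printed rather than asserted.

```python
class Plain:
    """Hypothetical class that defines no comparison methods of its own."""


a = Plain()
b = Plain()

# object.__eq__ only reports equality for the very same object; otherwise it
# returns NotImplemented and the == operator falls back to an identity test.
print(object.__eq__(a, a))   # True
print(object.__eq__(a, b))   # NotImplemented
print(a == b, a != b)        # False True

# Python 3 provides no default ordering, so < raises TypeError.
ordering_error = None
try:
    a < b
except TypeError as exc:
    ordering_error = exc
    print("TypeError:", exc)

# A class may still opt out of reflexivity: a float NaN is not equal to itself,
# although containers check identity first, so membership still succeeds.
nan = float("nan")
print(nan == nan)    # False
print(nan in [nan])  # True
```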
-- Steven From ethan at stoneleaf.us Tue Jul 8 04:25:58 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 19:25:58 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <20140708015833.GD13014@ando> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> <53BB32B1.2090300@stoneleaf.us> <20140708015833.GD13014@ando> Message-ID: <53BB56B6.8030306@stoneleaf.us> On 07/07/2014 06:58 PM, Steven D'Aprano wrote: > On Mon, Jul 07, 2014 at 04:52:17PM -0700, Ethan Furman wrote: >> On 07/07/2014 04:49 PM, Benjamin Peterson wrote: >>> >>> Probably the best argument for the behavior is that "x is y" should >>> imply "x == y", which preludes raising an exception. No such invariant >>> is desired for ordering, so default implementations of < and > are not >>> provided in Python 3. >> >> Nice. This bit should definitely make it into the doc patch if not already >> in the docs. > > However, saying this should not preclude classes where this is not the > case, e.g. IEEE-754 NANs. I would not like this wording (which otherwise > is very nice) to be used in the future to force reflexivity on object > equality. > > https://en.wikipedia.org/wiki/Reflexive_relation > > To try to cut off arguments: > > - Yes, it is fine to have the default implementation of __eq__ > assume reflexivity. > > - Yes, it is fine for standard library containers (lists, dicts, > etc.) to assume reflexivity of their items. > > - I'm fully aware that some people think the non-reflexivity of > NANs is logically nonsensical and a mistake. I do not agree > with them. > > - I'm not looking to change anything here, the current behaviour > is fine, I just want to ensure that an otherwise admirable doc > change does not get interpreted in the future in a way that > prevents classes from defining __eq__ to be non-reflexive. 
+1 From ethan at stoneleaf.us Tue Jul 8 04:29:17 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 19:29:17 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB46D8.6040101@gmx.de> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <53BB3261.6080705@stoneleaf.us> <53BB46D8.6040101@gmx.de> Message-ID: <53BB577D.4040208@stoneleaf.us> On 07/07/2014 06:18 PM, Andreas Maier wrote: > Am 2014-07-08 01:50, schrieb Ethan Furman: >> >> I never add __eq__ to my classes until I come upon a place where I need to check if two instances of those classes are >> 'equal', for whatever I need equal to mean in that case. > > With that strategy, you would not be hurt if the default implementation raised an exception in case the two objects are > not identical. ;-) Yes, I would. Not identical means not equal until I say otherwise. Raising an exception instead of returning False (for __eq__) would be horrible. -- ~Ethan~ From steve at pearwood.info Tue Jul 8 05:12:02 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 8 Jul 2014 13:12:02 +1000 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB5082.500@btinternet.com> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> Message-ID: <20140708031202.GF13014@ando> On Tue, Jul 08, 2014 at 02:59:30AM +0100, Rob Cliffe wrote: > >- "*Every object has an identity, a type and a value.*" > > Hm, is that *really* true? Yes. It's pretty much true by definition: objects are *defined* to have an identity, type and value, even if that value is abstract rather than concrete. > Every object has an identity and a type, sure. > Every *variable* has a value, which is an object (an instance of some > class). (I think? :-) ) I don't think so. 
Variables can be undefined, which means they don't have a value: py> del x py> print x Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'x' is not defined > But ISTM that the notion of the value of an *object* exists more in our > minds than in Python. Pretty much. How could it be otherwise? Human beings define the semantics of objects, that is, their value, not Python. [...] > If I came across an int object and had no concept of what an integer > number was, how would I know what its "value" is supposed to be? You couldn't, any more than you would know what the value of a Watzit object was if you knew nothing about Watzits. The value of an object is intimately tied to its semantics, what the object represents and what it is intended to be used for. In general, we can say nothing about the value of an object until we've read the documentation for the object. But we can be confident that the object has *some* value, otherwise what would be the point of it? In some cases, that value might be nothing more than its identity, but that's okay. I think the problem we're having here is that some people are looking for a concrete definition of what the value of an object is, but there isn't one. [...] > And can the following *objects* (class instances) be said to have a > (obvious) value? > obj1 = object() > def obj2(): pass > obj3 = (x for x in range(3)) > obj4 = xrange(4) The value as understood by a human reader, as opposed to the value as assumed by Python, is not necessarily the same. As far as Python is concerned, the value of all four objects is the object itself, i.e. its identity. (For avoidance of doubt, not its id(), which is just a number.)
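Steven's point about `obj2` and `obj3` can be checked in Python 3. `xrange` no longer exists there, so only the generator and function cases are shown. The helper `make_noop` is a hypothetical name. It returns a fresh function object on each call, while every returned function shares the same code object.

```python
# Two generators built from identical expressions.
obj3 = (x for x in range(3))
obj3a = (x for x in range(3))

# Generators define no __eq__ of their own, so == compares identity.
print(obj3 == obj3)    # True
print(obj3 == obj3a)   # False, although both would yield 0, 1, 2


def make_noop():
    """Hypothetical helper: each call builds a new "do nothing" function."""
    def noop():
        pass
    return noop


f1 = make_noop()
f2 = make_noop()
print(f1.__code__ is f2.__code__)   # True: one shared code object
print(f1 == f2)                     # False: still two distinct function objects

# Comparing what the generators produce means consuming both of them.
print(list(obj3) == list(obj3a))    # True, but both are now exhausted
```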
A human reader could infer more than Python: - the second object is a "do nothing" function; - the third object is a lazy sequence (0, 1, 2); - the fourth object is a lazy sequence (0, 1, 2, 3); but since the class designer didn't deem it important enough, or practical enough, to implement an __eq__ method that takes those things into account, *for the purposes of equality* (but perhaps not other purposes) we say that the value is just the object itself, its identity. > And is there any sensible way of comparing two such similar objects, e.g. > obj3 = (x for x in range(3)) > obj3a = (x for x in range(3)) > except by id? In principle, one might peer into the two generators and note that they perform exactly the same computations on exactly the same input, and therefore should be deemed to have the same value. But since that's hard, and "exactly the same" is not always well-defined, Python doesn't try to be too clever and just uses a simpler idea: the value is the object itself. > Well, possibly in some cases. You might define two functions as equal > if their code objects are identical (I'm outside my competence here, so > please no-one correct me if I've got the technical detail wrong). But I > don't see how you can compare two generators (other than by id) except > by calling them both destructively (possibly an infinite number of > times, and hoping that neither has unpredictable behaviour, side > effects, etc.). Generator objects have code objects as well. py> x = (a for a in (1, 2)) py> x.gi_code <code object <genexpr> at 0xb7ee39f8, file "<stdin>", line 1> > >- "An object's /identity/ never changes once it has been created; .... > >The /value/ of some objects can change. Objects whose value can change > >are said to be /mutable/; objects whose value is unchangeable once > >they are created are called /immutable/." > > ISTM it needs to be explicitly documented for each class what the > "value" of an instance is intended to be. Why?
What value (pun intended) is there in adding an explicit statement of value to every single class? "The value of a str is the str's sequence of characters." "The value of a list is the list's sequence of items." "The value of an int is the int's numeric value." "The value of a float is the float's numeric value, or in the case of INFs and NANs, that they are an INF or NAN." "The value of a complex number is the ordered pair of its real and imaginary components." "The value of a re MatchObject is the MatchObject itself." I don't see any benefit to forcing all classes to explicitly document this sort of thing. It's nearly always redundant and unnecessary. -- Steven From rosuav at gmail.com Tue Jul 8 05:31:46 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 8 Jul 2014 13:31:46 +1000 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <20140708031202.GF13014@ando> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> Message-ID: On Tue, Jul 8, 2014 at 1:12 PM, Steven D'Aprano wrote: > Why? What value (pun intended) is there in adding an explicit statement > of value to every single class? > > "The value of a str is the str's sequence of characters." > "The value of a list is the list's sequence of items." > "The value of an int is the int's numeric value." > "The value of a float is the float's numeric value, or in the case of > INFs and NANs, that they are an INF or NAN." > "The value of a complex number is the ordered pair of its real and > imaginary components." > "The value of a re MatchObject is the MatchObject itself." > > I don't see any benefit to forcing all classes to explicitly document > this sort of thing. It's nearly always redundant and unnecessary. It's important where it's not obvious. For instance, two lists with the same items are equal, two tuples with the same items are equal, but a list and a tuple with the same items aren't. 
Doesn't mean it necessarily has to be documented, though. ChrisA From stephen at xemacs.org Tue Jul 8 05:34:33 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 08 Jul 2014 12:34:33 +0900 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB3261.6080705@stoneleaf.us> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <53BB3261.6080705@stoneleaf.us> Message-ID: <87bnt0ttfa.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > And what would be this 'sensible definition' [of value equality]? I think that's the wrong question. I suppose Andreas's point is that when the programmer doesn't provide a definition, there is no such thing as a "sensible definition" to default to. I disagree, but given that as the point of discussion, asking what the definition is, is moot. > 2) The 'is' operator is specialized, and should only rarely be > needed. Nitpick: Except that it's the preferred way to express identity with singletons, AFAIK. ("if x is None: ...", not "if x == None: ...".) From rob.cliffe at btinternet.com Tue Jul 8 06:02:39 2014 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Tue, 08 Jul 2014 05:02:39 +0100 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <20140708031202.GF13014@ando> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> Message-ID: <53BB6D5F.1010800@btinternet.com> On 08/07/2014 04:12, Steven D'Aprano wrote: > On Tue, Jul 08, 2014 at 02:59:30AM +0100, Rob Cliffe wrote: > >>> - "*Every object has an identity, a type and a value.*" >> Hm, is that *really* true? > Yes. It's pretty much true by definition: objects are *defined* to have > an identity, type and value, even if that value is abstract rather than > concrete. 
Except that in your last paragraph you imply that an explicit *definition* of the value is normally not in the docs. > > >> Every object has an identity and a type, sure. >> Every *variable* has a value, which is an object (an instance of some >> class). (I think? :-) ) > I don't think so. Variables can be undefined, which means they don't > have a value: > > py> del x > py> print x > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > NameError: name 'x' is not defined I was aware of that but I considered that a deleted variable no longer existed. Not that it's important. > > >> But ISTM that the notion of the value of an *object* exists more in our >> minds than in Python. > Pretty much. How could it be otherwise? Human beings define the > semantics of objects, that is, their value, not Python. > > > [...] >> If I came across an int object and had no concept of what an integer >> number was, how would I know what its "value" is supposed to be? > You couldn't, any more than you would know what the value of a Watzit > object was if you knew nothing about Watzits. The value of an object is > intimately tied to its semantics, what the object represents and what it > is intended to be used for. In general, we can say nothing about the > value of an object until we've read the documentation for the object. > > But we can be confident that the object has *some* value, otherwise what > would be the point of it? In some cases, that value might be nothing > more than its identity, but that's okay. > > I think the problem we're having here is that some people are looking > for a concrete definition of what the value of an object is, but there > isn't one. > > > [...] >> And can the following *objects* (class instances) be said to have a >> (obvious) value? >> obj1 = object() >> def obj2(): pass >> obj3 = (x for x in range(3)) >> obj4 = xrange(4) > The value as understood by a human reader, as opposed to the value as > assumed by Python, is not necessarily the same.
As far as Python is > concerned, the value of all four objects is the object itself, i.e. its > identity. Is this mentioned in the docs? I couldn't find it in a quick look through the 2.7.8 language reference. > (For avoidance of doubt, not its id(), which is just a > number.) > > A human reader could infer more than Python: > > - the second object is a "do nothing" function; > - the third object is a lazy sequence (0, 1, 2); > - the fourth object is a lazy sequence (0, 1, 2, 3); > > but since the class designer didn't deem it important enough, or > practical enough, to implement an __eq__ method that takes those things > into account, *for the purposes of equality* (but perhaps not other > purposes) we say that the value is just the object itself, its identity. > > > >> And is there any sensible way of comparing two such similar objects, e.g. >> obj3 = (x for x in range(3)) >> obj3a = (x for x in range(3)) >> except by id? > In principle, one might peer into the two generators and note that they > perform exactly the same computations on exactly the same input, and > therefore should be deemed to have the same value. But since that's > hard, and "exactly the same" is not always well-defined, Python doesn't > try to be too clever and just uses a simpler idea: the value is the > object itself. Sure, I wasn't suggesting it was a sensible thing to do (quite the opposite), just playing devil's advocate. > > >> Well, possibly in some cases. You might define two functions as equal >> if their code objects are identical (I'm outside my competence here, so >> please no-one correct me if I've got the technical detail wrong). But I >> don't see how you can compare two generators (other than by id) except >> by calling them both destructively (possibly an infinite number of >> times, and hoping that neither has unpredictable behaviour, side >> effects, etc.). > Generator objects have code objects as well. 
> > py> x = (a for a in (1, 2)) > py> x.gi_code > <code object <genexpr> at 0xb7ee39f8, file "<stdin>", line 1> > >>> - "An object's /identity/ never changes once it has been created; .... >>> The /value/ of some objects can change. Objects whose value can change >>> are said to be /mutable/; objects whose value is unchangeable once >>> they are created are called /immutable/." >> ISTM it needs to be explicitly documented for each class what the >> "value" of an instance is intended to be. > Why? What value (pun intended) is there in adding an explicit statement > of value to every single class? It troubles me a bit that "value" seems to be a fuzzy concept - it has an obvious meaning for some types (int, float, list etc.) but for callable objects you tell me that their value is the object itself, but I can't find it in the docs. (Is the same true for module objects?) Apart from anything else: "Objects whose value can change are said to be mutable" How can we say if an object is mutable if we don't know what its value is? Are callables non-mutable? (Presumably?) What about modules? (Their *attributes* can be changed.) Or are these questions considered stupid and/or irrelevant? > > "The value of a str is the str's sequence of characters." > "The value of a list is the list's sequence of items." > "The value of an int is the int's numeric value." > "The value of a float is the float's numeric value, or in the case of > INFs and NANs, that they are an INF or NAN." > "The value of a complex number is the ordered pair of its real and > imaginary components." > "The value of a re MatchObject is the MatchObject itself." > > I don't see any benefit to forcing all classes to explicitly document > this sort of thing. It's nearly always redundant and unnecessary. > "nearly always" yes, but there might be one or two cases where it would help. Sorry, I don't have an example at present. Thanks for a very full answer, Steven.
Rob Cliffe From ethan at stoneleaf.us Tue Jul 8 05:47:23 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 07 Jul 2014 20:47:23 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <87bnt0ttfa.fsf@uwakimon.sk.tsukuba.ac.jp> References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <53BB3261.6080705@stoneleaf.us> <87bnt0ttfa.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <53BB69CB.6040407@stoneleaf.us> On 07/07/2014 08:34 PM, Stephen J. Turnbull wrote: > Ethan Furman writes: > >> And what would be this 'sensible definition' [of value equality]? > > I think that's the wrong question. I suppose Andreas's point is that > when the programmer doesn't provide a definition, there is no such > thing as a "sensible definition" to default to. I disagree, but given > that as the point of discussion, asking what the definition is, is moot. He eventually made that point, but until he did I thought he meant that there was such a sensible default definition, he just wasn't sharing what he thought it might be with us. >> 2) The 'is' operator is specialized, and should only rarely be >> needed. > > Nitpick: Except that it's the preferred way to express identity with > singletons, AFAIK. ("if x is None: ...", not "if x == None: ...".) Not a nit at all, at least in my code -- the number of times I use '==' far outweighs the number of times I use 'is'. Thus, 'is' is rare. (Now, of course, I'll have to go measure that assertion and probably find out I am wrong :/ ). -- ~Ethan~ From ncoghlan at gmail.com Tue Jul 8 06:58:50 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 7 Jul 2014 21:58:50 -0700 Subject: [Python-Dev] buildbot.python.org down again? In-Reply-To: References: Message-ID: On 7 Jul 2014 10:47, "Guido van Rossum" wrote: > > It would still be nice to know who "the appropriate persons" are. 
Too much of our infrastructure seems to be maintained by house elves or the ITA. I volunteered to be the board's liaison to the infrastructure team, and getting more visibility around what the infrastructure *is* and how it's monitored and supported is going to be part of that. That will serve a couple of key purposes: - making the points of escalation clearer if anything breaks or needs improvement (although "infrastructure at python.org" is a good default choice) - making the current "todo" list of the infrastructure team more visible (both to calibrate resolution time expectations and to provide potential contributors an idea of what's involved) Noah has already set up http://status.python.org/ to track service status, I can see about getting buildbot.python.org added to the list. Cheers, Nick. > > > On Sun, Jul 6, 2014 at 11:33 PM, Terry Reedy wrote: >> >> On 7/6/2014 7:54 PM, Ned Deily wrote: >>> >>> As of the moment, buildbot.python.org seems to be down again. >> >> >> Several hours later, back up. >> >> >> > Where is the best place to report problems like this? >> >> We should have, if not already, an automatic system to detect down servers and report (email) to appropriate persons. >> >> -- >> Terry Jan Reedy >> >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > > -- > --Guido van Rossum (python.org/~guido)
From ncoghlan at gmail.com Tue Jul 8 07:23:35 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 7 Jul 2014 22:23:35 -0700 Subject: [Python-Dev] == on object tests identity in 3.x - summary In-Reply-To: <53BB32E2.40805@gmx.de> References: <53BB32E2.40805@gmx.de> Message-ID: On 7 Jul 2014 19:22, "Andreas Maier" wrote: > > Thanks to all who responded. > > In absence of class-specific equality test methods, the default implementations revert to use the identity (=address) of the object as a basis for the test, in both Python 2 and Python 3. > > In absence of specific ordering test methods, the default implementations revert to use the identity (=address) of the object as a basis for the test, in Python 2. In Python 3, an exception is raised in that case. In Python 2, it orders by type, and only then by id (which happens to be the address in CPython). > > The bottom line of the discussion seems to be that this behavior is intentional, and a lot of code depends on it. > > We still need to figure out how to document this. Options could be: > > 1. We define that the default for the value of an object is its identity. That allows to describe the behavior of the equality test without special casing such objects, but it does not work for ordering. Also, I have difficulties stating what constitutes that default case, because it can really only be explained by referring to the presence or absence of the class-specific equality test and ordering test methods. > > 2. We don't say anything about the default value of an object, and describe the behavior of the equality test and ordering test, which both need to cover the case that the object does not have the respective test methods. The behaviour of Python 3's type system is fully covered by equality defaulting to comparing by identity, and ordering comparisons having to be defined explicitly.
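A minimal Python 3 sketch of that summary, using a hypothetical class `Plain` that defines no comparison methods: `==` falls back to identity, `<` raises `TypeError` (the exact message differs between Python 3 releases, so it is not shown), and built-in sequences supply their own value equality, which also checks the type, as the list/tuple remark earlier in the thread points out.

```python
class Plain:
    """Hypothetical class that defines no comparison methods."""


p = Plain()
q = Plain()

# Equality defaults to identity.
print(p == p, p == q)  # True False

# Ordering has no default in Python 3.
ordering_raised = False
try:
    p < q  # raises TypeError
except TypeError:
    ordering_raised = True
print(ordering_raised)  # True

# Built-in containers define value equality, and the type matters too.
print([1, 2] == [1, 2], (1, 2) == (1, 2), [1, 2] == (1, 2))  # True True False
```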
The docs at https://docs.python.org/3/reference/expressions.html#not-in could likely be clarified, but they do cover this (they just cover a lot about the builtins at the same time). > It seems to me that only option 2 really works. Indeed, and that's the version already documented. Regards, Nick. > > > Comments and further options welcome. > > Andy From stephen at xemacs.org Tue Jul 8 09:01:00 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 08 Jul 2014 16:01:00 +0900 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB6D5F.1010800@btinternet.com> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> <53BB6D5F.1010800@btinternet.com> Message-ID: <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> Rob Cliffe writes: > > Why? What value (pun intended) is there in adding an explicit statement > > of value to every single class? > It troubles me a bit that "value" seems to be a fuzzy concept - it has > an obvious meaning for some types (int, float, list etc.) but for > callable objects you tell me that their value is the object itself, Value is *abstract* and implicit, but not fuzzy: it's what you compare when you test for equality. It's abstract in the sense that "inside of Python" an object's value has to be an object (everything is an object). Now, the question is "do we need a canonical representation of objects' values?" I.e., do we need a mapping from every object conceivable within Python to a specific object that is its value? Since Python generally allows, even prefers, duck-typing, the answer presumably is "no".
(Maybe you can think of Python programs you'd like to write where the answer is "yes", but I don't have any examples.) And in fact there is no such mapping in Python. So the answer I propose is that an object's value needs a representation in Python, but that representation doesn't need to be unique. Any object is a representation of its own value, and if you need two different objects to be equal to each other, you must define their __eq__ methods to produce that result. This (the fact that any object represents its value, and so can be used as "the" standard of comparison for that value) is why it's so important that equality be reflexive, symmetric, and transitive, and why we really want to be careful about creating objects like NaN whose definition is "my value isn't a value", and therefore "a = float('NaN'); a == a" evaluates to False. I agree with Steven d'A that this rule is not part of the language definition and shouldn't be, but it's the rule of thumb I find hardest to imagine *ever* wanting to break in my own code (although I sort of understand why the IEEE 754 committee found they had to). > How can we say if an object is mutable if we don't know what its > value is? Mutability is a different question. You can define a class whose instances have mutable attributes but nonetheless all compare equal regardless of the contents of those attributes. OTOH, the test for mutability is to try to mutate it. If that doesn't raise, it's mutable. Steve From rosuav at gmail.com Tue Jul 8 09:09:27 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 8 Jul 2014 17:09:27 +1000 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jul 8, 2014 at 5:01 PM, Stephen J.
Turnbull wrote: > I agree with Steven d'A that this rule is not part of the language > definition and shouldn't be, but it's the rule of thumb I find hardest > to imagine *ever* wanting to break in my own code (although I sort of > understand why the IEEE 754 committee found they had to). The reason NaN isn't equal to itself is because there are X bit patterns representing NaN, but an infinite number of possible non-numbers that could result from a calculation. Is float("inf")-float("inf") equal to float("inf")/float("inf")? There are three ways NaN equality could have been defined: 1) All NaNs are equal, as if NaN is some kind of "special number". 2) NaNs are equal if they have the exact same bit pattern, and unequal else. 3) All NaNs are unequal, even if they have the same bit pattern. The first option is very dangerous, because it'll mean that "NaN pollution" can actually result in unexpected equality. The second looks fine - a NaN is equal to itself, for instance - but it suffers from the pigeonhole problem, in that eventually you'll have two numbers which resulted from different calculations and happen to have the same bit pattern. The third is what IEEE went with. It's the sanest option. ChrisA From donald at stufft.io Tue Jul 8 09:33:32 2014 From: donald at stufft.io (Donald Stufft) Date: Tue, 8 Jul 2014 03:33:32 -0400 Subject: [Python-Dev] buildbot.python.org down again? In-Reply-To: References: Message-ID: <62321D60-1197-47A5-B455-6E5200DD52F7@stufft.io> On Jul 8, 2014, at 12:58 AM, Nick Coghlan wrote: > > On 7 Jul 2014 10:47, "Guido van Rossum" wrote: > > > > It would still be nice to know who "the appropriate persons" are. Too much of our infrastructure seems to be maintained by house elves or the ITA. > > I volunteered to be the board's liaison to the infrastructure team, and getting more visibility around what the infrastructure *is* and how it's monitored and supported is going to be part of that. 
That will serve a couple of key purposes: > > - making the points of escalation clearer if anything breaks or needs improvement (although "infrastructure at python.org" is a good default choice) > - making the current "todo" list of the infrastructure team more visible (both to calibrate resolution time expectations and to provide potential contributors an idea of what's involved) > > Noah has already set up http://status.python.org/ to track service status, I can see about getting buildbot.python.org added to the list. > > Cheers, > Nick. > > We (the infrastructure team) were actually looking into buildbot.python.org earlier and we're not entirely sure who "owns" buildbot.python.org. Unfortunately a lot of the *.python.org services are in a similar state where there is no clear owner. Generally we've not wanted to just step in and take over for fear of stepping on someone's toes, but it appears that perhaps buildbot.p.o has no owner? ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From stephen at xemacs.org Tue Jul 8 09:53:50 2014 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Tue, 08 Jul 2014 16:53:50 +0900 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp> Chris Angelico writes: > The reason NaN isn't equal to itself is because there are X bit > patterns representing NaN, but an infinite number of possible > non-numbers that could result from a calculation. I understand that. But you're missing at least two alternatives that involve raising on some calculations involving NaN, as well as the fact that forcing inequality of two NaNs produced by equivalent calculations is arguably just as wrong as allowing equality of two NaNs produced by the different calculations. That's where things get fuzzy for me -- in Python I would expect that preserving invariants would be more important than computational efficiency, but evidently it's not. I assume that I would have a better grasp on why Python chose to go this way rather than that if I understood IEEE 754 better. From rosuav at gmail.com Tue Jul 8 09:59:11 2014 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 8 Jul 2014 17:59:11 +1000 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jul 8, 2014 at 5:53 PM, Stephen J. 
Turnbull wrote: > But you're missing at least two alternatives that > involve raising on some calculations involving NaN, as well as the > fact that forcing inequality of two NaNs produced by equivalent > calculations is arguably just as wrong as allowing equality of two > NaNs produced by the different calculations. This is off-topic for this thread, but still... The trouble is that your "arguably just as wrong" is an indistinguishable case. If you don't want two different calculations' NaNs to *ever* compare equal, the only solution is to have all NaNs compare unequal - otherwise, two calculations might happen to produce the same bitpattern, as there are only a finite number of them available. > That's where things get > fuzzy for me -- in Python I would expect that preserving invariants > would be more important than computational efficiency, but evidently > it's not. What invariant is being violated for efficiency? As I see it, it's one possible invariant (things should be equal to themselves) coming up against another possible invariant (one way of generating NaN is unequal to any other way of generating NaN). Raising an exception is, of course, the purpose of signalling NaNs rather than quiet NaNs, which is a separate consideration from how they compare. ChrisA From benhoyt at gmail.com Tue Jul 8 15:52:18 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 8 Jul 2014 09:52:18 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal Message-ID: Hi folks, After some very good python-dev feedback on my first version of PEP 471, I've updated the PEP to clarify a few things and added various "Rejected ideas" subsections. 
Here's a link to the new version (I've also copied the full text below): http://legacy.python.org/dev/peps/pep-0471/ -- new PEP as HTML http://hg.python.org/peps/rev/0da4736c27e8 -- changes Specifically, I've made these changes (not an exhaustive list): * Clarified wording in several places, for example "Linux and OS X" -> "POSIX-based systems" * Added a new "Notes on exception handling" section * Added a thorough "Rejected ideas" section with the various ideas that have been discussed previously and rejected for various reasons * Added a description of the .full_name attribute, which folks seemed to generally agree is a good idea * Removed the "open issues" section, as the three open issues have either been included (full_name) or rejected (windows_wildcard) One known error in the PEP is that the "Notes" sections should be top-level sections, not be subheadings of "Examples". If someone would like to give me ("benhoyt") commit access to the peps repo, I can fix this and any other issues that come up. I'd love to see this finalized! If you're going to comment with suggestions to change the API, please ensure you've first read the "rejected ideas" sections in the PEP as well as the relevant python-dev discussion (linked to in the PEP). Thanks, Ben PEP: 471 Title: os.scandir() function -- a better and faster directory iterator Version: $Revision$ Last-Modified: $Date$ Author: Ben Hoyt Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 30-May-2014 Python-Version: 3.5 Post-History: 27-Jun-2014, 8-Jul-2014 Abstract ======== This PEP proposes including a new directory iteration function, ``os.scandir()``, in the standard library. This new function adds useful functionality and increases the speed of ``os.walk()`` by 2-10 times (depending on the platform and file system) by significantly reducing the number of times ``stat()`` needs to be called. 
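The speed-up described above comes from reusing the file type that the directory listing already reports instead of calling `stat()` per entry. A rough sketch of an `os.walk()`-style generator built on that idea follows. It is written against `os.scandir()` as it later shipped in Python 3.5+ (the `with` form needs 3.6+), where `DirEntry.path` plays the role this draft calls `full_name`. `scandir_walk` and `normalise` are hypothetical names, and the sketch omits the error handling and bottom-up mode of the real `os.walk()`; it is not the scandir module's implementation.

```python
import os
import tempfile


def scandir_walk(top):
    """Simplified top-down walk yielding (dirpath, dirnames, filenames)."""
    dirs, non_dirs, subdirs = [], [], []
    with os.scandir(top) as it:
        for entry in it:
            # is_dir() can usually be answered from the data the directory
            # listing already returned, without a per-entry stat() call.
            if entry.is_dir():
                dirs.append(entry.name)
                # Like os.walk(followlinks=False): report symlinked
                # directories but do not descend into them.
                if not entry.is_symlink():
                    subdirs.append(entry.path)
            else:
                non_dirs.append(entry.name)
    yield top, dirs, non_dirs
    for path in subdirs:
        yield from scandir_walk(path)


def normalise(walk_results):
    """Sort everything so the comparison ignores listing order."""
    return sorted((p, sorted(d), sorted(f)) for p, d, f in walk_results)


# Usage: compare against os.walk() on a small throwaway tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    open(os.path.join(root, "a", "f.txt"), "w").close()
    ours = normalise(scandir_walk(root))
    reference = normalise(os.walk(root))

print(ours == reference)  # True
```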
Rationale ========= Python's built-in ``os.walk()`` is significantly slower than it needs to be, because -- in addition to calling ``os.listdir()`` on each directory -- it executes the ``stat()`` system call or ``GetFileAttributes()`` on each file to determine whether the entry is a directory or not. But the underlying system calls -- ``FindFirstFile`` / ``FindNextFile`` on Windows and ``readdir`` on POSIX systems -- already tell you whether the files returned are directories or not, so no further system calls are needed. Further, the Windows system calls return all the information for a ``stat_result`` object, such as file size and last modification time. In short, you can reduce the number of system calls required for a tree function like ``os.walk()`` from approximately 2N to N, where N is the total number of files and directories in the tree. (And because directory trees are usually wider than they are deep, it's often much better than this.) In practice, removing all those extra system calls makes ``os.walk()`` about **8-9 times as fast on Windows**, and about **2-3 times as fast on POSIX systems**. So we're not talking about micro-optimizations. See more `benchmarks here`_. .. _`benchmarks here`: https://github.com/benhoyt/scandir#benchmarks Somewhat relatedly, many people (see Python `Issue 11406`_) are also keen on a version of ``os.listdir()`` that yields filenames as it iterates instead of returning them as one big list. This improves memory efficiency for iterating very large directories. So, as well as providing a ``scandir()`` iterator function for calling directly, Python's existing ``os.walk()`` function could be sped up a huge amount. .. _`Issue 11406`: http://bugs.python.org/issue11406 Implementation ============== The implementation of this proposal was written by Ben Hoyt (initial version) and Tim Golden (who helped a lot with the C extension module). It lives on GitHub at `benhoyt/scandir`_. ..
_`benhoyt/scandir`: https://github.com/benhoyt/scandir Note that this module has been used and tested (see "Use in the wild" section in this PEP), so it's more than a proof-of-concept. However, it is marked as beta software and is not extensively battle-tested. It will need some cleanup and more thorough testing before going into the standard library, as well as integration into ``posixmodule.c``. Specifics of proposal ===================== Specifically, this PEP proposes adding a single function to the ``os`` module in the standard library, ``scandir``, that takes a single, optional string as its argument:: scandir(path='.') -> generator of DirEntry objects Like ``listdir``, ``scandir`` calls the operating system's directory iteration system calls to get the names of the files in the ``path`` directory, but it's different from ``listdir`` in two ways: * Instead of returning bare filename strings, it returns lightweight ``DirEntry`` objects that hold the filename string and provide simple methods that allow access to the additional data the operating system returned. * It returns a generator instead of a list, so that ``scandir`` acts as a true iterator instead of returning the full list immediately. ``scandir()`` yields a ``DirEntry`` object for each file and directory in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'`` pseudo-directories are skipped, and the entries are yielded in system-dependent order. 
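A quick way to check the relationship to ``listdir`` described above: the names yielded by ``scandir()`` match ``os.listdir()``, with ``'.'`` and ``'..'`` skipped. This sketch targets ``os.scandir()`` as it later shipped (Python 3.6+ for the context-manager form), builds a throwaway directory with ``tempfile``, and sorts the names because entry order is system-dependent.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "subdir"))
    open(os.path.join(root, "file.txt"), "w").close()

    listdir_names = sorted(os.listdir(root))
    with os.scandir(root) as it:
        # Each item is a DirEntry; is_dir() is evaluated while the tree exists.
        kinds = {entry.name: entry.is_dir() for entry in it}
    scandir_names = sorted(kinds)

print(scandir_names)  # ['file.txt', 'subdir']
print(scandir_names == listdir_names)  # True
```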

Each ``DirEntry`` object has the following attributes and methods:

* ``name``: the entry's filename, relative to the ``path`` argument
  (corresponds to the return values of ``os.listdir``)

* ``full_name``: the entry's full path name -- the equivalent of
  ``os.path.join(path, entry.name)``

* ``is_dir()``: like ``os.path.isdir()``, but much cheaper -- it never
  requires a system call on Windows, and usually doesn't on POSIX
  systems

* ``is_file()``: like ``os.path.isfile()``, but much cheaper -- it
  never requires a system call on Windows, and usually doesn't on
  POSIX systems

* ``is_symlink()``: like ``os.path.islink()``, but much cheaper -- it
  never requires a system call on Windows, and usually doesn't on
  POSIX systems

* ``lstat()``: like ``os.lstat()``, but much cheaper on some systems
  -- it only requires a system call on POSIX systems

The ``is_X`` methods may perform a ``stat()`` call under certain
conditions (for example, on certain file systems on POSIX systems),
and therefore possibly raise ``OSError``. The ``lstat()`` method will
call the ``lstat()`` system call on POSIX systems and therefore also
possibly raise ``OSError``. See the "Notes on exception handling"
section for more details.

The ``DirEntry`` attribute and method names were chosen to be the same
as those in the new ``pathlib`` module for consistency.

Like the other functions in the ``os`` module, ``scandir()`` accepts
either a bytes or str object for the ``path`` parameter, and returns
the ``DirEntry.name`` and ``DirEntry.full_name`` attributes with the
same type as ``path``. However, it is *strongly recommended* to use
the str type, as this ensures cross-platform support for Unicode
filenames.


Examples
========

Below is a good usage pattern for ``scandir``.
This is in fact almost exactly how the scandir module's faster
``os.walk()`` implementation uses it::

    dirs = []
    non_dirs = []
    for entry in os.scandir(path):
        if entry.is_dir():
            dirs.append(entry)
        else:
            non_dirs.append(entry)

The above ``os.walk()``-like code will be significantly faster with
scandir than with ``os.listdir()`` and ``os.path.isdir()`` on both
Windows and POSIX systems.

Or, for getting the total size of files in a directory tree, showing
use of the ``DirEntry.lstat()`` method and ``DirEntry.full_name``
attribute::

    def get_tree_size(path):
        """Return total size of files in path and subdirs."""
        total = 0
        for entry in os.scandir(path):
            if entry.is_dir():
                total += get_tree_size(entry.full_name)
            else:
                total += entry.lstat().st_size
        return total

Note that ``get_tree_size()`` will get a huge speed boost on Windows,
because no extra stat calls are needed, but on POSIX systems the size
information is not returned by the directory iteration functions, so
this function won't gain anything there.


Notes on caching
----------------

The ``DirEntry`` objects are relatively dumb -- the ``name`` and
``full_name`` attributes are obviously always cached, and the ``is_X``
and ``lstat`` methods cache their values (immediately on Windows via
``FindNextFile``, and on first use on POSIX systems via a ``stat``
call) and never refetch from the system.

For this reason, ``DirEntry`` objects are intended to be used and
thrown away after iteration, not stored in long-lived data structures
with their methods called again and again.

If developers want "refresh" behaviour (for example, for watching a
file's size change), they can simply use ``pathlib.Path`` objects, or
call the regular ``os.lstat()`` or ``os.path.getsize()`` functions,
which get fresh data from the operating system on every call.
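
The never-refetch rule can be observed with the ``os.scandir()`` that later shipped in Python 3.5, where the proposed ``lstat()`` and ``full_name`` appear as ``stat(follow_symlinks=False)`` and ``path``. This minimal sketch (``sizes_before_and_after`` is a hypothetical helper) reads a file's size through a ``DirEntry``, grows the file, and compares the entry's cached answer with a fresh ``os.lstat()`` call:

```python
import os
import tempfile


def sizes_before_and_after(path):
    """Return (cached_before, cached_after, fresh_after) sizes of a file.

    Hypothetical demo helper: shows that a DirEntry keeps its first
    stat result instead of asking the OS again.
    """
    directory, name = os.path.split(path)
    # list() exhausts (and so closes) the scandir iterator.
    entry = next(e for e in list(os.scandir(directory)) if e.name == name)
    cached_before = entry.stat(follow_symlinks=False).st_size
    with open(path, "ab") as f:
        f.write(b"more data")  # the file grows by 9 bytes on disk
    cached_after = entry.stat(follow_symlinks=False).st_size  # cached value
    fresh_after = os.lstat(path).st_size  # new system call, new size
    return cached_before, cached_after, fresh_after


with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "log.txt")
    with open(p, "wb") as f:
        f.write(b"abc")
    print(sizes_before_and_after(p))  # (3, 3, 12)
```

The entry keeps reporting 3 bytes while the file on disk holds 12; calling ``os.lstat()`` again (or re-scanning the directory) is what picks up the change.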

Notes on exception handling
---------------------------

``DirEntry.is_X()`` and ``DirEntry.lstat()`` are explicitly methods
rather than attributes or properties, to make it clear that they may
not be cheap operations and may make a system call. As a result, these
methods may raise ``OSError``.

For example, ``DirEntry.lstat()`` will always make a system call on
POSIX-based systems, and the ``DirEntry.is_X()`` methods will make a
``stat()`` system call on such systems if ``readdir()`` returns a
``d_type`` with a value of ``DT_UNKNOWN``, which can occur under
certain conditions or on certain file systems.

For this reason, when a user requires fine-grained error handling,
it's good to catch ``OSError`` around these method calls and then
handle the error as appropriate.

For example, below is a version of the ``get_tree_size()`` example
shown above, but with basic error handling added::

    def get_tree_size(path):
        """Return total size of files in path and subdirs. If
        is_dir() or lstat() fails, print an error message to stderr
        and assume zero size (for example, file has been deleted).
        """
        total = 0
        for entry in os.scandir(path):
            try:
                is_dir = entry.is_dir()
            except OSError as error:
                print('Error calling is_dir():', error, file=sys.stderr)
                continue
            if is_dir:
                total += get_tree_size(entry.full_name)
            else:
                try:
                    total += entry.lstat().st_size
                except OSError as error:
                    print('Error calling lstat():', error, file=sys.stderr)
        return total


Support
=======

The scandir module on GitHub has been forked and used quite a bit (see
"Use in the wild" in this PEP), but there's also been a fair bit of
direct support for a scandir-like function from core developers and
others on the python-dev and python-ideas mailing lists.
A sampling:

* **python-dev**: a good number of +1's and very few negatives for
  scandir and PEP 471 on `this June 2014 python-dev thread `_

* **Nick Coghlan**, a core Python developer: "I've had the local Red
  Hat release engineering team express their displeasure at having to
  stat every file in a network mounted directory tree for info that is
  present in the dirent structure, so a definite +1 to os.scandir from
  me, so long as it makes that info available."
  [`source1 `_]

* **Tim Golden**, a core Python developer, supports scandir enough to
  have spent time refactoring and significantly improving scandir's C
  extension module. [`source2 `_]

* **Christian Heimes**, a core Python developer: "+1 for something
  like yielddir()" [`source3 `_] and "Indeed! I'd like to see the
  feature in 3.4 so I can remove my own hack from our code base."
  [`source4 `_]

* **Gregory P. Smith**, a core Python developer: "As 3.4beta1 happens
  tonight, this isn't going to make 3.4 so i'm bumping this to 3.5. I
  really like the proposed design outlined above." [`source5 `_]

* **Guido van Rossum** on the possibility of adding scandir to Python
  3.5 (as it was too late for 3.4): "The ship has likewise sailed for
  adding scandir() (whether to os or pathlib). By all means experiment
  and get it ready for consideration for 3.5, but I don't want to add
  it to 3.4." [`source6 `_]

Support for this PEP itself (meta-support?) was given by Nick Coghlan
on python-dev: "A PEP reviewing all this for 3.5 and proposing a
specific os.scandir API would be a good thing." [`source7 `_]


Use in the wild
===============

To date, the ``scandir`` implementation is definitely useful, but has
been clearly marked "beta", so it's uncertain how much use of it there
is in the wild. Ben Hoyt has had several reports from people using
it. For example:

* Chris F: "I am processing some pretty large directories and was half
  expecting to have to modify getdents. So thanks for saving me the
  effort."
  [via personal email]

* bschollnick: "I wanted to let you know about this, since I am using
  Scandir as a building block for this code. Here's a good example of
  scandir making a radical performance improvement over os.listdir."
  [`source8 `_]

* Avram L: "I'm testing our scandir for a project I'm working on.
  Seems pretty solid, so first thing, just want to say nice work!"
  [via personal email]

Others have `requested a PyPI package`_ for it, which has been
created. See `PyPI package`_.

.. _`requested a PyPI package`: https://github.com/benhoyt/scandir/issues/12
.. _`PyPI package`: https://pypi.python.org/pypi/scandir

GitHub stats don't mean too much, but scandir does have several
watchers, issues, forks, etc. Here's the run-down of the stats as of
July 7, 2014:

* Watchers: 17
* Stars: 57
* Forks: 20
* Issues: 4 open, 26 closed

**However, the much larger point is this:** if this PEP is accepted,
``os.walk()`` can easily be reimplemented using ``scandir`` rather
than ``listdir`` and ``stat``, increasing the speed of ``os.walk()``
very significantly. Thousands of developers, scripts, and production
codebases would benefit from this large speedup of ``os.walk()``. For
example, on GitHub, there are almost as many uses of ``os.walk``
(194,000) as there are of ``os.mkdir`` (230,000).


Rejected ideas
==============


Naming
------

The only other real contender for this function's name was
``iterdir()``. However, ``iterX()`` functions in Python (mostly found
in Python 2) tend to be simple iterator equivalents of their
non-iterator counterparts. For example, ``dict.iterkeys()`` is just an
iterator version of ``dict.keys()``, but the objects returned are
identical. In ``scandir()``'s case, however, the return values are
quite different objects (``DirEntry`` objects vs filename strings), so
this should probably be reflected by a difference in name -- hence
``scandir()``.

See some `relevant discussion on python-dev `_.

Wildcard support
----------------

``FindFirstFile``/``FindNextFile`` on Windows support passing a
"wildcard" like ``*.jpg``, so at first folks (this PEP's author
included) felt it would be a good idea to include a
``windows_wildcard`` keyword argument to the ``scandir`` function so
users could pass this in.

However, on further thought and discussion it was decided that this
would be a bad idea, *unless it could be made cross-platform* (a
``pattern`` keyword argument or similar). This seems easy enough at
first -- just use the OS wildcard support on Windows, and something
like ``fnmatch`` or ``re`` afterwards on POSIX-based systems.

Unfortunately the exact Windows wildcard matching rules aren't really
documented anywhere by Microsoft, and they're quite quirky (see this
`blog post `_), meaning it's very problematic to emulate using
``fnmatch`` or regexes.

So the consensus was that Windows wildcard support was a bad idea. It
would be possible to add at a later date if there's a cross-platform
way to achieve it, but not for the initial version.

Read more in `this Nov 2012 python-ideas thread `_ and this `June 2014 python-dev thread on PEP 471 `_.


DirEntry attributes being properties
------------------------------------

In some ways it would be nicer for the ``DirEntry`` ``is_X()`` and
``lstat()`` methods to be properties instead, to indicate they're very
cheap or free. However, this isn't quite the case, as ``lstat()`` will
require an OS call on POSIX-based systems but not on Windows. Even
``is_dir()`` and friends may perform an OS call on POSIX-based systems
if the ``dirent.d_type`` value is ``DT_UNKNOWN`` (on certain file
systems).

Also, people would expect the attribute access ``entry.is_dir`` to
only ever raise ``AttributeError``, never ``OSError`` when it makes a
system call under the covers. Calling code would have to have a
``try``/``except`` around what looks like a simple attribute access,
and so it's much better to make them *methods*.
See `this May 2013 python-dev thread `_ where this PEP's author makes
this case and there's agreement from a core developer.


DirEntry fields being "static" attribute-only objects
-----------------------------------------------------

In `this July 2014 python-dev message `_, Paul Moore suggested a
solution that was a "thin wrapper round the OS feature", where the
``DirEntry`` object had only static attributes: ``name``,
``full_name``, and ``is_X``, with the ``st_X`` attributes only present
on Windows. The idea was to use this simpler, lower-level function as
a building block for higher-level functions.

At first there was general agreement that simplifying in this way was
a good thing. However, there were two problems with this approach.
First, it assumes that ``is_dir`` and similar attributes are always
present on POSIX, which isn't the case (if ``d_type`` is not present
or is ``DT_UNKNOWN``). Second, it's a much harder-to-use API in
practice, as even the ``is_dir`` attributes aren't always present on
POSIX, and would need to be tested with ``hasattr()`` and then
``os.stat()`` called if they weren't present.

See `this July 2014 python-dev response `_ from this PEP's author
detailing why this option is a non-ideal solution, and the subsequent
reply from Paul Moore voicing agreement.


DirEntry fields being static with an ensure_lstat option
--------------------------------------------------------

Another seemingly simpler and attractive option was suggested by Nick
Coghlan in this `June 2014 python-dev message `_: make
``DirEntry.is_X`` and ``DirEntry.lstat_result`` properties, and
populate ``DirEntry.lstat_result`` at iteration time, but only if the
new argument ``ensure_lstat=True`` was specified on the ``scandir()``
call.

This does have the advantage over the above in that you can easily get
the stat result from ``scandir()`` if you need it.
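
To show where errors would surface under this design, here is a hypothetical pure-Python approximation. ``scandir_ensure_lstat`` is an illustrative name, the ``lstat_result`` attribute mirrors the suggestion above, and ``types.SimpleNamespace`` stands in for ``DirEntry``; none of this is a real API.

```python
import os
from types import SimpleNamespace


def scandir_ensure_lstat(path='.', ensure_lstat=False):
    """Hypothetical eager-stat variant of scandir(), for illustration.

    Yields objects with name, full_name and lstat_result attributes;
    lstat_result is None unless ensure_lstat is true.
    """
    for name in os.listdir(path):
        full_name = os.path.join(path, name)
        # With ensure_lstat=True, os.lstat() runs inside the generator,
        # so an OSError surfaces from next(), not from a method call.
        lstat_result = os.lstat(full_name) if ensure_lstat else None
        yield SimpleNamespace(name=name, full_name=full_name,
                              lstat_result=lstat_result)
```

Because this sketch is an ordinary Python generator, an exception that escapes it also ends the iteration, so later ``next()`` calls stop immediately; an iterator written in C could in principle be designed to keep going after reporting an error.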

However, it has the serious disadvantage that fine-grained error
handling is messy, because ``stat()`` will be called (and hence
potentially raise ``OSError``) during iteration, leading to a rather
ugly, hand-made iteration loop::

    it = os.scandir(path)
    while True:
        try:
            entry = next(it)
        except OSError as error:
            handle_error(path, error)
        except StopIteration:
            break

Or it means that ``scandir()`` would have to accept an ``onerror``
argument -- a function to call when ``stat()`` errors occur during
iteration. This seems to this PEP's author neither as direct nor as
Pythonic as ``try``/``except`` around a ``DirEntry.lstat()`` call.

See `Ben Hoyt's July 2014 reply `_ to the discussion summarizing this
and detailing why he thinks the original PEP 471 proposal is "the
right one" after all.


Return values being (name, stat_result) two-tuples
--------------------------------------------------

Initially this PEP's author proposed this concept as a function called
``iterdir_stat()`` which yielded two-tuples of (name, stat_result).
This does have the advantage that there are no new types introduced.
However, the ``stat_result`` is only partially filled on POSIX-based
systems (most fields set to ``None`` and other quirks), so they're not
really ``stat_result`` objects at all, and this would have to be
thoroughly documented as different from ``os.stat()``.

Also, Python has good support for proper objects with attributes and
methods, which makes for a saner and simpler API than two-tuples. It
also makes the ``DirEntry`` objects more extensible and future-proof
as operating systems add functionality and we want to include this in
``DirEntry``.

See also some previous discussion:

* `May 2013 python-dev thread `_ where Nick Coghlan makes the original
  case for a ``DirEntry``-style object.

* `June 2014 python-dev thread `_ where Nick Coghlan makes (another)
  good case against the two-tuple approach.

Return values being overloaded stat_result objects
--------------------------------------------------

Another alternative discussed was making the return values overloaded
``stat_result`` objects with ``name`` and ``full_name`` attributes.
However, apart from this being a strange (and strained!) kind of
overloading, this has the same problems mentioned above -- most of the
``stat_result`` information is not fetched by ``readdir()`` on POSIX
systems, only (part of) the ``st_mode`` value.


Return values being pathlib.Path objects
----------------------------------------

With Antoine Pitrou's new standard library ``pathlib`` module, it at
first seems like a great idea for ``scandir()`` to return instances of
``pathlib.Path``. However, ``pathlib.Path``'s ``is_X()`` and
``lstat()`` methods are explicitly not cached, whereas ``scandir`` has
to cache them by design, because it's (often) returning values from
the original directory iteration system call.

And if the ``pathlib.Path`` instances returned by ``scandir`` cached
lstat values, but the ordinary ``pathlib.Path`` objects explicitly
don't, that would be more than a little confusing.

Guido van Rossum explicitly rejected ``pathlib.Path`` caching lstat in
the context of scandir `here `_, making ``pathlib.Path`` objects a bad
choice for scandir return values.


Possible improvements
=====================

There are many possible improvements one could make to scandir, but
here is a short list of some this PEP's author has in mind:

* scandir could potentially be further sped up by calling ``readdir``
  / ``FindNextFile`` say 50 times per ``Py_BEGIN_ALLOW_THREADS`` block
  so that it stays in the C extension module for longer, and may be
  somewhat faster as a result. This approach hasn't been tested, but
  was suggested by Antoine Pitrou on Issue 11406. [`source9 `_]

* scandir could use a free list to avoid the cost of memory allocation
  for each iteration -- a short free list of 10 or maybe even 1 may
  help.
  Suggested by Victor Stinner on a `python-dev thread on June 27`_.

.. _`python-dev thread on June 27`: https://mail.python.org/pipermail/python-dev/2014-June/135232.html


Previous discussion
===================

* `Original thread Ben Hoyt started on python-ideas`_ about speeding
  up ``os.walk()``

* Python `Issue 11406`_, which includes the original proposal for a
  scandir-like function

* `Further thread Ben Hoyt started on python-dev`_ that refined the
  ``scandir()`` API, including Nick Coghlan's suggestion of scandir
  yielding ``DirEntry``-like objects

* `Another thread Ben Hoyt started on python-dev`_ to discuss the
  interaction between scandir and the new ``pathlib`` module

* `Final thread Ben Hoyt started on python-dev`_ to discuss the first
  version of this PEP, with extensive discussion about the API.

* `Question on StackOverflow`_ about why ``os.walk()`` is slow and
  pointers on how to fix it (this inspired the author of this PEP
  early on)

* `BetterWalk`_, this PEP's author's previous attempt at this, on
  which the scandir code is based

.. _`Original thread Ben Hoyt started on python-ideas`: https://mail.python.org/pipermail/python-ideas/2012-November/017770.html
.. _`Further thread Ben Hoyt started on python-dev`: https://mail.python.org/pipermail/python-dev/2013-May/126119.html
.. _`Another thread Ben Hoyt started on python-dev`: https://mail.python.org/pipermail/python-dev/2013-November/130572.html
.. _`Final thread Ben Hoyt started on python-dev`: https://mail.python.org/pipermail/python-dev/2014-June/135215.html
.. _`Question on StackOverflow`: http://stackoverflow.com/questions/2485719/very-quickly-getting-total-size-of-folder
.. _`BetterWalk`: https://github.com/benhoyt/betterwalk


Copyright
=========

This document has been placed in the public domain.

From guido at python.org Tue Jul 8 16:48:39 2014
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Jul 2014 07:48:39 -0700
Subject: [Python-Dev] buildbot.python.org down again?
In-Reply-To: <62321D60-1197-47A5-B455-6E5200DD52F7@stufft.io>
References: <62321D60-1197-47A5-B455-6E5200DD52F7@stufft.io>
Message-ID:

May the true owner of buildbot.python.org stand up!

(But I do think there may well not be anyone who feels they own it. And
that's a problem for its long term viability.)

Generally speaking, as an organization we should set up a process for
managing ownership of *all* infrastructure in a uniform way. I don't
mean to say that we need to manage all infrastructure uniformly, just
that we need to have a process for identifying and contacting the
owner(s) for each piece of infrastructure, as well as collecting other
information that people besides the owners might need to know. You can
use a wiki page for that list for all I care, but have a process for
what belongs there, how/when to update it, and even an owner for the
wiki page!

Stuff like this shouldn't be just in a few people's heads (even if they
are board members) nor should it be in a file in a repo that nobody has
ever heard of.

On Tue, Jul 8, 2014 at 12:33 AM, Donald Stufft wrote:

>
> On Jul 8, 2014, at 12:58 AM, Nick Coghlan wrote:
>
> On 7 Jul 2014 10:47, "Guido van Rossum" wrote:
> >
> > It would still be nice to know who "the appropriate persons" are. Too
> > much of our infrastructure seems to be maintained by house elves or the ITA.
>
> I volunteered to be the board's liaison to the infrastructure team, and
> getting more visibility around what the infrastructure *is* and how it's
> monitored and supported is going to be part of that.
> That will serve a couple of key purposes:
>
> - making the points of escalation clearer if anything breaks or needs
>   improvement (although "infrastructure at python.org" is a good default
>   choice)
> - making the current "todo" list of the infrastructure team more visible
>   (both to calibrate resolution time expectations and to provide potential
>   contributors an idea of what's involved)
>
> Noah has already set up http://status.python.org/ to track service
> status, I can see about getting buildbot.python.org added to the list.
>
> Cheers,
> Nick.
>
>
> We (the infrastructure team) were actually looking earlier about
> buildbot.python.org and we're not entirely sure who "owns"
> buildbot.python.org. Unfortunately a lot of the *.python.org services
> are in a similar state where there is no clear owner. Generally we've
> not wanted to just step in and take over for fear of stepping on
> someone's toes but it appears that perhaps buildbot.p.o has no owner?
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>

--
--Guido van Rossum (python.org/~guido)

From victor.stinner at gmail.com Tue Jul 8 17:13:08 2014
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 8 Jul 2014 17:13:08 +0200
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To:
References:
Message-ID:

Hi,

2014-07-08 15:52 GMT+02:00 Ben Hoyt :
> After some very good python-dev feedback on my first version of PEP
> 471, I've updated the PEP to clarify a few things and added various
> "Rejected ideas" subsections. Here's a link to the new version (I've
> also copied the full text below):

Thanks, the new PEP looks better.

> * Removed the "open issues" section, as the three open issues have
> either been included (full_name) or rejected (windows_wildcard)

I remember a pending question on python-dev:

- Martin von Loewis asked if the scandir generator would have send()
and close() methods as any Python generator. I didn't see a reply on
the mailing (nor in the PEP).

> One known error in the PEP is that the "Notes" sections should be
> top-level sections, not be subheadings of "Examples". If someone would
> like to give me ("benhoyt") commit access to the peps repo, I can fix
> this and any other issues that come up.

Or just send me your new PEP ;-)

> Notes on caching
> ----------------
>
> The ``DirEntry`` objects are relatively dumb -- the ``name`` and
> ``full_name`` attributes are obviously always cached, and the ``is_X``
> and ``lstat`` methods cache their values (immediately on Windows via
> ``FindNextFile``, and on first use on POSIX systems via a ``stat``
> call) and never refetch from the system.

It is not clear to me which methods share the cache.

On UNIX, is_dir() and is_file() call os.stat(); whereas lstat() and
is_symlink() call os.lstat().

If os.stat() says that the file is not a symlink, I guess that you can
use os.stat() result for lstat() and is_symlink() methods?

In the worst case, if the path is a symlink, would it be possible that
os.stat() and os.lstat() become "inconsistent" if the symlink is
modified between the two calls? If yes, I don't think that it's an
issue, it's just good to know it.

For symlinks, readdir() returns the status of the linked file or of
the symlink?

Victor

From 2014 at jmunch.dk Tue Jul 8 16:58:33 2014
From: 2014 at jmunch.dk (Anders J. Munch)
Date: Tue, 08 Jul 2014 16:58:33 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To:
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <53BC0719.1070705@jmunch.dk>

Chris Angelico wrote:
>
> This is off-topic for this thread, but still...
>
> The trouble is that your "arguably just as wrong" is an
> indistinguishable case. If you don't want two different calculations'
> NaNs to *ever* compare equal, the only solution is to have all NaNs
> compare unequal

For two NaNs computed differently to compare equal is no worse than 2+2
comparing equal to 1+3. You're comparing values, not their history.

You've prompted me to get a rant on the subject off my chest, I just
posted an article on NaN comparisons to python-list.

regards,
Anders

From janzert at janzert.com Tue Jul 8 17:44:54 2014
From: janzert at janzert.com (Janzert)
Date: Tue, 08 Jul 2014 11:44:54 -0400
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
In-Reply-To:
References:
Message-ID:

On 7/8/2014 9:52 AM, Ben Hoyt wrote:
> DirEntry fields being "static" attribute-only objects
> -----------------------------------------------------
>
> In `this July 2014 python-dev message
> `_,
> Paul Moore suggested a solution that was a "thin wrapper round the OS
> feature", where the ``DirEntry`` object had only static attributes:
> ``name``, ``full_name``, and ``is_X``, with the ``st_X`` attributes
> only present on Windows. The idea was to use this simpler, lower-level
> function as a building block for higher-level functions.
>
> At first there was general agreement that simplifying in this way was
> a good thing. However, there were two problems with this approach.
> First, the assumption is the ``is_dir`` and similar attributes are
> always present on POSIX, which isn't the case (if ``d_type`` is not
> present or is ``DT_UNKNOWN``). Second, it's a much harder-to-use API
> in practice, as even the ``is_dir`` attributes aren't always present
> on POSIX, and would need to be tested with ``hasattr()`` and then
> ``os.stat()`` called if they weren't present.
>

Only exposing what the OS provides for free will make the API too
difficult to use in the common case. But is there a nice way to expand
the API that will allow the user who is trying to avoid extra expense
to know what information is already available?

Even if the initial version doesn't have a way to check what
information is there for free, ensuring there is a clean way to add
this in the future would be really nice.

Janzert

From steve at pearwood.info Tue Jul 8 18:57:45 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 9 Jul 2014 02:57:45 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20140708165745.GJ13014@ando>

On Tue, Jul 08, 2014 at 04:53:50PM +0900, Stephen J. Turnbull wrote:
> Chris Angelico writes:
>
> > The reason NaN isn't equal to itself is because there are X bit
> > patterns representing NaN, but an infinite number of possible
> > non-numbers that could result from a calculation.
>
> I understand that. But you're missing at least two alternatives that
> involve raising on some calculations involving NaN, as well as the
> fact that forcing inequality of two NaNs produced by equivalent
> calculations is arguably just as wrong as allowing equality of two
> NaNs produced by the different calculations.

I don't think so.
Floating point == represents *numeric* equality, not (for example)
equality in the sense of "All Men Are Created Equal". Not even numeric
equality in the most general sense, but specifically in the sense of
(approximately) real-valued numbers, so it's an extremely precise
definition of "equal", not fuzzy in any way.

In an early post, you suggested that NANs don't have a value, or that
they have a value which is not a value. I don't think that's a good
way to look at it. I think the obvious way to think of it is that
NAN's value is Not A Number, exactly like it says on the box. Now, if
something is not a number, obviously you cannot compare it
numerically:

"Considered as numbers, is the sound of rain on a tin roof numerically
equal to the sight of a baby smiling?"

Some might argue that the only valid answer to this question is "Mu",

https://en.wikipedia.org/wiki/Mu_%28negative%29#.22Unasking.22_the_question

but if we're forced to give a Yes/No True/False answer, then clearly
False is the only sensible answer. No, Virginia, Santa Claus is not
the same number as Santa Claus.

To put it another way, if x is not a number, then x != y for all
possible values of y -- including x.

[Disclaimer: despite the name, IEEE-754 arguably does not intend NANs
to be Not A Number in the sense that Santa Claus is not a number, but
more like "it's some number, but it's impossible to tell which".
However, despite that, the standard specifies behaviour which is best
thought of in terms of the Santa Claus model.]

> That's where things get
> fuzzy for me -- in Python I would expect that preserving invariants
> would be more important than computational efficiency, but evidently
> it's not.

I'm not sure what you're referring to here. Is it that containers such
as lists and dicts are permitted to optimize equality tests with
identity tests for speed?

py> NAN = float('NAN')
py> a = [1, 2, NAN, 4]
py> NAN in a  # identity is checked before equality
True
py> any(x == NAN for x in a)
False

When this came up for discussion last time, the clear consensus was
that this is reasonable behaviour. NANs and other such "weird" objects
are too rare and too specialised for built-in classes to carry the
burden of having to allow for them. If you want a "NAN-aware list",
you can make one yourself.

> I assume that I would have a better grasp on why Python
> chose to go this way rather than that if I understood IEEE 754 better.

See the answer by Stephen Canon here:

http://stackoverflow.com/questions/1565164/

[quote]
It is not possible to specify a fixed-size arithmetic type that
satisfies all of the properties of real arithmetic that we know and
love. The 754 committee has to decide to bend or break some of them.
This is guided by some pretty simple principles:

When we can, we match the behavior of real arithmetic.
When we can't, we try to make the violations as predictable and as
easy to diagnose as possible.
[end quote]

In particular, reflexivity for NANs was dropped for a number of
reasons, some stronger than others:

- One of the weaker reasons for NAN non-reflexivity is that it
  preserved the identity x == y <=> x - y == 0. Although that is the
  cornerstone of real arithmetic, it's violated by IEEE-754 INFs, so
  violating it for NANs is not a big deal either.

- Dropping reflexivity preserves the useful property that NANs compare
  unequal to everything.

- Practicality beats purity: dropping reflexivity allowed programmers
  to identify NANs without waiting years or decades for programming
  languages to implement isnan() functions. E.g. before Python had
  math.isnan(), I made my own:

    def isnan(x):
        return isinstance(x, float) and x != x

- Keeping reflexivity for NANs would have implied some pretty nasty
  things, e.g. if log(-3) == log(-5), then -3 == -5.
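
The identity shortcut in ``in`` and the ``x != x`` trick described above can be checked with a few lines of Python 3. ``NAN``, ``other_nan`` and ``items`` are just illustrative names:

```python
import math

NAN = float('nan')
other_nan = float('nan')          # a distinct NaN object
items = [1, 2, NAN, 4]

# Containment checks identity before ==, so the very same object is found...
print(NAN in items)               # True
# ...but a NaN that is a different object is not found.
print(other_nan in items)         # False
# NaN is not equal to anything, including itself.
print(NAN == NAN, NAN != NAN)     # False True
# The x != x idiom agrees with math.isnan() for these floats.
print(all((x != x) == math.isnan(x) for x in [1.0, math.inf, NAN]))  # True
```

The second check is the flip side of the identity optimisation: a NaN is found in a list only if it is the very same object.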

Basically, and I realise that many people disagree with their decision
(notably Bertrand Meyer of Eiffel fame, and our own Mark Dickinson),
the IEEE-754 committee led by William Kahan decided that the problems
caused by having NANs compare unequal to themselves were much less
than the problems that would have been caused without it.


--
Steven

From steve at pearwood.info Tue Jul 8 19:00:46 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 9 Jul 2014 03:00:46 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BC0719.1070705@jmunch.dk>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp> <53BC0719.1070705@jmunch.dk>
Message-ID: <20140708170046.GK13014@ando>

On Tue, Jul 08, 2014 at 04:58:33PM +0200, Anders J. Munch wrote:

> For two NaNs computed differently to compare equal is no worse than 2+2
> comparing equal to 1+3. You're comparing values, not their history.

a = -23
b = -42
if log(a) == log(b):
    print "a == b"


--
Steven

From rosuav at gmail.com Tue Jul 8 19:13:00 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 9 Jul 2014 03:13:00 +1000
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <20140708170046.GK13014@ando>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp> <53BC0719.1070705@jmunch.dk> <20140708170046.GK13014@ando>
Message-ID:

On Wed, Jul 9, 2014 at 3:00 AM, Steven D'Aprano wrote:
> On Tue, Jul 08, 2014 at 04:58:33PM +0200, Anders J. Munch wrote:
>
>> For two NaNs computed differently to compare equal is no worse than 2+2
>> comparing equal to 1+3. You're comparing values, not their history.
>
>     a = -23
>     b = -42
>     if log(a) == log(b):
>         print("a == b")

That could also happen from rounding error, though.

>>> from math import log
>>> a = 2.0**52
>>> b = a+1.0
>>> a == b
False
>>> log(a) == log(b)
True

Any time you do any operation on numbers that are close together but not equal, you run the risk of getting results that, in finite-precision floating point, are deemed equal, even though mathematically they shouldn't be (two unequal numbers MUST have unequal logarithms).

ChrisA

From python at mrabarnett.plus.com Tue Jul 8 19:33:31 2014 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 08 Jul 2014 18:33:31 +0100 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <20140708165745.GJ13014@ando> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp> <20140708165745.GJ13014@ando> Message-ID: <53BC2B6B.3080209@mrabarnett.plus.com>

On 2014-07-08 17:57, Steven D'Aprano wrote:
[snip]
>
> In particular, reflexivity for NANs was dropped for a number of reasons,
> some stronger than others:
>
> - One of the weaker reasons for NAN non-reflexivity is that it preserved
> the identity x == y <=> x - y == 0. Although that is the cornerstone
> of real arithmetic, it's violated by IEEE-754 INFs, so violating it
> for NANs is not a big deal either.
>
> - Dropping reflexivity preserves the useful property that NANs compare
> unequal to everything.
>
> - Practicality beats purity: dropping reflexivity allowed programmers
> to identify NANs without waiting years or decades for programming
> languages to implement isnan() functions. E.g. before Python had
> math.isnan(), I made my own:
>
>     def isnan(x):
>         return isinstance(x, float) and x != x
>
> - Keeping reflexivity for NANs would have implied some pretty nasty
> things, e.g. if log(-3) == log(-5), then -3 == -5.
>
The log of a negative number is a complex number.
>
> Basically, and I realise that many people disagree with their decision
> (notably Bertrand Meyer of Eiffel fame, and our own Mark Dickinson), the
> IEEE-754 committee led by William Kahan decided that the problems caused
> by having NANs compare unequal to themselves were much less than the
> problems that would have been caused without it.
>

From benhoyt at gmail.com Tue Jul 8 20:03:00 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 8 Jul 2014 14:03:00 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: 

> I remember a pending question on python-dev:
>
> - Martin von Loewis asked if the scandir generator would have send()
> and close() methods as any Python generator. I didn't see a reply on
> the mailing list (nor in the PEP).

Good call. Looks like you're referring to this message: https://mail.python.org/pipermail/python-dev/2014-July/135324.html

I'm not actually familiar with the purpose of .close() and .send()/.throw() on generators. Do you typically call these functions manually, or are they called automatically by the generator protocol?

> It is not clear to me which methods share the cache.
>
> On UNIX, is_dir() and is_file() call os.stat(); whereas lstat() and
> is_symlink() call os.lstat().
>
> If os.stat() says that the file is not a symlink, I guess that you can
> use os.stat() result for lstat() and is_symlink() methods?
>
> In the worst case, if the path is a symlink, would it be possible that
> os.stat() and os.lstat() become "inconsistent" if the symlink is
> modified between the two calls? If yes, I don't think that it's an
> issue, it's just good to know it.
>
> For symlinks, readdir() returns the status of the linked file or of the symlink?

I think you're misunderstanding is_dir() and is_file(), as these don't actually call os.stat().
All DirEntry methods either call nothing or os.lstat() to get the stat info on the entry itself (not the destination of the symlink). In light of this, I don't think what you're describing above is an issue.

-Ben

From benhoyt at gmail.com Tue Jul 8 20:05:53 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 8 Jul 2014 14:05:53 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: 

> Only exposing what the OS provides for free will make the API too difficult
> to use in the common case. But is there a nice way to expand the API that
> will allow the user who is trying to avoid extra expense know what
> information is already available?
>
> Even if the initial version doesn't have a way to check what information is
> there for free, ensuring there is a clean way to add this in the future
> would be really nice.

We could easily add ".had_type" and ".had_lstat" properties (not sure on the names), which would be True if the is_X information and lstat information was fetched, respectively. Basically both would always be True on Windows, but on POSIX only had_type could be True, and only when d_type is present and != DT_UNKNOWN.

I don't feel this is actually necessary, but it's not hard to add. Thoughts?

-Ben

From ethan at stoneleaf.us Tue Jul 8 21:02:56 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 08 Jul 2014 12:02:56 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: <53BC4060.5090805@stoneleaf.us>

On 07/08/2014 11:05 AM, Ben Hoyt wrote:
>> Only exposing what the OS provides for free will make the API too difficult
>> to use in the common case. But is there a nice way to expand the API that
>> will allow the user who is trying to avoid extra expense know what
>> information is already available?
>>
>> Even if the initial version doesn't have a way to check what information is
>> there for free, ensuring there is a clean way to add this in the future
>> would be really nice.
>
> We could easily add ".had_type" and ".had_lstat" properties (not sure
> on the names), that would be true if the is_X information and lstat
> information was fetched, respectively. Basically both would always be
> True on Windows, but on POSIX only had_type would be True if d_type is
> present and != DT_UNKNOWN.
>
> I don't feel this is actually necessary, but it's not hard to add.
>
> Thoughts?

Better to just have the attributes be None if they were not fetched. None is better than hasattr anyway, at least in the respect of not having to catch exceptions to function properly.

-- ~Ethan~

From benhoyt at gmail.com Tue Jul 8 21:34:26 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 8 Jul 2014 15:34:26 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BC4060.5090805@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> Message-ID: 

> Better to just have the attributes be None if they were not fetched. None
> is better than hasattr anyway, at least in the respect of not having to
> catch exceptions to function properly.

The thing is, is_dir() and lstat() are not attributes (for a good reason). Please read the relevant "Rejected ideas" sections and let us know what you think. :-)

-Ben

From victor.stinner at gmail.com Tue Jul 8 21:55:59 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 8 Jul 2014 21:55:59 +0200 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: 

On Tuesday 8 July 2014, Ben Hoyt wrote:
>
> > It is not clear to me which methods share the cache.
> >
> > On UNIX, is_dir() and is_file() call os.stat(); whereas lstat() and
> > is_symlink() call os.lstat().
> >
> > If os.stat() says that the file is not a symlink, I guess that you can
> > use os.stat() result for lstat() and is_symlink() methods?
> >
> > In the worst case, if the path is a symlink, would it be possible that
> > os.stat() and os.lstat() become "inconsistent" if the symlink is
> > modified between the two calls? If yes, I don't think that it's an
> > issue, it's just good to know it.
> >
> > For symlinks, readdir() returns the status of the linked file or of the symlink?
>
> I think you're misunderstanding is_dir() and is_file(), as these don't
> actually call os.stat(). All DirEntry methods either call nothing or
> os.lstat() to get the stat info on the entry itself (not the
> destination of the symlink).

Oh. Extract of your PEP: "is_dir(): like os.path.isdir(), but much cheaper".

genericpath.isdir() and genericpath.isfile() use os.stat(), whereas posixpath.islink() uses os.lstat().

Is it a mistake in the PEP?

> In light of this, I don't think what you're describing above is an issue.

I'm not saying that there is an issue, I'm just trying to understand.

Victor

From benhoyt at gmail.com Tue Jul 8 22:09:36 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 8 Jul 2014 16:09:36 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: 

>> I think you're misunderstanding is_dir() and is_file(), as these don't
>> actually call os.stat(). All DirEntry methods either call nothing or
>> os.lstat() to get the stat info on the entry itself (not the
>> destination of the symlink).
>
>
> Oh. Extract of your PEP: "is_dir(): like os.path.isdir(), but much cheaper".
>
> genericpath.isdir() and genericpath.isfile() use os.stat(), whereas
> posixpath.islink() uses os.lstat().
>
> Is it a mistake in the PEP?
Ah, you're dead right -- this is basically a bug in the PEP, as DirEntry.is_dir() is not like os.path.isdir() in that it is based on the entry itself (like lstat), not following the link.

I'll improve the wording here and update the PEP.

-Ben

From ethan at stoneleaf.us Tue Jul 8 22:22:33 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 08 Jul 2014 13:22:33 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> Message-ID: <53BC5309.6000605@stoneleaf.us>

On 07/08/2014 12:34 PM, Ben Hoyt wrote:
>>
>> Better to just have the attributes be None if they were not fetched. None
>> is better than hasattr anyway, at least in the respect of not having to
>> catch exceptions to function properly.
>
> The thing is, is_dir() and lstat() are not attributes (for a good
> reason). Please read the relevant "Rejected ideas" sections and let us
> know what you think. :-)

I did better than that -- I read the whole thing! ;)

-1 on the PEP's implementation.

Just like an attribute does not imply a system call, having a method named 'is_dir' /does/ imply a system call, and not having one can be just as misleading.

If we have this:

    size = 0
    for entry in scandir('/some/path'):
        size += entry.st_size

- on Windows, this should Just Work (if I have the names correct ;)
- on Posix, etc., this should fail noisily with either an AttributeError ('entry' has no 'st_size') or a TypeError (cannot add None)

and the solution is equally simple:

    for entry in scandir('/some/path', stat=True):

- if not Windows, perform a stat call at the same time

Now, of course, we might get errors. I am not a big fan of wrapping everything in try/except, particularly when we already have a model to follow -- os.walk:

    for entry in scandir('/some/path', stat=True, onerror=record_and_skip):

If we don't care if an error crashes the script, leave off onerror.

If we don't need st_size and friends, leave off stat=True.
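Ethan's onerror idea borrows from os.walk(), which already accepts an onerror callable. os.walk() passes that callable the OSError and keeps going instead of raising. Below is a small sketch of that existing pattern. `record_and_skip` and `walk_file_names` are illustrative names, not library functions. The `scandir(..., stat=..., onerror=...)` signature above is still only a proposal.

```python
import os
import tempfile

def walk_file_names(top):
    """Collect file names under `top`; directories that can't be listed are
    recorded via os.walk's onerror hook instead of raising."""
    errors = []

    def record_and_skip(exc):   # illustrative name, borrowed from the example above
        errors.append(exc)      # os.walk hands over the OSError and carries on

    names = []
    for dirpath, dirnames, filenames in os.walk(top, onerror=record_and_skip):
        names.extend(filenames)
    return names, errors

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'a.txt'), 'w').close()
    ok_names, ok_errors = walk_file_names(tmp)
    bad_names, bad_errors = walk_file_names(os.path.join(tmp, 'missing'))

print(ok_names, ok_errors)                        # ['a.txt'] []
print(bad_names, type(bad_errors[0]).__name__)    # [] FileNotFoundError
```

With a callback, error handling is one argument at the call site rather than a try/except wrapped around the loop. The trade-off is that the caller sees the error outside the normal control flow.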
If we get better performance on Windows instead of Linux, that's okay. scandir is going into os because it may not behave the same on every platform. Heck, even some non-os modules (multiprocessing comes to mind) do not behave the same on every platform. I think caching the attributes for DirEntry is fine, but let's do it as a snapshot of that moment in time, not name now, and attributes in 30 minutes when we finally get to you because we had a lot of processing/files ahead of you (you being a DirEntry ;) . -- ~Ethan~ From ethan at stoneleaf.us Tue Jul 8 23:05:22 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 08 Jul 2014 14:05:22 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BC5309.6000605@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> Message-ID: <53BC5D12.30105@stoneleaf.us> On 07/08/2014 01:22 PM, Ethan Furman wrote: > > I think caching the attributes for DirEntry is fine, but let's do it as a snapshot of that moment in time, not name now, > and attributes in 30 minutes when we finally get to you because we had a lot of processing/files ahead of you (you being > a DirEntry ;) . This bit is wrong, I think, since scandir is a generator -- there wouldn't be much time passing between the direntry call and the stat call in any case. Hopefully my other points still hold. -- ~Ethan~ From benhoyt at gmail.com Wed Jul 9 03:08:03 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 8 Jul 2014 21:08:03 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BC5309.6000605@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> Message-ID: > I did better than that -- I read the whole thing! ;) Thanks. :-) > -1 on the PEP's implementation. > > Just like an attribute does not imply a system call, having a > method named 'is_dir' /does/ imply a system call, and not > having one can be just as misleading. 
Why does a method imply a system call? os.path.join() and str.lower() don't make system calls. Isn't it just a matter of clear documentation? Anyway -- less philosophical discussion below.

> If we have this:
>
>     size = 0
>     for entry in scandir('/some/path'):
>         size += entry.st_size
>
> - on Windows, this should Just Work (if I have the names correct ;)
> - on Posix, etc., this should fail noisily with either an AttributeError
> ('entry' has no 'st_size') or a TypeError (cannot add None)
>
> and the solution is equally simple:
>
>     for entry in scandir('/some/path', stat=True):
>
> - if not Windows, perform a stat call at the same time

I'm not totally opposed to this, which is basically a combination of Nick Coghlan's and Paul Moore's recent proposals mentioned in the PEP. However, as discussed on python-dev, there are some edge cases it doesn't handle very well, and it's messier to handle errors (requires onerror as you mention below).

I presume you're suggesting that is_dir/is_file/is_symlink should be regular attributes, and accessing them should never do a system call. But what if the system doesn't support d_type (eg: Solaris) or the d_type value is DT_UNKNOWN (can happen on Linux, OS X, BSD)? The options are:

1) scandir() would always call lstat() in the case of missing/unknown d_type. If so, scandir() is actually more expensive than listdir(), and as a result it's no longer safe to implement listdir in terms of scandir:

    def listdir(path='.'):
        return [e.name for e in scandir(path)]

2) Or would it be better to have another flag like scandir(path, type=True) to ensure the is_X type info is fetched? This is explicit, but also getting kind of unwieldy.

3) A third option is for the is_X attributes to be absent in this case (hasattr tests required, and the user would do the lstat manually). But as I noted on python-dev recently, you basically always want is_X, so this leads to unwieldy code that's twice as long as it needs to be.
See here: https://mail.python.org/pipermail/python-dev/2014-July/135312.html

4) I gather in your proposal above, scandir will call lstat() if stat=True? Except where does it put the values? Surely it should return an existing stat_result object, rather than stuffing everything onto the DirEntry, or throwing away some values on Linux? In this case, I'd prefer Nick Coghlan's approach of ensure_lstat and a .stat_result attribute. However, this still has the "what if d_type is missing or DT_UNKNOWN" issue.

It seems to me that making the is_X() accessors methods handles this exact scenario -- the methods do the dirty work so you don't have to. So yes, the real world is messy due to missing is_X values, but I think it's worth getting this right, and is_X() methods can do this while keeping the API simple and cross-platform.

> Now, of course, we might get errors. I am not a big fan of wrapping everything in try/except, particularly when we already have a model to follow -- os.walk:

I don't mind the onerror too much if we went with this kind of approach. It's not quite as nice as a standard try/except around the method call, but it's definitely workable and has a precedent with os.walk().

It seems a bit like we're going around in circles here, and I think we have all the information and options available to us, so I'm going to SUMMARIZE.

We have a choice before us, a fork in the road. :-) We can choose one of these options for the scandir API:

1) The current PEP 471 approach. This solves the issue with d_type being missing or DT_UNKNOWN, it doesn't require onerror, and it's a really tidy API that doesn't explode with AttributeErrors if you write code on Windows (without thinking too hard) and then move to Linux. I think all of these points are important -- the cross-platform one not the least, because we want to make it easy, even *trivial*, for people to write cross-platform code.
For reference, here's what get_tree_size() looks like with this approach, not including error handling with try/except:

    def get_tree_size(path):
        total = 0
        for entry in os.scandir(path):
            if entry.is_dir():
                total += get_tree_size(entry.full_name)
            else:
                total += entry.lstat().st_size
        return total

2) Nick Coghlan's model of only fetching the lstat value if ensure_lstat=True, and including an onerror callback for error handling when scandir calls lstat internally. However, as described, we'd also need an ensure_type=True option, so that scandir() isn't way slower than listdir() if you actually don't want the is_X values and d_type is missing/unknown.

For reference, here's what get_tree_size() looks like with this approach, not including error handling with onerror:

    def get_tree_size(path):
        total = 0
        for entry in os.scandir(path, ensure_type=True, ensure_lstat=True):
            if entry.is_dir:
                total += get_tree_size(entry.full_name)
            else:
                total += entry.lstat_result.st_size
        return total

I'm fairly strongly in favour of approach #1, but I wouldn't die if everyone else thinks the benefits of #2 outweigh the somewhat less nice API.

Comments and votes, please!

-Ben

From steve at pearwood.info Wed Jul 9 03:22:42 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 9 Jul 2014 11:22:42 +1000 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BC2B6B.3080209@mrabarnett.plus.com> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp> <20140708165745.GJ13014@ando> <53BC2B6B.3080209@mrabarnett.plus.com> Message-ID: <20140709012242.GL13014@ando>

On Tue, Jul 08, 2014 at 06:33:31PM +0100, MRAB wrote:

> The log of a negative number is a complex number.

Only in complex arithmetic. In real arithmetic, the log of a negative number isn't a number at all.
-- Steven

From ethan at stoneleaf.us Wed Jul 9 03:31:55 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 08 Jul 2014 18:31:55 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> Message-ID: <53BC9B8B.40509@stoneleaf.us>

On 07/08/2014 06:08 PM, Ben Hoyt wrote:
>>
>> Just like an attribute does not imply a system call, having a
>> method named 'is_dir' /does/ imply a system call, and not
>> having one can be just as misleading.
>
> Why does a method imply a system call? os.path.join() and str.lower()
> don't make system calls. Isn't it just a matter of clear
> documentation? Anyway -- less philosophical discussion below.

In this case because the names are exactly the same as the os versions which /do/ make a system call.

> I presume you're suggesting that is_dir/is_file/is_symlink should be
> regular attributes, and accessing them should never do a system call.
> But what if the system doesn't support d_type (eg: Solaris) or the
> d_type value is DT_UNKNOWN (can happen on Linux, OS X, BSD)? The
> options are:

So if I'm finally understanding the root problem here:

- listdir returns a list of strings, one for each filename and one for each directory, and keeps no other O/S supplied info.

- os.walk, which uses listdir, then needs to go back to the O/S and refetch the thrown-away information

- so it's slow.

The solution:

- have scandir /not/ throw away the O/S supplied info

and the new problem:

- not all O/Ses provide the same (or any) extra info about the directory entries

Have I got that right?
If so, I still like the attribute idea better (surprise!), we just need to revisit the 'ensure_lstat' (or whatever it's called) parameter: instead of a true/false value, it could have a scale:

- 0 = whatever the O/S gives us

- 1 = at least the is_dir/is_file (whatever the other normal one is), and if the O/S doesn't give it to us for free then call lstat

- 2 = we want it all -- call lstat if necessary on this platform

After all, the programmer should know up front how much of the extra info will be needed for the work that is trying to be done.

> We have a choice before us, a fork in the road. :-) We can choose one
> of these options for the scandir API:
>
> 1) The current PEP 471 approach. This solves the issue with d_type
> being missing or DT_UNKNOWN, it doesn't require onerror, and it's a
> really tidy API that doesn't explode with AttributeErrors if you write
> code on Windows (without thinking too hard) and then move to Linux. I
> think all of these points are important -- the cross-platform one not
> the least, because we want to make it easy, even *trivial*, for people
> to write cross-platform code.

Yes, but we don't want a function that sucks equally on all platforms. ;)

> 2) Nick Coghlan's model of only fetching the lstat value if
> ensure_lstat=True, and including an onerror callback for error
> handling when scandir calls lstat internally. However, as described,
> we'd also need an ensure_type=True option, so that scandir() isn't way
> slower than listdir() if you actually don't want the is_X values and
> d_type is missing/unknown.

With the multi-level version of 'ensure_lstat' we do not need an extra 'ensure_type'.
For reference, here's what get_tree_size() looks like with this approach, not including error handling with onerror:

    def get_tree_size(path):
        total = 0
        for entry in os.scandir(path, ensure_lstat=1):
            if entry.is_dir:
                total += get_tree_size(entry.full_name)
            else:
                total += entry.lstat_result.st_size
        return total

And if we added the onerror here it would be a line fragment, as opposed to the extra four lines (at least) for the try/except in the first example (which I cut).

Finally:

Thank you for writing scandir, and this PEP. Excellent work.

Oh, and +1 for option 2, slightly modified. :)

-- ~Ethan~

From raymond.hettinger at gmail.com Wed Jul 9 03:48:17 2014 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 8 Jul 2014 18:48:17 -0700 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <53BB2F25.3020205@gmx.de> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> Message-ID: <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com>

On Jul 7, 2014, at 4:37 PM, Andreas Maier wrote:

> I do not really buy into the arguments that try to show how identity and value are somehow the same. They are not, not even in Python.
>
> The argument I can absolutely buy into is that the implementation cannot be changed within a major release. So the real question is how we document it.

Once every few years, someone discovers IEEE-754, learns that NaNs aren't supposed to be equal to themselves and becomes inspired to open an old debate about whether to wreck Python in an effort to make the world safe for NaNs. And somewhere along the way, people forget that practicality beats purity.

Here are a few thoughts on the subject that may or may not add a little clarity ;-)

* Python already has IEEE-754 compliant NaNs:

      assert float('NaN') != float('NaN')

* Python already has the ability to filter out NaNs:

      [x for x in container if not math.isnan(x)]

* In the numeric world, the most common use of NaNs is for missing data (much like we usually use None).
The property of not being equal to itself is primarily useful in low-level code optimized to run a calculation to completion without running frequent checks for invalid results (much like @n/a is used in MS Excel).

* Python also lets containers establish their own invariants to ensure correctness, improve performance, and make it possible to reason about our programs:

      for x in c:
          assert x in c

* Containers like dicts and sets have always used the rule that identity implies equality. That is central to their implementation. In particular, the check of interned string keys relies on identity to bypass a slow character-by-character comparison to verify equality.

* Traditionally, a relation R is considered an equality relation if it is reflexive, symmetric, and transitive:

      R(x, x) -> True
      R(x, y) -> R(y, x)
      R(x, y) ^ R(y, z) -> R(x, z)

* Knowingly or not, programs tend to assume that all of those hold. Test suites in particular assume that if you put something in a container, assertIn() will pass.

* Here are some examples of cases where non-reflexive objects would jeopardize the pragmatism of being able to reason about the correctness of programs:

      s = SomeSet()
      s.add(x)
      assert x in s

      s.remove(x)    # See collections.abc.MutableSet.remove
      assert not s

      s.clear()      # See collections.abc.MutableSet.clear
      assert not s

* What the above code does is up to the implementer of the container. If you use the MutableSet ABC, you can choose to implement __contains__() and discard() to use straight equality or identity-implies-equality. Nothing prevents you from making containers that are hard to reason about.

* The builtin containers make the choice for identity-implies-equality so that it is easier to build fast, correct code. For the most part, this has worked out great (dictionaries in particular have had identity checks built in for almost twenty years).

* Years ago, there was a debate about whether to add an __is__() method to allow overriding the is-operator.
The push for the change was the "pure" notion that "all operators should be customizable". However, the idea was rejected based on the "practical" notions that it would wreck our ability to reason about code, that it would slow down all code that used identity checks, that library modules (ours and third-party) already made deep assumptions about what "is" means, and that people would shoot themselves in the foot with hard-to-find bugs.

Personally, I see no need to make the same mistake by removing the identity-implies-equality rule from the built-in containers. There's no need to upset the apple cart for nearly zero benefit.

IMO, the proposed quest for purity is misguided. There are many practical reasons to let the builtin containers continue to work as they do now.

Raymond

From stephen at xemacs.org Wed Jul 9 06:21:11 2014 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 09 Jul 2014 13:21:11 +0900 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <20140708165745.GJ13014@ando> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp> <20140708165745.GJ13014@ando> Message-ID: <87y4w3rwlk.fsf@uwakimon.sk.tsukuba.ac.jp>

Steven D'Aprano writes:

> I don't think so. Floating point == represents *numeric* equality,

There is no such thing as floating point == in Python. You can apply == to two floating point numbers, but == (at the language level) handles any two numbers, as well as pairs of things that aren't numbers in the Python language. So it's a design decision to include NaNs at all, and another design decision to follow IEEE in giving them behavior that violates the definition of equivalence relation for ==.
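The equivalence-relation property under discussion can be checked mechanically. Raymond lists its three conditions: reflexivity, symmetry and transitivity. The sketch below tests those conditions for == over a finite sample of values. `equality_violations` is a hypothetical helper, not part of any library. It shows that a single NaN in the sample is enough to break reflexivity, while ordinary floats, including -0.0 and infinity, pass.

```python
import itertools

def equality_violations(values):
    """Return the names of the equivalence-relation properties that ==
    fails to satisfy on the given finite sample of values."""
    violations = set()
    for x in values:
        if not (x == x):                       # reflexivity: x == x
            violations.add('reflexive')
    for x, y in itertools.product(values, repeat=2):
        if (x == y) != (y == x):               # symmetry: x == y  =>  y == x
            violations.add('symmetric')
    for x, y, z in itertools.product(values, repeat=3):
        if x == y and y == z and not (x == z):  # transitivity
            violations.add('transitive')
    return violations

print(equality_violations([0.0, -0.0, 1.0, float('inf')]))  # set()
print(equality_violations([0.0, 1.0, float('nan')]))         # {'reflexive'}
```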
> In an early post, you suggested that NANs don't have a value, or that
> they have a value which is not a value. I don't think that's a good way
> to look at it. I think the obvious way to think of it is that NAN's
> value is Not A Number, exactly like it says on the box. Now, if
> something is not a number, obviously you cannot compare it numerically:

And if Python can't do something you ask it to do, it raises an exception. Why should this be different? Obviously, it's a question of expedience.

> I'm not sure what you're referring to here. Is it that containers such
> as lists and dicts are permitted to optimize equality tests with
> identity tests for speed?

No, when I say I'm fuzzy I'm referring to the fact that although I understand the logical rationale for IEEE 754 NaN behavior, I don't really understand the ins and outs well enough to judge for myself whether it's a good idea for Python to follow that model and turn == into something that is not an equivalence relation. I'm not going to argue for a change, I just want to know where I stand.

> Basically, and I realise that many people disagree with their decision
> (notably Bertrand Meyer of Eiffel fame, and our own Mark
> Dickinson),

Indeed. So "it's the standard" does not mean there is a consensus of experts. I'm willing to delegate to a consensus of expert opinion, but not when some prominent local expert(s) disagree -- then I'd like to understand well enough to come to my own conclusions.

From p.f.moore at gmail.com Wed Jul 9 09:13:10 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 9 Jul 2014 08:13:10 +0100 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> Message-ID: 

On 9 July 2014 02:08, Ben Hoyt wrote:
> Comments and votes, please!
+1 on option 1 (current PEP approach) at the moment, but I would like to see how the error handling would look (suppose the function logs files that can't be statted, and assumes a size of 0 for them). The idea of a multi-level ensure_lstat isn't unreasonable, either, and that helps option 2. The biggest issue *I* see with option 2 is that people won't remember to add the ensure_XXX argument, and that will result in more code that seems to work but fails cross-platform. Unless scandir deliberately fails if you use an attribute that you haven't "ensured", but that would be really unfriendly... Paul From benhoyt at gmail.com Wed Jul 9 14:48:04 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 9 Jul 2014 08:48:04 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BC9B8B.40509@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> Message-ID: > In this case because the names are exactly the same as the os versions which > /do/ make a system call. Fair enough. > So if I'm finally understanding the root problem here: > > - listdir returns a list of strings, one for each filename and one for > each directory, and keeps no other O/S supplied info. > > - os.walk, which uses listdir, then needs to go back to the O/S and > refetch the thrown-away information > > - so it's slow. > ... > and the new problem: > > - not all O/Ses provide the same (or any) extra info about the > directory entries > > Have I got that right? Yes, that's exactly right. 
> If so, I still like the attribute idea better (surprise!), we just need to
> revisit the 'ensure_lstat' (or whatever it's called) parameter: instead of
> a true/false value, it could have a scale:
>
> - 0 = whatever the O/S gives us
>
> - 1 = at least the is_dir/is_file (whatever the other normal one is),
> and if the O/S doesn't give it to us for free then call lstat
>
> - 2 = we want it all -- call lstat if necessary on this platform
>
> After all, the programmer should know up front how much of the extra info
> will be needed for the work that is trying to be done.

Yeah, I think this is a good idea to make option #2 a bit nicer. I don't like the magic constants, and using constants like os.SCANDIR_LSTAT is annoying, so how about using strings? I also suggest calling the parameter "info" (because it determines what info is returned), so you'd do scandir(path, info='type') if you need just the is_X type information.

I also think it's nice to have a way for power users to "just return what the OS gives us". However, I think making this the default is a bad idea, as it's just asking for cross-platform bugs (and it's easy to prevent).

Paul Moore basically agrees with this in his reply yesterday, though I disagree with him that it would be unfriendly to fail hard unless you asked for the info -- quite the opposite, Linux users would think it very unfriendly when your code broke because you didn't ask for the info. :-)

So how about tweaking option #2 a tiny bit more to this:

    def scandir(path='.', info=None, onerror=None):
        ...
* if info is None (the default), only the .name and .full_name attributes are present * if info is 'type', scandir ensures the is_dir/is_file/is_symlink attributes are present and either True or False * if info is 'lstat', scandir additionally ensures a .lstat is present and is a full stat_result object * if info is 'os', scandir returns the attributes the OS provides (everything on Windows, only is_X -- most of the time -- on POSIX) * if onerror is not None and errors occur during any internal lstat() call, onerror(exc) is called with the OSError exception object Further point -- because the is_dir/is_file/is_symlink attributes are booleans, it would be very bad for them to be present but None if you didn't ask for (or the OS didn't return) the type information. Because then "if entry.is_dir:" would be None and your code would think it wasn't a directory, when actually you don't know. For this reason, all attributes should fail with AttributeError if not fetched. > Thank you for writing scandir, and this PEP. Excellent work. Thanks! > Oh, and +1 for option 2, slightly modified. :) With the above tweaks, I'm getting closer to being 50/50. It's probably 60% #1 and 40% #2 for me now. :-) Okay folks -- please respond: option #1 as per the current PEP 471, or option #2 with Ethan's multi-level thing tweaks as per the above? -Ben From victor.stinner at gmail.com Wed Jul 9 15:05:05 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 9 Jul 2014 15:05:05 +0200 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: 2014-07-08 22:09 GMT+02:00 Ben Hoyt : >>> I think you're misunderstanding is_dir() and is_file(), as these don't >>> actually call os.stat(). All DirEntry methods either call nothing or >>> os.lstat() to get the stat info on the entry itself (not the >>> destination of the symlink). >> >> >> Oh. Extract of your PEP: "is_dir(): like os.path.isdir(), but much cheaper". 
>> >> genericpath.isdir() and genericpath.isfile() use os.stat(), whereas >> posixpath.islink() uses os.lstat(). >> >> Is it a mistake in the PEP? > > Ah, you're dead right -- this is basically a bug in the PEP, as > DirEntry.is_dir() is not like os.path.isdir() in that it is based on > the entry itself (like lstat), not following the link. > > I'll improve the wording here and update the PEP. Ok, so it means that your example grouping files per type, files and directories, is also wrong. Or at least, it behaves differently than os.walk(). You should put symbolic links to directories in the "dirs" list too. if entry.is_dir(): # is_dir() checks os.lstat() dirs.append(entry) elif entry.is_symlink() and os.path.isdir(entry): # isdir() checks os.stat() dirs.append(entry) else: non_dirs.append(entry) Victor From benhoyt at gmail.com Wed Jul 9 15:12:24 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 9 Jul 2014 09:12:24 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: > Ok, so it means that your example grouping files per type, files and > directories, is also wrong. Or at least, it behaves differently than > os.walk(). You should put symbolic links to directories in the "dirs" > list too. > > if entry.is_dir(): # is_dir() checks os.lstat() > dirs.append(entry) > elif entry.is_symlink() and os.path.isdir(entry): # isdir() checks os.stat() > dirs.append(entry) > else: > non_dirs.append(entry) Yes, good call. 
I believe I'm doing this wrong in the scandir.py os.walk() implementation too -- hence this open issue: https://github.com/benhoyt/scandir/issues/4 -Ben From p.f.moore at gmail.com Wed Jul 9 15:12:34 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 9 Jul 2014 14:12:34 +0100 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> Message-ID: On 9 July 2014 13:48, Ben Hoyt wrote: > Okay folks -- please respond: option #1 as per the current PEP 471, or > option #2 with Ethan's multi-level thing tweaks as per the above? I'm probably about 50/50 at the moment. What will swing it for me is likely error handling, so let's try both approaches with some error handling: Rules are that we calculate the total size of all files in a tree (as returned from lstat), with files that fail to stat being logged and their size assumed to be 0. Option 1:

    def get_tree_size(path):
        total = 0
        for entry in os.scandir(path):
            try:
                isdir = entry.is_dir()
            except OSError:
                logger.warn("Cannot stat {}".format(entry.full_name))
                continue
            if isdir:
                total += get_tree_size(entry.full_name)
            else:
                try:
                    total += entry.lstat().st_size
                except OSError:
                    logger.warn("Cannot stat {}".format(entry.full_name))
        return total

Option 2:

    def log_err(exc):
        logger.warn("Cannot stat {}".format(exc.filename))

    def get_tree_size(path):
        total = 0
        for entry in os.scandir(path, info='lstat', onerror=log_err):
            if entry.is_dir:
                total += get_tree_size(entry.full_name)
            else:
                total += entry.lstat.st_size
        return total

On this basis, #2 wins. However, I'm slightly uncomfortable using the filename attribute of the exception in the logging, as there is nothing in the docs saying that this will give a full pathname. I'd hate to see "Unable to stat __init__.py"!!!
So maybe the onerror function should also receive the DirEntry object - which will only have the name and full_name attributes, but that's all that is needed. OK, looks like option #2 is now my preferred option. My gut instinct still rebels over an API that deliberately throws information away in the default case, even though there is now an option to ask it to keep that information, but I see the logic and can learn to live with it. Paul From antoine at python.org Wed Jul 9 15:21:26 2014 From: antoine at python.org (Antoine Pitrou) Date: Wed, 09 Jul 2014 09:21:26 -0400 Subject: [Python-Dev] == on object tests identity in 3.x In-Reply-To: <87y4w3rwlk.fsf@uwakimon.sk.tsukuba.ac.jp> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <53BB5082.500@btinternet.com> <20140708031202.GF13014@ando> <53BB6D5F.1010800@btinternet.com> <87a98ktjv7.fsf@uwakimon.sk.tsukuba.ac.jp> <8761j8thf5.fsf@uwakimon.sk.tsukuba.ac.jp> <20140708165745.GJ13014@ando> <87y4w3rwlk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 09/07/2014 00:21, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > I don't think so. Floating point == represents *numeric* equality, > > There is no such thing as floating point == in Python. You can apply > == to two floating point numbers, but == (at the language level) > handles any two numbers, as well as pairs of things that aren't > numbers in the Python language. This is becoming pointless hair-splitting.

>>> float.__eq__(1.0, 2.0)
False
>>> float.__eq__(1.0, 2)
False
>>> float.__eq__(1.0, 1.0+0J)
NotImplemented
>>> float.__eq__(1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__eq__' requires a 'float' object but received a 'int'

Please direct any further discussion of this to python-ideas.
From benhoyt at gmail.com Wed Jul 9 15:22:41 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 9 Jul 2014 09:22:41 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> Message-ID: > Option 2: > def log_err(exc): > logger.warn("Cannot stat {}".format(exc.filename)) > > def get_tree_size(path): > total = 0 > for entry in os.scandir(path, info='lstat', onerror=log_err): > if entry.is_dir: > total += get_tree_size(entry.full_name) > else: > total += entry.lstat.st_size > return total > > On this basis, #2 wins. That's a pretty nice comparison, and you're right, onerror handling is nicer here. > However, I'm slightly uncomfortable using the > filename attribute of the exception in the logging, as there is > nothing in the docs saying that this will give a full pathname. I'd > hate to see "Unable to stat __init__.py"!!! Huh, you're right. I think this should be documented in os.walk() too. I think it should be the full filename (is it currently?). > So maybe the onerror function should also receive the DirEntry object > - which will only have the name and full_name attributes, but that's > all that is needed. That's an interesting idea -- though enough of a deviation from os.walk()'s onerror that I'm uncomfortable with it -- I'd rather just document that the onerror exception .filename is the full path name. One issue with option #2 that I just realized -- does scandir yield the entry at all if there's a stat error? It can't really, because the caller will expect the .lstat attribute to be set (assuming he asked for type='lstat') but it won't be. Is effectively removing these entries just because the stat failed a problem? I kind of think it is. If so, is there a way to solve it with option #2? > OK, looks like option #2 is now my preferred option.
My gut instinct > still rebels over an API that deliberately throws information away in > the default case, even though there is now an option to ask it to keep > that information, but I see the logic and can learn to live with it. In terms of throwing away info "in the default case" -- it's simply a case of getting what you ask for. :-) Worst case, you'll write your code and test it, it'll fail hard on any system, you'll fix it immediately, and then it'll work on any system. -Ben From p.f.moore at gmail.com Wed Jul 9 15:30:32 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 9 Jul 2014 14:30:32 +0100 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> Message-ID: On 9 July 2014 14:22, Ben Hoyt wrote: >> So maybe the onerror function should also receive the DirEntry object >> - which will only have the name and full_name attributes, but that's >> all that is needed. > > That's an interesting idea -- though enough of a deviation from > os.walk()'s onerror that I'm uncomfortable with it -- I'd rather just > document that the onerror exception .filename is the full path name. But the onerror exception will come from the lstat call, so it'll be a raw OSError (unless scandir modifies it, which may be what you're thinking of). And if so, aren't we at the mercy of what the OS module gives us? That's why I said we can't guarantee it. I looked at the documentation of OSError (in "Built In Exceptions"), and all it says is "the filename" (unqualified). I'd expect that to be "whatever got passed to the underlying OS API" - which may well be an absolute pathname if we're lucky, but who knows? (I'd actually prefer it if OSError guaranteed a full pathname, but that's a bigger issue...) 
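Paul's question about what OSError.filename contains can be checked directly. In CPython, the exception raised by os.lstat() carries the path argument exactly as it was passed in. A handler therefore sees a full path only if the failing call was given one. The helper below is hypothetical and is a minimal sketch.

```python
import os

def lstat_error_filename(path):
    """Return the .filename of the OSError raised by os.lstat(path),
    or None if the call succeeds."""
    try:
        os.lstat(path)
    except OSError as exc:
        return exc.filename  # the argument as passed, not made absolute
    return None
```

For example, lstat_error_filename("missing.txt") gives "missing.txt" rather than an absolute path. An implementation that wants onerror to report full names therefore has to stat the joined path, not the bare entry name.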
Paul From ethan at stoneleaf.us Wed Jul 9 15:17:40 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 06:17:40 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> Message-ID: <53BD40F4.8020009@stoneleaf.us> On 07/09/2014 05:48 AM, Ben Hoyt wrote: > > So how about tweaking option #2 a tiny bit more to this: > > def scandir(path='.', info=None, onerror=None): ... > > * if info is None (the default), only the .name and .full_name > attributes are present > * if info is 'type', scandir ensures the is_dir/is_file/is_symlink > attributes are present and either True or False > * if info is 'lstat', scandir additionally ensures a .lstat is present > and is a full stat_result object > * if info is 'os', scandir returns the attributes the OS provides > (everything on Windows, only is_X -- most of the time -- on POSIX) I would rather have the default for info be 'os': cross-platform is good, but there is no reason to force it on some poor script that is meant to run on a local machine and will never leave it. > * if onerror is not None and errors occur during any internal lstat() > call, onerror(exc) is called with the OSError exception object As Paul mentioned, 'onerror(exc, DirEntry)' would be better. > Further point -- because the is_dir/is_file/is_symlink attributes are > booleans, it would be very bad for them to be present but None if you > didn't ask for (or the OS didn't return) the type information. Because > then "if entry.is_dir:" would be None and your code would think it > wasn't a directory, when actually you don't know. For this reason, all > attributes should fail with AttributeError if not fetched. Fair point, and agreed. 
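The multi-level info proposal quoted above can be sketched as a wrapper around the os.scandir() that later shipped in Python 3.5. The sketch includes the rule that attributes which were not fetched raise AttributeError rather than reading as None. This is a hypothetical illustration only: scandir_info() is an invented name, and the 'os' level is left out because it is platform-dependent by definition. Skipping an entry whose stat fails, after calling onerror, is one possible answer to the open question Ben raises in this thread, not a decided behaviour.

```python
import os
from types import SimpleNamespace

def scandir_info(path='.', info=None, onerror=None):
    """Hypothetical sketch of the proposed info levels, built on os.scandir().

    info=None    -> only .name and .full_name
    info='type'  -> also .is_dir, .is_file, .is_symlink (of the entry itself)
    info='lstat' -> additionally .lstat, a full os.stat_result
    Attributes that were not requested do not exist at all, so reading
    them raises AttributeError instead of silently returning None.
    """
    if info not in (None, 'type', 'lstat'):
        raise ValueError("info must be None, 'type' or 'lstat'")
    with os.scandir(path) as it:
        for entry in it:
            item = SimpleNamespace(name=entry.name, full_name=entry.path)
            try:
                if info in ('type', 'lstat'):
                    # follow_symlinks=False gives lstat-style answers
                    item.is_dir = entry.is_dir(follow_symlinks=False)
                    item.is_file = entry.is_file(follow_symlinks=False)
                    item.is_symlink = entry.is_symlink()
                if info == 'lstat':
                    item.lstat = entry.stat(follow_symlinks=False)
            except OSError as exc:
                if onerror is None:
                    raise
                onerror(exc)
                continue  # one possible policy: drop entries we could not stat
            yield item
```

With this design, forgetting to ask for info='type' fails loudly on the first `entry.is_dir` access on every platform, which is the cross-platform property Ben argues for.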
-- ~Ethan~ From ethan at stoneleaf.us Wed Jul 9 15:41:04 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 06:41:04 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> Message-ID: <53BD4670.9080100@stoneleaf.us> On 07/09/2014 06:22 AM, Ben Hoyt wrote: > > One issue with option #2 that I just realized -- does scandir yield the entry at all if there's a stat error? It > can't really, because the caller will expect the .lstat attribute to be set (assuming he asked for type='lstat') but > it won't be. Is effectively removing these entries just because the stat failed a problem? I kind of think it is. If > so, is there a way to solve it with option #2? Leave it up to the onerror handler. If it returns None, skip yielding the entry, otherwise yield whatever it returned -- which also means the error handler should be able to set fields on the DirEntry: def log_err(exc, entry): logger.warn("Cannot stat {}".format(exc.filename)) entry.lstat.st_size = 0 return True def get_tree_size(path): total = 0 for entry in os.scandir(path, info='lstat', onerror=log_err): if entry.is_dir: total += get_tree_size(entry.full_name) else: total += entry.lstat.st_size return total This particular example doesn't benefit much from the addition, but this way we don't have to guess what the programmer wants or needs to do in the case of failure. 
-- ~Ethan~ From ethan at stoneleaf.us Wed Jul 9 16:41:11 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 07:41:11 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BD4670.9080100@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> Message-ID: <53BD5487.3060608@stoneleaf.us> On 07/09/2014 06:41 AM, Ethan Furman wrote: > > Leave it up to the onerror handler. If it returns None, skip yielding the entry, otherwise yield whatever it returned > -- which also means the error handler should be able to set fields on the DirEntry: > > def log_err(exc, entry): > logger.warn("Cannot stat {}".format(exc.filename)) > entry.lstat.st_size = 0 > return True Blah. Okay, either return the DirEntry (possibly modified), or have the log_err return entry instead of True. (Now where is that caffeine??) -- ~Ethan~ From walter at livinglogic.de Wed Jul 9 16:41:44 2014 From: walter at livinglogic.de (Walter Dörwald) Date: Wed, 09 Jul 2014 16:41:44 +0200 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: On 8 Jul 2014, at 15:52, Ben Hoyt wrote: > Hi folks, > > After some very good python-dev feedback on my first version of PEP > 471, I've updated the PEP to clarify a few things and added various > "Rejected ideas" subsections. Here's a link to the new version (I've > also copied the full text below): > > http://legacy.python.org/dev/peps/pep-0471/ -- new PEP as HTML > http://hg.python.org/peps/rev/0da4736c27e8 -- changes > > [...] > Rejected ideas > ============== > > [...] > Return values being pathlib.Path objects > ---------------------------------------- > > With Antoine Pitrou's new standard library ``pathlib`` module, it > at first seems like a great idea for ``scandir()`` to return instances > of ``pathlib.Path``. However, ``pathlib.Path``'s ``is_X()`` and
However, ``pathlib.Path``'s ``is_X()`` and > ``lstat()`` functions are explicitly not cached, whereas ``scandir`` > has to cache them by design, because it's (often) returning values > from the original directory iteration system call. > > And if the ``pathlib.Path`` instances returned by ``scandir`` cached > lstat values, but the ordinary ``pathlib.Path`` objects explicitly > don't, that would be more than a little confusing. > > Guido van Rossum explicitly rejected ``pathlib.Path`` caching lstat in > the context of scandir `here > `_, > making ``pathlib.Path`` objects a bad choice for scandir return > values. Can we at least make sure that attributes of DirEntry that have the same meaning as attributes of pathlib.Path have the same name? > [...] Servus, Walter From victor.stinner at gmail.com Wed Jul 9 17:05:33 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 9 Jul 2014 17:05:33 +0200 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: 2014-07-09 15:12 GMT+02:00 Ben Hoyt : >> Ok, so it means that your example grouping files per type, files and >> directories, is also wrong. Or at least, it behaves differently than >> os.walk(). You should put symbolic links to directories in the "dirs" >> list too. >> >> if entry.is_dir(): # is_dir() checks os.lstat() >> dirs.append(entry) >> elif entry.is_symlink() and os.path.isdir(entry): # isdir() checks os.stat() >> dirs.append(entry) >> else: >> non_dirs.append(entry) > > Yes, good call. I believe I'm doing this wrong in the scandir.py > os.walk() implementation too -- hence this open issue: > https://github.com/benhoyt/scandir/issues/4 The PEP says that DirEntry should mimic pathlib.Path, so I think that DirEntry.is_dir() should work as os.path.isir(): if the entry is a symbolic link, you should follow the symlink to get the status of the linked file with os.stat(). 
"entry.is_dir() or (entry.is_symlink() and os.path.isdir(entry))" looks wrong: why would you have to check is_dir() and isdir()? Duplicating this check is error prone and not convinient. Pseudo-code: --- class DirEntry: def __init__(self, lstat=None, d_type=None): self._stat = None self._lstat = lstat self._d_type = d_type def stat(self): if self._stat is None: self._stat = os.stat(self.full_name) return self._stat def lstat(self): if self._lstat is None: self._lstat = os.lstat(self.full_name) return self._lstat def is_dir(self): if self._d_type is not None: if self._d_type == DT_DIR: return True if self._d_type != DT_LNK: return False else: lstat = self.lstat() if stat.S_ISDIR(lstat.st_mode): return True if not stat.S_ISLNK(lstat.st_mode): return False stat = self.stat() return stat.S_ISDIR(stat.st_mode) --- DirEntry would be created with lstat (Windows) or d_type (Linux) prefilled. is_dir() would only need to call os.stat() once for symbolic links. With this code, it becomes even more obvious why is_dir() is a method and not a property ;-) Victor From p.f.moore at gmail.com Wed Jul 9 17:26:48 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 9 Jul 2014 16:26:48 +0100 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: On 9 July 2014 16:05, Victor Stinner wrote: > The PEP says that DirEntry should mimic pathlib.Path, so I think that > DirEntry.is_dir() should work as os.path.isir(): if the entry is a > symbolic link, you should follow the symlink to get the status of the > linked file with os.stat(). Would this not "break" the tree size script being discussed in the other thread, as it would follow links and include linked directories in the "size" of the tree? As a Windows user with only a superficial understanding of how symlinks should behave, I don't have any intuition as to what the "right" answer should be. 
But I would say that the tree size code we've been debating over there (which recurses if is_dir is true and adds in st_size otherwise) should do whatever people would expect of a function with that name, as it's a perfect example of something a Windows user might write and expect it to work cross-platform. If it doesn't, much of the worrying over making sure scandir's API is "cross-platform by default" is probably being wasted :-) (Obviously the walk_tree function could be modified if needed, but that's missing the point I'm trying to make :-)) Paul From benhoyt at gmail.com Wed Jul 9 17:29:21 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 9 Jul 2014 11:29:21 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: >> The PEP says that DirEntry should mimic pathlib.Path, so I think that >> DirEntry.is_dir() should work as os.path.isdir(): if the entry is a >> symbolic link, you should follow the symlink to get the status of the >> linked file with os.stat(). > > Would this not "break" the tree size script being discussed in the > other thread, as it would follow links and include linked directories > in the "size" of the tree? Yeah, I agree. Victor -- I don't think the DirEntry is_X() methods (or attributes) should mimic the link-following os.path.isdir() at all. You want the type of the entry, not the type of the link target. Otherwise, as Paul says, you are essentially forced to follow links, and os.walk(followlinks=False), which is the default, can't do the right thing.
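For comparison, the os.scandir() that shipped in Python 3.5 resolved this by making link-following a parameter. DirEntry.is_dir() follows symlinks by default, like os.path.isdir(), while is_dir(follow_symlinks=False) reports the type of the entry itself. The latter is what os.walk(followlinks=False) and a tree-size walker need. The small sketch below shows both answers for a directory and for a symlink to it. classify() is a hypothetical helper, and creating the symlink may fail on Windows without the right privileges.

```python
import os

def classify(path):
    """Map each entry name to a pair:
    (is_dir() following symlinks, is_dir() of the entry itself)."""
    result = {}
    with os.scandir(path) as it:
        for entry in it:
            result[entry.name] = (entry.is_dir(),
                                  entry.is_dir(follow_symlinks=False))
    return result
```

A real directory reports (True, True). A symlink to that directory reports (True, False), so a walker that recurses only on the second value never descends through links.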
-Ben From benhoyt at gmail.com Wed Jul 9 17:35:26 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 9 Jul 2014 11:35:26 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BD4670.9080100@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> Message-ID: >> One issue with option #2 that I just realized -- does scandir yield the >> entry at all if there's a stat error? It >> can't really, because the caller will expect the .lstat attribute to be >> set (assuming he asked for type='lstat') but >> >> it won't be. Is effectively removing these entries just because the stat >> failed a problem? I kind of think it is. If >> so, is there a way to solve it with option #2? > > > Leave it up to the onerror handler. If it returns None, skip yielding the > entry, otherwise yield whatever it returned > -- which also means the error handler should be able to set fields on the > DirEntry: > > def log_err(exc, entry): > logger.warn("Cannot stat {}".format(exc.filename)) > entry.lstat.st_size = 0 > return True This is an interesting idea, but it's just getting more and more complex, and I'm guessing that being able to change the attributes of DirEntry will make the C implementation more complex. Also, I'm not sure it's very workable. For log_err above, you'd actually have to do something like this, right? def log_err(exc, entry): logger.warn("Cannot stat {}".format(exc.filename)) entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0)) return entry Unless there's another simple way around this issue, I'm back to loving the simplicity of option #1, which avoids this whole question. 
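For reference, Paul's option #1 translates directly to the os.scandir() API that shipped in Python 3.5. In that API, full_name became DirEntry.path and lstat() became stat(follow_symlinks=False). This is a sketch, not code from the thread. It computes is_dir once and reuses it, and it does not follow symlinks. As an addition beyond the stated rules, it also logs and skips directories that cannot be listed at all.

```python
import logging
import os

logger = logging.getLogger(__name__)

def get_tree_size(path):
    """Sum st_size of all non-directory entries under path without following
    symlinks; entries that cannot be stat'ed are logged and counted as 0."""
    total = 0
    try:
        it = os.scandir(path)
    except OSError:
        logger.warning("Cannot list %s", path)
        return 0
    with it:
        for entry in it:
            try:
                is_dir = entry.is_dir(follow_symlinks=False)
            except OSError:
                logger.warning("Cannot stat %s", entry.path)
                continue
            if is_dir:
                total += get_tree_size(entry.path)
            else:
                try:
                    total += entry.stat(follow_symlinks=False).st_size
                except OSError:
                    logger.warning("Cannot stat %s", entry.path)
    return total
```

The error handling stays local to each call that can fail, which is the property Ben values in option #1. The cost is the two try/except blocks that option #2 was meant to remove.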
-Ben From p.f.moore at gmail.com Wed Jul 9 19:10:29 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 9 Jul 2014 18:10:29 +0100 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> Message-ID: On 9 July 2014 14:22, Ben Hoyt wrote: > One issue with option #2 that I just realized -- does scandir yield > the entry at all if there's a stat error? It can't really, because the > caller will expect the .lstat attribute to be set (assuming he asked > for type='lstat') but it won't be. Is effectively removing these > entries just because the stat failed a problem? I kind of think it is. > If so, is there a way to solve it with option #2? So the issue is that you need to do a stat but it failed. You have "whatever the OS gave you", but can't get anything more. This is only an issue on POSIX, where the original OS call doesn't give you everything, so it's fine, those POSIX people can just learn to cope with their broken OS, right? :-) More seriously, why not just return a DirEntry that says it's a file with a stat entry that's all zeroes? That seems pretty harmless. And the onerror function will be called, so if it is inappropriate the application can do something. Maybe it's worth letting onerror return a boolean that says whether to skip the entry, but that's as far as I'd bother going. It's a close call, but I think option #2 still wins (just) for me.
Paul From ethan at stoneleaf.us Wed Jul 9 18:35:04 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 09:35:04 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> Message-ID: <53BD6F38.7090000@stoneleaf.us> On 07/09/2014 08:35 AM, Ben Hoyt wrote: >>> One issue with option #2 that I just realized -- does scandir yield the >>> entry at all if there's a stat error? It >>> can't really, because the caller will expect the .lstat attribute to be >>> set (assuming he asked for type='lstat') but >>> >>> it won't be. Is effectively removing these entries just because the stat >>> failed a problem? I kind of think it is. If >>> so, is there a way to solve it with option #2? >> >> >> Leave it up to the onerror handler. If it returns None, skip yielding the >> entry, otherwise yield whatever it returned >> -- which also means the error handler should be able to set fields on the >> DirEntry: >> >> def log_err(exc, entry): >> logger.warn("Cannot stat {}".format(exc.filename)) >> entry.lstat.st_size = 0 >> return True > > This is an interesting idea, but it's just getting more and more > complex, and I'm guessing that being able to change the attributes of > DirEntry will make the C implementation more complex. > > Also, I'm not sure it's very workable. For log_err above, you'd > actually have to do something like this, right? 
> > def log_err(exc, entry): > logger.warn("Cannot stat {}".format(exc.filename)) > entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0)) > return entry I would imagine we would provide a helper function: def stat_result(st_size=0, st_atime=0, st_mtime=0, ...): return os.stat_result((st_size, st_atime, st_mtime, ...)) and then in onerror entry.lstat = stat_result() > Unless there's another simple way around this issue, I'm back to > loving the simplicity of option #1, which avoids this whole question. Too simple is just as bad as too complex, and properly handling errors is rarely a simple task. Either we provide a clean way to deal with errors in the API, or we force every user everywhere to come up with their own system. Also, just because we provide it doesn't force people to use it, but if we don't provide it then people cannot use it. To summarize the choice I think we are looking at: 1) We provide a very basic tool that many will have to write wrappers around to get the desired behavior (choice 1) 2) We provide a more advanced tool that, in many cases, can be used as-is, and is also fairly easy to extend to handle odd situations (choice 2) More specifically, if we go with choice 1 (no built-in error handling, no mutable DirEntry), how would I implement choice 2? Would I have to write my own CustomDirEntry object? -- ~Ethan~ From p.f.moore at gmail.com Wed Jul 9 20:04:09 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 9 Jul 2014 19:04:09 +0100 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BD6F38.7090000@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> Message-ID: On 9 July 2014 17:35, Ethan Furman wrote: > More specifically, if we go with choice 1 (no built-in error handling, no > mutable DirEntry), how would I implement choice 2? 
Would I have to write my > own CustomDirEntry object? Having built-in error handling is, I think, a key point. That's where #1 really falls down. But a mutable DirEntry and/or letting onerror manipulate the result is a lot more than just having a hook for being notified of errors. That seems to me to be a step too far, in the current context. Specifically, the tree size example doesn't need it. Do you have a compelling use case that needs a mutable DirEntry? It feels like YAGNI to me. Paul From ethan at stoneleaf.us Wed Jul 9 19:38:38 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 10:38:38 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> Message-ID: <53BD7E1E.6020700@stoneleaf.us> On 07/09/2014 10:10 AM, Paul Moore wrote: > On 9 July 2014 14:22, Ben Hoyt wrote: >> One issue with option #2 that I just realized -- does scandir yield >> the entry at all if there's a stat error? It can't really, because the >> caller will except the .lstat attribute to be set (assuming he asked >> for type='lstat') but it won't be. Is effectively removing these >> entries just because the stat failed a problem? I kind of think it is. >> If so, is there a way to solve it with option #2? > > So the issue is that you need to do a stat but it failed. You have > "whatever the OS gave you", but can't get anything more. This is only > an issue on POSIX, where the original OS call doesn't give you > everything, so it's fine, those POSIX people can just learn to cope > with their broken OS, right? :-) LOL > More seriously, why not just return a DirEntry that says it's a file > with a stat entry that's all zeroes? That seems pretty harmless. And > the onerror function will be called, so if it is inappropriate the > application can do something. 
Maybe it's worth letting onerror return > a boolean that says whether to skip the entry, but that's as far as > I'd bother going. I could live with this -- we could enhance it in the future fairly easily if we needed to. -- ~Ethan~ From ethan at stoneleaf.us Wed Jul 9 20:29:50 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 11:29:50 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> Message-ID: <53BD8A1E.6090804@stoneleaf.us> On 07/09/2014 11:04 AM, Paul Moore wrote: > On 9 July 2014 17:35, Ethan Furman wrote: >> More specifically, if we go with choice 1 (no built-in error handling, no >> mutable DirEntry), how would I implement choice 2? Would I have to write my >> own CustomDirEntry object? > > Having built-in error handling is, I think, a key point. That's where > #1 really falls down. > > But a mutable DirEntry and/or letting onerror manipulate the result is > a lot more than just having a hook for being notified of errors. That > seems to me to be a step too far, in the current context. > Specifically, the tree size example doesn't need it. > > Do you have a compelling use case that needs a mutable DirEntry? It > feels like YAGNI to me. Not at this point. As I indicated in my reply to your response, as long as we have the onerror machinery now we can tweak it later if real-world use shows it would be beneficial.
-- ~Ethan~ From benhoyt at gmail.com Wed Jul 9 21:03:20 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 9 Jul 2014 15:03:20 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BD6F38.7090000@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> Message-ID: This is just getting way too complex ... further thoughts below. >> This is an interesting idea, but it's just getting more and more >> complex, and I'm guessing that being able to change the attributes of >> DirEntry will make the C implementation more complex. >> >> Also, I'm not sure it's very workable. For log_err above, you'd >> actually have to do something like this, right? >> >> def log_err(exc, entry): >> logger.warn("Cannot stat {}".format(exc.filename)) >> entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0)) >> return entry > > > I would imagine we would provide a helper function: > > def stat_result(st_size=0, st_atime=0, st_mtime=0, ...): > return os.stat_result((st_size, st_atime, st_mtime, ...)) > > and then in onerror > > entry.lstat = stat_result() > >> Unless there's another simple way around this issue, I'm back to >> loving the simplicity of option #1, which avoids this whole question. > > > Too simple is just as bad as too complex, and properly handling errors is > rarely a simple task. Either we provide a clean way to deal with errors in > the API, or we force every user everywhere to come up with their own system. > > Also, just because we provide it doesn't force people to use it, but if we > don't provide it then people cannot use it. 
So here's the ways in which option #2 is now more complicated than option #1: 1) it has an additional "info" argument, the values of which have to be documented ('os', 'type', 'lstat', and what each one means) 2) it has an additional "onerror" argument, the signature of which and fairly complicated return value is non-obvious and has to be documented 3) it requires user modification of the DirEntry object, which needs documentation, and is potentially hard to implement 4) because the DirEntry object now allows modification, you need a stat_result() helper function to help you build your own stat values I'm afraid points 3 and 4 here add way too much complexity. Remind me why all this is better than the PEP 471 approach again? It handles all of these problems, is very direct, and uses built-in Python constructs (method calls and try/except error handling). And it's also simple to document -- much simpler than the above 4 things, which could be a couple of pages in the docs. Here's the doc required for the PEP 471 approach: "Note about caching and error handling: The is_X() and lstat() functions may perform an lstat() on first call if the OS didn't already fetch this data when reading the directory. So if you need fine-grained error handling, catch OSError exceptions around these method calls. After the first call, the is_X() and lstat() functions cache the value on the DirEntry." 
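For comparison, the PEP 471 style described above (plain method calls wrapped in an ordinary try/except) is close to what eventually shipped in Python 3.5, with `entry.path` in place of the `full_name`/`fullname` attribute discussed in this thread. A minimal sketch, assuming Python 3.6+ so that `os.scandir()` can be used as a context manager; `file_sizes` is an illustrative name.

```python
import os
import sys


def file_sizes(path):
    """Map file name -> size for the regular files directly inside `path`.

    Entries whose type or size cannot be determined (for example because
    they were deleted between the directory read and the stat call) are
    reported and skipped instead of aborting the whole scan.
    """
    sizes = {}
    with os.scandir(path) as it:  # context-manager support needs Python 3.6+
        for entry in it:
            try:
                # May issue a stat/lstat system call on first access; the
                # result is then cached on the DirEntry object.
                if entry.is_file(follow_symlinks=False):
                    sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
            except OSError as exc:
                print("skipping %s: %s" % (entry.path, exc), file=sys.stderr)
    return sizes
```

The try/except sits exactly around the calls that may touch the file system, so errors from reading the directory itself still propagate normally.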
-Ben From ethan at stoneleaf.us Wed Jul 9 21:17:43 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 12:17:43 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> Message-ID: <53BD9557.80709@stoneleaf.us> On 07/09/2014 12:03 PM, Ben Hoyt wrote: > > So here's the ways in which option #2 is now more complicated than option #1: > > 1) it has an additional "info" argument, the values of which have to > be documented ('os', 'type', 'lstat', and what each one means) > 2) it has an additional "onerror" argument, the signature of which and > fairly complicated return value is non-obvious and has to be > documented > 3) it requires user modification of the DirEntry object, which needs > documentation, and is potentially hard to implement > 4) because the DirEntry object now allows modification, you need a > stat_result() helper function to help you build your own stat values > > I'm afraid points 3 and 4 here add way too much complexity. I'm okay with dropping 3 and 4, and making the return from onerror being simply True to yield the entry, and False/None to skip it. That should make implementation much easier, and documentation not too strenuous either. 
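The simplified `onerror` contract (a true return value yields the entry, `False`/`None` skips it) can be prototyped on top of the `os.scandir()` that later shipped. In this sketch `scandir_onerror` is a hypothetical name, and the eager `stat(follow_symlinks=False)` call stands in for whatever stat work the proposal would do; it is not the API that was adopted.

```python
import os


def scandir_onerror(path, onerror=None):
    """Yield DirEntry objects for `path`, fetching lstat data eagerly.

    Hypothetical wrapper: if the stat call fails, onerror(exc, entry) is
    called; a true result yields the entry anyway, False/None skips it.
    Without an onerror callback the OSError propagates out of next().
    """
    with os.scandir(path) as it:
        for entry in it:
            try:
                # Caches the result on the entry (does not follow symlinks).
                entry.stat(follow_symlinks=False)
            except OSError as exc:
                if onerror is None:
                    raise
                if not onerror(exc, entry):
                    continue  # callback asked to skip this entry
            yield entry
```

A callback that logs the error and returns False gives the "log and skip" behaviour of the earlier `log_err` example without having to fabricate a stat result.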
-- ~Ethan~ From benhoyt at gmail.com Wed Jul 9 21:59:39 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 9 Jul 2014 15:59:39 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BD9557.80709@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> Message-ID: >> 1) it has an additional "info" argument, the values of which have to >> be documented ('os', 'type', 'lstat', and what each one means) >> 2) it has an additional "onerror" argument, the signature of which and >> fairly complicated return value is non-obvious and has to be >> documented >> 3) it requires user modification of the DirEntry object, which needs >> documentation, and is potentially hard to implement >> 4) because the DirEntry object now allows modification, you need a >> stat_result() helper function to help you build your own stat values >> >> I'm afraid points 3 and 4 here add way too much complexity. > > > I'm okay with dropping 3 and 4, and making the return from onerror being > simply True to yield the entry, and False/None to skip it. That should make > implementation much easier, and documentation not too strenuous either. That's definitely better in terms of complexity. Other python-devers, please chime in with your thoughts or votes. -Ben From victor.stinner at gmail.com Wed Jul 9 22:24:19 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 9 Jul 2014 22:24:19 +0200 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> Message-ID: 2014-07-09 21:59 GMT+02:00 Ben Hoyt : > Other python-devers, please chime in with your thoughts or votes. 
Sorry, I didn't follow the whole discussion. IMO DirEntry must use methods and you should not expose nor document which infos are already provided by the OS or not. DirEntry should be a best-effort black-box object providing an API similar to pathlib.Path. is_dir() may be fast? fine, but don't say it in the documentation because Python must remain portable and you should not write code specific to one specific platform. is_dir(), is_file(), is_symlink() and lstat() can fail like any other Python function, no need to specialize them with a custom error handler. If you care, just use a very standard try/except. I'm also against ensure_lstat=True or ideas like that fetching all data transparently in the generator. The behaviour would be too different depending on the OS, and usually you don't need all the information. And it raises errors in the generator, which is something unusual, and difficult to handle (I don't like the onerror callback). Example where you may sometimes need is_dir(), but not always
---
for entry in os.scandir(path):
    if ignore_entry(entry.name):
        # this entry is not interesting, lstat_result is useless here
        continue
    if entry.is_dir():  # fetch required data if needed
        continue
    ...
---
I don't understand why you are all focused on handling os.stat() and os.lstat() errors. See for example the os.walk() function which is an old function (python 2.6!): it doesn't catch errors on isdir(), even if it has an onerror parameter... It only handles errors on listdir(). IMO errors on os.stat() and os.lstat() are very rare under very specific conditions. The most common case is that you can get the status if you can list files.
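The example above can be made runnable against the `os.scandir()` that later shipped in Python 3.5; `ignore_entry` below is a hypothetical filter that skips hidden names. Entries rejected by name never trigger a system call, and `is_dir()` needs no extra call when the directory read already reported the file type (always on Windows, via `d_type` on many POSIX file systems). Only symlinks, or file systems that report an unknown type, cost a `stat()`.

```python
import os


def ignore_entry(name):
    # Hypothetical filter for illustration: skip hidden entries.
    return name.startswith(".")


def interesting_dirs(path):
    """Return the sorted names of non-hidden subdirectories of `path`."""
    result = []
    with os.scandir(path) as it:
        for entry in it:
            if ignore_entry(entry.name):
                # Rejected by name alone: no stat call is made for it.
                continue
            # Follows symlinks by default, like os.path.isdir().
            if entry.is_dir():
                result.append(entry.name)
    return sorted(result)
```

Because `is_dir()` follows links by default, a symlink pointing at a directory is reported as a directory here.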
Victor From p.f.moore at gmail.com Wed Jul 9 22:57:57 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 9 Jul 2014 21:57:57 +0100 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> Message-ID: On 9 July 2014 21:24, Victor Stinner wrote:
> Example where you may sometimes need is_dir(), but not always
> ---
> for entry in os.scandir(path):
>     if ignore_entry(entry.name):
>         # this entry is not interesting, lstat_result is useless here
>         continue
>     if entry.is_dir():  # fetch required data if needed
>         continue
>     ...
> ---
That is an extremely good point, and articulates why I've always been a bit uncomfortable with the whole ensure_stat idea. > I don't understand why you are all focused on handling os.stat() and > os.lstat() errors. See for example the os.walk() function which is an > old function (python 2.6!): it doesn't catch errors on isdir(), even if > it has an onerror parameter... It only handles errors on listdir(). > IMO errors on os.stat() and os.lstat() are very rare under very > specific conditions. The most common case is that you can get the > status if you can list files. Personally, I'm only focused on it as a response to others feeling it's important. I'm on Windows, where there are no extra stat calls, so all *I* care about is having an API that deals with the use cases others are concerned about without making it too hard for me to use it on Windows where I don't have to worry about all this. If POSIX users come to a consensus that error handling doesn't need special treatment, I'm more than happy to go back to the PEP version. (Much as previously happened with the race condition debate).
Paul From ethan at stoneleaf.us Wed Jul 9 23:28:07 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 14:28:07 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> Message-ID: <53BDB3E7.5030004@stoneleaf.us> On 07/09/2014 01:57 PM, Paul Moore wrote:
> On 9 July 2014 21:24, Victor Stinner wrote:
>>
>> Example where you may sometimes need is_dir(), but not always
>> ---
>> for entry in os.scandir(path):
>>     if ignore_entry(entry.name):
>>         # this entry is not interesting, lstat_result is useless here
>>         continue
>>     if entry.is_dir():  # fetch required data if needed
>>         continue
>>     ...
>
> That is an extremely good point, and articulates why I've always been
> a bit uncomfortable with the whole ensure_stat idea.

On a system which did not supply is_dir automatically I would write that as:

for entry in os.scandir(path):  # info defaults to 'os', which is basically None in this case
    if ignore_entry(entry.name):
        continue
    if os.path.isdir(entry.full_name):
        # do something interesting
        ...

Not hard to read or understand, no time wasted in unnecessary lstat calls. -- ~Ethan~ From ethan at stoneleaf.us Wed Jul 9 22:44:12 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 13:44:12 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> Message-ID: <53BDA99C.3020101@stoneleaf.us> On 07/09/2014 01:24 PM, Victor Stinner wrote: > > Sorry, I didn't follow the whole discussion.
IMO DirEntry must use > methods and you should not expose nor document which infos are already > provided by the OS or not. DirEntry should be a best-effort black-box > object providing an API similar to pathlib.Path. is_dir() may be fast? > fine, but don't say it in the documentation because Python must remain > portable and you should not write code specific to one specific > platform. Okay, so using that logic we should head over to the os module and remove: ctermid, getenv, getegid, geteuid, getgid, getgrouplist, getgroups, getpgid, getpgrp, getpriority, PRIO_PROCESS, PRIO_PGRP, PRIO_USER, getresuid, getresgid, getuid, initgroups, putenv, setegid, seteuid, setgid, setgroups, setpriority, setregid, setresgid, setresuid, setreuid, getsid, setsid, setuid, unsetenv, fchmod, fchown, fdatasync, fpathconf, fstatvfs, ftruncate, lockf, F_LOCK, F_TLOCK, F_ULOCK, F_TEST, O_DSYNC, O_RSYNC, O_SYNC, O_NDELAY, O_NONBLOCK, O_NOCTTY, O_SHLOCK, O_EXLOCK, O_CLOEXEC, O_BINARY, O_NOINHERIT, O_SHORT_LIVED, O_TEMPORARY, O_RANDOM, O_SEQUENTIAL, O_TEXT, ... Okay, I'm tired of typing, but that list is not even half-way through the os page, and those are all methods or attributes that are not available on either Windows or Unix or some flavors of Unix. Oh, and all those upper-case attributes? Yup, documented. And when we don't document it ourselves we often refer readers to their system documentation because Python does not, in fact, return exactly the same results on all platforms -- particularly when calling into the OS.
-- ~Ethan~ From benhoyt at gmail.com Wed Jul 9 23:33:12 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 9 Jul 2014 17:33:12 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BDB3E7.5030004@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDB3E7.5030004@stoneleaf.us> Message-ID: > On a system which did not supply is_dir automatically I would write that as: > > for entry in os.scandir(path): # info defaults to 'os', which is > basically None in this case > if ignore_entry(entry.name): > continue > if os.path.isdir(entry.full_name): > # do something interesting > > Not hard to read or understand, no time wasted in unnecessary lstat calls. No, but how do you know whether you're on "a system which did not supply is_dir automatically"? The above is not cross-platform, or at least, not efficient cross-platform, which defeats the whole point of scandir -- the above is no better than listdir(). -Ben From benhoyt at gmail.com Wed Jul 9 23:42:07 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Wed, 9 Jul 2014 17:42:07 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BDA99C.3020101@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDA99C.3020101@stoneleaf.us> Message-ID: I really don't understand why you *want* a worse, much less cross-platform API? > Okay, so using that logic we should head over to the os module and remove: > > ctermid, getenv, getegid... > > Okay, I'm tired of typing, but that list is not even half-way through the os > page, and those are all methods or attributes that are not available on > either Windows or Unix or some flavors of Unix. 
True, is this really the precedent we want to *aim for*. listdir() is cross-platform, and it's relatively easy to make scandir() cross-platform, so why not? > Oh, and all those upper-case attributes? Yup, documented. And when we > don't document it ourselves we often refer readers to their system > documentation because Python does not, in fact, return exactly the same > results on all platforms -- particularly when calling into the OS. But again, why a worse, less cross-platform API when a simple, cross-platform one is a method call away? -Ben From victor.stinner at gmail.com Wed Jul 9 23:38:26 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Wed, 9 Jul 2014 23:38:26 +0200 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BDA99C.3020101@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDA99C.3020101@stoneleaf.us> Message-ID: 2014-07-09 22:44 GMT+02:00 Ethan Furman : > On 07/09/2014 01:24 PM, Victor Stinner wrote: >> Sorry, I didn't follow the whole discussion. IMO DirEntry must use >> methods and you should not expose nor document which infos are already >> provided by the OS or not. DirEntry should be a best-effort black-box >> object providing an API similar to pathlib.Path. is_dir() may be fast? >> fine, but don't say it in the documentation because Python must remain >> portable and you should not write code specific to one specific >> platform. > > > Okay, so using that logic we should head over to the os module and remove: (...) My comment was specific to the PEP 471, design of the DirEntry class. 
Victor From ethan at stoneleaf.us Thu Jul 10 00:12:18 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 15:12:18 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDA99C.3020101@stoneleaf.us> Message-ID: <53BDBE42.7050609@stoneleaf.us> On 07/09/2014 02:42 PM, Ben Hoyt wrote: >> >> Okay, so using that [no platform specific] logic we should head over to the os module and remove: >> >> ctermid, getenv, getegid... >> >> Okay, I'm tired of typing, but that list is not even half-way through the os >> page, and those are all methods or attributes that are not available on >> either Windows or Unix or some flavors of Unix. > > True, is this really the precedent we want to *aim for*. listdir() is > cross-platform, and listdir has serious performance issues, which is why you developed scandir. >> Oh, and all those [snipped] upper-case attributes? Yup, documented. And when we >> don't document it ourselves we often refer readers to their system >> documentation because Python does not, in fact, return exactly the same >> results on all platforms -- particularly when calling into the OS. > > But again, why a worse, less cross-platform API when a simple, > cross-platform one is a method call away? For the same reason we don't use code that makes threaded behavior better, but kills the single thread application. If the programmer would rather have consistency on all platforms rather than performance on the one being used, `info='lstat'` is the option to use. I like the 'onerror' API better primarily because it gives a single point to deal with the errors. 
This has at least a couple of advantages:
- less duplication of code: in the tree_size example, the error handling is duplicated twice
- readability: with the error handling in a separate routine, one does not have to jump around the try/except blocks looking for what happens if there are no errors
-- ~Ethan~ From ethan at stoneleaf.us Thu Jul 10 00:15:49 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 15:15:49 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDA99C.3020101@stoneleaf.us> Message-ID: <53BDBF15.7020505@stoneleaf.us> On 07/09/2014 02:38 PM, Victor Stinner wrote: > 2014-07-09 22:44 GMT+02:00 Ethan Furman: >> On 07/09/2014 01:24 PM, Victor Stinner wrote: >>> >>> [...] Python must remain >>> portable and you should not write code specific to one specific >>> platform. >> >> >> Okay, so using that logic we should head over to the os module and remove: (...) > > My comment was specific to the PEP 471, design of the DirEntry class. And my comment was to the point of there being methods/attributes/return values that /do/ vary by platform, and /are/ documented as such. Even stat itself is not the same on Windows as posix.
-- ~Ethan~ From ethan at stoneleaf.us Thu Jul 10 00:50:28 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 15:50:28 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDB3E7.5030004@stoneleaf.us> Message-ID: <53BDC734.9050901@stoneleaf.us> On 07/09/2014 02:33 PM, Ben Hoyt wrote: >> >> On a system which did not supply is_dir automatically I would write that as: >> >> for entry in os.scandir(path): >> if ignore_entry(entry.name): >> continue >> if os.path.isdir(entry.full_name): >> # do something interesting >> >> Not hard to read or understand, no time wasted in unnecessary lstat calls. > > No, but how do you know whether you're on "a system which did not > supply is_dir automatically"? The above is not cross-platform, or at > least, not efficient cross-platform, which defeats the whole point of > scandir -- the above is no better than listdir(). Hit a directory with 100,000 entries and you'll change your mind. ;) Okay, so the issue is you /want/ to write an efficient, cross-platform routine... hrmmm..... thinking........ Okay, marry the two ideas together: scandir(path, info=None, onerror=None) """ Return a generator that returns one directory entry at a time in a DirEntry object info: None --> DirEntries will have whatever attributes the O/S provides 'type' --> DirEntries will already have at least the file/dir distinction 'stat' --> DirEntries will also already have stat information """ DirEntry.is_dir() Return True if this is a directory-type entry; may call os.lstat if the cache is empty. DirEntry.is_file() Return True if this is a file-type entry; may call os.lstat if the cache is empty. DirEntry.is_symlink() Return True if this is a symbolic link; may call os.lstat if the cache is empty. 
DirEntry.stat Return the stat info for this link; may call os.lstat if the cache is empty. This way both paradigms are supported. -- ~Ethan~ From python at mrabarnett.plus.com Thu Jul 10 01:22:21 2014 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 10 Jul 2014 00:22:21 +0100 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BDC734.9050901@stoneleaf.us> References: <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDB3E7.5030004@stoneleaf.us> <53BDC734.9050901@stoneleaf.us> Message-ID: <53BDCEAD.8070809@mrabarnett.plus.com> On 2014-07-09 23:50, Ethan Furman wrote: > On 07/09/2014 02:33 PM, Ben Hoyt wrote: >>> >>> On a system which did not supply is_dir automatically I would write that as: >>> >>> for entry in os.scandir(path): >>> if ignore_entry(entry.name): >>> continue >>> if os.path.isdir(entry.full_name): >>> # do something interesting >>> >>> Not hard to read or understand, no time wasted in unnecessary lstat calls. >> >> No, but how do you know whether you're on "a system which did not >> supply is_dir automatically"? The above is not cross-platform, or at >> least, not efficient cross-platform, which defeats the whole point of >> scandir -- the above is no better than listdir(). > > Hit a directory with 100,000 entries and you'll change your mind. ;) > > Okay, so the issue is you /want/ to write an efficient, cross-platform routine... > > hrmmm..... > > thinking........ > > Okay, marry the two ideas together: > > scandir(path, info=None, onerror=None) > """ > Return a generator that returns one directory entry at a time in a DirEntry object Should that be "that yields one directory entry at a time"? 
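The combined proposal (`info=None`, `'type'`, `'stat'`, with `stat` as a method) can be approximated by a thin wrapper over the `os.scandir()` that later shipped. This is a hypothetical sketch: `scandir_info` and `_scan` are illustrative names, and `onerror` is left out, so errors from the extra calls simply propagate from the iterator.

```python
import os

_VALID_INFO = (None, "type", "stat")


def scandir_info(path, info=None):
    """Hypothetical sketch of the proposed info= argument on top of os.scandir.

    info=None   -> entries carry only what the OS reported while reading
    info="type" -> the file/dir/symlink distinction is known before yielding
    info="stat" -> lstat() data has also been fetched before yielding
    """
    if info not in _VALID_INFO:
        # Validate eagerly: a generator body would not run until next().
        raise ValueError("info must be one of %r" % (_VALID_INFO,))
    return _scan(path, info)


def _scan(path, info):
    with os.scandir(path) as it:
        for entry in it:
            if info == "type":
                # Free when the OS supplied the type; otherwise this performs
                # (and caches) an lstat-style call.
                entry.is_dir(follow_symlinks=False)
            elif info == "stat":
                entry.stat(follow_symlinks=False)
            yield entry
```

Splitting validation from the generator means a bad `info` value fails at the call site rather than on the first iteration.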
> info: None --> DirEntries will have whatever attributes the O/S provides > 'type' --> DirEntries will already have at least the file/dir distinction > 'stat' --> DirEntries will also already have stat information > """ > > DirEntry.is_dir() > Return True if this is a directory-type entry; may call os.lstat if the cache is empty. > > DirEntry.is_file() > Return True if this is a file-type entry; may call os.lstat if the cache is empty. > > DirEntry.is_symlink() > Return True if this is a symbolic link; may call os.lstat if the cache is empty. > > DirEntry.stat > Return the stat info for this link; may call os.lstat if the cache is empty. > Why is "is_dir", et al, functions, but "stat" not a function? > > This way both paradigms are supported. > From ethan at stoneleaf.us Thu Jul 10 01:26:01 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 16:26:01 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BDCEAD.8070809@mrabarnett.plus.com> References: <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDB3E7.5030004@stoneleaf.us> <53BDC734.9050901@stoneleaf.us> <53BDCEAD.8070809@mrabarnett.plus.com> Message-ID: <53BDCF89.5070007@stoneleaf.us> On 07/09/2014 04:22 PM, MRAB wrote: > On 2014-07-09 23:50, Ethan Furman wrote: >> >> Okay, marry the two ideas together: >> >> scandir(path, info=None, onerror=None) >> """ >> Return a generator that returns one directory entry at a time in a DirEntry object > > Should that be "that yields one directory entry at a time"? Yes, thanks. >> info: None --> DirEntries will have whatever attributes the O/S provides >> 'type' --> DirEntries will already have at least the file/dir distinction >> 'stat' --> DirEntries will also already have stat information >> """ >> >> DirEntry.is_dir() >> Return True if this is a directory-type entry; may call os.lstat if the cache is empty.
>> >> DirEntry.is_file() >> Return True if this is a file-type entry; may call os.lstat if the cache is empty. >> >> DirEntry.is_symlink() >> Return True if this is a symbolic link; may call os.lstat if the cache is empty. >> >> DirEntry.stat >> Return the stat info for this link; may call os.lstat if the cache is empty. > > Why is "is_dir", et al, functions, but "stat" not a function? Good point. Make stat a function as well. -- ~Ethan~ From victor.stinner at gmail.com Thu Jul 10 02:15:58 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 10 Jul 2014 02:15:58 +0200 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: 2014-07-09 17:29 GMT+02:00 Ben Hoyt : >> Would this not "break" the tree size script being discussed in the >> other thread, as it would follow links and include linked directories >> in the "size" of the tree? The get_tree_size() function in the PEP would use: "if not entry.is_symlink() and entry.is_dir():". Note: First I wrote "if entry.is_dir() and not entry.is_symlink():", but this syntax is slower on Linux because is_dir() has to call lstat(). Adding an optional keyword to DirEntry.is_dir() would allow to write "if entry.is_dir(follow_symlink=False)", but it looks like a micro optimization and as I said, I prefer to stick to pathlib.Path API (which was already heavily discussed in its PEP). Anyway, this case is rare (I explain that below), we should not worry too much about it. > Yeah, I agree. Victor -- I don't think the DirEntry is_X() methods (or > attributes) should mimic the link-following os.path.isdir() at all. > You want the type of the entry, not the type of the source. On UNIX, a symlink to a directory is expected to behave like a directory. For example, in a file browser, you should enter in the linked directory when you click on a symlink to a directory. 
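A sketch of the `get_tree_size()` check quoted above, written against the API that later shipped in Python 3.5, where `is_dir()` and `is_file()` gained a `follow_symlinks` keyword, so `entry.is_dir(follow_symlinks=False)` would also work. Testing `is_symlink()` first means a link to a directory is skipped without `is_dir()` ever having to resolve the link target.

```python
import os


def get_tree_size(path):
    """Total size in bytes of the regular files under `path`.

    Symlinks are neither followed nor counted, similar to `du` without -L.
    """
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            # No system call when the OS supplied the entry type, and it
            # short-circuits before is_dir() would stat the link target.
            if entry.is_symlink():
                continue
            if entry.is_dir():
                total += get_tree_size(entry.path)
            elif entry.is_file():
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Errors are deliberately left to propagate here; a real tool would probably wrap the recursive call or the `stat()` in try/except, as discussed earlier in the thread.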
There are only a few cases where you want to handle symlinks differently: archive (ex: tar), compute the size of a directory (ex: du does not follow symlinks by default, du -L follows them), remove a directory. You should do a short poll in the Python stdlib and on the Internet to check what is the most common check. Examples of the Python stdlib: - zipfile: listdir + os.path.isdir - pkgutil: listdir + os.path.isdir - unittest.loader: listdir + os.path.isdir and os.path.isfile - http.server: listdir + os.path.isdir, it also uses os.path.islink: " Append / for directories or @ for symbolic links " - idlelib.GrepDialog: listdir + os.path.isdir - compileall: listdir + os.path.isdir and "os.path.isdir(fullname) and not os.path.islink(fullname)" <= don't follow symlinks to directories - shutil (copytree): listdir + os.path.isdir + os.path.islink - shutil (rmtree): listdir + os.lstat() + stat.S_ISDIR(mode) <= don't follow symlinks to directories - mailbox: listdir + os.path.isdir - tabnanny: listdir + os.path.isdir - os.walk: listdir + os.path.isdir + os.path.islink <= don't follow symlinks to directories by default, but the behaviour is configurable ... but symlinks to directories are added to the "dirs" list (not all symlinks, only symlinks to directories) - setup.py: listdir + os.path.isfile In this list of 12 examples, only compileall, shutil.rmtree and os.walk check if entries are symlinks. compileall starts by checking "if not os.path.isdir(fullname):" which follows symlinks. os.walk() starts by checking "if os.path.isdir(name):" which follows symlinks. I consider that only one case on 12 (8.3%) doesn't follow symlinks. 
If entry.is_dir() doesn't follow symlinks, the other 91.7% will need to be modified to use "if entry.is_dir() or (entry.is_symlink() and os.path.isdir(entry.full_name)):" to keep the same behaviour :-( > Otherwise, as Paul says, you are essentially forced to follow links, > and os.walk(followlinks=False), which is the default, can't do the > right thing. os.walk() and get_tree_size() are good users of scandir(), but they are recursive functions. It means that you may want to handle symlinks differently; os.walk() gives the choice to follow symlinks or not, for example. Recursive functions are rare. The most common case is to list files of a single directory and then filter files depending on various filters (is a file? is a directory? match the file name? ...). In such use cases, you don't "care" about symlinks (you want to follow them). Victor From victor.stinner at gmail.com Thu Jul 10 02:23:17 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 10 Jul 2014 02:23:17 +0200 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: 2014-07-09 17:26 GMT+02:00 Paul Moore : > On 9 July 2014 16:05, Victor Stinner wrote: >> The PEP says that DirEntry should mimic pathlib.Path, so I think that >> DirEntry.is_dir() should work as os.path.isdir(): if the entry is a >> symbolic link, you should follow the symlink to get the status of the >> linked file with os.stat(). > > (...) > As a Windows user with only a superficial understanding of how > symlinks should behave, (...) FYI Windows also supports symbolic links since Windows Vista. The feature is little known because it is restricted to the administrator account. Try the "mklink" command in a terminal (cmd.exe) ;-) http://en.wikipedia.org/wiki/NTFS_symbolic_link ... To be honest, I never created a symlink on Windows. But since it is supported, you need to know about it to write your Windows code correctly. (It's unrelated to "LNK" files.)
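To make the trade-off concrete: if `is_dir()` did not follow symlinks, callers wanting `os.path.isdir()` semantics would need something like the helper below. It is a hypothetical name written against the Python 3.5+ API, where the link-following behaviour argued for here became the default of `entry.is_dir()`.

```python
import os


def is_dir_following_links(entry):
    """Return True if `entry` is a directory or a symlink to one.

    Hypothetical helper: what callers would need if DirEntry.is_dir()
    did not follow symlinks. It matches os.path.isdir(entry.path), which
    is also what the shipped entry.is_dir() does by default.
    """
    if entry.is_dir(follow_symlinks=False):
        return True
    # Only symlinks need the extra os.stat() call on the link target.
    return entry.is_symlink() and os.path.isdir(entry.path)
```

A dangling symlink yields False both here and from the default `entry.is_dir()`, since the target cannot be stat-ed.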
Victor From Nikolaus at rath.org Thu Jul 10 02:25:54 2014 From: Nikolaus at rath.org (Nikolaus Rath) Date: Wed, 09 Jul 2014 17:25:54 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: (Ben Hoyt's message of "Wed, 9 Jul 2014 15:03:20 -0400") References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> Message-ID: <87pphe2h65.fsf@vostro.rath.org> Ben Hoyt writes: > So here's the ways in which option #2 is now more complicated than option #1: > > 1) it has an additional "info" argument, the values of which have to > be documented ('os', 'type', 'lstat', and what each one means) > 2) it has an additional "onerror" argument, the signature of which and > fairly complicated return value is non-obvious and has to be > documented > 3) it requires user modification of the DirEntry object, which needs > documentation, and is potentially hard to implement > 4) because the DirEntry object now allows modification, you need a > stat_result() helper function to help you build your own stat values > > I'm afraid points 3 and 4 here add way too much complexity. Points 3 and 4 are not required to go with option #2; option #2 merely makes it possible to implement points 3 and 4 at some point in the future if it turns out to be desirable. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana."
From ethan at stoneleaf.us Thu Jul 10 02:38:11 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 09 Jul 2014 17:38:11 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: <53BDE073.2030208@stoneleaf.us> On 07/09/2014 05:15 PM, Victor Stinner wrote: > 2014-07-09 17:29 GMT+02:00 Ben Hoyt : >>> Would this not "break" the tree size script being discussed in the >>> other thread, as it would follow links and include linked directories >>> in the "size" of the tree? > > The get_tree_size() function in the PEP would use: "if not > entry.is_symlink() and entry.is_dir():". > > Note: First I wrote "if entry.is_dir() and not entry.is_symlink():", > but this syntax is slower on Linux because is_dir() has to call > lstat(). Wouldn't it only have to call lstat if the entry was, in fact, a link? > There are only a few cases where you want to handle symlinks > differently: archive (ex: tar), compute the size of a directory (ex: > du does not follow symlinks by default, du -L follows them), remove a > directory. I agree with Victor here. If the entry is a link I would want to know if it was a link to a directory or a link to a file. If I care about not following sym links I can check is_symlink() (or whatever it's called). -- ~Ethan~ From victor.stinner at gmail.com Thu Jul 10 02:57:00 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 10 Jul 2014 02:57:00 +0200 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: Oh, since I'm proposing to add a new stat() method to DirEntry, we can optimize it. stat() can reuse lstat() result if the file is not a symlink. It simplifies is_dir(). 
New pseudo-code:
---
import os
import stat

# d_type values from Linux <dirent.h>; the os module does not export these
DT_UNKNOWN, DT_DIR, DT_LNK = 0, 4, 10

class DirEntry:
    def __init__(self, path, name, lstat=None, d_type=None):
        self.name = name
        self.full_name = os.path.join(path, name)

        # lstat is known on Windows
        self._lstat = lstat
        if lstat is not None and not stat.S_ISLNK(lstat.st_mode):
            # On Windows, stat() only calls os.stat() for symlinks
            self._stat = lstat
        else:
            self._stat = None

        # d_type is known on UNIX
        if d_type != DT_UNKNOWN:
            self._d_type = d_type
        else:
            # DT_UNKNOWN is not a very useful information :-p
            self._d_type = None

    def stat(self):
        if self._stat is None:
            self._stat = os.stat(self.full_name)
        return self._stat

    def lstat(self):
        if self._lstat is None:
            self._lstat = os.lstat(self.full_name)
            if self._stat is None and not stat.S_ISLNK(self._lstat.st_mode):
                # not a symlink: stat() and lstat() give the same result
                self._stat = self._lstat
        return self._lstat

    def is_dir(self):
        if self._d_type is not None:
            if self._d_type == DT_DIR:
                return True
            if self._d_type != DT_LNK:
                return False
        else:
            st = self.lstat()
            if stat.S_ISDIR(st.st_mode):
                return True
        # if lstat() was already called, stat() will only call os.stat() for symlinks
        st = self.stat()
        return stat.S_ISDIR(st.st_mode)
---
The extra caching rules are complex, that's why I suggest to not document them. In short: is_dir() only needs an extra syscall for symlinks, for other file types it does not need any syscall. Victor From timothy.c.delaney at gmail.com Thu Jul 10 02:58:57 2014 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Thu, 10 Jul 2014 10:58:57 +1000 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: On 10 July 2014 10:23, Victor Stinner wrote: > 2014-07-09 17:26 GMT+02:00 Paul Moore : > > On 9 July 2014 16:05, Victor Stinner wrote: > >> The PEP says that DirEntry should mimic pathlib.Path, so I think that > >> DirEntry.is_dir() should work as os.path.isdir(): if the entry is a > >> symbolic link, you should follow the symlink to get the status of the > >> linked file with os.stat(). > > > > (...)
> > As a Windows user with only a superficial understanding of how > > symlinks should behave, (...) > > FYI Windows also supports symbolic links since Windows Vista. The > feature is unknown because it is restricted to the administrator > account. Try the "mklink" command in a terminal (cmd.exe) ;-) > http://en.wikipedia.org/wiki/NTFS_symbolic_link > > ... To be honest, I never created a symlink on Windows. But since it > is supported, you need to know it to write correctly your Windows > code. > Personally, I create them all the time on Windows - mainly via the Link Shell Extension < http://schinagl.priv.at/nt/hardlinkshellext/linkshellextension.html>. It's the easiest way to ensure that my directory structures are as I want them whilst not worrying about where the files really are e.g. code on SSD, GB+-sized data files on rusty metal, symlinks makes it look like it's the same directory structure. Same thing can be done with junctions if you're only dealing with directories, but symlinks work with files as well. I work cross-platform, and have a mild preference for option #2 with similar semantics on all platforms. Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From 4kir4.1i at gmail.com Thu Jul 10 04:28:09 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Thu, 10 Jul 2014 06:28:09 +0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal References: Message-ID: <87mwcic5hi.fsf@gmail.com> Ben Hoyt writes: ... > ``scandir()`` yields a ``DirEntry`` object for each file and directory > in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'`` > pseudo-directories are skipped, and the entries are yielded in > system-dependent order. 
Each ``DirEntry`` object has the following > attributes and methods: > > * ``name``: the entry's filename, relative to the ``path`` argument > (corresponds to the return values of ``os.listdir``) > > * ``full_name``: the entry's full path name -- the equivalent of > ``os.path.join(path, entry.name)`` I suggest renaming .full_name -> .path .full_name might be misleading e.g., it implies that .full_name == abspath(.full_name) that might be false. The .path name has no such associations. The semantics of the .path attribute is defined by these assertions::

    for entry in os.scandir(topdir):
        #NOTE: assume os.path.normpath(topdir) is not called to create .path
        assert entry.path == os.path.join(topdir, entry.name)
        assert entry.name == os.path.basename(entry.path)
        assert entry.name == os.path.relpath(entry.path, start=topdir)
        assert os.path.dirname(entry.path) == topdir
        assert (entry.path != os.path.abspath(entry.path) or
                os.path.isabs(topdir)) # it is absolute only if topdir is
        assert (entry.path != os.path.realpath(entry.path) or
                topdir == os.path.realpath(topdir)) # symlinks are not resolved
        assert (entry.path != os.path.normcase(entry.path) or
                topdir == os.path.normcase(topdir)) # no case-folding,
                                                    # unlike PureWindowsPath

... > * ``is_dir()``: like ``os.path.isdir()``, but much cheaper -- it never > requires a system call on Windows, and usually doesn't on POSIX > systems I suggest documenting the implicit follow_symlinks parameter for .is_X methods. Note: lstat == partial(stat, follow_symlinks=False). In particular, .is_dir() should probably use follow_symlinks=True by default as suggested by Victor Stinner *if .is_dir() does it on Windows* MSDN says: GetFileAttributes() does not follow symlinks. os.path.isdir docs imply follow_symlinks=True: "both islink() and isdir() can be true for the same path." ...
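Akira's proposed ``.path`` semantics and the ``follow_symlinks`` question can be checked against the API that later shipped in Python 3.5, where ``os.scandir()`` entries have a ``.path`` attribute and ``is_dir(follow_symlinks=True)``. The following is a minimal sketch, assuming a POSIX system where ``os.symlink()`` works without special privileges; the helper name ``describe_entries`` and the ``sub``/``link`` layout are hypothetical, chosen only for illustration.

```python
import os
import tempfile

def describe_entries(topdir):
    """Map each name to (path, is_dir(), is_dir(follow_symlinks=False), is_symlink())."""
    result = {}
    with os.scandir(topdir) as it:  # context-manager support needs Python 3.6+
        for entry in it:
            # .path is topdir joined with name: not absolutized, not resolved
            assert entry.path == os.path.join(topdir, entry.name)
            result[entry.name] = (
                entry.path,
                entry.is_dir(),                       # follows symlinks by default
                entry.is_dir(follow_symlinks=False),  # lstat()-style answer
                entry.is_symlink(),
            )
    return result

with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "sub"))
    os.symlink(os.path.join(top, "sub"), os.path.join(top, "link"))
    info = describe_entries(top)
    for name in sorted(info):
        print(name, info[name][1:])
    # prints:
    # link (True, False, True)
    # sub (True, True, False)
```

For a symlink to a directory, the two ``is_dir()`` calls disagree, which is exactly the "both islink() and isdir() can be true for the same path" case from the ``os.path`` docs.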
> Like the other functions in the ``os`` module, ``scandir()`` accepts > either a bytes or str object for the ``path`` parameter, and returns > the ``DirEntry.name`` and ``DirEntry.full_name`` attributes with the > same type as ``path``. However, it is *strongly recommended* to use > the str type, as this ensures cross-platform support for Unicode > filenames. Document when {e.name for e in os.scandir(path)} != set(os.listdir(path)) +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ e.g., path can be an open file descriptor in os.listdir(path) since Python 3.3 but the PEP doesn't mention it explicitly. It has been discussed already e.g., https://mail.python.org/pipermail/python-dev/2014-July/135296.html PEP 471 should explicitly reject the support for specifying a file descriptor so that a code that uses os.scandir may assume that entry.path (.full_name) attribute is always present (no exceptions due to a failure to read /proc/self/fd/NNN or an error while calling fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see http://stackoverflow.com/q/1188757 ). Reject explicitly in PEP 471 the support for dir_fd parameter +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ aka the support for paths relative to directory descriptors. Note: it is a *different* (but related) issue. ... > Notes on exception handling > --------------------------- > > ``DirEntry.is_X()`` and ``DirEntry.lstat()`` are explicitly methods > rather than attributes or properties, to make it clear that they may > not be cheap operations, and they may do a system call. As a result, > these methods may raise ``OSError``. > > For example, ``DirEntry.lstat()`` will always make a system call on > POSIX-based systems, and the ``DirEntry.is_X()`` methods will make a > ``stat()`` system call on such systems if ``readdir()`` returns a > ``d_type`` with a value of ``DT_UNKNOWN``, which can occur under > certain conditions or on certain file systems. 
> > For this reason, when a user requires fine-grained error handling, > it's good to catch ``OSError`` around these method calls and then > handle as appropriate. > I suggest documenting that next(os.scandir()) may raise OSError e.g., on POSIX it may happen due to an OS error in opendir/readdir/closedir. Also, document whether os.scandir() itself may raise OSError (whether opendir or other OS functions may be called before the first yield). ...

os.scandir() should allow the explicit cleanup
++++++++++++++++++++++++++++++++++++++++++++++

::

    with closing(os.scandir()) as entries:
        for _ in entries:
            break

entries.close() is called, which frees the resources if necessary, to *avoid relying on garbage-collection for managing file descriptors* (check whether it is consistent with the .close() method from the generator protocol e.g., it might be already called on the exit from the loop whether an exception happens or not without requiring the with-statement (I don't know)). *It should be possible to limit the resource life-time on non-refcounting Python implementations.* os.scandir() object may support the context manager protocol explicitly::

    with os.scandir() as entries:
        for _ in entries:
            break

``.__exit__`` method may just call ``.close`` method. ... > Rejected ideas > ============== > > > Naming > ------ > > The only other real contender for this function's name was > ``iterdir()``. However, ``iterX()`` functions in Python (mostly found > in Python 2) tend to be simple iterator equivalents of their > non-iterator counterparts. For example, ``dict.iterkeys()`` is just an > iterator version of ``dict.keys()``, but the objects returned are > identical. In ``scandir()``'s case, however, the return values are > quite different objects (``DirEntry`` objects vs filename strings), so > this should probably be reflected by a difference in name -- hence > ``scandir()``. > > See some `relevant discussion on python-dev > `_.
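Returning to the error-handling and explicit-cleanup requests earlier in Akira's message: in the API that eventually shipped, ``os.scandir()`` can raise ``OSError`` both when it is called and on each ``next()``, and since Python 3.6 its iterator has ``close()`` and supports ``with``. A minimal sketch of a wrapper that routes both kinds of failure to a callback; the names ``iter_entries`` and ``on_error`` are hypothetical, not part of the ``os`` module.

```python
import os

def iter_entries(path, on_error=None):
    """Yield DirEntry objects from path, routing OSError to on_error.

    os.scandir() itself can raise (e.g. opendir() failing), and so can each
    next() call (e.g. readdir() failing part-way). Either way iteration stops.
    """
    try:
        it = os.scandir(path)
    except OSError as exc:
        if on_error is None:
            raise
        on_error(exc)
        return
    # Closes the directory handle when the generator finishes or is closed,
    # instead of waiting for garbage collection (Python 3.6+).
    with it:
        while True:
            try:
                entry = next(it)
            except StopIteration:
                return
            except OSError as exc:
                if on_error is None:
                    raise
                on_error(exc)
                return
            yield entry
```

For example, ``errors = []; list(iter_entries("/no/such/dir", errors.append))`` returns an empty list and leaves a single ``FileNotFoundError`` in ``errors``. Catching around ``next()`` rather than around the whole ``for`` loop keeps exceptions raised by the caller's own loop body from being mistaken for directory-reading errors.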
> - os.scandir() name is inconsistent with the pathlib module. pathlib.Path has `.iterdir() method `_ that generates Path instances i.e., the argument that iterdir() should return strings is not valid - os.scandir() name conflicts with POSIX. POSIX already has `scandir() function `_ Most functions in the os module are thin-wrappers of their corresponding POSIX analogs In principle, POSIX scandir(path, &entries, sel, compar) is emulated using:: entries = sorted(filter(sel, os.scandir(path)), key=cmp_to_key(compar)) so that the above code snippet could be provided in the docs. We may say that os.scandir is a pythonic analog of the POSIX function and therefore there is no conflict even if os.scandir doesn't use POSIX scandir function in its implementation. If we can't say it then a *different name/module should be used to allow adding POSIX-compatible os.scandir() in the future*. -- Akira From ncoghlan at gmail.com Thu Jul 10 06:02:01 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 9 Jul 2014 23:02:01 -0500 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BDBE42.7050609@stoneleaf.us> References: <53BC4060.5090805@stoneleaf.us> <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDA99C.3020101@stoneleaf.us> <53BDBE42.7050609@stoneleaf.us> Message-ID: On 9 Jul 2014 17:14, "Ethan Furman" wrote: > > On 07/09/2014 02:42 PM, Ben Hoyt wrote: >>> >>> >>> Okay, so using that [no platform specific] logic we should head over to the os module and remove: >>> >>> >>> ctermid, getenv, getegid... >>> >>> Okay, I'm tired of typing, but that list is not even half-way through the os >>> page, and those are all methods or attributes that are not available on >>> either Windows or Unix or some flavors of Unix. >> >> >> True, is this really the precedent we want to *aim for*. 
listdir() is >> cross-platform, > > > and listdir has serious performance issues, which is why you developed scandir. > >>> Oh, and all those [snipped] upper-case attributes? Yup, documented. And when we >>> >>> don't document it ourselves we often refer readers to their system >>> documentation because Python does not, in fact, return exactly the same >>> results on all platforms -- particularly when calling into the OS. >> >> >> But again, why a worse, less cross-platform API when a simple, >> cross-platform one is a method call away? > > > For the same reason we don't use code that makes threaded behavior better, but kills the single thread application. > > If the programmer would rather have consistency on all platforms rather than performance on the one being used, `info='lstat'` is the option to use. > > I like the 'onerror' API better primarily because it gives a single point to deal with the errors. This has at least a couple advantages: > > - less duplication of code: in the tree_size example, the error > handling is duplicated twice > > - readablity: with the error handling in a separate routine, one > does not have to jump around the try/except blocks looking for > what happens if there are no errors The "onerror" approach can also deal with readdir failing, which the PEP currently glosses over. I'm somewhat inclined towards the current approach in the PEP, but I'd like to see an explanation of two aspects: 1. How a scandir variant with an 'onerror' option could be implemented given the version in the PEP 2. How the existing scandir module handles the 'onerror' parameter to its directory walking function Regards, Nick. > > -- > ~Ethan~ > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.f.moore at gmail.com Thu Jul 10 09:04:53 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 10 Jul 2014 08:04:53 +0100 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: On 10 July 2014 01:23, Victor Stinner wrote: >> As a Windows user with only a superficial understanding of how >> symlinks should behave, (...) > > FYI Windows also supports symbolic links since Windows Vista. The > feature is unknown because it is restricted to the administrator > account. Try the "mklink" command in a terminal (cmd.exe) ;-) > http://en.wikipedia.org/wiki/NTFS_symbolic_link > > ... To be honest, I never created a symlink on Windows. But since it > is supported, you need to know it to write correctly your Windows > code. I know how symlinks *do* behave, and I know how Windows supports them. What I meant was that, because Windows typically makes little use of symlinks, I have little or no intuition of what feels natural to people using an OS where symlinks are common. As someone (Tim?) pointed out later in the thread, FindFirstFile/FindNextFile doesn't follow symlinks by default (and nor do the dirent entries on Unix). So whether or not it's "natural", the "free" functionality provided by the OS is that of lstat, not that of stat. Presumably because it's possible to build symlink-following code on top of non-following code, but not the other way around. Paul From timothy.c.delaney at gmail.com Thu Jul 10 09:35:19 2014 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Thu, 10 Jul 2014 17:35:19 +1000 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: On 10 July 2014 17:04, Paul Moore wrote: > On 10 July 2014 01:23, Victor Stinner wrote: > >> As a Windows user with only a superficial understanding of how > >> symlinks should behave, (...) > > > > FYI Windows also supports symbolic links since Windows Vista. 
The > > feature is unknown because it is restricted to the administrator > > account. Try the "mklink" command in a terminal (cmd.exe) ;-) > > http://en.wikipedia.org/wiki/NTFS_symbolic_link > > > > ... To be honest, I never created a symlink on Windows. But since it > > is supported, you need to know it to write correctly your Windows > > code. > > I know how symlinks *do* behave, and I know how Windows supports them. > What I meant was that, because Windows typically makes little use of > symlinks, I have little or no intuition of what feels natural to > people using an OS where symlinks are common. > > As someone (Tim?) pointed out later in the thread, > FindFirstFile/FindNextFile doesn't follow symlinks by default (and nor > do the dirent entries on Unix). It wasn't me (I didn't even see it - lost in the noise). > So whether or not it's "natural", the > "free" functionality provided by the OS is that of lstat, not that of > stat. Presumably because it's possible to build symlink-following code > on top of non-following code, but not the other way around. > For most uses the most natural thing is to follow symlinks (e.g. opening a symlink in a text editor should open the target). However, I think not following symlinks by default is the better approach for exactly the reason Paul has noted above. Tim Delaney -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Thu Jul 10 09:41:10 2014 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 10 Jul 2014 09:41:10 +0200 Subject: [Python-Dev] buildbot.python.org down again? In-Reply-To: References: <62321D60-1197-47A5-B455-6E5200DD52F7@stufft.io> Message-ID: <53BE4396.8010409@v.loewis.de> Am 08.07.14 16:48, schrieb Guido van Rossum: > May the true owner of buildbot.python.org > stand up! Well, I think that's me (at least by my definition of "true owner").
I requested that the machine be set up, and I deployed the software that is running the service (it was also me who originally introduced buildbot to the Python project). On the other hand, I'm not at all "in charge" of that infrastructure piece. I haven't logged into the machine in many months, and it's Antoine who currently maintains its configuration. So I don't want to be pinged when the machine is down. > (But I do think there may well not be anyone who feels they own it. And > that's a problem for its long term viability.) I don't think that's actually the case for "ownership". But then, I also think that ownership is not a very important concept for pydotorg. Most owners will likely agree that they lose their right to have a say in it when they stop maintaining the piece that they own. > Generally speaking, as an organization we should set up a process for > managing ownership of *all* infrastructure in a uniform way. I don't > mean to say that we need to manage all infrastructure uniformly, just > that we need to have a process for identifying and contacting the > owner(s) for each piece of infrastructure, as well as collecting other > information that people besides the owners might need to know. You can > use a wiki page for that list for all I care, but have a process for > what belongs there, how/when to update it, and even an owner for the > wiki page! Unfortunately, that plan keeps failing. Everybody agrees that such a list would be useful, so everybody makes their own list. I was maintaining such a list in the Python wiki for some time, until a board member decided that a publicly visible inventory is not appropriate, and it must be a password-protected wiki - where I now keep forgetting where the wiki is, in the first place, let alone remembering how to log in.
Regards, Martin From victor.stinner at gmail.com Thu Jul 10 10:37:19 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 10 Jul 2014 10:37:19 +0200 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: 2014-07-10 9:04 GMT+02:00 Paul Moore : > As someone (Tim?) pointed out later in the thread, > FindFirstFile/FindNextFile doesn't follow symlinks by default (and nor > do the dirent entries on Unix). So whether or not it's "natural", the > "free" functionality provided by the OS is that of lstat, not that of > stat. Presumably because it's possible to build symlink-following code > on top of non-following code, but not the other way around. DirEntry methods will remain free (no syscall) for directories and regular files. One extra syscall will be needed only for symlinks, which are more rare than other file types (for example, you wrote " Windows typically makes little use of symlinks"). See my pseudo-code: https://mail.python.org/pipermail/python-dev/2014-July/135439.html On Windows, _lstat and _stat attributes will be filled directly in the constructor on Windows for regular files and directories. Victor From ncoghlan at gmail.com Thu Jul 10 15:58:57 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 10 Jul 2014 08:58:57 -0500 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: On 10 Jul 2014 03:39, "Victor Stinner" wrote: > > 2014-07-10 9:04 GMT+02:00 Paul Moore : > > As someone (Tim?) pointed out later in the thread, > > FindFirstFile/FindNextFile doesn't follow symlinks by default (and nor > > do the dirent entries on Unix). So whether or not it's "natural", the > > "free" functionality provided by the OS is that of lstat, not that of > > stat. Presumably because it's possible to build symlink-following code > > on top of non-following code, but not the other way around. 
> > DirEntry methods will remain free (no syscall) for directories and > regular files. One extra syscall will be needed only for symlinks, > which are more rare than other file types (for example, you wrote " > Windows typically makes little use of symlinks"). The info we want for scandir is that of the *link itself*. That makes it easy to implement things like the "followlinks" flag of os.walk. The *far end* of the link isn't relevant at this level. The docs just need to be clear that DirEntry objects always match lstat(), never stat(). Cheers, Nick. > > See my pseudo-code: > https://mail.python.org/pipermail/python-dev/2014-July/135439.html > > On Windows, _lstat and _stat attributes will be filled directly in the > constructor on Windows for regular files and directories. > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From benhoyt at gmail.com Thu Jul 10 16:19:28 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Thu, 10 Jul 2014 10:19:28 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: >> DirEntry methods will remain free (no syscall) for directories and >> regular files. One extra syscall will be needed only for symlinks, >> which are more rare than other file types (for example, you wrote " >> Windows typically makes little use of symlinks"). > > The info we want for scandir is that of the *link itself*. That makes it > easy to implement things like the "followlinks" flag of os.walk. The *far > end* of the link isn't relevant at this level. > > The docs just need to be clear that DirEntry objects always match lstat(), > never stat(). Yeah, I agree with this. 
It makes the function (and documentation and implementation) quite a lot simpler to understand. scandir() is a lowish-level function which deals with the directory entries themselves, and mirrors both Windows FindNextFile and POSIX readdir() in that. If the user wants follow-links behaviour, they can easily call os.stat() themselves. If this is clearly documented that seems much simpler to me (and it also seems implicit to me in the fact that you're calling is_dir() on the *entry*). Otherwise we might as well go down the route of -- the objects returned are just like pathlib.Path(), but with stat() and lstat() cached on first use. -Ben From ethan at stoneleaf.us Thu Jul 10 19:53:45 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 10 Jul 2014 10:53:45 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: Message-ID: <53BED329.5020005@stoneleaf.us> On 07/10/2014 06:58 AM, Nick Coghlan wrote: > > The info we want for scandir is that of the *link itself*. That makes it > easy to implement things like the "followlinks" flag of os.walk. The > *far end* of the link isn't relevant at this level. This also mirrors listdir, correct? scandir is simply* returning something smarter than a string. > The docs just need to be clear that DirEntry objects always match lstat(), never stat(). Agreed. -- ~Ethan~ * As well as being a less resource-intensive generator. :) From breamoreboy at yahoo.co.uk Thu Jul 10 20:59:11 2014 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Thu, 10 Jul 2014 19:59:11 +0100 Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues Message-ID: I'm just curious as to why there are 54 open issues after both of these PEPs have been accepted and 384 is listed as finished. Did we hit some unforeseen technical problem which stalled development? For these and any other open issues if you need some Windows testing doing please feel free to put me on the nosy list and ask for a test run. 
-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence --- This email is free from viruses and malware because avast! Antivirus protection is active. http://www.avast.com From brett at python.org Thu Jul 10 21:59:37 2014 From: brett at python.org (Brett Cannon) Date: Thu, 10 Jul 2014 19:59:37 +0000 Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues References: Message-ID: [for those that don't know, 3121 is extension module init/finalization and 384 is the stable ABI] On Thu Jul 10 2014 at 3:47:03 PM, Mark Lawrence wrote: > I'm just curious as to why there are 54 open issues after both of these > PEPs have been accepted and 384 is listed as finished. Did we hit some > unforeseen technical problem which stalled development? > No, the PEPs were fine and were accepted properly. A huge portion of the open issues are from Robin Schreiber who as part of GSoC 2012 -- https://www.google-melange.com/gsoc/project/details/google/gsoc2012/robin_hood/5668600916475904 -- went through and updated the stdlib to follow the new practices introduced in the two PEPs. Not sure if there was some policy decision made that updating the code wasn't worth it or people simply didn't get around to applying the patches. -Brett > > For these and any other open issues if you need some Windows testing > doing please feel free to put me on the nosy list and ask for a test run. > > -- > My fellow Pythonistas, ask not what our language can do for you, ask > what you can do for our language. > > Mark Lawrence > > --- > This email is free from viruses and malware because avast! Antivirus > protection is active.
> http://www.avast.com > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Jul 10 22:08:55 2014 From: guido at python.org (Guido van Rossum) Date: Thu, 10 Jul 2014 13:08:55 -0700 Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues In-Reply-To: References: Message-ID: I don't know the details, but I suspect that was the result of my general guideline "don't start projects cleaning up lots of stdlib code just to satisfy some new style rule or just to use a new API" -- which came from hard-won experience where such a cleanup project introduced some new bugs that weren't found by review nor by tests. Though that was admittedly a long time. Still, such a project can really sap reviewer resources for relatively little benefit. On Thu, Jul 10, 2014 at 12:59 PM, Brett Cannon wrote: > [for those that don't know, 3121 is extension module inti/finalization and > 384 is the stable ABI] > > > On Thu Jul 10 2014 at 3:47:03 PM, Mark Lawrence > wrote: > >> I'm just curious as to why there are 54 open issues after both of these >> PEPs have been accepted and 384 is listed as finished. Did we hit some >> unforeseen technical problem which stalled development? >> > > No, the PEPs were fine and were accepted properly. A huge portion of the > open issues are from Robin Schreiber who as part of GSoC 2012 -- > https://www.google-melange.com/gsoc/project/details/google/gsoc2012/robin_hood/5668600916475904 > -- went through and updated the stdlib to follow the new practices > introduced in the two PEPs. Not sure if there was some policy decision made > that updating the code wasn't worth it or people simply didn't get around > to applying the patches. 
> > -Brett > > >> >> For these and any other open issues if you need some Windows testing >> doing please feel free to put me on the nosy list and ask for a test run. >> >> -- >> My fellow Pythonistas, ask not what our language can do for you, ask >> what you can do for our language. >> >> Mark Lawrence >> >> --- >> This email is free from viruses and malware because avast! Antivirus >> protection is active. >> http://www.avast.com >> >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> https://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: https://mail.python.org/mailman/options/python-dev/ >> brett%40python.org >> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Fri Jul 11 01:57:39 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 10 Jul 2014 19:57:39 -0400 Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues In-Reply-To: References: Message-ID: On Thu, Jul 10, 2014 at 2:59 PM, Mark Lawrence wrote: > I'm just curious as to why there are 54 open issues after both of these > PEPs have been accepted and 384 is listed as finished. Did we hit some > unforeseen technical problem which stalled development? I tried to bring some sanity to that effort by opening a "meta issue": http://bugs.python.org/issue15787 My enthusiasm, however, vanished after I reviewed the refactoring for the datetime module: http://bugs.python.org/issue15390 My main objections are to following PEP 384 (Stable ABI) within stdlib modules. 
I see little benefit for the stdlib (which is shipped fresh with every new version of Python) from following those guidelines. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Jul 11 02:31:09 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 10 Jul 2014 17:31:09 -0700 Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues In-Reply-To: References: Message-ID: <53BF304D.3030901@stoneleaf.us> On 07/10/2014 04:57 PM, Alexander Belopolsky wrote: > On Thu, Jul 10, 2014 at 2:59 PM, Mark Lawrence wrote: >> >> I'm just curious as to why there are 54 open issues after both of >> these PEPs have been accepted and 384 is listed as finished. Did >> we hit some unforeseen technical problem which stalled development? > > I tried to bring some sanity to that effort by opening a "meta issue": > > http://bugs.python.org/issue15787 > > My enthusiasm, however, vanished after I reviewed the refactoring for the datetime module: > > http://bugs.python.org/issue15390 > > My main objections are to following PEP 384 (Stable ABI) within stdlib > modules. I see little benefit for the stdlib (which is shipped fresh with every new version of Python) from following > those guidelines. If we aren't going to implement the changes (and I agree there's little value for the stdlib to do so), let's mark the issues as "won't fix" and close them. And thanks, Mark, for bringing it up. 
-- ~Ethan~ From ethan at stoneleaf.us Fri Jul 11 05:26:05 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 10 Jul 2014 20:26:05 -0700 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDA99C.3020101@stoneleaf.us> <53BDBE42.7050609@stoneleaf.us> Message-ID: <53BF594D.9060007@stoneleaf.us> On 07/09/2014 09:02 PM, Nick Coghlan wrote: > On 9 Jul 2014 17:14, "Ethan Furman" wrote: >> >> I like the 'onerror' API better primarily because it gives a single >> point to deal with the errors. [...] > > The "onerror" approach can also deal with readdir failing, which the > PEP currently glosses over. Do we want this, though? I can see an error handler for individual entries, but if one of the *dir commands fails that would seem to be fairly catastrophic. > I'm somewhat inclined towards the current approach in the PEP, but I'd like to see an explanation of two aspects: > > 1. How a scandir variant with an 'onerror' option could be implemented given the version in the PEP Here's a stab at it:

def scandir_error(path, info=None, onerror=None):
    for entry in scandir(path):
        if info == 'type':
            try:
                entry.is_dir()
            except OSError as exc:
                if onerror is None:
                    raise
                if not onerror(exc, entry):
                    continue
        elif info == 'lstat':
            try:
                entry.lstat()
            except OSError as exc:
                if onerror is None:
                    raise
                if not onerror(exc, entry):
                    continue
        yield entry

Here it is again with an attempt to deal with opendir/readdir/closedir exceptions:

import itertools

def scandir_error(path, info=None, onerror=None):
    entries = scandir(path)
    try:
        entry = next(entries)
    except StopIteration:
        # pass it through: an empty directory simply ends the generator
        return
    except Exception as exc:
        if onerror is None:
            raise
        if not onerror(exc, 'what else here?'):
            # what do we do on False?
            # what do we do on True?
            pass
    else:
        # reuse the entry already fetched instead of rescanning the directory
        for entry in itertools.chain([entry], entries):
            if info == 'type':
                try:
                    entry.is_dir()
                except OSError as exc:
                    if onerror is None:
                        raise
                    if not onerror(exc, entry):
                        continue
            elif info == 'lstat':
                try:
                    entry.lstat()
                except OSError as exc:
                    if onerror is None:
                        raise
                    if not onerror(exc, entry):
                        continue
            yield entry

> 2. How the existing scandir module handles the 'onerror' parameter to its directory walking function Here's the first third of it from the repo:

def walk(top, topdown=True, onerror=None, followlinks=False):
    """Like os.walk(), but faster, as it uses scandir() internally."""
    # Determine which are files and which are directories
    dirs = []
    nondirs = []
    try:
        for entry in scandir(top):
            if entry.is_dir():
                dirs.append(entry)
            else:
                nondirs.append(entry)
    except OSError as error:
        if onerror is not None:
            onerror(error)
        return
    ...

-- ~Ethan~ From benhoyt at gmail.com Fri Jul 11 13:12:59 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Fri, 11 Jul 2014 07:12:59 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: <53BF594D.9060007@stoneleaf.us> References: <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDA99C.3020101@stoneleaf.us> <53BDBE42.7050609@stoneleaf.us> <53BF594D.9060007@stoneleaf.us> Message-ID: [replying to python-dev this time] >> The "onerror" approach can also deal with readdir failing, which the >> PEP currently glosses over. > > > Do we want this, though? I can see an error handler for individual entries, > but if one of the *dir commands fails that would seem to be fairly > catastrophic. Very much agreed that this isn't necessary for just readdir/FindNext errors. We've never had this level of detail before -- if listdir() fails half way through (very unlikely) it just bombs with OSError and you get no entries at all.
If you really really want this (again very unlikely), you can always call next() directly and catch OSError around that call.

-Ben

From stefan at bytereef.org Fri Jul 11 13:46:27 2014
From: stefan at bytereef.org (Stefan Krah)
Date: Fri, 11 Jul 2014 13:46:27 +0200
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues
Message-ID: <20140711114627.GA27927@sleipnir.bytereef.org>

Brett Cannon wrote:
> No, the PEPs were fine and were accepted properly. A huge portion of the open
> issues are from Robin Schreiber who as part of GSoC 2012 --
> https://www.google-melange.com/gsoc/project/details/google/gsoc2012/robin_hood/5668600916475904
> -- went through and updated the stdlib to follow the new
> practices introduced in the two PEPs. Not sure if there was some policy
> decision made that updating the code wasn't worth it or people simply didn't
> get around to applying the patches.

Due to the frequent state lookups there is a performance problem, though, which is quite significant for _decimal. Otherwise I think I would have implemented the changes already.

http://bugs.python.org/issue15722

I think for speed-sensitive applications it may be an idea to create a new kind of C function (a METH_STATE flag) that gets the state passed in by ceval.

Other than that, looking up the state inside the module but caching it (as is done for the _decimal context) also has reasonable performance.

Also, I hit the same issues that Eli mentioned here a while ago:

https://mail.python.org/pipermail/python-dev/2013-August/127862.html


Stefan Krah

From status at bugs.python.org Fri Jul 11 18:07:43 2014
From: status at bugs.python.org (Python tracker)
Date: Fri, 11 Jul 2014 18:07:43 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20140711160743.59D7856A3B@psf.upfronthosting.co.za>

ACTIVITY SUMMARY (2014-07-04 - 2014-07-11)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    4588 (-15)
  closed 29141 (+55)
  total  33729 (+40)

Open issues with patches: 2151


Issues opened (24)
==================

#21918: Convert test_tools to directory
http://bugs.python.org/issue21918 opened by serhiy.storchaka

#21919: Changing cls.__bases__ must ensure proper metaclass inheritanc
http://bugs.python.org/issue21919 opened by abusalimov

#21922: PyLong: use GMP
http://bugs.python.org/issue21922 opened by h.venev

#21925: ResouceWarning sometimes doesn't display
http://bugs.python.org/issue21925 opened by msmhrt

#21927: BOM appears in stdin when using Powershell
http://bugs.python.org/issue21927 opened by jason.coombs

#21928: Incorrect reference to partial() in functools.wraps documentat
http://bugs.python.org/issue21928 opened by Dustin.Oprea

#21929: Rounding properly
http://bugs.python.org/issue21929 opened by jeroen1225

#21931: Nonsense errors reported by msilib.FCICreate for bad argument
http://bugs.python.org/issue21931 opened by Jeffrey.Armstrong

#21933: Allow the user to change font sizes with the text pane of turt
http://bugs.python.org/issue21933 opened by Lita.Cho

#21934: OpenBSD has no /dev/full device
http://bugs.python.org/issue21934 opened by Daniel.Dickman

#21935: Implement AUTH command in smtpd.
http://bugs.python.org/issue21935 opened by zvyn

#21937: IDLE interactive window doesn't display unsaved-indicator
http://bugs.python.org/issue21937 opened by rhettinger

#21939: IDLE - Test Percolator
http://bugs.python.org/issue21939 opened by sahutd

#21941: Clean up turtle TPen class
http://bugs.python.org/issue21941 opened by ingrid

#21944: Allow copying of CodecInfo objects
http://bugs.python.org/issue21944 opened by lehmannro

#21946: 'python -u' yields trailing carriage return '\r' (Python2 for
http://bugs.python.org/issue21946 opened by msp

#21947: `Dis` module doesn't know how to disassemble generators
http://bugs.python.org/issue21947 opened by hakril

#21949: Document the Py_SIZE() macro.
http://bugs.python.org/issue21949 opened by gregory.p.smith

#21951: tcl test change crashes AIX
http://bugs.python.org/issue21951 opened by David.Edelsohn

#21952: fnmatch.py can appear in tracemalloc diffs
http://bugs.python.org/issue21952 opened by pitrou

#21953: pythonrun.c does not check std streams the same as fileio.c
http://bugs.python.org/issue21953 opened by steve.dower

#21955: ceval.c: implement fast path for integers with a single digit
http://bugs.python.org/issue21955 opened by haypo

#21956: Doc files deleted from repo are not deleted from docs.python.o
http://bugs.python.org/issue21956 opened by brandon-rhodes

#21957: ASCII Formfeed (FF) & ASCII Vertical Tab (VT) Have Hexadecimal
http://bugs.python.org/issue21957 opened by Zero



Most recent 15 issues with no replies (15)
==========================================

#21957: ASCII Formfeed (FF) & ASCII Vertical Tab (VT) Have Hexadecimal
http://bugs.python.org/issue21957

#21955: ceval.c: implement fast path for integers with a single digit
http://bugs.python.org/issue21955

#21951: tcl test change crashes AIX
http://bugs.python.org/issue21951

#21949: Document the Py_SIZE() macro.
http://bugs.python.org/issue21949

#21944: Allow copying of CodecInfo objects
http://bugs.python.org/issue21944

#21941: Clean up turtle TPen class
http://bugs.python.org/issue21941

#21937: IDLE interactive window doesn't display unsaved-indicator
http://bugs.python.org/issue21937

#21935: Implement AUTH command in smtpd.
http://bugs.python.org/issue21935

#21933: Allow the user to change font sizes with the text pane of turt
http://bugs.python.org/issue21933

#21931: Nonsense errors reported by msilib.FCICreate for bad argument
http://bugs.python.org/issue21931

#21928: Incorrect reference to partial() in functools.wraps documentat
http://bugs.python.org/issue21928

#21919: Changing cls.__bases__ must ensure proper metaclass inheritanc
http://bugs.python.org/issue21919

#21916: Create unit tests for turtle textonly
http://bugs.python.org/issue21916

#21909: PyLong_FromString drops const
http://bugs.python.org/issue21909

#21899: Futures are not marked as completed
http://bugs.python.org/issue21899



Most recent 15 issues waiting for review (15)
=============================================

#21953: pythonrun.c does not check std streams the same as fileio.c
http://bugs.python.org/issue21953

#21947: `Dis` module doesn't know how to disassemble generators
http://bugs.python.org/issue21947

#21944: Allow copying of CodecInfo objects
http://bugs.python.org/issue21944

#21941: Clean up turtle TPen class
http://bugs.python.org/issue21941

#21939: IDLE - Test Percolator
http://bugs.python.org/issue21939

#21935: Implement AUTH command in smtpd.
http://bugs.python.org/issue21935

#21934: OpenBSD has no /dev/full device
http://bugs.python.org/issue21934

#21925: ResouceWarning sometimes doesn't display
http://bugs.python.org/issue21925

#21922: PyLong: use GMP
http://bugs.python.org/issue21922

#21918: Convert test_tools to directory
http://bugs.python.org/issue21918

#21916: Create unit tests for turtle textonly
http://bugs.python.org/issue21916

#21914: Create unit tests for Turtle guionly
http://bugs.python.org/issue21914

#21907: Update Windows build batch scripts
http://bugs.python.org/issue21907

#21906: Tools\Scripts\md5sum.py doesn't work in Python 3.x
http://bugs.python.org/issue21906

#21905: RuntimeError in pickle.whichmodule when sys.modules if mutate
http://bugs.python.org/issue21905



Top 10 most discussed issues (10)
=================================

#21597: Allow turtledemo code pane to get wider.
http://bugs.python.org/issue21597 26 msgs

#21922: PyLong: use GMP
http://bugs.python.org/issue21922 15 msgs

#21907: Update Windows build batch scripts
http://bugs.python.org/issue21907 11 msgs

#10289: Document magic methods called by built-in functions
http://bugs.python.org/issue10289 6 msgs

#21323: CGI HTTP server not running scripts from subdirectories
http://bugs.python.org/issue21323 6 msgs

#21765: Idle: make 3.x HyperParser work with non-ascii identifiers.
http://bugs.python.org/issue21765 5 msgs

#21880: IDLE: Ability to run 3rd party code checkers
http://bugs.python.org/issue21880 5 msgs

#21925: ResouceWarning sometimes doesn't display
http://bugs.python.org/issue21925 5 msgs

#21927: BOM appears in stdin when using Powershell
http://bugs.python.org/issue21927 5 msgs

#8231: Unable to run IDLE without write-access to home directory
http://bugs.python.org/issue8231 4 msgs



Issues closed (49)
==================

#5712: tkinter - askopenfilenames returns string instead of tuple in
http://bugs.python.org/issue5712 closed by serhiy.storchaka

#9554: test_argparse.py: use new unittest features
http://bugs.python.org/issue9554 closed by berker.peksag

#9745: MSVC .pdb files not created by python 2.7 distutils
http://bugs.python.org/issue9745 closed by berker.peksag

#9822: windows batch files are dependent on cmd current directory
http://bugs.python.org/issue9822 closed by zach.ware

#9973: Sometimes buildbot fails to cleanup working copy
http://bugs.python.org/issue9973 closed by zach.ware

#10722: IDLE's subprocess didnit make connection ..... Python 2.7
http://bugs.python.org/issue10722 closed by terry.reedy

#11259: asynchat does not check if terminator is negative integer
http://bugs.python.org/issue11259 closed by haypo

#12523: 'str' object has no attribute 'more' [/usr/lib/python3.2/async
http://bugs.python.org/issue12523 closed by haypo

#14121: add a convenience C-API function for unpacking iterables
http://bugs.python.org/issue14121 closed by scoder

#15105: curses: wrong indentation
http://bugs.python.org/issue15105 closed by ned.deily

#17755: test_builtin assumes LANG=C
http://bugs.python.org/issue17755 closed by ned.deily

#18887: test_multiprocessing.test_connection failure on Python 2.7
http://bugs.python.org/issue18887 closed by neologix

#19279: UTF-7 decoder can produce inconsistent Unicode string
http://bugs.python.org/issue19279 closed by serhiy.storchaka

#19283: Need support to avoid Windows CRT compatibility issue.
http://bugs.python.org/issue19283 closed by loewis

#19593: Use specific asserts in importlib tests
http://bugs.python.org/issue19593 closed by serhiy.storchaka

#19650: test_multiprocessing_spawn.test_mymanager_context() crashed wi
http://bugs.python.org/issue19650 closed by haypo

#20639: pathlib.PurePath.with_suffix() does not allow removing the suf
http://bugs.python.org/issue20639 closed by pitrou

#21365: asyncio.Task reference misses the most important fact about it
http://bugs.python.org/issue21365 closed by haypo

#21437: document that asyncio.ProactorEventLoop doesn't support SSL
http://bugs.python.org/issue21437 closed by haypo

#21646: Add tests for turtle.ScrolledCanvas
http://bugs.python.org/issue21646 closed by ingrid

#21680: asyncio: document event loops
http://bugs.python.org/issue21680 closed by haypo

#21707: modulefinder uses wrong CodeType signature in .replace_paths_i
http://bugs.python.org/issue21707 closed by berker.peksag

#21714: Path.with_name can construct invalid paths
http://bugs.python.org/issue21714 closed by pitrou

#21732: SubprocessTestsMixin.test_subprocess_terminate() hangs on "AMD
http://bugs.python.org/issue21732 closed by haypo

#21743: Create tests for RawTurtleScreen
http://bugs.python.org/issue21743 closed by Lita.Cho

#21754: Add tests for turtle.TurtleScreenBase
http://bugs.python.org/issue21754 closed by ingrid

#21803: Remove macro indirections in complexobject
http://bugs.python.org/issue21803 closed by pitrou

#21806: Add tests for turtle.TPen class
http://bugs.python.org/issue21806 closed by ingrid

#21844: Fix HTMLParser in unicodeless build
http://bugs.python.org/issue21844 closed by ezio.melotti

#21881: python cannot parse tcl value
http://bugs.python.org/issue21881 closed by serhiy.storchaka

#21886: asyncio: Future.set_result() called on cancelled Future raises
http://bugs.python.org/issue21886 closed by python-dev

#21897: frame.f_locals causes segfault on Python >=3.4.1
http://bugs.python.org/issue21897 closed by pitrou
#21911: "IndexError: tuple index out of range" should include the requ
http://bugs.python.org/issue21911 closed by ezio.melotti

#21920: Fixed missing colon in the docs
http://bugs.python.org/issue21920 closed by berker.peksag

#21921: Example in asyncio event throws resource usage warning
http://bugs.python.org/issue21921 closed by python-dev

#21923: distutils.sysconfig.customize_compiler will try to read variab
http://bugs.python.org/issue21923 closed by ned.deily

#21924: Cannot import anything that imports tokenize from script calle
http://bugs.python.org/issue21924 closed by ned.deily

#21926: Bundle C++ compiler with Python on Windows
http://bugs.python.org/issue21926 closed by loewis

#21930: new assert raises syntax proposal
http://bugs.python.org/issue21930 closed by ezio.melotti

#21932: os.read() must use Py_ssize_t for the size parameter
http://bugs.python.org/issue21932 closed by haypo

#21936: test_future_exception_never_retrieved() of test_asyncio fails
http://bugs.python.org/issue21936 closed by haypo

#21938: Py_XDECREF statement in gen_iternext()
http://bugs.python.org/issue21938 closed by pitrou

#21940: IDLE - Test WidgetRedirector
http://bugs.python.org/issue21940 closed by terry.reedy

#21942: pydoc source not displayed in browser on Windows
http://bugs.python.org/issue21942 closed by zach.ware

#21943: To duplicate a list has biyective properties, not inyective on
http://bugs.python.org/issue21943 closed by mark.dickinson

#21945: Wrong grammar in documentation
http://bugs.python.org/issue21945 closed by ezio.melotti

#21948: Documentation Typo
http://bugs.python.org/issue21948 closed by berker.peksag

#21950: import sqlite3 not running
http://bugs.python.org/issue21950 closed by alexganwd

#21954: str(b'text') returns "b'text'" in interpreter
http://bugs.python.org/issue21954 closed by ned.deily

From andreas.r.maier at gmx.de Fri Jul 11 16:04:35 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Fri, 11 Jul 2014 16:04:35 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com>
Message-ID: <53BFEEF3.2060101@gmx.de>

On 09.07.2014 03:48, Raymond Hettinger wrote:
>
> On Jul 7, 2014, at 4:37 PM, Andreas Maier wrote:
>
>> I do not really buy into the arguments that try to show how identity and value are somehow the same. They are not, not even in Python.
>>
>> The argument I can absolutely buy into is that the implementation cannot be changed within a major release. So the real question is how we document it.
>
> Once every few years, someone discovers IEEE-754, learns that NaNs
> aren't supposed to be equal to themselves and becomes inspired
> to open an old debate about whether to wreck Python in an effort
> to make the world safe for NaNs. And somewhere along the way,
> people forget that practicality beats purity.
>
> Here are a few thoughts on the subject that may or may not add
> a little clarity ;-)
>
> * Python already has IEEE-754 compliant NaNs:
>
>      assert float('NaN') != float('NaN')
>
> * Python already has the ability to filter out NaNs:
>
>      [x for x in container if not math.isnan(x)]
>
> * In the numeric world, the most common use of NaNs is for
>   missing data (much like we usually use None). The property
>   of not being equal to itself is primarily useful in
>   low-level code optimized to run a calculation to completion
>   without running frequent checks for invalid results
>   (much like @n/a is used in MS Excel).
>
> * Python also lets containers establish their own invariants
>   to establish correctness, improve performance, and make it
>   possible to reason about our programs:
>
>      for x in c:
>          assert x in c
>
> * Containers like dicts and sets have always used the rule
>   that identity-implies-equality. That is central to their
>   implementation. In particular, the check of interned
In particular, the check of interned > string keys relies on identity to bypass a slow > character-by-character comparison to verify equality. > > * Traditionally, a relation R is considered an equality > relation if it is reflexive, symmetric, and transitive: > > R(x, x) -> True > R(x, y) -> R(y, x) > R(x, y) ^ R(y, z) -> R(x, z) > > * Knowingly or not, programs tend to assume that all of those > hold. Test suites in particular assume that if you put > something in a container that assertIn() will pass. > > * Here are some examples of cases where non-reflexive objects > would jeopardize the pragmatism of being able to reason > about the correctness of programs: > > s = SomeSet() > s.add(x) > assert x in s > > s.remove(x) # See collections.abc.Set.remove > assert not s > > s.clear() # See collections.abc.Set.clear > asset not s > > * What the above code does is up to the implementer of the > container. If you use the Set ABC, you can choose to > implement __contains__() and discard() to use straight > equality or identity-implies equality. Nothing prevents > you from making containers that are hard to reason about. > > * The builtin containers make the choice for identity-implies > equality so that it is easier to build fast, correct code. > For the most part, this has worked out great (dictionaries > in particular have had identify checks built-in from almost > twenty years). > > * Years ago, there was a debate about whether to add an __is__() > method to allow overriding the is-operator. The push for the > change was the "pure" notion that "all operators should be > customizable". However, the idea was rejected based on the > "practical" notions that it would wreck our ability to reason > about code, it slow down all code that used identity checks, > that library modules (ours and third-party) already made > deep assumptions about what "is" means, and that people would > shoot themselves in the foot with hard to find bugs. 
>
> Personally, I see no need to make the same mistake by removing
> the identity-implies-equality rule from the built-in containers.
> There's no need to upset the apple cart for nearly zero benefit.

Containers delegate the equal comparison on the container to their elements; they do not apply identity-based comparison to their elements. At least that is the externally visible behavior.

Only the default comparison behavior implemented on type object follows the identity-implies-equality rule.

As part of my doc patch, I will upload an extension to the test_compare.py test suite, which tests all built-in containers with values whose order differs from the identity order, and it shows that the value order and equality win over identity, if implemented.

>
> IMO, the proposed quest for purity is misguided.
> There are many practical reasons to let the builtin
> containers continue to work as they do now.

As I said, I can accept compatibility reasons. I can also accept the argument brought up by Benjamin about the desire for the identity-implies-equality rule as a default, with no corresponding rule for order comparison (and I added both to the doc patch).

Andy

From andreas.r.maier at gmx.de Fri Jul 11 16:10:47 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Fri, 11 Jul 2014 16:10:47 +0200
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BB69CB.6040407@stoneleaf.us>
References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <53BB3261.6080705@stoneleaf.us> <87bnt0ttfa.fsf@uwakimon.sk.tsukuba.ac.jp> <53BB69CB.6040407@stoneleaf.us>
Message-ID: <53BFF067.7060602@gmx.de>

On 08.07.2014 05:47, Ethan Furman wrote:
> On 07/07/2014 08:34 PM, Stephen J. Turnbull wrote:
>> Ethan Furman writes:
>>
>>> And what would be this 'sensible definition' [of value equality]?
>>
>> I think that's the wrong question.
>> I suppose Andreas's point is that
>> when the programmer doesn't provide a definition, there is no such
>> thing as a "sensible definition" to default to. I disagree, but given
>> that as the point of discussion, asking what the definition is, is moot.
>
> He eventually made that point, but until he did I thought he meant that
> there was such a sensible default definition, he just wasn't sharing
> what he thought it might be with us.

My main point is that a sensible definition is up to the class designer, so (with all that freedom at hand) I would prefer an exception as the default. But that cannot be changed at this point, and maybe never will be. And I don't intend to stir up that discussion again.

I dropped my other point about a better default comparison (i.e. one with a result, not an exception). It is not easy to define one unless one comes to types such as sequences or integral types, and they in fact have defined their own customizations for comparison.

Bottom line: I'm fine with just a doc patch, and a testcase improvement :-)

Andy

From andreas.r.maier at gmx.de Fri Jul 11 16:23:59 2014
From: andreas.r.maier at gmx.de (Andreas Maier)
Date: Fri, 11 Jul 2014 16:23:59 +0200
Subject: [Python-Dev] == on object tests identity in 3.x - uploaded doc patch
In-Reply-To: <53BFA64A.4080807@stoneleaf.us>
References: <53BA82F3.1070403@gmx.de> <53BAC2DC.9030600@stoneleaf.us> <53BAD12A.20209@gmx.de> <53BADC46.40400@stoneleaf.us> <53BB2EF9.80002@gmx.de> <1404776980.8315.139100953.6B8B5879@webmail.messagingengine.com> <53BB32B1.2090300@stoneleaf.us> <20140708015833.GD13014@ando> <53BB56B6.8030306@stoneleaf.us> <53BFA590.7000509@gmx.de> <53BFA64A.4080807@stoneleaf.us>
Message-ID: <53BFF37F.8000507@gmx.de>

On 11.07.2014 10:54, Ethan Furman wrote:
> On 07/11/2014 01:51 AM, Andreas Maier wrote:
>> I like the motivation provided by Benjamin and will work it into the
>> doc patch for issue #12067. The NaN special case
>> will also stay in.
>
> Cool -- you should nosy myself, D'Aprano, and Benjamin (at least) on
> that issue.

Done. I have also uploaded a patch (v8) to issue #12067 that hopefully reflects everything that was said (to the extent it was related to comparisons).

Andy

From ethan at stoneleaf.us Fri Jul 11 22:54:40 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 11 Jul 2014 13:54:40 -0700
Subject: [Python-Dev] == on object tests identity in 3.x
In-Reply-To: <53BFEEF3.2060101@gmx.de>
References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de>
Message-ID: <53C04F10.8070509@stoneleaf.us>

On 07/11/2014 07:04 AM, Andreas Maier wrote:
> On 09.07.2014 03:48, Raymond Hettinger wrote:
>>
>> Personally, I see no need to make the same mistake by removing
>> the identity-implies-equality rule from the built-in containers.
>> There's no need to upset the apple cart for nearly zero benefit.
>
> Containers delegate the equal comparison on the container to their elements; they do not apply identity-based comparison
> to their elements. At least that is the externally visible behavior.

If that were true, then [NaN] == [NaN] would be False, and it is not.

Here is the externally visible behavior:

Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
--> NaN = float('nan')
--> NaN == NaN
False
--> [NaN] == [NaN]
True

--
~Ethan~

From nad at acm.org Sat Jul 12 03:04:14 2014
From: nad at acm.org (Ned Deily)
Date: Fri, 11 Jul 2014 18:04:14 -0700
Subject: [Python-Dev] buildbot.python.org down again?
References: <62321D60-1197-47A5-B455-6E5200DD52F7@stufft.io>

In article <62321D60-1197-47A5-B455-6E5200DD52F7 at stufft.io>,
 Donald Stufft wrote:
> On Jul 8, 2014, at 12:58 AM, Nick Coghlan wrote:
> > On 7 Jul 2014 10:47, "Guido van Rossum" wrote:
> > > It would still be nice to know who "the appropriate persons" are.
> > > Too much of our infrastructure seems to be maintained by house elves or the
> > > ITA.
> > I volunteered to be the board's liaison to the infrastructure team, and
> > getting more visibility around what the infrastructure *is* and how it's
> > monitored and supported is going to be part of that. That will serve a
> > couple of key purposes:
> > - making the points of escalation clearer if anything breaks or needs
> > improvement (although "infrastructure at python.org" is a good default choice)
> > - making the current "todo" list of the infrastructure team more visible
> > (both to calibrate resolution time expectations and to provide potential
> > contributors an idea of what's involved)
> > Noah has already set up http://status.python.org/ to track service status,
> > and I can see about getting buildbot.python.org added to the list.
> We (the infrastructure team) were actually looking earlier at
> buildbot.python.org and we're not entirely sure who "owns"
> buildbot.python.org.
> Unfortunately a lot of the *.python.org services are in a similar state where
> there is no clear owner. Generally we've not wanted to just step in and take
> over for fear of stepping on someone's toes but it appears that perhaps
> buildbot.p.o has no owner?

In parallel to this discussion, I ran into Noah at a meeting the other day and we talked a bit about buildbot.python.org. As Donald noted, it sounds like he and the infrastructure team are willing to add it to the list of machines they monitor and reboot, as long as they wouldn't be expected to administer the buildbot master itself. I checked with Antoine and Martin and they are agreeable to that. So I think there is general agreement that the infrastructure team can take on uptime monitoring and rebooting of buildbot.python.org and that Antoine/Martin would be the primary/secondary contacts/owners for other administrative issues.
Martin would also be happy if the infrastructure team could handle installing routine security fixes as well. I'll leave it to the interested parties to discuss it further among themselves.

--
 Ned Deily,
 nad at acm.org

From eliben at gmail.com Sat Jul 12 15:15:31 2014
From: eliben at gmail.com (Eli Bendersky)
Date: Sat, 12 Jul 2014 06:15:31 -0700
Subject: [Python-Dev] Semi-official read-only Github mirror of the CPython Mercurial repository

Just a quick update on this. I've finally found time to set up a VPS of my own at DigitalOcean, and I'm moving the cronjob for updating the Github mirrors to it. This lets me ramp up the update frequency. For now I'll set it to every 4 hours, but in the future I may make it even more frequent. Hopefully this will not overrun my bandwidth allocation :)

The CPython mirror (https://github.com/python/cpython) has been pretty popular so far, with over 70 forks.

Eli


On Mon, Sep 30, 2013 at 6:09 AM, Eli Bendersky wrote:

> Hi all,
>
> https://github.com/python/cpython is now live as a semi-official, *read
> only* Github mirror of the CPython Mercurial repository. Let me know if you
> have any problems/concerns.
>
> I still haven't decided how often to update it (considering either just N
> times a day, or maybe use a Hg hook for batching). Suggestions are welcome.
>
> The methodology I used to create it is via hg-fast-export. I also tried to
> pack and gc the git repo as much as possible before the initial Github push
> - it went down from almost 2GB to ~200MB (so this is the size of a fresh
> clone right now).
>
> Eli
>
> P.S. thanks Jesse for the keys to https://github.com/python
From ncoghlan at gmail.com Sat Jul 12 17:07:03 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Jul 2014 10:07:03 -0500
Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal
References: <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDA99C.3020101@stoneleaf.us> <53BDBE42.7050609@stoneleaf.us> <53BF594D.9060007@stoneleaf.us>

On 11 Jul 2014 12:46, "Ben Hoyt" wrote:
>
> [replying to python-dev this time]
>
> >> The "onerror" approach can also deal with readdir failing, which the
> >> PEP currently glosses over.
> >
> >
> > Do we want this, though? I can see an error handler for individual entries,
> > but if one of the *dir commands fails that would seem to be fairly
> > catastrophic.
>
> Very much agreed that this isn't necessary for just readdir/FindNext
> errors. We've never had this level of detail before -- if listdir()
> fails halfway through (very unlikely) it just bombs with OSError and
> you get no entries at all.
>
> If you really really want this (again very unlikely), you can always
> call next() directly and catch OSError around that call.

Agreed - I think the PEP should point this out explicitly, and show that the approach it takes offers a lot of flexibility in error handling, from "just let it fail", to a single try/except around the whole loop, to try/except just around the operations that might call lstat(), to try/except around the individual iteration steps.

os.walk remains the higher-level API that most code should be using, and that has to retain the current listdir-based behaviour (any error = ignore all entries in that directory) for backwards compatibility reasons.

Cheers,
Nick.
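As a rough illustration of the finer-grained options Nick lists, here is a sketch against the os.scandir() that eventually shipped (Python 3.5+, with context-manager support from 3.6), where the draft's lstat() became DirEntry.stat(follow_symlinks=False). The helper names entries_reporting_errors() and sizes() are hypothetical, not part of the os module; the outermost option, a single try/except around the whole loop, needs no helper at all.

```python
import os
import tempfile


def entries_reporting_errors(path, onerror=None):
    """Yield DirEntry objects, guarding each step of the iteration.

    Errors from opening the directory or from advancing the iterator
    (readdir()/FindNextFile() failures) are passed to onerror, if given,
    and end the iteration; with onerror=None they are silently ignored,
    mirroring os.walk()'s default.
    """
    try:
        it = os.scandir(path)           # opening the directory may fail
    except OSError as exc:
        if onerror is not None:
            onerror(exc)
        return
    with it:
        while True:
            try:
                entry = next(it)        # each iteration step may fail
            except StopIteration:
                return
            except OSError as exc:
                if onerror is not None:
                    onerror(exc)
                return
            yield entry


def sizes(path):
    """Map entry names to sizes, skipping only entries whose stat() fails."""
    result = {}
    for entry in entries_reporting_errors(path):
        try:
            # lstat()-like behaviour: do not follow symlinks
            st = entry.stat(follow_symlinks=False)
        except OSError:
            continue                    # e.g. the entry vanished after readdir()
        result[entry.name] = st.st_size
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "a.txt"), "wb") as f:
            f.write(b"hello")
        print(sizes(tmp))               # {'a.txt': 5}

        errors = []
        missing = os.path.join(tmp, "missing")
        print(list(entries_reporting_errors(missing, errors.append)))  # []
        print(type(errors[0]).__name__)  # FileNotFoundError
```

Guarding next() separately from the per-entry stat() call means a single unreadable entry is skipped, while a failure of the directory scan itself ends the iteration and is reported once.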
From geertj at gmail.com Sat Jul 12 11:12:37 2014
From: geertj at gmail.com (Geert Jansen)
Date: Sat, 12 Jul 2014 11:12:37 +0200
Subject: [Python-Dev] Memory BIO for _ssl

On Mon, Jul 7, 2014 at 1:49 AM, Antoine Pitrou wrote:
> On 05/07/2014 14:04, Geert Jansen wrote:
>
>> Since I need this for my Gruvi async framework, I want to volunteer to
>> write a patch. It should be useful as well to Py3K's asyncio and other
>> async frameworks. It would be good to get some feedback before I start
>> on this.
>
> Thanks for volunteering! This would be a very welcome addition.

I have a first patch and have submitted it as issue #21965:

http://bugs.python.org/issue21965

I've incorporated your feedback.

Regards,
Geert

From ncoghlan at gmail.com Sat Jul 12 17:19:56 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Jul 2014 10:19:56 -0500
Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues

On 10 Jul 2014 19:59, "Alexander Belopolsky" wrote:
>
>
> On Thu, Jul 10, 2014 at 2:59 PM, Mark Lawrence wrote:
>>
>> I'm just curious as to why there are 54 open issues after both of these PEPs have been accepted and 384 is listed as finished. Did we hit some unforeseen technical problem which stalled development?
>
>
> I tried to bring some sanity to that effort by opening a "meta issue":
>
> http://bugs.python.org/issue15787
>
> My enthusiasm, however, vanished after I reviewed the refactoring for the datetime module:
>
> http://bugs.python.org/issue15390
>
> My main objections are to following PEP 384 (Stable ABI) within stdlib modules.
I see little benefit for the stdlib (which is shipped fresh with every new version of Python) from following those guidelines. The main downside of "do as we say, not as we do" in this case is that we miss out on the feedback loop of what the stable ABI is like to *use*. For example, the docs problem, where it's hard to tell whether an API is part of the stable ABI or not, or the performance problem Stefan mentions. Using the stable ABI for standard library extensions also serves to decouple them further from the internal details of the CPython runtime, making it more likely they will be able to run correctly on alternative interpreters (since emulating or otherwise supporting the limited API is easier than supporting the whole thing). Cheers, Nick. From alexander.belopolsky at gmail.com Sat Jul 12 19:00:18 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 12 Jul 2014 13:00:18 -0400 Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues In-Reply-To: References: Message-ID: On Sat, Jul 12, 2014 at 11:19 AM, Nick Coghlan wrote: > The main downside of "do as we say, not as we do" in this case is that we > miss out on the feedback loop of what the stable ABI is like to *use*. A good start for improving the situation would be to convert the extension module templates that we ship with the Python source: http://bugs.python.org/issue15848 (xxsubtype module) http://bugs.python.org/issue15849 (xx module) From jaraco at jaraco.com Sun Jul 13 16:04:17 2014 From: jaraco at jaraco.com (Jason R.
Coombs) Date: Sun, 13 Jul 2014 14:04:17 +0000 Subject: [Python-Dev] Another case for frozendict Message-ID: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> I repeatedly run into situations where a frozendict would be useful, and every time I do, I go searching and find the (unfortunately rejected) PEP-416. I'd just like to share another case where having a frozendict in the stdlib would be useful to me. I was interacting with a database and had a list of results from 206 queries: >>> res = [db.cases.remove({'_id': doc['_id']}) for doc in fives] >>> len(res) 206 I can see that the results are the same for the first two queries. >>> res[0] {'n': 1, 'err': None, 'ok': 1.0} >>> res[1] {'n': 1, 'err': None, 'ok': 1.0} So I'd like to test whether that's the case, so I try to construct a 'set' from the results, which in theory would give me the unique results: >>> set(res) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unhashable type: 'dict' I can't do that because dict is unhashable. That's reasonable, and if I had a frozen dict, I could easily work around this limitation and accomplish what I need. >>> set(map(frozendict, res)) Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'frozendict' is not defined PEP-416 mentions a MappingProxyType, but that's no help. >>> res_ex = list(map(types.MappingProxyType, res)) >>> set(res_ex) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unhashable type: 'mappingproxy' I can achieve what I need by constructing a set on the 'items' of the dict. >>> set(tuple(doc.items()) for doc in res) {(('n', 1), ('err', None), ('ok', 1.0))} But that syntax would be nicer if the result had the same representation as the input (mapping instead of tuple of pairs). A frozendict would have readily enabled the desired behavior.
Although hashability is mentioned in the PEP under constraints, there are many use-cases that fall out of the ability to hash a dict, such as the one described above, which are not mentioned at all in use-cases for the PEP. If there's ever any interest in reviving that PEP, I'm in favor of its implementation. From victor.stinner at gmail.com Sun Jul 13 16:13:14 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 13 Jul 2014 16:13:14 +0200 Subject: [Python-Dev] Another case for frozendict In-Reply-To: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> Message-ID: The PEP has been rejected, but the MappingProxyType is now public: $ ./python Python 3.5.0a0 (default:5af54ed3af02, Jul 12 2014, 03:13:04) >>> d={1:2} >>> import types >>> d = types.MappingProxyType(d) >>> d mappingproxy({1: 2}) >>> d[1] 2 >>> d[1] = 3 Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'mappingproxy' object does not support item assignment Victor From rosuav at gmail.com Sun Jul 13 16:22:57 2014 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 14 Jul 2014 00:22:57 +1000 Subject: [Python-Dev] Another case for frozendict In-Reply-To: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> Message-ID: On Mon, Jul 14, 2014 at 12:04 AM, Jason R. Coombs wrote: > I can achieve what I need by constructing a set on the 'items' of the dict. > >>>> set(tuple(doc.items()) for doc in res) > > {(('n', 1), ('err', None), ('ok', 1.0))} This is flawed; the tuple-of-tuples depends on iteration order, which may vary. It should be a frozenset of those tuples, not a tuple. Which strengthens your case; it's that easy to get it wrong in the absence of an actual frozendict.
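Chris's point is easy to check. On Python 3.7+, where dicts preserve insertion order, two dicts that compare equal can still produce different `tuple(d.items())` keys, while `frozenset(d.items())` ignores order. A small sketch using sample data modeled on the results above (every value must be hashable for this to work):

```python
res = [
    {'n': 1, 'err': None, 'ok': 1.0},
    {'ok': 1.0, 'n': 1, 'err': None},  # same contents, different insertion order
]

# Order-sensitive: equal dicts can give different tuples of pairs.
by_tuple = {tuple(doc.items()) for doc in res}

# Order-insensitive: a frozenset of (key, value) pairs is hashable and
# compares equal whenever the original dicts compare equal.
by_frozenset = {frozenset(doc.items()) for doc in res}

# Convert back to plain dicts for display.
unique = [dict(pairs) for pairs in by_frozenset]
```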
ChrisA From andreas.r.maier at gmx.de Sun Jul 13 17:13:20 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Sun, 13 Jul 2014 17:13:20 +0200 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members? In-Reply-To: <53C04F10.8070509@stoneleaf.us> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> Message-ID: <53C2A210.80902@gmx.de> On 11.07.2014 22:54, Ethan Furman wrote: > On 07/11/2014 07:04 AM, Andreas Maier wrote: >> On 09.07.2014 03:48, Raymond Hettinger wrote: >>> >>> Personally, I see no need to make the same mistake by removing >>> the identity-implies-equality rule from the built-in containers. >>> There's no need to upset the apple cart for nearly zero benefit. >> >> Containers delegate the equal comparison on the container to their >> elements; they do not apply identity-based comparison >> to their elements. At least that is the externally visible behavior. > > If that were true, then [NaN] == [NaN] would be False, and it is not. > > Here is the externally visible behavior: > > Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20) > [GCC 4.7.3] on linux > Type "help", "copyright", "credits" or "license" for more information. > --> NaN = float('nan') > --> NaN == NaN > False > --> [NaN] == [NaN] > True Ouch, that hurts ;-) First, the delegation of sequence equality to element equality is not something I came up with for my doc patch. It has always been in section 5.9, Comparisons, of the Language Reference (copied from Python 3.4): "Tuples and lists are compared lexicographically using comparison of corresponding elements. This means that to compare equal, each element must compare equal and the two sequences must be of the same type and have the same length." Second, if not by delegation to equality of its elements, how would the equality of sequences be defined otherwise?
But your test is definitely worth having a closer look at. I have broadened the test somewhat and that brings up further questions. Here is the test output, and a discussion of the results (test program try_eq.py and its output test_eq.out are attached to issue #12067): Test #1: Different equal int objects: obj1: type=, str=257, id=39305936 obj2: type=, str=257, id=39306160 a) obj1 is obj2: False b) obj1 == obj2: True c) [obj1] == [obj2]: True d) {obj1:'v'} == {obj2:'v'}: True e) {'k':obj1} == {'k':obj2}: True f) obj1 == obj2: True Discussion: Case 1.c) can be interpreted that the list delegates its == to the == on its elements. It cannot be interpreted to delegate to identity comparison. That is consistent with how everyone (I hope ;-) would expect int objects to behave, or lists or dicts of them. The motivation for case f) is explained further down, it has to do with caching. Test #2: Same int object: obj1: type=, str=257, id=39305936 obj2: type=, str=257, id=39305936 a) obj1 is obj2: True b) obj1 == obj2: True c) [obj1] == [obj2]: True d) {obj1:'v'} == {obj2:'v'}: True e) {'k':obj1} == {'k':obj2}: True f) obj1 == obj2: True -> No surprises (I hope). Test #3: Different equal float objects: obj1: type=, str=257.0, id=5734664 obj2: type=, str=257.0, id=5734640 a) obj1 is obj2: False b) obj1 == obj2: True c) [obj1] == [obj2]: True d) {obj1:'v'} == {obj2:'v'}: True e) {'k':obj1} == {'k':obj2}: True f) obj1 == obj2: True Discussion: I added this test only to show that float NaN is a special case, and that this test for float objects - that are not NaN - behaves like test #1 for int objects. Test #4: Same float object: obj1: type=, str=257.0, id=5734664 obj2: type=, str=257.0, id=5734664 a) obj1 is obj2: True b) obj1 == obj2: True c) [obj1] == [obj2]: True d) {obj1:'v'} == {obj2:'v'}: True e) {'k':obj1} == {'k':obj2}: True f) obj1 == obj2: True -> Same as test #2, hopefully no surprises. 
Test #5: Different float NaN objects: obj1: type=, str=nan, id=5734784 obj2: type=, str=nan, id=5734976 a) obj1 is obj2: False b) obj1 == obj2: False c) [obj1] == [obj2]: False d) {obj1:'v'} == {obj2:'v'}: False e) {'k':obj1} == {'k':obj2}: False f) obj1 == obj2: False Discussion: Here, the list behaves as I would expect under the rule that it delegates equality to its elements. Case c) allows that interpretation. However, an interpretation based on identity would also be possible. Test #6: Same float NaN object: obj1: type=, str=nan, id=5734784 obj2: type=, str=nan, id=5734784 a) obj1 is obj2: True b) obj1 == obj2: False c) [obj1] == [obj2]: True d) {obj1:'v'} == {obj2:'v'}: True e) {'k':obj1} == {'k':obj2}: True f) obj1 == obj2: False Discussion (this is Ethan's example): Case 6.b) shows the special behavior of float NaN that is documented: a float NaN object is the same as itself but unequal to itself. Case 6.c) is the surprising case. It could be interpreted in two ways (at least that's what I found): 1) The comparison is based on identity of the float objects. But that is inconsistent with test #4. And why would the list special-case NaN comparison in such a way that it ends up being inconsistent with the special definition of NaN (outside of the list)? 2) The list does not always delegate to element equality, but attempts to optimize if the objects are the same (same identity). We will see later that that happens. Further, when comparing float NaNs of the same identity, the list implementation forgot to special-case NaNs. Which would be a bug, IMHO. I did not analyze the C implementation, so this is all speculation based upon externally visible behavior.
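Tests #5 and #6 boil down to a few lines. This sketch targets CPython 3, where list and dict comparisons go through PyObject_RichCompareBool() and its identity shortcut (discussed further down the thread):

```python
nan = float("nan")
other_nan = float("nan")  # a distinct float object that is also NaN

# IEEE 754: NaN compares unequal even to itself when == is used directly.
same_direct = (nan == nan)

# Containers check identity first, so the same NaN object "matches" itself...
same_in_list = ([nan] == [nan])
contained = nan in [nan]

# ...but two distinct NaN objects fall through to ==, which returns False.
different_in_list = ([nan] == [other_nan])
different_in_dict = ({'k': nan} == {'k': other_nan})
```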
Test #7: Different objects (with equal x) of class C (C.__eq__() implemented with equality of x, C.__ne__() returning NotImplemented): obj1: type=, str=C(256), id=39406504 obj2: type=, str=C(256), id=39406616 a) obj1 is obj2: False C.__eq__(): self=39406504, other=39406616, returning True b) obj1 == obj2: True C.__eq__(): self=39406504, other=39406616, returning True c) [obj1] == [obj2]: True C.__eq__(): self=39406616, other=39406504, returning True d) {obj1:'v'} == {obj2:'v'}: True C.__eq__(): self=39406504, other=39406616, returning True e) {'k':obj1} == {'k':obj2}: True C.__eq__(): self=39406504, other=39406616, returning True f) obj1 == obj2: True The __eq__() and __ne__() implementations each print a debug message. The __ne__() is only defined to verify that it is not invoked, and that the inherited default __ne__() does not chime in. Discussion: Here we see that the list equality comparison does invoke the element equality. However, the picture becomes more complex further down. Test #8: Same object of class C (C.__eq__() implemented with equality of x, C.__ne__() returning NotImplemented): obj1: type=, str=C(256), id=39406504 obj2: type=, str=C(256), id=39406504 a) obj1 is obj2: True C.__eq__(): self=39406504, other=39406504, returning True b) obj1 == obj2: True c) [obj1] == [obj2]: True d) {obj1:'v'} == {obj2:'v'}: True e) {'k':obj1} == {'k':obj2}: True C.__eq__(): self=39406504, other=39406504, returning True f) obj1 == obj2: True Discussion: The == on the class C objects in case 8.b) invokes __eq__(), even though the objects are the same object. This can be explained by the desire in Python that classes should be able not to be reflexive, if needed. Like float NaN, for example. Now, the list equality in case 8.c) is interesting. The list equality does not invoke element equality. Even though object equality in case 8.b) did not assume reflexivity and invoked the __eq__() method, the list seems to assume reflexivity and seems to go by object identity. 
The only other potential explanation (that I found) would be that some aspects of the comparison behavior are cached. That's why I added the cases f), which show that caching for comparison results does not happen (the __eq__() method is invoked again). So we are back to discussing why element equality does not assume reflexivity, but list equality does. IMHO, that is another bug, or maybe the same one. Test #9: Different objects (with equal x) of class D (D.__eq__() implemented with inequality of x, D.__ne__() returning NotImplemented): obj1: type=, str=C(256), id=39407064 obj2: type=, str=C(256), id=39406952 a) obj1 is obj2: False D.__eq__(): self=39407064, other=39406952, returning False b) obj1 == obj2: False D.__eq__(): self=39407064, other=39406952, returning False c) [obj1] == [obj2]: False D.__eq__(): self=39406952, other=39407064, returning False d) {obj1:'v'} == {obj2:'v'}: False D.__eq__(): self=39407064, other=39406952, returning False e) {'k':obj1} == {'k':obj2}: False D.__eq__(): self=39407064, other=39406952, returning False f) obj1 == obj2: False Discussion: Class D implements __eq__() by != on the data attribute. This test does not really show any surprises, and is consistent with the theory that list comparison delegates to element comparison. This is really just a preparation for the next test, that uses the same object of this class. Test #10: Same object of class D (D.__eq__() implemented with inequality of x, D.__ne__() returning NotImplemented): obj1: type=, str=C(256), id=39407064 obj2: type=, str=C(256), id=39407064 a) obj1 is obj2: True D.__eq__(): self=39407064, other=39407064, returning False b) obj1 == obj2: False c) [obj1] == [obj2]: True d) {obj1:'v'} == {obj2:'v'}: True e) {'k':obj1} == {'k':obj2}: True D.__eq__(): self=39407064, other=39407064, returning False f) obj1 == obj2: False Discussion: The inequality-based implementation of __eq__() explains case 10.b). 
It is surprising (to me) that the list comparison in case 10.c) returns True. If one compares that to case 9.c), one could believe that the identities of the objects are used for both cases. But why would the list not respect the result of __eq__() if it is implemented? This behavior seems at least to be consistent with the surprise of case 6.c). In order not to rely only on the externally visible behavior, I started digging into the C implementation. For list equality comparison, I started at list_richcompare() which uses PyObject_RichCompareBool(), which shortcuts its result based on identity comparison, and thus enforces reflexivity. The comment on line 714 in object.c in PyObject_RichCompareBool() also confirms that: /* Quick result when objects are the same. Guarantees that identity implies equality. */ IMHO, we need to discuss whether we are serious about the direction that was claimed earlier in this thread, that reflexivity (i.e. identity implies equality) should be decided upon by the classes and not by the Python language. As I see it, we have some pieces of code that enforce reflexivity, and some that don't. Andy From steve at pearwood.info Sun Jul 13 18:23:03 2014 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 14 Jul 2014 02:23:03 +1000 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members? In-Reply-To: <53C2A210.80902@gmx.de> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> Message-ID: <20140713162249.GP5705@ando> On Sun, Jul 13, 2014 at 05:13:20PM +0200, Andreas Maier wrote: > Second, if not by delegation to equality of its elements, how would the > equality of sequences defined otherwise? Wow. I'm impressed by the amount of detailed effort you've put into investigating this. (Too much detail to absorb, I'm afraid.)
But perhaps you might have just asked on the python-list at python.org mailing list, or here, where we would have told you the answer: list __eq__ first checks element identity before going on to check element equality. If you can read C, you might like to check the list source code: http://hg.python.org/cpython/file/22e5a85ba840/Objects/listobject.c but if I'm reading it correctly, list.__eq__ conceptually looks something like this:

    def __eq__(self, other):
        if not isinstance(other, list):
            return NotImplemented
        if len(other) != len(self):
            return False
        for a, b in zip(self, other):
            if not (a is b or a == b):
                return False
        return True

(The actual code is a bit more complex than that, since there is a single function, list_richcompare, which handles all the rich comparisons.) The critical test is PyObject_RichCompareBool here: http://hg.python.org/cpython/file/22e5a85ba840/Objects/object.c which explicitly says: /* Quick result when objects are the same. Guarantees that identity implies equality. */ [...] > I added this test only to show that float NaN is a special case, NANs are not a special case. List __eq__ treats all object types identically (pun intended):

    py> class X:
    ...     def __eq__(self, other): return False
    ...
    py> x = X()
    py> x == x
    False
    py> [x] == [X()]
    False
    py> [x] == [x]
    True

[...] > Case 6.c) is the surprising case. It could be interpreted in two ways > (at least that's what I found): > > 1) The comparison is based on identity of the float objects. But that is > inconsistent with test #4. And why would the list special-case NaN > comparison in such a way that it ends up being inconsistent with the > special definition of NaN (outside of the list)? It doesn't. NANs are not special-cased in any way. This was discussed to death some time ago, both on python-dev and python-ideas.
If you're interested, you can start here: https://mail.python.org/pipermail/python-list/2012-October/633992.html which is in the middle of one of the threads, but at least it gets you to the right time period. > 2) The list does not always delegate to element equality, but attempts > to optimize if the objects are the same (same identity). Right! It's not just lists -- I believe that tuples, dicts and sets behave the same way. > We will see > later that that happens. Further, when comparing float NaNs of the same > identity, the list implementation forgot to special-case NaNs. Which > would be a bug, IMHO. "Forgot"? I don't think the behaviour of list comparisons is an accident. NAN equality is non-reflexive. Very few other things are the same. It would be seriously weird if alist == alist could return False. You'll note that the IEEE-754 standard has nothing to say about the behaviour of Python lists containing NANs, so we're free to pick whatever behaviour makes the most sense for Python, and that is to minimise the "Gotcha!" factor. NANs are a gotcha to anyone who doesn't know IEEE-754, and possibly even some who do. I will go to the barricades to fight to keep the non-reflexivity of NANs *in isolation*, but I believe that Python has made the right decision to treat lists containing NANs the same as everything else.

    NAN == NAN      # obeys IEEE-754 semantics and returns False
    [NAN] == [NAN]  # obeys standard expectation that equality is reflexive

This behaviour is not a bug, it is a feature. As far as I am concerned, this only needs documenting. If anyone needs list equality to honour the special behaviour of NANs, write a subclass or an equal() function. -- Steven From rosuav at gmail.com Sun Jul 13 18:34:20 2014 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 14 Jul 2014 02:34:20 +1000 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members?
In-Reply-To: <20140713162249.GP5705@ando> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando> Message-ID: On Mon, Jul 14, 2014 at 2:23 AM, Steven D'Aprano wrote: >> We will see >> later that that happens. Further, when comparing float NaNs of the same >> identity, the list implementation forgot to special-case NaNs. Which >> would be a bug, IMHO. > > "Forgot"? I don't think the behaviour of list comparisons is an > accident. Well, "forgot" is on the basis that the identity check is intended to be a mere optimization. If that were the case ("don't actually call __eq__ when you reckon it'll return True"), then yes, failing to special-case NaN would be a bug. But since it's intended behaviour, as explained further down, it's not a bug and not the result of forgetfulness. ChrisA From wizzat at gmail.com Sun Jul 13 18:50:53 2014 From: wizzat at gmail.com (Mark Roberts) Date: Sun, 13 Jul 2014 09:50:53 -0700 Subject: [Python-Dev] Another case for frozendict In-Reply-To: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> Message-ID: I find it handy to use named tuple as my database mapping type. It allows you to perform this behavior seamlessly. -Mark > On Jul 13, 2014, at 7:04, "Jason R. Coombs" wrote: > > I repeatedly run into situations where a frozendict would be useful, and every time I do, I go searching and find the (unfortunately rejected) PEP-416. I'd just like to share another case where having a frozendict in the stdlib would be useful to me. > > I was interacting with a database and had a list of results from 206 queries: > > >>> res = [db.cases.remove({'_id': doc['_id']}) for doc in fives] > >>> len(res) > 206 > > I can see that the results are the same for the first two queries.
> > >>> res[0] > {'n': 1, 'err': None, 'ok': 1.0} > >>> res[1] > {'n': 1, 'err': None, 'ok': 1.0} > > So I'd like to test to see if that's the case, so I try to construct a 'set' on the results, which in theory would give me a list of unique results: > > >>> set(res) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: unhashable type: 'dict' > > I can't do that because dict is unhashable. That's reasonable, and if I had a frozen dict, I could easily work around this limitation and accomplish what I need. > > >>> set(map(frozendict, res)) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > NameError: name 'frozendict' is not defined > > PEP-416 mentions a MappingProxyType, but that's no help. > > >>> res_ex = list(map(types.MappingProxyType, res)) > >>> set(res_ex) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: unhashable type: 'mappingproxy' > > I can achieve what I need by constructing a set on the 'items' of the dict. > > >>> set(tuple(doc.items()) for doc in res) > {(('n', 1), ('err', None), ('ok', 1.0))} > > But that syntax would be nicer if the result had the same representation as the input (mapping instead of tuple of pairs). A frozendict would have readily enabled the desirable behavior. > > Although hashability is mentioned in the PEP under constraints, there are many use-cases that fall out of the ability to hash a dict, such as the one described above, which are not mentioned at all in use-cases for the PEP. > > If there's ever any interest in reviving that PEP, I'm in favor of its implementation.
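Mark's namedtuple suggestion can be sketched like this. Build one row type from the result keys, and the rows become hashable, so set() works directly. The sketch assumes every result has the same keys and that those keys are valid identifiers (a key such as '_id' would need `rename=True`); `Result` is a hypothetical name:

```python
from collections import namedtuple

res = [
    {'n': 1, 'err': None, 'ok': 1.0},
    {'n': 1, 'err': None, 'ok': 1.0},
]

# One row type for all results; sorted() gives a stable field order.
Result = namedtuple('Result', sorted(res[0]))

rows = [Result(**doc) for doc in res]
unique = set(rows)  # namedtuples are tuples, so they hash by value
```

The trade-off, compared with a frozendict, is that the set of keys is fixed up front.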
From ncoghlan at gmail.com Sun Jul 13 20:11:58 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 13 Jul 2014 13:11:58 -0500 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members? In-Reply-To: References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando> Message-ID: On 13 July 2014 11:34, Chris Angelico wrote: > On Mon, Jul 14, 2014 at 2:23 AM, Steven D'Aprano wrote: >>> We will see >>> later that that happens. Further, when comparing float NaNs of the same >>> identity, the list implementation forgot to special-case NaNs. Which >>> would be a bug, IMHO. >> >> "Forgot"? I don't think the behaviour of list comparisons is an >> accident. > > Well, "forgot" is on the basis that the identity check is intended to > be a mere optimization. If that were the case ("don't actually call > __eq__ when you reckon it'll return True"), then yes, failing to > special-case NaN would be a bug. But since it's intended behaviour, as > explained further down, it's not a bug and not the result of > forgetfulness. Right, it's not a mere optimisation - it's the only way to get containers to behave sensibly. Otherwise we'd end up with nonsense like:

    >>> x = float("nan")
    >>> x in [x]
    False

That currently returns True because of the identity check - it would return False if we delegated the check to float.__eq__ because the defined IEEE754 behaviour for NaNs breaks the mathematical definition of an equivalence relation as transitive, reflexive and symmetric.
(It breaks it for *good reasons*, but we still need to figure out a way of dealing with the impedance mismatch between the definition of floats and the definition of container invariants like "assert x in [x]".) The current approach means that the lack of reflexivity of NaNs stays confined to floats and similar types - it doesn't leak out and infect the behaviour of the container types. What we've never figured out is a good place to *document* it. I thought there was an open bug for that, but I can't find it right now. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Sun Jul 13 20:16:11 2014 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 14 Jul 2014 04:16:11 +1000 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members? In-Reply-To: References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando> Message-ID: On Mon, Jul 14, 2014 at 4:11 AM, Nick Coghlan wrote: > What we've never figured out is a good place to *document* it. I > thought there was an open bug for that, but I can't find it right now. Yeah. The Py3 docs explain why "x in [x]" is True, but I haven't found a parallel explanation of sequence equality. ChrisA From ncoghlan at gmail.com Sun Jul 13 20:23:42 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 13 Jul 2014 13:23:42 -0500 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members?
In-Reply-To: References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando> Message-ID: On 13 July 2014 13:16, Chris Angelico wrote: > On Mon, Jul 14, 2014 at 4:11 AM, Nick Coghlan wrote: >> What we've never figured out is a good place to *document* it. I >> thought there was an open bug for that, but I can't find it right now. > > Yeah. The Py3 docs explain why "x in [x]" is True, but I haven't found > a parallel explanation of sequence equality. We might need to expand the tables of sequence operations to cover equality and inequality checks - those are currently missing. Cheers, Nick. > > ChrisA -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From dw+python-dev at hmmz.org Sun Jul 13 20:43:28 2014 From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org) Date: Sun, 13 Jul 2014 18:43:28 +0000 Subject: [Python-Dev] Another case for frozendict In-Reply-To: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> Message-ID: <20140713184328.GA6345@k2> On Sun, Jul 13, 2014 at 02:04:17PM +0000, Jason R. Coombs wrote: > PEP-416 mentions a MappingProxyType, but that's no help. Well, it kind of is.
By combining MappingProxyType and UserDict the desired effect can be achieved concisely:

    import collections
    import types

    class frozendict(collections.UserDict):
        def __init__(self, d, **kw):
            if d:
                d = d.copy()
                d.update(kw)
            else:
                d = kw
            self.data = types.MappingProxyType(d)

        _h = None
        def __hash__(self):
            if self._h is None:
                self._h = sum(map(hash, self.data.items()))
            return self._h

        def __repr__(self):
            return repr(dict(self))

> Although hashability is mentioned in the PEP under constraints, there are many > use-cases that fall out of the ability to hash a dict, such as the one > described above, which are not mentioned at all in use-cases for the PEP. > If there's ever any interest in reviving that PEP, I'm in favor of its > implementation. In its previous form, the PEP seemed more focused on some false optimization capabilities of a read-only type, rather than, as here, on the far more interesting hashability properties. It might warrant a fresh PEP to more thoroughly investigate this angle. David From dw+python-dev at hmmz.org Sun Jul 13 20:50:18 2014 From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org) Date: Sun, 13 Jul 2014 18:50:18 +0000 Subject: [Python-Dev] Another case for frozendict In-Reply-To: <20140713184328.GA6345@k2> References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> <20140713184328.GA6345@k2> Message-ID: <20140713185018.GB6345@k2> On Sun, Jul 13, 2014 at 06:43:28PM +0000, dw+python-dev at hmmz.org wrote: > if d: > d = d.copy() To cope with iterables, "d = d.copy()" should have read "d = dict(d)".
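Here is a standalone variant that folds in both of David's messages (it uses `dict(...)`, so iterables and keyword arguments work) along with Chris's point about order independence. It builds on `collections.abc.Mapping` instead of UserDict, which means no mutating methods exist at all. This is only a sketch: `frozendict` here is a hypothetical class, not a stdlib type, and every value must be hashable.

```python
from collections.abc import Mapping
from types import MappingProxyType

class frozendict(Mapping):
    """Hypothetical immutable, hashable mapping (not a stdlib type)."""

    def __init__(self, *args, **kwargs):
        # dict(...) accepts a mapping, an iterable of pairs, and/or keywords.
        self._data = MappingProxyType(dict(*args, **kwargs))
        self._hash = None

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # A frozenset of items is independent of insertion order and is
        # consistent with Mapping.__eq__; all values must be hashable.
        if self._hash is None:
            self._hash = hash(frozenset(self._data.items()))
        return self._hash

    def __repr__(self):
        return f"frozendict({dict(self._data)!r})"
```

Used on Jason's data, `set(map(frozendict, res))` then collapses equal results regardless of key order.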
David From ncoghlan at gmail.com Sun Jul 13 21:09:25 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 13 Jul 2014 14:09:25 -0500 Subject: [Python-Dev] Another case for frozendict In-Reply-To: <20140713184328.GA6345@k2> References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> <20140713184328.GA6345@k2> Message-ID: On 13 July 2014 13:43, wrote: > In its previous form, the PEP seemed more focused on some false > optimization capabilities of a read-only type, rather than as here, the > far more interesting hashability properties. It might warrant a fresh > PEP to more thoroughly investigate this angle. Right, the use case would be "frozendict as a simple alternative to a full class definition", but even less structured than namedtuple in that the keys may vary as well. That difference means that frozendict applies more cleanly to semi-structured data manipulated as dictionaries (think stuff deserialised from JSON) than namedtuple does. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From marko at pacujo.net Sun Jul 13 21:54:02 2014 From: marko at pacujo.net (Marko Rauhamaa) Date: Sun, 13 Jul 2014 22:54:02 +0300 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members? In-Reply-To: (Nick Coghlan's message of "Sun, 13 Jul 2014 13:11:58 -0500") References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando> Message-ID: <8738e56nmt.fsf@elektro.pacujo.net> Nick Coghlan : > Right, it's not a mere optimisation - it's the only way to get > containers to behave sensibly. Otherwise we'd end up with nonsense > like: > >>>> x = float("nan") >>>> x in [x] > False Why is that nonsense?
I mean, why is it any more nonsense than

    >>> x == x
    False

Anyway, personally, I'm perfectly "happy" to live with the choices of past generations, regardless of whether they were good or not. What you absolutely don't want to do is "correct" the choices of past generations. Marko From 4kir4.1i at gmail.com Sun Jul 13 22:05:27 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Mon, 14 Jul 2014 00:05:27 +0400 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members? References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando> Message-ID: <87ion1owhk.fsf@gmail.com> Nick Coghlan writes: ... > definition of floats and the definition of container invariants like > "assert x in [x]") > > The current approach means that the lack of reflexivity of NaN's stays > confined to floats and similar types - it doesn't leak out and infect > the behaviour of the container types. > > What we've never figured out is a good place to *document* it. I > thought there was an open bug for that, but I can't find it right now. There was a related issue, "Tuple comparisons with NaNs are broken" http://bugs.python.org/issue21873 but it was closed as "not a bug" even though the corresponding behavior is *not documented* anywhere. -- Akira From benhoyt at gmail.com Mon Jul 14 02:12:16 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Sun, 13 Jul 2014 20:12:16 -0400 Subject: [Python-Dev] Updates to PEP 471, the os.scandir() proposal In-Reply-To: References: <53BC5309.6000605@stoneleaf.us> <53BC9B8B.40509@stoneleaf.us> <53BD4670.9080100@stoneleaf.us> <53BD6F38.7090000@stoneleaf.us> <53BD9557.80709@stoneleaf.us> <53BDA99C.3020101@stoneleaf.us> <53BDBE42.7050609@stoneleaf.us> <53BF594D.9060007@stoneleaf.us> Message-ID: >> Very much agreed that this isn't necessary for just readdir/FindNext >> errors.
We've never had this level of detail before -- if listdir() >> fails half way through (very unlikely) it just bombs with OSError and >> you get no entries at all. >> >> If you really really want this (again very unlikely), you can always >> call next() directly and catch OSError around that call. > > Agreed - I think the PEP should point this out explicitly, and show that the > approach it takes offers a lot of flexibility in error handling from "just > let it fail", to a single try/catch around the whole loop, to try/catch just > around the operations that might call lstat(), to try/catch around the > individual iteration steps. Good point. It'd be good to mention this explicitly in the PEP and have another example or two of the different levels of error handling. > os.walk remains the higher level API that most code should be using, and > that has to retain the current listdir based behaviour (any error = ignore > all entries in that directory) for backwards compatibility reasons. Yes, definitely. -Ben From benhoyt at gmail.com Mon Jul 14 02:33:16 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Sun, 13 Jul 2014 20:33:16 -0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() Message-ID: Hi folks, Thanks Victor, Nick, Ethan, and others for continued discussion on the scandir PEP 471 (most recent thread starts at https://mail.python.org/pipermail/python-dev/2014-July/135377.html). Just an aside ... I was reminded again recently why scandir() matters: a scandir user emailed me the other day, saying "I used scandir to dump the contents of a network dir in under 15 seconds. 13 root dirs, 60,000 files in the structure. This will replace some old VBA code embedded in a spreadsheet that was taking 15-20 minutes to do the exact same thing."
I asked if he could run scandir's benchmark.py on his directory tree, and here's what it printed out:

    C:\Python34\scandir-master>benchmark.py "\\my\network\directory"
    Using fast C version of scandir
    Priming the system's cache...
    Benchmarking walks on \\my\network\directory, repeat 1/3...
    Benchmarking walks on \\my\network\directory, repeat 2/3...
    Benchmarking walks on \\my\network\directory, repeat 3/3...
    os.walk took 8739.851s, scandir.walk took 129.500s -- 67.5x as fast

That's right -- os.walk() with scandir was almost 70x as fast as the current version! Admittedly this is a network file system, but that's still a real and important use case. It really pays not to throw away information the OS gives you for free. :-) On the recent python-dev thread, Victor especially made some well thought out suggestions. It seems to me there's general agreement that the basic API in PEP 471 is good (with Ethan not a fan at first, but it seems he's on board after further discussion :-). That said, I think there's basically one thing remaining to decide: whether or not to have DirEntry.is_dir() and .is_file() follow symlinks by default. I think Victor made a pretty good case that:

(a) following links is usually what you want
(b) that's the precedent set by the similar functions os.path.isdir() and pathlib.Path.is_dir(), so to do otherwise would be confusing
(c) with the non-link-following version, if you wanted to follow links you'd have to say something like "if (entry.is_symlink() and os.path.isdir(entry.full_name)) or entry.is_dir()" instead of just "if entry.is_dir()"
(d) it's error prone to have to do (c), as I found out recently when I had a bug in my implementation of os.walk() with scandir -- I had a bug due to getting this exact test wrong

If we go with Victor's link-following .is_dir() and .is_file(), then we probably need to add his suggestion of a follow_symlinks=False parameter (defaults to True).
Either that or you have to say "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit less nice. As a KISS enthusiast, I admit I'm still somewhat partial to the DirEntry methods just returning (non-link following) info about the *directory entry* itself. However, I can definitely see the error-proneness of that, and the advantages given the points above. So I guess I'm on the fence. Given the above arguments for symlink-following is_dir()/is_file() methods (have I missed any, Victor?), what do others think? I'd be very keen to come to a consensus on this, so that I can make some final updates to the PEP and see about getting it accepted and/or implemented. :-) -Ben From timothy.c.delaney at gmail.com Mon Jul 14 02:52:42 2014 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Mon, 14 Jul 2014 10:52:42 +1000 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: On 14 July 2014 10:33, Ben Hoyt wrote: > If we go with Victor's link-following .is_dir() and .is_file(), then > we probably need to add his suggestion of a follow_symlinks=False > parameter (defaults to True). Either that or you have to say > "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit > less nice. > Absolutely agreed that follow_symlinks is the way to go, disagree on the default value. > Given the above arguments for symlink-following is_dir()/is_file() > methods (have I missed any, Victor?), what do others think? > I would say whichever way you go, someone will assume the opposite. IMO not following symlinks by default is safer. If you follow symlinks by default then everyone has the following issues:

1. Crossing filesystems (including onto network filesystems);
2. Recursive directory structures (symlink to a parent directory);
3. Symlinks to non-existent files/directories;
4. Symlink to an absolutely huge directory somewhere else (very annoying if you just wanted to do a directory sizer ...).
If follow_symlinks=False by default, only those who opt-in have to deal with the above. Tim Delaney From ncoghlan at gmail.com Mon Jul 14 04:17:33 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 13 Jul 2014 21:17:33 -0500 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: On 13 Jul 2014 20:54, "Tim Delaney" wrote: > > On 14 July 2014 10:33, Ben Hoyt wrote: >> >> >> >> If we go with Victor's link-following .is_dir() and .is_file(), then >> we probably need to add his suggestion of a follow_symlinks=False >> parameter (defaults to True). Either that or you have to say >> "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit >> less nice. > > > Absolutely agreed that follow_symlinks is the way to go, disagree on the default value. > >> >> Given the above arguments for symlink-following is_dir()/is_file() >> methods (have I missed any, Victor?), what do others think? > > > I would say whichever way you go, someone will assume the opposite. IMO not following symlinks by default is safer. If you follow symlinks by default then everyone has the following issues: > > 1. Crossing filesystems (including onto network filesystems); > > 2. Recursive directory structures (symlink to a parent directory); > > 3. Symlinks to non-existent files/directories; > > 4. Symlink to an absolutely huge directory somewhere else (very annoying if you just wanted to do a directory sizer ...). > > If follow_symlinks=False by default, only those who opt-in have to deal with the above. Or the ever popular symlink to "." (or a directory higher in the tree). I think os.walk() is a good source of inspiration here: call the flag "followlink" and default it to False. Cheers, Nick.
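The hazards Tim lists (especially a symlink back to "." or to a parent directory) can be illustrated with a small recursive walker that ignores symlinks unless asked, and that remembers the (st_dev, st_ino) of every directory it enters when it does follow them. This is a sketch, not the PEP's design: `walk_files` is a hypothetical helper, and it is written against the `os.scandir()` API as later released (`entry.path`, `is_dir(follow_symlinks=...)`), assuming a POSIX system where `os.symlink()` works.

```python
import os
import tempfile


def walk_files(top, follow_symlinks=False):
    """Yield paths of regular files under top (hypothetical helper).

    Symlinks are ignored unless follow_symlinks=True; when they are
    followed, directories already visited (same st_dev/st_ino) are
    skipped, which defuses links to "." or to a parent directory.
    """
    seen = set()
    stack = [top]
    while stack:
        path = stack.pop()
        st = os.stat(path)  # follows a symlinked directory to its target
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # symlink loop: this directory was already walked
        seen.add(key)
        with os.scandir(path) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=follow_symlinks):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=follow_symlinks):
                    yield entry.path


# Demo: sub/up is a symlink back to the top directory.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()
    os.mkdir(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "b.txt"), "w").close()
    os.symlink(tmp, os.path.join(tmp, "sub", "up"))

    plain = sorted(os.path.relpath(p, tmp) for p in walk_files(tmp))
    followed = sorted(os.path.relpath(p, tmp)
                      for p in walk_files(tmp, follow_symlinks=True))

print(plain)     # ['a.txt', 'sub/b.txt'] on POSIX
print(followed)  # same list: the loop through sub/up is detected
```

Without the `seen` set, the follow_symlinks=True call would never terminate. That is the "opt in and deal with it" cost Tim describes.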
> > Tim Delaney From timothy.c.delaney at gmail.com Mon Jul 14 04:29:12 2014 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Mon, 14 Jul 2014 12:29:12 +1000 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: On 14 July 2014 12:17, Nick Coghlan wrote: > > I think os.walk() is a good source of inspiration here: call the flag > "followlink" and default it to False. > Actually, that's "followlinks", and I'd forgotten that os.walk() defaulted to not follow - definitely behaviour to match IMO :) Tim Delaney From ethan at stoneleaf.us Mon Jul 14 04:55:37 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 13 Jul 2014 19:55:37 -0700 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members? In-Reply-To: <53C2A210.80902@gmx.de> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> Message-ID: <53C346A9.3050200@stoneleaf.us> On 07/13/2014 08:13 AM, Andreas Maier wrote: > On 11.07.2014 22:54, Ethan Furman wrote: >> >> Here is the externally visible behavior: >> >> Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20) >> [GCC 4.7.3] on linux >> Type "help", "copyright", "credits" or "license" for more information. >> --> NaN = float('nan') >> --> NaN == NaN >> False >> --> [NaN] == [NaN] >> True > > Ouch, that hurts ;-) Yeah, I've been bitten enough times that now I try to always test code before I post.
;) > Test #8: Same object of class C > (C.__eq__() implemented with equality of x, > C.__ne__() returning NotImplemented): > > obj1: type=, str=C(256), id=39406504 > obj2: type=, str=C(256), id=39406504 > > a) obj1 is obj2: True > C.__eq__(): self=39406504, other=39406504, returning True This is interesting/weird/odd -- why is __eq__ being called for an 'is' test?

--- test_eq.py ----------------------------
class TestEqTrue:
    def __eq__(self, other):
        print('Test.__eq__ returning True')
        return True

class TestEqFalse:
    def __eq__(self, other):
        print('Test.__eq__ returning False')
        return False

tet = TestEqTrue()
print(tet is tet)
print(tet in [tet])

tef = TestEqFalse()
print(tef is tef)
print(tef in [tef])
-------------------------------------------

When I run this all I get is four Trues, never any messages about being in __eq__. How did you get that result? -- ~Ethan~ From ethan at stoneleaf.us Mon Jul 14 06:52:37 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 13 Jul 2014 21:52:37 -0700 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: <53C36215.2080206@stoneleaf.us> On 07/13/2014 05:33 PM, Ben Hoyt wrote: > > On the recent python-dev thread, Victor especially made some well > thought out suggestions. It seems to me there's general agreement that > the basic API in PEP 471 is good (with Ethan not a fan at first, but > it seems he's on board after further discussion :-). I would still like to have 'info' and 'onerror' added to the basic API, but I agree that having methods and caching on first lookup is good. > That said, I think there's basically one thing remaining to decide: > whether or not to have DirEntry.is_dir() and .is_file() follow > symlinks by default.
We should have a flag for that, and default it to False: scandir(path, *, followlinks=False, info=None, onerror=None) -- ~Ethan~ From ethan at stoneleaf.us Mon Jul 14 07:51:04 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 13 Jul 2014 22:51:04 -0700 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members? In-Reply-To: <53C36BBA.8010406@gmx.de> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> <53C346A9.3050200@stoneleaf.us> <53C36BBA.8010406@gmx.de> Message-ID: <53C36FC8.3000707@stoneleaf.us> On 07/13/2014 10:33 PM, Andreas Maier wrote: > On 14.07.2014 04:55, Ethan Furman wrote: >> On 07/13/2014 08:13 AM, Andreas Maier wrote: >>> Test #8: Same object of class C >>> (C.__eq__() implemented with equality of x, >>> C.__ne__() returning NotImplemented): >>> >>> obj1: type=, str=C(256), id=39406504 >>> obj2: type=, str=C(256), id=39406504 >>> >>> a) obj1 is obj2: True >>> C.__eq__(): self=39406504, other=39406504, returning True >> >> This is interesting/weird/odd -- why is __eq__ being called for an 'is' >> test? > > The debug messages are printed before the result is printed. So this is the debug message for the next case, 8.b). Ah, whew! That's a relief. > Sorry for not explaining it. Had I been reading more closely I would (hopefully) have noticed that, but I was headed out the door at the time. -- ~Ethan~ From victor.stinner at gmail.com Mon Jul 14 10:18:31 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 14 Jul 2014 10:18:31 +0200 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: 2014-07-14 2:33 GMT+02:00 Ben Hoyt : > If we go with Victor's link-following .is_dir() and .is_file(), then > we probably need to add his suggestion of a follow_symlinks=False > parameter (defaults to True).
Either that or you have to say > "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit > less nice. You forgot one of my arguments: we must have exactly the same API as os.path.isdir() and pathlib.Path.is_dir(), because it would be very confusing (source of bugs) to have a different behaviour. Since these functions don't have any parameter (there is no such follow_symlink(s) parameter), I'm opposed to the idea of adding such a parameter. If you really want to add a follow_symlink optional parameter, IMO you should modify all os.path.is*() functions and all pathlib.Path.is*() methods to add it there too. Maybe if nobody asked for this feature before, it's because it's not useful in practice. You can simply test explicitly is_symlink() before checking is_dir(). Well, let's imagine DirEntry.is_dir() does not follow symlinks. How do you test is_dir() and follow symlinks? "stat.S_ISDIR(entry.stat().st_mode)" ? You have to import the stat module, and use the ugly C macro S_ISDIR(). Victor From victor.stinner at gmail.com Mon Jul 14 10:25:48 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 14 Jul 2014 10:25:48 +0200 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: 2014-07-14 4:17 GMT+02:00 Nick Coghlan : > Or the ever popular symlink to "." (or a directory higher in the tree). "." and ".." are explicitly ignored by os.listdir() and os.scandir(). > I think os.walk() is a good source of inspiration here: call the flag > "followlink" and default it to False. IMO the specific function os.walk() is not a good example. It includes symlinks to directories in the dirs list but does not follow them; it is a recursive function with a followlinks optional parameter (default: False).
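The os.walk() behaviour Victor describes (a symlink to a directory shows up in the dirnames list but is only descended into when followlinks=True) can be checked with a short script. The sketch below assumes a POSIX system where `os.symlink()` needs no extra privileges. The directory names and the `walk_listing` helper are made up for the demo.

```python
import os
import tempfile


def walk_listing(top, followlinks=False):
    """Map each directory os.walk() visits (relative to top) to its contents."""
    listing = {}
    for dirpath, dirnames, filenames in os.walk(top, followlinks=followlinks):
        listing[os.path.relpath(dirpath, top)] = (sorted(dirnames), sorted(filenames))
    return listing


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "real"))
    open(os.path.join(tmp, "real", "f.txt"), "w").close()
    os.symlink(os.path.join(tmp, "real"), os.path.join(tmp, "link"))

    no_follow = walk_listing(tmp)
    follow = walk_listing(tmp, followlinks=True)

# "link" is listed in dirnames either way, but only walked into with followlinks=True.
print(no_follow)
print(follow)
```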
Moreover, in 92% of cases, functions using os.listdir() and os.path.isdir() *follow* symlinks: https://mail.python.org/pipermail/python-dev/2014-July/135435.html Victor From victor.stinner at gmail.com Mon Jul 14 10:31:00 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 14 Jul 2014 10:31:00 +0200 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: <53C36215.2080206@stoneleaf.us> References: <53C36215.2080206@stoneleaf.us> Message-ID: 2014-07-14 6:52 GMT+02:00 Ethan Furman : > We should have a flag for that, and default it to False: > > scandir(path, *, followlinks=False, info=None, onerror=None) If you put the option on scandir(), what happens to name and full_name with followlinks=True? Do they contain the name in the directory (name of the symlink) or the name of the linked file? So it means that is_dir() may or may not follow symlinks depending on how the object was built? Victor From benhoyt at gmail.com Mon Jul 14 14:27:39 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Mon, 14 Jul 2014 08:27:39 -0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: First, just to clarify a couple of points. > You forgot one of my arguments: we must have exactly the same API as > os.path.isdir() and pathlib.Path.is_dir(), because it would be very > confusing (source of bugs) to have a different behaviour. Actually, I specifically included that argument. It's item (b) in the list in my original message yesterday. :-) > Since these functions don't have any parameter (there is no such > follow_symlink(s) parameter), I'm opposed to the idea of adding such a > parameter. > > If you really want to add a follow_symlink optional parameter, IMO you > should modify all os.path.is*() functions and all pathlib.Path.is*() > methods to add it there too. Maybe if nobody asked for this feature > before, it's because it's not useful in practice.
You can simply test > explicitly is_symlink() before checking is_dir(). Yeah, this is fair enough. > Well, let's imagine DirEntry.is_dir() does not follow symlinks. How do > you test is_dir() and follow symlinks? > "stat.S_ISDIR(entry.stat().st_mode)" ? You have to import the stat > module, and use the ugly C macro S_ISDIR(). No, you don't actually need stat/S_ISDIR in that case -- if DirEntry.is_dir() does not follow symlinks, you just say:

    entry.is_symlink() and os.path.isdir(entry.full_name)

Or for the full test:

    (entry.is_symlink() and os.path.isdir(entry.full_name)) or entry.is_dir()

On the other hand, if DirEntry.is_dir() does follow symlinks per your proposal, then to do is_dir without following symlinks you need to use DirEntry.lstat() like so:

    stat.S_ISDIR(entry.lstat().st_mode)

So from this perspective it's somewhat nicer to have DirEntry.is_X() not follow links and use DirEntry.is_symlink() and os.path.isX() to supplement that if you want to follow links. I think Victor has a good point that 92% of the stdlib calls that use listdir and isX do follow links. However, I think Tim Delaney makes some good points above about the (not so) safety of scandir following symlinks by default -- symlinks to network file systems, nonexistent files, or huge directory trees. In that light, this kind of thing should be opt-*in*. I guess I'm still slightly on the DirEntry-does-not-follow-links side of the fence, due to the fact that it's a method on the *directory entry* object, due to simplicity of implementation, and due to Tim Delaney's "it should be safe by default" point above. However, we're *almost* bikeshedding at this point, and I think we just need to pick one way or the other. It's straightforward to implement one in terms of the other in each case.
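To compare the formulations above side by side, the sketch below runs them against the `os.scandir()` API as eventually released in Python 3.5+, not the draft API debated here. Three name changes apply, all of them assumptions relative to this thread: `full_name` became `entry.path`, `is_dir()` takes a keyword-only `follow_symlinks` that defaults to True, and the proposed `lstat()` is spelled `entry.stat(follow_symlinks=False)`. The `with` form of `os.scandir()` needs Python 3.6+, and the demo assumes POSIX symlinks.

```python
import os
import stat
import tempfile


def describe(entry):
    """Compare the ways of asking 'is this a directory?' discussed in the thread."""
    return {
        # Released API: is_dir() follows symlinks by default.
        "is_dir": entry.is_dir(),
        "is_dir_nofollow": entry.is_dir(follow_symlinks=False),
        # Ben's manual formulation, with entry.path standing in for full_name.
        "manual_follow": (entry.is_symlink() and os.path.isdir(entry.path))
                         or entry.is_dir(follow_symlinks=False),
        # lstat-based formulation; stat(follow_symlinks=False) plays the
        # role of the proposed entry.lstat().
        "lstat_is_dir": stat.S_ISDIR(entry.stat(follow_symlinks=False).st_mode),
    }


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    os.symlink(os.path.join(tmp, "sub"), os.path.join(tmp, "link_to_sub"))
    os.symlink(os.path.join(tmp, "missing"), os.path.join(tmp, "dangling"))
    with os.scandir(tmp) as it:
        results = {entry.name: describe(entry) for entry in it}

for name in sorted(results):
    print(name, results[name])
```

On this tree the default `is_dir()` agrees with the manual `is_symlink() and os.path.isdir(...)` test for every entry, including the dangling link (both give False without raising). The non-following variants only report the real directory.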
-Ben From andreas.r.maier at gmx.de Mon Jul 14 07:33:46 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Mon, 14 Jul 2014 07:33:46 +0200 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members? In-Reply-To: <53C346A9.3050200@stoneleaf.us> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> <53C346A9.3050200@stoneleaf.us> Message-ID: <53C36BBA.8010406@gmx.de> On 14.07.2014 04:55, Ethan Furman wrote: > On 07/13/2014 08:13 AM, Andreas Maier wrote: >> Test #8: Same object of class C >> (C.__eq__() implemented with equality of x, >> C.__ne__() returning NotImplemented): >> >> obj1: type=, str=C(256), id=39406504 >> obj2: type=, str=C(256), id=39406504 >> >> a) obj1 is obj2: True >> C.__eq__(): self=39406504, other=39406504, returning True > > This is interesting/weird/odd -- why is __eq__ being called for an 'is' > test? The debug messages are printed before the result is printed. So this is the debug message for the next case, 8.b). Sorry for not explaining it. Andy From 4kir4.1i at gmail.com Mon Jul 14 07:51:24 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Mon, 14 Jul 2014 09:51:24 +0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() References: Message-ID: <87a98cpjxf.fsf@gmail.com> Nick Coghlan writes: > On 13 Jul 2014 20:54, "Tim Delaney" wrote: >> >> On 14 July 2014 10:33, Ben Hoyt wrote: >>> >>> >>> >>> If we go with Victor's link-following .is_dir() and .is_file(), then >>> we probably need to add his suggestion of a follow_symlinks=False >>> parameter (defaults to True). Either that or you have to say >>> "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit >>> less nice. >> >> >> Absolutely agreed that follow_symlinks is the way to go, disagree on the > default value.
>> >>> >>> Given the above arguments for symlink-following is_dir()/is_file() >>> methods (have I missed any, Victor?), what do others think? >> >> >> I would say whichever way you go, someone will assume the opposite. IMO > not following symlinks by default is safer. If you follow symlinks by > default then everyone has the following issues: >> >> 1. Crossing filesystems (including onto network filesystems); >> >> 2. Recursive directory structures (symlink to a parent directory); >> >> 3. Symlinks to non-existent files/directories; >> >> 4. Symlink to an absolutely huge directory somewhere else (very annoying > if you just wanted to do a directory sizer ...). >> >> If follow_symlinks=False by default, only those who opt-in have to deal > with the above. > > Or the ever popular symlink to "." (or a directory higher in the tree). > > I think os.walk() is a good source of inspiration here: call the flag > "followlink" and default it to False. > Let's not multiply entities beyond necessity. There is a well-defined *follow_symlinks* parameter (https://docs.python.org/3/library/os.html#follow-symlinks), e.g., os.access, os.chown, os.link, os.stat, os.utime and many other functions in the os module support the follow_symlinks parameter, see os.supports_follow_symlinks. os.walk is an exception that uses *followlinks*. It might be because it is an old function, e.g., the newer os.fwalk uses follow_symlinks. ------------------------------------------------------------ As has been said: os.path.isdir, pathlib.Path.is_dir in Python, File.directory? in Ruby, System.Directory.doesDirectoryExist in Haskell, and `test -d` in shell do follow symlinks, i.e., follow_symlinks=True as the default is more familiar for an .is_dir method. `cd path` in shell, os.chdir(path), `ls path`, os.listdir(path), and os.scandir(path) itself follow symlinks (even on Windows: http://bugs.python.org/issue13772 ).
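The existing follow_symlinks convention Akira points to can be seen with os.stat(), one of the functions he lists. The sketch below assumes a POSIX system (creating symlinks on Windows may need extra privileges). The file names are made up for the demo.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target.txt")
    link = os.path.join(tmp, "link.txt")
    with open(target, "w") as f:
        f.write("hello")
    os.symlink(target, link)

    followed = os.stat(link)                             # follow_symlinks=True is the default
    not_followed = os.stat(link, follow_symlinks=False)  # describes the link itself, like os.lstat()

    same_as_target = followed.st_ino == os.stat(target).st_ino
    is_link = stat.S_ISLNK(not_followed.st_mode)
    same_as_lstat = not_followed.st_ino == os.lstat(link).st_ino
    target_size = followed.st_size

# Which functions honour follow_symlinks=False is platform-dependent:
print(os.stat in os.supports_follow_symlinks)
print(same_as_target, is_link, same_as_lstat, target_size)
```

The default-follows, opt-out-with-False shape shown here is the same one the released DirEntry.is_dir() ended up using.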
GUI file managers such as `nautilus` also treat symlinks to directories as directories -- you may click on them to open the corresponding directories. Only *recursive* functions such as os.walk and os.fwalk do not follow symlinks by default, to avoid symlink loops. Note: this behavior is consistent with coreutils commands: `cp` follows symlinks for non-recursive actions, but the inherently recursive `du` utility doesn't follow symlinks by default. follow_symlinks=True as the default for the DirEntry.is_dir method avoids easy-to-introduce bugs when replacing old os.listdir/os.path.isdir code or writing new code using the same mental model. -- Akira From tisdall at gmail.com Mon Jul 14 15:57:06 2014 From: tisdall at gmail.com (Tim Tisdall) Date: Mon, 14 Jul 2014 09:57:06 -0400 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module Message-ID: I was interested in providing patches for the socket module to add Bluetooth 4.0 support. I couldn't find any details on how to provide contributions to the Python project, though... Is there some online documentation with guidelines on how to contribute? Should I just provide a patch to this mailing list? Also, is there a method to test changes against all the different *nix variations? Is Bluez the standard across the different *nix variations? -Tim
From martin at v.loewis.de Mon Jul 14 17:21:25 2014 From: martin at v.loewis.de (Martin v. Löwis) Date: Mon, 14 Jul 2014 17:21:25 +0200 Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues In-Reply-To: References: Message-ID: <53C3F575.9010602@v.loewis.de> On 12.07.14 17:19, Nick Coghlan wrote: > Using the stable ABI for standard library extensions also serves to > decouple them further from the internal details of the CPython runtime, > making it more likely they will be able to run correctly on alternative > interpreters (since emulating or otherwise supporting the limited API is > easier than supporting the whole thing). There are two features to be gained for the standard library from that:

A. With proper module shutdown support, it will be possible to release objects that are currently held in C global/static variables, as the C global variables will go away. This, in turn, is a step forward in the desire to allow for proper leak-free interpreter shutdown, and in the desire to base interpreter shutdown on GC.

B. With proper use of heap types (instead of the static type objects), support for the multiple-interpreter feature will be improved, since type objects will be per-interpreter, instead of being global. This, in turn, is desirable since otherwise state changes can leak from one interpreter to the other.

So I still maintain that the change is desirable even for the standard library. Regards, Martin From g.rodola at gmail.com Mon Jul 14 17:32:42 2014 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Mon, 14 Jul 2014 17:32:42 +0200 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module In-Reply-To: References: Message-ID: On Mon, Jul 14, 2014 at 3:57 PM, Tim Tisdall wrote: > I was interested in providing patches for the socket module to add > Bluetooth 4.0 support. I couldn't find any details on how to provide > contributions to the Python project, though...
Is there some online > documentation with guidelines on how to contribute? Should I just provide > a patch to this mailing list? > > Also, is there a method to test changes against all the different *nix > variations? Is Bluez the standard across the different *nix variations? > > -Tim Hello there, you can take a look at: https://docs.python.org/devguide/#contributing Patches must be submitted on the Python bug tracker: http://bugs.python.org/ -- Giampaolo - http://grodola.blogspot.com From skip at pobox.com Mon Jul 14 17:30:04 2014 From: skip at pobox.com (Skip Montanaro) Date: Mon, 14 Jul 2014 10:30:04 -0500 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module In-Reply-To: References: Message-ID: On Mon, Jul 14, 2014 at 8:57 AM, Tim Tisdall wrote: > Is there some online documentation with guidelines on how to contribute? http://lmgtfy.com/?q=contribute+to+python Skip From brett at python.org Mon Jul 14 17:41:57 2014 From: brett at python.org (Brett Cannon) Date: Mon, 14 Jul 2014 15:41:57 +0000 Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues References: <53C3F575.9010602@v.loewis.de> Message-ID: On Mon Jul 14 2014 at 11:27:34 AM, "Martin v. Löwis" wrote: > On 12.07.14 17:19, Nick Coghlan wrote: > > Using the stable ABI for standard library extensions also serves to > > decouple them further from the internal details of the CPython runtime, > > making it more likely they will be able to run correctly on alternative > > interpreters (since emulating or otherwise supporting the limited API is > > easier than supporting the whole thing). > > There are two features to be gained for the standard library from that > > A.
with proper module shutdown support, it will be possible to release > objects that are currently held in C global/static variables, as the > C global variables will go away. This, in turn, is a step forward in > the desire to allow for proper leak-free interpreter shutdown, and > in the desire to base interpreter shutdown on GC. > > B. with proper use of heap types (instead of the static type objects), > support for the multiple-interpreter feature will be improved, since > type objects will be per-interpreter, instead of being global. This, > in turn, is desirable since otherwise state changes can leak from > one interpreter to the other. > > So I still maintain that the change is desirable even for the standard > library. > I agree for PEP 3121, which is the initialization/finalization work. The stable ABI is not necessary. So maybe we should re-examine the patches and accept the bits that clean up init/finalization and leave out any ABI-related changes. From brian at python.org Mon Jul 14 17:53:47 2014 From: brian at python.org (Brian Curtin) Date: Mon, 14 Jul 2014 10:53:47 -0500 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module In-Reply-To: References: Message-ID: On Mon, Jul 14, 2014 at 10:30 AM, Skip Montanaro wrote: > On Mon, Jul 14, 2014 at 8:57 AM, Tim Tisdall wrote: > > Is there some online documentation with guidelines on how to contribute? > > http://lmgtfy.com/?q=contribute+to+python This response is unacceptable. Tim: check out https://docs.python.org/devguide/ and perhaps look at the core-mentorship[0] mailing list while coming up with your first contributions. It's a good first step to getting some guidance on the process and getting some eyes on your early patches. [0] https://mail.python.org/mailman/listinfo/core-mentorship/
From skip at pobox.com Mon Jul 14 18:09:55 2014 From: skip at pobox.com (Skip Montanaro) Date: Mon, 14 Jul 2014 11:09:55 -0500 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module In-Reply-To: References: Message-ID: On Mon, Jul 14, 2014 at 10:53 AM, Brian Curtin wrote: >> > Is there some online documentation with guidelines on how to contribute? >> >> http://lmgtfy.com/?q=contribute+to+python > > > This response is unacceptable. Tim and I already discussed this offline. I admitted to being in a bit of a snarky mood today, and he seems to have accepted my post in good natured fashion. I should have at least added a smiley to my post. I will refrain from attempts at unadorned levity in the future. As penance, Tim or Brian, if you are in or near Chicago, look me up. I'd be happy to buy y'all a beer. Skip From ethan at stoneleaf.us Mon Jul 14 18:16:22 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 14 Jul 2014 09:16:22 -0700 Subject: [Python-Dev] Python Job Board Message-ID: <53C40256.3020101@stoneleaf.us> has now been dead for five months. -- ~Ethan~ From hasan.diwan at gmail.com Mon Jul 14 18:20:36 2014 From: hasan.diwan at gmail.com (Hasan Diwan) Date: Mon, 14 Jul 2014 09:20:36 -0700 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module In-Reply-To: References: Message-ID: Would http://lmbtfy.com/?q=contribute+to+python# be more or less acceptable? -- H On 14 July 2014 09:09, Skip Montanaro wrote: > On Mon, Jul 14, 2014 at 10:53 AM, Brian Curtin wrote: > >> > Is there some online documentation with guidelines on how to > contribute? > >> > >> http://lmgtfy.com/?q=contribute+to+python > > > > > > This response is unacceptable. > > Tim and I already discussed this offline. I admitted to being in a bit > of a snarky mood today, and he seems to have accepted my post in good > natured fashion. I should have at least added a smiley to my post. I > will refrain from attempts at unadorned levity in the future.
> > As penance, Tim or Brian, if you are in or near Chicago, look me > up. I'd be happy to buy y'all a beer. > > Skip > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/hasan.diwan%40gmail.com > -- Sent from my mobile device From tisdall at gmail.com Mon Jul 14 17:57:06 2014 From: tisdall at gmail.com (Tim Tisdall) Date: Mon, 14 Jul 2014 11:57:06 -0400 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module In-Reply-To: References: Message-ID: Naw, I'd accept that response. I think I searched on Friday, but forgot about finding that. :) There's enough traffic on a mailing list without useless noise. Thanks for all the responses. On Mon, Jul 14, 2014 at 11:53 AM, Brian Curtin wrote: > On Mon, Jul 14, 2014 at 10:30 AM, Skip Montanaro wrote: > >> On Mon, Jul 14, 2014 at 8:57 AM, Tim Tisdall wrote: >> > Is there some online documentation with guidelines on how to contribute? >> >> http://lmgtfy.com/?q=contribute+to+python > > > This response is unacceptable. > > Tim: check out https://docs.python.org/devguide/ and perhaps look at the > core-mentorship[0] mailing list while coming up with your first > contributions. It's a good first step to getting some guidance on the > process and getting some eyes on your early patches. > > [0] https://mail.python.org/mailman/listinfo/core-mentorship/ >
> This is the wrong place to ask about this. It falls under the purview of the web site who you can email at webmaster@ or submit an issue at https://github.com/python/pythondotorg . But I know from PSF status reports that it's being actively rewritten and fixed to make it manageable for more than one person to run easily. -------------- next part -------------- An HTML attachment was scrubbed... URL: From skip at pobox.com Mon Jul 14 19:43:24 2014 From: skip at pobox.com (Skip Montanaro) Date: Mon, 14 Jul 2014 12:43:24 -0500 Subject: [Python-Dev] Python Job Board In-Reply-To: References: <53C40256.3020101@stoneleaf.us> Message-ID: On Mon, Jul 14, 2014 at 11:59 AM, Brett Cannon wrote: > This is the wrong place to ask about this. It falls under the purview of the > web site who you can email at webmaster@ or submit an issue at > https://github.com/python/pythondotorg . But I know from PSF status reports > that it's being actively rewritten and fixed to make it manageable for more > than one person to run easily. Agree with that. I originally skipped this post because I'm pretty sure MAL who is heavily involved with the rewrite effort) still hangs out here. I will modify Brett's admonition a bit though. A better place to comment about the job board (and perhaps volunteer to help with the current effort) is jobs at python.org. Skip From alexander.belopolsky at gmail.com Mon Jul 14 20:10:16 2014 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 14 Jul 2014 14:10:16 -0400 Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues In-Reply-To: References: <53C3F575.9010602@v.loewis.de> Message-ID: On Mon, Jul 14, 2014 at 11:41 AM, Brett Cannon wrote: > So maybe we should re-examine the patches and accept the bits that clean > up init/finalization and leave out any ABI-related changes. This is precisely what I suggested two years ago. 
http://bugs.python.org/issue15390#msg170249 I am not against ABI-related changes in principle, but I think these changes should be carefully considered on a case-by-case basis and not applied wholesale. From ethan at stoneleaf.us Mon Jul 14 20:24:55 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 14 Jul 2014 11:24:55 -0700 Subject: [Python-Dev] Python Job Board In-Reply-To: References: <53C40256.3020101@stoneleaf.us> Message-ID: <53C42077.9070408@stoneleaf.us> On 07/14/2014 10:43 AM, Skip Montanaro wrote: > On Mon, Jul 14, 2014 at 11:59 AM, Brett Cannon wrote: >> >> This is the wrong place to ask about this. It falls under the purview of the >> web site who you can email at webmaster@ or submit an issue at >> https://github.com/python/pythondotorg . But I know from PSF status reports >> that it's being actively rewritten and fixed to make it manageable for more >> than one person to run easily. > > Agree with that. I originally skipped this post because I'm pretty > sure MAL (who is heavily involved with the rewrite effort) still hangs > out here. I will modify Brett's admonition a bit though. A better > place to comment about the job board (and perhaps volunteer to help > with the current effort) is jobs at python.org. Mostly just hoping to raise awareness in case anybody here is able/willing to pitch in. -- ~Ethan~ From tjreedy at udel.edu Mon Jul 14 22:42:25 2014 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 14 Jul 2014 16:42:25 -0400 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module In-Reply-To: References: Message-ID: On 7/14/2014 9:57 AM, Tim Tisdall wrote: 2 questions not answered yet. > Also, is there a method to test changes against all the different *nix > variations? We have a set of buildbots. https://www.python.org/dev/buildbot/ > Is Bluez the standard across the different *nix variations? No idea.
-- Terry Jan Reedy From hasan.diwan at gmail.com Mon Jul 14 22:46:06 2014 From: hasan.diwan at gmail.com (Hasan Diwan) Date: Mon, 14 Jul 2014 13:46:06 -0700 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module In-Reply-To: References: Message-ID: Tim, Are you aware of https://code.google.com/p/pybluez/ ? -- H On 14 July 2014 13:42, Terry Reedy wrote: > On 7/14/2014 9:57 AM, Tim Tisdall wrote: > > 2 questions not answered yet. > > > Also, is there a method to test changes against all the different *nix >> variations? >> > > We have a set of buildbots. > https://www.python.org/dev/buildbot/ > > > Is Bluez the standard across the different *nix variations? >> > > No idea. > > -- > Terry Jan Reedy > -- Sent from my mobile device From rdmurray at bitdance.com Mon Jul 14 23:30:56 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Mon, 14 Jul 2014 17:30:56 -0400 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module In-Reply-To: References: Message-ID: <20140714213056.7A3C1250DFD@webabinitio.net> On Mon, 14 Jul 2014 16:42:25 -0400, Terry Reedy wrote: > On 7/14/2014 9:57 AM, Tim Tisdall wrote: > > 2 questions not answered yet. > > > Also, is there a method to test changes against all the different *nix > > variations? > > We have a set of buildbots. > https://www.python.org/dev/buildbot/ > > > Is Bluez the standard across the different *nix variations? > > No idea. It would be really nice to answer that and the related testing questions. The socket module has bluetooth support, but there are no tests.
An effort to write some was started at the Bloomberg sprint last month, but nothing has been posted to the issue yet: http://bugs.python.org/issue7687 Is Bluetooth 4.0 something different from what the socket module already has? --David From tisdall at gmail.com Tue Jul 15 01:08:43 2014 From: tisdall at gmail.com (Tim Tisdall) Date: Mon, 14 Jul 2014 19:08:43 -0400 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module In-Reply-To: References: Message-ID: Quite aware. I'm pretty sure it has no 4.x LE capabilities. Last I checked it seemed like a dead project, but there seems to be some activity there now. On Jul 14, 2014 4:47 PM, "Hasan Diwan" wrote: > Tim, > Are you aware of https://code.google.com/p/pybluez/ ? -- H > > > On 14 July 2014 13:42, Terry Reedy wrote: > >> On 7/14/2014 9:57 AM, Tim Tisdall wrote: >> >> 2 questions not answered yet. >> >> >> Also, is there a method to test changes against all the different *nix >>> variations? >>> >> >> We have a set of buildbots. >> https://www.python.org/dev/buildbot/ >> >> >> Is Bluez the standard across the different *nix variations? >>> >> >> No idea. >> >> -- >> Terry Jan Reedy > > -- > Sent from my mobile device
From tisdall at gmail.com Tue Jul 15 01:13:32 2014 From: tisdall at gmail.com (Tim Tisdall) Date: Mon, 14 Jul 2014 19:13:32 -0400 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module In-Reply-To: <20140714213056.7A3C1250DFD@webabinitio.net> References: <20140714213056.7A3C1250DFD@webabinitio.net> Message-ID: The major change is to the Bluetooth address struct. It now has an added value for the distinction between "public" and "random" 4.x addresses. Also some added constants to open LE connections. On Jul 14, 2014 5:32 PM, "R. David Murray" wrote: > On Mon, 14 Jul 2014 16:42:25 -0400, Terry Reedy wrote: > > On 7/14/2014 9:57 AM, Tim Tisdall wrote: > > > > 2 questions not answered yet. > > > > > Also, is there a method to test changes against all the different *nix > > > variations? > > > > We have a set of buildbots. > > https://www.python.org/dev/buildbot/ > > > > > Is Bluez the standard across the different *nix variations? > > > > No idea. > > It would be really nice to answer that and the related testing questions. > The socket module has bluetooth support, but there are no tests. > An effort to write some was started at the Bloomberg sprint last month, > but nothing has been posted to the issue yet: > > http://bugs.python.org/issue7687 > > Is Bluetooth 4.0 something different from what the socket module already > has? > > --David
From wes.turner at gmail.com Tue Jul 15 03:01:19 2014 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 14 Jul 2014 20:01:19 -0500 Subject: [Python-Dev] Python Job Board In-Reply-To: <53C42077.9070408@stoneleaf.us> References: <53C40256.3020101@stoneleaf.us> <53C42077.9070408@stoneleaf.us> Message-ID: From http://www.reddit.com/r/Python/comments/17c69p/i_was_told_by_a_friend_that_learning_python_for/c84bswd : >* http://www.python.org/community/jobs/ >* https://jobs.github.com/positions?description=python >* http://careers.joelonsoftware.com/jobs?searchTerm=python >* http://www.linkedin.com/jsearch?keywords=python >* http://www.indeed.com/q-Python-jobs.html >* http://www.simplyhired.com/a/jobs/list/q-python >* http://seeker.dice.com/jobsearch/servlet/JobSearch?op=300&FREE_TEXT=python >* http://careers.stackoverflow.com/jobs/tag/python >* http://www.pythonjobs.com/ >* http://www.djangojobs.org/ -- Wes Turner On Mon, Jul 14, 2014 at 1:24 PM, Ethan Furman wrote: > On 07/14/2014 10:43 AM, Skip Montanaro wrote: >> On Mon, Jul 14, 2014 at 11:59 AM, Brett Cannon wrote: >>> >>> >>> This is the wrong place to ask about this. It falls under the purview of >>> the >>> web site who you can email at webmaster@ or submit an issue at >>> https://github.com/python/pythondotorg . But I know from PSF status >>> reports >>> that it's being actively rewritten and fixed to make it manageable for >>> more >>> than one person to run easily. >> >> >> Agree with that. I originally skipped this post because I'm pretty >> sure MAL (who is heavily involved with the rewrite effort) still hangs >> out here. I will modify Brett's admonition a bit though. A better >> place to comment about the job board (and perhaps volunteer to help >> with the current effort) is jobs at python.org. > > > Mostly just hoping to raise awareness in case anybody here is able/willing > to pitch in.
> > -- > ~Ethan~ From benhoyt at gmail.com Tue Jul 15 04:48:41 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Mon, 14 Jul 2014 22:48:41 -0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: <87a98cpjxf.fsf@gmail.com> References: <87a98cpjxf.fsf@gmail.com> Message-ID: > Let's not multiply entities beyond necessity. > > There is well-defined *follow_symlinks* parameter > https://docs.python.org/3/library/os.html#follow-symlinks > e.g., os.access, os.chown, os.link, os.stat, os.utime and many other > functions in os module support follow_symlinks parameter, see > os.supports_follow_symlinks. Huh, interesting. I didn't know os.stat() had a follow_symlinks parameter -- when False, it's equivalent to lstat. If DirEntry has a .stat(follow_symlinks=True) method, we don't actually need lstat(). > os.walk is an exception that uses *followlinks*. It might be because it > is an old function e.g., newer os.fwalk uses follow_symlinks. Yes, I'm sure that's correct. Today it'd be called follow_symlinks, but obviously one can't change os.walk() anymore. > Only *recursive* functions such as os.walk, os.fwalk do not follow > symlinks by default, to avoid symlink loops. [...] > > follow_symlinks=True as default for DirEntry.is_dir method allows to > avoid easy-to-introduce bugs while replacing old > os.listdir/os.path.isdir code or writing a new code using the same > mental model. I think these are good points, especially that of porting existing listdir()/os.path.isdir() code and avoiding bugs.
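A quick way to confirm Ben's reading of os.stat() is to stat the same symlink both ways. This is a minimal sketch, assuming a POSIX system where os.symlink() needs no special privileges; describe_link is a hypothetical helper, not part of any proposal in this thread.

```python
import os
import stat


def describe_link(link_path):
    """Stat a symlink with and without following it (POSIX sketch).

    os.stat(path, follow_symlinks=False) reports on the link itself,
    like os.lstat(path); plain os.stat(path) reports on the target.
    """
    followed = os.stat(link_path)  # follows the link to its target
    not_followed = os.stat(link_path, follow_symlinks=False)  # the link itself
    via_lstat = os.lstat(link_path)
    return {
        "followed_is_regular": stat.S_ISREG(followed.st_mode),
        "not_followed_is_link": stat.S_ISLNK(not_followed.st_mode),
        "same_as_lstat": (not_followed.st_ino, not_followed.st_mode)
        == (via_lstat.st_ino, via_lstat.st_mode),
    }
```

For a symlink that points at a regular file, all three values come back true. To check whether a particular os function honours follow_symlinks on the current platform, test membership in os.supports_follow_symlinks (for example, `os.stat in os.supports_follow_symlinks`).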
As I mentioned, I was really on the fence about the link-following thing, but if it's a tiny bit harder to implement and it avoids bugs (and I already had a bug with this when implementing os.walk), that's a worthwhile trade-off. In light of that, I propose I update the PEP to basically follow Victor's model of is_X() and stat() following symlinks by default, and allowing you to specify follow_symlinks=False if you want something other than that. Victor had one other question: > What happens to name and full_name with followlinks=True? > Do they contain the name in the directory (name of the symlink) > or name of the linked file? I would say they should contain the name and full path of the entry -- the symlink, NOT the linked file. They kind of have to, right, otherwise they'd have to be method calls that potentially call the system. In any case, here's the modified proposal: scandir(path='.') -> generator of DirEntry objects, which have: * name: name as per listdir() * full_name: full path name (not necessarily absolute), equivalent of os.path.join(path, entry.name) * is_dir(follow_symlinks=True): like os.path.isdir(entry.full_name), but free in most cases; cached per entry * is_file(follow_symlinks=True): like os.path.isfile(entry.full_name), but free in most cases; cached per entry * is_symlink(): like os.path.islink(), but free in most cases; cached per entry * stat(follow_symlinks=True): like os.stat(entry.full_name, follow_symlinks=follow_symlinks); cached per entry The above may not be quite perfect, but it's good, and I think there's been enough bike-shedding on the API. :-) So please speak now or forever hold your peace. :-) I intend to update the PEP to reflect this and make a few other clarifications in the next few days.
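To show how the proposed methods read in use, here is a hedged sketch written against os.scandir() as it was eventually released in Python 3.5, where the proposal's full_name attribute is spelled path. classify_entries is a hypothetical helper; it passes follow_symlinks=False so that a symlink to a directory is not counted as a directory, and it uses os.readlink() to report link targets.

```python
import os


def classify_entries(top="."):
    """Split the entries of *top* into directories, regular files and symlinks.

    Uses the released os.scandir() API; DirEntry.path corresponds to the
    full_name attribute in the proposal.
    """
    dirs, files, links = [], [], {}
    with os.scandir(top) as it:  # context-manager form needs Python 3.6+
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)  # real directories only
            elif entry.is_file(follow_symlinks=False):
                files.append(entry.name)  # real regular files only
            elif entry.is_symlink():
                # os.readlink() returns the stored target without resolving it.
                links[entry.name] = os.readlink(entry.path)
    return sorted(dirs), sorted(files), links
```

With the default follow_symlinks=True, is_dir() would instead count a symlink to a directory as a directory, matching os.path.isdir().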
-Ben From ethan at stoneleaf.us Tue Jul 15 04:57:30 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 14 Jul 2014 19:57:30 -0700 Subject: [Python-Dev] Python Job Board In-Reply-To: References: <53C40256.3020101@stoneleaf.us> <53C42077.9070408@stoneleaf.us> Message-ID: <53C4989A.7040203@stoneleaf.us> On 07/14/2014 06:01 PM, Wes Turner wrote: > From http://www.reddit.com/r/Python/comments/17c69p/i_was_told_by_a_friend_that_learning_python_for/c84bswd > : > >> * http://www.python.org/community/jobs/ >> * https://jobs.github.com/positions?description=python >> * http://careers.joelonsoftware.com/jobs?searchTerm=python >> * http://www.linkedin.com/jsearch?keywords=python >> * http://www.indeed.com/q-Python-jobs.html >> * http://www.simplyhired.com/a/jobs/list/q-python >> * http://seeker.dice.com/jobsearch/servlet/JobSearch?op=300&FREE_TEXT=python >> * http://careers.stackoverflow.com/jobs/tag/python >> * http://www.pythonjobs.com/ >> * http://www.djangojobs.org/ Nice, thanks! -- ~Ethan~ From ethan at stoneleaf.us Tue Jul 15 05:00:51 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 14 Jul 2014 20:00:51 -0700 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: <87a98cpjxf.fsf@gmail.com> Message-ID: <53C49963.30509@stoneleaf.us> On 07/14/2014 07:48 PM, Ben Hoyt wrote: > > In any case, here's the modified proposal: > > scandir(path='.') -> generator of DirEntry objects, which have: > > * name: name as per listdir() > * full_name: full path name (not necessarily absolute), equivalent of > os.path.join(path, entry.name) > * is_dir(follow_symlinks=True): like os.path.isdir(entry.full_name), > but free in most cases; cached per entry > * is_file(follow_symlinks=True): like os.path.isfile(entry.full_name), > but free in most cases; cached per entry > * is_symlink(): like os.path.islink(), but free in most cases; cached per entry > * stat(follow_symlinks=True): like os.stat(entry.full_name, > 
follow_symlinks=follow_symlinks); cached per entry > > The above may not be quite perfect, but it's good, and I think there's > been enough bike-shedding on the API. :-) Looks doable. Just make sure the cached entries reflect the 'follow_symlinks' setting -- so a symlink could end up with both an lstat cached entry and a stat cached entry. -- ~Ethan~ From victor.stinner at gmail.com Tue Jul 15 08:25:52 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 15 Jul 2014 08:25:52 +0200 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: <87a98cpjxf.fsf@gmail.com> Message-ID: On Tuesday 15 July 2014, Ben Hoyt wrote: > > > Victor had one other question: > > > What happens to name and full_name with followlinks=True? > > Do they contain the name in the directory (name of the symlink) > > or name of the linked file? > > I would say they should contain the name and full path of the entry -- > the symlink, NOT the linked file. They kind of have to, right, > otherwise they'd have to be method calls that potentially call the > system. > Sorry, I don't remember who but someone proposed to add the follow_symlinks parameter in scandir() directly. If the parameter is added to methods, there is no such issue. I like the compromise of adding an optional follow_symlinks to is_xxx() and stat() methods. No need for .lstat(). Again: remove any guarantee about the cache in the definitions of methods, instead copy the doc from os.path and os. Add a global remark saying that most methods don't need any syscall in general, except for symlinks (with follow_symlinks=True). Victor
From ncoghlan at gmail.com Tue Jul 15 13:09:12 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Jul 2014 06:09:12 -0500 Subject: [Python-Dev] PEP 3121, 384 Refactoring Issues In-Reply-To: References: <53C3F575.9010602@v.loewis.de> Message-ID: On 14 Jul 2014 11:41, "Brett Cannon" wrote: > > > I agree for PEP 3121 which is the initialization/finalization work. The stable ABI is not necessary. So maybe we should re-examine the patches and accept the bits that clean up init/finalization and leave out any ABI-related changes. Martin's right about improving the subinterpreter support - every type declaration we move from a static struct to the dynamic type creation API is one that isn't shared between subinterpreters any more. That argument is potentially valid even for *builtin* modules and types, not just those in extension modules. Cheers, Nick. From ncoghlan at gmail.com Tue Jul 15 13:24:14 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Jul 2014 06:24:14 -0500 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: <87a98cpjxf.fsf@gmail.com> Message-ID: On 14 Jul 2014 22:50, "Ben Hoyt" wrote: > > In light of that, I propose I update the PEP to basically follow > Victor's model of is_X() and stat() following symlinks by default, and > allowing you to specify follow_symlinks=False if you want something > other than that. > > Victor had one other question: > > > What happens to name and full_name with followlinks=True? > > Do they contain the name in the directory (name of the symlink) > > or name of the linked file? > > I would say they should contain the name and full path of the entry -- > the symlink, NOT the linked file. They kind of have to, right, > otherwise they'd have to be method calls that potentially call the > system.
It would be worth explicitly pointing out "os.readlink(entry.full_name)" in the docs as the way to get the target of a symlink entry. Alternatively, it may be worth including a readlink() method directly on the entry objects. (That can easily be added later though, so no need for it in the initial proposal). > > In any case, here's the modified proposal: > > scandir(path='.') -> generator of DirEntry objects, which have: > > * name: name as per listdir() > * full_name: full path name (not necessarily absolute), equivalent of > os.path.join(path, entry.name) > * is_dir(follow_symlinks=True): like os.path.isdir(entry.full_name), > but free in most cases; cached per entry > * is_file(follow_symlinks=True): like os.path.isfile(entry.full_name), > but free in most cases; cached per entry > * is_symlink(): like os.path.islink(), but free in most cases; cached per entry > * stat(follow_symlinks=True): like os.stat(entry.full_name, > follow_symlinks=follow_symlinks); cached per entry > > The above may not be quite perfect, but it's good, and I think there's > been enough bike-shedding on the API. :-) +1, sounds good to me (and I like having the caching guarantees listed - helps make it clear how DirEntry differs from pathlib.Path) Cheers, Nick. > > So please speak now or forever hold your peace. :-) I intend to update > the PEP to reflect this and make a few other clarifications in the > next few days. > > -Ben
From benhoyt at gmail.com Tue Jul 15 14:01:16 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 15 Jul 2014 08:01:16 -0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: <53C49963.30509@stoneleaf.us> References: <87a98cpjxf.fsf@gmail.com> <53C49963.30509@stoneleaf.us> Message-ID: > Looks doable. Just make sure the cached entries reflect the > 'follow_symlinks' setting -- so a symlink could end up with both an lstat > cached entry and a stat cached entry. Yes, good point -- basically the functions will use the _stat cache if follow_symlinks=True, otherwise the _lstat cache. If the entry is not a symlink (the usual case), they'll be the same value. -Ben From benhoyt at gmail.com Tue Jul 15 14:05:55 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 15 Jul 2014 08:05:55 -0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: <87a98cpjxf.fsf@gmail.com> Message-ID: > Sorry, I don't remember who but someone proposed to add the follow_symlinks > parameter in scandir() directly. If the parameter is added to methods, > there is no such issue. Yeah, I think having the DirEntry methods do different things depending on how scandir() was called is a really bad idea. It seems you're agreeing with this? > Again: remove any guarantee about the cache in the definitions of methods, > instead copy the doc from os.path and os. Add a global remark saying that > most methods don't need any syscall in general, except for symlinks (with > follow_symlinks=True). I'm not sure I follow this -- surely it *has* to be documented that the values of DirEntry.is_X() and DirEntry.stat() are cached per entry, in contrast to os.path.isX()/os.stat()? I don't mind a global remark about not needing syscalls, but I do think it makes sense to make it explicit -- that is_X() almost never need syscalls, whereas stat() does only on POSIX but is free on Windows (except for symlinks).
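Ethan's point, and Ben's reply about separate _stat and _lstat caches, can be sketched as a small class. CachingEntry is hypothetical and only models the caching rule under discussion: one cached result per follow_symlinks setting, shared when the entry is not a symlink. The os.DirEntry.stat() documentation in later Python versions likewise describes separate caches for follow_symlinks=True and False.

```python
import os
from stat import S_ISLNK


class CachingEntry:
    """Hypothetical DirEntry-like object with per-setting stat caches.

    Not os.DirEntry: it only illustrates how results for
    follow_symlinks=True and follow_symlinks=False can be cached
    independently, so a symlink may end up holding both.
    """

    def __init__(self, path):
        self.path = path
        self._lstat = None  # cached result for follow_symlinks=False
        self._stat = None   # cached result for follow_symlinks=True
        self.calls = 0      # number of real stat system calls made

    def stat(self, *, follow_symlinks=True):
        if self._lstat is None:
            self.calls += 1
            self._lstat = os.lstat(self.path)
        if not follow_symlinks:
            return self._lstat
        if self._stat is None:
            if S_ISLNK(self._lstat.st_mode):
                self.calls += 1
                self._stat = os.stat(self.path)  # resolve the link once
            else:
                self._stat = self._lstat  # not a link: both views agree
        return self._stat
```

For a regular file, any mix of stat() calls costs a single os.lstat(); for a symlink, the first follow_symlinks=True call adds one os.stat() to resolve the target, and later calls are served from the caches.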
-Ben From benhoyt at gmail.com Tue Jul 15 14:19:35 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 15 Jul 2014 08:19:35 -0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: <87zjgbni64.fsf@gmail.com> References: <87a98cpjxf.fsf@gmail.com> <87zjgbni64.fsf@gmail.com> Message-ID: > I'd *keep DirEntry.lstat() method* regardless of existence of > .stat(*, follow_symlinks=True) method (despite the slight violation of > DRY principle) for readability. `dir_entry.lstat().st_mode` is more > concise than `dir_entry.stat(follow_symlinks=False).st_mode` and the > meaning of lstat is well-established -- get (symbolic link) status [2]. The meaning of lstat() is well-established, so I don't mind this. But I don't think it's necessary, either. My thought would be that in new code/functions we should kind of prescribe best practices rather than leave the options open. Yes, it's a few more characters, but "follow_symlinks=False" is a lot clearer than "l" to describe this behaviour, especially for non-Linux hackers. > I suggest *renaming .full_name -> .path* due to reasons outlined in [1]. > > [1]: https://mail.python.org/pipermail/python-dev/2014-July/135441.html Hmmm, perhaps. You suggest .full_name implies it's the absolute path, which isn't true. I don't mind .path, but it kind of sounds like "the Path object associated with this entry". I think "full_name" is fine -- it's not "abs_name". > follow_symlinks (if added) should be *keyword-only parameter* because > `dir_entry.is_dir(False)` is unreadable (it is not clear at a glance > what `False` means in this case). Agreed, follow_symlinks should be a keyword-only parameter (as it is in os.stat() in Python 3). > Exceptions are part of the public API. pathlib is inconsistent with > os.path here e.g., os.path.isdir() ignores all OS errors raised by > the stat() call but the corresponding pathlib call ignores only broken > symlinks (non-existent entries).
> > The cherry-picking of which stat errors to silence (implicitly) seems > worse than either silencing the errors (like os.path.isdir does) or > allowing them to propagate. Hmmm, you're right there's a subtle difference here. I think the os.path.isdir() behaviour could mask real errors, and the pathlib behaviour is more correct. pathlib's behaviour is not implicit though -- it's clearly documented in the docs: https://docs.python.org/3/library/pathlib.html#pathlib.Path.is_dir > Returning False instead of raising OSError in is_dir() method simplifies > the usage greatly without (much) negative consequences. It is a *rare* > case when silencing errors could be more practical. I think is_X() *should* fail if there are permission errors or other fatal errors. Whether or not they should fail if the file doesn't exist (unlikely to happen anyway) or on a broken symlink is a different question, but there's a good precedent with the existing os/pathlib functions there. -Ben From p.f.moore at gmail.com Tue Jul 15 14:31:16 2014 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 15 Jul 2014 13:31:16 +0100 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: <87a98cpjxf.fsf@gmail.com> <87zjgbni64.fsf@gmail.com> Message-ID: On 15 July 2014 13:19, Ben Hoyt wrote: > Hmmm, perhaps. You suggest .full_name implies it's the absolute path, > which isn't true. I don't mind .path, but it kind of sounds like "the > Path object associated with this entry". I think "full_name" is fine > -- it's not "abs_name". Interesting. I hadn't really thought about it, but I might have assumed full_name was absolute. However, now I see that it's "only as absolute as the directory argument to scandir is". Having said that, I don't think that full_name *implies* that, just that it's a possible mistake people could make. I agree that "path" could be seen as implying a Path object.
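Paul's description of full_name as "only as absolute as the directory argument to scandir" is easy to see in practice. This sketch uses the released os.scandir(), whose DirEntry.path attribute corresponds to the full_name proposed here and is built as os.path.join(top, entry.name); entry_paths is a hypothetical helper.

```python
import os


def entry_paths(top):
    """Map each entry name in *top* to the path os.scandir() reports for it.

    DirEntry.path is os.path.join(top, entry.name), so it is relative
    whenever *top* is relative and absolute whenever *top* is absolute.
    """
    with os.scandir(top) as it:  # context-manager form needs Python 3.6+
        return {entry.name: entry.path for entry in it}
```

Calling entry_paths("sub") therefore yields paths such as sub/a.txt, while entry_paths(os.path.abspath("sub")) yields absolute ones; no normalisation happens behind the caller's back.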
My preference would be to retain the name full_name, but just make it explicit in the documentation that it is based on the directory name argument. Paul From ethan at stoneleaf.us Tue Jul 15 18:41:40 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 15 Jul 2014 09:41:40 -0700 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: <87a98cpjxf.fsf@gmail.com> Message-ID: <53C559C4.20708@stoneleaf.us> On 07/14/2014 11:25 PM, Victor Stinner wrote: > > Again: remove any guarantee about the cache in the definitions of methods, > instead copy the doc from os.path and os. Add a global remark saying that > most methods don't need any syscall in general, except for symlinks (with > follow_symlinks=True). I don't understand what you're saying here. The fact that DirEntry.is_xxx will use cached values *must* be documented, or our users will waste huge amounts of time trying to figure out why an unknowingly cached value is no longer matching the current status. ~Ethan~ From rowen at uw.edu Wed Jul 16 01:48:48 2014 From: rowen at uw.edu (Russell E. Owen) Date: Tue, 15 Jul 2014 16:48:48 -0700 Subject: [Python-Dev] Another case for frozendict References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> Message-ID: In article , Chris Angelico wrote: > On Mon, Jul 14, 2014 at 12:04 AM, Jason R. Coombs wrote: > > I can achieve what I need by constructing a set on the 'items' of the dict. > > > >>>> set(tuple(doc.items()) for doc in res) > > > > {(('n', 1), ('err', None), ('ok', 1.0))} > > This is flawed; the tuple-of-tuples depends on iteration order, which > may vary. It should be a frozenset of those tuples, not a tuple. Which > strengthens your case; it's that easy to get it wrong in the absence > of an actual frozendict. I would love to see frozendict in python. I find myself using dicts for translation tables, usually tables that should not be modified.
Documentation usually suffices to get that idea across, but it's not ideal. frozendict would also be handy as default values for function arguments. In that case documentation isn't enough and one has to resort to using a default value of None and then changing it in the function body. I like frozendict because I feel it is expressive and adds some safety. -- Russell From python at mrabarnett.plus.com Wed Jul 16 04:27:23 2014 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 16 Jul 2014 03:27:23 +0100 Subject: [Python-Dev] Another case for frozendict In-Reply-To: References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> Message-ID: <53C5E30B.6060509@mrabarnett.plus.com> On 2014-07-16 00:48, Russell E. Owen wrote: > In article > , > Chris Angelico wrote: > >> On Mon, Jul 14, 2014 at 12:04 AM, Jason R. Coombs wrote: >> > I can achieve what I need by constructing a set on the 'items' of the dict. >> > >> >>>> set(tuple(doc.items()) for doc in res) >> > >> > {(('n', 1), ('err', None), ('ok', 1.0))} >> >> This is flawed; the tuple-of-tuples depends on iteration order, which >> may vary. It should be a frozenset of those tuples, not a tuple. Which >> strengthens your case; it's that easy to get it wrong in the absence >> of an actual frozendict. > > I would love to see frozendict in python. > > I find myself using dicts for translation tables, usually tables that > should not be modified. Documentation usually suffices to get that idea > across, but it's not ideal. > > frozendict would also be handy as default values for function > arguments. In that case documentation isn't enough and one has to resort > to using a default value of None and then changing it in the function > body. > > I like frozendict because I feel it is expressive and adds some safety. > Here's another use-case. Using the 're' module: >>> import re >>> # Make a regex. ... p = re.compile(r'(?P<first>\w+)\s+(?P<second>\w+)') >>> >>> # What are the named groups? ...
p.groupindex {'first': 1, 'second': 2} >>> >>> # Perform a match. ... m = p.match('FIRST SECOND') >>> m.groupdict() {'first': 'FIRST', 'second': 'SECOND'} >>> >>> # Try modifying the pattern object. ... p.groupindex['JUNK'] = 'foobar' >>> >>> # What are the named groups now? ... p.groupindex {'first': 1, 'second': 2, 'JUNK': 'foobar'} >>> >>> # And the match object? ... m.groupdict() Traceback (most recent call last): File "<stdin>", line 2, in <module> IndexError: no such group It can't find a named group called 'JUNK'. And with a bit more tinkering it's possible to crash Python. (I'll leave that as an exercise for the reader! :-)) The 'regex' module, on the other hand, rebuilds the dict each time: >>> import regex >>> # Make a regex. ... p = regex.compile(r'(?P<first>\w+)\s+(?P<second>\w+)') >>> >>> # What are the named groups? ... p.groupindex {'second': 2, 'first': 1} >>> >>> # Perform a match. ... m = p.match('FIRST SECOND') >>> m.groupdict() {'second': 'SECOND', 'first': 'FIRST'} >>> >>> # Try modifying the regex. ... p.groupindex['JUNK'] = 'foobar' >>> >>> # What are the named groups now? ... p.groupindex {'second': 2, 'first': 1} >>> >>> # And the match object? ... m.groupdict() {'second': 'SECOND', 'first': 'FIRST'} Using a frozendict instead would be a nicer solution. From cs at zip.com.au Wed Jul 16 05:40:00 2014 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 16 Jul 2014 13:40:00 +1000 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: <20140716034000.GA41444@cskk.homeip.net> I was going to stay out of this one... On 14Jul2014 10:25, Victor Stinner wrote: >2014-07-14 4:17 GMT+02:00 Nick Coghlan : >> Or the ever popular symlink to "." (or a directory higher in the tree). > >"." and ".." are explicitly ignored by os.listdir() and os.scandir(). > >> I think os.walk() is a good source of inspiration here: call the flag >> "followlink" and default it to False.
I also think followlinks should be spelt like os.walk, and also default to False. >IMO the specific function os.walk() is not a good example. It includes >symlinks to directories in the dirs list and then it does not follow >symlink, I agree that is a bad mix. >it is a recursive function and has a followlinks optional >parameter (default: False). Which I think is desirable. >Moreover, in 92% of cases, functions using os.listdir() and >os.path.isdir() *follow* symlinks: >https://mail.python.org/pipermail/python-dev/2014-July/135435.html Sigh. This is a historic artifact, a convenience, and a side effect of bringing symlinks into UNIX in the first place. The objective was that symlinks should largely be transparent to users for naive operation. So the UNIX calls open/cd/listdir all follow symlinks so that things work transparently and a million C programs do not break. However, so do chmod/chgrp/chown, for the same reasons and with generally less desirable effects. Conversely, the find command, for example, does not follow symlinks and this is generally a good thing. "ls" is the same. Like os.walk, they are for inspecting stuff, and shouldn't indirect unless asked. I think following symlinks, especially for something like os.walk and os.scandir, should default to False. I DO NOT want to quietly wander to remote parts of the file space because someone has stuck a symlink somewhere unfortunate, lurking like a little bomb (or perhaps trapdoor, waiting to suck me down into an unexpected dark place). It is also slower to follow symlinks by default. I am also against flag parameters that default to True, on the whole; they are a failure of ergonomic design. Leaving off a flag should usually be like setting it to False. A missing flag is an "off" flag. For these reasons (and others I have not yet thought through:-) I am voting for a: followlinks=False optional parameter. If you want to follow links, it is hardly difficult.
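To make Cameron's preferred behaviour concrete: os.scandir() later shipped (Python 3.5) with DirEntry.is_dir() and DirEntry.is_file() taking a follow_symlinks argument that defaults to True, and with the full path exposed as DirEntry.path. The sketch below assumes Python 3.6+ (for the context-manager form of scandir()) and uses a hypothetical helper name, walk_files; it walks a tree without ever following symlinks by passing follow_symlinks=False explicitly.

```python
import os


def walk_files(top):
    """Yield paths of regular files under top, never following symlinks.

    Hypothetical helper (sketch): symlinks to files or directories are
    skipped entirely, so a stray link cannot pull the walk elsewhere.
    """
    stack = [top]
    while stack:
        path = stack.pop()
        with os.scandir(path) as it:
            for entry in it:
                # follow_symlinks=False: classify the link itself, not its target.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    yield entry.path
```

Following links remains an explicit opt-in: a caller who wants it can drop the follow_symlinks=False arguments, which mirrors Cameron's argument that a missing flag should mean "off".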
Cheers, Cameron Simpson Our job is to make the questions so painful that the only way to make the pain go away is by thinking. - Fred Friendly From rdmurray at bitdance.com Wed Jul 16 15:37:55 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 16 Jul 2014 09:37:55 -0400 Subject: [Python-Dev] Another case for frozendict In-Reply-To: <53C5E30B.6060509@mrabarnett.plus.com> References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> <53C5E30B.6060509@mrabarnett.plus.com> Message-ID: <20140716133755.C0A61250DEF@webabinitio.net> On Wed, 16 Jul 2014 03:27:23 +0100, MRAB wrote: > Here's another use-case. > > Using the 're' module: > > >>> import re > >>> # Make a regex. > ... p = re.compile(r'(?P<first>\w+)\s+(?P<second>\w+)') > >>> > >>> # What are the named groups? > ... p.groupindex > {'first': 1, 'second': 2} > >>> > >>> # Perform a match. > ... m = p.match('FIRST SECOND') > >>> m.groupdict() > {'first': 'FIRST', 'second': 'SECOND'} > >>> > >>> # Try modifying the pattern object. > ... p.groupindex['JUNK'] = 'foobar' > >>> > >>> # What are the named groups now? > ... p.groupindex > {'first': 1, 'second': 2, 'JUNK': 'foobar'} > >>> > >>> # And the match object? > ... m.groupdict() > Traceback (most recent call last): > File "<stdin>", line 2, in <module> > IndexError: no such group > > It can't find a named group called 'JUNK'. IMO, preventing someone from shooting themselves in the foot by modifying something they shouldn't modify according to the API is not a Python use case ("consenting adults"). > And with a bit more tinkering it's possible to crash Python. (I'll > leave that as an exercise for the reader! :-)) Preventing a Python program from being able to crash the interpreter, that's a use case :) --David From rdmurray at bitdance.com Wed Jul 16 15:47:59 2014 From: rdmurray at bitdance.com (R.
David Murray) Date: Wed, 16 Jul 2014 09:47:59 -0400 Subject: [Python-Dev] Another case for frozendict In-Reply-To: <53C5E30B.6060509@mrabarnett.plus.com> References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> <53C5E30B.6060509@mrabarnett.plus.com> Message-ID: <20140716134802.9ED8DB14086@webabinitio.net> On Wed, 16 Jul 2014 03:27:23 +0100, MRAB wrote: > >>> # Try modifying the pattern object. > ... p.groupindex['JUNK'] = 'foobar' > >>> > >>> # What are the named groups now? > ... p.groupindex > {'first': 1, 'second': 2, 'JUNK': 'foobar'} > >>> > >>> # And the match object? > ... m.groupdict() > Traceback (most recent call last): > File "<stdin>", line 2, in <module> > IndexError: no such group > > It can't find a named group called 'JUNK'. After I hit send on my previous message, I thought more about your example. One of the issues here is that modifying the dict breaks an invariant of the API. I have a similar situation in the email module, and I used the same solution you did in regex: always return a new dict. It would be nice to be able to return a frozendict instead of having the overhead of building a new dict on each call. That by itself might not be enough reason. But, if the user wants to use the data in modified form elsewhere, they would then have to construct a new regular dict out of it, making the decision to vary the data from what matches the state of the object it came from an explicit one. That seems to fit the Python zen ("explicit is better than implicit"). So I'm changing my mind, and do consider this a valid use case, even absent the crash. --David From rdmurray at bitdance.com Wed Jul 16 16:24:45 2014 From: rdmurray at bitdance.com (R.
David Murray) Date: Wed, 16 Jul 2014 10:24:45 -0400 Subject: [Python-Dev] Another case for frozendict In-Reply-To: <20140716140429.GA14503@k2> References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> <53C5E30B.6060509@mrabarnett.plus.com> <20140716134802.9ED8DB14086@webabinitio.net> <20140716140429.GA14503@k2> Message-ID: <20140716142445.7F4BB250D0C@webabinitio.net> On Wed, 16 Jul 2014 14:04:29 -0000, dw+python-dev at hmmz.org wrote: > On Wed, Jul 16, 2014 at 09:47:59AM -0400, R. David Murray wrote: > > > It would be nice to be able to return a frozendict instead of having the > > overhead of building a new dict on each call. > > There already is an in-between available both to Python and C: > PyDictProxy_New() / types.MappingProxyType. It's a one line change in > each case to return a temporary intermediary, using something like (C): > Py_INCREF(self->dict) > return self->dict; > > To > return PyDictProxy_New(self->dict); > > Or Python: > return self.dct > > To > return types.MappingProxyType(self.dct) > > Which is cheaper than a copy, and avoids having to audit every use of > self->dict to ensure the semantics required for a "frozendict" are > respected, i.e. no mutation occurs after the dict becomes visible to the > user, and potentially has __hash__ called. > > > > That by itself might not be enough reason. But, if the user wants to > > use the data in modified form elsewhere, they would then have to > > construct a new regular dict out of it, making the decision to vary > > the data from what matches the state of the object it came from an > > explicit one. That seems to fit the Python zen ("explicit is better > > than implicit"). > > > > So I'm changing my mind, and do consider this a valid use case, even > > absent the crash. > > Avoiding crashes seems a better use for a read-only proxy, rather than a > hashable immutable type. Good point. MappingProxyType wasn't yet exposed when I wrote that email code. 
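dw's suggestion can be sketched in pure Python. The class below is a hypothetical stand-in for a compiled pattern (CompiledPattern is not part of the re module): it keeps its name-to-number table private and hands callers a types.MappingProxyType, so MRAB's p.groupindex['JUNK'] = 'foobar' experiment fails with TypeError instead of corrupting the object. This illustrates the idea only; it is not how re or regex implement groupindex.

```python
import types


class CompiledPattern:
    """Hypothetical stand-in for a compiled regex (illustration only)."""

    def __init__(self, group_names):
        # Private, mutable table: group name -> group number (1-based).
        self._groupindex = {name: i for i, name in enumerate(group_names, start=1)}

    @property
    def groupindex(self):
        # Read-only view of the private dict: no copy is made, and
        # item assignment through the proxy raises TypeError.
        return types.MappingProxyType(self._groupindex)
```

Note that the proxy is a live view rather than a snapshot: if the owner later changes its private dict, callers see the change. That is the trade-off against R. David Murray's copy-per-call approach, which always hands out an independent dict.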
--David From ericsnowcurrently at gmail.com Wed Jul 16 16:27:51 2014 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 16 Jul 2014 08:27:51 -0600 Subject: [Python-Dev] Another case for frozendict In-Reply-To: <20140716134802.9ED8DB14086@webabinitio.net> References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> <53C5E30B.6060509@mrabarnett.plus.com> <20140716134802.9ED8DB14086@webabinitio.net> Message-ID: On Wed, Jul 16, 2014 at 7:47 AM, R. David Murray wrote: > After I hit send on my previous message, I thought more about your > example. One of the issues here is that modifying the dict breaks an > invariant of the API. I have a similar situation in the email module, > and I used the same solution you did in regex: always return a new dict. > It would be nice to be able to return a frozendict instead of having the > overhead of building a new dict on each call. That by itself might not > be enough reason. But, if the user wants to use the data in modified form > elsewhere, they would then have to construct a new regular dict out of it, > making the decision to vary the data from what matches the state of the > object it came from an explicit one. That seems to fit the Python zen > ("explicit is better than implicit"). > > So I'm changing my mind, and do consider this a valid use case, even > absent the crash. +1 A simple implementation is pretty straight-forward: class FrozenDict(Mapping): def __init__(self, *args, **kwargs): self._map = dict(*args, **kwargs) self._hash = ... def __hash__(self): return self._hash def __len__(self): return len(self._map) def __iter__(self): yield from self._map def __getitem__(self, key): return self._map[key] This is actually something I've used before on a number of occasions. Having it in the stdlib would be nice (though that alone is not sufficient for inclusion :)). 
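Eric's outline elides how the hash is computed (self._hash = ...). One plausible way to fill it in, offered as an assumption rather than Eric's actual code, is to hash the frozenset of the items: that is order-independent and consistent with the equality Mapping already provides, but it requires every value to be hashable. The sketch computes the hash lazily and also covers Russell Owen's use case of an immutable default argument.

```python
from collections.abc import Mapping


class FrozenDict(Mapping):
    """Immutable, hashable mapping (sketch; all values must be hashable)."""

    __slots__ = ("_map", "_hash")

    def __init__(self, *args, **kwargs):
        self._map = dict(*args, **kwargs)
        self._hash = None  # computed on first use

    def __getitem__(self, key):
        return self._map[key]

    def __iter__(self):
        return iter(self._map)

    def __len__(self):
        return len(self._map)

    def __hash__(self):
        if self._hash is None:
            # Order-independent, so equal mappings hash equally.
            self._hash = hash(frozenset(self._map.items()))
        return self._hash

    def __repr__(self):
        return f"{type(self).__name__}({self._map!r})"


# Russell's case: a default argument that callers cannot mutate by accident.
def translate(word, table=FrozenDict(colour="color")):
    return table.get(word, word)
```

Mapping supplies __eq__, get(), keys() and friends; since Mapping defines no __setitem__, item assignment raises TypeError.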
If there is a valid use case for a frozen dict type in other stdlib modules, I'd consider that a pretty good justification for adding it. Incidentally, collections.abc.Mapping is the only one of the 6 container ABCs that does not have a concrete implementation (not counting types.MappingProxyType which is only a proxy). -eric From andreas.r.maier at gmx.de Wed Jul 16 13:39:55 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Wed, 16 Jul 2014 13:39:55 +0200 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members? In-Reply-To: <20140713162249.GP5705@ando> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando> Message-ID: <53C6648B.5000404@gmx.de> Am 13.07.2014 18:23, schrieb Steven D'Aprano: > On Sun, Jul 13, 2014 at 05:13:20PM +0200, Andreas Maier wrote: > >> Second, if not by delegation to equality of its elements, how would the >> equality of sequences defined otherwise? > > Wow. I'm impressed by the amount of detailed effort you've put into > investigating this. (Too much detail to absorb, I'm afraid.) But perhaps > you might have just asked on the python-list at python.org mailing list, or > here, where we would have told you the answer: > > list __eq__ first checks element identity before going on > to check element equality. I apologize for not asking. It seems I was looking at the trees (behaviors of specific cases) without seeing the wood (identity goes first). > If you can read C, you might like to check the list source code: > > http://hg.python.org/cpython/file/22e5a85ba840/Objects/listobject.c I can read (and write) C fluently, but (1) I don't have a build environment on my Windows system so I cannot debug it, and (2) I find it hard to judge from just looking at the C code which C function is invoked when the Python code enters the C code. 
(Quoting Raymond H. from his blog: "Unless you know where to look, searching the source for an answer can be a time consuming intellectual investment.") So thanks for clarifying this. I guess I am arriving (slowly and still partly reluctantly, and I'm not alone with that feeling, it seems ...) at the bottom line of all this, which is that reflexivity is an important goal in Python, that self-written non-reflexive classes are not intended nor well supported, and that the non-reflexive NaN is considered an exception that cannot be expected to be treated consistently non-reflexive. > This was discussed to death some time ago, both on python-dev and > python-ideas. If you're interested, you can start here: > > https://mail.python.org/pipermail/python-list/2012-October/633992.html > > which is in the middle of one of the threads, but at least it gets you > to the right time period. I read a number of posts in that thread by now. Sorry for not reading it earlier, but the mailing list archive just does not lend itself to searching the past. Of course, one can google it ;-) Andy From andreas.r.maier at gmx.de Wed Jul 16 13:40:03 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Wed, 16 Jul 2014 13:40:03 +0200 Subject: [Python-Dev] == on object tests identity in 3.x - list delegation to members? In-Reply-To: <87ion1owhk.fsf@gmail.com> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando> <87ion1owhk.fsf@gmail.com> Message-ID: <53C66493.5040904@gmx.de> Am 13.07.2014 22:05, schrieb Akira Li: > Nick Coghlan writes: > ... >> definition of floats and the definition of container invariants like >> "assert x in [x]") >> >> The current approach means that the lack of reflexivity of NaN's stays >> confined to floats and similar types - it doesn't leak out and infect >> the behaviour of the container types. 
>> >> What we've never figured out is a good place to *document* it. I >> thought there was an open bug for that, but I can't find it right now. > > There was related issue "Tuple comparisons with NaNs are broken" > http://bugs.python.org/issue21873 > but it was closed as "not a bug" despite the corresponding behavior is > *not documented* anywhere. I currently know about these two issues related to fixing the docs: http://bugs.python.org/11945 - about NaN values in containers http://bugs.python.org/12067 - comparisons I am working on the latter, currently. The patch only targets the comparisons chapter in the Language Reference, there is another comparisons chapter in the Library Reference, and one in the Tutorial. I will need to update the patch to issue 12067 as a result of this discussion. Andy From dw+python-dev at hmmz.org Wed Jul 16 16:04:29 2014 From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org) Date: Wed, 16 Jul 2014 14:04:29 +0000 Subject: [Python-Dev] Another case for frozendict In-Reply-To: <20140716134802.9ED8DB14086@webabinitio.net> References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> <53C5E30B.6060509@mrabarnett.plus.com> <20140716134802.9ED8DB14086@webabinitio.net> Message-ID: <20140716140429.GA14503@k2> On Wed, Jul 16, 2014 at 09:47:59AM -0400, R. David Murray wrote: > It would be nice to be able to return a frozendict instead of having the > overhead of building a new dict on each call. There already is an in-between available both to Python and C: PyDictProxy_New() / types.MappingProxyType. It's a one line change in each case to return a temporary intermediary, using something like (C): Py_INCREF(self->dict) return self->dict; To return PyDictProxy_New(self->dict); Or Python: return self.dct To return types.MappingProxyType(self.dct) Which is cheaper than a copy, and avoids having to audit every use of self->dict to ensure the semantics required for a "frozendict" are respected, i.e. 
no mutation occurs after the dict becomes visible to the user, and potentially has __hash__ called. > That by itself might not be enough reason. But, if the user wants to > use the data in modified form elsewhere, they would then have to > construct a new regular dict out of it, making the decision to vary > the data from what matches the state of the object it came from an > explicit one. That seems to fit the Python zen ("explicit is better > than implicit"). > > So I'm changing my mind, and do consider this a valid use case, even > absent the crash. Avoiding crashes seems a better use for a read-only proxy, rather than a hashable immutable type. David From andreas.r.maier at gmx.de Wed Jul 16 17:24:16 2014 From: andreas.r.maier at gmx.de (Andreas Maier) Date: Wed, 16 Jul 2014 17:24:16 +0200 Subject: [Python-Dev] == on object tests identity in 3.x - uploaded patch v9 In-Reply-To: <53C66493.5040904@gmx.de> References: <53BB2AC7.2060009@gmx.de> <53BB2F25.3020205@gmx.de> <96E0871E-5495-47CC-9221-48C56D16A01D@gmail.com> <53BFEEF3.2060101@gmx.de> <53C04F10.8070509@stoneleaf.us> <53C2A210.80902@gmx.de> <20140713162249.GP5705@ando> <87ion1owhk.fsf@gmail.com> <53C66493.5040904@gmx.de> Message-ID: <53C69920.3050808@gmx.de> Am 16.07.2014 13:40, schrieb Andreas Maier: > Am 13.07.2014 22:05, schrieb Akira Li: >> Nick Coghlan writes: >> ... >> >> There was related issue "Tuple comparisons with NaNs are broken" >> http://bugs.python.org/issue21873 >> but it was closed as "not a bug" despite the corresponding behavior is >> *not documented* anywhere. > > I currently know about these two issues related to fixing the docs: > > http://bugs.python.org/11945 - about NaN values in containers > http://bugs.python.org/12067 - comparisons > > I am working on the latter, currently. The patch only targets the > comparisons chapter in the Language Reference, there is another > comparisons chapter in the Library Reference, and one in the Tutorial. 
> > I will need to update the patch to issue 12067 as a result of this > discussion. I have uploaded v9 of the patch to issue 12067; it should address the recent discussion (plus Mark's review comment on the issue itself). Please review. Andy From jeanpierreda at gmail.com Wed Jul 16 19:10:07 2014 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Wed, 16 Jul 2014 10:10:07 -0700 Subject: [Python-Dev] Another case for frozendict In-Reply-To: <20140716133755.C0A61250DEF@webabinitio.net> References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> <53C5E30B.6060509@mrabarnett.plus.com> <20140716133755.C0A61250DEF@webabinitio.net> Message-ID: On Wed, Jul 16, 2014 at 6:37 AM, R. David Murray wrote: > IMO, preventing someone from shooting themselves in the foot by modifying > something they shouldn't modify according to the API is not a Python > use case ("consenting adults"). Then why have immutable objects at all? Why do you have to put tuples and frozensets inside sets, instead of lists and sets? Compare with Java, which really is "consenting adults" here -- you can add a mutable object to a set, just don't mutate it, or you might not be able to find it in the set again. Several people seem to act as if the Pythonic way is to not allow for any sort of immutable types at all. ISTM people are trying to retroactively claim some standard of Pythonicity that never existed. Python can and does protect you from shooting yourself in the foot by making objects immutable. Or do you have another explanation for the proliferation of immutable types, and the inability to add mutable types to sets and dicts? Using a frozendict to protect and enforce an invariant in the re module is entirely reasonable. So is creating a new dict each time. The intermediate -- reusing a mutable dict and failing in incomprehensible ways if you mutate it, and potentially even crashing due to memory safety issues -- is not Pythonic at all. 
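Devin's point about hashable containers, together with Chris Angelico's correction earlier in the thread (use a frozenset of the items, not a tuple), can be illustrated with a small deduplication helper. The name unique_dicts is hypothetical, and the sketch assumes every value in the dicts is hashable. In current CPython, dicts preserve insertion order, so two equal dicts built in different orders give different tuple(d.items()) keys but identical frozenset(d.items()) keys.

```python
def unique_dicts(docs):
    """Return docs with duplicates removed, ignoring key order.

    Hypothetical helper (sketch); assumes every value is hashable.
    """
    seen = set()
    unique = []
    for doc in docs:
        key = frozenset(doc.items())  # hashable and order-independent
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```

A dict (or list) can never be the key itself, since mutable containers are unhashable; that is exactly the gap a frozendict type would fill.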
-- Devin From rdmurray at bitdance.com Wed Jul 16 19:17:11 2014 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 16 Jul 2014 13:17:11 -0400 Subject: [Python-Dev] Another case for frozendict In-Reply-To: References: <6ede74ce745545f48593398a592330c8@BLUPR06MB434.namprd06.prod.outlook.com> <53C5E30B.6060509@mrabarnett.plus.com> <20140716133755.C0A61250DEF@webabinitio.net> Message-ID: <20140716171712.1A9B4250DF6@webabinitio.net> On Wed, 16 Jul 2014 10:10:07 -0700, Devin Jeanpierre wrote: > On Wed, Jul 16, 2014 at 6:37 AM, R. David Murray wrote: > > IMO, preventing someone from shooting themselves in the foot by modifying > > something they shouldn't modify according to the API is not a Python > > use case ("consenting adults"). > > Then why have immutable objects at all? Why do you have to put tuples > and frozensets inside sets, instead of lists and sets? Compare with > Java, which really is "consenting adults" here -- you can add a > mutable object to a set, just don't mutate it, or you might not be > able to find it in the set again. > > Several people seem to act as if the Pythonic way is to not allow for > any sort of immutable types at all. ISTM people are trying to > retroactively claim some standard of Pythonicity that never existed. > Python can and does protect you from shooting yourself in the foot by > making objects immutable. Or do you have another explanation for the > proliferation of immutable types, and the inability to add mutable > types to sets and dicts? > > Using a frozendict to protect and enforce an invariant in the re > module is entirely reasonable. So is creating a new dict each time. > The intermediate -- reusing a mutable dict and failing in > incomprehensible ways if you mutate it, and potentially even crashing > due to memory safety issues -- is not Pythonic at all. You'll note I ended up agreeing with you there: when mutation breaks an invariant of the object it came from, that's an issue. 
Which would be the case if you could use mutable objects as keys. --David From kmike84 at gmail.com Wed Jul 16 23:44:23 2014 From: kmike84 at gmail.com (Mikhail Korobov) Date: Thu, 17 Jul 2014 03:44:23 +0600 Subject: [Python-Dev] cStringIO vs io.BytesIO Message-ID: Hi, cStringIO was removed from Python 3. It seems the suggested replacement is io.BytesIO. But there is a problem: cStringIO.StringIO(b'data') didn't copy the data while io.BytesIO(b'data') makes a copy (even if the data is not modified later). This means io.BytesIO is not well suited to cases when you want to get a readonly file-like interface for existing byte strings. Isn't it one of the main io.BytesIO use cases? Wrapping bytes in cStringIO.StringIO used to be almost free, but this is not true for io.BytesIO. So making code 3.x compatible by ditching cStringIO can cause serious performance/memory regressions. One can change the code to build the data using BytesIO (without creating bytes objects in the first place), but that is not always possible or convenient. I believe this problem affects tornado ( https://github.com/tornadoweb/tornado/issues/1110), Scrapy (this is how I became aware of this issue), NLTK (anecdotal evidence - I tried to port some hairy NLTK module to io.BytesIO, it became many times slower) and maybe pretty much every IO-related project ported to Python 3.x (django - check , werkzeug and frameworks based on it - check , requests - check - they all wrap user data to BytesIO, and this may cause slowdowns and up to 2x memory usage in Python 3.x). Do you know if there is a workaround? Maybe there is some stdlib part that I'm missing, or a module on PyPI? It is not that hard to write your own wrapper that won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an existing solution or plans to fix it in Python itself - this BytesIO use case looks quite important. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dw+python-dev at hmmz.org Thu Jul 17 01:07:54 2014 From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org) Date: Wed, 16 Jul 2014 23:07:54 +0000 Subject: [Python-Dev] cStringIO vs io.BytesIO In-Reply-To: References: Message-ID: <20140716230754.GA22619@k2> On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote: > So making code 3.x compatible by ditching cStringIO can cause a serious > performance/memory regressions. One can change the code to build the data > using BytesIO (without creating bytes objects in the first place), but that is > not always possible or convenient. > > I believe this problem affects tornado (https://github.com/tornadoweb/tornado/ > Do you know if there a workaround? Maybe there is some stdlib part that I'm > missing, or a module on PyPI? It is not that hard to write an own wrapper that > won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an > existing solution or plans to fix it in Python itself - this BytesIO use case > looks quite important. Regarding a fix, the problem seems mostly that the StringI/StringO specializations were removed, and the new implementation is basically just a StringO. At a small cost to memory, it is easy to add a Py_buffer source and flags variable to the bytesio struct, with the buffers initially setup for reading, and if a mutation method is called, check for a copy-on-write flag, duplicate the source object into private memory, then continue operating as it does now. Attached is a (rough) patch implementing this, feel free to try it with hg tip.
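The copy-on-write idea described above can be modelled at the Python level. The sketch below is a hypothetical class (CowBytesIO is not a stdlib name) that supports only read, write, seek, tell and getvalue. It borrows the caller's buffer through a memoryview and makes its single private copy on the first write. It illustrates the technique only; it is not dw's C patch and not a drop-in replacement for io.BytesIO.

```python
class CowBytesIO:
    """Minimal copy-on-write byte stream (hypothetical sketch)."""

    def __init__(self, initial=b""):
        # Borrowed, zero-copy view of the caller's data.
        self._buf = memoryview(initial).cast("B")
        self._shared = True
        self._pos = 0

    def _unshare(self):
        # First mutation: take one private copy, then drop the borrowed view.
        if self._shared:
            view = self._buf
            self._buf = bytearray(view)
            view.release()
            self._shared = False

    def read(self, size=-1):
        length = len(self._buf)
        start = min(self._pos, length)
        end = length if size is None or size < 0 else min(start + size, length)
        chunk = bytes(self._buf[start:end])  # copies only the requested slice
        self._pos = max(self._pos, end)
        return chunk

    def write(self, data):
        self._unshare()
        data = memoryview(data).cast("B")
        if self._pos > len(self._buf):
            # Writing past the end zero-fills the gap, as io.BytesIO does.
            self._buf.extend(bytes(self._pos - len(self._buf)))
        end = self._pos + len(data)
        self._buf[self._pos:end] = data  # overwrite and/or extend
        self._pos = end
        return len(data)

    def seek(self, offset, whence=0):
        base = {0: 0, 1: self._pos, 2: len(self._buf)}[whence]
        if base + offset < 0:
            raise ValueError("negative seek position")
        self._pos = base + offset
        return self._pos

    def tell(self):
        return self._pos

    def getvalue(self):
        return bytes(self._buf)
```

Reading never copies more than the requested slice, so wrapping a large bytes object only to parse it stays cheap. The full copy is deferred to the first write, which is where the patch defers it too.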
[23:03:44 k2!124 cpython] cat i.py import io buf = b'x' * (1048576 * 16) def x(): io.BytesIO(buf) [23:03:51 k2!125 cpython] ./python -m timeit -s 'import i' 'i.x()' 100 loops, best of 3: 2.9 msec per loop [23:03:57 k2!126 cpython] ./python-cow -m timeit -s 'import i' 'i.x()' 1000000 loops, best of 3: 0.364 usec per loop David diff --git a/Modules/_io/bytesio.c b/Modules/_io/bytesio.c --- a/Modules/_io/bytesio.c +++ b/Modules/_io/bytesio.c @@ -2,6 +2,12 @@ #include "structmember.h" /* for offsetof() */ #include "_iomodule.h" +enum io_flags { + /* initvalue describes a borrowed buffer we cannot modify and must later + * release */ + IO_SHARED = 1 +}; + typedef struct { PyObject_HEAD char *buf; @@ -11,6 +17,10 @@ PyObject *dict; PyObject *weakreflist; Py_ssize_t exports; + Py_buffer initvalue; + /* If IO_SHARED, indicates PyBuffer_release(initvalue) required, and that + * we don't own buf. */ + enum io_flags flags; } bytesio; typedef struct { @@ -33,6 +43,47 @@ return NULL; \ } +/* Unshare our buffer in preparation for writing, in the case that an + * initvalue object was provided, and we're currently borrowing its buffer. + * size indicates the total reserved buffer size allocated as part of + * unsharing, to avoid a potentially redundant allocation in the subsequent + * mutation. + */ +static int +unshare(bytesio *self, size_t size) +{ + Py_ssize_t new_size = size; + Py_ssize_t copy_size = size; + char *new_buf; + + /* Do nothing if buffer wasn't shared */ + if (! (self->flags & IO_SHARED)) { + return 0; + } + + /* If hint provided, adjust our new buffer size and truncate the amount of + * source buffer we copy as necessary. */ + if (size > copy_size) { + copy_size = size; + } + + /* Allocate or fail. */ + new_buf = (char *)PyMem_Malloc(new_size); + if (new_buf == NULL) { + PyErr_NoMemory(); + return -1; + } + + /* Copy the (possibly now truncated) source string to the new buffer, and + * forget any reference used to keep the source buffer alive. 
*/ + memcpy(new_buf, self->buf, copy_size); + PyBuffer_Release(&self->initvalue); + self->flags &= ~IO_SHARED; + self->buf = new_buf; + self->buf_size = new_size; + self->string_size = (Py_ssize_t) copy_size; + return 0; +} /* Internal routine to get a line from the buffer of a BytesIO object. Returns the length between the current position to the @@ -125,11 +176,18 @@ static Py_ssize_t write_bytes(bytesio *self, const char *bytes, Py_ssize_t len) { + size_t desired; + assert(self->buf != NULL); assert(self->pos >= 0); assert(len >= 0); - if ((size_t)self->pos + len > self->buf_size) { + desired = (size_t)self->pos + len; + if (unshare(self, desired)) { + return -1; + } + + if (desired > self->buf_size) { if (resize_buffer(self, (size_t)self->pos + len) < 0) return -1; } @@ -502,6 +560,10 @@ return NULL; } + if (unshare(self, size)) { + return NULL; + } + if (size < self->string_size) { self->string_size = size; if (resize_buffer(self, size) < 0) @@ -655,10 +717,13 @@ static PyObject * bytesio_close(bytesio *self) { - if (self->buf != NULL) { + if (self->flags & IO_SHARED) { + PyBuffer_Release(&self->initvalue); + self->flags &= ~IO_SHARED; + } else if (self->buf != NULL) { PyMem_Free(self->buf); - self->buf = NULL; } + self->buf = NULL; Py_RETURN_NONE; } @@ -788,10 +853,17 @@ "deallocated BytesIO object has exported buffers"); PyErr_Print(); } - if (self->buf != NULL) { + + if (self->flags & IO_SHARED) { + /* We borrowed buf from another object */ + PyBuffer_Release(&self->initvalue); + self->flags &= ~IO_SHARED; + } else if (self->buf != NULL) { + /* We owned buf */ PyMem_Free(self->buf); - self->buf = NULL; } + self->buf = NULL; + Py_CLEAR(self->dict); if (self->weakreflist != NULL) PyObject_ClearWeakRefs((PyObject *) self); @@ -811,12 +883,6 @@ /* tp_alloc initializes all the fields to zero. So we don't have to initialize them here. 
*/ - self->buf = (char *)PyMem_Malloc(0); - if (self->buf == NULL) { - Py_DECREF(self); - return PyErr_NoMemory(); - } - return (PyObject *)self; } @@ -834,13 +900,32 @@ self->string_size = 0; self->pos = 0; + /* Release any previous initvalue. */ + if (self->flags & IO_SHARED) { + PyBuffer_Release(&self->initvalue); + self->buf = NULL; + self->flags &= ~IO_SHARED; + } + if (initvalue && initvalue != Py_None) { - PyObject *res; - res = bytesio_write(self, initvalue); - if (res == NULL) + Py_buffer *buf = &self->initvalue; + if (PyObject_GetBuffer(initvalue, buf, PyBUF_CONTIG_RO) < 0) { return -1; - Py_DECREF(res); - self->pos = 0; + } + self->buf = self->initvalue.buf; + self->buf_size = (size_t)self->initvalue.len; + self->string_size = self->initvalue.len; + self->flags |= IO_SHARED; + } + + /* If no initvalue provided, prepare a private buffer now. */ + if (self->buf == NULL) { + self->buf = (char *)PyMem_Malloc(0); + if (self->buf == NULL) { + Py_DECREF(self); + PyErr_NoMemory(); + return -1; + } } return 0; From dw+python-dev at hmmz.org Thu Jul 17 02:18:21 2014 From: dw+python-dev at hmmz.org (dw+python-dev at hmmz.org) Date: Thu, 17 Jul 2014 00:18:21 +0000 Subject: [Python-Dev] cStringIO vs io.BytesIO In-Reply-To: <20140716230754.GA22619@k2> References: <20140716230754.GA22619@k2> Message-ID: <20140717001821.GA25779@k2> It's worth noting that a natural extension of this is to do something very similar on the write side: instead of generating a temporary private heap allocation, generate (and freely resize) a private PyBytes object until it is exposed to the user, at which point, _getvalue() returns it, and converts it into an IO_SHARED buffer. That way another copy is avoided in the common case of building a string, calling getvalue() once, then discarding the IO object.
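For the write side, the stdlib already offers a partial alternative to the extra copy made by getvalue(): io.BytesIO.getbuffer() (Python 3.2+) returns a memoryview over the object's internal buffer. This is not the PyBytes-based scheme proposed above, only an existing option. The sketch also shows the restriction that comes with exporting the buffer: while the view is alive, the BytesIO refuses writes with BufferError, which is the kind of aliasing constraint any zero-copy scheme has to handle.

```python
import hashlib
import io

out = io.BytesIO()
for part in (b"header:", b"x" * 1024, b":footer"):
    out.write(part)

# getvalue() returns a new bytes object, i.e. a copy of the internal buffer.
snapshot = out.getvalue()

# getbuffer() returns a memoryview over the internal buffer without copying.
write_refused = False
with out.getbuffer() as view:
    digest = hashlib.sha256(view).hexdigest()
    try:
        # While a view is exported, the BytesIO refuses to be modified.
        out.write(b"more")
    except BufferError:
        write_refused = True

# Once the view is released (end of the with block), writing works again.
out.write(b"more")
```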
David On Wed, Jul 16, 2014 at 11:07:54PM +0000, dw+python-dev at hmmz.org wrote: > On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote: > > > So making code 3.x compatible by ditching cStringIO can cause a serious > > performance/memory regressions. One can change the code to build the data > > using BytesIO (without creating bytes objects in the first place), but that is > > not always possible or convenient. > > > > I believe this problem affects tornado (https://github.com/tornadoweb/tornado/ > > Do you know if there a workaround? Maybe there is some stdlib part that I'm > > missing, or a module on PyPI? It is not that hard to write an own wrapper that > > won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an > > existing solution or plans to fix it in Python itself - this BytesIO use case > > looks quite important. > > Regarding a fix, the problem seems mostly that the StringI/StringO > specializations were removed, and the new implementation is basically > just a StringO. > > At a small cost to memory, it is easy to add a Py_buffer source and > flags variable to the bytesio struct, with the buffers initially setup > for reading, and if a mutation method is called, check for a > copy-on-write flag, duplicate the source object into private memory, > then continue operating as it does now. > > Attached is a (rough) patch implementing this, feel free to try it with > hg tip.
> > [23:03:44 k2!124 cpython] cat i.py > import io > buf = b'x' * (1048576 * 16) > def x(): > io.BytesIO(buf) > > [23:03:51 k2!125 cpython] ./python -m timeit -s 'import i' 'i.x()' > 100 loops, best of 3: 2.9 msec per loop > > [23:03:57 k2!126 cpython] ./python-cow -m timeit -s 'import i' 'i.x()' > 1000000 loops, best of 3: 0.364 usec per loop > > > David > > > > diff --git a/Modules/_io/bytesio.c b/Modules/_io/bytesio.c > --- a/Modules/_io/bytesio.c > +++ b/Modules/_io/bytesio.c > @@ -2,6 +2,12 @@ > #include "structmember.h" /* for offsetof() */ > #include "_iomodule.h" > > +enum io_flags { > + /* initvalue describes a borrowed buffer we cannot modify and must later > + * release */ > + IO_SHARED = 1 > +}; > + > typedef struct { > PyObject_HEAD > char *buf; > @@ -11,6 +17,10 @@ > PyObject *dict; > PyObject *weakreflist; > Py_ssize_t exports; > + Py_buffer initvalue; > + /* If IO_SHARED, indicates PyBuffer_release(initvalue) required, and that > + * we don't own buf. */ > + enum io_flags flags; > } bytesio; > > typedef struct { > @@ -33,6 +43,47 @@ > return NULL; \ > } > > +/* Unshare our buffer in preparation for writing, in the case that an > + * initvalue object was provided, and we're currently borrowing its buffer. > + * size indicates the total reserved buffer size allocated as part of > + * unsharing, to avoid a potentially redundant allocation in the subsequent > + * mutation. > + */ > +static int > +unshare(bytesio *self, size_t size) > +{ > + Py_ssize_t new_size = size; > + Py_ssize_t copy_size = size; > + char *new_buf; > + > + /* Do nothing if buffer wasn't shared */ > + if (! (self->flags & IO_SHARED)) { > + return 0; > + } > + > + /* If hint provided, adjust our new buffer size and truncate the amount of > + * source buffer we copy as necessary. */ > + if (size > copy_size) { > + copy_size = size; > + } > + > + /* Allocate or fail. 
*/ > + new_buf = (char *)PyMem_Malloc(new_size); > + if (new_buf == NULL) { > + PyErr_NoMemory(); > + return -1; > + } > + > + /* Copy the (possibly now truncated) source string to the new buffer, and > + * forget any reference used to keep the source buffer alive. */ > + memcpy(new_buf, self->buf, copy_size); > + PyBuffer_Release(&self->initvalue); > + self->flags &= ~IO_SHARED; > + self->buf = new_buf; > + self->buf_size = new_size; > + self->string_size = (Py_ssize_t) copy_size; > + return 0; > +} > > /* Internal routine to get a line from the buffer of a BytesIO > object. Returns the length between the current position to the > @@ -125,11 +176,18 @@ > static Py_ssize_t > write_bytes(bytesio *self, const char *bytes, Py_ssize_t len) > { > + size_t desired; > + > assert(self->buf != NULL); > assert(self->pos >= 0); > assert(len >= 0); > > - if ((size_t)self->pos + len > self->buf_size) { > + desired = (size_t)self->pos + len; > + if (unshare(self, desired)) { > + return -1; > + } > + > + if (desired > self->buf_size) { > if (resize_buffer(self, (size_t)self->pos + len) < 0) > return -1; > } > @@ -502,6 +560,10 @@ > return NULL; > } > > + if (unshare(self, size)) { > + return NULL; > + } > + > if (size < self->string_size) { > self->string_size = size; > if (resize_buffer(self, size) < 0) > @@ -655,10 +717,13 @@ > static PyObject * > bytesio_close(bytesio *self) > { > - if (self->buf != NULL) { > + if (self->flags & IO_SHARED) { > + PyBuffer_Release(&self->initvalue); > + self->flags &= ~IO_SHARED; > + } else if (self->buf != NULL) { > PyMem_Free(self->buf); > - self->buf = NULL; > } > + self->buf = NULL; > Py_RETURN_NONE; > } > > @@ -788,10 +853,17 @@ > "deallocated BytesIO object has exported buffers"); > PyErr_Print(); > } > - if (self->buf != NULL) { > + > + if (self->flags & IO_SHARED) { > + /* We borrowed buf from another object */ > + PyBuffer_Release(&self->initvalue); > + self->flags &= ~IO_SHARED; > + } else if (self->buf != NULL) { > + /* We owned buf 
*/ > PyMem_Free(self->buf); > - self->buf = NULL; > } > + self->buf = NULL; > + > Py_CLEAR(self->dict); > if (self->weakreflist != NULL) > PyObject_ClearWeakRefs((PyObject *) self); > @@ -811,12 +883,6 @@ > /* tp_alloc initializes all the fields to zero. So we don't have to > initialize them here. */ > > - self->buf = (char *)PyMem_Malloc(0); > - if (self->buf == NULL) { > - Py_DECREF(self); > - return PyErr_NoMemory(); > - } > - > return (PyObject *)self; > } > > @@ -834,13 +900,32 @@ > self->string_size = 0; > self->pos = 0; > > + /* Release any previous initvalue. */ > + if (self->flags & IO_SHARED) { > + PyBuffer_Release(&self->initvalue); > + self->buf = NULL; > + self->flags &= ~IO_SHARED; > + } > + > if (initvalue && initvalue != Py_None) { > - PyObject *res; > - res = bytesio_write(self, initvalue); > - if (res == NULL) > + Py_buffer *buf = &self->initvalue; > + if (PyObject_GetBuffer(initvalue, buf, PyBUF_CONTIG_RO) < 0) { > return -1; > - Py_DECREF(res); > - self->pos = 0; > + } > + self->buf = self->initvalue.buf; > + self->buf_size = (size_t)self->initvalue.len; > + self->string_size = self->initvalue.len; > + self->flags |= IO_SHARED; > + } > + > + /* If no initvalue provided, prepare a private buffer now. 
*/ > + if (self->buf == NULL) { > + self->buf = (char *)PyMem_Malloc(0); > + if (self->buf == NULL) { > + Py_DECREF(self); > + PyErr_NoMemory(); > + return -1; > + } > } > > return 0; > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/dw%2Bpython-dev%40hmmz.org From ncoghlan at gmail.com Thu Jul 17 03:28:16 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Jul 2014 21:28:16 -0400 Subject: [Python-Dev] cStringIO vs io.BytesIO In-Reply-To: <20140716230754.GA22619@k2> References: <20140716230754.GA22619@k2> Message-ID: On 16 Jul 2014 20:00, wrote: > On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote: > > I believe this problem affects tornado ( https://github.com/tornadoweb/tornado/ > > Do you know if there a workaround? Maybe there is some stdlib part that I'm > > missing, or a module on PyPI? It is not that hard to write an own wrapper that > > won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an > > existing solution or plans to fix it in Python itself - this BytesIO use case > > looks quite important. > > Regarding a fix, the problem seems mostly that the StringI/StringO > specializations were removed, and the new implementation is basically > just a StringO. Right, I don't think there's a major philosophy change here, just a missing optimisation that could be restored in 3.5. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From antoine at python.org Thu Jul 17 03:51:27 2014 From: antoine at python.org (Antoine Pitrou) Date: Wed, 16 Jul 2014 21:51:27 -0400 Subject: [Python-Dev] cStringIO vs io.BytesIO In-Reply-To: <20140716230754.GA22619@k2> References: <20140716230754.GA22619@k2> Message-ID: Hi, On 16/07/2014 19:07, dw+python-dev at hmmz.org wrote: > > Attached is a (rough) patch implementing this, feel free to try it with > hg tip. Thanks for your work. Please post any patch to http://bugs.python.org Regards Antoine. From kmike84 at gmail.com Thu Jul 17 20:24:17 2014 From: kmike84 at gmail.com (Mikhail Korobov) Date: Fri, 18 Jul 2014 00:24:17 +0600 Subject: [Python-Dev] cStringIO vs io.BytesIO In-Reply-To: References: <20140716230754.GA22619@k2> Message-ID: That was an impressively fast draft patch! 2014-07-17 7:28 GMT+06:00 Nick Coghlan : > > On 16 Jul 2014 20:00, wrote: > > On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote: > > > I believe this problem affects tornado ( > https://github.com/tornadoweb/tornado/ > > > Do you know if there a workaround? Maybe there is some stdlib part > that I'm > > > missing, or a module on PyPI? It is not that hard to write an own > wrapper that > > > won't do copies (or to port [c]StringIO to 3.x), but I wonder if there > is an > > > existing solution or plans to fix it in Python itself - this BytesIO > use case > > > looks quite important. > > > > Regarding a fix, the problem seems mostly that the StringI/StringO > > specializations were removed, and the new implementation is basically > > just a StringO. > > Right, I don't think there's a major philosophy change here, just a > missing optimisation that could be restored in 3.5. > > Cheers, > Nick. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From status at bugs.python.org Fri Jul 18 18:07:59 2014 From: status at bugs.python.org (Python tracker) Date: Fri, 18 Jul 2014 18:07:59 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20140718160759.5064A56A70@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2014-07-11 - 2014-07-18) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. Issues counts and deltas: open 4589 ( +1) closed 29188 (+47) total 33777 (+48) Open issues with patches: 2154 Issues opened (36) ================== #21044: tarfile does not handle file .name being an int http://bugs.python.org/issue21044 reopened by zach.ware #21946: 'python -u' yields trailing carriage return '\r' (Python2 for http://bugs.python.org/issue21946 reopened by haypo #21950: import sqlite3 not running after configure --prefix=/alt/path; http://bugs.python.org/issue21950 reopened by r.david.murray #21958: Allow python 2.7 to compile with Visual Studio 2013 http://bugs.python.org/issue21958 opened by Zachary.Turner #21960: Better path handling in Idle find in files http://bugs.python.org/issue21960 opened by terry.reedy #21961: Add What's New for Idle. 
http://bugs.python.org/issue21961 opened by terry.reedy #21962: No timeout for asyncio.Event.wait() or asyncio.Condition.wait( http://bugs.python.org/issue21962 opened by ajaborsk #21963: 2.7.8 backport of Issue1856 (avoid daemon thread problems at s http://bugs.python.org/issue21963 opened by ned.deily #21964: inconsistency in list-generator comprehension with yield(-from http://bugs.python.org/issue21964 opened by hakril #21965: Add support for Memory BIO to _ssl http://bugs.python.org/issue21965 opened by geertj #21967: Interpreter crash upon accessing frame.f_restricted of a frame http://bugs.python.org/issue21967 opened by anselm.kruis #21969: WindowsPath constructor does not check for invalid characters http://bugs.python.org/issue21969 opened by Antony.Lee #21970: Broken code for handling file://host in urllib.request.FileHan http://bugs.python.org/issue21970 opened by vadmium #21971: Index and update turtledemo doc. http://bugs.python.org/issue21971 opened by terry.reedy #21972: Bugs in the lexer and parser documentation http://bugs.python.org/issue21972 opened by François-René.Rideau #21973: Idle should not quit on corrupted user config files http://bugs.python.org/issue21973 opened by Tomk #21975: Using pickled/unpickled sqlite3.Row results in segfault rather http://bugs.python.org/issue21975 opened by Elizacat #21976: Fix test_ssl.py to handle LibreSSL versioning appropriately http://bugs.python.org/issue21976 opened by worr #21980: Implement `logging.LogRecord.__repr__` http://bugs.python.org/issue21980 opened by cool-RR #21983: segfault in ctypes.cast http://bugs.python.org/issue21983 opened by Anthony.LaTorre #21986: Pickleability of code objects is inconsistent http://bugs.python.org/issue21986 opened by ppperry #21987: TarFile.getmember on directory requires trailing slash iff ove http://bugs.python.org/issue21987 opened by moloney #21989: Missing (optional) argument `start` and `end` in documentation http://bugs.python.org/issue21989 opened by 
SylvainDe #21990: saxutils defines an inner class where a normal one would do http://bugs.python.org/issue21990 opened by alex #21991: The new email API should use MappingProxyType instead of retur http://bugs.python.org/issue21991 opened by r.david.murray #21992: New AST node Else() should be introduced http://bugs.python.org/issue21992 opened by Igor.Bronshteyn #21995: Idle: pseudofiles have no buffer attribute. http://bugs.python.org/issue21995 opened by terry.reedy #21996: gettarinfo method does not handle files without text string na http://bugs.python.org/issue21996 opened by vadmium #21997: Pdb.set_trace debugging does not end correctly in IDLE http://bugs.python.org/issue21997 opened by ppperry #21998: asyncio: a new self-pipe should be created in the child proces http://bugs.python.org/issue21998 opened by haypo #21999: shlex: bug in posix more handling of empty strings http://bugs.python.org/issue21999 opened by isoschiz #22000: cross type comparisons clarification http://bugs.python.org/issue22000 opened by Jim.Jewett #22001: containers "same" does not always mean "__eq__". 
http://bugs.python.org/issue22001 opened by Jim.Jewett #22002: Make full use of test discovery in test subpackages http://bugs.python.org/issue22002 opened by zach.ware #22003: BytesIO copy-on-write http://bugs.python.org/issue22003 opened by dw #22005: datetime.__setstate__ fails decoding python2 pickle http://bugs.python.org/issue22005 opened by eddygeek Most recent 15 issues with no replies (15) ========================================== #22005: datetime.__setstate__ fails decoding python2 pickle http://bugs.python.org/issue22005 #22000: cross type comparisons clarification http://bugs.python.org/issue22000 #21999: shlex: bug in posix more handling of empty strings http://bugs.python.org/issue21999 #21998: asyncio: a new self-pipe should be created in the child proces http://bugs.python.org/issue21998 #21997: Pdb.set_trace debugging does not end correctly in IDLE http://bugs.python.org/issue21997 #21996: gettarinfo method does not handle files without text string na http://bugs.python.org/issue21996 #21995: Idle: pseudofiles have no buffer attribute. http://bugs.python.org/issue21995 #21992: New AST node Else() should be introduced http://bugs.python.org/issue21992 #21991: The new email API should use MappingProxyType instead of retur http://bugs.python.org/issue21991 #21990: saxutils defines an inner class where a normal one would do http://bugs.python.org/issue21990 #21989: Missing (optional) argument `start` and `end` in documentation http://bugs.python.org/issue21989 #21971: Index and update turtledemo doc. 
http://bugs.python.org/issue21971 #21967: Interpreter crash upon accessing frame.f_restricted of a frame http://bugs.python.org/issue21967 #21965: Add support for Memory BIO to _ssl http://bugs.python.org/issue21965 #21960: Better path handling in Idle find in files http://bugs.python.org/issue21960 Most recent 15 issues waiting for review (15) ============================================= #22003: BytesIO copy-on-write http://bugs.python.org/issue22003 #22002: Make full use of test discovery in test subpackages http://bugs.python.org/issue22002 #21999: shlex: bug in posix more handling of empty strings http://bugs.python.org/issue21999 #21990: saxutils defines an inner class where a normal one would do http://bugs.python.org/issue21990 #21989: Missing (optional) argument `start` and `end` in documentation http://bugs.python.org/issue21989 #21986: Pickleability of code objects is inconsistent http://bugs.python.org/issue21986 #21976: Fix test_ssl.py to handle LibreSSL versioning appropriately http://bugs.python.org/issue21976 #21975: Using pickled/unpickled sqlite3.Row results in segfault rather http://bugs.python.org/issue21975 #21965: Add support for Memory BIO to _ssl http://bugs.python.org/issue21965 #21958: Allow python 2.7 to compile with Visual Studio 2013 http://bugs.python.org/issue21958 #21955: ceval.c: implement fast path for integers with a single digit http://bugs.python.org/issue21955 #21951: tcl test change crashes AIX http://bugs.python.org/issue21951 #21947: `Dis` module doesn't know how to disassemble generators http://bugs.python.org/issue21947 #21944: Allow copying of CodecInfo objects http://bugs.python.org/issue21944 #21941: Clean up turtle TPen class http://bugs.python.org/issue21941 Top 10 most discussed issues (10) ================================= #21645: asyncio: Race condition in signal handling on FreeBSD http://bugs.python.org/issue21645 16 msgs #15443: datetime module has no support for nanoseconds http://bugs.python.org/issue15443 14 
msgs #21815: imaplib truncates some untagged responses http://bugs.python.org/issue21815 14 msgs #21935: Implement AUTH command in smtpd. http://bugs.python.org/issue21935 11 msgs #21955: ceval.c: implement fast path for integers with a single digit http://bugs.python.org/issue21955 10 msgs #21975: Using pickled/unpickled sqlite3.Row results in segfault rather http://bugs.python.org/issue21975 9 msgs #21986: Pickleability of code objects is inconsistent http://bugs.python.org/issue21986 9 msgs #21927: BOM appears in stdin when using Powershell http://bugs.python.org/issue21927 8 msgs #1598: unexpected response in imaplib http://bugs.python.org/issue1598 7 msgs #18320: python installation is broken if prefix is overridden on an in http://bugs.python.org/issue18320 7 msgs Issues closed (43) ================== #8849: python.exe problem with cvxopt http://bugs.python.org/issue8849 closed by r.david.murray #9390: Error in sys.excepthook on windows when redirecting output of http://bugs.python.org/issue9390 closed by zach.ware #14714: PEP 414 tokenizing hook does not preserve tabs http://bugs.python.org/issue14714 closed by aronacher #15962: Windows STDIN/STDOUT Redirection is actually FIXED http://bugs.python.org/issue15962 closed by terry.reedy #16178: atexit._run_exitfuncs should be a public API http://bugs.python.org/issue16178 closed by rhettinger #16237: bdist_rpm SPEC files created with distutils may be distro spec http://bugs.python.org/issue16237 closed by ncoghlan #16382: Better warnings exception for bad category http://bugs.python.org/issue16382 closed by berker.peksag #16859: tarfile.TarInfo.fromtarfile does not check read() return value http://bugs.python.org/issue16859 closed by lars.gustaebel #16895: Batch file to mimic 'make' on Windows http://bugs.python.org/issue16895 closed by zach.ware #17308: Dialog.py crashes when putty Window resized http://bugs.python.org/issue17308 closed by berker.peksag #18144: FD leak in urllib2 
http://bugs.python.org/issue18144 closed by serhiy.storchaka #18974: Use argparse in the diff script http://bugs.python.org/issue18974 closed by serhiy.storchaka #19076: Pdb.do_break calls error with obsolete file kwarg http://bugs.python.org/issue19076 closed by berker.peksag #19355: Initial modernization of OpenWatcom support http://bugs.python.org/issue19355 closed by Jeffrey.Armstrong #20451: os.exec* mangles argv on windows (splits on spaces, etc) http://bugs.python.org/issue20451 closed by rhettinger #21059: idle_test.test_warning failure http://bugs.python.org/issue21059 closed by zach.ware #21163: asyncio doesn't warn if a task is destroyed during its executi http://bugs.python.org/issue21163 closed by haypo #21247: test_asyncio: test_subprocess_send_signal hangs on Fedora buil http://bugs.python.org/issue21247 closed by haypo #21323: CGI HTTP server not running scripts from subdirectories http://bugs.python.org/issue21323 closed by ned.deily #21599: Argument transport in attach and detach method in Server class http://bugs.python.org/issue21599 closed by haypo #21655: Write Unit Test for Vec2 and TNavigator class in the Turtle Mo http://bugs.python.org/issue21655 closed by Lita.Cho #21765: Idle: make 3.x HyperParser work with non-ascii identifiers. 
http://bugs.python.org/issue21765 closed by terry.reedy #21899: Futures are not marked as completed http://bugs.python.org/issue21899 closed by Sebastian.Kreft.Deezer #21906: Tools\Scripts\md5sum.py doesn't work in Python 3.x http://bugs.python.org/issue21906 closed by berker.peksag #21913: threading.Condition.wait() is not interruptible in Python 2.7 http://bugs.python.org/issue21913 closed by neologix #21918: Convert test_tools to directory http://bugs.python.org/issue21918 closed by zach.ware #21953: pythonrun.c does not check std streams the same as fileio.c http://bugs.python.org/issue21953 closed by steve.dower #21957: ASCII Formfeed (FF) & ASCII Vertical Tab (VT) Have Hexadecimal http://bugs.python.org/issue21957 closed by ned.deily #21959: msi product code for 2.7.8150 not in Tools/msi/uuids.py http://bugs.python.org/issue21959 closed by r.david.murray #21966: InteractiveConsole does not support -q option http://bugs.python.org/issue21966 closed by belopolsky #21968: 'abort' object is not callable http://bugs.python.org/issue21968 closed by Apple Grew #21974: Typo in "Set" in PEP 289 http://bugs.python.org/issue21974 closed by rhettinger #21977: In the re's token example OP and SKIP regexes can be improved http://bugs.python.org/issue21977 closed by rhettinger #21978: Support index access on OrderedDict views (e.g. 
o.keys()[7]) http://bugs.python.org/issue21978 closed by rhettinger #21979: SyntaxError not raised on "0xaor 1" http://bugs.python.org/issue21979 closed by mark.dickinson #21981: Idle problem http://bugs.python.org/issue21981 closed by eric.smith #21982: Idle configDialog: fix regression and add minimal unittest http://bugs.python.org/issue21982 closed by terry.reedy #21984: list(itertools.repeat(1)) causes the system to hang http://bugs.python.org/issue21984 closed by rhettinger #21985: test_asyncio prints some junk http://bugs.python.org/issue21985 closed by haypo #21988: Decrease iterating overhead in timeit http://bugs.python.org/issue21988 closed by gvanrossum #21993: counterintuitive behavior of list.index with boolean values http://bugs.python.org/issue21993 closed by ezio.melotti #21994: Syntax error in the ssl module documentation http://bugs.python.org/issue21994 closed by berker.peksag #22004: io documentation refers to newline as newlines http://bugs.python.org/issue22004 closed by python-dev From techtonik at gmail.com Sun Jul 20 16:34:27 2014 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 20 Jul 2014 17:34:27 +0300 Subject: [Python-Dev] subprocess research - max limit for piped output Message-ID: I am trying to figure out what the maximum size is for piped input in subprocess.check_output(). I've hit a limit of about 500Mb, after which Python exits with MemoryError without any additional details. I have only 2.76Gb memory used out of 8Gb, so what limit am I hitting? 1. subprocess output read buffer 2. Python limit on size of variable 3. some OS limit on output pipes Testcase attached. C:\discovery\interface\subprocess>py dead.py Testing size: 520Mb ..truncating to 545259520 .. 
Traceback (most recent call last):
  File "dead.py", line 66, in <module>
    backticks(r'type largefile')
  File "dead.py", line 36, in backticks
    output = subprocess.check_output(command, shell=True)
  File "C:\Python27\lib\subprocess.py", line 567, in check_output
    output, unused_err = process.communicate()
  File "C:\Python27\lib\subprocess.py", line 791, in communicate
    stdout = _eintr_retry_call(self.stdout.read)
  File "C:\Python27\lib\subprocess.py", line 476, in _eintr_retry_call
    return func(*args)
MemoryError
The process tried to write to a nonexistent pipe.

-- 
anatoly t.
-------------- next part --------------
import subprocess

# --- replacing shell backticks ---
# https://docs.python.org/2/library/subprocess.html#replacing-bin-sh-shell-backquote
#   output=`mycmd myarg`
#   output = check_output(["mycmd", "myarg"])
# not true, because mycmd is not passed to shell
try:
    pass
    #output = subprocess.check_output(["mycmd", "myarg"], shell=True)
except OSError as ex:
    # command not found.
    # it is impossible to catch output here, but shell outputs
    # message to stderr, which backticks doesn't catch either
    output = ''
except subprocess.CalledProcessError as ex:
    output = ex.output
# ^ information about error condition is lost
# ^ output in case of OSError is lost

# ux notes:
# - `mycmd myarg` > ["mycmd", "myarg"]
# - `` is invisible
#   subprocess.check_output is hardly rememberable
# - exception checking is excessive and not needed
#   (common pattern is to check return code)

def backticks(command):
    '''
    - no return code
    - no stderr capture
    '''
    try:
        # this doesn't escape shell patterns, such as:
        #   ^ (windows cmd.exe shell)
        output = subprocess.check_output(command, shell=True)
    except OSError as ex:
        # command not found.
        # it is impossible to catch output here, but shell outputs
        # message to stderr, which backticks doesn't catch either
        output = ''
    except subprocess.CalledProcessError as ex:
        output = ex.output
    return output

import os
for size in range(520, 600, 2):
    print("Testing size: %sMb" % size)
    #cursize = os.path.getsize("largefile")
    with open("largefile", "ab") as data:
        data.seek(0, 2)
        cursize = data.tell()
        #print(cursize)
        limit = size*1024**2
        if cursize > limit:
            print('..truncating to %s' % limit)
            data.truncate(limit)
        else:
            print('..extending to %s' % limit)
            while cursize < limit:
                toadd = min(100, limit-cursize)
                data.write('1'*99+'\n')
                cursize += 100
    print("..")
    backticks(r'type largefile')

From antoine at python.org Sun Jul 20 18:50:06 2014 From: antoine at python.org (Antoine Pitrou) Date: Sun, 20 Jul 2014 12:50:06 -0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: Hi, > Thanks Victor, Nick, Ethan, and others for continued discussion on the > scandir PEP 471 (most recent thread starts at > https://mail.python.org/pipermail/python-dev/2014-July/135377.html). Have you tried modifying importlib's _bootstrap.py to use scandir() instead of listdir() + stat()? Regards Antoine. From benhoyt at gmail.com Sun Jul 20 23:34:19 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Sun, 20 Jul 2014 17:34:19 -0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: > Have you tried modifying importlib's _bootstrap.py to use scandir() instead > of listdir() + stat()? No, I haven't -- I'm not familiar with that code. What does _bootstrap.py do -- does it do a lot of listdir calls and stat-ing of many files? -Ben From brett at python.org Mon Jul 21 00:35:48 2014 From: brett at python.org (Brett Cannon) Date: Sun, 20 Jul 2014 22:35:48 +0000 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() References: Message-ID: Oh yes.
:) The file Antoine is referring to is the implementation of import. On Sun, Jul 20, 2014, 17:34 Ben Hoyt wrote: > > Have you tried modifying importlib's _bootstrap.py to use scandir() > instead > > of listdir() + stat()? > > No, I haven't -- I'm not familiar with that code. What does > _bootstrap.py do -- does it do a lot of listdir calls and stat-ing of > many files? > > -Ben > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ > brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From antoine at python.org Mon Jul 21 01:45:28 2014 From: antoine at python.org (Antoine Pitrou) Date: Sun, 20 Jul 2014 19:45:28 -0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: On 20/07/2014 17:34, Ben Hoyt wrote: >> Have you tried modifying importlib's _bootstrap.py to use scandir() instead >> of listdir() + stat()? > > No, I haven't -- I'm not familiar with that code. What does > _bootstrap.py do -- does it do a lot of listdir calls and stat-ing of > many files? Quite a bit, although that should be dampened in recent 3.x versions, thanks to the caching of directory contents. Even though there is tangible performance improvement from scandir(), it would be useful to find out if the API fits well. Regards Antoine. From benhoyt at gmail.com Mon Jul 21 17:32:05 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Mon, 21 Jul 2014 11:32:05 -0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: > Even though there is tangible performance improvement from scandir(), it > would be useful to find out if the API fits well. Got it -- I see where you're coming from now. I'll take a quick look (hopefully later this week).
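Antoine's question is essentially whether code written as listdir() + stat() maps cleanly onto scandir(). The sketch below is a minimal comparison. It uses the os.scandir() API as it eventually shipped in Python 3.5, with is_dir() as a method rather than the attribute-based design being debated here, and the with-statement form, which needs Python 3.6. The helper names are hypothetical, and nothing here is taken from importlib.

```python
import os
import stat
import tempfile


def split_with_listdir(path):
    # One listdir() call, then one extra stat() system call per entry.
    files, dirs = [], []
    for name in os.listdir(path):
        st = os.stat(os.path.join(path, name))
        (dirs if stat.S_ISDIR(st.st_mode) else files).append(name)
    return sorted(files), sorted(dirs)


def split_with_scandir(path):
    # When the OS reports the entry type in the listing itself (Windows, and
    # most Linux filesystems), is_dir() needs no extra system call.
    files, dirs = [], []
    with os.scandir(path) as it:  # context-manager support: Python 3.6+
        for entry in it:
            (dirs if entry.is_dir() else files).append(entry.name)
    return sorted(files), sorted(dirs)


with tempfile.TemporaryDirectory() as d:
    os.mkdir(os.path.join(d, "pkg"))
    open(os.path.join(d, "mod.py"), "w").close()
    print(split_with_listdir(d))  # (['mod.py'], ['pkg'])
    print(split_with_scandir(d))  # (['mod.py'], ['pkg'])
```

Both versions follow symlinks (os.stat() and the default is_dir()), so they classify entries the same way. A broken symlink is the exception: os.stat() raises on it, while is_dir() simply returns False.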
-Ben From victor.stinner at gmail.com Mon Jul 21 17:57:12 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 21 Jul 2014 17:57:12 +0200 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: Hi, 2014-07-20 18:50 GMT+02:00 Antoine Pitrou : > Have you tried modifying importlib's _bootstrap.py to use scandir() instead > of listdir() + stat()? IMO the current os.scandir() API does not fit importlib requirements. importlib usually wants fresh data, whereas DirEntry cache cannot be invalidated. It's probably possible to cache some os.stat() result in importlib, but it looks like it requires a non trivial refactoring of the code. I don't know importlib enough to suggest how to change it. There are many open issues related to stat() in importlib, I found these ones: http://bugs.python.org/issue14604 http://bugs.python.org/issue14067 http://bugs.python.org/issue19216 Closed issues: http://bugs.python.org/issue17330 http://bugs.python.org/issue18810 By the way, DirEntry constructor is not documented in the PEP. Should we document it? It might be a way to "invalidate the cache": entry = DirEntry(os.path.dirname(entry.path), entry.name) Maybe it is an abuse of the API. A clear_cache() method would be less ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry for a long time? Another question: should we expose DirEntry type directly in the os namespace? (os.DirEntry) Victor From Steve.Dower at microsoft.com Mon Jul 21 18:11:45 2014 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 21 Jul 2014 16:11:45 +0000 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: <5a4f4fb5c98347258ad1ed1c754d922f@DM2PR0301MB0734.namprd03.prod.outlook.com> Victor Stinner wrote: > 2014-07-20 18:50 GMT+02:00 Antoine Pitrou : >> Have you tried modifying importlib's _bootstrap.py to use scandir() >> instead of listdir() + stat()?
> > IMO the current os.scandir() API does not fit importlib requirements. > importlib usually wants fresh data, whereas DirEntry cache cannot be > invalidated. It's probably possible to cache some os.stat() result in > importlib, but it looks like it requires a non trivial refactoring of > the code. I don't know importlib enough to suggest how to change it. The data is completely fresh at the time it is obtained, which is identical to using stat(). There will always be a race-condition between looking and doing, which is why we still use exception handling on actions. > By the way, DirEntry constructor is not documented in the PEP. Should > we document it? It might be a way to "invalidate the cache": > > entry = DirEntry(os.path.dirname(entry.path), entry.name) > > Maybe it is an abuse of the API. A clear_cache() method would be less > ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry > for a long time? DirEntry is a convenient way to return a tuple without returning a tuple, that's all. If you want up to date info, call os.stat() and pass in the path. This should just be a better (and ideally transparent) substitute for os.listdir() in every single context. Personally I'd make it a string subclass and put one-shot properties on it (i.e. call/cache stat() on first access where we don't already know the answer), which I think is close enough to where it's landed that I'm happy. (As far as bikeshedding goes, I prefer "_DirEntry" and no docs :) ) Cheers, Steve From benhoyt at gmail.com Mon Jul 21 18:48:50 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Mon, 21 Jul 2014 12:48:50 -0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: Thanks for an initial look into this, Victor. > IMO the current os.scandir() API does not fit importlib requirements. > importlib usually wants fresh data, whereas DirEntry cache cannot be > invalidated. 
It's probably possible to cache some os.stat() result in > importlib, but it looks like it requires a non trivial refactoring of > the code. I don't know importlib enough to suggest how to change it. Yes, with importlib already doing its own caching (somewhat complicated, as the open and closed issues show), I get the feeling it wouldn't be a good fit. Note that I'm not saying we wouldn't use it if we were implementing importlib from scratch. > By the way, DirEntry constructor is not documented in the PEP. Should > we document it? It might be a way to "invalidate the cache": I would prefer not to, just to keep things simple. Similar to creating os.stat_result() objects ... you can kind of do it (see scandir.py), but it's not recommended or even documented. The entire purpose of DirEntry objects is so scandir can produce them, not for general use. > entry = DirEntry(os.path.dirname(entry.path), entry.name) > > Maybe it is an abuse of the API. A clear_cache() method would be less > ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry > for a long time? > > Another question: should we expose DirEntry type directly in the os > namespace? (os.DirEntry) Again, I'd rather not expose this. It's quite system-specific (see the different system versions in scandir.py), and trying to combine this, make it consistent, and document it would be a bit of a pain, and also possibly prevent future modifications (because then the parts of the implementation would be set in stone). I'm not really opposed to a clear_cache() method -- basically it'd set _lstat and _stat and _d_type to None internally. However, I'd prefer to keep it as is, and as the PEP says: If developers want "refresh" behaviour (for example, for watching a file's size change), they can simply use pathlib.Path objects, or call the regular os.stat() or os.path.getsize() functions which get fresh data from the operating system every call. 
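For illustration, a minimal sketch of the caching difference described above, assuming a Python version with os.scandir() in the standard library (3.6+ for the context-manager form). The file name example.txt is arbitrary. DirEntry.stat() keeps returning its cached result, while os.stat(entry.path) asks the operating system again:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "wb") as f:
        f.write(b"abc")                    # 3 bytes

    with os.scandir(tmp) as it:            # context-manager form: Python 3.6+
        entry = next(it)                   # the only entry in the directory
    cached_before = entry.stat().st_size   # result is cached on the DirEntry

    with open(path, "ab") as f:            # grow the file after the scan
        f.write(b"defgh")                  # now 8 bytes in total

    cached_after = entry.stat().st_size    # served from the DirEntry cache
    fresh = os.stat(entry.path).st_size    # a new system call every time

    print(cached_before, cached_after, fresh)  # 3 3 8
```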
-Ben From matsjoyce at gmail.com Mon Jul 21 21:26:14 2014 From: matsjoyce at gmail.com (matsjoyce) Date: Mon, 21 Jul 2014 19:26:14 +0000 (UTC) Subject: [Python-Dev] Reviving restricted mode? References: <200902231657.52201.victor.stinner@haypocalc.com> Message-ID: Sorry about being a bit late on this front (just 5 years...), but I've extended tav's jail to module level, and added the niceties. Its goal is similar to that of rexec, stopping IO, but not crashes. It is currently at https://github.com/matsjoyce/sandypython, and it has instructions as to its use. I've bashed it with all the exploits I've found online, and it's still holding, so I thought the public might like a go. From victor.stinner at gmail.com Mon Jul 21 21:36:09 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 21 Jul 2014 21:36:09 +0200 Subject: [Python-Dev] Reviving restricted mode? In-Reply-To: References: <200902231657.52201.victor.stinner@haypocalc.com> Message-ID: Hi, 2014-07-21 21:26 GMT+02:00 matsjoyce : > Sorry about being a bit late on this front (just 5 years...), but I've > extended tav's jail to module level, and added the niceties. Its goal is > similar to that of rexec, stopping IO, but not crashes. It is currently at > https://github.com/matsjoyce/sandypython, and it has instructions as to its > use. I've bashed it with all the exploits I've found online, and it's still > holding, so I thought the public might like a go. I wrote this project, started from tav's jail: https://github.com/haypo/pysandbox/ I gave up because I now consider that pysandbox is broken by design. Please read the LWN article: https://lwn.net/Articles/574215/ Don't hesitate to ask more specific questions.
Victor From ncoghlan at gmail.com Mon Jul 21 23:37:09 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Jul 2014 07:37:09 +1000 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: <5a4f4fb5c98347258ad1ed1c754d922f@DM2PR0301MB0734.namprd03.prod.outlook.com> References: <5a4f4fb5c98347258ad1ed1c754d922f@DM2PR0301MB0734.namprd03.prod.outlook.com> Message-ID: On 22 Jul 2014 02:46, "Steve Dower" wrote: > > Personally I'd make it a string subclass and put one-shot properties on it (i.e. call/cache stat() on first access where we don't already know the answer), which I think is close enough to where it's landed that I'm happy. (As far as bikeshedding goes, I prefer "_DirEntry" and no docs :) ) +1 for "_DirEntry" as the name in the implementation, and documenting its behaviour under "scandir" rather than as a standalone object. Only -0 for full documentation as a standalone class, though. Cheers, Nick. > > Cheers, > Steve From victor.stinner at gmail.com Tue Jul 22 00:26:02 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 22 Jul 2014 00:26:02 +0200 Subject: [Python-Dev] PEP 471 "scandir" accepted Message-ID: Hi, I asked privately Guido van Rossum if I can be the BDFL-delegate for the PEP 471 and he agreed. I accept the latest version of the PEP: http://legacy.python.org/dev/peps/pep-0471/ I consider that the PEP 471 "scandir" was discussed enough to collect all possible options (variations of the API) and that the main flaws have been identified. Ben Hoyt modified his PEP to list all these options, and for each option gives advantages and drawbacks.
Great job Ben :-) Thanks to all the developers who contributed to the threads on the python-dev mailing list! The new version of the PEP has an optional "follow_symlinks" parameter which is True by default. IMO this API better fits the common case (listing the contents of a single directory), and it's now simple to avoid following symlinks when implementing a recursive function like os.walk(). The PEP also explicitly mentions that os.walk() will be modified to benefit of the new os.scandir() function. I'm happy because the final API is very close to os.path functions and pathlib.Path methods. Python stays consistent, which is a great power of this language! The PEP is accepted. It's time to review the implementation ;-) The current code can be found at: https://github.com/benhoyt/scandir (I don't think that Ben already updated his implementation for the latest version of the PEP.) Victor From victor.stinner at gmail.com Tue Jul 22 00:39:26 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 22 Jul 2014 00:39:26 +0200 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: 2014-07-21 18:48 GMT+02:00 Ben Hoyt : >> By the way, DirEntry constructor is not documented in the PEP. Should >> we document it? It might be a way to "invalidate the cache": > > I would prefer not to, just to keep things simple. Similar to creating > os.stat_result() objects ... you can kind of do it (see scandir.py), > but it's not recommended or even documented. The entire purpose of > DirEntry objects is so scandir can produce them, not for general use. > >> entry = DirEntry(os.path.dirname(entry.path), entry.name) >> >> Maybe it is an abuse of the API. A clear_cache() method would be less >> ugly :-) But maybe Ben Hoyt does not want to promote keeping DirEntry >> for a long time? >> >> Another question: should we expose DirEntry type directly in the os >> namespace? (os.DirEntry) > > Again, I'd rather not expose this.
It's quite system-specific (see the > different system versions in scandir.py), and trying to combine this, > make it consistent, and document it would be a bit of a pain, and also > possibly prevent future modifications (because then the parts of the > implementation would be set in stone). We should mimic os.stat() and os.stat_result: os.stat_result symbol exists in the os namespace, but the type constructor is not documented. No need for extra protection like not adding the type in the os module, or adding a "_" prefix to the name. By the way, it's possible to serialize a stat_result with pickle. See also my issue "Enhance doc of os.stat_result": http://bugs.python.org/issue21813 > I'm not really opposed to a clear_cache() method -- basically it'd set > _lstat and _stat and _d_type to None internally. However, I'd prefer > to keep it as is, and as the PEP says: (...) Ok, agreed. Victor From benhoyt at gmail.com Tue Jul 22 04:27:09 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Mon, 21 Jul 2014 22:27:09 -0400 Subject: [Python-Dev] PEP 471 "scandir" accepted In-Reply-To: References: Message-ID: > I asked privately Guido van Rossum if I can be the BDFL-delegate for > the PEP 471 and he agreed. I accept the latest version of the PEP: > > http://legacy.python.org/dev/peps/pep-0471/ Thank you! > The PEP also explicitly mentions that os.walk() will be modified to > benefit of the new os.scandir() function. Yes, this was a good suggestion to include that explicitly -- in actual fact, speeding up os.walk() was my main goal initially. > The PEP is accepted. Superb. Could you please update the PEP with the Resolution and BDFL-Delegate fields? > It's time to review the implementation ;-) The current code can be found at: > > https://github.com/benhoyt/scandir > > (I don't think that Ben already updated his implementation for the > latest version of the PEP.) I have actually updated my GitHub repo for the current PEP (did this last Saturday). 
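As a rough sketch of the follow_symlinks point (this is not the real os.walk() code; walk_files is a hypothetical helper, and Python 3.6+ is assumed for the context-manager form of os.scandir()), a recursive traversal can pass follow_symlinks=False to is_dir() and is_file() so that symlinked directories are never descended into:

```python
import os
import tempfile

def walk_files(top):
    """Yield paths of regular files under top, never following symlinks."""
    subdirs = []
    with os.scandir(top) as it:
        for entry in it:
            # follow_symlinks=False: a symlink to a directory is neither a
            # directory nor a regular file here, so it is skipped entirely
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                yield entry.path
    # recurse only after this directory's iterator has been closed
    for path in subdirs:
        yield from walk_files(path)

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("x")
    found = sorted(os.path.relpath(p, tmp) for p in walk_files(tmp))
    print(found)  # ['a.txt', 'sub/b.txt'] on POSIX
```

Unlike os.walk(), this sketch drops symlinks to directories entirely instead of reporting them as directories.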
However, there are still a few open issues, the main one is that my scandir.py module doesn't handle the bytes/str thing properly. I intend to work on the CPython implementation over the next few weeks. However, a couple of thoughts up-front: I think if I were doing this from scratch I'd reimplement listdir() in Python as "return [e.name for e in scandir(path)]". However, I'm not sure this is a good idea, as I don't really want listdir() to suddenly use more memory and perform slightly *worse* due to the extra DirEntry object allocations. So my basic plan is to have an internal helper function in posixmodule.c that either yields DirEntry objects or strings. And then listdir() would simply be defined something like "return list(_scandir(path, yield_strings=True))" in C or in Python. My reasoning is that then there'll be much less (if any) code duplication between scandir() and listdir(). Does this sound like a reasonable approach? -Ben From benhoyt at gmail.com Tue Jul 22 04:32:10 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Mon, 21 Jul 2014 22:32:10 -0400 Subject: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir() In-Reply-To: References: Message-ID: > We should mimic os.stat() and os.stat_result: os.stat_result symbol > exists in the os namespace, but the type constructor is not > documented. No need for extra protection like not adding the type in > the os module, or adding a "_" prefix to the name. Yeah, that works for me. > By the way, it's possible to serialize a stat_result with pickle. That makes sense, as stat_result is basically just a tuple and a bit extra. I wonder if it should be possible to pickle DirEntry objects? I'm thinking possibly not. If so, would it cache the stat or file type info? 
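A quick sketch of the stat_result pickling that Victor mentions. It covers os.stat_result only and says nothing about whether DirEntry objects should be picklable:

```python
import os
import pickle

st = os.stat(".")                          # any existing path will do
restored = pickle.loads(pickle.dumps(st))

print(type(restored) is os.stat_result)    # True
print(restored == st)                      # True (compares the tuple fields)
print(restored.st_mtime == st.st_mtime)    # True (named-only float field kept)
```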
-Ben From victor.stinner at gmail.com Tue Jul 22 09:39:17 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 22 Jul 2014 09:39:17 +0200 Subject: [Python-Dev] PEP 471 "scandir" accepted In-Reply-To: References: Message-ID: Modifying os.listdir() to use os.scandir() is not part of the PEP; you should not do that. If you are worried about performance, try to implement my free list idea. You may modify the C code of listdir() to share as much code as possible. I mean you can implement your idea in C. Victor From 4kir4.1i at gmail.com Tue Jul 22 09:33:41 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Tue, 22 Jul 2014 11:33:41 +0400 Subject: [Python-Dev] PEP 471 "scandir" accepted References: Message-ID: <87r41donje.fsf@gmail.com> Ben Hoyt writes: > I think if I were doing this from scratch I'd reimplement listdir() in > Python as "return [e.name for e in scandir(path)]". ... > So my basic plan is to have an internal helper function in > posixmodule.c that either yields DirEntry objects or strings. And then > listdir() would simply be defined something like "return > list(_scandir(path, yield_strings=True))" in C or in Python. > > My reasoning is that then there'll be much less (if any) code > duplication between scandir() and listdir(). > > Does this sound like a reasonable approach? Note: listdir() accepts an integer path (an open file descriptor that refers to a directory) that is passed to fdopendir() on POSIX [4] i.e., *you can't use scandir() to replace listdir() in this case* (as I've already mentioned in [1]). See the corresponding tests from [2]. [1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html [2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html From os.listdir() docs [3]: > This function can also support specifying a file descriptor; the file > descriptor must refer to a directory.
[3] https://docs.python.org/3.4/library/os.html#os.listdir [4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736 -- Akira From benhoyt at gmail.com Tue Jul 22 17:52:45 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 22 Jul 2014 11:52:45 -0400 Subject: [Python-Dev] PEP 471 "scandir" accepted In-Reply-To: <87r41donje.fsf@gmail.com> References: <87r41donje.fsf@gmail.com> Message-ID: > Note: listdir() accepts an integer path (an open file descriptor that > refers to a directory) that is passed to fdopendir() on POSIX [4] i.e., > *you can't use scandir() to replace listdir() in this case* (as I've > already mentioned in [1]). See the corresponding tests from [2]. > > [1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html > [2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html > > From os.listdir() docs [3]: > >> This function can also support specifying a file descriptor; the file >> descriptor must refer to a directory. > > [3] https://docs.python.org/3.4/library/os.html#os.listdir > [4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736 Fair point. Yes, I hadn't realized listdir supported dir_fd (must have been looking at 2.x docs), though you've pointed it out at [1] above. and I guess I wasn't thinking about implementation at the time. It would be easy enough (I think) to have the helper function support both, but raise an error in the scandir() function if the type of path is an integer. However, given that we have to support this for listdir() anyway, I think it's worth reconsidering whether scandir()'s directory argument can be an integer FD. Given that listdir() already supports it, it will almost certainly be asked for later anyway for someone who's porting some listdir code that uses an FD. Thoughts, Victor? 
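For reference, a minimal sketch of the listdir() file-descriptor form under discussion. It assumes a POSIX system where os.listdir appears in os.supports_fd, and list_via_fd is a hypothetical helper name:

```python
import os
import tempfile

def list_via_fd(path):
    """List a directory through an open descriptor, as os.listdir(fd) allows."""
    if os.listdir not in os.supports_fd:
        raise NotImplementedError("os.listdir() does not accept a descriptor here")
    fd = os.open(path, os.O_RDONLY)   # the descriptor must refer to a directory
    try:
        return sorted(os.listdir(fd))
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.txt", "a.txt"):
        open(os.path.join(tmp, name), "w").close()
    if os.listdir in os.supports_fd:
        print(list_via_fd(tmp))       # ['a.txt', 'b.txt']
```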
-Ben From victor.stinner at gmail.com Tue Jul 22 18:16:14 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 22 Jul 2014 18:16:14 +0200 Subject: [Python-Dev] PEP 471 "scandir" accepted In-Reply-To: References: <87r41donje.fsf@gmail.com> Message-ID: 2014-07-22 17:52 GMT+02:00 Ben Hoyt : > However, given that we have to support this for listdir() anyway, I > think it's worth reconsidering whether scandir()'s directory argument > can be an integer FD. Given that listdir() already supports it, it > will almost certainly be asked for later anyway for someone who's > porting some listdir code that uses an FD. Thoughts, Victor? Please focus on what was accepted in the PEP. We should first test os.scandir(). In a few months, with better feedbacks, we can consider extending os.scandir() to support a file descriptor. There are different issues which should be discussed and decided to implement it (ex: handle the lifetime of the directory file descriptor). Victor From ncoghlan at gmail.com Tue Jul 22 22:57:18 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 23 Jul 2014 06:57:18 +1000 Subject: [Python-Dev] PEP 471 "scandir" accepted In-Reply-To: References: <87r41donje.fsf@gmail.com> Message-ID: On 23 Jul 2014 02:18, "Victor Stinner" wrote: > > 2014-07-22 17:52 GMT+02:00 Ben Hoyt : > > However, given that we have to support this for listdir() anyway, I > > think it's worth reconsidering whether scandir()'s directory argument > > can be an integer FD. Given that listdir() already supports it, it > > will almost certainly be asked for later anyway for someone who's > > porting some listdir code that uses an FD. Thoughts, Victor? > > Please focus on what was accepted in the PEP. We should first test > os.scandir(). In a few months, with better feedbacks, we can consider > extending os.scandir() to support a file descriptor. 
There are > different issues which should be discussed and decided to implement it > (ex: handle the lifetime of the directory file descriptor). As Victor suggests, getting the core version working and incorporated first is a good way to go. Future enhancements (like accepting a file descriptor) and refactorings (like eliminating the code duplication with listdir) don't need to (and hence shouldn't) go into the initial patch. Cheers, Nick. > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.gaynor at gmail.com Tue Jul 22 23:03:36 2014 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Tue, 22 Jul 2014 21:03:36 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?=5BPEP466=5D_SSLSockets=2C_and_sockets=2C_?= =?utf-8?q?=5Fsocketobjects_oh_my!?= Message-ID: Hi all, I've been happily working on the SSL module backports for Python2 (pursuant to PEP466), and I've hit something of a snag: In python3, the SSLSocket keeps a weak reference to the underlying socket, rather than a strong reference, as Python2 uses. Unfortunately, due to the way sockets work in Python2, this doesn't work: On Python2, _socketobject composes around _real_socket from the _socket module, whereas on Python3, it subclasses _socket.socket. Since you now have a Python- level class, you can weak reference it. The question is: a) Should we backport weak referencing _socket.sockets (changing the structure of the module seems overly invasive, albeit completely backwards compatible)? b) Does anyone know why weak references are used in the first place? The commit message just alludes to fixing a leak with no reference to an issue. 
Anyone who's interested in the state of the branch can see it at: github.com/alex/cpython on the backport-ssl branch. Note that many, many tests are still failing, and you'll need to apply the patch from http://bugs.python.org/issue22023 to get it to work. Thanks, Alex PS: Any help in getting http://bugs.python.org/issue22023 landed would be very much appreciated. From benhoyt at gmail.com Tue Jul 22 23:07:37 2014 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 22 Jul 2014 17:07:37 -0400 Subject: [Python-Dev] PEP 471 "scandir" accepted In-Reply-To: References: <87r41donje.fsf@gmail.com> Message-ID: Makes sense, thanks. -Ben On Tue, Jul 22, 2014 at 4:57 PM, Nick Coghlan wrote: > > On 23 Jul 2014 02:18, "Victor Stinner" wrote: >> >> 2014-07-22 17:52 GMT+02:00 Ben Hoyt : >> > However, given that we have to support this for listdir() anyway, I >> > think it's worth reconsidering whether scandir()'s directory argument >> > can be an integer FD. Given that listdir() already supports it, it >> > will almost certainly be asked for later anyway for someone who's >> > porting some listdir code that uses an FD. Thoughts, Victor? >> >> Please focus on what was accepted in the PEP. We should first test >> os.scandir(). In a few months, with better feedbacks, we can consider >> extending os.scandir() to support a file descriptor. There are >> different issues which should be discussed and decided to implement it >> (ex: handle the lifetime of the directory file descriptor). > > As Victor suggests, getting the core version working and incorporated first > is a good way to go. Future enhancements (like accepting a file descriptor) > and refactorings (like eliminating the code duplication with listdir) don't > need to (and hence shouldn't) go into the initial patch. > > Cheers, > Nick.
> >> >> Victor From antoine at python.org Tue Jul 22 23:25:27 2014 From: antoine at python.org (Antoine Pitrou) Date: Tue, 22 Jul 2014 17:25:27 -0400 Subject: [Python-Dev] [PEP466] SSLSockets, and sockets, _socketobjects oh my! In-Reply-To: References: Message-ID: Le 22/07/2014 17:03, Alex Gaynor a écrit : > > The question is: > > a) Should we backport weak referencing _socket.sockets (changing the structure > of the module seems overly invasive, albeit completely backwards > compatible)? > b) Does anyone know why weak references are used in the first place? The commit > message just alludes to fixing a leak with no reference to an issue. Because : - the SSLSocket has a strong reference to the ssl object (self._sslobj) - self._sslobj having a strong reference to the SSLSocket would mean both would only get destroyed on a GC collection I assume that's what "leak" means here :-) As for 2.x, I don't see why you couldn't just continue using a strong reference. Regards Antoine. From ncoghlan at gmail.com Tue Jul 22 23:44:54 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 23 Jul 2014 07:44:54 +1000 Subject: [Python-Dev] [PEP466] SSLSockets, and sockets, _socketobjects oh my! In-Reply-To: References: Message-ID: On 23 Jul 2014 07:28, "Antoine Pitrou" wrote: > > Le 22/07/2014 17:03, Alex Gaynor a écrit : > >> >> The question is: >> >> a) Should we backport weak referencing _socket.sockets (changing the structure >> of the module seems overly invasive, albeit completely backwards >> compatible)? >> b) Does anyone know why weak references are used in the first place? The commit >> message just alludes to fixing a leak with no reference to an issue.
> > > Because : > - the SSLSocket has a strong reference to the ssl object (self._sslobj) > - self._sslobj having a strong reference to the SSLSocket would mean both would only get destroyed on a GC collection > > I assume that's what "leak" means here :-) > > As for 2.x, I don't see why you couldn't just continue using a strong reference. As Antoine says, if the cycle already exists in Python 2 (and it sounds like it does), we can just skip backporting the weak reference change. I'll also give the Fedora Python list a heads up about your repo to see if anyone there can help you with the backport. Cheers, Nick. > > Regards > > Antoine. From victor.stinner at gmail.com Tue Jul 22 23:57:53 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 22 Jul 2014 23:57:53 +0200 Subject: [Python-Dev] PEP 471 "scandir" accepted In-Reply-To: References: Message-ID: 2014-07-22 4:27 GMT+02:00 Ben Hoyt : >> The PEP is accepted. > > Superb. Could you please update the PEP with the Resolution and > BDFL-Delegate fields? Done. Victor From antoine at python.org Wed Jul 23 01:00:18 2014 From: antoine at python.org (Antoine Pitrou) Date: Tue, 22 Jul 2014 19:00:18 -0400 Subject: [Python-Dev] [PEP466] SSLSockets, and sockets, _socketobjects oh my! In-Reply-To: References: Message-ID: Le 22/07/2014 17:44, Nick Coghlan a écrit : > > > > > As for 2.x, I don't see why you couldn't just continue using a strong > reference. > > As Antoine says, if the cycle already exists in Python 2 (and it sounds > like it does), we can just skip backporting the weak reference change. No, IIRC there shouldn't be a cycle.
It's just complicated in a different way than 3.x :-) Regards Antoine. From 4kir4.1i at gmail.com Wed Jul 23 01:21:14 2014 From: 4kir4.1i at gmail.com (Akira Li) Date: Wed, 23 Jul 2014 03:21:14 +0400 Subject: [Python-Dev] PEP 471 "scandir" accepted In-Reply-To: (Ben Hoyt's message of "Tue, 22 Jul 2014 11:52:45 -0400") References: <87r41donje.fsf@gmail.com> Message-ID: <871ttdnfo5.fsf@gmail.com> Ben Hoyt writes: >> Note: listdir() accepts an integer path (an open file descriptor that >> refers to a directory) that is passed to fdopendir() on POSIX [4] i.e., >> *you can't use scandir() to replace listdir() in this case* (as I've >> already mentioned in [1]). See the corresponding tests from [2]. >> >> [1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html >> [2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html >> >> From os.listdir() docs [3]: >> >>> This function can also support specifying a file descriptor; the file >>> descriptor must refer to a directory. >> >> [3] https://docs.python.org/3.4/library/os.html#os.listdir >> [4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736 > > Fair point. > > Yes, I hadn't realized listdir supported dir_fd (must have been > looking at 2.x docs), though you've pointed it out at [1] above. and I > guess I wasn't thinking about implementation at the time. FYI, dir_fd is related but *different*: compare "specifying a file descriptor" [1] vs. "paths relative to directory descriptors" [2]. "NOTE: os.supports_fd and os.supports_dir_fd are different sets." [3]:

    >>> import os
    >>> os.listdir in os.supports_fd
    True
    >>> os.listdir in os.supports_dir_fd
    False

[1] https://docs.python.org/3/library/os.html#path-fd [2] https://docs.python.org/3/library/os.html#dir-fd [3] https://mail.python.org/pipermail/python-dev/2014-July/135296.html To be clear: *listdir() does not support dir_fd* though it can be emulated using os.open(dir_fd=..).
You can safely ignore the rest of the e-mail until you want to implement path-fd [1] support for os.scandir() in several months. Here's a code example that demonstrates both path-fd [1] and dir-fd [2]:

    import contextlib
    import os

    with contextlib.ExitStack() as stack:
        dir_fd = os.open('/etc', os.O_RDONLY)
        stack.callback(os.close, dir_fd)
        fd = os.open('init.d', os.O_RDONLY, dir_fd=dir_fd) # dir-fd [2]
        stack.callback(os.close, fd)
        print("\n".join(os.listdir(fd))) # path-fd [1]

It is the same as os.listdir('/etc/init.d') unless '/etc' is symlinked to refer to another directory after the first os.open('/etc',..) call. See also, os.fwalk(dir_fd=..) [4] [4] https://docs.python.org/3/library/os.html#os.fwalk > However, given that we have to support this for listdir() anyway, I > think it's worth reconsidering whether scandir()'s directory argument > can be an integer FD. What is entry.path in this case? If input directory is a file descriptor (an integer) then os.path.join(directory, entry.name) won't work. "PEP 471 should explicitly reject the support for specifying a file descriptor so that a code that uses os.scandir may assume that entry.path attribute is always present (no exceptions due to a failure to read /proc/self/fd/NNN or an error while calling fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see http://stackoverflow.com/q/1188757 )." [5] [5] https://mail.python.org/pipermail/python-dev/2014-July/135441.html On the other hand os.fwalk() [4] that supports both path-fd [1] and dir-fd [2] could be implemented without entry.path property if os.scandir() supports just path-fd [1].
os.fwalk() provides a safe way to traverse a directory tree without symlink races e.g., [6]:

    def get_tree_size(directory):
        """Return total size of files in directory and subdirs."""
        return sum(entry.lstat().st_size
                   for root, dirs, files, rootfd in fwalk(directory)
                   for entry in files)

[6] http://legacy.python.org/dev/peps/pep-0471/#examples

where fwalk() is an exact copy of os.fwalk() except that it uses an _fwalk() defined in terms of scandir():

    import os

    # adapt os._fwalk() to use scandir() instead of os.listdir()
    def _fwalk(topfd, toppath, topdown, onerror, follow_symlinks):
        # Note: This uses O(depth of the directory tree) file descriptors:
        # if necessary, it can be adapted to only require O(1) FDs, see
        # http://bugs.python.org/issue13734

        entries = scandir(topfd)
        dirs, nondirs = [], []
        for entry in entries: #XXX call onerror on OSError on next() and return?
            # report symlinks to directories as directories (like os.walk)
            #  but no recursion into symlinked subdirectories unless
            #  follow_symlinks is true

            # add dangling symlinks as nondirs (DirEntry.is_dir() doesn't
            # raise on broken links)
            try:
                (dirs if entry.is_dir() else nondirs).append(entry)
            except FileNotFoundError:
                continue # ignore disappeared files

        if topdown:
            yield toppath, dirs, nondirs, topfd

        for entry in dirs:
            try:
                orig_st = entry.stat(follow_symlinks=follow_symlinks)
                #XXX O_DIRECTORY, O_CLOEXEC, [? O_NOCTTY, O_SEARCH ?]
                dirfd = os.open(entry.name, os.O_RDONLY, dir_fd=topfd)
            except OSError as err:
                if onerror is not None:
                    onerror(err)
                return
            try:
                if follow_symlinks or os.path.samestat(orig_st, os.stat(dirfd)):
                    dirpath = os.path.join(toppath, entry.name) # entry.path
                    yield from _fwalk(dirfd, dirpath,
                                      topdown, onerror, follow_symlinks)
            finally:
                os.close(dirfd) # or use with entry.opendir() as dirfd: ...

        if not topdown:
            yield toppath, dirs, nondirs, topfd

i.e., if os.scandir() supports specifying file descriptors [1] then it is relatively straightforward to define os.fwalk() in terms of it.
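For comparison, a sketch of the same tree-size computation using the os.fwalk() that already exists (Python 3.3+, Unix only). It looks up sizes relative to each directory's descriptor instead of going through DirEntry objects. get_tree_size_fwalk is a hypothetical name:

```python
import os
import tempfile

def get_tree_size_fwalk(directory):
    """Return the total size in bytes of non-directory entries under directory.

    os.fwalk() yields (dirpath, dirnames, filenames, dirfd); each name in
    filenames is stat'ed relative to dirfd without following symlinks.
    """
    total = 0
    for _root, _dirs, files, rootfd in os.fwalk(directory):
        for name in files:
            total += os.stat(name, dir_fd=rootfd, follow_symlinks=False).st_size
    return total

if hasattr(os, "fwalk"):  # os.fwalk() is not available on Windows
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "sub"))
        with open(os.path.join(tmp, "a.txt"), "wb") as f:
            f.write(b"abc")                      # 3 bytes
        with open(os.path.join(tmp, "sub", "b.txt"), "wb") as f:
            f.write(b"defgh")                    # 5 bytes
        print(get_tree_size_fwalk(tmp))          # 8
```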
Would scandir() provide the same performance benefits as for os.walk()? entry.stat() can be implemented without entry.path when entry._directory (or whatever other DirEntry attribute stores the first parameter to os.scandir(fd)) is an open file descriptor that refers to a directory:

    def stat(self, *, follow_symlinks=True):
        return os.stat(self.name, #NOTE: ignore caching
                       follow_symlinks=follow_symlinks,
                       dir_fd=self._directory)
    lstat = lambda self: self.stat(follow_symlinks=False)

-- Akira From antoine at python.org Wed Jul 23 03:23:16 2014 From: antoine at python.org (Antoine Pitrou) Date: Tue, 22 Jul 2014 21:23:16 -0400 Subject: [Python-Dev] PEP 471 "scandir" accepted In-Reply-To: References: Message-ID: Le 21/07/2014 18:26, Victor Stinner a écrit : > > I'm happy because the final API is very close to os.path functions and > pathlib.Path methods. Python stays consistent, which is a great power > of this language! By the way, http://bugs.python.org/issue19767 could benefit too. Regards Antoine. From alex.gaynor at gmail.com Wed Jul 23 21:36:07 2014 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Wed, 23 Jul 2014 19:36:07 +0000 (UTC) Subject: [Python-Dev] [PEP466] SSLSockets, and sockets, _socketobjects oh my! References: Message-ID: Antoine Pitrou python.org> writes: > No, IIRC there shouldn't be a cycle. It's just complicated in a > different way than 3.x > > Regards > > Antoine. > Indeed, you're right, this is just differently convoluted so no leak (not that I would call "collected by a normal GC" a leak :-)). That said, I've hit another issue, with SNI callbacks. The first argument to an SNI callback is the socket. The callback is set up by some C code, which right now has access to only the _socket.socket object, not the ssl.SSLSocket object, which is what the public API needs there. Possible solutions are: * Pass the SSLObject *in addition* to the _socket.socket object to the C code.
This generates some additional divergence from the Python3 code, but is probably basically straightforward. * Try to refactor the socket code in the same way as Python3 did, so we can pass *only* the SSLObject here. This is some nasty scope creep for PEP466, but would make the overall _ssl.c diff smaller. * Some super sweet and simple thing I haven't thought of yet. Thoughts? By way of a general status update, the only failing tests left are this, and a few things about SSLError's str(), so this will hopefully be ready to upload any day now for review. Cheers, Alex PS: Please review and merge http://bugs.python.org/issue22023 :-) From antoine at python.org Wed Jul 23 23:02:26 2014 From: antoine at python.org (Antoine Pitrou) Date: Wed, 23 Jul 2014 17:02:26 -0400 Subject: [Python-Dev] [PEP466] SSLSockets, and sockets, _socketobjects oh my! In-Reply-To: References: Message-ID: Le 23/07/2014 15:36, Alex Gaynor a écrit : > > That said, I've hit another issue, with SNI callbacks. The first argument to an > SNI callback is the socket. The callback is set up by some C code, which right > now has access to only the _socket.socket object, not the ssl.SSLSocket object, > which is what the public API needs there. > > Possible solutions are: > > * Pass the SSLObject *in addition* to the _socket.socket object to the C code. > This generates some additional divergence from the Python3 code, but is > probably basically straightforward. You mean for use with SSL_set_app_data? From alex.gaynor at gmail.com Wed Jul 23 23:10:39 2014 From: alex.gaynor at gmail.com (Alex Gaynor) Date: Wed, 23 Jul 2014 21:10:39 +0000 (UTC) Subject: [Python-Dev] [PEP466] SSLSockets, and sockets, _socketobjects oh my! References: Message-ID: Antoine Pitrou python.org> writes: > > You mean for use with SSL_set_app_data?
Yes, if you look in ``_servername_callback``, you can see where it uses ``SSL_get_app_data`` and then reads ``ssl->Socket``, which is supposed to be the same object that's returned by ``context.wrap_socket()``. Alex From ncoghlan at gmail.com Thu Jul 24 00:06:26 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 24 Jul 2014 08:06:26 +1000 Subject: [Python-Dev] [PEP466] SSLSockets, and sockets, _socketobjects oh my! In-Reply-To: References: Message-ID: On 24 Jul 2014 05:37, "Alex Gaynor" wrote: > > Possible solutions are: > > * Pass the SSLObject *in addition* to the _socket.socket object to the C code. > This generates some additional divergence from the Python3 code, but is > probably basically straightforward. > * Try to refactor the socket code in the same way as Python3 did, so we can > pass *only* the SSLObject here. This is some nasty scope creep for PEP466, > but would make the overall _ssl.c diff smaller. > * Some super sweet and simple thing I haven't thought of yet. > > Thoughts? Wearing my "risk management" hat, option 1 sounds significantly more appealing than option 2 :) Cheers, Nick. From ethan at stoneleaf.us Thu Jul 24 02:34:13 2014 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 23 Jul 2014 17:34:13 -0700 Subject: [Python-Dev] PEP 471 "scandir" accepted In-Reply-To: References: Message-ID: <53D05485.8050406@stoneleaf.us> On 07/21/2014 03:26 PM, Victor Stinner wrote: > > The PEP is accepted. Thanks, Victor! Congratulations, Ben! -- ~Ethan~ From phil at riverbankcomputing.com Thu Jul 24 18:55:15 2014 From: phil at riverbankcomputing.com (Phil Thompson) Date: Thu, 24 Jul 2014 17:55:15 +0100 Subject: [Python-Dev] Does Zip Importer have to be Special? Message-ID: I have an importer for use in applications that embed an interpreter that does a similar job to the Zip importer (except that the storage is a C data structure rather than a .zip file).
Just like the Zip importer I need to import my importer and add it to sys.path_hooks. However the earliest opportunity I have to do this is after the Py_Initialize() call returns - but this is too late because some parts of the standard library have already needed to be imported. My current workaround is to include a modified version of _bootstrap.py as a frozen module that has the necessary steps added to the end of its _install() function. The Zip importer doesn't have this problem because it gets special treatment - the call to its equivalent code is hard-coded and happens exactly when needed. What would help is a table of functions that were called where _PyImportZip_Init() is currently called. By default the only entry in the table would be _PyImportZip_Init. There would be a way of modifying the table, either like how PyImport_FrozenModules is handled or how Inittab is handled. ...or if there is a better solution that I have missed that doesn't require a modified _bootstrap.py. Thanks, Phil From brett at python.org Thu Jul 24 19:48:59 2014 From: brett at python.org (Brett Cannon) Date: Thu, 24 Jul 2014 17:48:59 +0000 Subject: [Python-Dev] Does Zip Importer have to be Special? References: Message-ID: On Thu Jul 24 2014 at 1:07:12 PM, Phil Thompson wrote: > I have an importer for use in applications that embed an interpreter > that does a similar job to the Zip importer (except that the storage is > a C data structure rather than a .zip file). Just like the Zip importer > I need to import my importer and add it to sys.path_hooks. However the > earliest opportunity I have to do this is after the Py_Initialize() call > returns - but this is too late because some parts of the standard > library have already needed to be imported. > > My current workaround is to include a modified version of _bootstrap.py > as a frozen module that has the necessary steps added to the end of its > _install() function. 
> > The Zip importer doesn't have this problem because it gets special > treatment - the call to its equivalent code is hard-coded and happens > exactly when needed. > > What would help is a table of functions that were called where > _PyImportZip_Init() is currently called. By default the only entry in > the table would be _PyImportZip_Init. There would be a way of modifying > the table, either like how PyImport_FrozenModules is handled or how > Inittab is handled. > > ...or if there is a better solution that I have missed that doesn't > require a modified _bootstrap.py. > Basically you want a way to specify arguments into importlib._bootstrap._install() so that sys.path_hooks and sys.meta_path were configurable instead of hard-coded (it could also be done just past importlib being installed, but that's a minor detail). Either way there is technically no reason not to allow for it, just lack of motivation since this would only come up for people who embed the interpreter AND have a custom importer which affects loading the stdlib as well (any reason you can't freeze the stdlib as a solution?). We could go the route of some static array that people could modify. Another option would be to allow for the specification of a single function which is called just prior to importing the rest of the stdlib. The problem with all of this is you are essentially asking for a hook to let you have code have access to the interpreter state before it is fully initialized. Zipimport and the various bits of code that get loaded during startup are special since they are coded to avoid touching anything that isn't ready to be used. So if we expose something that allows access prior to full initialization it would have to be documented as having no guarantees of interpreter state, etc. so we are not held to some API that makes future improvements difficult. IOW allowing for easy patching of Python is probably the best option I can think of.
Would tweaking importlib._bootstrap._install() to accept specified values for sys.meta_path and sys.path_hooks be enough so that you can change the call site for those functions? From phil at riverbankcomputing.com Thu Jul 24 20:12:13 2014 From: phil at riverbankcomputing.com (Phil Thompson) Date: Thu, 24 Jul 2014 19:12:13 +0100 Subject: [Python-Dev] Does Zip Importer have to be Special? In-Reply-To: References: Message-ID: On 24/07/2014 6:48 pm, Brett Cannon wrote: > On Thu Jul 24 2014 at 1:07:12 PM, Phil Thompson > > wrote: > >> I have an importer for use in applications that embed an interpreter >> that does a similar job to the Zip importer (except that the storage >> is >> a C data structure rather than a .zip file). Just like the Zip >> importer >> I need to import my importer and add it to sys.path_hooks. However the >> earliest opportunity I have to do this is after the Py_Initialize() >> call >> returns - but this is too late because some parts of the standard >> library have already needed to be imported. >> >> My current workaround is to include a modified version of >> _bootstrap.py >> as a frozen module that has the necessary steps added to the end of >> its >> _install() function. >> >> The Zip importer doesn't have this problem because it gets special >> treatment - the call to its equivalent code is hard-coded and happens >> exactly when needed. >> >> What would help is a table of functions that were called where >> _PyImportZip_Init() is currently called. By default the only entry in >> the table would be _PyImportZip_Init. There would be a way of >> modifying >> the table, either like how PyImport_FrozenModules is handled or how >> Inittab is handled. >> >> ...or if there is a better solution that I have missed that doesn't >> require a modified _bootstrap.py.
>> > > Basically you want a way to specify arguments into > importlib._bootstrap._install() so that sys.path_hooks and > sys.meta_path > were configurable instead of hard-coded (it could also be done just > past > importlib being installed, but that's a minor detail). Either way there > is > technically no reason not to allow for it, just lack of motivation > since > this would only come up for people who embed the interpreter AND have a > custom importer which affects loading the stdlib as well (any reason > you > can't freeze the stdlib as a solution?). Not really. I'd lose the compression my importer implements. (Are there any problems with freezing packages rather than simple modules?) > We could go the route of some static array that people could modify. > Another option would be to allow for the specification of a single > function > which is called just prior to importing the rest of the stdlib. > > The problem with all of this is you are essentially asking for a hook > to > let you have code have access to the interpreter state before it is > fully > initialized. Zipimport and the various bits of code that get loaded > during > startup are special since they are coded to avoid touching anything > that > isn't ready to be used. So if we expose something that allows access > prior > to full initialization it would have to be documented as having no > guarantees of interpreter state, etc. so we are not held to some API > that > makes future improvements difficult. > > IOW allowing for easy patching of Python is probably the best option I > can > think of. Would tweaking importlib._bootstrap._install() to accept > specified values for sys.meta_path and sys.path_hooks be enough so that > you > can change the call site for those functions? My importer runs under PathFinder so it needs sys.path as well (and doesn't need sys.meta_path).
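The mechanism Phil describes, an importer registered on sys.path_hooks and driven by PathFinder through an entry on sys.path, can be sketched at the Python level. This is a minimal sketch under stated assumptions, not Phil's implementation. A plain dict stands in for his C data structure, and the names _ARCHIVE, ARCHIVE_PATH, InMemoryLoader, InMemoryFinder and install() are all hypothetical. It uses the find_spec()/exec_module() protocol available since Python 3.4. It does nothing about the bootstrap-timing problem discussed in this thread, because it can only run once the interpreter is fully initialized.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical stand-in for the embedded C data structure:
# module name -> module source text.
_ARCHIVE = {
    "demo_mod": "VALUE = 42\n",
}

# Hypothetical marker entry for sys.path; the hook only claims this entry.
ARCHIVE_PATH = "<in-memory-archive>"


class InMemoryLoader(importlib.abc.Loader):
    """Executes module source that is held in memory."""

    def __init__(self, source):
        self._source = source

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        code = compile(self._source, module.__spec__.origin, "exec")
        exec(code, module.__dict__)


class InMemoryFinder(importlib.abc.PathEntryFinder):
    """Path-entry finder; the class itself is the sys.path_hooks callable."""

    def __init__(self, path_entry):
        # A path hook declines a sys.path entry by raising ImportError.
        if path_entry != ARCHIVE_PATH:
            raise ImportError("not an in-memory archive path entry")
        self.path_entry = path_entry

    def find_spec(self, fullname, target=None):
        source = _ARCHIVE.get(fullname)
        if source is None:
            return None  # PathFinder moves on to the next sys.path entry
        return importlib.util.spec_from_loader(
            fullname, InMemoryLoader(source),
            origin=ARCHIVE_PATH + "/" + fullname)


def install():
    """Register the hook and its sys.path entry (safe to call twice)."""
    if InMemoryFinder not in sys.path_hooks:
        sys.path_hooks.insert(0, InMemoryFinder)
    if ARCHIVE_PATH not in sys.path:
        sys.path.append(ARCHIVE_PATH)
    # Forget any cached finder for the entry so PathFinder asks the hooks again.
    sys.path_importer_cache.pop(ARCHIVE_PATH, None)
```

After install(), a plain "import demo_mod" is resolved by PathFinder through the ARCHIVE_PATH entry, and demo_mod.VALUE is 42. Using the class itself as the hook follows the same pattern as zipimport.zipimporter: calling it with a path it cannot handle raises ImportError, so every other sys.path entry is left to the remaining hooks.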
Phil From brett at python.org Thu Jul 24 20:26:21 2014 From: brett at python.org (Brett Cannon) Date: Thu, 24 Jul 2014 18:26:21 +0000 Subject: [Python-Dev] Does Zip Importer have to be Special? References: Message-ID: On Thu Jul 24 2014 at 2:12:20 PM, Phil Thompson wrote: > On 24/07/2014 6:48 pm, Brett Cannon wrote: > > On Thu Jul 24 2014 at 1:07:12 PM, Phil Thompson > > > > wrote: > > > >> I have an importer for use in applications that embed an interpreter > >> that does a similar job to the Zip importer (except that the storage > >> is > >> a C data structure rather than a .zip file). Just like the Zip > >> importer > >> I need to import my importer and add it to sys.path_hooks. However the > >> earliest opportunity I have to do this is after the Py_Initialize() > >> call > >> returns - but this is too late because some parts of the standard > >> library have already needed to be imported. > >> > >> My current workaround is to include a modified version of > >> _bootstrap.py > >> as a frozen module that has the necessary steps added to the end of > >> its > >> _install() function. > >> > >> The Zip importer doesn't have this problem because it gets special > >> treatment - the call to its equivalent code is hard-coded and happens > >> exactly when needed. > >> > >> What would help is a table of functions that were called where > >> _PyImportZip_Init() is currently called. By default the only entry in > >> the table would be _PyImportZip_Init. There would be a way of > >> modifying > >> the table, either like how PyImport_FrozenModules is handled or how > >> Inittab is handled. > >> > >> ...or if there is a better solution that I have missed that doesn't > >> require a modified _bootstrap.py. 
> >> > > > > Basically you want a way to specify arguments into > > importlib._bootstrap._install() so that sys.path_hooks and > > sys.meta_path > > were configurable instead of hard-coded (it could also be done just > > past > > importlib being installed, but that's a minor detail). Either way there > > is > > technically no reason not to allow for it, just lack of motivation > > since > > this would only come up for people who embed the interpreter AND have a > > custom importer which affects loading the stdlib as well (any reason > > you > > can't freeze the stdlib as a solution?). > > Not really. I'd lose the compression my importer implements. > > (Are there any problems with freezing packages rather than simple > modules?) > Nope, modules and packages are both supported. > > > We could go the route of some static array that people could modify. > > Another option would be to allow for the specification of a single > > function > > which is called just prior to importing the rest of the stdlib. > > > > The problem with all of this is you are essentially asking for a hook > > to > > let you have code have access to the interpreter state before it is > > fully > > initialized. Zipimport and the various bits of code that get loaded > > during > > startup are special since they are coded to avoid touching anything > > that > > isn't ready to be used. So if we expose something that allows access > > prior > > to full initialization it would have to be documented as having no > > guarantees of interpreter state, etc. so we are not held to some API > > that > > makes future improvements difficult. > > > > IOW allowing for easy patching of Python is probably the best option I > > can > > think of. Would tweaking importlib._bootstrap._install() to accept > > specified values for sys.meta_path and sys.path_hooks be enough so that > > you > > can change the call site for those functions?
> > My importer runs under PathFinder so it needs sys.path as well (and > doesn't need sys.meta_path). > sys.path can be set via PYTHONPATH, etc. so that shouldn't be as much of an issue. From ncoghlan at gmail.com Thu Jul 24 22:42:39 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Jul 2014 06:42:39 +1000 Subject: [Python-Dev] Does Zip Importer have to be Special? In-Reply-To: References: Message-ID: On 25 Jul 2014 03:51, "Brett Cannon" wrote: > The problem with all of this is you are essentially asking for a hook to let you have code have access to the interpreter state before it is fully initialized. Zipimport and the various bits of code that get loaded during startup are special since they are coded to avoid touching anything that isn't ready to be used. So if we expose something that allows access prior to full initialization it would have to be documented as having no guarantees of interpreter state, etc. so we are not held to some API that makes future improvements difficult. Note that this is *exactly* the problem PEP 432 is designed to handle: separating the configuration of the core interpreter from the configuration of the operating system interfaces, so the latter can run relatively normally (at least compared to today). As you say, though it's a niche problem compared to something like packaging, which is why it got bumped down my personal priority list. I haven't even got back to the first preparatory step I identified which is to separate out our main functions to a separate "Programs" directory so it's easier to distinguish "embeds Python" sections of the code from the more typical "is part of Python" and "extends Python" code. > IOW allowing for easy patching of Python is probably the best option I can think of.
Yeah, that sounds reasonable - IIRC, Christian ended up going with a similar "make it patch friendly" approach for the hashing changes, rather than going overboard with configuration options. Cheers, Nick. From phil at riverbankcomputing.com Fri Jul 25 11:33:41 2014 From: phil at riverbankcomputing.com (Phil Thompson) Date: Fri, 25 Jul 2014 10:33:41 +0100 Subject: [Python-Dev] Does Zip Importer have to be Special? In-Reply-To: References: Message-ID: On 24/07/2014 9:42 pm, Nick Coghlan wrote: > On 25 Jul 2014 03:51, "Brett Cannon" wrote: > >> The problem with all of this is you are essentially asking for a hook >> to > let you have code have access to the interpreter state before it is > fully > initialized. Zipimport and the various bits of code that get loaded > during > startup are special since they are coded to avoid touching anything > that > isn't ready to be used. So if we expose something that allows access > prior > to full initialization it would have to be documented as having no > guarantees of interpreter state, etc. so we are not held to some API > that > makes future improvements difficult. > > Note that this is *exactly* the problem PEP 432 is designed to handle: > separating the configuration of the core interpreter from the > configuration > of the operating system interfaces, so the latter can run relatively > normally (at least compared to today). The implementation of PEP 432 would be great. > As you say, though it's a niche problem compared to something like > packaging, which is why it got bumped down my personal priority list. I > haven't even got back to the first preparatory step I identified which > is > to separate out our main functions to a separate "Programs" directory > so > it's easier to distinguish "embeds Python" sections of the code from > the > more typical "is part of Python" and "extends Python" code.
Is there any way for somebody you don't trust :) to be able to help move it forward? Phil From phil at riverbankcomputing.com Fri Jul 25 11:36:18 2014 From: phil at riverbankcomputing.com (Phil Thompson) Date: Fri, 25 Jul 2014 10:36:18 +0100 Subject: [Python-Dev] Does Zip Importer have to be Special? In-Reply-To: References: Message-ID: <43d9658bcad1ed2e82f89314bfdd9fcd@www.riverbankcomputing.com> On 24/07/2014 7:26 pm, Brett Cannon wrote: > On Thu Jul 24 2014 at 2:12:20 PM, Phil Thompson > > wrote: > >> On 24/07/2014 6:48 pm, Brett Cannon wrote: >> > IOW allowing for easy patching of Python is probably the best option I >> > can >> > think of. Would tweaking importlib._bootstrap._install() to accept >> > specified values for sys.meta_path and sys.path_hooks be enough so that >> > you >> > can change the call site for those functions? >> >> My importer runs under PathFinder so it needs sys.path as well (and >> doesn't need sys.meta_path). > > sys.path can be set via PYTHONPATH, etc. so that shouldn't be as much > of an > issue. I prefer to have Py_IgnoreEnvironmentFlag set. Also I'm not clear at what point I would import my custom importer? Phil From ncoghlan at gmail.com Fri Jul 25 14:30:54 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Jul 2014 22:30:54 +1000 Subject: [Python-Dev] Does Zip Importer have to be Special? In-Reply-To: References: Message-ID: On 25 July 2014 19:33, Phil Thompson wrote: > On 24/07/2014 9:42 pm, Nick Coghlan wrote: >> As you say, though it's a niche problem compared to something like >> packaging, which is why it got bumped down my personal priority list. I >> haven't even got back to the first preparatory step I identified which is >> to separate out our main functions to a separate "Programs" directory so >> it's easier to distinguish "embeds Python" sections of the code from the >> more typical "is part of Python" and "extends Python" code. > > > Is there any way for somebody you don't trust :) to be able to help move it
> > > Is there any way for somebody you don't trust :) to be able to help move it > forward? This thread prompted me to finally commit one of the smaller pieces of preparatory refactoring, moving the 3 applications we have that embed the CPython runtime out to a separate directory: http://bugs.python.org/issue18093 (that seems like a trivial change, but I found it made a surprisingly big difference when trying to keep the various moving parts of the initialisation sequence straight in my head) The other preparatory refactoring would be to split the monster pythonrun.c file in 2, by creating a separate "lifecycle.c" file. In my original PEP 432 branch I split it into 3 (pythonrun.c, bootstrap.c, shutdown.c) but that's actually quite an intrusive change - you end up have to expose a lot of otherwise static variables to the linker so the startup and shutdown code can both see them. Splitting in two should achieve most of the same benefits (i.e. separating the lifecycle management of the interpreter itself from the normal runtime operation code) without having to expose so much additional information to the linker (and hence change the names to include the _Py prefix). The origin of those refactorings is the fact that attempting to merge the default branch into my PEP 432 development branch (https://bitbucket.org/ncoghlan/cpython_sandbox/branch/pep432_modular_bootstrap) was generally a pain due to the merge conflicts around the structural changes. Doing the structural refactorings *first* makes it more feasible to work on the patch and do regular merges in from default. Since these are areas that aren't likely to change in a maintenance release, the risk of merge conflicts when merging forward from 3.4 to default is low even with code moved around on default. By contrast, I regularly hit significant problems when trying to merge from default to the feature branch. The existing feature branch is dated enough now (more than 18 months since the last commit!) 
that I wouldn't try to use it directly. Instead, I'd recommend starting a new clone based on the GitHub or BitBucket mirror (according to version control system and hosting service preference), and then use the current PEP draft and my old feature branch as a point of reference for starting another implementation attempt. (You may also be able to find some interested collaborators on http://bugs.python.org/issue13533, as I suspect PEP 432 is a prerequisite to resolving their issues as well) Cheers, Nick. P.S. I'm also starting to think that PEP 432 may pave the way for a locale independent startup sequence, which would let us offer a "-X utf8" option to tell the interpreter to ignore the OS locale settings entirely when deciding which encodings to use for various things. That would be a possible future enhancement rather than something to pursue in the initial implementation, though. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From status at bugs.python.org Fri Jul 25 18:07:56 2014 From: status at bugs.python.org (Python tracker) Date: Fri, 25 Jul 2014 18:07:56 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20140725160756.508F2568DE@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2014-07-18 - 2014-07-25) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message. 
Issues counts and deltas: open 4591 ( +2) closed 29248 (+60) total 33839 (+62) Open issues with patches: 2160 Issues opened (42) ================== #19884: Importing readline produces erroneous output http://bugs.python.org/issue19884 reopened by haypo #22010: Idle: better management of Shell window output http://bugs.python.org/issue22010 opened by terry.reedy #22011: test_os extended attribute setxattr tests can fail with ENOSPC http://bugs.python.org/issue22011 opened by Hibou57 #22012: struct.unpack('?', '\x02') returns (False,) on Mac OSX http://bugs.python.org/issue22012 opened by wayedt #22013: Add at least minimal support for thread groups http://bugs.python.org/issue22013 opened by rhettinger #22014: Add summary table for OS exception <-> errno mapping http://bugs.python.org/issue22014 opened by ncoghlan #22016: Add a new 'surrogatereplace' output only error handler http://bugs.python.org/issue22016 opened by ncoghlan #22018: Add a new signal.set_wakeup_socket() function http://bugs.python.org/issue22018 opened by haypo #22021: shutil.make_archive() root_dir do not work http://bugs.python.org/issue22021 opened by DemoHT #22023: PyUnicode_FromFormat is broken on python 2 http://bugs.python.org/issue22023 opened by alex #22024: Add to shutil the ability to wait until files are definitely d http://bugs.python.org/issue22024 opened by zach.ware #22025: webbrowser.get(command_line) does not support Windows-style pa http://bugs.python.org/issue22025 opened by dan.oreilly #22027: RFC 6531 (SMTPUTF8) support in smtplib http://bugs.python.org/issue22027 opened by zvyn #22028: Python 3.4.1 Installer ended prematurely (Windows msi) http://bugs.python.org/issue22028 opened by DieInSente #22029: argparse - CSS white-space: like control for individual text b http://bugs.python.org/issue22029 opened by paul.j3 #22033: Subclass friendly reprs http://bugs.python.org/issue22033 opened by serhiy.storchaka #22034: posixpath.join() and bytearray 
http://bugs.python.org/issue22034 opened by serhiy.storchaka #22035: Fatal error in dbm.gdbm http://bugs.python.org/issue22035 opened by serhiy.storchaka #22038: Implement atomic operations on non-x86 platforms http://bugs.python.org/issue22038 opened by Vitor.de.Lima #22039: PyObject_SetAttr doesn't mention value = NULL http://bugs.python.org/issue22039 opened by pitrou #22041: http POST request with python 3.3 through web proxy http://bugs.python.org/issue22041 opened by AlexMJ #22042: signal.set_wakeup_fd(fd): set the fd to non-blocking mode http://bugs.python.org/issue22042 opened by haypo #22043: Use a monotonic clock to compute timeouts http://bugs.python.org/issue22043 opened by haypo #22044: Premature Py_DECREF while generating a TypeError in call_tzinf http://bugs.python.org/issue22044 opened by Knio #22045: Python make issue http://bugs.python.org/issue22045 opened by skerr #22046: ZipFile.read() should mention that it might throw NotImplement http://bugs.python.org/issue22046 opened by detly #22047: argparse improperly prints mutually exclusive options when the http://bugs.python.org/issue22047 opened by Sam.Kerr #22049: argparse: type= doesn't honor nargs > 1 http://bugs.python.org/issue22049 opened by Chris.Bruner #22051: Turtledemo: stop reloading demos http://bugs.python.org/issue22051 opened by terry.reedy #22052: Comparison operators called in reverse order for subclasses wi http://bugs.python.org/issue22052 opened by mark.dickinson #22054: Add os.get_blocking() and os.set_blocking() functions http://bugs.python.org/issue22054 opened by haypo #22057: The doc say all globals are copied on eval(), but only __built http://bugs.python.org/issue22057 opened by amishne #22058: datetime.datetime() should accept a datetime.date as init para http://bugs.python.org/issue22058 opened by facundobatista #22059: incorrect type conversion from str to bytes in asynchat module http://bugs.python.org/issue22059 opened by hoxily #22060: Clean up ctypes.test, use 
unittest test discovery http://bugs.python.org/issue22060 opened by zach.ware #22062: Fix pathlib.Path.(r)glob doc glitches. http://bugs.python.org/issue22062 opened by terry.reedy #22063: asyncio: sock_xxx() methods of event loops should make the soc http://bugs.python.org/issue22063 opened by haypo #22064: Misleading message from 2to3 when skipping optional fixers http://bugs.python.org/issue22064 opened by ncoghlan #22065: Update turtledemo menu creation http://bugs.python.org/issue22065 opened by terry.reedy #22066: subprocess.communicate() does not receive full output from the http://bugs.python.org/issue22066 opened by juj #22067: time_test fails after strptime() http://bugs.python.org/issue22067 opened by serhiy.storchaka #22068: test_gc fails after test_idle http://bugs.python.org/issue22068 opened by serhiy.storchaka Most recent 15 issues with no replies (15) ========================================== #22067: time_test fails after strptime() http://bugs.python.org/issue22067 #22066: subprocess.communicate() does not receive full output from the http://bugs.python.org/issue22066 #22064: Misleading message from 2to3 when skipping optional fixers http://bugs.python.org/issue22064 #22060: Clean up ctypes.test, use unittest test discovery http://bugs.python.org/issue22060 #22057: The doc say all globals are copied on eval(), but only __built http://bugs.python.org/issue22057 #22051: Turtledemo: stop reloading demos http://bugs.python.org/issue22051 #22046: ZipFile.read() should mention that it might throw NotImplement http://bugs.python.org/issue22046 #22045: Python make issue http://bugs.python.org/issue22045 #22039: PyObject_SetAttr doesn't mention value = NULL http://bugs.python.org/issue22039 #22035: Fatal error in dbm.gdbm http://bugs.python.org/issue22035 #22034: posixpath.join() and bytearray http://bugs.python.org/issue22034 #22033: Subclass friendly reprs http://bugs.python.org/issue22033 #22027: RFC 6531 (SMTPUTF8) support in smtplib 
http://bugs.python.org/issue22027 #22024: Add to shutil the ability to wait until files are definitely d http://bugs.python.org/issue22024 #22016: Add a new 'surrogatereplace' output only error handler http://bugs.python.org/issue22016 Most recent 15 issues waiting for review (15) ============================================= #22068: test_gc fails after test_idle http://bugs.python.org/issue22068 #22065: Update turtledemo menu creation http://bugs.python.org/issue22065 #22060: Clean up ctypes.test, use unittest test discovery http://bugs.python.org/issue22060 #22054: Add os.get_blocking() and os.set_blocking() functions http://bugs.python.org/issue22054 #22051: Turtledemo: stop reloading demos http://bugs.python.org/issue22051 #22044: Premature Py_DECREF while generating a TypeError in call_tzinf http://bugs.python.org/issue22044 #22043: Use a monotonic clock to compute timeouts http://bugs.python.org/issue22043 #22042: signal.set_wakeup_fd(fd): set the fd to non-blocking mode http://bugs.python.org/issue22042 #22041: http POST request with python 3.3 through web proxy http://bugs.python.org/issue22041 #22038: Implement atomic operations on non-x86 platforms http://bugs.python.org/issue22038 #22035: Fatal error in dbm.gdbm http://bugs.python.org/issue22035 #22034: posixpath.join() and bytearray http://bugs.python.org/issue22034 #22033: Subclass friendly reprs http://bugs.python.org/issue22033 #22029: argparse - CSS white-space: like control for individual text b http://bugs.python.org/issue22029 #22027: RFC 6531 (SMTPUTF8) support in smtplib http://bugs.python.org/issue22027 Top 10 most discussed issues (10) ================================= #22018: Add a new signal.set_wakeup_socket() function http://bugs.python.org/issue22018 35 msgs #22003: BytesIO copy-on-write http://bugs.python.org/issue22003 18 msgs #21933: Allow the user to change font sizes with the text pane of turt http://bugs.python.org/issue21933 16 msgs #22012: struct.unpack('?', '\x02') returns 
(False,) on Mac OSX http://bugs.python.org/issue22012 10 msgs #1602: windows console doesn't print or input Unicode http://bugs.python.org/issue1602 9 msgs #22041: http POST request with python 3.3 through web proxy http://bugs.python.org/issue22041 8 msgs #22058: datetime.datetime() should accept a datetime.date as init para http://bugs.python.org/issue22058 8 msgs #18643: add a fallback socketpair() implementation in test.support http://bugs.python.org/issue18643 7 msgs #19884: Importing readline produces erroneous output http://bugs.python.org/issue19884 7 msgs #22013: Add at least minimal support for thread groups http://bugs.python.org/issue22013 7 msgs Issues closed (60) ================== #1049450: Solaris: EINTR exception in select/socket calls in telnetlib http://bugs.python.org/issue1049450 closed by haypo #4350: Remove dead code from Tkinter.py http://bugs.python.org/issue4350 closed by serhiy.storchaka #5718: Problem compiling ffi part of build on AIX 5.3. http://bugs.python.org/issue5718 closed by skrah #6167: Tkinter.Scrollbar: the activate method needs to return a value http://bugs.python.org/issue6167 closed by serhiy.storchaka #11266: asyncore does not handle EINTR in recv, send, connect, accept, http://bugs.python.org/issue11266 closed by haypo #11945: Adopt and document consistent semantics for handling NaN value http://bugs.python.org/issue11945 closed by rhettinger #12184: socketserver.ForkingMixin collect_children routine needs to co http://bugs.python.org/issue12184 closed by neologix #12801: C realpath not used by os.path.realpath http://bugs.python.org/issue12801 closed by haypo #15275: isinstance is called a more times that needed in ntpath http://bugs.python.org/issue15275 closed by serhiy.storchaka #15759: "make suspicious" doesn't display instructions in case of fail http://bugs.python.org/issue15759 closed by serhiy.storchaka #15982: asyncore.dispatcher does not handle windows socket error code http://bugs.python.org/issue15982 closed 
by haypo #16133: asyncore.dispatcher.recv doesn't handle EAGAIN / EWOULDBLOCK http://bugs.python.org/issue16133 closed by haypo #16494: Add a method on importlib.SourceLoader for creating bytecode f http://bugs.python.org/issue16494 closed by brett.cannon #16547: IDLE raises an exception in tkinter after fresh file's text ha http://bugs.python.org/issue16547 closed by serhiy.storchaka #17210: documentation of PyUnicode_Format() states wrong argument type http://bugs.python.org/issue17210 closed by python-dev #17391: _cursesmodule Fails to Build on GCC 2.95 (static) http://bugs.python.org/issue17391 closed by neologix #17709: http://docs.python.org/2.7/objects.inv doesn't support :func:` http://bugs.python.org/issue17709 closed by asvetlov #18093: Move main functions to a separate Programs directory http://bugs.python.org/issue18093 closed by ncoghlan #18132: buttons in turtledemo disappear on small screens http://bugs.python.org/issue18132 closed by terry.reedy #18168: plistlib output self-sorted dictionary http://bugs.python.org/issue18168 closed by serhiy.storchaka #18392: Doc: PyObject_Malloc() is not documented http://bugs.python.org/issue18392 closed by zach.ware #18436: Add mapping of symbol to function to operator module http://bugs.python.org/issue18436 closed by zach.ware #19629: support.rmtree fails on symlinks under Windows http://bugs.python.org/issue19629 closed by berker.peksag #21035: Python's HTTP server implementations hangs after 16.343 reques http://bugs.python.org/issue21035 closed by neologix #21500: Make use of the "load_tests" protocol in test_importlib packag http://bugs.python.org/issue21500 closed by zach.ware #21566: make use of the new default socket.listen() backlog argument http://bugs.python.org/issue21566 closed by neologix #21597: Allow turtledemo code pane to get wider. 
http://bugs.python.org/issue21597 closed by terry.reedy #21645: asyncio: Race condition in signal handling on FreeBSD http://bugs.python.org/issue21645 closed by haypo #21665: 2.7.7 ttk widgets not themed http://bugs.python.org/issue21665 closed by python-dev #21772: platform.uname() not EINTR safe http://bugs.python.org/issue21772 closed by neologix #21813: Enhance doc of os.stat_result http://bugs.python.org/issue21813 closed by haypo #21868: Tbuffer in turtle allows negative size http://bugs.python.org/issue21868 closed by rhettinger #21882: turtledemo modules imported by test___all__ cause side effects http://bugs.python.org/issue21882 closed by terry.reedy #21888: plistlib.FMT_BINARY behavior doesn't send required dict parame http://bugs.python.org/issue21888 closed by serhiy.storchaka #21901: test_selectors.PollSelectorTestCase.test_above_fd_setsize repo http://bugs.python.org/issue21901 closed by neologix #21947: `Dis` module doesn't know how to disassemble generators http://bugs.python.org/issue21947 closed by ncoghlan #21976: Fix test_ssl.py to handle LibreSSL versioning appropriately http://bugs.python.org/issue21976 closed by pitrou #21989: Missing (optional) argument `start` and `end` in documentation http://bugs.python.org/issue21989 closed by r.david.murray #22002: Make full use of test discovery in test subpackages http://bugs.python.org/issue22002 closed by python-dev #22006: thread module documentation erroneously(?) 
states not all buil http://bugs.python.org/issue22006 closed by mark.dickinson #22007: sys.stdout.write on Python 2.7 is not EINTR safe http://bugs.python.org/issue22007 closed by neologix #22008: Symtable's syntax warning should contain the word "because" http://bugs.python.org/issue22008 closed by python-dev #22009: pdb.set_trace() crashes with UnicodeDecodeError when binary da http://bugs.python.org/issue22009 closed by ned.deily #22015: C signal handler doesn't save/restore errno http://bugs.python.org/issue22015 closed by haypo #22017: Bad reference counting in the _warnings module http://bugs.python.org/issue22017 closed by python-dev #22019: ntpath.join() error with Chinese character Path http://bugs.python.org/issue22019 closed by ezio.melotti #22020: tutorial 9.10. Generators statement error http://bugs.python.org/issue22020 closed by ezio.melotti #22022: test_pathlib: shutil.rmtree() sporadic failures on Windows http://bugs.python.org/issue22022 closed by zach.ware #22026: 2.7.8 ttk Button text display problem http://bugs.python.org/issue22026 closed by zach.ware #22030: Use calloc in set resizing http://bugs.python.org/issue22030 closed by rhettinger #22031: Hexadecimal id in reprs http://bugs.python.org/issue22031 closed by serhiy.storchaka #22032: Use __qualname__ together with __module__ http://bugs.python.org/issue22032 closed by serhiy.storchaka #22036: Obsolete reference to stringobject in comment http://bugs.python.org/issue22036 closed by python-dev #22037: Poor grammar in asyncio TCP echo client example http://bugs.python.org/issue22037 closed by asvetlov #22040: Add a "force" parameter to shutil.rmtree http://bugs.python.org/issue22040 closed by r.david.murray #22048: Add weighted random choice to random package http://bugs.python.org/issue22048 closed by mark.dickinson #22050: argparse: read nargs > 1 options from file doesn't work http://bugs.python.org/issue22050 closed by r.david.murray #22053: turtledemo: clean up start and stop, fix 
warning http://bugs.python.org/issue22053 closed by terry.reedy #22055: Incomplete sentence in asyncio BaseEventLoop doc http://bugs.python.org/issue22055 closed by asvetlov #22061: Restore deleted tkinter functions with deprecaton dummies. http://bugs.python.org/issue22061 closed by serhiy.storchaka From khannaagrim at gmail.com Tue Jul 29 17:11:22 2014 From: khannaagrim at gmail.com (agrim khanna) Date: Tue, 29 Jul 2014 20:41:22 +0530 Subject: [Python-Dev] Contribute to Python.org Message-ID: Respected Sir, I am Agrim Khanna, undergraduate student in IIIT Allahabad, India. I wanted to contribute to python.org but didn't know how to start. I have elementary knowledge of python language. Could you please help me on the same. Yours Sincerely, Agrim Khanna IIIT-Allahabad From victor.stinner at gmail.com Tue Jul 29 17:40:01 2014 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 29 Jul 2014 17:40:01 +0200 Subject: [Python-Dev] Contribute to Python.org In-Reply-To: References: Message-ID: Hi, You should read the Python Developer Guide: https://docs.python.org/devguide/ You can also join the core mentorship mailing list: http://pythonmentors.com/ Welcome! Victor 2014-07-29 17:11 GMT+02:00 agrim khanna : > Respected Sir, > > I am Agrim Khanna, undergraduate student in IIIT Allahabad, India. I wanted > to contribute to python.org but didn't know how to start. I have elementary > knowledge of python language. > > Could you please help me on the same.
> > Yours Sincerely, > Agrim Khanna > IIIT-Allahabad > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com > From khannaagrim at gmail.com Tue Jul 29 22:44:53 2014 From: khannaagrim at gmail.com (agrim khanna) Date: Wed, 30 Jul 2014 02:14:53 +0530 Subject: [Python-Dev] Contribute to Python.org Message-ID: Respected Sir/Madam, I have installed the setup on my machine and have compiled and run it as well. I was unable to figure out how to make a patch and how to find a suitable bug for me to fix. I request you to guide me in the same. Yours Sincerely, Agrim Khanna IIIT-Allahabad, India From brett at python.org Tue Jul 29 22:55:54 2014 From: brett at python.org (Brett Cannon) Date: Tue, 29 Jul 2014 20:55:54 +0000 Subject: [Python-Dev] Contribute to Python.org References: Message-ID: On Tue Jul 29 2014 at 4:52:14 PM agrim khanna wrote: > Respected Sir/Madam, > > I have installed the setup on my machine and have compiled and run it as > well. I was unable to figure out how to make a patch and how to find a > suitable bug for me to fix. I request you to guide me in the same. > How to make a patch is in the devguide which was sent to you in your last email: https://docs.python.org/devguide/patch.html Finding issues is also covered in the devguide, and you can ask for help on the core-mentorship mailing list (also in the last email sent to you: http://pythonmentors.com/).
> > Yours Sincerely, > Agrim Khanna > IIIT-Allahabad, India From storchaka at gmail.com Wed Jul 30 05:59:15 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 30 Jul 2014 06:59:15 +0300 Subject: [Python-Dev] cpython: Issue #22003: When initialized from a bytes object, io.BytesIO() now In-Reply-To: <3hNDzH5WHWz7Ljk@mail.python.org> References: <3hNDzH5WHWz7Ljk@mail.python.org> Message-ID: 30.07.14 02:45, antoine.pitrou wrote: > http://hg.python.org/cpython/rev/79a5fbe2c78f > changeset: 91935:79a5fbe2c78f > parent: 91933:fbd104359ef8 > user: Antoine Pitrou > date: Tue Jul 29 19:41:11 2014 -0400 > summary: > Issue #22003: When initialized from a bytes object, io.BytesIO() now > defers making a copy until it is mutated, improving performance and > memory use on some use cases. > > Patch by David Wilson. Did you compare this with issue #15381 [1]?
[1] http://bugs.python.org/issue15381 From storchaka at gmail.com Wed Jul 30 08:11:24 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 30 Jul 2014 09:11:24 +0300 Subject: [Python-Dev] cpython: Issue #22003: When initialized from a bytes object, io.BytesIO() now In-Reply-To: References: <3hNDzH5WHWz7Ljk@mail.python.org> Message-ID: 30.07.14 06:59, Serhiy Storchaka wrote: > 30.07.14 02:45, antoine.pitrou wrote: >> http://hg.python.org/cpython/rev/79a5fbe2c78f >> changeset: 91935:79a5fbe2c78f >> parent: 91933:fbd104359ef8 >> user: Antoine Pitrou >> date: Tue Jul 29 19:41:11 2014 -0400 >> summary: >> Issue #22003: When initialized from a bytes object, io.BytesIO() now >> defers making a copy until it is mutated, improving performance and >> memory use on some use cases. >> >> Patch by David Wilson. > > Did you compare this with issue #15381 [1]? > > [1] http://bugs.python.org/issue15381 Using the microbenchmark from issue22003:

$ cat i.py
import io
word = b'word'
line = (word * int(79/len(word))) + b'\n'
ar = line * int((4 * 1048576) / len(line))
def readlines():
    return len(list(io.BytesIO(ar)))
print('lines: %s' % (readlines(),))
$ ./python -m timeit -s 'import i' 'i.readlines()'

Before patch: 10 loops, best of 3: 46.9 msec per loop
After issue22003 patch: 10 loops, best of 3: 36.4 msec per loop
After issue15381 patch: 10 loops, best of 3: 27.6 msec per loop

From ncoghlan at gmail.com Wed Jul 30 13:46:15 2014 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 30 Jul 2014 21:46:15 +1000 Subject: [Python-Dev] Contribute to Python.org In-Reply-To: References: Message-ID: On 30 July 2014 01:40, Victor Stinner wrote: > Hi, > > You should read the Python Developer Guide: > > https://docs.python.org/devguide/ > > You can also join the core mentorship mailing list: > > http://pythonmentors.com/ For python.org *itself* (as in, the Django application now powering the site), the contribution process is not yet as clear, but the code and issue tracker
are at https://github.com/python/pythondotorg and https://mail.python.org/mailman/listinfo/pydotorg-www is the relevant mailing list. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From antoine at python.org Wed Jul 30 15:59:48 2014 From: antoine at python.org (Antoine Pitrou) Date: Wed, 30 Jul 2014 09:59:48 -0400 Subject: [Python-Dev] cpython: Issue #22003: When initialized from a bytes object, io.BytesIO() now In-Reply-To: References: <3hNDzH5WHWz7Ljk@mail.python.org> Message-ID: On 30/07/2014 02:11, Serhiy Storchaka wrote: > 30.07.14 06:59, Serhiy Storchaka wrote: >> 30.07.14 02:45, antoine.pitrou wrote: >>> http://hg.python.org/cpython/rev/79a5fbe2c78f >>> changeset: 91935:79a5fbe2c78f >>> parent: 91933:fbd104359ef8 >>> user: Antoine Pitrou >>> date: Tue Jul 29 19:41:11 2014 -0400 >>> summary: >>> Issue #22003: When initialized from a bytes object, io.BytesIO() now >>> defers making a copy until it is mutated, improving performance and >>> memory use on some use cases. >>> >>> Patch by David Wilson. >> >> Did you compare this with issue #15381 [1]? Not really, but David's patch is simple enough and does a good job of accelerating the read-only BytesIO case. > $ ./python -m timeit -s 'import i' 'i.readlines()' > > Before patch: 10 loops, best of 3: 46.9 msec per loop > After issue22003 patch: 10 loops, best of 3: 36.4 msec per loop > After issue15381 patch: 10 loops, best of 3: 27.6 msec per loop I'm surprised your patch does better here. Any idea why? Regards Antoine.
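For anyone who wants to rerun the comparison, the i.py microbenchmark can be written as one self-contained script that times itself with the standard timeit module instead of `python -m timeit`. This is a sketch: the payload sizes follow i.py, while the helper name best_msec_per_loop and the 10-loop, best-of-3 setup are assumptions chosen to mirror the reported output. Absolute timings will depend on the machine and the build.

```python
import io
import timeit

# Payload construction follows i.py: 4 MiB of 77-byte lines, each made of
# b'word' repeated 19 times plus a newline (integer division, as int() did).
WORD = b"word"
LINE = WORD * (79 // len(WORD)) + b"\n"
PAYLOAD = LINE * ((4 * 1048576) // len(LINE))


def readlines(payload: bytes = PAYLOAD) -> int:
    """Count lines by iterating over a BytesIO wrapped around the payload."""
    return len(list(io.BytesIO(payload)))


def best_msec_per_loop(number: int = 10, repeat: int = 3) -> float:
    """Return the best per-call time in milliseconds across `repeat` runs."""
    timings = timeit.repeat(readlines, number=number, repeat=repeat)
    return min(timings) / number * 1000.0


if __name__ == "__main__":
    print("lines: %s" % (readlines(),))
    print("10 loops, best of 3: %.1f msec per loop" % best_msec_per_loop())
```

Running the script under two interpreters (for example, before and after applying a patch) gives a side-by-side comparison in the same format as the numbers quoted in the thread.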
From dw+python-dev at python.org Wed Jul 30 11:46:30 2014 From: dw+python-dev at python.org (dw+python-dev at python.org) Date: Wed, 30 Jul 2014 09:46:30 +0000 Subject: [Python-Dev] cpython: Issue #22003: When initialized from a bytes object, io.BytesIO() now In-Reply-To: References: <3hNDzH5WHWz7Ljk@mail.python.org> Message-ID: <20140730094630.GA786@k2> Hi Serhiy, At least conceptually, 15381 seems the better approach, but getting a correct implementation may take more iterations than the (IMHO) simpler change in 22003. For my tastes, the current 15381 implementation seems a little too magical in relying on Py_REFCNT() as the sole indication that a PyBytes can be mutated. For the sake of haste, 22003 only addresses the specific regression introduced in Python 3.x BytesIO, compared to 2.x StringI, where 3.x lacked an equivalent no-copies specialization. David From martin at v.loewis.de Wed Jul 30 20:03:35 2014 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 30 Jul 2014 20:03:35 +0200 Subject: [Python-Dev] Bluetooth 4.0 support in "socket" module In-Reply-To: References: Message-ID: <53D93377.90301@v.loewis.de> On 14.07.14 15:57, Tim Tisdall wrote: > Also, is there a method to test changes against all the different *nix > variations? Is Bluez the standard across the different *nix variations? Perhaps not the answer you expected, but: Python uses autoconf for feature testing. You can be certain that the API *will* vary across system vendors. For example, FreeBSD apparently uses ng_hci(4): http://www.unix.com/man-page/freebsd/4/ng_hci/ If you add features, all you need to do is make sure that Python continues to compile when the platform feature is not present. People using the other systems are then free to contribute support for their platforms.
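Build-time feature testing has a visible consequence at the Python level: when a platform feature is not detected, the corresponding constants are simply absent from the socket module, so portable code should probe for them rather than assume them. A minimal sketch, assuming only the standard socket module; the helper names and the particular constants checked are illustrative choices, not something proposed in the thread.

```python
import socket

# Bluetooth constants the socket module may define; which ones exist depends
# on what the interpreter's build detected (e.g. BlueZ headers on Linux).
BLUETOOTH_NAMES = ("AF_BLUETOOTH", "BTPROTO_RFCOMM", "BTPROTO_L2CAP", "BTPROTO_HCI")


def bluetooth_support() -> dict:
    """Map each Bluetooth constant name to whether this build provides it."""
    return {name: hasattr(socket, name) for name in BLUETOOTH_NAMES}


def open_rfcomm_socket():
    """Return an RFCOMM socket, or None if it is unavailable here."""
    if not (hasattr(socket, "AF_BLUETOOTH") and hasattr(socket, "BTPROTO_RFCOMM")):
        return None  # compiled without Bluetooth support
    try:
        return socket.socket(socket.AF_BLUETOOTH, socket.SOCK_STREAM,
                             socket.BTPROTO_RFCOMM)
    except OSError:
        return None  # built with support, but the running kernel refused


if __name__ == "__main__":
    print(bluetooth_support())
```

Checking with hasattr() keeps the same script importable on builds that lack the feature, which matches the requirement that Python keep compiling when the platform feature is not present.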
Regards, Martin From storchaka at gmail.com Wed Jul 30 21:48:52 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 30 Jul 2014 22:48:52 +0300 Subject: [Python-Dev] cpython: Issue #22003: When initialized from a bytes object, io.BytesIO() now In-Reply-To: References: <3hNDzH5WHWz7Ljk@mail.python.org> Message-ID: 30.07.14 16:59, Antoine Pitrou wrote: > > On 30/07/2014 02:11, Serhiy Storchaka wrote: >> 30.07.14 06:59, Serhiy Storchaka wrote: >>> 30.07.14 02:45, antoine.pitrou wrote: >>>> http://hg.python.org/cpython/rev/79a5fbe2c78f >>>> changeset: 91935:79a5fbe2c78f >>>> parent: 91933:fbd104359ef8 >>>> user: Antoine Pitrou >>>> date: Tue Jul 29 19:41:11 2014 -0400 >>>> summary: >>>> Issue #22003: When initialized from a bytes object, io.BytesIO() now >>>> defers making a copy until it is mutated, improving performance and >>>> memory use on some use cases. >>>> >>>> Patch by David Wilson. >>> >>> Did you compare this with issue #15381 [1]? > > Not really, but David's patch is simple enough and does a good job of > accelerating the read-only BytesIO case. Ignoring tests and comments, my patch adds/removes/modifies about 200 lines, and David's patch about 150 lines of code. But its __sizeof__ does not look correct, and correcting it requires changing about 50 lines. In sum, the complexity of both patches is about equal. >> $ ./python -m timeit -s 'import i' 'i.readlines()' >> >> Before patch: 10 loops, best of 3: 46.9 msec per loop >> After issue22003 patch: 10 loops, best of 3: 36.4 msec per loop >> After issue15381 patch: 10 loops, best of 3: 27.6 msec per loop > > I'm surprised your patch does better here. Any idea why? I haven't looked at David's patch closely yet. But my patch includes an optimization for end-of-line scanning.
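Serhiy does not spell out what his end-of-line optimization does. As a hypothetical, Python-level illustration of the general idea only (not the C code from either patch), the sketch below locates each line boundary with a single bytes.find() call, which does its search in C, rather than examining bytes one at a time. The function name iter_lines is an assumption for illustration.

```python
import io


def iter_lines(data: bytes):
    """Yield the lines of data, each keeping its trailing newline.

    Each boundary is located with one bytes.find() call, so the Python-level
    work per line is a single search plus a single slice.
    """
    pos = 0
    end = len(data)
    while pos < end:
        newline = data.find(b"\n", pos)
        # No further newline: the rest of the buffer is the final line.
        stop = end if newline == -1 else newline + 1
        yield data[pos:stop]
        pos = stop


if __name__ == "__main__":
    sample = b"first\nsecond\nlast"
    # Should agree with iterating over io.BytesIO.
    print(list(iter_lines(sample)) == list(io.BytesIO(sample)))
```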
From zachary.ware+pydev at gmail.com Wed Jul 30 22:11:51 2014 From: zachary.ware+pydev at gmail.com (Zachary Ware) Date: Wed, 30 Jul 2014 15:11:51 -0500 Subject: [Python-Dev] [Python-checkins] cpython: Issue #22003: When initialized from a bytes object, io.BytesIO() now In-Reply-To: <3hNDzH5WHWz7Ljk@mail.python.org> References: <3hNDzH5WHWz7Ljk@mail.python.org> Message-ID: I'd like to point out a couple of compiler warnings on Windows: On Tue, Jul 29, 2014 at 6:45 PM, antoine.pitrou wrote:

> diff --git a/Modules/_io/bytesio.c b/Modules/_io/bytesio.c
> --- a/Modules/_io/bytesio.c
> +++ b/Modules/_io/bytesio.c
> @@ -33,6 +37,45 @@
>          return NULL;                                        \
>      }
>
> +/* Ensure we have a buffer suitable for writing, in the case that an initvalue
> + * object was provided, and we're currently borrowing its buffer. `size'
> + * indicates the new buffer size allocated as part of unsharing, to avoid a
> + * redundant reallocation caused by any subsequent mutation. `truncate'
> + * indicates whether truncation should occur if `size` < self->string_size.
> + *
> + * Do nothing if the buffer wasn't shared. Returns 0 on success, or sets an
> + * exception and returns -1 on failure. Existing state is preserved on failure.
> + */
> +static int
> +unshare(bytesio *self, size_t preferred_size, int truncate)
> +{
> +    if (self->initvalue) {
> +        Py_ssize_t copy_size;
> +        char *new_buf;
> +
> +        if((! truncate) && preferred_size < self->string_size) {

..\Modules\_io\bytesio.c(56): warning C4018: '<' : signed/unsigned mismatch

> +            preferred_size = self->string_size;
> +        }
> +
> +        new_buf = (char *)PyMem_Malloc(preferred_size);
> +        if (new_buf == NULL) {
> +            PyErr_NoMemory();
> +            return -1;
> +        }
> +
> +        copy_size = self->string_size;
> +        if (copy_size > preferred_size) {

..\Modules\_io\bytesio.c(67): warning C4018: '>' : signed/unsigned mismatch

> +            copy_size = preferred_size;
> +        }
> +
> +        memcpy(new_buf, self->buf, copy_size);
> +        Py_CLEAR(self->initvalue);
> +        self->buf = new_buf;
> +        self->buf_size = preferred_size;
> +        self->string_size = (Py_ssize_t) copy_size;
> +    }
> +    return 0;
> +}
>
>  /* Internal routine to get a line from the buffer of a BytesIO
>     object. Returns the length between the current position to the

-- 
Zach

From antoine at python.org Wed Jul 30 23:23:25 2014 From: antoine at python.org (Antoine Pitrou) Date: Wed, 30 Jul 2014 17:23:25 -0400 Subject: [Python-Dev] cpython: Issue #22003: When initialized from a bytes object, io.BytesIO() now In-Reply-To: References: <3hNDzH5WHWz7Ljk@mail.python.org> Message-ID: On 30/07/2014 15:48, Serhiy Storchaka wrote: > > Ignoring tests and comments, my patch adds/removes/modifies about 200 > lines, and David's patch about 150 lines of code. But its __sizeof__ > does not look correct, and correcting it requires changing about 50 lines. In > sum, the complexity of both patches is about equal. I meant that David's approach is conceptually simpler, which makes it easier to review. Regardless, there is no exclusive-OR here: if you can improve over the current version, there's no reason not to consider it. > I haven't looked at David's patch closely yet. But my patch includes > an optimization for end-of-line scanning.
Ahah, unrelated stuff :-) From storchaka at gmail.com Thu Jul 31 16:09:41 2014 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 31 Jul 2014 17:09:41 +0300 Subject: [Python-Dev] cpython: Issue #22003: When initialized from a bytes object, io.BytesIO() now In-Reply-To: References: <3hNDzH5WHWz7Ljk@mail.python.org> Message-ID: 31.07.14 00:23, Antoine Pitrou wrote: > On 30/07/2014 15:48, Serhiy Storchaka wrote: > I meant that David's approach is conceptually simpler, which makes it > easier to review. > Regardless, there is no exclusive-OR here: if you can improve over the > current version, there's no reason not to consider it. Unfortunately, the two implementations have nothing in common. Conceptually, David's last patch arrived at the same idea as issue15381, but with a different and less general implementation. To apply my patch, you first need to roll back the issue22003 changes (except the tests).
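The copy-on-write idea behind issue #22003 (borrow the caller's immutable bytes object and copy only on the first mutation) can be modelled in a few lines of pure Python. This is a toy sketch, not CPython's implementation: the class CopyOnWriteBuffer and its methods are hypothetical names, and truncate() only loosely mirrors the role of the `truncate` flag in the C unshare() helper by copying just the bytes that will be kept.

```python
class CopyOnWriteBuffer:
    """Toy model of a BytesIO-like buffer that shares its initial bytes.

    Hypothetical illustration only; not the io.BytesIO API or internals.
    """

    def __init__(self, initial: bytes) -> None:
        self._shared = initial  # borrowed immutable object; no copy yet
        self._own = None        # private bytearray, created on first mutation

    @property
    def is_shared(self) -> bool:
        return self._own is None

    def _unshare(self, keep=None) -> None:
        # Copy once, taking at most `keep` bytes (all of them if keep is None).
        if self._own is None:
            view = memoryview(self._shared)
            self._own = bytearray(view if keep is None else view[:keep])
            self._shared = None

    def getvalue(self) -> bytes:
        # While shared, hand back the original object without copying.
        return self._shared if self._own is None else bytes(self._own)

    def write_at(self, pos: int, data: bytes) -> None:
        if pos < 0:
            raise ValueError("negative position")
        self._unshare()
        end = pos + len(data)
        if end > len(self._own):
            # Like BytesIO, writing past the end pads the gap with zero bytes.
            self._own.extend(b"\0" * (end - len(self._own)))
        self._own[pos:end] = data

    def truncate(self, size: int) -> None:
        if size < 0:
            raise ValueError("negative size")
        # Copy only the prefix that survives, instead of the whole buffer.
        self._unshare(keep=size)
        del self._own[size:]
```

Read-only use (construct, then getvalue()) never copies, which is the case both patches set out to speed up; the first write_at() or truncate() pays for exactly one copy.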