[issue7674] select.select() corner cases: duplicate fds, out-of-range fds

Mon Jan 11 07:52:31 CET 2010

New submission from Chris Leary <cdl28 at cornell.edu>:

I was just reading through this ACM article that enumerates some of the issues with the select function in .NET: http://cacm.acm.org/magazines/2009/5/24646-api-design-matters/fulltext

select.select() currently suffers from the same documentation problem: the behavior with duplicate and/or out-of-range file descriptors in one of the sequences (e.g. rlist) is not described.

Given the current implementation of seq2set in trunk, it appears that:

1. A ValueError is raised when a given file descriptor is out of range. (This is typically the result of the programmer passing a non-fd value, since FD_SETSIZE is "normally at least equal to the maximum number of descriptors supported by the system.")

2. Duplicate file descriptor numbers collapse into a single entry in the fd_set, so duplicates are idempotent at the system API level.
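Both points can be observed from Python. This is a quick sketch, assuming a Unix build where FD_SETSIZE is 1024 (so 1 << 20 is out of range). select_rejects is a hypothetical helper name used only for illustration:

```python
import os
import select


def select_rejects(fd):
    """Hypothetical helper: True if select.select() raises ValueError for fd in rlist."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        return True
    return False


# Point 1: an fd far beyond FD_SETSIZE (assumed 1024 here) is rejected
# by seq2set before the select() system call is ever made.
out_of_range = select_rejects(1 << 20)
# A negative number is rejected too, by the int-to-fd conversion step.
negative = select_rejects(-1)

# Point 2: listing the same fd twice is accepted; FD_SET on an fd that is
# already in the set is a no-op. Nothing has been written, so nothing is ready.
r, w = os.pipe()
try:
    dup_ok = select.select([r, r], [], [], 0) == ([], [], [])
finally:
    os.close(r)
    os.close(w)
```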

However, the language-level support code generally assumes no duplication: there is a fixed-size array of (FD_SETSIZE + 1) pylist entries (one extra for a sentinel value). Although there is a TODO to size that array dynamically to the largest targeted file descriptor number, that would still assume one PyObject per file descriptor in the input sequences.

The set2list function, which produces the return value, will however return duplicates: for each item in the input list, if the corresponding fd is set, that PyObject is added to the returned list.
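The duplicated output is easy to reproduce with a pipe. This is a minimal sketch, assuming a Unix platform where select() accepts pipe fds (on Windows it only accepts sockets):

```python
import os
import select

r, w = os.pipe()
try:
    os.write(w, b"x")  # make the read end readable so select() reports it
    readable, _, _ = select.select([r, r], [], [], 0)
finally:
    os.close(r)
    os.close(w)

# set2list walks the per-item table rather than the fd_set, so the single
# ready fd is reported once for each time it appeared in rlist: [r, r].
```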

Proposed Changes
----------------

At a glance, it would seem that the Right Thing to do is to collapse duplicates in the input, as if we had built set(AsFileDescriptor(o) for o in input_list), so that no duplicates are returned in the result. However, you *can* have a heterogeneous input list containing a raw fd such as 5 and a file-like object whose fileno() resolves to 5, in which case you don't want to arbitrarily return only one of those PyObjects. Therefore, I think it's probably best to leave the behavior as-is and document it.
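To make the heterogeneous case concrete, here is a sketch using a pipe and a hypothetical FileLike wrapper whose fileno() returns the pipe's read end (Unix assumed, as above). The collapsed dict only stands in for the rejected dedupe-by-fd approach; it is not existing code:

```python
import os
import select


class FileLike:
    """Hypothetical wrapper that exposes only fileno(), for illustration."""

    def __init__(self, fd):
        self.fd = fd

    def fileno(self):
        return self.fd


r, w = os.pipe()
try:
    os.write(w, b"x")
    wrapper = FileLike(r)
    rlist = [r, wrapper]  # two distinct objects, one underlying fd
    readable, _, _ = select.select(rlist, [], [], 0)
finally:
    os.close(r)
    os.close(w)

# Current behavior: both objects come back, in input order, so the caller
# can match each result to the object it passed in.
both_returned = readable[0] == r and readable[1] is wrapper

# Rejected alternative: collapsing the input by fd keeps only the first
# object seen for each fd and silently drops the wrapper.
collapsed = {}
for o in rlist:
    fd = o if isinstance(o, int) else o.fileno()
    collapsed.setdefault(fd, o)
```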

In any case, if we want to explicitly allow duplicates in the input list, we should probably, for correctness, make the pylist arrays dynamically sized structures that match the sizes of the corresponding input lists.
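For the dynamic-sizing idea, a pure-Python model of the two helpers may help. This is an illustration only: Entry, as_fd, and the Python seq2set/set2list below are hypothetical stand-ins for the C code, FD_SETSIZE is hard-coded to an assumed 1024, and as_fd only approximates PyObject_AsFileDescriptor:

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Set, Tuple

FD_SETSIZE = 1024  # assumption: a typical Linux/glibc value, hard-coded for the model


@dataclass
class Entry:
    obj: Any  # the caller's original object (int or file-like)
    fd: int   # the descriptor it resolved to


def as_fd(o: Any) -> int:
    """Simplified stand-in for PyObject_AsFileDescriptor: an int or o.fileno()."""
    fd = o if isinstance(o, int) else o.fileno()
    if fd < 0:
        raise ValueError("file descriptor cannot be a negative integer")
    return fd


def seq2set(seq: Iterable[Any]) -> Tuple[Set[int], List[Entry]]:
    fdset: Set[int] = set()  # models the fd_set: duplicate fds collapse here
    table: List[Entry] = []  # one entry per input item, sized by the input
    for o in seq:
        fd = as_fd(o)
        if fd >= FD_SETSIZE:
            raise ValueError("filedescriptor out of range in select()")
        fdset.add(fd)
        table.append(Entry(o, fd))
    return fdset, table


def set2list(ready: Set[int], table: List[Entry]) -> List[Any]:
    # Walk the table, not the fd set, so every input object whose fd is
    # ready is reported, duplicates included.
    return [e.obj for e in table if e.fd in ready]
```

Because the model keeps one table entry per input item, seq2set([5] * 5000) yields a one-element fd set and a 5000-entry table, so there is no FD_SETSIZE + 1 ceiling on how many input objects can be tracked.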

If this all makes sense I'll be happy to come up with a module/documentation/unit test patch.

----------
assignee: georg.brandl
components: Documentation, Extension Modules
messages: 97578
nosy: cdleary, georg.brandl
severity: normal
status: open
title: select.select() corner cases: duplicate fds, out-of-range fds
type: behavior

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7674>
_______________________________________