[Python-Dev] PEP 204 - Range Literals

Tue, 18 Jul 2000 12:18:31 +0200

--2oS5YaxWCcQjTEyO
Content-Type: text/plain; charset=us-ascii

I've attached the first draft of PEP 204, Range Literals, which I just
uploaded to CVS as well. If anyone finds time to review it, or the proposed
implementation (patch #100902 on SourceForge) please do. I'm not sure what
to do with the patch... Guido said as much as 'apply it'(*), but I doubt
anyone really looked at it yet ;)

I wrote this using 'joe', which is still my favorite editor, because despite
many, many minutes of effort, xemacs simply refused to do what I wanted it
to do. How do you 're-flow' a paragraph, for instance, or enable word
wrapping, or better yet: word wrapping with autoindentation ? If anyone has
a quick-and-dirty howto on [x]emacs, feel free to share ;)

(* In http://www.python.org/pipermail/python-dev/2000-July/013234.html)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

--2oS5YaxWCcQjTEyO
Content-Type: text/plain
Content-Disposition: attachment; filename="pep-0204.txt"

PEP: 204
Title: Range Literals
Version: $Revision: 1.2 $
Owner: thomas@xs4all.net (Thomas Wouters)
Python-Version: 2.0
Status: Draft

Introduction

    This PEP describes the `range literal' proposal for Python 2.0. 
    This PEP tracks the status and ownership of this feature, slated
    for introduction in Python 2.0.  It contains a description of the
    feature and outlines changes necessary to support the feature. 
    This PEP summarizes discussions held in mailing list forums, and
    provides URLs for further information, where appropriate.  The CVS
    revision history of this file contains the definitive historical
    record.

List ranges

    Ranges are sequences of numbers of a fixed stepping, often used in
    for-loops.  The Python for-loop is designed to iterate over a
    sequence directly:

    >>> l = ['a', 'b', 'c', 'd']
    >>> for item in l:
    ...     print item
    a
    b
    c
    d

    However, this solution is not always prudent.  Firstly, problems
    arise when altering the sequence in the body of the for-loop,
    resulting in the for-loop skipping items.  Secondly, it is not
    possible to iterate over, say, every second element of the
    sequence.  And thirdly, it is sometimes necessary to process an
    element based on its index, which is not readily available in the
    above construct.

    For these instances, and others where a range of numbers is
    desired, Python provides the `range' builtin function, which
    creates a list of numbers.  The `range' function takes three
    arguments, `start', `end' and `step'.  `start' and `step' are
    optional, and default to 0 and 1, respectively.

    The `range' function creates a list of numbers, starting at
    `start', with a step of `step', up to, but not including `end', so
    that `range(10)' produces a list that has exactly 10 items, the
    numbers 0 through 9.

    Using the `range' function, the above example would look like
    this:

    >>> for i in range(len(l)):
    ...     print l[i]
    a
    b
    c
    d

    Or, to start at the second element of `l' and processing only
    every second element from then on:

    >>> for i in range(1, len(l), 2):
    ...     print l[i]
    b
    d

    There are several disadvantages with this approach:

    - Clarity of purpose: Adding another functioncall, possibly with
      extra arithmatic to determine the desired length and step of the
      list, does not improve readability of the code.  Also, it is
      possible to `shadow' the builtin `range' function by supplying a
      local or global variable with the same name, effectively
      replacing it.  This may or may not be a desired effect.

    - Efficiency: because the `range' function can be overridden, the
      Python compiler cannot make assumptions about the for-loop, and
      has to maintain a seperate loop counter.

    - Consistency: There already is a syntax that is used to denote
      ranges, as shown below.  This syntax uses the exact same
      arguments, though all optional, in the exact same way.  It seems
      logical to extend this syntax to ranges, to form `range
      literals'.

Slice Indices

    In Python, a sequence can be indexed in one of two ways:
    retrieving a single item, or retrieving a range of items. 
    Retrieving a range of items results in a new object of the same
    type as the original sequence, containing zero or more items from
    the original sequence.  This is done using a `range notation':

    >>> l[2:4]
    ['c', 'd']

    This range notation consists of zero, one or two indices seperated
    by a colon.  The first index is the `start' index, the second the
    `end'.  When either is left out, they default to respectively the
    start and the end of the sequence.

    There is also an extended range notation, which incorporates
    `step' as well.  Though this notation is not currently supported
    by most builtin types, if it were, it would work as follows:

    >>> l[1:4:2]
    ['b', 'd']

    The third `argument' to the slice syntax is exactly the same as
    the `step' argument to range().  The underlying mechanisms of
    standard and these extended slices are sufficiently different and
    inconsistent that many classes and extensions outside of
    mathematical packages do not implement support for the extended
    variant, and this should definately be resolved, but this is
    beyond the scope of this PEP.

    Extended slices do show, however, that there is already a
    perfectly valid and applicable syntax to denote ranges in a way
    that solve all of the earlier stated disadvantages of the use of
    the range() function:

    - It is clearer, more concise syntax, which has already proven to
      be both intuitive and easy to learn.

    - It is consistent with the other use of ranges in Python
      (slices.)

    - Because it is built-in syntax, instead of a builtin function, it
      cannot be overridden.  This means both that a viewer can be
      certain about what the code does, and that an optimizer will not
      have to worry about range() being `shadowed'.

The Proposed Solution

    The proposed implementation of range-literals combines the syntax
    for list literals with the syntax for (extended) slices, to form
    range literals:

    >>> [1:10]
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> [:5]
    [0, 1, 2, 3, 4]
    >>> [5:1:-1]
    [5, 4, 3, 2]

    There is one minor difference between range literals and the slice
    syntax: though it is possible to omit all of `start', `end' and
    `step' in slices, it does not make sense to omit `end' in range
    literals.  In slices, `end' would default to the end of the list,
    but this has no meaning in range literals.

Reference Implementation

    The proposed implementation can be found on SourceForge[1].  It
    adds a new bytecode, BUILD_RANGE, that takes three arguments from
    the stack and builds a list on the bases of those.  The list is
    pushed back on the stack.

    The use of a new bytecode is necessary to be able to build ranges
    based on other calculations, whose outcome is not known at compile
    time.

    The code introduces two new functions to listobject.c, which are
    currently hovering between private functions and full-fledged API
    calls.

    PyObject * PyList_FromRange(long start, long end, long step)
    builds a list from start, end and step, returning NULL if an error
    occurs.

    long PyList_GetLenOfRange(long start, long end, long step) is a
    helper function to determine the length of a range.  It was
    previously a static function in bltinmodule.c, but is now
    necessary in both listobject.c and bltinmodule.c (for xrange).  It
    is made non-static solely to avoid code duplication.

Open issues

    One possible solution to the discrepancy of requiring the `end'
    argument in range literals is to allow the range syntax to create
    a `generator', rather than a list, such as the `xrange' builtin
    function does.  However, a generator would not be a list, and it
    would be impossible, for instance, to assign to items in the
    generator, or append to it.

    The range syntax could conceivably be extended to include tuples,
    immutable lists, which could then be safely implemented as
    generators.  Especially for large number arrays, this may be a
    desirable solution: generators require very little in the way of
    storage and initialization, and there is only a small performance
    impact in calculating and creating the appropriate number on
    request.  (TBD: is there any at all ? Cursory testing suggests
    equal performance even in the case of ranges of length 1.)

    However, even if idea was adopted, would it be wise to `special
    case' the second argument, making it optional in one instance of
    the syntax, and non-optional in other cases ?

    Should it be possible to mix range syntax with normal list
    literals, creating a single list, like so:

    >>> [5, 6, 1:6, 7, 9]
    to create
    [5, 6, 1, 2, 3, 4, 5, 7, 9]

    How should range literals interact with another proposed new
    feature, `list comprehensions', PEP-202 ? In specific, should it
    be possible to create lists in list comprehensions, like so:

    >>> [x:y for x in (1,2) y in (3, 4)]

    Should this example return a single list with multiple ranges:
    [1, 2, 1, 2, 3, 2, 2, 3]

    Or a list of lists, like so:
    [[1, 2], [1, 2, 3], [2], [2, 3]]

    However, as the syntax and semantics of list comprehensions are
    still subject of hot debate, these issues are probably best
    addressed by the `list comprehensions' PEP.

    Range literals accept objects other than integers: it performs
    PyInt_AsLong() on the objects passed in, so as long as the objects
    can be coerced into integers, they will be accepted.  The
    resulting list, however, is always composed of standard integers.

    Should range literals create a list of the passed-in type ? It
    might be desirable in the cases of other builtin types, such as
    longs and strings:

    >>> [ 1L : 2L<<64 : 2<<32L ]    
    >>> ["a":"z":"b"]
    >>> ["a":"z":2]

    However, this might be too much `magic' to be obvious.  It might
    also present problems with user-defined classes: even if the base
    class can be found and a new instance created, the instance may
    require additional arguments to __init__, causing the creation to
    fail.

    The PyList_FromRange() and PyList_GetLenOfRange() functions need
    to be classified: are they part of the API, or should they be made
    private functions ?

References:

    [1]
http://sourceforge.net/patch/?func=detailpatch&patch_id=100902&group_id=5470

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:

--2oS5YaxWCcQjTEyO--