seek() tell()

Jeff Epler jepler at unpythonic.net
Thu Feb 6 09:25:23 EST 2003


On Wed, Feb 05, 2003 at 07:24:38PM -0800, Dennis Lee Bieber wrote:
> Jeff Epler fed this fish to the penguins on Wednesday 05 February 2003 
> 06:13 am:
> 
> > 
> > On Windows, the number returned by ftell() will be the number you
> > would have gotten if you had treated the file as binary and written \n
> > as \r\n (so 1 is added for each \n, essentially).  However, it was the intent
> > of the folks who wrote the standard to accommodate systems where "text
> > files" are files with fixed-length records, one record per line.  On
> > those systems, the number returned might turn out to be (for example)
> > a 25:7 bit split (line number:line position, giving files up to 32M
> > lines long with up to 127 chars per line)
> >
>         I've never encountered a C compiler that didn't return a byte count. 
> (Both Linux gcc and Windows M$ VC6 return long ints)

Yes, but the C standard was written to support the configuration I
described.  I have the impression that some mainframes, and maybe VAX,
have the kind of record-oriented text files I described.
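
To make the 25:7 split concrete, here is a minimal Python sketch of how
such a position value could be packed and unpacked.  The names
pack_position and unpack_position are made up for illustration, and I am
not claiming any real C library uses exactly this layout; it only shows
why such a number is not a byte count.

```python
# Hypothetical layout: high 25 bits hold the line number,
# low 7 bits hold the position within the line.
LINE_BITS = 25
COLUMN_BITS = 7
COLUMN_MASK = (1 << COLUMN_BITS) - 1   # 127


def pack_position(line, column):
    """Combine a line number and a column into one integer."""
    if not 0 <= line < (1 << LINE_BITS):
        raise ValueError("line number does not fit in 25 bits")
    if not 0 <= column <= COLUMN_MASK:
        raise ValueError("column does not fit in 7 bits")
    return (line << COLUMN_BITS) | column


def unpack_position(cookie):
    """Split a packed value back into (line, column)."""
    return cookie >> COLUMN_BITS, cookie & COLUMN_MASK
```

With this layout, the start of the second line (line 1, column 0) packs
to 128 no matter how many characters the first line holds, so doing
arithmetic on the value as if it were a byte offset would be meaningless.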

See http://www.lysator.liu.se/c/rat/d9.html#4-9-2
[excerpt, with relevant sections starred:]
    4.9.2  Streams

    C inherited its notion of text streams from the UNIX environment in
    which it was born.  Having each line delimited by a single new-line
    character, regardless of the characteristics of the actual terminal,
    supported a simple model of text as a sort of arbitrary length
    scroll or ``galley.''  Having a channel that is ``transparent''
    (no file structure or reserved data encodings)  eliminated the need
    for a distinction between text and binary streams.

    Many other environments have different properties, however.  If a
    program written in C is to produce a text file digestible by other
    programs, by text editors in particular, it must conform to the text
    formatting conventions of that environment.

    The I/O facilities defined by the Standard are both more complex
    and more restrictive than the ancestral I/O facilities of UNIX.
    This is justified on pragmatic grounds: most of the differences,
    restrictions and omissions exist to permit C I/O implementations in
    environments which differ from the UNIX I/O model.

    Troublesome aspects of the stream concept include:

    The definition of lines.
        In the UNIX model, division of a file into lines is effected by
        new-line characters.  Different techniques are used by other
        systems --- lines may be separated by CR-LF (carriage return,
        line feed) or by unrecorded areas on the recording medium, or each
        line may be prefixed by its length.  The Standard addresses this
        diversity by specifying that new-line be used as a line separator
        at the program level, but then permitting an implementation to
        transform the data read or written to conform to the conventions
        of the environment.

*       Some environments represent text lines as blank-filled
*       fixed-length records.  Thus the Standard specifies that it
        is implementation-defined whether trailing blanks are removed
        from a line on input.  (This specification also addresses the
        problems of environments which represent text as variable-length
        records, but do not allow a record length of 0: an empty line
        may be written as a one-character record containing a blank,
        and the blank is stripped on input.)
    [...]
    Random access.
        The UNIX I/O model features random access to data in a file,
        indexed by character number.  On systems where a new-line
        character processed by the program represents an unknown number of
        physically recorded characters, this simple mechanism cannot be
        consistently supported for text streams.  The Standard abstracts
        the significant properties of random access for text streams:
        the ability to determine the current file position and then later
*       reposition the file to the same location.  ftell returns a file
*       position indicator, which has no necessary interpretation except
*       that an fseek operation with that indicator value will position
*       the file to the same place.  Thus an implementation may encode
*       whatever file positioning information is most appropriate for
*       a text file, subject only to the constraint that the encoding
*       be representable as a long.  Use of fgetpos and fsetpos removes
        even this constraint.
[...]
*   It was agreed that some minimum maximum line length must be mandated; 254 was chosen. 
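
The portable pattern that follows from the starred text is: save the
value from tell(), and later hand that same value back to seek(),
without doing arithmetic on it.  Below is a minimal sketch using
Python's text-mode file objects; the helper names and the sample file
contents are my own, chosen only for illustration.

```python
import os
import tempfile


def reread_from_saved_position(path):
    """Read the second line twice, using tell()/seek() as a bookmark."""
    with open(path, "r") as f:
        f.readline()            # skip the first line
        cookie = f.tell()       # opaque position; do not do arithmetic on it
        first = f.readline()
        f.seek(cookie)          # return to the saved position
        second = f.readline()
    return first, second


def demo():
    """Write a small text file, then re-read its second line twice."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as f:
            f.write("alpha\nbeta\ngamma\n")
        return reread_from_saved_position(path)
    finally:
        os.remove(path)
```

Both reads should return the same line whether the platform stores the
line ending as \n or \r\n, because the saved value is only ever passed
back to seek().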

Jeff




