Question about file objects...

nn pruebauno at latinmail.com
Thu Dec 3 15:13:12 EST 2009


On Dec 2, 6:56 pm, Terry Reedy <tjre... at udel.edu> wrote:
> J wrote:
> > On Wed, Dec 2, 2009 at 09:27, nn <prueba... at latinmail.com> wrote:
> >>> Is there a way to read the file, one item at a time, delimited by
> >>> commas WITHOUT having to read all 16,000 items from that one line,
> >>> then split them out into a list or dictionary??
>
> >> File iteration is a convenience since it is the most common case. If
> >> everything is on one line, you will have to handle record separators
> >> manually by using the .read(<number_of_bytes>) method on the file
> >> object and searching for the comma. If everything fits in memory the
> >> straightforward way would be to read the whole file with .read() and
> >> use .split(",") on the returned string. That should give you a nice
> >> list of everything.
>
> > Agreed. The confusion came because the guy teaching said that
> > iterating the file is delimited by a carriage return character...
>
> If he said exactly that, he is not exactly correct. File iteration looks
> for line ending character(s), which depend on the system or universal
> newline setting.
>
> > which to me sounds like it's an arbitrary thing that can be changed...
>
> > I was already thinking that I'd have to read it in small chunks and
> > search for the delimiter I want...  and reading the whole file into a
> > string and then splitting that would be nice, until the file is
> > so large that it starts taking up significant amounts of memory.
>
> > Anyway, thanks both of you for the explanations... I appreciate the help!
>
> I would not be surprised if a generic file chunk generator were posted
> somewhere. It would be a good entry for the Python Cookbook, if not
> there already.
>
> tjr

There should be, but writing one isn't too difficult:

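def chunker(file_obj):
    # parts[-1] always holds the trailing, possibly incomplete item
    parts = ['']
    while True:
        fdata = file_obj.read(8192)
        if not fdata:
            break
        # prepend the leftover so items split across reads are rejoined
        parts = (parts[-1] + fdata).split(',')
        for col in parts[:-1]:
            yield col
    # whatever follows the last comma (may include a trailing newline)
    yield parts[-1]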
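
For anyone who wants to try the idea without a real file, here is a minimal sketch of the same technique with the delimiter and read size turned into parameters. The names split_stream, delimiter and chunk_size are made up for this example, and io.StringIO only stands in for an open file; treat it as an illustration under those assumptions rather than a polished recipe.

```python
import io


def split_stream(file_obj, delimiter=",", chunk_size=8192):
    """Yield delimiter-separated items from file_obj, reading chunk_size at a time.

    Hypothetical helper for illustration; same approach as chunker above.
    """
    tail = ""  # incomplete item carried over from the previous read
    while True:
        data = file_obj.read(chunk_size)
        if not data:
            break
        fields = (tail + data).split(delimiter)
        tail = fields.pop()  # the last piece may continue in the next read
        for field in fields:
            yield field
    yield tail  # whatever follows the final delimiter


# A tiny chunk size forces items to straddle read boundaries.
sample = io.StringIO("alpha,beta,gamma,delta")
items = list(split_stream(sample, chunk_size=4))
print(items)
```

For input that fits in memory, this should produce the same list as file_obj.read().split(","); the generator only pays off when the file is too large to hold comfortably as one string.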



