emulating read and readline methods

MRAB google at mrabarnett.plus.com
Wed Sep 10 19:11:06 EDT 2008


On Sep 10, 10:52 pm, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
> Sean Davis schrieb:
>
>
>
> > I have a large file that I would like to transform and then feed to a
> > function (psycopg2 copy_from) that expects a file-like object (needs
> > read and readline methods).
>
> > I have a class like so:
>
> > class GeneInfo():
> >     def __init__(self):
> >         #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
> >         self.fh = gzip.open("/tmp/gene_info.gz")
> >         self.fh.readline() #deal with header line
>
> >     def _read(self,n=1):
> >         for line in self.fh:
> >             if line=='':
> >                 break
> >             line=line.strip()
> >             line=re.sub("\t-","\t",line)
> >             rowvals = line.split("\t")
> >             yield "\t".join([rowvals[i] for i in [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"
>
> >     def readline(self,n=1):
> >         return self._read().next()
>
> >     def read(self,n=1):
> >         return self._read().next()
>
> >     def close(self):
> >         self.fh.close()
>
> > and I use it like so:
>
> > a=GeneInfo()
> > cur.copy_from(a,"gene_info")
> > a.close()
>
> > It works well except that the end of file is not caught by copy_from.
> > I get errors like:
>
> > psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error during .read() call
> > CONTEXT:  COPY gene_info, line 1000: ""
>
> > for a 1000 line test file.  Any ideas what is going on?
>
> I'm a bit lost why the above actually works - as _read() appears to be
> re-created instead of re-used for each invocation, and thus can't work IMHO.
>
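It works because every call to readline() or read() creates a new
generator, and each generator reads a single line from the shared file
object (self.fh), yields the result, and is then discarded. No
individual generator reads more than one line from the file, and the
file object itself remembers its position between calls.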
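
A minimal sketch of that behaviour, assuming Python 3 (so next(gen)
rather than gen.next()) and an in-memory io.StringIO standing in for
the gzip file. The Reader class and the upper-casing "transform" are
made up for illustration only:

```python
import io


class Reader:
    """Hypothetical stand-in for GeneInfo, reduced to the generator pattern."""

    def __init__(self, fh):
        self.fh = fh

    def _read(self):
        # Iterating over the shared handle advances its position.
        for line in self.fh:
            yield line.upper()

    def readline(self):
        # A new generator per call; it pulls exactly one line and is dropped.
        return next(self._read())


fh = io.StringIO("a\nb\nc\n")
r = Reader(fh)
lines = [r.readline(), r.readline(), r.readline()]
print(lines)
```

Once the file is exhausted, the fresh generator has nothing to yield,
so the next readline() call raises StopIteration instead of returning
an empty string.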

> Anyway, I think the real problem is that you don't follow the
> readline protocol: it returns "" when there are no more lines to read,
> whereas your code raises StopIteration.
>
> Diez
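
One way to follow that protocol, as a rough sketch rather than a
tested fix for psycopg2: keep a single generator and let next() fall
back to "" at end of file. This assumes Python 3. The class name
TransformedFile, the columns parameter, and the use of rstrip("\n")
instead of strip() are my own choices, not from the original post, and
read() still ignores its size argument and returns one line per call,
as the original class did:

```python
import io
import re


class TransformedFile:
    """File-like wrapper that transforms lines lazily (hypothetical name)."""

    def __init__(self, fh, columns):
        self.fh = fh
        self.columns = columns
        # One generator, created once and reused by every call.
        self._lines = self._transform()

    def _transform(self):
        for line in self.fh:
            line = line.rstrip("\n")
            line = re.sub("\t-", "\t", line)
            rowvals = line.split("\t")
            yield "\t".join(rowvals[i] for i in self.columns) + "\n"

    def readline(self, size=-1):
        # next() with a default returns "" at EOF instead of raising
        # StopIteration, which is what the readline protocol expects.
        return next(self._lines, "")

    def read(self, size=-1):
        # Simplification: one line per call, size is ignored.
        return self.readline()

    def close(self):
        self.fh.close()


demo = TransformedFile(io.StringIO("a\t-\tc\nd\te\tf\n"), [0, 2])
print(repr(demo.readline()), repr(demo.readline()), repr(demo.readline()))
```

With the gzip file from the original post this would be used as
TransformedFile(gzip.open(...), [0,1,2,3,6,7,8,9,10,11,12,14]) after
skipping the header line.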



