emulating read and readline methods

Wed Sep 10 17:52:00 EDT 2008

Sean Davis wrote:
> I have a large file that I would like to transform and then feed to a
> function (psycopg2 copy_from) that expects a file-like object (needs
> read and readline methods).
> 
> I have a class like so:
> 
> class GeneInfo():
>     def __init__(self):
>         #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
>         self.fh = gzip.open("/tmp/gene_info.gz")
>         self.fh.readline() #deal with header line
> 
>     def _read(self,n=1):
>         for line in self.fh:
>             if line=='':
>                 break
>             line=line.strip()
>             line=re.sub("\t-","\t",line)
>             rowvals = line.split("\t")
>             yield "\t".join([rowvals[i] for i in [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"
> 
>     def readline(self,n=1):
>         return self._read().next()
> 
>     def read(self,n=1):
>         return self._read().next()
> 
>     def close(self):
>         self.fh.close()
> 
> and I use it like so:
> 
> a=GeneInfo()
> cur.copy_from(a,"gene_info")
> a.close()
> 
> It works well except that the end of file is not caught by copy_from.
> I get errors like:
> 
> psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error
> during .read() call
> CONTEXT:  COPY gene_info, line 1000: ""
> 
> for a 1000 line test file.  Any ideas what is going on?

I'm a bit lost as to why the above works at all: _read() appears to be
re-created rather than re-used on each invocation, so IMHO it can't work.
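
One possible explanation, offered as a guess rather than a diagnosis: each
fresh generator iterates over the same underlying file object, so the read
position survives between calls. The sketch below uses Python 3 and an
io.StringIO stand-in (both assumptions; the original is Python 2 reading a
gzip file), and the helper name lines() is hypothetical.

```python
import io

# Stand-in for the shared file handle (the original uses gzip.open).
fh = io.StringIO("one\ntwo\n")

def lines():
    # A new generator object is built on every call, but each one
    # iterates over the same fh, which remembers its position.
    for line in fh:
        yield line

first = next(lines())    # consumes the first line of fh
second = next(lines())   # a new generator resumes at the second line

# Once fh is exhausted, the generator ends and next() raises StopIteration.
try:
    next(lines())
    hit_eof = False
except StopIteration:
    hit_eof = True
```

If that reading is right, the calls succeed line by line and only fail once
the file runs out, which would match an error on the last line of input.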

Anyway, I think the real problem is that you don't follow the
readline protocol: readline() returns "" when there are no more lines
to read, whereas yours raises StopIteration instead.
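
A minimal sketch of a wrapper that follows that convention, assuming Python 3
and a text-mode source. The class name TransformedFile and the keep parameter
are hypothetical, and the transformation is reduced to column selection, so
this is not the original poster's exact logic. The generator is created once,
and next() is given a default of "" so that end of input is reported as an
empty string.

```python
import io

class TransformedFile:
    """File-like wrapper that returns "" at end of input."""

    def __init__(self, fh, keep):
        self.fh = fh                      # underlying line-iterable file
        self.keep = keep                  # column indices to retain
        self._lines = self._transform()   # one generator, created once
        self._buf = ""                    # leftover text from read(size)

    def _transform(self):
        for line in self.fh:
            rowvals = line.rstrip("\n").split("\t")
            yield "\t".join(rowvals[i] for i in self.keep) + "\n"

    def readline(self, size=-1):
        # size is accepted for compatibility but ignored in this sketch.
        if not self._buf:
            self._buf = next(self._lines, "")   # "" signals end of input
        line, sep, rest = self._buf.partition("\n")
        self._buf = rest
        return line + sep

    def read(self, size=-1):
        # Gather lines until size characters are buffered (or everything
        # if size is negative), then hand back at most size characters.
        while size < 0 or len(self._buf) < size:
            line = next(self._lines, "")
            if not line:
                break
            self._buf += line
        if size < 0:
            out, self._buf = self._buf, ""
        else:
            out, self._buf = self._buf[:size], self._buf[size:]
        return out

    def close(self):
        self.fh.close()
```

For example, TransformedFile(io.StringIO("a\tb\tc\n"), keep=[0, 2]) yields
"a\tc\n" from the first readline() call and "" from every call after that.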

Diez