Buffering choking sys.stdin.readlines()?

Diez B. Roggisch deets at nospam.web.de
Mon May 12 11:29:51 EDT 2008


cshirky wrote:
> Newbie question:
> 
> I'm trying to turn a large XML file (~7 GB compressed) into a YAML file,
> and my program seems to be buffering the input.
> 
> IOtest.py is just
> 
>   import sys
>   for line in sys.stdin.readlines():
>     print line
> 
> but when I run
> 
> $ gzcat bigXMLfile.gz | IOtest.py
> 
> it hangs and then dies.
> 
> The goal of the program is to build a YAML file with print statements,
> rather than building a gigantic nested dictionary, but I am obviously
> doing something wrong in passing input through without buffering. Any
> advice gratefully fielded.
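Because the stated goal is to write YAML while streaming, rather than building a nested dictionary, an XML-aware streaming parser may suit the job better than reading line by line, since XML elements need not line up with lines. The following is a swapped-in technique, not something proposed in this thread. It is a minimal sketch for Python 3 that uses xml.etree.ElementTree.iterparse and clears each record once it has been handled. The function name stream_records, the record tag, and the flat layout (records are direct children of the root, children are plain text fields, no namespaces) are all hypothetical assumptions.

```python
import io
import json
import sys
import xml.etree.ElementTree as ET


def stream_records(source, tag, out=sys.stdout):
    """Write one YAML list item per <tag> element without keeping the whole tree.

    Hypothetical assumptions: <tag> elements are direct children of the root,
    their children are simple text fields, and namespaces/attributes can be
    ignored.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != tag:
            continue
        # The record is complete once its end tag has been parsed.
        fields = [(child.tag, (child.text or "").strip()) for child in elem]
        if not fields:
            out.write("- {}\n")  # empty record -> empty YAML mapping
        for i, (name, value) in enumerate(fields):
            prefix = "- " if i == 0 else "  "
            # json.dumps quotes and escapes the value; for ordinary text a
            # JSON string is also a valid YAML double-quoted scalar.
            out.write("%s%s: %s\n" % (prefix, name, json.dumps(value)))
        # Detach processed records from the root so memory stays bounded.
        root.clear()


stream_records(io.BytesIO(b"<db><rec><name>a</name></rec></db>"), "rec")
# prints: - name: "a"
```

To read from the pipe in the question, a script (say, a hypothetical xml2yaml.py) could call stream_records(sys.stdin.buffer, "record"), with "record" replaced by the real element name, and run as `gzcat bigXMLfile.gz | python3 xml2yaml.py`. Alternatively, gzip.open("bigXMLfile.gz") can be passed directly as the source.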

readlines() reads the whole file into memory, as a list of lines, before
the loop even starts. Try xreadlines(), the lazy iterator version, instead.
And I'm not 100% sure, but I *think* doing

for line in sys.stdin:
    sys.stdout.write(line)  # one line at a time

does exactly that, and the Python 2.3+ documentation recommends this form
over xreadlines(), which it deprecates.
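Here is a minimal sketch of the difference, written for Python 3 (the thread itself uses Python 2). Iterating over the file object pulls one line at a time, whereas readlines() builds a list of every line first. The function names copy_lines and copy_lines_readline are hypothetical. The second variant uses the iter(callable, sentinel) form with readline(). On Python 2 this also avoids the hidden read-ahead buffer that file iteration uses. That buffer affects how quickly each line is seen, not overall memory use.

```python
import io
import sys


def copy_lines(stream, out=sys.stdout):
    """Copy `stream` to `out` one line at a time and return the line count.

    `for line in stream` pulls lines lazily, so memory use stays roughly
    constant however large the input is; stream.readlines(), by contrast,
    builds a list holding every line before the loop body runs.
    """
    count = 0
    for line in stream:
        out.write(line)  # each line keeps its trailing "\n"; none is added
        count += 1
    return count


def copy_lines_readline(stream, out=sys.stdout):
    """Like copy_lines, but fetches each line with an explicit readline().

    iter(callable, sentinel) keeps calling stream.readline() until it returns
    "" (end of file on a text stream; a binary stream would need b"").
    """
    count = 0
    for line in iter(stream.readline, ""):
        out.write(line)
        count += 1
    return count


copy_lines(io.StringIO("one\ntwo\n"))  # prints "one" and "two" on separate lines
```

If IOtest.py ended with copy_lines(sys.stdin), then `gzcat bigXMLfile.gz | python3 IOtest.py` should stream the input with roughly constant memory. Writing each line with out.write instead of `print line` also avoids the doubled newlines, since every line read from the file already ends in "\n".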

Diez



More information about the Python-list mailing list