Looping through a file a block of text at a time not by line

Wed Jun 14 05:12:44 EDT 2006

Rosario Morgan wrote:
> Hello
> 
> Help is greatly appreciated in advance.
> 
> I need to loop through a file 6000 bytes at a time.  I was going to 
> use the following but do not know how to advance through the file 6000 
> bytes at a time.
> 
> file = open('hotels.xml')

while True:
  block = file.read(6000)
  if not block:
    break
  do_something_with_block(block)

or:

block = file.read(6000)
while block:
  do_something_with_block(block)
  block = file.read(6000)
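
Both loops can also be wrapped in a generator. One caveat if each block is
then fed to a regexp like the one in the original post: a <Rate ...></Rate>
element can straddle a 6000-byte boundary, and then neither block matches on
its own. A minimal sketch of one way around that, assuming Python 3 and a
file opened in text mode. The names iter_blocks and strip_rates are made up
for illustration, and the carry-over logic is a guess at what is wanted, not
something taken from the original code:

```python
import re

RATE = re.compile(r'<Rate.*?></Rate>')  # compiled once, reused for every block
START = '<Rate'                         # literal prefix every match begins with


def iter_blocks(fileobj, size=6000):
    """Yield successive blocks of at most `size` characters until EOF."""
    while True:
        block = fileobj.read(size)
        if not block:
            return
        yield block


def strip_rates(blocks):
    """Remove RATE matches from a stream of blocks, including matches
    that are split across two or more consecutive blocks."""
    pending = ''  # text held back because it may be the start of a match
    for block in blocks:
        text = RATE.sub('', pending + block)
        # Any '<Rate' still present has no closing part yet: hold it back.
        cut = text.find(START)
        if cut == -1:
            cut = len(text)
            # Also hold back a trailing fragment such as '<Ra'.
            for n in range(len(START) - 1, 0, -1):
                if text.endswith(START[:n]):
                    cut = len(text) - n
                    break
        yield text[:cut]
        pending = text[cut:]
    yield pending  # whatever is left at EOF never matched
```

Usage would be something like writing each piece from
strip_rates(iter_blocks(open('hotels.xml'))) to an output file. Note that an
opening '<Rate' that is never closed makes the held-back text grow until the
end of the file.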

> newblock = re.sub(re.compile(r'<Rate.*?></Rate>'),'',block)

Either you compile the regexp once and use the compiled regexp object:

  exp = re.compile(r'<Rate.*?></Rate>')
  (...)
  newblock = exp.sub('', block)

or you use a non-compiled regexp:

  newblock = re.sub(r'<Rate.*?></Rate>','',block)

Of these two, the first solution may be better here: the expression is
compiled once, outside the loop, and reused for every block. Using a SAX
parser may be an option too... (maybe overkill, or maybe the
RightThingToDo(tm), depending on the context...)
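
For what it's worth, a rough sketch of the SAX route, using only the standard
library (Python 3). DropRate and drop_rates are hypothetical names, and the
behaviour is an assumption that differs from the regexp: it drops every
element named exactly 'Rate' together with anything nested inside it, rather
than only the empty <Rate ...></Rate> form.

```python
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class DropRate(XMLFilterBase):
    """Pass SAX events through unchanged, except those inside <Rate>."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._depth = 0  # > 0 while we are inside a <Rate> element

    def startElement(self, name, attrs):
        if self._depth or name == 'Rate':
            self._depth += 1          # swallow the element
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._depth:
            self._depth -= 1          # closing a swallowed element
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._depth:
            super().characters(content)


def drop_rates(source, out):
    """Parse `source` (file name or file object) and write the document,
    minus the <Rate> elements, to the writable text stream `out`."""
    reader = DropRate(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(out, encoding='utf-8'))
    reader.parse(source)
```

The parser reads the input incrementally, so the fact that the whole file is
a single line does not matter here.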

> 
> I cannot use readlines because the file is 138MB all on one line.

So much for the "XML is human readable and editable"....

-- 
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'onurb at xiludom.gro'.split('@')])"