file read: feature or bug?

Andy Jewell andy at wild-flower.co.uk
Wed Apr 16 16:48:26 EDT 2003


On Tuesday 15 Apr 2003 5:10 pm, Asbuilt Easynet wrote:
> Hello,
>
> I have a problem using the read function, which seems to read more than
> requested,
> as you can see in this sample demo.
>
> Thanks
>
> Jef
>
> # This is a program to read a file in reverse order, in 3-character blocks
> # the demo.log file has 2 characters per line on Windows, each line ending with CRLF
> # 01
> # 02
> # 03
> # 04
> #
> # we can see it like 01__02__03__04 where __ is CRLF
> #
> # PROBLEM :
> # the program should normally return
> #   _04
> #   03_
> #   2__
> #   __0
> #   11
> #
> # BUT it returns
> #   _04
> #   03_
> #   2_0     <-- Where does the 0 come from?
> #   _02     <-- Where does the 2 come from?
> #   11
> #  (this is an extract from rgrep.py)
>
> import string
>
> l_bufsize = 3
> l_file    = open("demo.log")
> l_file.seek(0, 2)
> l_pos      = l_file.tell()
>
> while l_pos > 0:
>   l_size = min(l_pos, l_bufsize)
>   l_pos  = l_pos - l_size
>   l_file.seek(l_pos)
>   l_buffer = l_file.read(l_size)
>   print "l_buffer = " + l_buffer + "[" + str(len(l_buffer)) +"]"
>
>   l_lines = string.split(l_buffer, "x")
>   print "Split = " + str(l_lines)
>   del l_buffer


Is this just an exercise or a contrived example to demonstrate your problem? 

Doing file I/O in Python this way, seeking and reading three characters at a 
time, is *incredibly* inefficient.  If the file fits in memory, that's the 
best place to process it. 

Here's a more Pythonic way to print the file in reverse order (whole lines 
rather than 3-character blocks):

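infile = open("demo.log", "r")
lines = infile.readlines()    # each line keeps its trailing newline
infile.close()
lines.reverse()               # reverses the list in place
for line in lines:
    print line,               # trailing comma: don't add a second newline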

If you were merely experimenting with seek, I guess this won't help much! :-)  
But the same principle still applies - get the file into memory if you can 
and then manipulate it there.  Always use builtin functions where they do 
what you want - they're implemented in C and are therefore usually much 
faster than the equivalent Python code.  
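
As a rough sketch of that idea applied to your 3-character blocks: read the 
whole file once, then slice the in-memory data from the end.  This is written 
in current Python 3 syntax, and the names reverse_blocks and 
read_blocks_reversed are just made up for illustration.  I'm also assuming you 
want to count the raw bytes, so the file is opened in binary mode; in text 
mode on Windows a CRLF pair is translated to a single newline, so positions 
and lengths no longer match the bytes on disk.

```python
def reverse_blocks(data, block_size=3):
    """Return the blocks of `data`, last block first.

    Every block is `block_size` items long except possibly the final one
    returned (the start of the data), which may be shorter.
    """
    blocks = []
    pos = len(data)
    while pos > 0:
        size = min(pos, block_size)
        pos -= size
        blocks.append(data[pos:pos + size])
    return blocks


def read_blocks_reversed(path, block_size=3):
    """Read the whole file into memory, then split it into reversed blocks."""
    # Assumption: binary mode ("rb"), so CRLF is kept as two separate bytes
    # and every slice length matches the number of bytes on disk.
    with open(path, "rb") as infile:
        data = infile.read()
    return reverse_blocks(data, block_size)
```

For the demo.log contents described above (01, 02, 03, 04 separated by CRLF, 
with no CRLF after the last line), reverse_blocks(b"01\r\n02\r\n03\r\n04") 
gives b"\n04", b"03\r", b"2\r\n", b"\r\n0" and b"01", with no stray 
characters.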

Python's strengths lie elsewhere than in sheer computational muscle: 
readability, manageability and a wealth of readily available libraries are 
just three of them.

Hope that helps

-andyj




