Scanning a file

Steven D'Aprano steve at REMOVETHIScyber.com.au
Fri Oct 28 19:41:14 EDT 2005


On Fri, 28 Oct 2005 06:22:11 -0700, pinkfloydhomer at gmail.com wrote:

> Which is quite fast. The only problem is that the file might be huge.

What *you* call huge and what *Python* calls huge may be very different
indeed. What are you calling huge?

> I really have no need for reading the entire file into a string as I am
> doing here. All I want is to count occurrences of this substring. Can I
> somehow count occurrences in a file without reading it into a string
> first?

Magic?

You have to read the file into memory at some stage, otherwise how can you
see what the bytes are? The only question is, can you read it all
into one big string (in which case, your solution is unlikely to be
beaten), or do you have to read the file in chunks and deal with the
boundary cases, where a match straddles two chunks (which is harder)?
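
If it does come to reading in chunks, here is a rough sketch of one way to
handle the boundary cases. The function name count_in_file and the default
chunk_size are made up for illustration, and I am assuming the pattern is a
non-empty bytes object and that you want the same non-overlapping count that
bytes.count() would give on the whole file.

```python
def count_in_file(path, pattern, chunk_size=1 << 20):
    """Count non-overlapping occurrences of `pattern` (bytes) in a file,
    reading at most `chunk_size` bytes at a time.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if not pattern:
        raise ValueError("pattern must be a non-empty bytes object")

    count = 0
    tail = b""  # unconsumed bytes carried over from the previous chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk

            # Scan left to right, skipping past each match so that
            # matches never overlap (same rule as bytes.count()).
            pos = 0
            while True:
                i = buf.find(pattern, pos)
                if i == -1:
                    break
                count += 1
                pos = i + len(pattern)

            # Carry over only the bytes that could still begin a match
            # straddling the boundary: at most len(pattern) - 1 bytes,
            # and nothing that an earlier match has already consumed.
            keep_from = max(pos, len(buf) - (len(pattern) - 1))
            tail = buf[keep_from:]
    return count
```

You would call it as count_in_file("XYZ", b"\x00\x00\x01\x00"), where "XYZ"
stands in for whatever your file is called. Memory use stays at roughly
chunk_size plus the pattern length, at the cost of being slower and fiddlier
than a single read() followed by count().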

Here is another thought. What are you going to do with the count when you
are done? That sounds to me like a pretty pointless result: "Hi user, the
file XYZ has 27 occurrences of bit pattern \x00\x00\x01\x00. Would you like
to do another file?"

If you are planning to use this count to do something, perhaps there is a
more efficient way to combine the two steps into one -- especially
valuable if your files really are huge.


-- 
Steven.



