[Tutor] Load Entire File into memory

Dave Angel davea at davea.name
Mon Nov 4 22:35:01 CET 2013


On 4/11/2013 11:26, Amal Thomas wrote:

> @Dave: thanks.. By the way I am running my codes on a server with about
> 100GB ram but I cant afford my code to use 4-5 times the size of the text
> file. Now I am using  read() / readlines() , these seems to be more
> efficient in memory usage than io.StringIO(f.read()).
>

Sorry I misspoke about read() on a large file.  I was confusing it with
something else.

However, note that in any environment if you have a large buffer, and
you force the system to copy that large buffer, you'll be using
(temporarily at least) twice the space.  And usually the original can't
be freed, for various technical reasons.
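
As a rough illustration of the copying cost, here is a minimal sketch
that uses the standard tracemalloc module to measure how much extra
memory gets allocated when a string already in memory is wrapped in
io.StringIO. The helper name peak_extra_bytes and the one-million
character sample are assumptions made for the example; exact numbers
depend on the Python implementation and version.

```python
import io
import tracemalloc


def peak_extra_bytes(make_copy, text):
    """Return the peak number of bytes allocated while make_copy(text) runs.

    Only allocations made after tracing starts are counted, so the
    original `text` itself is not included in the result.
    """
    tracemalloc.start()
    try:
        copy = make_copy(text)  # keep a reference so the copy stays alive
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    sample = "x" * 1_000_000  # stand-in for the result of f.read()
    print("length of original :", len(sample))
    print("extra for StringIO :", peak_extra_bytes(io.StringIO, sample))
    print("extra for no copy  :", peak_extra_bytes(lambda s: s, sample))
```

On CPython the figure for io.StringIO may come out at several times
the length of the string, which would be consistent with the 4-5x you
mention, but measure it on your own build rather than take that on
trust.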

The real question is how you're going to be addressing the data, and
what constraints are on that data.

Since you think you need it all in memory, you clearly are planning to
access it randomly. Since the data is apparently ASCII characters, and
you're running at least 3.3, you won't pay a per-character size penalty
for holding it as strings: an all-ASCII str takes one byte per
character there.  But there may be alternate ways of encoding each
line which save space and/or make it faster to use.  One big buffer
mirroring the file is likely to be one of the worst choices.
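
To check the string-storage point on your own interpreter, a small
sketch like the following compares the per-character cost of ASCII and
non-ASCII text using sys.getsizeof. It relies on CPython's flexible
string representation (3.3 and later); bytes_per_char is a made-up
helper name, and other implementations may report different figures.

```python
import sys


def bytes_per_char(sample_char, n=1_000_000):
    """Approximate storage cost per character of a str made of n copies
    of sample_char, as reported by sys.getsizeof (CPython-specific)."""
    text = sample_char * n
    return sys.getsizeof(text) / len(text)


if __name__ == "__main__":
    # An ASCII letter, a BMP character (euro sign), and a character
    # outside the BMP (an emoji).
    for label, ch in [("ASCII", "a"), ("BMP", "\u20ac"), ("astral", "\U0001F600")]:
        print(label, bytes_per_char(ch))
```

A single non-ASCII character anywhere in a string widens the storage
for the whole string, so this is worth checking if the file is only
"apparently" ASCII.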

Are the lines variable length?  Do you ever deal randomly with a portion
of a line, or only the whole thing?  If a line consists of multiple
ASCII characters, is their order significant?  How many different
symbols can appear in a single line?  How many different ones in total
(probably excluding the newline)?  What's the average line length?
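
Most of these questions can be answered with one streaming pass over
the file, without loading it all. The following is only a sketch: the
function name profile_lines and the keys of the returned dict are
invented for the example, and it assumes lines end in "\n".

```python
from collections import Counter


def profile_lines(lines):
    """Collect simple statistics from an iterable of text lines."""
    count = 0
    total_len = 0
    min_len = None
    max_len = 0
    max_symbols_in_line = 0
    alphabet = Counter()  # how often each symbol occurs overall

    for raw in lines:
        line = raw.rstrip("\n")  # exclude the newline from all counts
        n = len(line)
        count += 1
        total_len += n
        min_len = n if min_len is None else min(min_len, n)
        max_len = max(max_len, n)
        max_symbols_in_line = max(max_symbols_in_line, len(set(line)))
        alphabet.update(line)

    return {
        "lines": count,
        "average_length": total_len / count if count else 0.0,
        "min_length": min_len or 0,
        "max_length": max_len,
        "variable_length": count > 0 and min_len != max_len,
        "max_symbols_in_line": max_symbols_in_line,
        "distinct_symbols": len(alphabet),
    }


if __name__ == "__main__":
    import sys

    # Usage: python profile_lines.py <path-to-text-file>
    # A file object yields one line at a time, so memory use stays small.
    with open(sys.argv[1]) as f:
        for key, value in profile_lines(f).items():
            print(key, value)
```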

Each of these questions may lead to exploring different optimization
strategies.  But I've done enough speculating.


-- 
DaveA



