Fast forward-backward (write-read)

David Hutto dwightdhutto at gmail.com
Tue Oct 23 20:01:36 EDT 2012


On Tue, Oct 23, 2012 at 7:35 PM, emile <emile at fenx.com> wrote:
> On 10/23/2012 04:19 PM, David Hutto wrote:
>>
>> Whether this is fast enough, or not, I don't know:
>
>
> well, the OP's original post started with
>   "I am working with some rather large data files (>100GB)..."

Well, is this a dedicated system, and one that they have the budget to upgrade?

Data files usually have some record structure to parse, unless the whole
thing is one huge dict or list, so there has to be an average record size.

So a big O estimate of the cost should begin to firm up from a sample,
without reading the full file.

>
>
>> filename = "data_file.txt"
>> f = open(filename, 'r')
>> forward =  [line.rstrip('\n') for line in f.readlines()]
>
>
> f.readlines() will be big(!) and have overhead... and forward results in
> something again as big.
>
Not if an average line size can be taken first, and then refined as the
actual gigabytes are being iterated through.
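
Something like the following is what I have in mind, as a rough sketch
rather than a tested solution for 100GB files. It iterates over the file
object lazily instead of calling readlines(), and keeps a running average
line size that it uses to estimate the total number of lines. The name
iter_with_estimate is made up for this example, and it assumes
newline-terminated records read as bytes.

```python
import os


def iter_with_estimate(path):
    """Yield (line, estimated_total_lines) while reading the file lazily.

    Hypothetical helper: only one line is held in memory at a time, and
    the estimate is refined as more of the file has been seen.
    """
    total_bytes = os.path.getsize(path)
    bytes_seen = 0
    lines_seen = 0
    with open(path, 'rb') as f:
        for raw in f:  # the file object yields one line per iteration
            bytes_seen += len(raw)
            lines_seen += 1
            average = bytes_seen / float(lines_seen)
            estimate = int(round(total_bytes / average))
            yield raw.rstrip(b'\n'), estimate
```

The estimate is only as good as the lines seen so far, but once the whole
file has been read it equals the real line count, and at no point is the
whole file in memory.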

>
>> backward =  [line.rstrip('\n') for line in reversed(forward)]
>
>
> and defining backward looks to me to require space to build backward and
> hold reversed(forward)
>
> So, let's see, at that point in time (building backward) you've got
> probably somewhere close to 400-500GB in memory.
>
> My guess -- probably not so fast.  Thrashing is sure to be a factor on all
> but machines I'll never have a chance to work on.

But does the OP have access to such a machine? They never stated their
hardware or their upgrade budget.
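
If they don't, the backward pass doesn't have to hold the file in memory
either. Here is a minimal sketch, assuming newline-separated lines and a
file opened in binary mode, that reads fixed-size blocks from the end and
yields lines last to first. The name reversed_lines and the block_size
default are my own choices for illustration, not anything from the OP.

```python
import os


def reversed_lines(path, block_size=64 * 1024):
    """Yield the lines of a file last-to-first as bytes, without newlines.

    Hypothetical helper: memory use is about one block plus one line.
    """
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        if pos == 0:
            return  # empty file, nothing to yield
        tail = b''  # start of a line whose beginning lies in an earlier block
        first_block = True
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            block = f.read(step) + tail
            if first_block:
                # Drop the final newline so no spurious empty line appears.
                if block.endswith(b'\n'):
                    block = block[:-1]
                first_block = False
            lines = block.split(b'\n')
            tail = lines[0]  # may be incomplete, so carry it backward
            for line in reversed(lines[1:]):
                yield line
        yield tail  # the first line of the file
```

Used as "for line in reversed_lines(filename): ...", this trades the big
in-memory lists for extra seeks, so whether it is fast enough still
depends on the disk.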

>
>
>> f.close()
>> print forward, "\n\n", "********************\n\n", backward, "\n"
>
>
>
> It's good to retain context.

Trying to practice good form ;).


-- 
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com


