Efficient way to break up a list into two pieces

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun Feb 21 18:21:27 EST 2010


On Sun, 21 Feb 2010 11:07:24 -0800, marwie wrote:

>> x = SomeReallyBigListOrString
>> for item in x[1:]:
>>     process(item)
>>
>> has to copy the entire list or string (less the first item). But
>> honestly, I've never found a situation where it actually mattered.
> 
> Good grief, it copies that, too? I assumed that the copying is at least
> delayed until the assignment and that x[1:] is represented by some
> wrapper that just shuffles the indices around (much like the .indices
> method that MRAB suggested).

Such complexity doesn't happen for free. It costs developer time, more 
complexity means more bugs, and more runtime overhead, especially for 
small lists. 99.9% of programs operate on small amounts of data, and 
"small" gets bigger every year.

(I was reading one of my old Uni text books the other day, and one of the 
projects was an application to index all the words in a text file. The 
author decided to use disk storage rather than in-memory data structures 
because he was hoping to read files up to 60,000 words, and considered it 
unlikely that any PCs would have enough memory to store 60,000 words!)

Unless you actually profile the code, you're as likely to be *slowing 
Python down* by such extra sophistication as you are to speed it up. 
Copying blocks of pointers is really fast on modern CPUs.

As a general rule, Python aims for simplicity of implementation rather 
than clever, but fragile, tricks.


> Maybe copying chunks of data really isn't that much of an issue as it
> used to be (and maybe I'm getting old). The application I have in mind
> has to do with symbolic computation, and expressions would be
> represented by python lists. It's too early to do any profiling (in
> fact, it's at the "deciding if python is the right language to implement
> it" stage), but it will have to handle expressions with many terms (i.e.
> long lists), and in the end taking these lists apart and putting them
> back together in different order is the only thing the code will do.


Are you likely to have expressions with a hundred or so MILLION terms?

If so, then you should start thinking about clever tricks to minimise 
copying. Otherwise, you're engaged in premature optimization, which is 
the root of all computer science evil :)



-- 
Steven



More information about the Python-list mailing list