The Cost of Dynamism (was Re: Python 2.x or 3.x, which is faster?)

Steven D'Aprano steve at pearwood.info
Tue Mar 22 08:03:45 EDT 2016


On Tue, 22 Mar 2016 10:05 pm, BartC wrote:

> On 22/03/2016 01:01, Steven D'Aprano wrote:
>> On Tue, 22 Mar 2016 06:43 am, BartC wrote:
>>
>>> This code was adapted from a program that used:
>>>
>>>      readstrfile(filename)
>>>
>>> which either returned the contents of the file as a string, or 0.
>>
>> What an interesting function. And I don't mean that in a good way.
>>
>> So if it returns 0, how do you know what the problem is? Mistyped file
>> name? Permission denied? File doesn't actually exist? Disk corruption and
>> you can't open the file? Some weird OS problem where you can't *close*
>> the file? (That can actually happen, although it's never happened to me.)
>> How do you debug any problems, given only "0" as a result?
>>
>> What happens if you read (let's say) a 20GB Blu-ray disk image?
> 
> I think you're making far too much of a throwaway function to grab a
> file off disk and into memory.
> 
> But out of interest, how would /you/ write a function that takes a
> file-spec and turns it into an in-memory string? And what would its use
> look like?

I already told you. For a quick and dirty script where I didn't care much
about reliability, I would use:

the_text = open(filename).read()

and leave it at that.

There's a hierarchy of less- to more-reliable. Next would be:

with open(filename) as f:
    the_text = f.read()

which guarantees that the file is closed promptly. Better still would be to avoid
dealing with the entire file in one (potentially enormous) chunk, and
process it line by line:

with open(filename) as f:
    for line in f:
        process(line)  # process() stands in for whatever you do per line
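
As a rough sketch of what such a function and its use could look like,
assuming Python 3 and UTF-8 text: the names read_text and load_or_report
below are hypothetical, chosen only for illustration. The point is that
failures surface as distinct exceptions rather than a bare 0.

```python
def read_text(filename, encoding="utf-8"):
    """Return the whole file as a string; errors propagate as exceptions."""
    # The encoding default is an assumption, not something from the thread.
    with open(filename, encoding=encoding) as f:
        return f.read()


def load_or_report(filename):
    """Hypothetical caller: say what went wrong instead of returning 0."""
    try:
        return read_text(filename)
    except FileNotFoundError:
        print("no such file:", filename)
    except PermissionError:
        print("permission denied:", filename)
    except OSError as err:
        # Any other OS-level failure (I/O error, path is a directory, ...).
        print("could not read", filename, "-", err)
    return None
```

A caller that cannot do anything useful about the error can simply call
read_text(filename) and let the traceback say what happened.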


If for some reason I *had* to process it as one big chunk of text, where I
knew that there was a chance that it could be bigger than what I could
comfortably hold in memory in one go, I would research mmap. But I don't
really know anything about how that works. I've been lucky enough to never
need to care.
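
For readers curious what the mmap route might look like, here is a
minimal sketch using the standard library's mmap module. It is an
illustration under assumptions, not a tested recipe for huge files: the
helper name find_in_file is hypothetical, and the search works on bytes,
not decoded text.

```python
import mmap


def find_in_file(filename, needle):
    """Return the byte offset of needle (a bytes object), or -1."""
    with open(filename, "rb") as f:
        # Length 0 maps the whole file. Mapping an empty file raises
        # ValueError, so callers may want to check the size first.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The OS pages data in on demand; the file is not read
            # into a Python string up front.
            return mm.find(needle)
```

Usage would be something like find_in_file("big.log", b"ERROR").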

Dealing with out-of-memory errors on modern OSes is one of the hardest
things to get right. In some ways, we're lucky, because the OS will try
really hard to give the illusion that you have an infinite amount of
memory. But the illusion is never perfect, and the abstraction of "virtual
memory plus real memory = infinite memory" can break down. I once foolishly
tried to create an *enormous* list, something like [0]*10**100, and my OS
very kindly started swapping applications in and out of memory trying to
free up 40 000 000 billion billion billion billion billion billion billion
billion billion petabytes of memory (estimated).

Not only did Python lock up, but so did the OS. I decided to leave it
overnight to see if it would recover, but 16 hours later it was still
locked up and frantically trying to swap memory. I'm not sure why the
OOM-Killer didn't trigger. I ended up having to do a hard power-down to
recover. So virtual memory is a mixed blessing.


>> Pythonic code probably uses a lot of iterables:
>>
>> for value in something:
>>      ...
> 
>> in preference to Pascal code written in Python:
>>
>> for index in range(len(something)):
>>      value = something[index]
> 
> (Suppose you need both the value and its index in the loop? Then the
> one-line for above won't work. For example, 'something' is [10,20,30] 
> and you want to print:
> 
>   0: 10
>   1: 20
>   2: 30 )



for index, n in enumerate([10, 20, 30]):
    print(index, ":", n)


>> or worse:
>>
>> index = 0
>> while index < len(something):
>>      value = something[index]
>>      ...
>>      index += 1
> 
>> (I don't know where that while-loop idiom comes from. C? Assembly?
>> Penitent monks living in hair shirts in the desert and flogging
>> themselves with chains every single night to mortify the accursed flesh?
>> But I'm seeing it a lot in code written by beginners. I presume somebody,
>> or some book, is teaching it to them. "Learn Python The Hard Way"
>> perhaps?)
> 
> Are you suggesting 'while' is not needed? 

Of course not. Use while loops when you need a while loop.

But *writing a for-loop using while* is an abuse of while.
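
To make the distinction concrete, here is a small illustrative sketch
(both function names are made up for the example). The first loop walks
a known collection, so it is a for-loop; the second has no way of
knowing in advance how many iterations it needs, which is the kind of
case while is for.

```python
def total(values):
    """Iterating over a collection: a for-loop."""
    result = 0
    for value in values:
        result += value
    return result


def collatz_steps(n):
    """Looping until a condition holds: a while-loop.

    Counts the steps for a positive integer n to reach 1 when even
    numbers are halved and odd numbers become 3*n + 1.
    """
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
```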



-- 
Steven



