[Tutor] Iterating through a list of strings

Steven D'Aprano steve at pearwood.info
Mon May 3 13:15:07 CEST 2010


On Mon, 3 May 2010 08:18:41 pm Luke Paireepinart wrote:
> On Mon, May 3, 2010 at 3:50 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> > Luke Paireepinart, 03.05.2010 10:27:
> >> What's this bizarre syntax?
> >
> > Look it up in the docs, it's called "with statement". Its purpose
> > here is to make sure the file is closed after the execution of the
> > statement's body, regardless of any errors that may occur while
> > running the loop.

This holds for file objects, but the with statement is far more powerful 
and general than that. It's a generalisation of something like:

setup()
try:
    block()
finally:
    teardown()

(although I admit I haven't really used the with statement enough to be 
completely sure about what it does). It's not just for files.
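
To make the setup/try/finally pattern above concrete, here is a minimal sketch of an object that works with the with statement. The class name Managed and the helper run are made up for illustration; only the __enter__ and __exit__ methods are part of the actual protocol.

```python
class Managed:
    """Hypothetical resource that records when setup and teardown happen."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        # Plays the role of setup() in the pattern above.
        self.events.append("setup")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Plays the role of teardown(); runs whether or not the block raised.
        self.events.append("teardown")
        return False  # returning False lets any exception propagate


def run(fail):
    """Run a with block, optionally raising inside it, and return the log."""
    resource = Managed()
    try:
        with resource:
            resource.events.append("block")
            if fail:
                raise ValueError("simulated failure")
    except ValueError:
        resource.events.append("caught")
    return resource.events
```

Calling run(True) should show "teardown" recorded before "caught", which is the point: the cleanup happens on the way out of the block, even when an error occurs.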



> >> I thought they changed for loop iterations so that if you did
> >> for line in open('packages.txt'):
> >>     .... etc...
> >>
> >> it would automatically close the file handle after the loop
> >> terminated. Have I been wrong this whole time?
> >
> > Yes. The fact that the file is automatically closed after the loop
> > is an implementation detail of CPython that does not apply in other
> > Python implementations.
>
> So why is it an implementation detail?  Why is it not universally
> like that? You never have an explicit reference to the file handle. 
> When it gets garbage-collected after the loop it should get rid of
> the file handle.

Because not all implementations use reference counting. Jython (Python 
written in Java) uses the Java garbage collector, and so it doesn't 
close files until the application shuts down. IronPython uses .Net, and 
so it does whatever .Net does.
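
If you want the file closed at a known point on every implementation, the with statement does that. The following is only a sketch: the function name read_lines and the temporary file are assumptions made for the example, standing in for packages.txt.

```python
import os
import tempfile


def read_lines(path):
    """Read all lines from path; the file is closed when the with block ends."""
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f]
    # f is still a name here, but the underlying file is already closed.
    return lines, f.closed


# Create a throwaway file to read (hypothetical stand-in for packages.txt).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("alpha\nbeta\n")

lines, was_closed = read_lines(path)
os.remove(path)
```

Nothing here depends on reference counting or on when the garbage collector happens to run.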

Even in CPython, if you have a reference loop, Python can't close the 
file until the garbage collector runs, and it might not run 
immediately. It might not run at all, if the object has a __del__ 
method, or if the caller has turned it off.
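
Here is a small sketch of the reference loop case. It is specific to CPython's behaviour, and the names Node and cycle_demo are invented for the example. An ordinary object stands in for the file, and a weak reference is used to check whether the object still exists.

```python
import gc
import weakref


class Node:
    """Hypothetical object that will hold a reference to itself."""


def cycle_demo():
    """Return (alive before collection, alive after collection)."""
    was_enabled = gc.isenabled()
    gc.disable()  # simulate a caller who has turned the collector off
    try:
        node = Node()
        node.me = node              # reference loop: node refers to itself
        probe = weakref.ref(node)   # lets us observe the object without keeping it alive
        del node                    # the reference count never reaches zero
        alive_before = probe() is not None
        gc.collect()                # only an explicit collection breaks the loop
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after
```

On CPython I would expect the object to survive the del and to disappear only after gc.collect(). Other implementations are free to behave differently.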


> I mean, where is the line between 'implementation details' and
> 'language features'?  What reason is there to make lists mutable but
> strings immutable?  Why aren't strings mutable, or lists immutable?

These are design decisions. The language creator made that decision, and 
having made that decision, the immutability of strings is a language 
feature.

For instance, some years ago Python's list.sort method was "unstable". 
Stable sorting is very desirable, because it allows you to do something 
like this:

sort by family name
sort by first name
sort by address

and have intuitively correct results. Unstable sorts don't: subsequent 
sorts may "mess up" the order from earlier sorts. But because stable 
sorting is harder to get right without a serious slow-down, Python (the 
language) didn't *promise* that sorting was stable. Consequently 
implementations were free to make their own choices. For many years, 
CPython's sort was stable for small lists but unstable for large lists.

Then Tim Peters invented a new sort algorithm which was not only faster 
than Python's already blindingly fast sort, but was also stable. So for 
at least one release (2.3, if I remember correctly) CPython's sort was 
stable but Python the language made no promises that it would remain 
stable forever. Perhaps it would turn out that Tim Peters' new sort was 
buggy, and had to be replaced? Or that Tim's testing was flawed, and it 
was actually slower than the old one?

Finally, I think in Python 2.4, it was decided that the new stable sort 
had proven itself enough that Python (the language) could afford to 
promise sorting would be stable. Any language wanting to call itself 
Python would have to provide a stable sort method. Failure to be stable 
would count as a bug, and not a quality of implementation issue.
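
As an illustration of what the stability promise buys you, here is a sketch of sorting in several passes. The records are made up for the example. Note that the last sort applied ends up as the primary ordering, with earlier sorts acting as tie-breakers.

```python
from operator import itemgetter

# Hypothetical records: (family name, first name).
people = [
    ("Smith", "Zoe"),
    ("Jones", "Adam"),
    ("Smith", "Adam"),
    ("Jones", "Zoe"),
]

# First pass: sort by the less significant key (first name).
by_first = sorted(people, key=itemgetter(1))

# Second pass: sort by the more significant key (family name).
# Because the sort is stable, records with the same family name
# keep the first-name order established by the first pass.
by_family = sorted(by_first, key=itemgetter(0))
```

With an unstable sort, the second pass would be allowed to shuffle the two Smiths or the two Joneses relative to each other, undoing the first pass.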

Python the language promises that files will be automatically closed at 
some point after you have finished with them. That is a language 
feature. Exactly when this happens is a quality of implementation 
issue. The garbage collectors used by Java and .Net prioritise other 
things over closing files, e.g. performance with multiple CPUs.



-- 
Steven D'Aprano

