[Tutor] Iterating through a list of strings
Steven D'Aprano
steve at pearwood.info
Mon May 3 13:15:07 CEST 2010
On Mon, 3 May 2010 08:18:41 pm Luke Paireepinart wrote:
> On Mon, May 3, 2010 at 3:50 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> > Luke Paireepinart, 03.05.2010 10:27:
> >> What's this bizarre syntax?
> >
> > Look it up in the docs, it's called "with statement". Its purpose
> > here is to make sure the file is closed after the execution of the
> > statement's body, regardless of any errors that may occur while
> > running the loop.
This holds for file objects, but the with statement is far more powerful
and general than that. It's a generalisation of something like:
    setup()
    try:
        block()
    finally:
        teardown()
(although I admit I haven't really used the with statement enough to be
completely sure about what it does). It's not just for files.
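Here is a minimal sketch of that generalisation, assuming nothing beyond the standard context-manager protocol. The hypothetical `Managed` class puts the setup step in `__enter__` and the teardown step in `__exit__`. The teardown runs whether the block finishes normally or raises.

```python
class Managed:
    """Hypothetical context manager: __enter__ plays setup(), __exit__ plays teardown()."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append("setup " + self.name)
        return self.name  # this value is bound to the "as" target

    def __exit__(self, exc_type, exc, tb):
        self.log.append("teardown " + self.name)
        return False  # a false value means: don't swallow any exception


def demo():
    log = []
    with Managed("A", log) as name:
        log.append("block " + name)
    try:
        with Managed("B", log):
            raise ValueError("boom")  # teardown still runs before this propagates
    except ValueError:
        log.append("caught")
    return log
```

Calling `demo()` should record setup, block and teardown for `A`, then setup and teardown for `B` before the exception is caught. This is the same ordering the setup/try/finally pattern gives.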
> >> I thought they changed for loop iterations so that if you did
> >> for line in open('packages.txt'):
> >> .... etc...
> >>
> >> it would automatically close the file handle after the loop
> >> terminated. Have I been wrong this whole time?
> >
> > Yes. The fact that the file is automatically closed after the loop
> > is an implementation detail of CPython that does not apply in other
> > Python implementations.
>
> So why is it an implementation detail? Why is it not universally
> like that? You never have an explicit reference to the file handle.
> When it gets garbage-collected after the loop it should get rid of
> the file handle.
Because not all implementations use reference counting. Jython (Python
written in Java) uses the Java garbage collector, so a file may not be
closed until that collector gets around to it, which might not be until
the application shuts down. IronPython runs on .Net, so it does whatever
the .Net garbage collector does.
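Even in CPython, if the file is caught up in a reference cycle, Python
can't close it until the cyclic garbage collector runs, and that might
not happen immediately. It might not happen at all if an object in the
cycle has a __del__ method (true of CPython before 3.4, which lifted
that restriction), or if the program has switched the collector off
with gc.disable().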
> I mean, where is the line between 'implementation details' and
> 'language features'? What reason is there to make lists mutable but
> strings immutable? Why aren't strings mutable, or lists immutable?
These are design decisions. The language's creator decided that strings
would be immutable, and once that decision was made, the immutability of
strings became a language feature.
For instance, some years ago Python's list.sort method was "unstable".
Stable sorting is very desirable, because it allows you to do something
like this:
    sort by family name
    sort by first name
    sort by address
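and get intuitively correct results: the list ends up ordered by
address, with ties broken by first name and then by family name.
Unstable sorts don't guarantee this, because a later sort may "mess up"
the order produced by earlier sorts. But because stable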
sorting is harder to get right without a serious slow-down, Python (the
language) didn't *promise* that sorting was stable. Consequently
implementations were free to make their own choices. For many years,
CPython's sort was stable for small lists but unstable for large lists.
Then Tim Peters invented a new sort algorithm which was not only faster
than Python's already blindingly fast sort, but was also stable. So for
at least one release (2.3, if I remember correctly) CPython's sort was
stable but Python the language made no promises that it would remain
stable forever. Perhaps it would turn out that Tim Peters' new sort was
buggy, and had to be replaced? Or that Tim's testing was flawed, and it
was actually slower than the old one?
Finally, I think in Python 2.4, it was decided that the new stable sort
had proven itself enough that Python (the language) could afford to
promise sorting would be stable. Any implementation wanting to call itself
Python would have to provide a stable sort method. Failure to be stable
would count as a bug, and not a quality of implementation issue.
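Because stability is now a language guarantee, the multi-pass idiom described earlier should give the same result on any conforming implementation. Below is a minimal sketch using hypothetical (family name, first name, address) records. Sorting in passes, least significant key first, matches a single sort on a composite key.

```python
from operator import itemgetter

# Hypothetical records: (family name, first name, address)
people = [
    ("Smith", "Ann", "2 High St"),
    ("Jones", "Bob", "1 Low Rd"),
    ("Brown", "Ann", "2 High St"),
    ("Smith", "Bob", "1 Low Rd"),
    ("Adams", "Ann", "1 Low Rd"),
]


def multi_pass(records):
    """Sort in several passes; stability keeps the earlier orderings as tie-breakers."""
    result = sorted(records, key=itemgetter(0))  # family name (least significant)
    result = sorted(result, key=itemgetter(1))   # first name
    result = sorted(result, key=itemgetter(2))   # address (most significant)
    return result


def single_pass(records):
    """Equivalent single sort on a composite (address, first, family) key."""
    return sorted(records, key=itemgetter(2, 1, 0))
```

With an unstable sort, the last pass could reorder records that share an address, and the two functions might disagree.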
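Python the language expects files to be closed automatically at some
point after you have finished with them; that much is a language
feature. (The language reference still recommends closing files
explicitly, with try...finally or a with statement, because garbage
collection itself isn't guaranteed to happen.) Exactly when the
automatic close happens is a quality of implementation issue. The
garbage collectors used by Java and .Net prioritise other things over
closing files, such as performance on machines with multiple CPUs.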
--
Steven D'Aprano