[Python-Dev] xreadlines : readlines :: xrange : range

Guido van Rossum guido@python.org
Wed, 10 Jan 2001 11:38:16 -0500


> [Guido]
> > I'm much more confident about the getc_unlocked() approach than about
> > fgets() -- with the latter we need much more faith in the C library
> > implementers.  (E.g. that fgets() never writes beyond the null bytes
> > it promises, and that it locks/unlocks only once.)  Also, you're
> > relying on blindingly fast memchr() and memset() implementations.

[Tim]
> Yet Andrew's timings say it's a wash on Linux and Solaris (perhaps even a
> bit quicker on Solaris, despite that it's paying an extra layer of function
> call per line, to keep it out of get_line proper).  That tells me the
> assumptions are indeed mild.  The business about not writing beyond the null
> byte is a concern only I would have raised:  the possibility is an
> aggressively paranoid reading of the std (I do *lots* of things with libc
> I'm paranoid about <0.9 wink>).  If even *Microsoft* didn't blow these
> things, it's hard to imagine any other vendor exploding ...
> 
> Still, I'd rather get rid of ms_getline_hack if I could, because the code is
> so much more complicated.

Which is another argument to prefer the getc_unlocked() code when it
works -- it's obviously correct. :-)
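
For readers who have not seen the pattern, here is a minimal sketch of
the getc_unlocked() approach, assuming a POSIX platform.  The helper
name read_line_unlocked and the fixed-size buffer are illustrative
assumptions, not the code in fileobject.c.  The stream lock is taken
once, and every character is then fetched without further locking:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from fileobject.c): read at most one line
   into buf, which holds size bytes, locking the stream exactly once.
   Returns the number of bytes stored, not counting the trailing NUL. */
static size_t
read_line_unlocked(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && buf != NULL && size > 0);

    flockfile(fp);                      /* one lock for the whole call */
    while (n + 1 < size && (c = getc_unlocked(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')                  /* stop after the newline */
            break;
    }
    funlockfile(fp);                    /* one matching unlock */

    buf[n] = '\0';
    return n;
}
```

Nothing here depends on how fgets(), memchr() or memset() behave, which
is why the loop is easy to check by inspection.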

> >> Both methods lack a refinement I would like to see, but can't
> >> achieve in "the Windows way":  ensure that consistency is on no
> >> worse than a per-line basis.  [Example omitted]
> 
> > The only portable way to ensure this that I can see, is to have a
> > separate mutex in the Python file object.  Since this is hardly a
> > common thing to do, I think it's better to let the application manage
> > that lock if they need it.
> 
> Well, it would be easy to fiddle the HAVE_GETC_UNLOCKED method to keep the
> file locked until the line was complete, and I wouldn't be opposed to making
> life saner on platforms that allow it.

Hm...  That would be possible, except for one unfortunate detail:
_PyString_Resize() may call PyErr_BadInternalCall() which touches
thread state.
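
A rough sketch of what "keep the file locked until the line is
complete" could look like outside the interpreter, assuming POSIX.  It
grows the buffer with plain realloc(), which does not touch any
interpreter state, so it sidesteps the _PyString_Resize() concern
rather than solving it.  The name read_whole_line_locked and the
initial capacity are assumptions made for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed, NUL-terminated line (the
   caller frees it), or NULL at EOF with nothing read or when memory
   runs out.  The stream stays locked until the line is complete. */
static char *
read_whole_line_locked(FILE *fp)
{
    size_t cap = 8, n = 0;              /* small cap to exercise growth */
    char *buf, *tmp;
    int c;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    flockfile(fp);
    while ((c = getc_unlocked(fp)) != EOF) {
        if (n + 1 >= cap) {             /* keep room for the final NUL */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                funlockfile(fp);        /* never return with the lock held */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    funlockfile(fp);

    if (n == 0) {                       /* EOF before any data */
        free(buf);
        return NULL;
    }
    buf[n] = '\0';
    return buf;
}
```

Every exit path releases the stream lock before returning, which is the
part that gets awkward once an error handler may need thread state.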

> But there's another problem here:
> part of the reason we release Python threads around the fgets is in case
> some other thread is trying to write the data we're trying to read, yes?

NO, NO, NO!  Mixing reads and writes on the same stream isn't what we
are locking against at all.  (As you've found out, it doesn't even
work.)  We're only trying to protect against concurrent *reads*.

> But since FLOCKFILE is in effect, other threads *trying* to write to the
> stream we're reading will get blocked anyway.  Seems to give us potential
> for deadlocks.

Only if they are holding other locks at the same time.  I haven't done
a thorough survey of fileobject.c, but I've skimmed it, and I believe
it's religious about releasing the Global Interpreter Lock around I/O
calls.  But, of course, 3rd-party C code might not be.

> > (Then why are we bothering with flockfile(), you may ask?
> 
> I wouldn't ask that, no <wink>.
> 
> > Because otherwise, accidental multithreaded reading from the same
> > file could cause core dumps.)
> 
> Ugh ... turns out that on my box I can provoke core dumps anyway, with this
> program.  Blows up under released 2.0 and CVS Pythons (so it's not due to
> anything new):

Yeah.  But this is insane use -- see my comments on SF.  It's only
worth fixing because it could be used to intentionally crash Python --
but there are easier ways...

--Guido van Rossum (home page: http://www.python.org/~guido/)