[Tutor] Problems understanding semantics of readlines()

Rogério Brito linuxconsul@yahoo.com.br
Wed, 10 Apr 2002 06:09:36 -0300


	First of all, Paul, thank you very much for your answer.

On Apr 09 2002, Paul Sidorsky wrote:
> No, the parameter is called sizeHINT (my emphasis) because it's just a
> suggestion.  Results should be expected to vary by platform and
> implementation, as you already discovered.  From the library reference:

	Yes, I read the library reference before posting here. I tried
	to do my homework as well as I could (as I always try). :-)

> > If given an optional parameter sizehint, it reads that many bytes 
> > from the file and enough more to complete a line, and returns the 
> > lines from that. 
> 
> Thus you're guaranteed to get the entire first line no matter what
> size you ask for.

	No problems here.

> As for the second line, I suspect (though this is a bit of a guess)
> that there was enough room left over in the internal buffer to grab
> the entire second line because the first line was 128K+1 bytes and
> the buffer was probably a multiple of 512 or 1K bytes.  So you got
> the second line for free, whether you wanted it or not.

	Well, I'd prefer it if the semantics of the function were more
	clearly defined. As I understand the documentation, the
	function is supposed to work the way Jython does it, not the
	way CPython does.
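
	To make that expectation concrete, here is a minimal sketch of
	the semantics I read into the documentation: consume whole
	lines until at least sizehint characters have been read, then
	stop. The helper name readlines_strict is hypothetical, and
	io.StringIO is only a stand-in for the file on disk (it assumes
	a Python that ships the io module); neither comes from the
	library reference.

```python
import io


def readlines_strict(f, sizehint):
    """Hypothetical helper: return whole lines, stopping as soon as
    at least `sizehint` characters have been read."""
    lines = []
    total = 0
    while total < sizehint:
        line = f.readline()
        if not line:            # empty string means end of file
            break
        lines.append(line)
        total += len(line)
    return lines


# Same data as in my test: one terminated line of a's, then b's without "\n".
f = io.StringIO("a" * 128 * 1024 + "\n" + "b" * 1024 * 1024)
first = readlines_strict(f, 10)
print(len(first), len(first[0]))
```

	Under this reading, a sizehint of 10 yields only the line of
	a's, because that line alone already exceeds the hint.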

	Anyway, I tried the following program and I still see the b's
	appearing in my list:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#!/usr/bin/python

f = open("/tmp/file.txt", "w+")
f.write("a"*128*1024+"\n"+"b"*1024*1024) # notice the lack of \n after b's
f.seek(0)
print f.readlines(10)
f.close()
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

> I think readlines(sizehint) has been semi-deprecated by xreadlines(). 

	I don't know how relevant that is here, since readlines() is
	still available.

> (Perhaps the tutorial should be updated?)

	I didn't see any mention of it being buggy or deprecated,
	though. Besides, according to the documentation of
	xreadlines(), readlines() shouldn't be deprecated so soon.

> You can use f.xreadlines() to iterate over a file on a guaranteed
> per-line basis with the same efficient memory usage.

	My idea wasn't exactly to iterate over a file. :-) I am just a
	newbie (in Python, but not in programming) trying out the
	examples from the tutorial. :-)

> Actually, according to the docs, xreadlines() uses
> readlines(sizehint) to do its thing, but it's cleaner and more
> predictable.

	I can see why it would be "cleaner" (relatively), but I don't
	see why it would be more predictable. Could you elaborate on
	the predictability? I'd like to steer clear of the
	unpredictable parts of the language, with the exception of
	(pseudo)random number generators. :-)
	
	Anyway, I tried the following code and the behavior still
	included the b's:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#!/usr/bin/python

import xreadlines

f = open("/tmp/file.txt", "w+")
f.write("a"*128*1024+"\n"+"b"*1024*1024) # notice the lack of \n after b's
f.seek(0)
lines = []
for l in xreadlines.xreadlines(f): lines.append(l)
print lines
f.close()
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
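
	For reference, the chunked approach that the docs attribute to
	xreadlines() can be sketched as a generator. This is only my
	guess at the idea, not the actual implementation: the name
	lines_via_hint and the default hint of 8192 are hypothetical,
	and io.StringIO again stands in for the real file.

```python
import io


def lines_via_hint(f, sizehint=8192):
    """Hypothetical generator: fetch lines in batches with
    readlines(sizehint) and hand them out one at a time."""
    while True:
        chunk = f.readlines(sizehint)
        if not chunk:           # empty list means end of file
            break
        for line in chunk:
            yield line


f = io.StringIO("a" * 128 * 1024 + "\n" + "b" * 1024 * 1024)
lengths = [len(line) for line in lines_via_hint(f)]
print(lengths)
```

	However many lines each batch happens to contain, the caller
	sees exactly one line per iteration. Iterating to the end
	visits every line in the file, though, so the unterminated
	line of b's is still part of the result.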

	Any help here?

> One last thing to note is that the second line won't have a \n at
> the end because one was not written to the end of the file.
> readline() & readlines() won't add one if one doesn't exist.

	Oh, yes. It was intentional that the last line wasn't
	terminated because I read in the tutorial that "Only complete
	lines will be returned". :-(

	I was testing that. I was a bit disappointed by the divergence
	between the documentation and the observed behaviour. :-(
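
	The same behaviour shows up in miniature. The sketch below
	uses io.StringIO as a stand-in for the file (my assumption, to
	keep the example small and self-contained) and checks what
	happens to a last line that has no "\n":

```python
import io

f = io.StringIO("first\nsecond")    # no "\n" after the last line
lines = f.readlines()
tail = f.readline()                  # further reads at end of file

print(lines)
print(repr(tail))
```

	The unterminated line comes back as it was written, without a
	"\n" added, and reading past the end gives an empty string.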

	Am I using the functions in illegal ways? If not, should the
	documentation be changed or should it be reported as a bug?
	Any help here is more than welcome.

> Hope that helps.  BTW I got the same results you did using WinME &
> Python 2.1.

	Well, I'd expect that, since (I'm guessing) the implementation
	of both is based on the CPython sources, right? (I'm not sure
	what I'm talking about in this paragraph).


	Thank you very much for your kind answer, Roger...

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
  Rogério Brito - rbrito@iname.com - http://www.ime.usp.br/~rbrito/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=