[Tutor] just what does read() return?

Fri Oct 1 03:54:33 CEST 2010

On 9/30/10, Steven D'Aprano <steve at pearwood.info> wrote:
> On Fri, 1 Oct 2010 08:32:40 am Alex Hall wrote:
>
>> I fully expected to see txt be an array of strings since I figured
>> self.original would have been split on one or more new lines. It
>> turns out, though, that I get this instead:
>> ['l\nvx vy z\nvx vy z']
>
> There's no need to call str() on something that already is a string.
> Admittedly it doesn't do much harm, but it is confusing for the person
> reading, who may be fooled into thinking that perhaps the argument
> wasn't a string in the first place.
Agreed. I was having some (unrelated) trouble and was desperate enough
to start forcing things to the data type I needed, just in case.
>
> The string split method doesn't interpret its argument as a regular
> expression. r'\n+' has no special meaning here. It's just three literal
> characters: backslash, the letter n, and the plus sign. split() tries to
> split on that substring, and since your data doesn't include that
> combination anywhere, returns a list containing a single item:
>
> >>> "abcde".split("ZZZ")
> ['abcde']
Yes, that makes sense.
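
For anyone reading this in the archive, here is a minimal sketch of the difference. The sample string is made up to resemble the data quoted above, and re.split() and str.splitlines() are offered as possible alternatives, not as what the original code used:

```python
import re

# Hypothetical sample data, similar to what read() returned above.
text = "l\nvx vy z\n\nvx vy z"

# str.split() treats its argument as a literal substring, so the
# backslash-n-plus sequence is never found and nothing is split.
literal = text.split(r"\n+")

# re.split() interprets the same argument as a regular expression:
# one or more newline characters.
by_regex = re.split(r"\n+", text)

# str.splitlines() splits on line boundaries and keeps empty lines.
by_lines = text.splitlines()

print(literal)
print(by_regex)
print(by_lines)
```

Which of the two working forms fits best probably depends on whether blank lines in the file mean anything.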
>
>> How is it that txt is not an array of the lines in the file, but
>> instead still holds \n characters? I thought the manual said read()
>> returns a string:
>
> It does return a string. It is a string including the newline
> characters.
>
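
A small sketch of what the different read methods hand back. io.StringIO stands in for a real file here (an assumption made only to keep the example self-contained); an object returned by open() in text mode should behave the same way for these calls:

```python
import io

# Hypothetical file contents; StringIO is a stand-in for open(...).
contents = "l\nvx vy z\nvx vy z\n"

# read() returns the whole file as ONE string, newlines included.
whole = io.StringIO(contents).read()

# readlines() returns a list of lines, each still ending in "\n".
lines = io.StringIO(contents).readlines()

# Splitting the string from read() gives lines without the "\n".
stripped = whole.splitlines()

print(type(whole).__name__, repr(whole))
print(lines)
print(stripped)
```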
>
> [...]
>> I know I can use f.readline(), and I was doing that before and it all
>> worked fine. However, I saw that I was reading the file twice and, in
>> the interest of good practice if I ever have this sort of project
>> with a huge file, I thought I would try to be more efficient and read
>> it once.
>
> You think that keeping a huge file in memory *all the time* is more
> efficient?
Ah, I see what you mean now. I work with the data later, so you are
saying that it would be better to just read the file as necessary,
then, when I need the file's data later, just read it again.
> It's the other way around -- when dealing with *small* files
> you can afford to keep them in memory. When dealing with huge files, you
> need to re-write your program to deal with the file a piece at a time.
> (This is often a good strategy for small files as well, but it is
> essential for huge ones.)
>
> Of course, "small" and "huge" are relative to the technology of the day.
> I remember when 1MB was huge. These days, huge would mean gigabytes.
> Small would be anything under a few tens of megabytes.
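
One way to deal with a file "a piece at a time" is to loop over the file object, which yields one line per iteration instead of loading everything at once. This is only a sketch: the function name count_vertex_lines, the "starts with v" test, and the temporary sample file are all made up for illustration:

```python
import os
import tempfile


def count_vertex_lines(path):
    """Count lines starting with 'v', holding one line in memory at a time."""
    count = 0
    with open(path) as f:
        for line in f:  # the file object yields one line per iteration
            if line.startswith("v"):
                count += 1
    return count


# Write a tiny throwaway file so the example can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("l\nvx vy z\nvx vy z\n")
    sample_path = tmp.name

total = count_vertex_lines(sample_path)
os.remove(sample_path)
print(total)
```

The same loop works unchanged whether the file is a few lines or several gigabytes, since only the current line is held in memory.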
>
>
> --
> Steven D'Aprano


-- 
Have a great day,
Alex (msg sent from GMail website)
mehgcap at gmail.com; http://www.facebook.com/mehgcap