[ python-Bugs-1744752 ] Newline skipped in "for line in file"

SourceForge.net noreply at sourceforge.net
Wed Jul 4 07:16:00 CEST 2007


Bugs item #1744752, was opened at 2007-06-28 04:23
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1744752&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Rune Devik (runedevik)
Assigned to: Nobody/Anonymous (nobody)
Summary: Newline skipped in "for line in file"

Initial Comment:
Creating a new ticket for the bug described here, since the original ticket was closed (and I was not able to reopen it): http://sourceforge.net/tracker/index.php?func=detail&aid=1636950&group_id=5470&atid=105470

The problem is that when you open a huge file on Windows in "r" mode, two lines are sometimes merged into one. As I said in the ticket above (which was probably ignored, since I updated a closed ticket):

Hi

I have the same problem with a huge file (8 GB) containing long lines. Sometimes two lines are merged into one, and when I rerun the test script that reads the file, it is always the same lines that are merged. The merging also seems to happen more frequently towards the end of the file. I tried to reproduce the problem with a smaller data set (the 10 lines before the two lines that get merged, the two merged lines themselves, and the 10 lines after them), but I was not able to reproduce it on this smaller data set. However, if you open the huge file in "rb" mode instead of "r" mode, everything works as it should and no lines are merged at all! If I copy the file over to Linux and rerun the test script, no lines are merged (regardless of whether the mode is "r" or "rb"), so this is Windows-specific and might have something to do with the adding of \r\n if only \n is found when you open the file in "r" mode, maybe? I have reproduced it on both Python 2.3.5 and 2.5c1, on both Windows XP and Windows 2003.

More stats on the input file in both "r" mode and "rb" mode below:

Input file size: 8 695 828 KB

fp = open(file, "r"):
  - total number of lines read:  668909
  - length of the longest line:  13179792
  - length of the shortest line: 89
  - 56 lines contain the content of two lines
  - Always just two lines that are merged into one!
  - Always the same lines that are merged when rerunning the test on the same file.

fp = open(file, "rb"):
  - total number of lines read:  668965
  - length of the longest line:  13179793
  - length of the shortest line: 90
  - no lines merged
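
The test script itself was not posted. As a rough sketch only, statistics like the ones above could be gathered as follows. The code uses Python 3 syntax (the report concerns Python 2.3.5 and 2.5c1), and the names line_stats and make_sample are hypothetical. Python 3 has a different file I/O implementation, so this is not expected to reproduce the merged lines; it only shows the measurement. The tiny sample also shows why every line is one character shorter in "r" mode: a \r\n line ending is read back as a single \n.

```python
import os
import tempfile


def line_stats(path, mode):
    """Return (line count, longest line, shortest line) for one open() mode."""
    count, longest, shortest = 0, 0, None
    with open(path, mode) as fp:
        for line in fp:
            count += 1
            length = len(line)
            longest = max(longest, length)
            shortest = length if shortest is None else min(shortest, length)
    return count, longest, shortest


def make_sample(lines):
    """Write the given byte strings as CRLF-terminated lines to a temp file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as fp:
        for line in lines:
            fp.write(line + b"\r\n")
    return path


if __name__ == "__main__":
    # Hypothetical two-line sample; the real input file was about 8 GB.
    sample = make_sample([b"abc", b"defgh"])
    try:
        for mode in ("r", "rb"):
            print(mode, line_stats(sample, mode))
    finally:
        os.remove(sample)
```

If the line count differs between the two modes for the same file, as it does in the numbers above (668909 against 668965), some newlines were lost in "r" mode.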

Regards,
Rune Devik

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2007-07-03 22:16

Message:
Logged In: YES 
user_id=33168
Originator: NO

Without a reproducible test case, there's really nothing we can do.  You
will need to debug this on your own.  Try setting a breakpoint in the
debugger in the file object, probably in get_line().  If you can make a
self-contained test case, then we can help.
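
One possible starting point for such a test case, offered only as a sketch under assumptions: generate a file whose line lengths are known in advance, read it back in "r" mode, and flag any line that is longer than it should be. The helper names write_lines and find_suspect_lines are hypothetical, the code is written in Python 3 syntax, and the sizes are small placeholders; reproducing the report would presumably need a far larger file on Windows with the affected Python versions.

```python
import os
import tempfile


def write_lines(path, num_lines, line_len):
    """Write num_lines CRLF-terminated lines of line_len bytes each."""
    with open(path, "wb") as fp:
        for _ in range(num_lines):
            fp.write(b"x" * line_len + b"\r\n")


def find_suspect_lines(path, expected_len):
    """Return 1-based numbers of text-mode lines that are too long.

    In "r" mode each line should be expected_len characters plus one
    newline character; anything longer suggests a skipped newline.
    """
    suspects = []
    with open(path, "r") as fp:
        for number, line in enumerate(fp, 1):
            if len(line) > expected_len + 1:
                suspects.append(number)
    return suspects


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Placeholder sizes, far smaller than the file in the report.
        write_lines(path, 1000, 5000)
        print("suspect lines:", find_suspect_lines(path, 5000))
    finally:
        os.remove(path)
```

Using a fixed line length keeps the check simple: a merged pair shows up as a single line of roughly twice the expected length, and its line number says where to look in the file.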

----------------------------------------------------------------------

