[Numpy-discussion] loadtxt broken if file does not end in newline

Christopher Barker Chris.Barker at noaa.gov
Wed Feb 27 12:41:57 EST 2008


David Huard wrote:
> Would everyone be satisfied with a solution using regular expressions ?

Maybe it's because regular expressions make me itch, but I think it's 
overkill for this.

The issue here is a result of what I consider a wart in Python's string 
methods -- string.find() returns a valid index (-1) when it fails to 
find anything. The usual way to work with this is to test for it:

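# SampleLines and comments are defined in the attached test script
print("test for comment not found:")
for line in SampleLines:
    i = line.find(comments)
    if i == -1:
        # no comment on this line: keep all of it
        line = line.strip()
    else:
        # drop the comment character and everything after it
        line = line[:i].strip()
    print(line)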

which does seem like a lot of extra code.
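
To make the wart concrete, here is a minimal sketch of what happens when the -1 is used directly as a slice bound. The sample values are illustrative assumptions, not the data from the attached script:

```python
# Illustrative values only; the thread's real test data is in the attachment.
comments = "#"
line = "1 2 3 4 5"            # no comment and no trailing newline

i = line.find(comments)       # find() reports "not found" as -1
truncated = line[:i].strip()  # -1 is a valid slice bound, so the last character is lost

with_newline = "1 2 3 4 5\n"
ok = with_newline[:with_newline.find(comments)].strip()  # only the newline is lost

print(i)          # -1
print(truncated)  # 1 2 3 4
print(ok)         # 1 2 3 4 5
```

The second case is why the problem usually goes unnoticed: the character that gets cut off is normally the trailing newline.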

In this case, that wasn't done, as most of the time there is a newline 
at the end that can be thrown away anyway, so the -1 index is OK. So 
that inspired the following solution -- just add an extra space every time:

print("simply pad the line with a space:")
for line in SampleLines:
    # the padding is what gets sliced off when find() returns -1
    line += " "
    line = line[:line.find(comments)].strip()
    print(line)

an extra string creation, but simple.
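
For comparison, a third string-method option that is not part of the test code discussed in this thread: str.split() with a maximum of one split returns the whole line as its first element when the separator is absent, so there is no -1 case to handle. The helper name and sample lines below are hypothetical:

```python
def strip_comment(line, comments="#"):
    """Hypothetical helper: return the data part of a line, stripped."""
    # With maxsplit=1, element 0 is everything before the first comment
    # string, or the entire line if no comment string is present.
    return line.split(comments, 1)[0].strip()


# Assumed sample lines, chosen to mirror the cases discussed in this thread.
samples = ["1 2 3 4 5 # a comment\n", "1 2 3 4 5", "1 2 3 4 5#", "# 1 2 3 4 5"]
results = [strip_comment(s) for s in samples]
print(results)  # ['1 2 3 4 5', '1 2 3 4 5', '1 2 3 4 5', '']
```

Like the padding trick, this creates an intermediate object per line (a short list here), so it trades a little allocation for simplicity.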

> pattern = re.compile(r"""
>     ^\s* # leading white space
>     (.*) # Data
>     %s?  # Zero or one comment character
>     (.*) # Comments
>     \s*$ # Trailing white space
>     """%comments, re.VERBOSE)

This pattern fails if the last character of the line is a comment 
character, or if the line is a comment-only line, though I'm sure that 
could be fixed. I still prefer the Python string methods approaches, though.
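
For what it's worth, here is one way the regular expression might be repaired. This is only a sketch, not David's pattern: it makes the data group non-greedy, treats the comment as an optional trailing group, and passes the comment string through re.escape(). It is also compiled without re.VERBOSE, since in verbose mode an unescaped "#" starts a pattern comment. The function name and sample lines are assumptions:

```python
import re


def make_pattern(comments="#"):
    """Hypothetical helper: build a pattern whose group 1 is the data."""
    # (.*?) is non-greedy, so it stops at the first point where the rest of
    # the line is optional whitespace plus an optional comment.
    return re.compile(r"^\s*(.*?)\s*(?:%s.*)?$" % re.escape(comments))


pattern = make_pattern("#")

# Assumed sample lines, including the two cases that break the quoted pattern.
samples = ["1 2 3 4 5 # a comment\n", "1 2 3 4 5", "1 2 3 4 5#", "# 1 2 3 4 5"]
data = [pattern.match(s).group(1) for s in samples]
print(data)  # ['1 2 3 4 5', '1 2 3 4 5', '1 2 3 4 5', '']
```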

I've enclosed a little test code, that gives these results:

old way -- this fails with no comment or newline
1 2 3 4 5
1 2 3 4
1 2 3 4 5

with regular expression:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5#
# 1 2 3 4 5
simply pad the line with a space:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

test for comment not found:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5

My suggestions work on all my test cases. We really should put these, 
and others, into a real unit test when this fix is added.
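
As a starting point for such a unit test, something along these lines might do. The sample lines, function names, and test class are all assumptions made for illustration; they are not the contents of the attached script or of any numpy test suite:

```python
import unittest

COMMENTS = "#"

# Hypothetical (input, expected) pairs covering the cases in this thread.
SAMPLE_LINES = [
    ("1 2 3 4 5 # a comment\n", "1 2 3 4 5"),  # comment and newline
    ("1 2 3 4 5", "1 2 3 4 5"),                # no comment, no newline
    ("1 2 3 4 5#", "1 2 3 4 5"),               # comment character last
    ("# 1 2 3 4 5", ""),                       # comment-only line
]


def strip_by_test(line, comments=COMMENTS):
    """Explicitly test for the -1 returned by find()."""
    i = line.find(comments)
    if i == -1:
        return line.strip()
    return line[:i].strip()


def strip_by_padding(line, comments=COMMENTS):
    """Pad with a space so that a -1 index only removes the padding."""
    line += " "
    return line[:line.find(comments)].strip()


class StripCommentTest(unittest.TestCase):
    def test_explicit_test(self):
        for raw, expected in SAMPLE_LINES:
            self.assertEqual(strip_by_test(raw), expected)

    def test_padding(self):
        for raw, expected in SAMPLE_LINES:
            self.assertEqual(strip_by_padding(raw), expected)


# Run the suite programmatically so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(StripCommentTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A real test would go through loadtxt itself, with files that do and do not end in a newline, rather than through stand-in helper functions.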

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test-loadtxt.py
Type: text/x-python
Size: 1365 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080227/ee8445b9/attachment.py>

