[Numpy-discussion] loadtxt broken if file does not end in newline
Christopher Barker
Chris.Barker at noaa.gov
Wed Feb 27 12:41:57 EST 2008
David Huard wrote:
> Would everyone be satisfied with a solution using regular expressions ?
Maybe it's because regular expressions make me itch, but I think it's
overkill for this.
The issue here is a result of what I consider a wart in Python's string
methods -- string.find() returns a valid index (-1) when it fails to
find anything. The usual way to work with this is to test for it:
print "test for comment not found:"
for line in SampleLines:
    i = line.find(comments)
    if i == -1:
        line = line.strip()
    else:
        line = line[:i].strip()
    print line
which does seem like a lot of extra code.
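To make the wart concrete, here is a minimal sketch comparing the untested slice with the explicit test. The function names and sample lines are made up for illustration; they are not taken from loadtxt or from the attached test script.

```python
def strip_comment_naive(line, comments="#"):
    # find() returns -1 when no comment marker is present, so the slice
    # becomes line[:-1] and silently drops the last character.
    return line[:line.find(comments)].strip()


def strip_comment_checked(line, comments="#"):
    # Test explicitly for the "not found" value before slicing.
    i = line.find(comments)
    if i == -1:
        return line.strip()
    return line[:i].strip()


sample_lines = [
    "1 2 3 4 5 # comment\n",  # comment and newline
    "1 2 3 4 5\n",            # newline, no comment
    "1 2 3 4 5",              # neither: the problem case
]
naive = [strip_comment_naive(s) for s in sample_lines]
checked = [strip_comment_checked(s) for s in sample_lines]
```

Only the last sample differs: the naive version loses the trailing "5" because there is no newline for the -1 index to throw away.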
In this case, that test wasn't done, as most of the time there is a
newline at the end that can be thrown away anyway, so the -1 index is OK.
That inspired the following solution -- just add an extra space every time:
print "simply pad the line with a space:"
for line in SampleLines:
    line += " "
    line = line[:line.find(comments)].strip()
    print line
an extra string creation, but simple.
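As a rough sketch, the padding idea can be wrapped in a function and tried on the awkward inputs. The helper names and the edge-case list are assumptions for illustration. The second function uses str.split(), which is an alternative not proposed in this thread; it is shown only for comparison.

```python
def strip_comment_padded(line, comments="#"):
    # Hypothetical helper. Pad with one space so that a failed find(),
    # which returns -1, slices off only the padding.
    line += " "
    return line[:line.find(comments)].strip()


def strip_comment_split(line, comments="#"):
    # Alternative not from this thread: split at the first comment marker
    # and keep what precedes it, so there is no -1 value to test for.
    return line.split(comments, 1)[0].strip()


edge_cases = ["1 2 3 4 5", "1 2 3 4 5#", "# 1 2 3 4 5", ""]
padded = [strip_comment_padded(s) for s in edge_cases]
split = [strip_comment_split(s) for s in edge_cases]
```

Both versions should agree on these inputs, including the empty line and the comment-only line.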
> pattern = re.compile(r"""
> ^\s* # leading white space
> (.*) # Data
> %s? # Zero or one comment character
> (.*) # Comments
> \s*$ # Trailing white space
> """%comments, re.VERBOSE)
This pattern fails if the last character of the line is a comment
character, and if it is a comment-only line, though I'm sure that could
be fixed. I still prefer the Python string methods approaches, though.
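For what it's worth, here is one possible repair of the pattern, offered as a sketch rather than as what David had in mind. It makes the data group non-greedy and passes the comment string through re.escape(). The escaping matters because under re.VERBOSE an unescaped "#" starts a pattern comment instead of matching a literal "#". The helper name and sample lines are assumptions.

```python
import re


def make_comment_pattern(comments="#"):
    # Hypothetical helper. re.escape() keeps "#" literal under re.VERBOSE.
    return re.compile(r"""
        ^\s*        # leading white space
        (.*?)       # data, non-greedy so it stops before the comment
        \s*         # white space after the data
        (?:%s.*)?   # optional comment, running to the end of the line
        $           # end of line (also matches before a final newline)
        """ % re.escape(comments), re.VERBOSE)


pattern = make_comment_pattern("#")
samples = ["1 2 3 4 5 # comment\n", "1 2 3 4 5\n", "1 2 3 4 5",
           "1 2 3 4 5#", "# 1 2 3 4 5"]
data = [pattern.match(s).group(1) for s in samples]
```

With this version the trailing-comment and comment-only lines come out as plain data and an empty string respectively.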
I've enclosed a little test code, that gives these results:
old way -- this fails with no comment or newline
1 2 3 4 5
1 2 3 4
1 2 3 4 5
with regular expression:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5#
# 1 2 3 4 5
simply pad the line with a space:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
test for comment not found:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
My suggestions work on all my test cases. We really should put these,
and others, into a real unit test when this fix is added.
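A unit test along these lines might look like the sketch below. The strip_comment function is a hypothetical stand-in for the comment handling inside loadtxt, and the test names are invented; a real test would exercise loadtxt itself.

```python
import unittest


def strip_comment(line, comments="#"):
    # Hypothetical stand-in for the comment handling inside loadtxt.
    i = line.find(comments)
    if i == -1:
        return line.strip()
    return line[:i].strip()


class StripCommentTest(unittest.TestCase):
    def test_comment_and_newline(self):
        self.assertEqual(strip_comment("1 2 3 4 5 # comment\n"), "1 2 3 4 5")

    def test_no_comment_no_newline(self):
        # The case that started this thread.
        self.assertEqual(strip_comment("1 2 3 4 5"), "1 2 3 4 5")

    def test_comment_is_last_character(self):
        self.assertEqual(strip_comment("1 2 3 4 5#"), "1 2 3 4 5")

    def test_comment_only_line(self):
        self.assertEqual(strip_comment("# 1 2 3 4 5"), "")
```

Saved in a module, the tests can be run with python -m unittest.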
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Test-loadtxt.py
Type: text/x-python
Size: 1365 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080227/ee8445b9/attachment.py>