[Numpy-discussion] loadtxt broken if file does not end in newline

David Huard david.huard at gmail.com
Wed Feb 27 13:22:55 EST 2008


Hi Christopher,

The advantage of using regular expressions here is that they give you
some flexibility that wasn't there before. For instance, if for any reason
two different characters are used to mark comments in the same file,
using

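import re

pattern = re.compile(comments)  # comments may be a regex, e.g. r"[#%]"
for i, line in enumerate(fh):
    if i < skiprows:
        continue
    line = pattern.split(line)[0]  # keep only the text before the first comment marker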

can take care of that automatically if comments is a regular expression.
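
For illustration, here is a minimal, self-contained sketch of that idea. The helper name strip_comments, the sample lines and the r"[#%]" pattern are made up for the example and are not part of loadtxt; the .strip() call is also an addition, so that the result is easy to compare.

```python
import re

def strip_comments(lines, comments, skiprows=0):
    """Hypothetical helper: drop everything after the first comment marker.

    `comments` is treated as a regular expression, so r"[#%]" accepts
    either '#' or '%' as the comment character.
    """
    pattern = re.compile(comments)
    cleaned = []
    for i, line in enumerate(lines):
        if i < skiprows:
            continue
        # split() returns the whole line as a single item when no marker is found
        data = pattern.split(line, maxsplit=1)[0].strip()
        cleaned.append(data)
    return cleaned

# Made-up sample: one header row, two comment styles, last line without newline
sample = ["header\n", "1 2 3 4 5 # hash\n", "1 2 3 4 5 % percent\n", "1 2 3 4 5"]
cleaned = strip_comments(sample, r"[#%]", skiprows=1)
```

Because split() hands back the full line when nothing matches, the last line keeps its final character even without a trailing newline.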

Cheers,

David






2008/2/27, Christopher Barker <Chris.Barker at noaa.gov>:
>
> David Huard wrote:
> > Would everyone be satisfied with a solution using regular expressions ?
>
>
> Maybe it's because regular expressions make me itch, but I think it's
> overkill for this.
>
> The issue here is a result of what I consider a wart in Python's string
> methods -- string.find() returns a valid index (-1) when it fails to
> find anything. The usual way to work with this is to test for it:
>
> print("test for comment not found:")
> for line in SampleLines:
>      i = line.find(comments)
>      if i == -1:  # find() signals "not found" with -1
>          line = line.strip()
>      else:
>          line = line[:i].strip()
>      print(line)
>
> which does seem like a lot of extra code.
>
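
To make the pitfall concrete, here is a small sketch under the assumption that '#' is the comment character. The function name strip_comment and the sample line are hypothetical.

```python
def strip_comment(line, comments="#"):
    """Hypothetical helper: explicit test for the 'not found' case."""
    i = line.find(comments)
    if i == -1:
        # No comment marker: keep the whole line
        return line.strip()
    return line[:i].strip()

# Last line of a file that does not end in a newline and has no comment
last_line = "1 2 3 4 5"

# Naive slicing: find() returns -1, and line[:-1] silently drops the final "5"
naive = last_line[:last_line.find("#")].strip()

# Testing for -1 first keeps the line intact
fixed = strip_comment(last_line)
```

Here naive comes out as "1 2 3 4", while fixed is "1 2 3 4 5".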
> In this case, that wasn't done, as most of the time there is a newline
> at the end that can be thrown away anyway, so the -1 index is OK. So
> that inspired the following solution -- just add an extra space every
> time:
>
> print("simply pad the line with a space:")
> for line in SampleLines:
>      line += " "  # sacrificial character for the -1 case
>      line = line[:line.find(comments)].strip()
>      print(line)
>
> an extra string creation, but simple.
>
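
A sketch of the padding trick as a function, next to one possible alternative. The alternative uses str.split with a maxsplit of 1, which is a different technique from the one proposed in this thread; both function names are hypothetical.

```python
def strip_comment_padded(line, comments="#"):
    """Padding approach: the extra space absorbs the -1 'not found' index."""
    line += " "
    return line[:line.find(comments)].strip()

def strip_comment_split(line, comments="#"):
    """Alternative (not from the thread): split once on the marker.

    split() returns the whole line as the first item when the marker is
    absent, so no sentinel value has to be handled at all.
    """
    return line.split(comments, 1)[0].strip()

# Made-up lines: with and without newline, trailing marker, comment-only
samples = ["1 2 3 4 5\n", "1 2 3 4 5", "1 2 3 4 5#", "# 1 2 3 4 5\n"]
padded = [strip_comment_padded(s) for s in samples]
split_ = [strip_comment_split(s) for s in samples]
```

Both lists should be identical for these inputs; the split version avoids the extra string concatenation.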
>
> > pattern = re.compile(r"""
> >     ^\s* # leading white space
> >     (.*) # Data
> >     %s?  # Zero or one comment character
> >     (.*) # Comments
> >     \s*$ # Trailing white space
> >     """%comments, re.VERBOSE)
>
>
> This pattern fails if the last character of the line is a comment
> character, and if it is a comment-only line, though I'm sure that could
> be fixed. I still prefer the Python string methods approach, though.
>
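
One possible way to fix the pattern, offered as a sketch rather than a tested proposal. Two things seem to work against the original: the first (.*) is greedy, so it can swallow the comment as well, and under re.VERBOSE an unescaped '#' interpolated into the pattern starts a regex comment. The version below drops re.VERBOSE, escapes the comment string with re.escape and makes the data group non-greedy. The names make_pattern and strip_comment_re are hypothetical.

```python
import re

def make_pattern(comments="#"):
    """Build a pattern whose group 1 is the data part of a line."""
    return re.compile(
        r"^\s*(.*?)\s*(?:%s.*)?$" % re.escape(comments)
        # ^\s*        leading white space
        # (.*?)       data, non-greedy so it stops before the comment
        # \s*         white space between data and comment (or end of line)
        # (?:...)?    optional comment: the marker and everything after it
    )

def strip_comment_re(line, comments="#"):
    """Return the data part of a single line (assumes no embedded newlines)."""
    return make_pattern(comments).match(line).group(1)

# Made-up lines covering the two failing cases and the missing newline
results = [
    strip_comment_re("1 2 3 4 5#"),          # comment character is last
    strip_comment_re("# 1 2 3 4 5"),         # comment-only line
    strip_comment_re("1 2 3 4 5"),           # no comment, no newline
    strip_comment_re("  1 2 3 4 5  # c\n"),  # ordinary case
]
```

With these inputs the comment-only line yields an empty string and the other three yield "1 2 3 4 5".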
> I've enclosed a little test code, that gives these results:
>
> old way -- this fails with no comment or newline
> 1 2 3 4 5
> 1 2 3 4
> 1 2 3 4 5
>
> with regular expression:
> 1 2 3 4 5
> 1 2 3 4 5
> 1 2 3 4 5#
> # 1 2 3 4 5
>
> simply pad the line with a space:
> 1 2 3 4 5
> 1 2 3 4 5
> 1 2 3 4 5
>
> test for comment not found:
> 1 2 3 4 5
> 1 2 3 4 5
> 1 2 3 4 5
>
> My suggestions work on all my test cases. We really should put these,
> and others, into a real unit test when this fix is added.
>
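
As a starting point, such a unit test could look roughly like the sketch below. The test lines are hypothetical stand-ins, since the attached SampleLines are not shown in this message, and the two helper functions simply restate the string-method approaches from above.

```python
import unittest

def strip_by_test(line, comments="#"):
    """Explicitly test for find() returning -1."""
    i = line.find(comments)
    return line.strip() if i == -1 else line[:i].strip()

def strip_by_padding(line, comments="#"):
    """Pad with a space so that the -1 index is harmless."""
    line += " "
    return line[:line.find(comments)].strip()

# Hypothetical cases: (input line, expected data part)
CASES = [
    ("1 2 3 4 5\n", "1 2 3 4 5"),         # plain line with newline
    ("1 2 3 4 5", "1 2 3 4 5"),           # last line, no newline
    ("1 2 3 4 5 # note\n", "1 2 3 4 5"),  # trailing comment
    ("1 2 3 4 5#", "1 2 3 4 5"),          # comment character is last
    ("# 1 2 3 4 5\n", ""),                # comment-only line
]

class StripCommentTest(unittest.TestCase):
    def test_both_approaches(self):
        for strip in (strip_by_test, strip_by_padding):
            for line, expected in CASES:
                with self.subTest(func=strip.__name__, line=line):
                    self.assertEqual(strip(line), expected)
```

Saved as a module, this can be run with python -m unittest; more cases (tabs, other comment characters) would be easy to append to CASES.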
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov