[AstroPy] table string length truncated after reading table

Josh Walawender jmwalawender at gmail.com
Mon Aug 19 22:53:29 EDT 2013


Hi Tom, Matt,

Thanks for the tips.  The converters worked; I should have realized they would be the way to do it.

thanks!
Josh



On Aug 16, 2013, at 2:05 AM, "Aldcroft, Thomas" <aldcroft at head.cfa.harvard.edu> wrote:

> (Sent this last night but accidentally just to Josh).  - Tom
> 
> 
> Hi Josh,
> 
> The problem is that when you read the file back with ascii.read() it
> has no memory of the original 'S50' data type, so it just makes the
> string type as long as the longest one in the ASCII table file.  Then
> when you add a row it is not able to fit the new longer string into
> the existing column so it just truncates.  (This is the behavior of
> the underlying numpy array which is used by Table).
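
The truncation described above can be reproduced in plain numpy. This is only a minimal sketch: the resize-and-assign step is an illustrative stand-in for appending a row (how Table does it internally is not shown in this thread), the variable names are made up, and it uses Python 3 str values rather than the 'S' byte strings from the original session.

```python
import numpy as np

# Column as it comes back from the text file: no explicit width was given,
# so numpy sizes the string dtype to the longest value (16 characters).
names = np.array(["abcdefghijk", "abcdefghijklmnop"])
print(names.dtype)

# Stand-in for appending a row: allocate one extra slot with the SAME dtype
# and assign the new, longer value into it.
grown = np.zeros(3, dtype=names.dtype)
grown[:2] = names
grown[2] = "abcdefghijklmnopqrst"  # 20 characters into a 16-character slot

# numpy raises no error; the value is silently cut to the column width.
print(grown[2])
```

The last print shows the 20-character value cut down to 16 characters, which matches the third row of the truncated table in the original report.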
> 
> Fortunately there is a way to force the data type when reading with
> ascii.read() with the converters argument.  In your example below you
> would do the read with:
> 
> >>> secondTable = ascii.read("mytable.txt", converters={'col1': [ascii.convert_numpy('S50')], 'col2': [ascii.convert_numpy('S50')]})
> 
> The idea here is that you are forcing the column to be converted from
> a Python list to a numpy array with a dtype of 'S50' instead of the
> normal default of guessing float, int, str.
> 
> See http://astropy.readthedocs.org/en/latest/io/ascii/read.html#converters
> for the docs.
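
The list-to-array conversion Tom describes can be pictured in plain numpy. This is a rough sketch, not what ascii.read() does internally: the list `values` is a hypothetical stand-in for the strings parsed from the file, and the other names are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the raw strings parsed from one column.
values = ["abcdefghijk", "abcdefghijklmnop"]

# Default behaviour: the width is guessed from the data (longest value).
guessed = np.array(values)

# Forced behaviour: the width is fixed at 50 bytes regardless of the data.
forced = np.array(values, dtype="S50")
print(guessed.dtype, forced.dtype)

# With the wider dtype, a later and longer value fits without truncation.
wide = np.zeros(3, dtype=forced.dtype)
wide[:2] = forced
wide[2] = b"abcdefghijklmnopqrst"
print(wide[2])
```

Note that 'S50' is a byte-string dtype, so under Python 3 the stored values come back as bytes objects.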
> 
> - Tom
> 
> On Thu, Aug 15, 2013 at 10:51 PM, Matthew Craig <mcraig at mnstate.edu> wrote:
>> I ran into this once upon a time too (in atPy).
>> 
>> When you read a table, ascii guesses the column types; in the absence of any
>> other guidance it sizes each string column's dtype to the longest string it
>> finds in that column.
>> 
>> Doesn't look like there is a way to specify type in ascii.read (though I
>> just did a quick skim of the docs), but this would work:
>> 
>> ```
>> In [35]: thirdTable = table.Table(np.array(secondTable), names=('col1', 'col2'), dtypes=('S50', 'S50'))
>> 
>> In [36]: thirdTable.add_row(('abcdefghijklmnopqrst', 'longer_string'))
>> 
>> In [37]: print thirdTable
>>        col1              col2
>> -------------------- -------------
>>         abcdefghijk  short_string
>>    abcdefghijklmnop   long_string
>> abcdefghijklmnopqrst longer_string
>> ```
>> 
>> Matt Craig
>> PS Would have been happy to answer on astrobabel but waiting for my
>> membership to be approved :)
>> 
>> 
>> Office hours/schedule at: http://physics.mnstate.edu/craig
>> ----
>> Professor
>> Department of Physics and Astronomy
>> Minnesota State University Moorhead
>> 1104 7th Ave S, Moorhead MN 56563
>> 
>> phone: (218) 477-2439
>> fax: (218) 477-2290
>> 
>> On Aug 15, 2013, at 8:45 PM, Josh Walawender <jmwalawender at gmail.com> wrote:
>> 
>> Hi all,
>> 
>> I'm having a problem working with astropy.table and astropy.io.ascii and I
>> can't tell if the behavior I'm encountering is a feature or a bug. I'm
>> hoping someone can guide me to a good solution. Here's the situation:
>> 
>> I have code which loops through a series of input data files, does analysis,
>> stores the results in an astropy.table, and writes the table to a text file
>> using astropy.io.ascii. One of the fields in the row is the input filename.
>> As the code loops through the input files, it reads the previous table as
>> output by io.ascii, appends a row to the table object, then overwrites the
>> old file with a new one based on the new table which contains the new row.
>> 
>> The symptom is that on all subsequent passes through the loop, every string
>> written to the file name field is truncated to the length of the first file
>> name.
>> 
>> Here's a quick test case (copied and pasted from iPython) demonstrating the
>> problem:
>> 
>> ```
>> In [1]: import astropy.table as table
>> 
>> In [2]: import astropy.io.ascii as ascii
>> 
>> In [4]: firstTable = table.Table(names=('col1', 'col2'), dtypes=('S50', 'S50'))
>> 
>> In [5]: firstTable.add_row(('abcdefghijk', 'short_string'))
>> 
>> In [6]: firstTable.add_row(('abcdefghijklmnop', 'long_string'))
>> 
>> In [7]: print(firstTable)
>>      col1           col2
>> ---------------- ------------
>>     abcdefghijk short_string
>> abcdefghijklmnop  long_string
>> 
>> In [8]: ascii.write(firstTable, "mytable.txt")
>> 
>> In [9]: secondTable = ascii.read("mytable.txt")
>> 
>> In [10]: print(secondTable)
>>      col1           col2
>> ---------------- ------------
>>     abcdefghijk short_string
>> abcdefghijklmnop  long_string
>> 
>> In [11]: secondTable.add_row(('abcdefghijklmnopqrst', 'longer_string'))
>> 
>> In [12]: print(secondTable)
>>      col1           col2
>> ---------------- ------------
>>     abcdefghijk short_string
>> abcdefghijklmnop  long_string
>> abcdefghijklmnop longer_strin
>> ```
>> 
>> Any suggestions on how to avoid this behavior?
>> 
>> thanks!
>> Josh
>> 
>> P.S.  Based on the recent discussion on astropy-dev about where to get help,
>> I've also posted this on astrobabel:
>> http://www.astrobabel.com/v/discussion/77/astropy-question-table-string-length-truncated-after-reading-table#Item_1
>> 
>> _______________________________________________
>> AstroPy mailing list
>> AstroPy at scipy.org
>> http://mail.scipy.org/mailman/listinfo/astropy
>> 
>> 
>> 



