[AstroPy] table string length truncated after reading table
Josh Walawender
jmwalawender at gmail.com
Mon Aug 19 22:53:29 EDT 2013
Hi Tom, Matt,
Thanks for the tips. The converters worked; I should have realized they would be the way to do it.
thanks!
Josh
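
The truncation Tom describes below comes from numpy's fixed-width string dtype, so it can be reproduced without astropy. Here is a minimal sketch, assuming Python 3 and therefore the unicode 'U' dtype in place of the 'S50' used in this thread. The helper append_fixed_width is hypothetical, written only for this illustration; it is not an astropy or numpy function.

```python
import numpy as np


def append_fixed_width(arr, value):
    """Return a copy of arr with value appended, keeping arr's dtype.

    Hypothetical helper for illustration only: it mimics adding a row
    to a column whose string width is already fixed.
    """
    out = np.empty(len(arr) + 1, dtype=arr.dtype)
    out[:-1] = arr
    out[-1] = value  # numpy silently cuts the value to the dtype's width
    return out


# The column as it comes back from the text file: the width is inferred
# from the longest string present (16 characters here).
as_read = np.array(["abcdefghijk", "abcdefghijklmnop"])

# Appending a 20-character string to the 16-wide column truncates it.
truncated = append_fixed_width(as_read, "abcdefghijklmnopqrst")
print(truncated[-1])  # abcdefghijklmnop

# Widening the dtype first leaves room for the longer string.
widened = append_fixed_width(as_read.astype("U50"), "abcdefghijklmnopqrst")
print(widened[-1])  # abcdefghijklmnopqrst
```

Passing converters to ascii.read(), as in Tom's message, plays the same role as the astype("U50") step here: the column gets its full width before any row is added.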
On Aug 16, 2013, at 2:05 AM, "Aldcroft, Thomas" <aldcroft at head.cfa.harvard.edu> wrote:
> (Sent this last night but accidentally just to Josh). - Tom
>
>
> Hi Josh,
>
> The problem is that when you read the file back with ascii.read() it
> has no memory of the original 'S50' data type, so it just makes the
> string type as long as the longest one in the ASCII table file. Then
> when you add a row it is not able to fit the new longer string into
> the existing column so it just truncates. (This is the behavior of
> the underlying numpy array which is used by Table).
>
> Fortunately there is a way to force the data type when reading with
> ascii.read() with the converters argument. In your example below you
> would do the read with:
>
> >>> secondTable = ascii.read("mytable.txt", converters={'col1': [ascii.convert_numpy('S50')], 'col2': [ascii.convert_numpy('S50')]})
>
> The idea here is that you are forcing the column to be converted from
> a Python list to a numpy array with a dtype of 'S50' instead of the
> normal default of guessing float, int, str.
>
> See http://astropy.readthedocs.org/en/latest/io/ascii/read.html#converters
> for the docs.
>
> - Tom
>
> On Thu, Aug 15, 2013 at 10:51 PM, Matthew Craig <mcraig at mnstate.edu> wrote:
>> I ran into this once upon a time too (in atPy).
>>
>> The type for the columns when you read from a table is being guessed at by
>> ascii; in the absence of any other guidance it makes each string column
>> only as wide as the longest string it finds in that column.
>>
>> Doesn't look like there is a way to specify type in ascii.read (though I
>> just did a quick skim of the docs), but this would work:
>>
>> ```
>> In [35]: thirdTable = table.Table(np.array(secondTable), names=('col1',
>> 'col2'), dtypes=('S50', 'S50'))
>>
>> In [36]: thirdTable.add_row(('abcdefghijklmnopqrst', 'longer_string'))
>>
>> In [37]: print thirdTable
>> col1 col2
>> -------------------- -------------
>> abcdefghijk short_string
>> abcdefghijklmnop long_string
>> abcdefghijklmnopqrst longer_string
>> ```
>>
>> Matt Craig
>> PS Would have been happy to answer on astrobabel but waiting for my
>> membership to be approved :)
>>
>>
>> Office hours/schedule at: http://physics.mnstate.edu/craig
>> ----
>> Professor
>> Department of Physics and Astronomy
>> Minnesota State University Moorhead
>> 1104 7th Ave S, Moorhead MN 56563
>>
>> phone: (218) 477-2439
>> fax: (218) 477-2290
>>
>> On Aug 15, 2013, at 8:45 PM, Josh Walawender <jmwalawender at gmail.com> wrote:
>>
>> Hi all,
>>
>> I'm having a problem working with astropy.table and astropy.io.ascii and I
>> can't tell if the behavior I'm encountering is a feature or a bug. I'm
>> hoping someone can guide me to a good solution. Here's the situation:
>>
>> I have code which loops through a series of input data files, does analysis,
>> stores the results in an astropy.table, and writes the table to a text file
>> using astropy.io.ascii. One of the fields in the row is the input filename.
>> As the code loops through the input files, it reads the previous table as
>> output by io.ascii, appends a row to the table object, then overwrites the
>> old file with a new one based on the new table which contains the new row.
>>
>> The symptom is that on all subsequent passes through the loop, the strings
>> written to the file name field are truncated to whatever length the first
>> file name was.
>>
>> Here's a quick test case (copied and pasted from iPython) demonstrating the
>> problem:
>>
>> ```
>> In [1]: import astropy.table as table
>>
>> In [2]: import astropy.io.ascii as ascii
>>
>> In [4]: firstTable = table.Table(names=('col1', 'col2'), dtypes=('S50',
>> 'S50'))
>>
>> In [5]: firstTable.add_row(('abcdefghijk', 'short_string'))
>>
>> In [6]: firstTable.add_row(('abcdefghijklmnop', 'long_string'))
>>
>> In [7]: print(firstTable)
>> col1 col2
>> ---------------- ------------
>> abcdefghijk short_string
>> abcdefghijklmnop long_string
>>
>> In [8]: ascii.write(firstTable, "mytable.txt")
>>
>> In [9]: secondTable = ascii.read("mytable.txt")
>>
>> In [10]: print(secondTable)
>> col1 col2
>> ---------------- ------------
>> abcdefghijk short_string
>> abcdefghijklmnop long_string
>>
>> In [11]: secondTable.add_row(('abcdefghijklmnopqrst', 'longer_string'))
>>
>> In [12]: print(secondTable)
>> col1 col2
>> ---------------- ------------
>> abcdefghijk short_string
>> abcdefghijklmnop long_string
>> abcdefghijklmnop longer_strin
>> ```
>>
>> Any suggestions on how to avoid this behavior?
>>
>> thanks!
>> Josh
>>
>> P.S. Based on the recent discussion on astropy-dev about where to get help,
>> I've also posted this on astrobabel:
>> http://www.astrobabel.com/v/discussion/77/astropy-question-table-string-length-truncated-after-reading-table#Item_1
>>
>> _______________________________________________
>> AstroPy mailing list
>> AstroPy at scipy.org
>> http://mail.scipy.org/mailman/listinfo/astropy