[AstroPy] saving tables as VOtables with missing values

Susana Sanchez Exposito sse at iaa.es
Thu Oct 16 05:49:42 EDT 2014


Hi again,
I tried to use None,which in python it is the best representation of a
missing value, and the result is the astropy.table.Table method considers
None as the string "None"

list = [["CIG1", 1.2, 3, None, 3], [None, 3, 2.0, 5, 2], ["CIG3", 2,
3,2.6,  float('nan')] ]
tab=astropy.table.Table(rows=list, names=["A", "B", "C", "D", "E"],
masked=True)
votable=astropy.io.votable.from_table(tab)
votable.to_xml("/home/sse/Desktop/test.xml")

And the VOTable:

<TABLE>
   <FIELD ID="A" arraysize="*" datatype="unicodeChar" name="A"/>
   <FIELD ID="B" datatype="double" name="B"/>
   <FIELD ID="C" datatype="double" name="C"/>
   <FIELD ID="D" arraysize="*" datatype="unicodeChar" name="D"/>
   <FIELD ID="E" datatype="double" name="E"/>
   <DATA>
    <TABLEDATA>
     <TR>
      <TD>CIG1</TD>
      <TD>1.2</TD>
      <TD>3</TD>
      <TD>None</TD>
      <TD>3</TD>
     </TR>
     <TR>
      <TD>None</TD>
      <TD>3</TD>
      <TD>2</TD>
      <TD>5</TD>
      <TD>2</TD>
     </TR>
     <TR>
      <TD>CIG3</TD>
      <TD>2</TD>
      <TD>3</TD>
      <TD>2.6</TD>
      <TD>NaN</TD>
     </TR>
    </TABLEDATA>



2014-10-16 11:39 GMT+02:00 Susana Sanchez Exposito <sse at iaa.es>:

>
>
> 2014-10-15 18:37 GMT+02:00 Michael Droettboom <mdroe at stsci.edu>:
>
>>  On 10/15/2014 09:20 AM, Susana Sanchez Exposito wrote:
>>
>>    Thanks for your answer Tom!
>>
>>  I have tested what you proposed, and I found two things:
>>
>>  1) If you replace the empty string by a value (e.g. np.nan) in order to
>> get a mask array, this value will remain in the resulting VOTable, and in
>> some cases it could be strange. I illustrate this with this example (note
>> the first column of the table is the object name):
>>
>> list = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2], ["CIG3", 2, 3,2.6,
>> float('nan')], ]
>> to_nan=[]
>> for r in list:
>>     to_nan.append([numpy.nan if (str(x).upper()=="NAN" or x=="") else x
>> for x in r])
>>
>> tab=astropy.table.Table(rows=to_nan, names=["A", "B", "C", "D", "E"],
>> masked=True)
>> votable=astropy.io.votable.from_table(tab)
>> votable.to_xml("/home/sse/Desktop/test.xml")
>>
>> If you open the resulting VOtable (attached as test.xml) in Topcat you
>> will see the 'nan' string in the object name column, which is very strange.
>>
>>   The loop you provide above is putting the string “nan”, not the
>> floating-point value NaN in the column — since the column is a string
>> column, there’s really nothing else it can do. Signs point to using your
>> approach below, with an explicit mask, for this reason.
>>
> mmm, I am pretty sure the loop above is putting the float value NaN. This
> ipython output shows that:
>
>
> In [2]: list = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2], ["CIG3", 2,
> 3,2.6,  float('nan')], ]
> In [3]: to_nan=[]
> In [4]: for r in list:
>    ...:         to_nan.append([numpy.nan if (str(x).upper()=="NAN" or
> x=="") else x for x in r])
>    ...:
> In [5]: print to_nan
> [['CIG1', 1.2, 3, nan, 3], [nan, 3, 2.0, 5, 2], ['CIG3', 2, 3, 2.6, nan]]
>
> Or maybe you mean other thing ?
>
>
>
>>
>>
>>  Ok. I can build the mask array in other ways, BUT:
>>
>>  2) Even providing a mask, the column types are misinterpret.  The
>> following example illustrates what I mean:
>>
>>
>> list = [["CIG1", 1.2, 3, "", 3], ["", 3, 2.0, 5, 2], ["CIG3", 2, 3,2.6,
>> float('nan')] ]
>>
>> mask_empty=[[False, False, False, True, False], [True, False, False,
>> False, False], [False, False, False,False,  True] ]
>> tab=astropy.table.Table(rows=list, names=["A", "B", "C", "D", "E"],
>> masked=True)
>> tab.mask= numpy.array (mask_empty)
>> votable=astropy.io.votable.from_table(tab)
>> votable.to_xml("/home/sse/Desktop/test.xml")
>>
>>  If you open the resulting VOtable (attached as test2.xml) you will see
>> that the 4th column is typed as char (<FIELD ID="D" arraysize="3"
>> datatype="char" name="D"/>), when the only string in the column is masked!
>>
>>  Maybe I am doing something wrong?
>>
>>   The determination of column types is independent of masking,
>> unfortunately, and happens on the line tab=astropy.table.Table(... above
>> before the table is even made aware of the mask. In general, the masked
>> values need to be of the column type. But since it is masked, it shouldn’t
>> matter what that actual value is.
>>
> Ok.This is what I was imaging.
>
> Thanks for your answer, Mike!
>
>
>> Mike
>>
>>
>>
>>
>> 2014-10-15 11:42 GMT+02:00 Thomas Robitaille <thomas.robitaille at gmail.com
>> >:
>>
>>> Hi Susana,
>>>
>>> Just to simplify your current workflow, what you are doing is equivalent
>>> to:
>>>
>>> tab = Table(rows=list, names=...)
>>> tab.write('test.xml', format='votable')
>>>
>>> so no need for the zip(*) call and the call to votable.from_table.
>>>
>>> Now in terms of the masked values, I think the easiest is actually to
>>> just give some integer value to the missing value and then set the mask,
>>> so:
>>>
>>> In [19]: t = Table(rows=list,
>>>                    names=["A", "B", "C", "D", "E"],
>>>                    masked=True)
>>>
>>> In [20]: t['D'].mask[0] = True
>>>
>>> In [21]: print(t)
>>>  A    B   C   D   E
>>> ---- --- --- --- ---
>>> CIG1 1.2 3.0  -- 3.0
>>> CIG1 3.0 2.0 5.0 2.0
>>> CIG3 2.0 3.0 2.6 nan
>>>
>>> and then write out to votable with:
>>>
>>> t.write('test.xml', format='votable')
>>>
>>> which should preserve the masks.
>>>
>>> If you chose to use a special value (e.g. np.nan) to indicate masked
>>> values, then you can do:
>>>
>>> t['D'].mask = np.isnan(t['D'].mask)
>>>
>>> Let me know if any of the above isn't clear, or if it doesn't solve your
>>> issue!
>>>
>>> Cheers,
>>> Tom
>>>
>>> Susana Sanchez Exposito wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I work on an interface where the user can view and edit tables and save
>>> > them as VOtables, and for that I use the Astropy library.
>>> >
>>> > I keep the data of the tables in python lists, so to save them as
>>> > VOtable I do this:
>>> >
>>> > list = [["CIG1", 1.2, 3, "", 3], ["CIG1", 3, 2.0, 5, 2], ["CIG3", 2,
>>> > 3,2.6,  float('nan')]]
>>> >
>>> > #transform the list of rows into list of columns
>>> > list_cols= zip(*list)
>>> >
>>> > tab=astropy.table.Table(list_cols, names=["A", "B", "C", "D", "E"])
>>> > votable=astropy.io.votable.from_table(tab)
>>> > votable.to_xml("/home/sse/Desktop/test.xml")
>>> >
>>> > The table.Table method interprets correctly  the type of each column
>>> > except for the 3rd column. This column contains a "missing value" or
>>> > maybe a value deleted by the user, so the whole columns is marked as
>>> > "string" type, when actually is float type.
>>> >
>>> > I could transform all empty strings into NaN but this will be strange
>>> > for those columns containing strings .
>>> >
>>> > I have tried to transform the python list into a masked array, but
>>> > without success: problems to mask empty strings.
>>> >
>>> > Maybe I should to find out the type of each columns, going over the
>>> > table and calculating the type of the majority of the column item, and
>>> > then pass this type array to the table.Table method in some way ??
>>> >
>>> > So before, to continue investigating, I would like to ask you for some
>>> > tips to solve the missing values problems with astropy, or maybe even
>>> if
>>> > there is a specific method for that.
>>> >
>>> > Thanks in advanced.
>>> >
>>> > Susana.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Susana Sánchez Expósito
>>> >
>>> > Instituto de Astrofísica de Andalucía   IAA (CSIC)
>>> > Camino Bajo de Huétor, 50. Granada E-18008
>>> > Tel:(+34) 958 121 311 / (+34) 958 230 618
>>> > Fax:(+34) 958 814 530
>>>  > e-mail: sse at iaa.es <mailto:sse at iaa.es>
>>> >
>>> > _______________________________________________
>>> > AstroPy mailing list
>>> > AstroPy at scipy.org
>>> > http://mail.scipy.org/mailman/listinfo/astropy
>>> _______________________________________________
>>> AstroPy mailing list
>>> AstroPy at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/astropy
>>>
>>
>>
>>
>> --
>> Susana Sánchez Expósito
>>
>> Instituto de Astrofísica de Andalucía   IAA (CSIC)
>> Camino Bajo de Huétor, 50. Granada E-18008
>> Tel:(+34) 958 121 311 / (+34) 958 230 618
>> Fax:(+34) 958 814 530
>> e-mail: sse at iaa.es
>>
>>
>> _______________________________________________
>> AstroPy mailing listAstroPy at scipy.orghttp://mail.scipy.org/mailman/listinfo/astropy
>>
>>>>
>> --
>> Michael Droettboom
>> Science Software Branch
>> Space Telescope Science Institute
>> http://www.droettboom.com
>>
>>
>> _______________________________________________
>> AstroPy mailing list
>> AstroPy at scipy.org
>> http://mail.scipy.org/mailman/listinfo/astropy
>>
>>
>
>
> --
> Susana Sánchez Expósito
>
> Instituto de Astrofísica de Andalucía   IAA (CSIC)
> Camino Bajo de Huétor, 50. Granada E-18008
> Tel:(+34) 958 121 311 / (+34) 958 230 618
> Fax:(+34) 958 814 530
> e-mail: sse at iaa.es
>



-- 
Susana Sánchez Expósito

Instituto de Astrofísica de Andalucía   IAA (CSIC)
Camino Bajo de Huétor, 50. Granada E-18008
Tel:(+34) 958 121 311 / (+34) 958 230 618
Fax:(+34) 958 814 530
e-mail: sse at iaa.es
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/astropy/attachments/20141016/7c0ec97d/attachment.html>


More information about the AstroPy mailing list