[AstroPy] "ASCII" tables that contain non-ASCII characters

Benjamin Alan Weaver baweaver at lbl.gov
Thu Nov 10 12:59:31 EST 2016


Hello y'all,

I just wanted to follow up on this.  Did the problem that Stephen
reported with Python 3 get added to any existing bug reports, or a new
bug report?

Kia ora koutou,
Benjamin Alan Weaver

On 10/24/2016 09:38 PM, Stephen Bailey wrote:
> Thanks for the suggestions.  The original problem also applies to python
> 3.5 though — this isn't just a python 2.7 thing.  If LANG isn't set, the
> ascii table readers can break even with python 3.5 and even if the
> non-ascii character is in a comment field.  e.g. the following table
> can't be read with format='ascii.basic' unless $LANG is set or one of
> locale tricks from Thomas or Derek is used:
> 
> # Some comment
> # Å or not?
> # Another comment
> x y
> 1 2
> 3 4
> 5 6
> 
> Also, for the record: apparently Mac OSX needs 'en_US.UTF-8' and not
> 'en_US.utf8'; flavors of Linux will accept either.
> 
> Stephen
> 
> In [9]: !cat blat.csv
> # Some comment
> # Å or not?
> # Another comment
> x y
> 1 2
> 3 4
> 5 6
> 
> In [10]: sys.version
> Out[10]: '3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:52:12) \n[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]'
> 
> In [11]: print(os.getenv('LANG'))
> None
> 
> In [12]: t = Table.read('blat.csv', format='ascii.basic')
> ---------------------------------------------------------------------------
> UnicodeDecodeError                        Traceback (most recent call last)
> <ipython-input-12-4c7389d51f8f> in <module>()
> ----> 1 t = Table.read('blat.csv', format='ascii.basic')
> 
> /Users/sbailey/anaconda/envs/desi/lib/python3.5/site-packages/astropy/table/table.py in read(cls, *args, **kwargs)
>    2330         passed through to the underlying data reader (e.g. `~astropy.io.ascii.read`).
>    2331         """
> -> 2332         return io_registry.read(cls, *args, **kwargs)
>    2333
>    2334     def write(self, *args, **kwargs):
> 
> /Users/sbailey/anaconda/envs/desi/lib/python3.5/site-packages/astropy/io/registry.py in read(cls, *args, **kwargs)
>     349
>     350         reader = get_reader(format, cls)
> --> 351         data = reader(*args, **kwargs)
>     352
>     353         if not isinstance(data, cls):
> 
> /Users/sbailey/anaconda/envs/desi/lib/python3.5/site-packages/astropy/io/ascii/connect.py in io_read(format, filename, **kwargs)
>      35     from .ui import read
>      36     format = re.sub(r'^ascii\.', '', format)
> ---> 37     return read(filename, format=format, **kwargs)
>      38
>      39
> 
> /Users/sbailey/anaconda/envs/desi/lib/python3.5/site-packages/astropy/io/ascii/ui.py in read(table, guess, **kwargs)
>     287             try:
>     288                 with get_readable_fileobj(table) as fileobj:
> --> 289                     table = fileobj.read()
>     290             except ValueError:  # unreadable or invalid binary file
>     291                 raise
> 
> /Users/sbailey/anaconda/envs/desi/lib/python3.5/encodings/ascii.py in decode(self, input, final)
>      24 class IncrementalDecoder(codecs.IncrementalDecoder):
>      25     def decode(self, input, final=False):
> ---> 26         return codecs.ascii_decode(input, self.errors)[0]
>      27
>      28 class StreamWriter(Codec, codecs.StreamWriter):
> 
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range(128)
> 
> In [13]: with set_locale('en_US.UTF-8'):
>     ...:     t = Table.read('blat.csv', format='ascii.basic')
> 
> In [14]: t
> Out[14]:
> <Table length=3>
>   x     y
> int64 int64
> ----- -----
>     1     2
>     3     4
>     5     6
> 
> 
> 
> On Mon, Oct 24, 2016 at 6:53 PM, Aldcroft, Thomas
> <aldcroft at head.cfa.harvard.edu> wrote:
> 
> 
> 
>     On Mon, Oct 24, 2016 at 6:11 PM, Derek Homeier
>     <derek at astro.physik.uni-goettingen.de> wrote:
> 
>         On 25 Oct 2016, at 12:01 am, Nathan Goldbaum
>         <nathan12343 at gmail.com> wrote:
>         >
>         > I believe this is issue 2923:
>         >
>         > https://github.com/astropy/astropy/issues/2923
>         >
>         > On Mon, Oct 24, 2016 at 4:45 PM, Benjamin Alan Weaver <baweaver at lbl.gov> wrote:
>         > Hello y'all,
>         >
>         > We are trying to read "ASCII" tables containing atomic line data
>         > provided by NIST.  When you request the line wavelength data in
>         > angstroms, NIST very helpfully labels the columns with the angstrom
>         > symbol (Å), which is not strictly part of the ASCII character set.
>         >
>         > We can read these tables with Table.read() *and* the environment
>         > variable LANG=en_US.utf-8 set.  However, if LANG is not set,
>         > Table.read() fails to decode these files.
>         >
>         > As far as I can tell the underlying read() function in astropy.io.ascii
>         > does not accept keywords related to the file encoding.
>         >
>         > So two questions:
>         >
>         > 1. Is the lack of an encoding keyword a bug that should be reported?
>         >
>         > 2. Is there a workaround that does not rely on LANG being set?
> 
>         A workaround that would at least let you get away without
>         manipulating the environment outside Python would be
> 
>         import locale
>         locale.setlocale(locale.LC_ALL, str('en_US.utf8'))
> 
> 
>     You can make this a little cleaner using the set_locale context
>     manager in astropy:
> 
>     from astropy.utils.misc import set_locale
>     with set_locale('en_US.utf8'):
>         dat = Table.read(...)
> 
>     As to the original question of whether this should be reported as a
>     bug, it has already been discussed in:
> 
>     https://github.com/astropy/astropy/issues/3826
> 
>     That discussion ended without any really clear consensus except that
>     using Python 3 is a good thing if that is an option.  I have never
>     seriously evaluated how difficult it would be to implement support
>     for unicode inputs for Python 2.  A basic recipe is shown in the
>     stdlib csv package documentation, but I don't know how messy a fully
>     working implementation would get.
> 
>     Cheers,
>     Tom A
>      
> 
> 
>         Cheers,
>                                                 Derek
> 
>         _______________________________________________
>         AstroPy mailing list
>         AstroPy at scipy.org
>         https://mail.scipy.org/mailman/listinfo/astropy
> 
> 
> 
> 
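
Regarding Stephen's note that Mac OS X wants 'en_US.UTF-8' while Linux
accepts either spelling: one way to avoid hard-coding the name is to probe
a few candidates and use the first one the system accepts.  This is a rough
sketch using only the standard-library locale module.
first_available_locale() and the candidate list are my own invention for
illustration, not anything provided by astropy.

```python
import locale


def first_available_locale(candidates=("en_US.UTF-8", "en_US.utf8", "C.UTF-8")):
    """Hypothetical helper: return the first locale name the system accepts.

    Returns None if none of the candidates can be set.  The locale that was
    active on entry is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, changes nothing
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this spelling is not known on this platform
            return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


name = first_available_locale()
```

If name is not None, it could then be passed to the set_locale context
manager that Tom suggested, in place of a hard-coded string.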

-- 
Nothing brightens up my morning.  Coffee simply provides a shade of gray
just above the pitch-black of the infinite depths of the abyss.
  --Sid Dabster, userfriendly.org


