[AstroPy] "ASCII" tables that contain non-ASCII characters
Benjamin Alan Weaver
baweaver at lbl.gov
Thu Nov 10 12:59:31 EST 2016
Hello y'all,
I just wanted to follow up on this. Did the problem that Stephen
reported with Python 3 get added to any existing bug reports, or a new
bug report?
Kia ora koutou,
Benjamin Alan Weaver
On 10/24/2016 09:38 PM, Stephen Bailey wrote:
> Thanks for the suggestions. The original problem also applies to python
> 3.5 though — this isn't just a python 2.7 thing. If LANG isn't set, the
> ascii table readers can break even with python 3.5 and even if the
> non-ascii character is in a comment field. e.g. the following table
> can't be read with format='ascii.basic' unless $LANG is set or one of
> the locale tricks from Thomas or Derek is used:
>
> # Some comment
> # Å or not?
> # Another comment
> x y
> 1 2
> 3 4
> 5 6
>
> Also, for the record: apparently Mac OSX needs 'en_US.UTF-8' and not
> 'en_US.utf8'; flavors of Linux will accept either.
>
> Stephen
>
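Since the accepted spelling differs between platforms, one option is to try the candidate names in order and keep the first one the system accepts. This is only a minimal sketch: the helper name first_available_locale and the candidate list are my own illustration, not part of astropy.

```python
import locale


def first_available_locale(candidates=("en_US.UTF-8", "en_US.utf8", "C.UTF-8")):
    """Return the first locale name that setlocale() accepts, or None.

    The candidate names are assumptions for illustration; adjust them
    for the platforms you care about.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this spelling is not known here, try the next
            return name
        return None
    finally:
        # Restore whatever was active before probing.
        locale.setlocale(locale.LC_CTYPE, saved)
```

The returned name could then be passed to set_locale() or locale.setlocale() in place of a hard-coded string.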
> In [9]: !cat blat.csv
> # Some comment
> # Å or not?
> # Another comment
> x y
> 1 2
> 3 4
> 5 6
>
> In [10]: sys.version
> Out[10]: '3.5.2 |Continuum Analytics, Inc.| (default, Jul 2 2016,
> 17:52:12) \n[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]'
>
> In [11]: print(os.getenv('LANG'))
> None
>
> In [12]: t = Table.read('blat.csv', format='ascii.basic')
>
> ---------------------------------------------------------------------------
> UnicodeDecodeError                        Traceback (most recent call last)
> <ipython-input-12-4c7389d51f8f> in <module>()
> ----> 1 t = Table.read('blat.csv', format='ascii.basic')
>
> /Users/sbailey/anaconda/envs/desi/lib/python3.5/site-packages/astropy/table/table.py in read(cls, *args, **kwargs)
>    2330         passed through to the underlying data reader (e.g. `~astropy.io.ascii.read`).
>    2331         """
> -> 2332         return io_registry.read(cls, *args, **kwargs)
>    2333
>    2334     def write(self, *args, **kwargs):
>
> /Users/sbailey/anaconda/envs/desi/lib/python3.5/site-packages/astropy/io/registry.py in read(cls, *args, **kwargs)
>     349
>     350         reader = get_reader(format, cls)
> --> 351         data = reader(*args, **kwargs)
>     352
>     353         if not isinstance(data, cls):
>
> /Users/sbailey/anaconda/envs/desi/lib/python3.5/site-packages/astropy/io/ascii/connect.py in io_read(format, filename, **kwargs)
>      35     from .ui import read
>      36     format = re.sub(r'^ascii\.', '', format)
> ---> 37     return read(filename, format=format, **kwargs)
>      38
>      39
>
> /Users/sbailey/anaconda/envs/desi/lib/python3.5/site-packages/astropy/io/ascii/ui.py in read(table, guess, **kwargs)
>     287         try:
>     288             with get_readable_fileobj(table) as fileobj:
> --> 289                 table = fileobj.read()
>     290         except ValueError:  # unreadable or invalid binary file
>     291             raise
>
> /Users/sbailey/anaconda/envs/desi/lib/python3.5/encodings/ascii.py in decode(self, input, final)
>      24 class IncrementalDecoder(codecs.IncrementalDecoder):
>      25     def decode(self, input, final=False):
> ---> 26         return codecs.ascii_decode(input, self.errors)[0]
>      27
>      28 class StreamWriter(Codec, codecs.StreamWriter):
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range(128)
>
>
> In [13]: with set_locale('en_US.UTF-8'):
>     ...:     t = Table.read('blat.csv', format='ascii.basic')
>
> In [14]: t
> Out[14]:
> <Table length=3>
>   x     y
> int64 int64
> ----- -----
>     1     2
>     3     4
>     5     6
>
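The error in the traceback above can be reproduced without astropy at all, which suggests the failure is in the decoding step rather than in the table parser. The sketch below assumes blat.csv was saved as UTF-8 (the 0xc3 byte in the error message is consistent with that); the variable names are mine.

```python
# The example table as UTF-8 bytes. That the file on disk is UTF-8 is an
# assumption, consistent with the 0xc3 byte reported in the traceback.
raw = (
    "# Some comment\n"
    "# Å or not?\n"
    "# Another comment\n"
    "x y\n1 2\n3 4\n5 6\n"
).encode("utf-8")

# Decoding as ASCII fails on the first byte of the two-byte sequence for Å.
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    failure = (err.start, hex(raw[err.start]))
    print("ascii decode failed at byte %d (%s)" % failure)

# Decoding as UTF-8 succeeds and recovers the comment line intact.
text = raw.decode("utf-8")
second_line = text.splitlines()[1]
```

The failing offset is the Å in the second comment line, so the error occurs before any of the table content is parsed.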
> On Mon, Oct 24, 2016 at 6:53 PM, Aldcroft, Thomas
> <aldcroft at head.cfa.harvard.edu> wrote:
>
>
>
> On Mon, Oct 24, 2016 at 6:11 PM, Derek Homeier
> <derek at astro.physik.uni-goettingen.de> wrote:
>
> On 25 Oct 2016, at 12:01 am, Nathan Goldbaum
> <nathan12343 at gmail.com> wrote:
> >
> > I believe this is issue 2923:
> >
> > https://github.com/astropy/astropy/issues/2923
> >
> > On Mon, Oct 24, 2016 at 4:45 PM, Benjamin Alan Weaver <baweaver at lbl.gov> wrote:
> > Hello y'all,
> >
> > We are trying to read "ASCII" tables containing atomic line data
> > provided by NIST. When you request the line wavelength data in
> > angstroms, NIST very helpfully labels the columns with the angstrom
> > symbol (Å), which is not strictly part of the ASCII character set.
> >
> > We can read these tables with Table.read() *and* the environment
> > variable LANG=en_US.utf-8 set. However, if LANG is not set,
> > Table.read() fails to decode these files.
> >
> > As far as I can tell the underlying read() function in astropy.io.ascii
> > does not accept keywords related to the file encoding.
> >
> > So two questions:
> >
> > 1. Is the lack of an encoding keyword a bug that should be reported?
> >
> > 2. Is there a workaround that does not rely on LANG being set?
>
> A workaround that would at least let you avoid manipulating the
> environment outside Python would be
>
>     import locale
>     locale.setlocale(locale.LC_ALL, str('en_US.utf8'))
>
>
> You can make this a little cleaner using the set_locale context
> manager in astropy:
>
>     from astropy.table import Table
>     from astropy.utils.misc import set_locale
>     with set_locale('en_US.utf8'):
>         dat = Table.read(...)
>
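A locale-independent variant of the workarounds above might be to decode the file explicitly and hand the reader text instead of a file name. The following is only a sketch under that assumption: read_lines is a hypothetical helper, and the default encoding of UTF-8 is a guess about how the tables are stored.

```python
import io


def read_lines(path, encoding="utf-8"):
    """Return the lines of a text file, decoded with an explicit encoding.

    Hypothetical helper: because the encoding is passed explicitly, the
    result does not depend on LANG or on the current locale.
    io.open() behaves the same way on Python 2.7 and Python 3.
    """
    with io.open(path, encoding=encoding) as fileobj:
        return fileobj.read().splitlines()
```

As far as I know, astropy.io.ascii.read() accepts a list of lines in place of a file name, so something like ascii.read(read_lines('blat.csv'), format='basic') may sidestep the locale entirely. Treat that as untested.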
> As to the original question of whether this should be reported as a
> bug, it has already been discussed in:
>
> https://github.com/astropy/astropy/issues/3826
>
> That discussion ended without any really clear consensus except that
> using Python 3 is a good thing if that is an option. I have never
> seriously evaluated how difficult it would be to implement support
> for unicode inputs for Python 2. A basic recipe is shown in the
> stdlib csv package documentation, but I don't know how messy a fully
> working implementation would get.
>
> Cheers,
> Tom A
>
>
>
> Cheers,
> Derek
>
> _______________________________________________
> AstroPy mailing list
> AstroPy at scipy.org <mailto:AstroPy at scipy.org>
> https://mail.scipy.org/mailman/listinfo/astropy
> <https://mail.scipy.org/mailman/listinfo/astropy>
>
--
Nothing brightens up my morning. Coffee simply provides a shade of gray
just above the pitch-black of the infinite depths of the abyss.
--Sid Dabster, userfriendly.org