[AstroPy] "ASCII" tables that contain non-ASCII characters

Stephen Bailey stephenbailey at lbl.gov
Tue Oct 25 00:38:06 EDT 2016


Thanks for the suggestions.  The original problem also applies to Python
3.5, though; this isn't just a Python 2.7 thing.  If LANG isn't set, the
ASCII table readers can break even with Python 3.5, and even if the
non-ASCII character is only in a comment line.  For example, the following
table can't be read with format='ascii.basic' unless $LANG is set or one of
the locale tricks from Thomas or Derek is used:

# Some comment
# Å or not?
# Another comment
x y
1 2
3 4
5 6
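For anyone who wants to see the failure without astropy, here is a small standard-library sketch (Python 3). It assumes blat.csv is saved as UTF-8 with exactly the lines above; under that assumption the Å is stored as the two bytes 0xc3 0x85 starting at byte offset 17, which matches the position in the traceback below. Judging from that traceback, the reader decodes as ASCII when LANG is unset, and ASCII decoding fails right there, while UTF-8 decoding succeeds. The helper name try_decode is made up for illustration.

```python
# Sketch: reproduce the decode error from the traceback without astropy.
# Assumes the file contents are exactly the example table, saved as UTF-8.
TABLE_TEXT = (
    "# Some comment\n"
    "# Å or not?\n"
    "# Another comment\n"
    "x y\n"
    "1 2\n"
    "3 4\n"
    "5 6\n"
)

# The bytes that would be on disk: Å becomes the two bytes 0xc3 0x85.
raw = TABLE_TEXT.encode("utf-8")


def try_decode(data, encoding):
    """Return (True, None) on success, or (False, (hex byte, offset)) on failure."""
    try:
        data.decode(encoding)
        return True, None
    except UnicodeDecodeError as exc:
        return False, (hex(data[exc.start]), exc.start)


print(try_decode(raw, "ascii"))  # (False, ('0xc3', 17))
print(try_decode(raw, "utf-8"))  # (True, None)
```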

Also, for the record: apparently Mac OS X needs 'en_US.UTF-8' and not
'en_US.utf8', while flavors of Linux will accept either.
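Because the spelling differs between platforms, one option is to try both names in turn. The sketch below is a hypothetical helper, temporary_locale, written with only the standard library in the spirit of astropy's set_locale (it is not astropy's implementation). It changes only LC_CTYPE, the category that determines the default text encoding Python 3.5 picks for open(); that choice, and the fallback order, are assumptions.

```python
import contextlib
import locale

# Spellings from the note above: Mac OS X wants 'en_US.UTF-8',
# Linux generally accepts either spelling.
UTF8_LOCALE_NAMES = ("en_US.UTF-8", "en_US.utf8")


@contextlib.contextmanager
def temporary_locale(names=UTF8_LOCALE_NAMES, category=locale.LC_CTYPE):
    """Switch `category` to the first accepted name in `names`, then restore.

    Yields the name that was applied; raises locale.Error if none works.
    """
    saved = locale.setlocale(category)  # query only: no second argument
    for name in names:
        try:
            locale.setlocale(category, name)
            break
        except locale.Error:
            continue  # this spelling is not installed here, try the next
    else:
        raise locale.Error("none of these locales is available: %r" % (names,))
    try:
        yield name
    finally:
        # Always put the previous setting back, even if the body raised.
        locale.setlocale(category, saved)
```

Usage would then look like `with temporary_locale(): t = Table.read('blat.csv', format='ascii.basic')`.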

Stephen

In [9]: !cat blat.csv
# Some comment
# Å or not?
# Another comment
x y
1 2
3 4
5 6

In [10]: sys.version
Out[10]: '3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:52:12) \n[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]'

In [11]: print(os.getenv('LANG'))
None

In [12]: t = Table.read('blat.csv', format='ascii.basic')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-12-4c7389d51f8f> in <module>()
----> 1 t = Table.read('blat.csv', format='ascii.basic')

/Users/sbailey/anaconda/envs/desi/lib/python3.5/site-packages/astropy/table/table.py in read(cls, *args, **kwargs)
   2330         passed through to the underlying data reader (e.g. `~astropy.io.ascii.read`).
   2331         """
-> 2332         return io_registry.read(cls, *args, **kwargs)
   2333
   2334     def write(self, *args, **kwargs):

/Users/sbailey/anaconda/envs/desi/lib/python3.5/site-packages/astropy/io/registry.py in read(cls, *args, **kwargs)
    349
    350         reader = get_reader(format, cls)
--> 351         data = reader(*args, **kwargs)
    352
    353         if not isinstance(data, cls):

/Users/sbailey/anaconda/envs/desi/lib/python3.5/site-packages/astropy/io/ascii/connect.py in io_read(format, filename, **kwargs)
     35     from .ui import read
     36     format = re.sub(r'^ascii\.', '', format)
---> 37     return read(filename, format=format, **kwargs)
     38
     39

/Users/sbailey/anaconda/envs/desi/lib/python3.5/site-packages/astropy/io/ascii/ui.py in read(table, guess, **kwargs)
    287             try:
    288                 with get_readable_fileobj(table) as fileobj:
--> 289                     table = fileobj.read()
    290             except ValueError:  # unreadable or invalid binary file
    291                 raise

/Users/sbailey/anaconda/envs/desi/lib/python3.5/encodings/ascii.py in decode(self, input, final)
     24 class IncrementalDecoder(codecs.IncrementalDecoder):
     25     def decode(self, input, final=False):
---> 26         return codecs.ascii_decode(input, self.errors)[0]
     27
     28 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range(128)

In [13]: with set_locale('en_US.UTF-8'):
    ...:     t = Table.read('blat.csv', format='ascii.basic')

In [14]: t
Out[14]:
<Table length=3>
  x     y
int64 int64
----- -----
    1     2
    3     4
    5     6
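A route that avoids the locale altogether is to decode the file yourself with an explicit encoding and hand the resulting lines to the reader. The helper below, read_table_lines, is a hypothetical name used for illustration; it relies on io.open so the same code runs on Python 2.7 and Python 3.

```python
import io


def read_table_lines(path, encoding="utf-8"):
    """Return the lines of a text table, decoded with an explicit encoding.

    io.open works the same on Python 2.7 and Python 3, and the result
    does not depend on LANG or on the current locale.
    """
    with io.open(path, "r", encoding=encoding) as fh:
        # splitlines() drops the trailing newline characters.
        return fh.read().splitlines()
```

On Python 3, astropy.io.ascii.read accepts a list of strings as input, so something like `ascii.read(read_table_lines('blat.csv'), format='basic')` should sidestep the locale. On Python 2 the lines come back as unicode objects, which may run into the Python 2 limitations discussed further down.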


On Mon, Oct 24, 2016 at 6:53 PM, Aldcroft, Thomas <
aldcroft at head.cfa.harvard.edu> wrote:

>
>
> On Mon, Oct 24, 2016 at 6:11 PM, Derek Homeier
> <derek at astro.physik.uni-goettingen.de> wrote:
>
>> On 25 Oct 2016, at 12:01 am, Nathan Goldbaum <nathan12343 at gmail.com>
>> wrote:
>> >
>> > I believe this is issue 2923:
>> >
>> > https://github.com/astropy/astropy/issues/2923
>> >
>> > On Mon, Oct 24, 2016 at 4:45 PM, Benjamin Alan Weaver <baweaver at lbl.gov>
>> wrote:
>> > Hello y'all,
>> >
>> > We are trying to read "ASCII" tables containing atomic line data
>> > provided by NIST.  When you request the line wavelength data in
>> > angstroms, NIST very helpfully labels the columns with the angstrom
>> > symbol (Å), which is not strictly part of the ASCII character set.
>> >
>> > We can read these tables with Table.read() *and* the environment
>> > variable LANG=en_US.utf-8 set.  However, if LANG is not set,
>> > Table.read() fails to decode these files.
>> >
>> > As far as I can tell, the underlying read() function in astropy.io.ascii
>> > does not accept any keyword for the file encoding.
>> >
>> > So two questions:
>> >
>> > 1. Is the lack of an encoding keyword a bug that should be reported?
>> >
>> > 2. Is there a workaround that does not rely on LANG being set?
>>
>> A workaround that at least spares you from manipulating the environment
>> outside Python would be
>>
>> import locale
>> # str() keeps a native string on Python 2 (e.g. under unicode_literals)
>> locale.setlocale(locale.LC_ALL, str('en_US.utf8'))
>>
>
> You can make this a little cleaner using the set_locale context manager in
> astropy:
>
> from astropy.table import Table
> from astropy.utils.misc import set_locale
> with set_locale('en_US.utf8'):
>     dat = Table.read(...)
>
> As to the original question of whether this should be reported as a bug,
> it has already been discussed in:
>
>  https://github.com/astropy/astropy/issues/3826
>
> That discussion ended without a clear consensus, except that using
> Python 3 is a good thing if that is an option.  I have never seriously
> evaluated how difficult it would be to implement support for unicode inputs
> for Python 2.  A basic recipe is shown in the stdlib csv package
> documentation, but I don't know how messy a fully working implementation
> would get.
>
> Cheers,
> Tom A
>
>
>>
>> Cheers,
>>                                         Derek
>>
>> _______________________________________________
>> AstroPy mailing list
>> AstroPy at scipy.org
>> https://mail.scipy.org/mailman/listinfo/astropy
>>