From derek at astro.physik.uni-goettingen.de Sat Jun 2 13:25:38 2018
From: derek at astro.physik.uni-goettingen.de (Derek Homeier)
Date: Sat, 2 Jun 2018 19:25:38 +0200
Subject: [AstroPy] astropy.io.ascii.FixedWidthNoHeader bug
In-Reply-To: <0652DC3A-1FD5-4C64-B19A-57520485FBB8@astro.physik.uni-goettingen.de>
References: <0652DC3A-1FD5-4C64-B19A-57520485FBB8@astro.physik.uni-goettingen.de>
Message-ID:

Hi Rick,

> On 31 May 2018, at 2:28 pm, Frederic V. Hessman wrote:
>
> I've got a simple ASCII table:
>
> # nix.txt
> 1 -68 40574.624730 40574.625190 40574.624025 1 0.0000200  0.0011645 100.61
> 2   0     0.000000 40610.064100 40610.064500 0 0.0001000 -0.0003996 -34.52
> 3   5 40612.670790 40612.671278 40612.670417 1 0.0001000  0.0008612  74.41
>
> that I wanted to read using astropy.io.ascii (Table was giving me more
> problems....), so I played with various parsers and options that didn't
> work until it finally appeared to parse successfully:
>
> % python
> Python 3.5.4 (default, Sep 22 2017, 08:33:07)
> [GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from astropy.io import ascii
> >>> ascii.read ('nix.txt',format='fixed_width_no_header',comment='#',delimiter=' ')
>
>  col1  col2     col3        col4         col5      col6   col7     col8     col9
> int64 int64   float64     float64      float64    int64 float64  float64  float64
> ----- ----- ----------- ------------ ------------ ----- ------- --------- -------
>     1   -68 40574.62473  40574.62519 40574.624025     1   2e-05 0.0011645  100.61
>     2     0         0.0   40610.0641   40610.0645     0  0.0001 0.0003996  -34.52
>     3     5 40612.67079 40612.671278 40612.670417     1  0.0001 0.0008612   74.41
>
> Note that the minus sign in col8 was zapped but that the minus sign in
> col9 was not! I then switched the two lines:
>
> >>> ascii.read ('nix.txt',format='fixed_width_no_header',comment='#',delimiter=' ')
>
>  col1  col2   col3      col4         col5      col6   col7     col8      col9
> int64 int64 float64   float64      float64    int64 float64  float64   float64
> ----- ----- ------- ------------ ------------ ----- ------- ---------- -------
>     2     0     0.0   40610.0641   40610.0645     0  0.0001 -0.0003996  -34.52
>     1     8 4.62473  40574.62519 40574.624025     1   2e-05  0.0011645  100.61
>     3     5 2.67079 40612.671278 40612.670417     1  0.0001  0.0008612   74.41
>
> which gives the correct values, so the behaviour somehow depends on how
> the fixed width columns are found. My guess is that, in the first case,
> there was a " 0.00" in col8 (leading space) and "100" in col9 defining the
> fixed width columns but in the second the columns were already reserved by
> "-0.00" and "-34". Looks like a bug to me.
>

Perhaps not a bug, rather a limitation in functionality.
As FixedWidthNoHeader cannot obtain the column limits from the header,
it tries to infer them from the first data line if they are not specified
by the user.
But this makes such truncations fairly inevitable if the first line does
not cover the maximum range of all subsequent lines. A similar problem
occurs when inserting an item with fewer digits:

# nix.txt
1 -68  4574.624730 40574.625190 40574.624025 1 0.0000200  0.0011645 100.61
2   0     0.000000 40610.064100 40610.064500 0 0.0001000 -0.0003996 -34.52
3   5 40612.670790 40612.671278 40612.670417 1 0.0001000  0.0008612  74.41

>>> astropy.io.ascii.read('nix.txt', format='fixed_width_no_header', delimiter=' ')
 col1  col2    col3        col4         col5      col6   col7     col8     col9
int64 int64  float64     float64      float64    int64 float64  float64  float64
----- ----- ---------- ------------ ------------ ----- ------- --------- -------
    1   -68 4574.62473  40574.62519 40574.624025     1   2e-05 0.0011645  100.61
    2     0        0.0   40610.0641   40610.0645     0  0.0001 0.0003996  -34.52
    3     5  612.67079 40612.671278 40612.670417     1  0.0001 0.0008612   74.41

Testing all data lines for possible other column limits would be clearly
untenable from a performance POV.

What could be improved is perhaps an option to always assume that the
entries are correctly right-aligned, thus determining the column ends from
the first line and setting them to maximum width - basically equivalent to
the user specifying

ascii.read('nix.txt', format='fixed_width_no_header', delimiter=' ',
           col_ends=[2,10,...])

This might be worth filing an issue on GitHub, and the documentation
probably could also be a bit clearer.

But for your specific case here, as you have found below, fixed_width is
not a very good recipe in the first place, since it is almost a textbook
example for a basic ascii[.no_header] format.

> Afterwards, I realized that
>
> ... format='no_header', ...
>
> would have been easier and safer. Thank goodness I finally recognized
> that all of my minus signs had disappeared!
>

Out of curiosity, what was giving you more problems with the Table reader
here; in particular, did

Table.read('nix.txt', format='ascii.no_header')

not work?

Cheers,
Derek

From npkuin at gmail.com Sat Jun 2 17:58:44 2018
From: npkuin at gmail.com (Paul Kuin)
Date: Sat, 2 Jun 2018 22:58:44 +0100
Subject: [AstroPy] astropy.io.ascii.FixedWidthNoHeader bug
In-Reply-To:
References: <0652DC3A-1FD5-4C64-B19A-57520485FBB8@astro.physik.uni-goettingen.de>
Message-ID:

I vote bug. If a format has been adopted, then the skipped columns can
easily be checked for being empty. Better fail than read in false values.
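The first-line inference described above and the suggested "fail rather
than read false values" check can be illustrated with a small plain-Python
sketch. It does not use or reproduce astropy's internals: infer_spans,
split_fixed and check_gaps_blank are hypothetical helper names, and the
sample rows assume that nix.txt is right-aligned as in the thread. Column
spans are taken from the non-blank runs of the first data line; each later
line is then checked for non-blank characters outside those spans.

```python
import re


def infer_spans(first_line):
    """Return (start, end) index pairs of the non-blank runs in first_line."""
    return [m.span() for m in re.finditer(r"\S+", first_line)]


def split_fixed(line, spans):
    """Cut a line at the given spans, as a naive fixed-width reader would."""
    return [line[start:end].strip() for start, end in spans]


def check_gaps_blank(line, spans):
    """Raise ValueError if any character outside the spans is not blank."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, end))
    for i, ch in enumerate(line):
        if i not in covered and not ch.isspace():
            raise ValueError(
                f"non-blank character {ch!r} at index {i} lies between columns"
            )


# Assumed right-aligned layout of nix.txt (comment line omitted).
lines = [
    "1 -68 40574.624730 40574.625190 40574.624025 1 0.0000200  0.0011645 100.61",
    "2   0     0.000000 40610.064100 40610.064500 0 0.0001000 -0.0003996 -34.52",
    "3   5 40612.670790 40612.671278 40612.670417 1 0.0001000  0.0008612  74.41",
]

spans = infer_spans(lines[0])

# Silent truncation: the minus sign of col8 sits in a gap of the first line.
print(split_fixed(lines[1], spans)[7])  # prints 0.0003996

# Fail-fast variant: report the offending line instead of returning bad data.
for number, line in enumerate(lines, start=1):
    try:
        check_gaps_blank(line, spans)
    except ValueError as err:
        print(f"line {number}: {err}")
```

With these rows only the second line fails the check, because its minus
sign in col8 occupies a position that the first line left blank. The check
costs one pass over each line, so it trades some speed for safety.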

From derek at astro.physik.uni-goettingen.de Mon Jun 4 10:45:04 2018
From: derek at astro.physik.uni-goettingen.de (Derek Homeier)
Date: Mon, 4 Jun 2018 16:45:04 +0200
Subject: [AstroPy] astropy.io.ascii.FixedWidthNoHeader bug
In-Reply-To:
References: <0652DC3A-1FD5-4C64-B19A-57520485FBB8@astro.physik.uni-goettingen.de>
Message-ID:

On 2 Jun 2018, at 11:58 pm, Paul Kuin wrote:
>
> i vote bug. if a format has been adopted, then the skipped columns can
> easily be checked for being empty. Better fail than read in false values.
>

Which skipped columns do you mean in this case?

Presently, get_fixedwidth_params() simply tries to split the data line on
the delimiter and then discards the empty columns that it gets from e.g.
several spaces in a row, i.e. it returns the minimal width of the columns
given on the first line. So this works both ways - if you read

# nix.txt
2   0     0.0      40610.064100 40610.064500 0 0.0001000 -0.0003996 -34.52
1 -68 40574.624730 40574.625190 40574.624025 1 0.0000200  0.0011645 100.61
3   5 40612.670790 40612.671278 40612.670417 1 0.0001000  0.0008612  74.41

ascii.read('nix.txt', format='fixed_width_no_header', delimiter=' ')
 col1  col2   col3      col4         col5      col6   col7     col8      col9
int64 int64 float64   float64      float64    int64 float64  float64   float64
----- ----- ------- ------------ ------------ ----- ------- ---------- -------
    2     0     0.0   40610.0641   40610.0645     0  0.0001 -0.0003996  -34.52
    1     8     4.6  40574.62519 40574.624025     1   2e-05  0.0011645  100.61
    3     5     2.6 40612.671278 40612.670417     1  0.0001  0.0008612   74.41

both the leading digits and the decimals after the first one are cut.

With only the information contained in the first data line it is
impossible to fix both cases, but an optional parameter for inferring only
the column ends or starts, respectively, would at least allow the user to
specify if leading digits (more likely IMO) or trailing decimal zeros are
left blank in the table.
FixedWidthHeader.get_fixedwidth_params() would need to be modified to
determine only either `ends` or `starts` in its second case, and from them
infer the other as in the third case

    else:
        # exactly one of col_starts or col_ends is given...
        ...

Cheers,
Derek

From npkuin at gmail.com Mon Jun 4 11:44:05 2018
From: npkuin at gmail.com (Paul Kuin)
Date: Mon, 4 Jun 2018 16:44:05 +0100
Subject: [AstroPy] astropy.io.ascii.FixedWidthNoHeader bug
In-Reply-To:
References: <0652DC3A-1FD5-4C64-B19A-57520485FBB8@astro.physik.uni-goettingen.de>
Message-ID:

I still sometimes use the "acut" program written by Francois Ochsenbein at
CDS (part of the cdsclient package), which is a fast unix script that can
be used to cut up a table. To me it could be put to good use. Indeed, if
you blindly determine that a column only has delimiters based on the first
line, then you are in trouble. Perhaps you need to run the "anafile"
program from the same cdsclient package first, to get a list of columns
which have solely the delimiter in them. Just a suggestion.

Cheers,

- Paul

--
* * * * * * * * http://www.mssl.ucl.ac.uk/~npmk/ * * * *
N.P.M. Kuin (n.kuin at ucl.ac.uk)
phone +44-(0)1483 (prefix) -204111 (work)
mobile +44(0)7908715953
skype ID: npkuin
Mullard Space Science Laboratory, University College London,
Holmbury St Mary, Dorking, Surrey RH5 6NT, U.K.

From bsipocz at gmail.com Thu Jun 7 14:02:46 2018
From: bsipocz at gmail.com (Brigitta Sipocz)
Date: Thu, 7 Jun 2018 19:02:46 +0100
Subject: [AstroPy] ANN: astropy bugfix releases v3.0.3 and v2.0.7 (LTS)
Message-ID:

Dear All,

Earlier this week, bugfix releases have been made for both the stable
(v3.0.3) and LTS (v2.0.7) editions of astropy.
They are available either on PyPI or on the usual conda channels.

In addition to various minor bugfixes, the main driver behind this
release was to provide an infrastructure fix for packages that use astropy
and run their tests with the latest pytest version.

The full list of fixes can be found in the changelog:

https://github.com/astropy/astropy/blob/v3.0.3/CHANGES.rst
and
https://github.com/astropy/astropy/blob/v2.0.7/CHANGES.rst

Thank you to everyone who contributed to these releases!

Cheers,
Brigitta