Precision reading and writing data frames to CSV

Paulo da Silva p_s_d_a_s_i_l_v_a_ns at netcabo.pt
Sat Mar 11 21:26:28 EST 2017


At 23:29 on 11-03-2017, Erik wrote:
> Hi Paulo,
> 
> On 11/03/17 22:01, Paulo da Silva wrote:
...

>> Now my question is: can successive cumulative errors occur? That is,
>> if I read a file, add lines or change a few, but keep most of the
>> other lines untouched, will those untouched lines nevertheless keep
>> changing every time?
> 
> Firstly, if the values are changing, then they were first written by a
> process that has a very strict decimal representation and then read in
> and re-written by something that has a less strict representation (I'm
> guessing that by "float/double values" you mean IEEE 754).
I don't know. I can tell you that I'm using pandas' read_csv and
DataFrame.to_csv to read and write.
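For reference, a minimal runnable sketch of that round trip (the sample
data is illustrative; recent pandas versions write floats this way by
default):

    import io
    import pandas as pd

    csv_text = "a,b\n0.1,1.5\n2.675,3.0\n"    # illustrative sample data
    df = pd.read_csv(io.StringIO(csv_text))   # floats parse to IEEE 754 doubles

    # With float_format left at its default (None), to_csv writes each float
    # with str(), the shortest decimal string that parses back to the same
    # double.  Passing e.g. float_format="%.6f" would truncate instead.
    print(df.to_csv(index=False))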

> 
> Because IEEE 754 can't represent all values *exactly*, some input values
> will change to something close, as you have seen.
> 
> However, if the input value matches something that IEEE 754 *can*
> represent exactly, then it will not change.
> 
> Whether you will see _cumulative_ errors depends on whether the output
> stage of the first pass truncates the output with a field width
> specifier or similar. If not, then you should see the initial change
> you've noticed and then nothing more after that for that particular datum.
Ok, I like that.
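A quick way to check both points: a decimal string with more digits than
a double can hold changes once, on the first read, and then survives every
later round trip unchanged (assuming the default untruncated output). A
minimal sketch, with an illustrative one-column CSV:

    import io
    import pandas as pd

    csv_in = "x\n0.1234567890123456789\n1.5\n"   # first value is inexact,
                                                 # 1.5 is exactly representable

    # float_precision="round_trip" asks read_csv for exact nearest-double
    # parsing, so the comparison below is guaranteed rather than likely.
    out1 = pd.read_csv(io.StringIO(csv_in),
                       float_precision="round_trip").to_csv(index=False)
    out2 = pd.read_csv(io.StringIO(out1),
                       float_precision="round_trip").to_csv(index=False)

    print(out1)           # the long value has snapped to the nearest double
    print(out1 == out2)   # True: nothing more changes after the first pass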

> 
> 
> Having said all that, if absolute precision is what you need, then you
> would be better off using Python's decimal.Decimal type instead of
> float/double when processing your files.
No, I don't think that is needed. It would unnecessarily complicate my
scripts, which do lots of calculations (matrix-like operations). I don't
care about fine precision, only about cumulative changes; I was unsure
whether the data would keep changing over time.
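For anyone reading this in the archive who does need exact decimal round
trips, here is a minimal sketch of what Erik suggests (the column name and
values are illustrative):

    import io
    from decimal import Decimal
    import pandas as pd

    csv_in = "x\n0.1\n2.675\n"

    # Parse the column as Decimal instead of float: values stay exact
    # decimals, so arithmetic and round trips add no binary rounding error.
    df = pd.read_csv(io.StringIO(csv_in), converters={"x": Decimal})

    print(df["x"].iloc[1] * 2)     # 5.350, exact
    print(df.to_csv(index=False))  # x / 0.1 / 2.675 -- text unchanged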

Thank you very much.



