Silent data corruption in pandas, was Re: Python read text file columnwise

Peter Otten __peter__ at web.de
Sat Jan 12 05:12:43 EST 2019


Peter Otten wrote:

> shibashibani at gmail.com wrote:
> 
>> Hello
>>> 
>>> I'm very new in python. I have a file in the format:
>>> 
>>> 2018-05-31	16:00:00	28.90	81.77	4.3
>>> 2018-05-31	20:32:00	28.17	84.89	4.1
>>> 2018-06-20	04:09:00	27.36	88.01	4.8
>>> 2018-06-20	04:15:00	27.31	87.09	4.7
>>> 2018-06-28	04.07:00	27.87	84.91	5.0
>>> 2018-06-29	00.42:00	32.20	104.61	4.8
>> 
>> I would like to read this file in python column-wise.

> However, in the long term you may be better off with a tool like pandas:
> 
> >>> import pandas
> >>> pandas.read_table(
> ... "seismicity_R023E.txt", sep=r"\s+",
> ... names=["date", "time", "foo", "bar", "baz"],
> ... parse_dates=[["date", "time"]]
> ... )
>             date_time    foo     bar  baz
> 0 2018-05-31 16:00:00  28.90   81.77  4.3
> 1 2018-05-31 20:32:00  28.17   84.89  4.1
> 2 2018-06-20 04:09:00  27.36   88.01  4.8
> 3 2018-06-20 04:15:00  27.31   87.09  4.7
> 4 2018-06-28 04:00:00  27.87   84.91  5.0
> 5 2018-06-29 00:00:00  32.20  104.61  4.8
> 
> [6 rows x 4 columns]
> >>>
> 
> It will be harder in the beginning, but if you work with tabular data
> regularly it will pay off.

After posting the above I noticed that the malformed times in the last two
rows (04.07:00 and 00.42:00) were silently botched: pandas turned them into
04:00:00 and 00:00:00 without any warning. So I just spent an insane amount
of time trying to fix this from within pandas:

import datetime

import numpy
import pandas


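def parse_datetime(dt):
    # Repair times like "04.07:00" by turning every "." into ":";
    # strptime() then raises ValueError for anything still malformed.
    return datetime.datetime.strptime(
        dt.replace(".", ":"), "%Y-%m-%d %H:%M:%S"
    )


def date_parser(dates, times):
    # pandas passes the "date" and "time" columns as two arrays of strings.
    return numpy.array([
        parse_datetime(date + " " + time)
        for date, time in zip(dates, times)
    ])

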
df = pandas.read_table(
    "seismicity_R023E.txt", sep=r"\s+",
    names=["date", "time", "foo", "bar", "baz"],
    parse_dates=[["date", "time"]], date_parser=date_parser
)


print(df)

There's probably a better way as I am only a determined amateur...
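
One possibly simpler route, sketched here under the assumption that the only
defect in the file is a "." where a ":" belongs in the time column: read
"date" and "time" as plain strings, repair the time, and convert afterwards
with pandas.to_datetime() and an explicit format. That sidesteps the
date_parser argument (which, as far as I know, later pandas releases have
deprecated). The inline SAMPLE string stands in for seismicity_R023E.txt, and
the names raw, fixed_time and stamps are made up for illustration.

```python
import io

import pandas

# Rows copied from the thread; the last two use "." instead of ":".
SAMPLE = """\
2018-05-31 16:00:00 28.90 81.77 4.3
2018-05-31 20:32:00 28.17 84.89 4.1
2018-06-20 04:09:00 27.36 88.01 4.8
2018-06-20 04:15:00 27.31 87.09 4.7
2018-06-28 04.07:00 27.87 84.91 5.0
2018-06-29 00.42:00 32.20 104.61 4.8
"""

# Keep date and time as strings so pandas does not guess at their meaning.
raw = pandas.read_csv(
    io.StringIO(SAMPLE), sep=r"\s+",
    names=["date", "time", "foo", "bar", "baz"],
    dtype={"date": str, "time": str},
)

# Literal (non-regex) replacement of "." with ":" in the time column.
fixed_time = raw["time"].str.replace(".", ":", regex=False)

# With an explicit format, a value that still does not match raises an
# error instead of being converted to something plausible-looking.
stamps = pandas.to_datetime(
    raw["date"] + " " + fixed_time, format="%Y-%m-%d %H:%M:%S"
)

# Replace the two string columns with a single datetime column up front.
df = raw.drop(columns=["date", "time"])
df.insert(0, "date_time", stamps)

print(df)
```

The explicit format is the important part: it makes the conversion fail
loudly on input it does not understand, which is the opposite of the silent
behaviour that started this thread.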



