[Numpy-discussion] npreadtext: `numpy.loadtxt` in C
Ross Barnowski
rossbar15 at gmail.com
Thu Sep 16 14:22:25 EDT 2021
Hi all,
This is to announce [`npreadtext`](https://github.com/BIDS-numpy/npreadtext),
a drop-in replacement for `numpy.loadtxt` written in C for improved
performance. We are now at feature parity with `loadtxt`, and would greatly
appreciate your feedback & testing. We hope eventually to include
`npreadtext` in NumPy itself.
## Installation
`npreadtext` has been tested with NumPy v1.18 and higher and can be
installed using:
```
python -m pip install numpy
python -m pip install git+https://github.com/BIDS-numpy/npreadtext
```
To enable the C-accelerated version of `np.loadtxt`, monkey-patch NumPy:
```python
>>> import numpy as np
>>> from npreadtext import monkeypatch_numpy
>>> monkeypatch_numpy()
```
This replaces `np.loadtxt` with `npreadtext._loadtxt`.
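For anyone who wants to test before patching, here is a minimal sketch that checks whether two loaders agree on the same input. The helper name `same_result` and the sample text are hypothetical, not part of `npreadtext`; the only name taken from this announcement is `npreadtext._loadtxt`.

```python
import io


def same_result(reference, candidate, text, **kwargs):
    """Return True if both loaders parse `text` to the same values and dtype."""
    results = []
    for loader in (reference, candidate):
        # Each loader gets its own fresh file-like object.
        out = loader(io.StringIO(text), **kwargs)
        # NumPy arrays expose .tolist() and .dtype; plain Python results do not.
        values = out.tolist() if hasattr(out, "tolist") else out
        results.append((values, getattr(out, "dtype", None)))
    # Caveat: NaN != NaN, so inputs containing NaN will compare as unequal.
    return results[0] == results[1]


if __name__ == "__main__":
    try:
        import numpy as np
        from npreadtext import _loadtxt  # name as given in this announcement
    except ImportError:
        print("numpy and npreadtext are needed for the real comparison")
    else:
        sample = "# x, y\n1, 2\n3, 4\n"  # hypothetical sample input
        print(same_result(np.loadtxt, _loadtxt, sample, delimiter=","))
```

A `False` result on one of your own files would be a good candidate for a bug report.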
## Feedback
You may leave comments here or file issues on the
[project issue tracker](https://github.com/BIDS-numpy/npreadtext/issues).
Please also share text files that strain or break the reader.
## Benchmarks
Preliminary benchmarks show speedups over `np.loadtxt` ranging from roughly
1.4x to 19x, with the largest gains on the larger inputs:
```
python runtests.py --bench-compare monkeypatch-npreadtext bench_io

   npreadtext   np.loadtxt  speedup  function
+ 7.74±0.04ms    146±0.8ms    18.85  bench_io.LoadtxtCSVStructured.time_loadtxt_csv_struct_dtype
+  9.67±0.1ms    181±0.6ms    18.67  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 100000)
+    969±10μs   17.9±0.1ms    18.48  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 10000)
+     950±7μs  14.6±0.04ms    15.39  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 10000)
+ 9.65±0.03ms    146±0.2ms    15.13  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 100000)
+ 11.8±0.06ms    141±0.3ms    11.96  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 100000)
+  11.9±0.1ms    141±0.3ms    11.88  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 100000)
+  12.6±0.1ms    150±0.6ms    11.85  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100000)
+ 1.18±0.01ms   13.9±0.1ms    11.74  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 10000)
+ 1.19±0.01ms  13.9±0.09ms    11.68  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 10000)
+    1.27±0ms  14.7±0.06ms    11.64  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10000)
+ 12.4±0.06ms    140±0.6ms    11.28  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(100000)
+ 1.22±0.02ms  13.8±0.09ms    11.26  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(10000)
+  20.8±0.2μs    194±0.5μs     9.32  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 100)
+  20.4±0.2μs    162±0.3μs     7.97  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 100)
+    1.04±0ms  8.17±0.08ms     7.84  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3, 5, 7])
+     884±2μs  6.79±0.02ms     7.68  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3])
+ 1.56±0.01ms  12.0±0.05ms     7.68  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10000)
+ 16.1±0.05ms    122±0.3ms     7.56  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100000)
+ 23.4±0.04μs    163±0.9μs     6.94  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100)
+ 22.6±0.09μs    153±0.2μs     6.76  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 100)
+  22.9±0.5μs    154±0.7μs     6.72  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 100)
+  22.8±0.5μs    150±0.8μs     6.58  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(100)
+     809±8μs  5.10±0.02ms     6.30  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv(2)
+ 7.31±0.01ms  42.0±0.08ms     5.75  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(20000)
+     748±2μs  4.11±0.04ms     5.50  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(2000)
+  26.0±0.2μs    131±0.3μs     5.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100)
+  87.3±0.4μs      436±1μs     5.00  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(200)
+ 2.09±0.01ms  10.1±0.04ms     4.86  bench_io.LoadtxtReadUint64Integers.time_read_uint64(10000)
+    2.09±0ms  10.1±0.04ms     4.83  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(10000)
+   215±0.5μs     1.03±0ms     4.82  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(1000)
+   217±0.9μs     1.02±0ms     4.72  bench_io.LoadtxtReadUint64Integers.time_read_uint64(1000)
+   123±0.6μs      580±3μs     4.71  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(550)
+   124±0.8μs      573±4μs     4.63  bench_io.LoadtxtReadUint64Integers.time_read_uint64(550)
+ 4.15±0.01ms  14.4±0.05ms     3.46  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10000)
+  58.6±0.1ms    195±0.8ms     3.33  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(10000)
+  41.8±0.1ms      139±1ms     3.33  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100000)
+ 64.6±0.09ms      215±1ms     3.32  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(500)
+  64.9±0.2ms      215±2ms     3.30  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(0)
+  55.0±0.5μs    154±0.4μs     2.81  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100)
+  23.9±0.1μs     60.1±1μs     2.51  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(20)
+  12.1±0.2μs   29.4±0.2μs     2.44  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 10)
+ 12.0±0.05μs   26.2±0.2μs     2.18  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 10)
+ 12.5±0.08μs  26.1±0.09μs     2.08  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10)
+ 12.3±0.04μs   24.9±0.4μs     2.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 10)
+  12.3±0.1μs   24.8±0.2μs     2.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 10)
+ 12.2±0.04μs   24.5±0.1μs     2.01  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(10)
+  13.3±0.1μs   23.4±0.1μs     1.76  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10)
+  18.5±0.3μs   25.6±0.5μs     1.39  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10)
```
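The speedup column appears to be the `np.loadtxt` time divided by the `npreadtext` time. The sketch below recomputes it from the displayed cells; the helper names `parse_time` and `speedup` are hypothetical and only meant for reading the table. Because the displayed timings are rounded, a recomputed ratio can differ from the listed one in the last digit.

```python
import re

# Unit suffixes seen in the table, mapped to seconds. Both the Greek mu
# (U+03BC) and the micro sign (U+00B5) are accepted for microseconds.
_UNITS = {"s": 1.0, "ms": 1e-3, "\u03bcs": 1e-6, "\u00b5s": 1e-6, "us": 1e-6, "ns": 1e-9}
_CELL = re.compile(r"([\d.]+)\u00b1[\d.]+(ms|[\u03bc\u00b5u]s|ns|s)")


def parse_time(cell):
    """Convert a cell such as '7.74±0.04ms' to seconds (central value only)."""
    match = _CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized timing: {cell!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def speedup(npreadtext_cell, loadtxt_cell):
    """np.loadtxt time over npreadtext time; values above 1 favor npreadtext."""
    return parse_time(loadtxt_cell) / parse_time(npreadtext_cell)


# First row of the table: listed speedup is 18.85.
print(round(speedup("7.74±0.04ms", "146±0.8ms"), 2))
```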
The repository includes
[procedures for running benchmarks locally](https://github.com/BIDS-numpy/npreadtext#benchmarking).