[Numpy-discussion] npreadtext: `numpy.loadtxt` in C

Ross Barnowski rossbar15 at gmail.com
Thu Sep 16 14:22:25 EDT 2021


Hi all,

This is to announce [`npreadtext`](https://github.com/BIDS-numpy/npreadtext),
a drop-in replacement for `numpy.loadtxt` written in C for improved
performance. We are now at feature parity with `loadtxt`, and would greatly
appreciate your feedback & testing. We hope eventually to include
`npreadtext` in NumPy itself.

## Installation

`npreadtext` has been tested with NumPy v1.18 and higher and can be
installed using:

```
python -m pip install numpy
python -m pip install git+https://github.com/BIDS-numpy/npreadtext
```

To enable the C-accelerated version of `np.loadtxt`, monkey-patch NumPy:

```python
>>> import numpy as np
>>> from npreadtext import monkeypatch_numpy
>>> monkeypatch_numpy()
```

This replaces `np.loadtxt` with `npreadtext._loadtxt`.
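
As a minimal sketch of how the patch might be applied in a script, the snippet below patches NumPy only when `npreadtext` is importable and otherwise keeps the stock reader. It assumes `monkeypatch_numpy` is called without arguments, which may not match the actual signature; the sample data and the `patched` flag are illustrative only.

```python
import io

import numpy as np

try:
    # Assumption: monkeypatch_numpy() takes no arguments.
    from npreadtext import monkeypatch_numpy

    monkeypatch_numpy()
    patched = True
except ImportError:
    # npreadtext is not installed; np.loadtxt stays the stock NumPy reader.
    patched = False

# Call sites are unchanged either way: the reader is still reached
# through np.loadtxt with the usual keyword arguments.
sample = io.StringIO("1,2,3\n4,5,6\n")
data = np.loadtxt(sample, delimiter=",", dtype=np.int64)
print(patched, data.shape)
```

Because the patch swaps the function on the `numpy` module itself, code that calls `np.loadtxt` should not need any other changes.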

## Feedback

You may leave comments here or file issues on the [project issue tracker](
https://github.com/BIDS-numpy/npreadtext/issues). Please also share text
files that strain or break the reader.
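
One way to produce a useful report is to parse the same text with both readers and note where they disagree. The helper below is only a sketch: `compare_readers` is a hypothetical name, not part of `npreadtext`, and the fallback to `np.loadtxt` exists solely so the example runs without the package installed.

```python
import io

import numpy as np


def compare_readers(text, reference, candidate, **kwargs):
    """Parse `text` with both readers and report whether the results agree."""
    expected = reference(io.StringIO(text), **kwargs)
    try:
        actual = candidate(io.StringIO(text), **kwargs)
    except Exception as exc:
        # An exception raised only by the candidate is itself worth reporting.
        return False, f"candidate raised {type(exc).__name__}: {exc}"
    if expected.dtype != actual.dtype or expected.shape != actual.shape:
        return False, "dtype or shape differ"
    # Note: NaN values compare unequal here, so files with NaNs need extra care.
    if not np.array_equal(expected, actual):
        return False, "values differ"
    return True, "results match"


try:
    # `_loadtxt` is the name given in the announcement.
    from npreadtext import _loadtxt as candidate
except ImportError:
    candidate = np.loadtxt  # fallback so the sketch still runs

ok, message = compare_readers("1.5 2.5\n3.5 4.5\n", np.loadtxt, candidate)
print(ok, message)
```

Run this before monkey-patching NumPy; otherwise `np.loadtxt` and the candidate are the same function and the comparison is trivially true.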

## Benchmarks

Preliminary benchmarks show speedups over `np.loadtxt` ranging from about
1.4x to nearly 19x, with the largest gains on the larger inputs:

```
python runtests.py --bench-compare monkeypatch-npreadtext bench_io

       npreadtext       np.loadtxt  speedup  function

+     7.74±0.04ms        146±0.8ms    18.85  bench_io.LoadtxtCSVStructured.time_loadtxt_csv_struct_dtype
+      9.67±0.1ms        181±0.6ms    18.67  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 100000)
+        969±10μs       17.9±0.1ms    18.48  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 10000)
+         950±7μs      14.6±0.04ms    15.39  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 10000)
+     9.65±0.03ms        146±0.2ms    15.13  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 100000)
+     11.8±0.06ms        141±0.3ms    11.96  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 100000)
+      11.9±0.1ms        141±0.3ms    11.88  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 100000)
+      12.6±0.1ms        150±0.6ms    11.85  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100000)
+     1.18±0.01ms       13.9±0.1ms    11.74  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 10000)
+     1.19±0.01ms      13.9±0.09ms    11.68  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 10000)
+        1.27±0ms      14.7±0.06ms    11.64  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10000)
+     12.4±0.06ms        140±0.6ms    11.28  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(100000)
+     1.22±0.02ms      13.8±0.09ms    11.26  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(10000)
+      20.8±0.2μs        194±0.5μs     9.32  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 100)
+      20.4±0.2μs        162±0.3μs     7.97  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 100)
+        1.04±0ms      8.17±0.08ms     7.84  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3, 5, 7])
+         884±2μs      6.79±0.02ms     7.68  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3])
+     1.56±0.01ms      12.0±0.05ms     7.68  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10000)
+     16.1±0.05ms        122±0.3ms     7.56  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100000)
+     23.4±0.04μs        163±0.9μs     6.94  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100)
+     22.6±0.09μs        153±0.2μs     6.76  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 100)
+      22.9±0.5μs        154±0.7μs     6.72  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 100)
+      22.8±0.5μs        150±0.8μs     6.58  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(100)
+         809±8μs      5.10±0.02ms     6.30  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv(2)
+     7.31±0.01ms      42.0±0.08ms     5.75  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(20000)
+         748±2μs      4.11±0.04ms     5.50  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(2000)
+      26.0±0.2μs        131±0.3μs     5.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100)
+      87.3±0.4μs          436±1μs     5.00  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(200)
+     2.09±0.01ms      10.1±0.04ms     4.86  bench_io.LoadtxtReadUint64Integers.time_read_uint64(10000)
+        2.09±0ms      10.1±0.04ms     4.83  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(10000)
+       215±0.5μs         1.03±0ms     4.82  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(1000)
+       217±0.9μs         1.02±0ms     4.72  bench_io.LoadtxtReadUint64Integers.time_read_uint64(1000)
+       123±0.6μs          580±3μs     4.71  bench_io.LoadtxtReadUint64Integers.time_read_uint64_neg_values(550)
+       124±0.8μs          573±4μs     4.63  bench_io.LoadtxtReadUint64Integers.time_read_uint64(550)
+     4.15±0.01ms      14.4±0.05ms     3.46  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10000)
+      58.6±0.1ms        195±0.8ms     3.33  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(10000)
+      41.8±0.1ms          139±1ms     3.33  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100000)
+     64.6±0.09ms          215±1ms     3.32  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(500)
+      64.9±0.2ms          215±2ms     3.30  bench_io.LoadtxtCSVSkipRows.time_skiprows_csv(0)
+      55.0±0.5μs        154±0.4μs     2.81  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100)
+      23.9±0.1μs         60.1±1μs     2.51  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(20)
+      12.1±0.2μs       29.4±0.2μs     2.44  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int64', 10)
+     12.0±0.05μs       26.2±0.2μs     2.18  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('int32', 10)
+     12.5±0.08μs      26.1±0.09μs     2.08  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10)
+     12.3±0.04μs       24.9±0.4μs     2.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float64', 10)
+      12.3±0.1μs       24.8±0.2μs     2.02  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('float32', 10)
+     12.2±0.04μs       24.5±0.1μs     2.01  bench_io.LoadtxtCSVComments.time_comment_loadtxt_csv(10)
+      13.3±0.1μs       23.4±0.1μs     1.76  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10)
+      18.5±0.3μs       25.6±0.5μs     1.39  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10)
```

The repository includes [procedures for running benchmarks locally](
https://github.com/BIDS-numpy/npreadtext#benchmarking).
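
For a quick check outside the benchmark suite, a rough timing can be taken with the standard library. This sketch is not the project's benchmarking procedure; `make_csv` and `time_loadtxt` are hypothetical helpers, and the figures it prints will not be directly comparable to the table above.

```python
import io
import timeit

import numpy as np


def make_csv(n_rows, n_cols=3):
    """Build an in-memory CSV of integers with n_rows identical lines."""
    row = ",".join(str(i) for i in range(n_cols))
    return "\n".join([row] * n_rows) + "\n"


def time_loadtxt(reader, text, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` single runs."""
    return min(
        timeit.repeat(
            lambda: reader(io.StringIO(text), delimiter=",", dtype=np.int64),
            number=1,
            repeat=repeat,
        )
    )


text = make_csv(1000)
baseline = time_loadtxt(np.loadtxt, text)
print(f"np.loadtxt on 1000 rows: {baseline * 1e3:.2f} ms")
```

Passing a different loadtxt-compatible callable as `reader` gives a second timing to compare against `baseline`.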