[Numpy-discussion] reading *big* inhomogenous text matrices *fast*?
Dan Lenski
Daniel.Lenski at seagate.com
Wed Aug 13 15:56:48 EDT 2008
Hi all,
I'm using NumPy to read and process data from ASCII UCD files. This is a
file format for describing unstructured finite-element meshes.
Most of the file consists of rectangular, numerical text matrices, easily
and efficiently read with loadtxt(). But there is one particularly nasty
section that consists of matrices with variable numbers of columns, like
this:
# index property type nodes
1 1 tet 620 583 1578 1792
2 1 tet 656 551 553 566
3 1 tet 1565 766 1600 1646
4 1 tet 1545 631 1566 1665
5 1 hex 1531 1512 1559 1647 1648 1732
6 1 hex 777 1536 1556 1599 1601 1701
7 1 quad 296 1568 1535 1604
8 1 quad 54 711 285 666
As you might guess, the "type" label in the third column indicates the
number of columns that follow.
Some of my files contain sections like this of *more than 1 million
lines*, so I need to be able to read them fast. I have not yet come up
with a good way to do this. What I do right now is split them into
separate arrays based on the "type" label:
import numpy as N
from numpy import array, uint

lines = [f.next() for i in range(n)]
lines = [l.split(None, 3) for l in lines]  # keep the node list as one string
id, prop, types, nodes = zip(*lines)       # THIS TAKES /FOREVER/
id = array(id, dtype=uint)
prop = array(prop, dtype=uint)
types = array(types)
cells = {}
for t in N.unique(types):
    these = N.nonzero(types == t)[0]       # nonzero returns a tuple of arrays
    # THIS NEXT LINE TAKES FOREVER
    these_nodes = array([nodes[ii].split() for ii in these], dtype=uint).T
    cells[t] = N.row_stack((id[these], prop[these], these_nodes))
This is really pretty slow and sub-optimal. Has anyone developed a more
efficient way to read arrays with variable numbers of columns?
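One direction worth trying (a sketch, not from the post above; `read_cells` is a
hypothetical helper, and it assumes the variable-width section has already been
sliced out of the file as a list of text lines): group the raw lines by cell
type *first*, so that each group has a fixed column count, then hand each group
to a single loadtxt() call instead of splitting every field in a Python loop.

```python
import numpy as np
from io import StringIO
from collections import defaultdict

def read_cells(lines):
    """Parse UCD-style cell lines with variable column counts.

    Groups lines by the type label in the third column, then parses
    each fixed-width group with one loadtxt() call per type.
    """
    groups = defaultdict(list)
    for line in lines:
        # Split off only index, property and type; keep the node
        # list as a single unsplit string.
        idx, prop, ctype, nodes = line.split(None, 3)
        groups[ctype].append(" ".join((idx, prop, nodes)))
    cells = {}
    for ctype, rows in groups.items():
        # Every row in a group now has the same number of columns,
        # so loadtxt handles the whole group at once; ndmin=2 keeps
        # single-row groups two-dimensional.
        block = np.loadtxt(StringIO("\n".join(rows)),
                           dtype=np.uint64, ndmin=2)
        cells[ctype] = block.T  # rows: index, property, node1, node2, ...
    return cells

sample = """\
1 1 tet 620 583 1578 1792
2 1 tet 656 551 553 566
5 1 hex 1531 1512 1559 1647 1648 1732
7 1 quad 296 1568 1535 1604"""
cells = read_cells(sample.splitlines())
```

This keeps the per-line Python work down to one split() per line; whether it
beats the original depends on how the per-type loadtxt() cost compares on real
million-line files.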
Dan