[Numpy-discussion] huge array calculation speed
Andrew Dalke
dalke at dalkescientific.com
Thu Jul 10 19:48:12 EDT 2008
On Jul 10, 2008, at 6:38 PM, Dan Lussier wrote:
> What I'm trying to do is to calculate the total total potential
> energy and coordination number of each atom within a relatively large
> simulation.
Anne Archibald already responded:
> you could try a three-dimensional grid (if your atoms aren't
> too clumpy) or octrees (if they are somewhat clumpy); kd-trees are
> probably overkill (since they're designed for higher-dimensional
> problems).
I implemented something like what you did in VMD about 14(!) years ago.
(VMD is a molecular structure visualization program.) I needed to
find which atoms might be bonded to each other. I assume you have
a cutoff value, which means you don't need to worry about the general
nearest-neighbor problem.
Molecules are nice because the distributions are not clumpy. Atoms
don't get that close to nor that far from other atoms. I implemented
a grid. It goes something like this:
import collections
# Search within 3 A
d = 3.0
coordinates = [
("C1", 34.287, 50.970, 115.006),
("O1", 34.972, 51.144, 113.870),
("C2", 33.929, 52.255, 115.739),
("N2", 34.753, 52.387, 116.954),
("C3", 32.448, 52.219, 116.121),
("O3", 32.033, 50.877, 116.336),
("C4", 31.528, 52.817, 115.054),
("C5", 30.095, 53.013, 115.558),
("C6", 29.226, 53.835, 114.609),
("C7", 29.807, 55.217, 114.304),
("C8", 29.092, 55.920, 113.147),
("C9", 29.525, 57.375, 112.971),
("C10", 28.409, 58.267, 112.422),
("C11", 28.828, 59.734, 112.294),
("C12", 27.902, 60.542, 111.385),
("C13", 26.617, 60.996, 112.085),
("C14", 26.182, 62.401, 111.667),
("C15", 24.739, 62.731, 112.054),
("C16", 24.251, 64.046, 111.441),
("C17", 23.026, 64.624, 112.150),
("C18", 22.631, 66.007, 111.623),]
def dict_of_list():
return collections.defaultdict(list)
def dict_of_dict():
return collections.defaultdict(dict_of_list)
grid = collections.defaultdict(dict_of_dict)
for atom in coordinates:
name,x,y,z = atom
i = int(x/d)
j = int(y/d)
k = int(z/d)
grid[i][j][k].append(atom)
query_name, query_x, query_y, query_z = coordinates[8]
query_i = int(query_x/d)
query_j = int(query_y/d)
query_k = int(query_z/d)
# Given the search distance 'd', only need to check cells up to 1
unit away
within_names = set()
for i in range(query_i-1, query_i+2):
for j in range(query_j-1, query_j+2):
for k in range(query_k-1, query_k+2):
for atom in grid[i][j][k]:
name, x, y, z = atom
print "Check", atom,
query_d2 = (x-query_x)**2+(y-query_y)**2+(z-query_z)**2
if query_d2 < d*d:
print "Within!", query_d2
within_names.add(name)
else:
print "Too far", query_d2
print len(within_names), "were close enough"
# Linear search to verify
count = 0
for name, x, y, z in coordinates:
query_d2 = (x-query_x)**2+(y-query_y)**2+(z-query_z)**2
if query_d2 < d*d:
print "Within", name
if name not in within_names:
raise AssertionError(name)
count += 1
if count != len(within_names):
raise AssertionError("count problem")
You can also grab the KDTree from Biopython, which is implemented in C.
http://www.biopython.org/DIST/docs/api/Bio.KDTree.KDTree'-module.html
It was designed for just this task.
Andrew
dalke at dalkescientific.com
More information about the NumPy-Discussion
mailing list