[Numpy-discussion] Distance Matrix speed

Sebastian Beca sebastian.beca at gmail.com
Fri Jun 16 19:01:44 EDT 2006


Thanks! Avoiding the inner loop is MUCH faster (roughly 20-300 times faster than the original). Nevertheless, I don't think I can use hypot, as it only works for two dimensions. The general problem I have is:

A = random( [C, K] )
B = random( [N, K] )

C ~ 1-10
N ~ Large (thousands, millions.. i.e. my dataset)
K ~ 2-100 (dimensions of my problem, i.e. not fixed a priori.)

I adapted your proposed version to this for K dimensions:

def d4():
    d = zeros([4, 1000], dtype=float)
    for i in range(4):
        xy = A[i] - B
        d[i] = sqrt( sum(xy**2, axis=1) )
    return d
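One possible alternative to d4, offered only as a sketch: expand the squared distance as |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, so that the whole C x N matrix comes from a single matrix product and the Python loop over the rows of A disappears. The function name pairwise_dist is hypothetical, and the code assumes a current NumPy imported as np rather than the star-imports used in this thread. The expansion can lose some precision when two points are nearly identical, so tiny negative values are clipped before the square root.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between rows of A (C, K) and rows of B (N, K).

    Returns an array of shape (C, N). Hypothetical helper, not from the thread.
    """
    a2 = (A * A).sum(axis=1)[:, None]   # squared norms of A rows, shape (C, 1)
    b2 = (B * B).sum(axis=1)[None, :]   # squared norms of B rows, shape (1, N)
    # |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, broadcast to shape (C, N)
    d2 = a2 + b2 - 2.0 * np.dot(A, B.T)
    # Rounding can leave tiny negative values; clip them before the sqrt
    np.maximum(d2, 0.0, out=d2)
    return np.sqrt(d2)
```

The largest temporary here has shape (C, N), the same as the result, so memory use stays comparable to d4. With C only around 1-10, the loop in d4 runs just a handful of times, so any speedup may be modest and is worth timing on the real data.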

Maybe there's another alternative to d4?

Thanks again,

Sebastian.

def d2():
    d = zeros([4, 10000], dtype=float)
    for i in range(4):
        xy = A[i] - B
        d[i] = xy[:,0]**2 + xy[:,1]**2
    return d

This is something like 250 times as fast as the naive Python solution; another five times faster than the fastest distance computing version that I could come up with (using hypot).

-tim
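To check figures like these on your own machine, a small timeit harness along the following lines could be used. This is only a sketch: the array shapes (4 x 2 and 10000 x 2) are assumed from the sizes hard-coded in d2, the names d_hypot and d_squared are made up for illustration, and the timings will differ by hardware and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 2))        # assumed shape: 4 reference points in 2-D
B = rng.random((10000, 2))    # assumed shape: 10000 data points in 2-D

def d_hypot():
    # True Euclidean distance via hypot (2-D only)
    d = np.zeros((A.shape[0], B.shape[0]))
    for i in range(A.shape[0]):
        xy = A[i] - B
        d[i] = np.hypot(xy[:, 0], xy[:, 1])
    return d

def d_squared():
    # Squared distance, as in d2: no square root is taken
    d = np.zeros((A.shape[0], B.shape[0]))
    for i in range(A.shape[0]):
        xy = A[i] - B
        d[i] = xy[:, 0]**2 + xy[:, 1]**2
    return d

for f in (d_hypot, d_squared):
    runs = 100
    seconds = timeit.timeit(f, number=runs)
    print(f.__name__, seconds / runs, "s per call")
```

Note that d_squared, like d2, returns squared distances, which is enough when only the ordering of distances matters (for example, finding the nearest point).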


Numpy-discussion mailing list
Numpy-discussion at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


