[Numpy-discussion] Distance Matrix speed

Tim Hochberg tim.hochberg at cox.net
Mon Jun 19 17:39:14 EDT 2006


Tim Hochberg wrote:

Sebastian Beca wrote:

I just ran Alan's script and I don't get consistent results for 100 repetitions. I boosted it to 1000 and ran it several times. Which one was faster varied a lot, but both came within about +-1.5% of each other.

When it comes to scaling, for my problem (fuzzy clustering) N is the size of the dataset, which should span from thousands to millions. C is the number of clusters, usually less than 10, and K, the number of features (the dimension I want to sum over), is also usually less than 100. So mainly I'm concerned with scaling across N. I tried C=3, K=4, N=1000, 2500, 5000, 7500, 10000. Also using 1000 runs, the results were:

distbeca:     1.1, 4.5, 16, 28, 37
distloehner1: 1.7, 6.5, 22, 35, 47

I also tried scaling across K, with C=3, N=2500, and K=5-50. I couldn't get any consistent results for small K, but both perform about equally (+-2%) for large K (K>15). I'm not sure how these work in the backend, so I can't argue as to why one should scale better than the other.

The reason I suspect that distbeca should scale better is that distloehner1 generates an intermediate array of size NxCxK, while distbeca produces intermediate matrices that are only NxK or CxK. For large problems, allocating that extra memory and fetching it into and out of the cache can be a bottleneck (see the short back-of-the-envelope sketch after dist3 below for a rough illustration of the sizes involved).

Here's another version that allocates even less in the way of temporaries, at the expense of being borderline incomprehensible. It still allocates an NxK temporary array, but it allocates it once ahead of time and then reuses it for all subsequent calculations. You're welcome to use it, but I'm not sure I'd recommend it unless this function is really a speed bottleneck, as it could end up being hard to read later (I left implementing the N < C case as an exercise to the reader....). I have another idea that might reduce the memory overhead still further; if I get a chance I'll try it out and let you know if it results in a further speed up.

-tim

def dist2(A, B):
    d = zeros([N, C], dtype=float)
    if N < C:
        raise NotImplementedError
    else:
        tmp = empty([N, K], float)
        tmp0 = tmp[:,0]
        rangek = range(1, K)
        for j in range(C):
            subtract(A, B[j], tmp)
            tmp *= tmp
            for k in rangek:
                tmp0 += tmp[:,k]
            sqrt(tmp0, d[:,j])
    return d

Speaking of scaling: I tried this with K=25000 (10x greater than Sebastian's original numbers). Much to my surprise, it performed somewhat worse than Sebastian's dist() with large K. Below is a modified dist2 that performs about the same (marginally better here) for large K, as well as a dist3 that performs about 50% better at both K=2500 and K=25000.

-tim

def dist2(A, B):
    # N, C, K are module-level globals, and the numpy names (empty, subtract,
    # sum, sqrt) come from "from numpy import *", as in Alan's script below.
    d = empty([N, C], dtype=float)
    if N < C:
        raise NotImplementedError
    else:
        # One reusable N x K temporary: subtract, square in place, reduce over K.
        tmp = empty([N, K], float)
        for j in range(C):
            subtract(A, B[j], tmp)
            tmp **= 2
            d[:,j] = sum(tmp, axis=1)
            sqrt(d[:,j], d[:,j])
    return d
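
As a quick sanity check, dist2 can be compared against the plain broadcasting formula on a small problem. This is just a minimal sketch, assuming the same module-level N, C, K globals and "from numpy import *" convention used in Alan's script at the bottom of the thread (Tim's dist3 follows right after):

from numpy import *

# Set the globals that dist2 reads (N >= C so the implemented branch runs).
N, C, K = 6, 3, 4
A = random.random([N, K])
B = random.random([C, K])

# Reference result via broadcasting (the distloehner1 approach).
ref = sqrt(((A[:, newaxis, :] - B[newaxis, :, :])**2).sum(axis=2))
print(allclose(dist2(A, B), ref))   # expect True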

def dist3(A, B):
    # Same globals/import conventions as dist2 above.
    d = zeros([N, C], dtype=float)
    rangek = range(K)
    if N < C:
        raise NotImplementedError
    else:
        # Only a length-N temporary: accumulate squared differences one
        # feature (k) at a time, then take the square root in place.
        tmp = empty([N], float)
        for j in range(C):
            for k in rangek:
                subtract(A[:,k], B[j,k], tmp)
                tmp **= 2
                d[:,j] += tmp
            sqrt(d[:,j], d[:,j])
    return d
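
To put rough numbers on the memory point above, here is a back-of-the-envelope sketch; the problem size is only illustrative (near the upper end of the ranges Sebastian describes), not something measured in this thread:

# Illustrative problem size: N at the top of Sebastian's range, C < 10, K < 100.
N, C, K = 10000, 10, 100
bytes_per_float = 8   # float64

# distloehner1 materializes an N x C x K intermediate before reducing over K.
print("N*C*K temporary: %6.1f MB" % (N * C * K * bytes_per_float / 1e6))   # ~80 MB
# dist()/dist2 only ever hold an N x K temporary at a time.
print("N*K   temporary: %6.1f MB" % (N * K * bytes_per_float / 1e6))       # ~8 MB
# dist3 gets by with a single length-N temporary.
print("N     temporary: %6.1f MB" % (N * bytes_per_float / 1e6))           # ~0.1 MB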

Regards,

Sebastian.

On 6/19/06, Alan G Isaac <aisaac at american.edu> wrote:

On Sun, 18 Jun 2006, Tim Hochberg apparently wrote:

Alan G Isaac wrote:

On Sun, 18 Jun 2006, Sebastian Beca apparently wrote:

def dist():
    d = zeros([N, C], dtype=float)
    if N < C:
        for i in range(N):
            xy = A[i] - B
            d[i,:] = sqrt(sum(xy**2, axis=1))
        return d
    else:
        for j in range(C):
            xy = A - B[j]
            d[:,j] = sqrt(sum(xy**2, axis=1))
        return d

But that is 50% slower than Johannes's version:

def distloehner1():
    d = A[:, newaxis, :] - B[newaxis, :, :]
    d = sqrt((d**2).sum(axis=2))
    return d

Are you sure about that? I just ran it through timeit, using Sebastian's array sizes, and I get Sebastian's version being 150% faster. This could well be cache-size dependent, so may vary from box to box, but I'd expect Sebastian's current version to scale better in general.

No, I'm not sure. Script attached at bottom. Most recent output follows; for reasons I have not determined, it doesn't match my previous runs ...

Alan

execfile(r'c:\temp\temp.py')
distbeca    :   3.042277
distloehner1:   3.170026

#################################
# THE SCRIPT
import sys
sys.path.append(r"c:\temp")
import numpy
from numpy import *
import timeit

K = 10
C = 2500
N = 3
# One could switch around C and N now.
A = numpy.random.random([N, K])
B = numpy.random.random([C, K])

# beca
def distbeca():
    d = zeros([N, C], dtype=float)
    if N < C:
        for i in range(N):
            xy = A[i] - B
            d[i,:] = sqrt(sum(xy**2, axis=1))
        return d
    else:
        for j in range(C):
            xy = A - B[j]
            d[:,j] = sqrt(sum(xy**2, axis=1))
        return d

# loehnert
def distloehner1():
    # drawback: memory usage temporarily doubled
    # solution see below
    d = A[:, newaxis, :] - B[newaxis, :, :]
    # written as 3 expressions for more clarity
    d = sqrt((d**2).sum(axis=2))
    return d

if __name__ == "__main__":
    t1 = timeit.Timer('distbeca()', 'from temp import distbeca').timeit(100)
    t8 = timeit.Timer('distloehner1()', 'from temp import distloehner1').timeit(100)
    fmt = "%-10s:\t" + "%10.6f"
    print fmt % ('distbeca', t1)
    print fmt % ('distloehner1', t8)
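
If one wanted to fold dist2 and dist3 into the same comparison, a minimal sketch replacing the __main__ block might look like the following. It assumes the two functions above are added to temp.py and that N and C are swapped first (e.g. K = 10, C = 3, N = 2500), since dist2/dist3 only implement the N >= C branch; none of this is from the original thread:

# Hypothetical replacement for the __main__ block in temp.py, timing all four variants.
import timeit

stmts = {
    'distbeca': 'distbeca()',
    'distloehner1': 'distloehner1()',
    'dist2': 'dist2(A, B)',
    'dist3': 'dist3(A, B)',
}

if __name__ == "__main__":
    for name, stmt in stmts.items():
        t = timeit.Timer(stmt, 'from temp import %s, A, B' % name).timeit(100)
        print("%-12s:\t%10.6f" % (name, t))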


Numpy-discussion mailing list
Numpy-discussion at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion




