"All-pairs vector intersection problem on sparse vectors"
W, and asked to find all "close" pairs of vectors (v,w) \in V x W. The pair (v,w) is close if the # of common 1s in v and w is greater than some threshold. The dimension of each vector (D) is assumed to be huge, but the vectors are sparse and contain only a small fixed number (m) of 1s. In the above setting, we are interested in algorithms that beat the naive O(n^2 m) algorithm, where n is the number of vectors in V or W. I was originally interested in the problem due to its application to a core sequence comparison problem. It also has applications in data-mining.
I'll talk about two algorithms for an approximate version of the problem. Both algorithms are adaptations of existing near-neighbour algorithms, and work by reducing the sparse vectors with huge dimension to dense vectors with low dimension. At the end of the talk, I'll discuss some theoretical problems that came out of my experience implementing the sequence comparison all-pairs application.