Tuesday, November 11, 2008

Clustering by affinity propagation

Affinity propagation (AP) is a novel clustering algorithm, first published in NIPS 05 ("Mixture Modeling by Affinity Propagation") and then in Science, Feb 07 ("Clustering by Passing Messages Between Data Points"). Slightly different from the common clustering setting, it tries to find a subset of the data points, called "exemplars", to serve as cluster centers. It iteratively passes two types of messages between each pair of data points. The first type of message is called responsibility: how much the source data point prefers the target data point as its exemplar. The other type is called availability: how much the source data point would like to be the exemplar of the target data point. These two types of messages influence each other (for example, if b is more available to a than c is, then a would probably prefer b to c as its exemplar), so we can update them iteratively until convergence, and then combine responsibility with availability to determine the clustering. Very intuitive, right? A nice feature is that AP determines the number of clusters automatically, based on an a priori "preference" value that specifies how suitable each data point is as an exemplar.
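The two update rules can be written down quite compactly. Here is a minimal NumPy sketch following the update equations in the Science paper; the damping factor and iteration count are illustrative choices, not values prescribed by the paper:

```python
import numpy as np

def affinity_propagation(S, damping=0.9, iters=200):
    """Sketch of AP message passing on a similarity matrix S (n x n).

    S[i, k] is the similarity of point i to candidate exemplar k;
    the diagonal S[k, k] holds the "preference" that controls how
    readily point k becomes an exemplar (and thus the cluster count).
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    for _ in range(iters):
        # Responsibility: how strongly i prefers k as its exemplar,
        # relative to i's best competing candidate:
        #   r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first_max = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second_max = AS.max(axis=1)
        R_new = S - first_max[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second_max
        R = damping * R + (1 - damping) * R_new
        # Availability: how appropriate it is for i to pick k, given
        # the support k receives from other points:
        #   a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        #   a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        dA = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, dA)
        A = damping * A + (1 - damping) * A_new
    # Combine the two messages to read off each point's exemplar.
    return np.argmax(A + R, axis=1)
```

With similarities set to negative squared distances and the preferences set to the median similarity (the default suggested in the paper), this recovers a sensible clustering without specifying the number of clusters in advance.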

What impressed me is that such an intuitive algorithm is actually derived from a standard inference algorithm on factor graphs. A factor graph can be constructed that represents the factorization of the clustering's energy function, together with the constraint that each exemplar must belong to its own cluster. The max-sum algorithm (a message-passing algorithm closely related to loopy belief propagation on Bayesian networks) can then be used to find an approximate solution (i.e., a clustering) that minimizes the energy function. The original max-sum algorithm has exponential time complexity on this factor graph, but it can be simplified to O(n^2) per iteration, resulting in the AP algorithm.
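Concretely, if c_i denotes the exemplar chosen by point i, the energy being minimized can be written (following the formulation in the Science paper) as a sum of similarity terms plus one constraint term per candidate exemplar:

```latex
E(c) = -\sum_{i=1}^{n} s(i, c_i) - \sum_{k=1}^{n} \delta_k(c),
\qquad
\delta_k(c) =
\begin{cases}
-\infty & \text{if } c_k \neq k \text{ but } \exists\, i : c_i = k,\\
0 & \text{otherwise.}
\end{cases}
```

The delta terms are what encode the constraint mentioned above: a point cannot be chosen as an exemplar by others unless it also chooses itself.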

Some additional backstory:
One year after the publication of AP in Science, a technical comment appeared in Science, Feb 2008. The authors of the comment found that a 40-year-old algorithm called the vertex substitution heuristic, or VSH (a greedy search algorithm starting from a random initialization), often produces lower error rates than AP with comparable computational time. So they concluded that "...(AP) although a welcome addition to the clustering literature, does not possess the relative advantages over existing procedures that were touted in (the AP paper)." Apparently they were not very happy with AP. Then there was a response from the AP authors, in the same issue of Science. They confirmed the experimental results in the comment, but they also tested some much larger data sets. What they found is that AP and VSH had comparable error rates across all the data sets, but VSH took significantly more time on the larger ones. On the largest data set they tested (with 18k data points), AP took 2 hours while VSH took 10 days. So AP is actually much more efficient. Still, I think it would be nice to see how AP compares with other widely used clustering algorithms on problems of different sizes. From one of the authors' homepages, it seems that they have done some more comprehensive tests, but the results are yet to be published.

3 comments:

James Li said...

Affinity propagation is definitely a very interesting approach to an old classic problem. I have experimented with the VSH algorithm using the Matlab code posted by the authors of that comment. I got the same numerical results as stated in their code, except for the running time: VSH is much slower than AP, i.e., 10 to 100 times slower for data sets with just a few hundred data points. So it seems far-fetched to me to say that VSH offers comparable performance, even for small data sets.

TU said...

Thanks for the additional info. I also found your blog post "K-Center and Affinity Propagation Clustering" informative. I think the last test shows a limitation of AP: it has to choose data points as cluster centers. In fact, I find that going beyond this limitation would give the algorithm exponential time complexity.

Anonymous said...
This comment has been removed by a blog administrator.