Wednesday, November 26, 2008

Soft clustering with exemplar-based model

My previous post discussed affinity propagation, which is a hard clustering algorithm, i.e., it assigns each data point to exactly one cluster. Soft clustering, on the other hand, gives the degree or probability of one data point belonging to each cluster. This could be more flexible than hard clustering in many scenarios. The most common way of doing soft clustering is running EM on a mixture model (typically a Gaussian mixture). The problem is that the objective function (likelihood) may contain many local optima, making the algorithm unlikely to find a good solution in one run.

Remember that affinity propagation is different from typical clustering in that it requires cluster centers to be data points (exemplars). We can extend this idea to soft-clustering, and doing this greatly reduces the search space. Even better, this paper "Convex Clustering with Exemplar-Based Models" at NIPS-07 finds that the search space is actually convex. So, the problem becomes a convex optimization problem, and the global optimum can be found. Of course there's a downside of being exemplar-based, that we have to assume there are data points close enough to the real cluster centers, which is not always true.

1 comment:

12mahesh said...

Can any one please explain the affinity propagation algorithm with simple example?