Nonparametric models don't have a fixed number of effective parameters; instead, that number is determined by the data. This is very useful when we don't know the number of parameters in advance. For example, in a Gaussian mixture model we have to specify K, the number of Gaussian components. If we want to fit some data but don't know the value of K, we have to build a set of models with different values of K and make our choice based on some criterion like AIC or BIC. With a nonparametric model (more specifically, using a Dirichlet process (DP), or equivalently a Chinese restaurant process (CRP), as the prior distribution), we assume an infinite number of components and let posterior inference determine how many of them are actually used. The expected number of effective components is controlled by a hyperparameter.
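To make the CRP concrete, here is a minimal sketch (not from the paper) of how one might sample cluster assignments from it in Python; the function name sample_crp and the use of NumPy are my own illustrative choices, with alpha standing in for the concentration hyperparameter mentioned above.

```python
import numpy as np

def sample_crp(n_customers, alpha, seed=None):
    """Draw one table (cluster) assignment per customer from a CRP.

    alpha is the concentration hyperparameter; larger alpha means more
    occupied tables (roughly alpha * log(n_customers) in expectation).
    """
    rng = np.random.default_rng(seed)
    table_counts = []   # number of customers seated at each occupied table
    assignments = []
    for i in range(n_customers):
        # An existing table k is chosen with probability proportional to its
        # current count; a brand-new table with probability proportional to alpha.
        probs = np.array(table_counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(table_counts):
            table_counts.append(1)   # open a new table
        else:
            table_counts[k] += 1
        assignments.append(k)
    return assignments

# e.g. sample_crp(100, alpha=1.0) typically occupies only a handful of tables,
# even though the number of possible tables is unbounded.
```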
Because of the popularity of mixture models in AI and machine learning, the DP (and CRP) has become one of the best-known nonparametric methods. One limitation of mixture models is that each data point belongs to exactly one component, while in some cases we'd like to associate each data point with multiple labels (or no label at all). For example, an object may have several features, or belong to multiple classes at the same time. To go beyond mixture models, we need a new nonparametric method. The NIPS 06 paper "Infinite latent feature models and the Indian buffet process" by Griffiths and Ghahramani describes such a model. Since we can represent the labels of a data point by a binary vector, the labeling of a set of data points can be represented by a binary matrix. By taking the limit of a distribution over such matrices (actually over equivalence classes of such matrices, so that the label order doesn't matter; see their paper) as the number of labels approaches infinity, they obtain an extension of the CRP in which each data point can have multiple labels. Just as with the CRP, they also describe a (restaurant-themed) stochastic process that samples from this distribution, called the Indian buffet process (IBP). Now, instead of choosing a table, each customer needs to choose buffet dishes, and the number of dishes they can choose is not fixed.
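The buffet metaphor translates directly into a generative procedure: customer i takes each previously sampled dish with probability proportional to its popularity, then tries Poisson(alpha/i) new dishes. Below is a rough Python sketch of that sampler, under my own assumptions about naming (sample_ibp) and representation (a binary customer-by-dish matrix); it is meant only to illustrate the process described above, not to reproduce the paper's code.

```python
import numpy as np

def sample_ibp(n_customers, alpha, seed=None):
    """Draw a binary customer-by-dish matrix Z from the Indian buffet process.

    Customer 1 takes Poisson(alpha) dishes; customer i takes each previously
    sampled dish k with probability m_k / i (m_k = number of earlier customers
    who took it), then samples Poisson(alpha / i) brand-new dishes.
    """
    rng = np.random.default_rng(seed)
    rows = []          # one binary list per customer; columns are dishes
    dish_counts = []   # m_k for each dish sampled so far
    for i in range(1, n_customers + 1):
        row = []
        for k, m_k in enumerate(dish_counts):
            take = rng.random() < m_k / i
            row.append(int(take))
            dish_counts[k] += int(take)
        n_new = rng.poisson(alpha / i)   # new dishes for this customer
        row.extend([1] * n_new)
        dish_counts.extend([1] * n_new)
        rows.append(row)
    # Pad earlier rows with zeros so every row has the same number of columns.
    n_dishes = len(dish_counts)
    Z = np.array([r + [0] * (n_dishes - len(r)) for r in rows], dtype=int)
    return Z

# Each row of Z is a data point's binary feature vector; the number of
# columns (dishes actually used) is determined by the sampling run itself.
```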
An interesting example in their paper demonstrates how the IBP can be used to learn features from a set of images, each containing zero to four objects (at fixed positions, however). By running Gibbs sampling on their model with the IBP as the prior, they obtained seven features, of which the four most frequent ones correspond exactly to the four objects.