Wednesday, August 25, 2010

Graphical model structure learning using logistic regression

It seems to me that one of the reasons learning graphical model structure is hard is that the optimization has to be carried out over a discrete space. In addition, when using score-based methods for structure learning (see my old post for some intro), one has to do inference over parameters separately from the search over structures, because a parameter set is specific to a particular structure.

One strategy that circumvents these problems is to encode the structure in the parameters. That is, we assume a fully connected graph, and the parameters indicate which edges are effective. The prior over structures can then be folded into the prior over parameters. For example, L0 or L1 regularization of the parameters can enforce sparsity of the structure.

One problem with this strategy is that, in the case of discrete variables, a fully connected graph means we have exponentially many parameters. To avoid this problem, one can use micro-models for the CPTs (conditional probability tables), e.g., noisy-OR or logistic regression. L1-regularized logistic regression in particular is a well-studied problem that can be optimized efficiently.
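
To make this concrete, here is a minimal sketch of the micro-model idea in Python (using scikit-learn): one node's CPT is modeled by L1-regularized logistic regression over all the other variables, and the nonzero coefficients mark the effective parents. The synthetic data and the regularization strength C are made up for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples, n_vars = 500, 10

# Synthetic binary data in which node 0 really depends only on nodes 1 and 2.
X = rng.integers(0, 2, size=(n_samples, n_vars))
logits = 2.0 * X[:, 1] - 2.0 * X[:, 2] - 0.5
y = (rng.random(n_samples) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# The L1 penalty drives most coefficients to exactly zero, i.e. a sparse structure.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X[:, 1:], y)  # candidate parents of node 0 = all other variables

parents = np.flatnonzero(np.abs(clf.coef_[0]) > 1e-6) + 1
print("estimated parents of node 0:", parents)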

It turns out that similar ideas have already been studied in recent years. For example, this paper from NIPS 2006 used L1-regularized logistic regression to learn the structure of discrete Markov networks. The case of learning Bayesian network structure is more difficult, because there is no intuitive way to represent edge directions in the parameters while avoiding cycles. One solution is to first learn the skeleton of the Bayesian network using the above idea, and then determine the edge directions, just as in constraint-based structure learning methods. Another solution is to assume an ordering of the variables, which uniquely determines the edge directions; an order search can then be carried out to find a good ordering. These two methods are discussed in this paper from AAAI 2007.
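
For the Markov network case, the neighborhood-selection idea can be sketched roughly as follows: regress each variable on all the others with L1-regularized logistic regression and read the skeleton off the nonzero coefficients. The AND/OR combination rule and the value of C below are my own choices for illustration, not taken from the NIPS 2006 paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

def markov_skeleton(X, C=0.3, rule="and"):
    """X: (n_samples, n_vars) binary data; returns a symmetric adjacency matrix."""
    n_vars = X.shape[1]
    votes = np.zeros((n_vars, n_vars), dtype=int)
    for i in range(n_vars):
        others = [j for j in range(n_vars) if j != i]
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X[:, others], X[:, i])
        for j, coef in zip(others, clf.coef_[0]):
            if abs(coef) > 1e-6:
                votes[i, j] = 1  # variable i "votes" for edge i-j
    # "and": keep an edge only if both endpoints propose it; "or": if either does.
    return votes & votes.T if rule == "and" else votes | votes.T

For a Bayesian network, one could then orient the edges of this skeleton, or fix a variable ordering and regress each variable only on the variables that precede it.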

Friday, August 6, 2010

Mind Reading

Decoding neural activity using machine learning methods has become an active research area over the last few years. The neural data is usually obtained by presenting the word and/or image of a concept to an experiment participant and recording their brain activity (e.g., fMRI, EEG). The concepts tested so far are very simple (e.g., concrete nouns and, more recently, adjective-noun compositions), but I believe experiments on more complex and abstract concepts are to be expected in the near future (or are already in progress!). Given the neural imaging data, one natural task is to learn the mapping between concepts and images. An intermediate layer of semantic features can be added between concepts and images, which is intuitive and also makes things more tractable. So now the problems are what the right semantic features are, and how to learn the mappings between these layers.
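
As a very rough illustration of this layered setup (the shapes, the synthetic data, and the use of scikit-learn's Ridge below are all hypothetical), the second mapping can be a simple regularized linear model that predicts a brain image, i.e. a vector of voxel activations, from a concept's semantic feature vector:

import numpy as np
from sklearn.linear_model import Ridge

n_concepts, n_features, n_voxels = 60, 25, 5000

# Hypothetical training data: one semantic-feature vector and one fMRI image per concept.
semantic_features = np.random.rand(n_concepts, n_features)
brain_images = np.random.rand(n_concepts, n_voxels)

# Semantic features -> voxel activations: one regularized linear model over all voxels.
encoder = Ridge(alpha=1.0)
encoder.fit(semantic_features, brain_images)

# Predict the brain image of an unseen concept from its semantic features alone.
new_concept = np.random.rand(1, n_features)
predicted_image = encoder.predict(new_concept)  # shape (1, n_voxels)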

For the first problem, in earlier work the semantic features were constructed more or less by hand. In "Predicting Human Brain Activity Associated with the Meanings of Nouns" (Science, May 2008), the semantic features are the co-occurrence frequencies of the stimulus noun with 25 manually selected verbs in a large corpus. In a more recent paper, "A Neurosemantic Theory of Concrete Noun Representation Based on the Underlying Brain Codes" (PLoS ONE, Jan 2010), the features are discovered from the fMRI data by means of factor analysis (and the result is very interesting: the three main factors are related to manipulation, shelter, and eating, all of which were essential to the survival of our primitive ancestors). With the semantic features specified, the second problem can be addressed by applying common machine learning predictors such as Naive Bayes.
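
Putting the two pieces together, here is a hedged sketch of what such a pipeline could look like (synthetic data, three factors as in the PLoS ONE paper, scikit-learn implementations assumed): factor analysis discovers a few latent semantic features from the fMRI trials, and a Gaussian Naive Bayes classifier then decodes which concept was presented.

import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

n_trials, n_voxels, n_concepts = 300, 500, 12
X = np.random.rand(n_trials, n_voxels)               # fMRI trials (voxel activations)
y = np.random.randint(0, n_concepts, size=n_trials)  # which concept was presented

# Step 1: discover a handful of latent semantic factors from the imaging data.
fa = FactorAnalysis(n_components=3)
Z = fa.fit_transform(X)

# Step 2: decode the concept from the latent factors with Naive Bayes.
scores = cross_val_score(GaussianNB(), Z, y, cv=5)
print("decoding accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))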