It seems to me that, one of the reasons that learning graphical model structure is hard is that it has to be optimized in a discrete space. In addition, when using score based methods for structure learning (see my old post for some intro), one has to do inference over parameters separately from the search of structures, because a parameter set is specific to a structure.
One strategy that circumvents these problems is to encode the structure in parameters. That is, we assume a fully connected graph, and the parameters will indicate which edges are effective. The prior on structures can now be added into the prior over parameters. For example, L0 or L1 regularization of parameters can enforce structure sparsity.
One problem with this strategy is that, in the case of discrete variables, a fully connected graph means we have exponentially many parameters. To avoid this problem, one can use micro-models for CPT, e.g., noisy-OR, logistic regression, etc. The L1-regularized logistic regression is a well-studied problem, which can be optimized efficiently.
It turns out that similar ideas have already been studied in recent years. For example, this paper from NIPS 2006 used L1-regularized logistic regression to learn the structure of discrete Markov network. The case of learning Bayesian network structure is more difficult, because there's no intuitive way to represent edge directions in parameters while avoiding cycles. One solution is to first learn the skeleton of the Bayesian network using the above idea, and then determine the edge directions, just like in constraint based structure learning methods. Another solution is to assume the ordering of variables, which uniquely determines the edge directions. An order search can be carried out to find a good ordering. These two methods are discussed in this paper from AAAI 2007.
Wednesday, August 25, 2010
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment