In model selection and model averaging, it is important to have a good measure of model complexity. When two models fit the data equally well, the simpler one should be preferred (Occam's razor), one reason being that the simpler model usually generalizes better than the more complex one. However, it is not obvious how best to measure model complexity.
The paper "
Measuring Model Complexity with the Prior Predictive" (NIPS 2009) lists a few aspects of a model that a good complexity measure should take into account.
- the number of parameters
- the functional form (the way the parameters are combined in the model equation)
- the parameter range
- the prior distribution of the parameters
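To see why the last three aspects matter, here is a minimal sketch (my own toy example, not from the paper): two models with the same number of parameters and the same prior, whose functional forms give them very different distributions over outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_samples = 20, 100_000

# Both toy models predict the number of successes in 20 Bernoulli trials
# and have exactly one parameter, theta, with a uniform prior on [0, 1].
theta = rng.uniform(0.0, 1.0, size=n_samples)

# Model A: the success probability is theta itself, so A can mimic any
# success rate from 0 to 1.
outcomes_a = rng.binomial(n_trials, theta)

# Model B: same parameter count, same prior, but its functional form
# squeezes the success probability into [0.45, 0.55].
outcomes_b = rng.binomial(n_trials, 0.45 + 0.1 * theta)

# Equal parameter counts, very different spreads of predictions:
print("spread of A's prior predictive:", outcomes_a.std())  # wide
print("spread of B's prior predictive:", outcomes_b.std())  # narrow
```

Any measure that only counts parameters treats A and B as equally complex, even though B commits to a much narrower set of predictions.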
The complexity terms in traditional criteria like AIC and BIC, which essentially just count parameters, are therefore unsatisfactory. This paper proposes a measure called prior predictive complexity (PPC). Recall that a model, together with its prior, defines a distribution over all possible outcomes (i.e., observables): the prior predictive distribution. PPC is defined as the range of outcomes on which the model places most of its probability mass (say, 95%), normalized by the total range of possible outcomes. Since all four aspects of a model listed above shape the prior predictive distribution, PPC is sensitive to all of them. A natural alternative measure, in my opinion, would be the information entropy of this distribution, so that we don't have to artificially specify a threshold like 95%.
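As a minimal sketch of both ideas (again my own illustration, using a toy binomial model and uniform prior that are not from the paper), one can estimate the prior predictive distribution by Monte Carlo and then read off PPC and the entropy:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_samples = 20, 100_000

# Prior predictive distribution of a toy model: the number of successes
# in 20 trials, with a uniform prior on the success probability.
theta = rng.uniform(0.0, 1.0, size=n_samples)
outcomes = rng.binomial(n_trials, theta)
probs = np.bincount(outcomes, minlength=n_trials + 1) / n_samples

# PPC: the smallest fraction of the outcome range needed to hold 95% of
# the prior predictive mass (operationalized here as the smallest set of
# outcomes, greedily taking the most probable ones first).
order = np.argsort(probs)[::-1]
k = np.searchsorted(np.cumsum(probs[order]), 0.95) + 1
ppc = k / (n_trials + 1)

# Entropy alternative: measure the spread of the same distribution
# directly, with no arbitrary 95% threshold.
nz = probs[probs > 0]
entropy_bits = -np.sum(nz * np.log2(nz))

print(f"PPC = {ppc:.3f}")
print(f"entropy = {entropy_bits:.3f} bits (max {np.log2(n_trials + 1):.3f})")
```

With a uniform prior on the success probability, the prior predictive here is uniform over all 21 outcomes (the Beta(1,1)-binomial case), so PPC lands near its maximum and the entropy near log2(21) ≈ 4.39 bits; a more constrained model, like model B above, would score lower on both.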