Monday, September 27, 2010

Three types of information for learning structured predictor

Structured prediction tries to predict the hidden structure of the input. In other words, it's a prediction problem with multiple outputs that are mutually dependent. An example is sentence parsing, where given an input sentence we want to predict the syntactic parse tree of the sentence. In the classic setting (i.e., supervised structured prediction), the training data is:

1. input data with their hidden structures labeled, e.g., sentences with their syntactic parsing

And of course, we can also have:

2. input data without structure labeling, e.g., just sentences

When both type-1 and type-2 info is available, it's called semi-supervised structured prediction. When only type-2 info is available, it's an unsupervised structured prediction problem, which is usually very hard.

For many structured prediction problems, there is a related binary classification problem, which distinguishes valid inputs from invalid ones. One can view a valid input as the input having at least one valid structure, and an invalid input as the input having no valid structure. In the field of formal grammar, the binary classification problem is associated with the weak generative capacity of a grammar, and the structured prediction problem is associated with the strong generative capacity of a grammar. So we can have a third type of information:

3. invalid inputs, e.g., ungrammatical sentences

The question is, whether this type-3 info can help structured prediction. This is studied in "Structured Output Learning with Indirect Supervision" from ICML 2010. They found that it does help, because invalid inputs further restrict the space of correct structures, as illustrated in their Figure.1.

Now let's turn to the binary classification problem. If both type-2 and type-3 info is available, this is the classic supervised learning; if only type-2 or only type-3 info is available, this is the one-class classification problem. So one would wonder, whether type-1 info would help the binary classification? This is also experimented in the same paper, and interestingly, the answer is again yes.

No comments: