There doesn't seem to be much work on how to help LBP converge; here is what I've found and tried.
- Damping. That is, when updating a message, set it to a weighted average of its old value and the newly computed value. This slows down the pace of LBP and reduces the risk of overshooting.
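As a minimal sketch of the damping idea (the function name and the damping factor `alpha` are my own choices, not from any particular library):

```python
import numpy as np

def damped_update(old_msg, new_msg, alpha=0.5):
    """Damped LBP message update: a convex combination of the old
    message and the newly computed one.

    alpha = 1.0 recovers plain (undamped) LBP; smaller alpha slows
    the updates down, which often helps convergence.
    """
    msg = alpha * new_msg + (1.0 - alpha) * old_msg
    return msg / msg.sum()  # renormalize so the message stays a distribution
```

In practice `alpha` is a knob: values around 0.5 are a common starting point, and lowering it further trades speed for stability.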
- Using a more informed message-passing schedule. Instead of updating all the messages synchronously or in a fixed order, we can choose the next message to update according to some criterion. A very simple yet powerful schedule is "residual belief propagation" (RBP), which always updates the message with the largest residual, i.e., the message whose pending update differs most from its current value. An improved (but perhaps less general) variant is RBP0L, which estimates the residuals instead of computing them exactly.
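A rough sketch of a residual schedule, using a max-heap of pending updates keyed by residual. Everything here (`compute_message`, `neighbors`, the edge keys) is illustrative, and stale heap entries are simply tolerated rather than deleted, which wastes a little work but keeps the sketch short:

```python
import heapq
import numpy as np

def residual_bp_schedule(messages, compute_message, neighbors,
                         max_updates=1000, tol=1e-6):
    """Residual BP scheduling sketch.

    messages: dict mapping an edge key to its current message (ndarray).
    compute_message(e, messages): recomputes message e from the current state.
    neighbors(e): edge keys whose messages depend on message e.
    """
    heap, pending = [], {}
    # Compute a candidate update and its residual for every message.
    for e in messages:
        cand = compute_message(e, messages)
        r = np.abs(cand - messages[e]).max()
        pending[e] = cand
        heapq.heappush(heap, (-r, e))  # negate: heapq is a min-heap
    for _ in range(max_updates):
        if not heap:
            break
        neg_r, e = heapq.heappop(heap)
        if -neg_r < tol:
            break  # largest residual is below tolerance: treat as converged
        # Apply the freshest pending update for this edge (the heap may
        # hold stale duplicates; applying the current candidate is harmless).
        messages[e] = pending[e]
        # This update changes the inputs of downstream messages, so their
        # candidates and residuals must be refreshed.
        for d in neighbors(e):
            cand = compute_message(d, messages)
            r = np.abs(cand - messages[d]).max()
            pending[d] = cand
            heapq.heappush(heap, (-r, d))
    return messages
```

A proper implementation would update priorities in place (or lazily discard stale entries), and RBP0L would replace the exact residual computation with a cheap upper bound.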
With this experience, I suspect that the convergence problem is one of the reasons LBP is used less for graphical-model inference than MCMC and variational methods. For the latter two, we generally know in advance whether they apply; for LBP, in many cases actually running it seems to be the only way to find out whether it works.