In this tutorial we show how the MCMC framework, which was introduced in the previous tutorial, can be used for shape model fitting.
We will illustrate it by computing the posterior of a shape model, given a set of corresponding landmark pairs. This is the same setup that we discussed in the tutorial about Gaussian process regression. The difference is that here we also allow for rotation and translation of the model. In this setting, it is no longer possible to compute the posterior analytically. Rather, our only hope are approximation methods, such as Markov Chain Monte Carlo (MCMC) methods.
In this tutorial we show not only a working example, but also how to make it computationally efficient. Making the individual parts as efficient as possible is important in sampling approaches, as we need to produce many samples to get accurate estimates.
Week 3 of our online course on shape model fitting may provide some helpful context for this tutorial.
As in the previous tutorials, we start by importing some commonly used objects and initializing the system.
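A minimal setup might look as follows (the exact set of imports and the random seed are assumptions and may need to be adapted to your project):

```scala
import scalismo.geometry._
import scalismo.common.PointId
import scalismo.mesh.TriangleMesh
import scalismo.statisticalmodel.{MultivariateNormalDistribution, StatisticalMeshModel}
import scalismo.sampling.{DistributionEvaluator, ProposalGenerator, TransitionProbability}
import scalismo.sampling.algorithms.MetropolisHastings
import scalismo.sampling.evaluators.ProductEvaluator
import scalismo.sampling.loggers.AcceptRejectLogger
import scalismo.sampling.proposals.MixtureProposal
import breeze.linalg.{DenseMatrix, DenseVector}

// Initialize the native libraries and fix the random seed for reproducibility
scalismo.initialize()
implicit val rng: scalismo.utils.Random = scalismo.utils.Random(42)
```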
Loading and visualizing the data
In a first step, we load and visualize all the data that we need. First, we load the statistical model:
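A sketch of the loading step (the file path is a placeholder):

```scala
import scalismo.io.StatisticalModelIO

// Load a statistical mesh model from file
val model = StatisticalModelIO
  .readStatisticalMeshModel(new java.io.File("datasets/model.h5"))
  .get
```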
In this example, we will fit the model such that a set of model landmarks, coincide with a set of landmark points defined on a target face. We load and visualize the corresponding landmark data:
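This could look as follows (the landmark file names are placeholders):

```scala
import scalismo.io.LandmarkIO

// Landmarks defined on the model's mean mesh
val modelLandmarks = LandmarkIO
  .readLandmarksJson[_3D](new java.io.File("datasets/modelLandmarks.json"))
  .get

// Corresponding landmarks defined on the target face
val targetLandmarks = LandmarkIO
  .readLandmarksJson[_3D](new java.io.File("datasets/targetLandmarks.json"))
  .get
```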
In the following, we will refer to the points on the model by their point id, while the target positions are represented as physical points. The reason we use point ids for the model is that the model instances, and with them the point positions, change as we fit the model; the point ids, however, remain fixed.
We summarize the correspondences as a sequence of tuples, each consisting of a model point id and a target position.
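A sketch under these conventions, using findClosestPoint on the mean mesh to obtain the point ids (the variable names are our own):

```scala
// Model landmarks are converted to point ids on the mean mesh;
// target landmarks are kept as physical 3D points
val modelLandmarkIds = modelLandmarks.map(lm => model.mean.pointSet.findClosestPoint(lm.point).id)
val targetPoints = targetLandmarks.map(lm => lm.point)

// A correspondence is a pair (model point id, target position)
val correspondences = modelLandmarkIds.zip(targetPoints)
```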
The parameter class
In this example, we want to model the posterior $p(\theta | D)$, where the parameters $\theta = (t, r, \alpha)$ consist of the translation parameters $t = (t_x, t_y, t_z)$, the rotation parameters $r = (\phi, \psi, \omega)$, represented as Euler angles, as well as the shape model coefficients $\alpha = (\alpha_1, \ldots, \alpha_n)$. Furthermore, we also model the standard deviation $\sigma$ of the observation noise as a parameter.
As in the previous tutorial, we wrap this into a class representing the sample, which can keep track of by whom it was generated. Furthermore, we add a convenience method, which builds a RigidTransformation from the parameters. As a rigid transformation is not completely determined by the translation and rotation parameters, we also need to store the center of rotation.
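A possible implementation of the parameter and sample classes (the field names are our own choices):

```scala
import scalismo.registration.{RigidTransformation, RotationTransform, TranslationTransform}

case class Parameters(
  translationParameters: EuclideanVector[_3D],
  rotationParameters: (Double, Double, Double), // Euler angles
  modelCoefficients: DenseVector[Double],
  noiseStddev: Double
)

case class Sample(generatedBy: String, parameters: Parameters, rotationCenter: Point[_3D]) {

  // Convenience method: assemble the rigid transformation from the translation
  // and rotation parameters, rotating around the stored rotation center
  def poseTransformation: RigidTransformation[_3D] = {
    val translation = TranslationTransform(parameters.translationParameters)
    val rotation = RotationTransform(
      parameters.rotationParameters._1,
      parameters.rotationParameters._2,
      parameters.rotationParameters._3,
      rotationCenter
    )
    RigidTransformation(translation, rotation)
  }
}
```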
Evaluators: Modelling the target density
As in the previous tutorial, we represent the unnormalized posterior distribution as the product of prior and likelihood: $p(\theta | D) \propto p(\theta) p(D | \theta)$, where $D$ denotes the data (i.e. the corresponding landmark points) and $\theta$ are our parameters.
The prior over the shape parameters is given by the shape model. For the translation and rotation, we assume a zero-mean normal distribution. As the standard deviation characterizing the noise needs to be positive, we use a lognormal distribution:
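A sketch of such a prior evaluator; the concrete standard deviations of the pose priors and the parameters of the lognormal distribution are assumptions:

```scala
case class PriorEvaluator(model: StatisticalMeshModel) extends DistributionEvaluator[Sample] {

  // Zero-mean normal priors for pose, lognormal prior for the noise stddev
  val translationPrior = breeze.stats.distributions.Gaussian(0.0, 5.0)
  val rotationPrior = breeze.stats.distributions.Gaussian(0.0, 0.1)
  val noisePrior = breeze.stats.distributions.LogNormal(0.0, 0.25)

  override def logValue(sample: Sample): Double = {
    // The shape prior is the Gaussian process of the shape model itself
    model.gp.logpdf(sample.parameters.modelCoefficients) +
      translationPrior.logPdf(sample.parameters.translationParameters.x) +
      translationPrior.logPdf(sample.parameters.translationParameters.y) +
      translationPrior.logPdf(sample.parameters.translationParameters.z) +
      rotationPrior.logPdf(sample.parameters.rotationParameters._1) +
      rotationPrior.logPdf(sample.parameters.rotationParameters._2) +
      rotationPrior.logPdf(sample.parameters.rotationParameters._3) +
      noisePrior.logPdf(sample.parameters.noiseStddev)
  }
}
```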
To compute the likelihood, we first compute the current model instance as determined by the shape and pose parameters. From this model instance, the points with the given point ids are extracted and the distance to their target positions is computed. This distance is exactly what is modelled by the uncertainty of the observations. We can therefore directly use the modelled uncertainty to compute the likelihood of our model given the data:
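A straightforward (not yet optimized) version of this likelihood evaluator could look as follows; we assume isotropic Gaussian landmark noise with the sampled standard deviation:

```scala
case class SimpleCorrespondenceEvaluator(
  model: StatisticalMeshModel,
  correspondences: Seq[(PointId, Point[_3D])]
) extends DistributionEvaluator[Sample] {

  override def logValue(sample: Sample): Double = {
    // Model instance determined by the current shape and pose parameters
    val currModelInstance = model
      .instance(sample.parameters.modelCoefficients)
      .transform(sample.poseTransformation)

    // Isotropic Gaussian observation noise with the sampled stddev
    val lmUncertainty = MultivariateNormalDistribution(
      DenseVector.zeros[Double](3),
      DenseMatrix.eye[Double](3) * (sample.parameters.noiseStddev * sample.parameters.noiseStddev)
    )

    val likelihoods = correspondences.map { case (id, targetPoint) =>
      val modelInstancePoint = currModelInstance.pointSet.point(id)
      val observedDeformation = targetPoint - modelInstancePoint
      lmUncertainty.logpdf(observedDeformation.toBreezeVector)
    }
    likelihoods.sum
  }
}
```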
Conceptually, this is all that needs to be done to specify the target distribution. In practice, we are interested in making these evaluators as efficient as possible, as they are usually called thousands of times.
In the above implementation, we compute a full model instance (the new position of all the mesh points
represented by the shape model), although we are only interested in the position of the landmark points.
This is rather inefficient. A more efficient version would first marginalize the model to the
points of interest. Since marginalization changes the point ids, we need to map the
ids given as
correspondences to their new ids. This is achieved by the following helper function:
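A possible implementation of this helper (the function name is our own):

```scala
def marginalizeModelForCorrespondences(
  model: StatisticalMeshModel,
  correspondences: Seq[(PointId, Point[_3D])]
): (StatisticalMeshModel, Seq[(PointId, Point[_3D])]) = {

  val (modelIds, _) = correspondences.unzip
  // Restrict the model to the landmark points only
  val marginalizedModel = model.marginal(modelIds.toIndexedSeq)

  // Map each original point id to its id in the marginalized model
  val newCorrespondences = correspondences.map { case (id, targetPoint) =>
    val modelPoint = model.referenceMesh.pointSet.point(id)
    val newId = marginalizedModel.referenceMesh.pointSet.findClosestPoint(modelPoint).id
    (newId, targetPoint)
  }
  (marginalizedModel, newCorrespondences)
}
```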
The more efficient version of the evaluator now uses the marginalized model to evaluate the likelihood:
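A sketch of this evaluator, building on the helper function defined above:

```scala
case class CorrespondenceEvaluator(
  model: StatisticalMeshModel,
  correspondences: Seq[(PointId, Point[_3D])]
) extends DistributionEvaluator[Sample] {

  // Marginalize once, up front; the evaluator then only ever computes
  // instances of the (much smaller) marginalized model
  val (marginalizedModel, newCorrespondences) =
    marginalizeModelForCorrespondences(model, correspondences)

  override def logValue(sample: Sample): Double = {
    val currModelInstance = marginalizedModel
      .instance(sample.parameters.modelCoefficients)
      .transform(sample.poseTransformation)

    val lmUncertainty = MultivariateNormalDistribution(
      DenseVector.zeros[Double](3),
      DenseMatrix.eye[Double](3) * (sample.parameters.noiseStddev * sample.parameters.noiseStddev)
    )

    val likelihoods = newCorrespondences.map { case (id, targetPoint) =>
      val modelInstancePoint = currModelInstance.pointSet.point(id)
      val observedDeformation = targetPoint - modelInstancePoint
      lmUncertainty.logpdf(observedDeformation.toBreezeVector)
    }
    likelihoods.sum
  }
}
```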
In order for the Metropolis-Hastings algorithm to decide if a new sample is accepted, the likelihood needs to be computed several times for each set of parameters. To further increase the efficiency, we should therefore cache the computations, such that when an evaluator is used a second time with the same parameters, the value is not recomputed, but simply taken from the cache. Using the following utility class, we can obtain for any evaluator a new evaluator which performs such caching:
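A sketch of such a caching wrapper, based on Scalismo's Memoize utility (the cache size is an arbitrary choice):

```scala
import scalismo.utils.Memoize

case class CachedEvaluator[A](evaluator: DistributionEvaluator[A]) extends DistributionEvaluator[A] {
  // Memoize the log value; repeated evaluations of the same sample hit the cache
  val memoizedLogValue = Memoize(evaluator.logValue, 10)
  override def logValue(sample: A): Double = memoizedLogValue(sample)
}
```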
The posterior evaluator
Given these evaluators, we can now build the computationally efficient version of our target density
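For example (the evaluator names refer to the sketches above):

```scala
val likelihoodEvaluator = CachedEvaluator(CorrespondenceEvaluator(model, correspondences))
val priorEvaluator = CachedEvaluator(PriorEvaluator(model))

// Unnormalized posterior: product of prior and likelihood
val posteriorEvaluator = ProductEvaluator(priorEvaluator, likelihoodEvaluator)
```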
The proposal generator
As in the previous tutorials, we will use simple random walk proposals. We will define separate proposals for shape, translation and rotation. On one hand, this lets us set the step length (i.e. the standard deviation of the distribution from which we sample the next step) individually for each group, and thus incorporate our knowledge that changes in rotation will be much smaller than the shape changes. On the other hand, splitting the parameter updates into blocks increases the chance of the random updates being accepted. The reason for this is that when many parameters are updated at once, chances are high that some of the proposed changes make the new state more unlikely, and hence increase the chance of the new state being rejected.
The definitions of the proposals are straightforward.
We start with the shape update proposal:
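A sketch of a random walk proposal on the shape coefficients:

```scala
case class ShapeUpdateProposal(paramVectorSize: Int, stddev: Double)
    extends ProposalGenerator[Sample] with TransitionProbability[Sample] {

  val perturbationDistr = MultivariateNormalDistribution(
    DenseVector.zeros[Double](paramVectorSize),
    DenseMatrix.eye[Double](paramVectorSize) * stddev * stddev
  )

  override def propose(sample: Sample): Sample = {
    val perturbation = perturbationDistr.sample()
    val newParameters = sample.parameters.copy(
      modelCoefficients = sample.parameters.modelCoefficients + perturbation
    )
    sample.copy(generatedBy = s"ShapeUpdateProposal ($stddev)", parameters = newParameters)
  }

  override def logTransitionProbability(from: Sample, to: Sample): Double = {
    val residual = to.parameters.modelCoefficients - from.parameters.modelCoefficients
    perturbationDistr.logpdf(residual)
  }
}
```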
The update of the rotation parameters is very similar. Note that we only update the rotation parameters, but keep the center of rotation unchanged.
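A corresponding sketch for the rotation parameters:

```scala
case class RotationUpdateProposal(stddev: Double)
    extends ProposalGenerator[Sample] with TransitionProbability[Sample] {

  val perturbationDistr = MultivariateNormalDistribution(
    DenseVector.zeros[Double](3),
    DenseMatrix.eye[Double](3) * stddev * stddev
  )

  override def propose(sample: Sample): Sample = {
    val perturbation = perturbationDistr.sample()
    val newRotationParameters = (
      sample.parameters.rotationParameters._1 + perturbation(0),
      sample.parameters.rotationParameters._2 + perturbation(1),
      sample.parameters.rotationParameters._3 + perturbation(2)
    )
    // Only the rotation angles change; the rotation center stays as it is
    val newParameters = sample.parameters.copy(rotationParameters = newRotationParameters)
    sample.copy(generatedBy = s"RotationUpdateProposal ($stddev)", parameters = newParameters)
  }

  override def logTransitionProbability(from: Sample, to: Sample): Double = {
    val residual = DenseVector(
      to.parameters.rotationParameters._1 - from.parameters.rotationParameters._1,
      to.parameters.rotationParameters._2 - from.parameters.rotationParameters._2,
      to.parameters.rotationParameters._3 - from.parameters.rotationParameters._3
    )
    perturbationDistr.logpdf(residual)
  }
}
```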
We define a similar proposal for the translation.
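And analogously for the translation parameters:

```scala
case class TranslationUpdateProposal(stddev: Double)
    extends ProposalGenerator[Sample] with TransitionProbability[Sample] {

  val perturbationDistr = MultivariateNormalDistribution(
    DenseVector.zeros[Double](3),
    DenseMatrix.eye[Double](3) * stddev * stddev
  )

  override def propose(sample: Sample): Sample = {
    val perturbation = perturbationDistr.sample()
    val newTranslationParameters = sample.parameters.translationParameters +
      EuclideanVector(perturbation(0), perturbation(1), perturbation(2))
    val newParameters = sample.parameters.copy(translationParameters = newTranslationParameters)
    sample.copy(generatedBy = s"TranslationUpdateProposal ($stddev)", parameters = newParameters)
  }

  override def logTransitionProbability(from: Sample, to: Sample): Double = {
    val residual =
      (to.parameters.translationParameters - from.parameters.translationParameters).toBreezeVector
    perturbationDistr.logpdf(residual)
  }
}
```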
The final proposal is a mixture of the proposals we defined above. We choose to update the shape more often than the translation and rotation parameters, as we expect most changes to be shape changes.
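The concrete step lengths and mixture weights below are assumptions that typically need tuning for a given model and data set:

```scala
val shapeUpdateProposal = ShapeUpdateProposal(model.rank, 0.1)
val rotationUpdateProposal = RotationUpdateProposal(0.01)
val translationUpdateProposal = TranslationUpdateProposal(1.0)

// Shape updates are proposed more often than pose updates
val generator = MixtureProposal.fromProposalsWithTransition(
  (0.6, shapeUpdateProposal),
  (0.2, rotationUpdateProposal),
  (0.2, translationUpdateProposal)
)
```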
Building the Markov Chain
For running the Markov Chain, we proceed exactly as in the previous tutorial. We start by defining the logger, which computes the accept/reject ratios of the individual generators:
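A possible logger implementation (the class name and the acceptanceRatios helper are our own):

```scala
class Logger extends AcceptRejectLogger[Sample] {
  private val numAccepted = collection.mutable.Map[String, Int]()
  private val numRejected = collection.mutable.Map[String, Int]()

  override def accept(current: Sample, sample: Sample,
                      generator: ProposalGenerator[Sample],
                      evaluator: DistributionEvaluator[Sample]): Unit = {
    val numAcceptedSoFar = numAccepted.getOrElseUpdate(sample.generatedBy, 0)
    numAccepted.update(sample.generatedBy, numAcceptedSoFar + 1)
  }

  override def reject(current: Sample, sample: Sample,
                      generator: ProposalGenerator[Sample],
                      evaluator: DistributionEvaluator[Sample]): Unit = {
    val numRejectedSoFar = numRejected.getOrElseUpdate(sample.generatedBy, 0)
    numRejected.update(sample.generatedBy, numRejectedSoFar + 1)
  }

  // Acceptance ratio per proposal generator
  def acceptanceRatios(): Map[String, Double] = {
    val generatorNames = numAccepted.keys.toSet.union(numRejected.keys.toSet)
    generatorNames.map { name =>
      val total = (numAccepted.getOrElse(name, 0) + numRejected.getOrElse(name, 0)).toDouble
      (name, numAccepted.getOrElse(name, 0) / total)
    }.toMap
  }
}
```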
We then create the initial sample, choosing the center of mass of the model mean as the rotation center:
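For example (the initial noise standard deviation is an arbitrary choice):

```scala
def computeCenterOfMass(mesh: TriangleMesh[_3D]): Point[_3D] = {
  val normFactor = 1.0 / mesh.pointSet.numberOfPoints
  mesh.pointSet.points.foldLeft(Point(0, 0, 0))((sum, point) => sum + point.toVector * normFactor)
}

val initialParameters = Parameters(
  translationParameters = EuclideanVector(0, 0, 0),
  rotationParameters = (0.0, 0.0, 0.0),
  modelCoefficients = DenseVector.zeros[Double](model.rank),
  noiseStddev = 1.0 // assumed initial value
)

val initialSample = Sample("initial", initialParameters, computeCenterOfMass(model.mean))
```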
Remark: Setting the rotation center correctly is very important for the rotation proposal to work as expected. Fortunately, most of the time this error is easy to diagnose, as the acceptance ratio of the rotation proposal will be unexpectedly low.
Next we set up the chain and obtain an iterator.
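For example:

```scala
val chain = MetropolisHastings(generator, posteriorEvaluator)
val logger = new Logger()
val mhIterator = chain.iterator(initialSample, logger)
```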
In this example we are interested in visualizing some samples from the posterior as we run the chain. This can be done by augmenting the iterator with visualization code:
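A sketch of such an augmented iterator; the commented lines indicate where scalismo-ui view updates (here assumed to be accessible via a modelView object) would go:

```scala
val samplingIterator = for ((sample, iteration) <- mhIterator.zipWithIndex) yield {
  if (iteration % 500 == 0) {
    println(s"iteration $iteration")
    // Update the shape and pose shown in the UI, e.g.:
    // modelView.shapeModelTransformationView.shapeTransformationView.coefficients =
    //   sample.parameters.modelCoefficients
    // modelView.shapeModelTransformationView.poseTransformationView.transformation =
    //   sample.poseTransformation
  }
  sample
}
```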
Finally, we draw the samples using the chain by consuming the iterator. We drop the first 1000 iterations, as the chain needs some burn-in time to converge to its equilibrium distribution:
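For example, keeping 10000 samples after the burn-in (the sample count is an arbitrary choice):

```scala
// Discard the burn-in phase, then collect the samples
val samples = samplingIterator.drop(1000).take(10000).toIndexedSeq
```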
Before working with the results, we check the acceptance ratios to verify that all the proposals work as expected:
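Using the logger defined above:

```scala
println(logger.acceptanceRatios())
```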
Analyzing the results
Once we have the samples, we can use them to analyze our fit. For example, we can select the best fit from these samples and visualize it:
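For example, selecting the sample with the highest unnormalized posterior value:

```scala
// The best sample is the one with the highest (unnormalized) posterior value
val bestSample = samples.maxBy(posteriorEvaluator.logValue)
val bestFit = model
  .instance(bestSample.parameters.modelCoefficients)
  .transform(bestSample.poseTransformation)

// The best fit can then be shown, e.g. in scalismo-ui:
// ui.show(resultGroup, bestFit, "best fit")
```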
The samples allow us to infer much more about the distribution. For example, we can estimate the expected position of any point in the model and the variance from the samples:
For efficiency reasons, we do the computations here only for the landmark points, using again the marginalized model:
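A sketch of these computations, reusing the marginalization helper from above (the function names are our own):

```scala
val (marginalizedModel, newCorrespondences) =
  marginalizeModelForCorrespondences(model, correspondences)

// Mean position of the point with the given id, estimated from the samples
def computeMean(model: StatisticalMeshModel, id: PointId): Point[_3D] = {
  var mean = EuclideanVector(0, 0, 0)
  for (sample <- samples) {
    val instance = model
      .instance(sample.parameters.modelCoefficients)
      .transform(sample.poseTransformation)
    mean += instance.pointSet.point(id).toVector
  }
  (mean * (1.0 / samples.size)).toPoint
}

// Sample covariance of the point with the given id around the given mean
def computeCovariance(model: StatisticalMeshModel, id: PointId, mean: Point[_3D]): SquareMatrix[_3D] = {
  var cov = SquareMatrix.zeros[_3D]
  for (sample <- samples) {
    val instance = model
      .instance(sample.parameters.modelCoefficients)
      .transform(sample.poseTransformation)
    val v = instance.pointSet.point(id) - mean
    cov += v.outer(v)
  }
  cov * (1.0 / samples.size)
}

for ((id, _) <- newCorrespondences) {
  val meanPointPosition = computeMean(marginalizedModel, id)
  val cov = computeCovariance(marginalizedModel, id, meanPointPosition)
  println(s"point id $id: mean position $meanPointPosition, " +
    s"variances (${cov(0, 0)}, ${cov(1, 1)}, ${cov(2, 2)})")
}
```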
Beyond landmark fitting
We have shown above how Scalismo can be used to perform Bayesian model fitting on the example of fitting 3D landmarks. This example can easily be extended to other fitting tasks, such as fitting the model to points with unknown correspondences, fitting shapes in surfaces, or fitting a model to an image using an Active Shape Model as a likelihood function. In principle, all that is required is to change the likelihood function and rerun the fit. In practice, however, as a change in the likelihood function can dramatically change the posterior density, it is often necessary to tune the proposals, such that good convergence can be achieved. Indeed, finding good proposal distributions is the key to applying this method successfully. The more prior knowledge about the target distribution we can incorporate into the proposals, the faster the chain will converge to the equilibrium distribution.
For more complicated use cases of this method in image analysis, we refer the interested reader to the paper by S. Schönborn et al. and the references therein:
- Schönborn, Sandro, et al. "Markov Chain Monte Carlo for automated face image analysis." International Journal of Computer Vision 123.2 (2017): 160-183.