# Model fitting using MCMC - The basic framework

In this tutorial we show how Bayesian model fitting using Markov Chain Monte Carlo can be done in Scalismo. To be able to focus on the main components of the framework instead of technical details, we start in this tutorial with a simple toy example from statistics. Although the example has nothing to do with shape modelling, the modelling principles and the steps involve to do the inference are exactly the same. The application to shape modelling is discussed in depth in the next tutorial.

### Problem setting

The problem we are considering here is a simple toy problem from Bayesian statistics: We are trying to fit a (univariate) normal distribution $N(\mu, \sigma)$,
with unknown mean and unknown standard deviation to a set of data points.
In the following we will denote the unknown parameters by $\theta$; I.e. $\theta = (\mu, \sigma)$ and the observed data points
by $y$. In a Bayesian setting, doing inference means that we compute the *posterior distribution* $p(\theta | y)$. Formally, the *posterior distribution* is defined as follows:

$p(\theta \mid y) = \frac{p(\theta) p(y \mid \theta)}{p(y)}$.

The term $p(y | \theta)$ is called the likelihood function, and is given directly by the problem definition as $p(y | \theta) = N(\mu, \sigma)$. The term $p(\theta)$ is a prior distribution over the parameters, which we will define later. The final term involved $p(y)$ is called the marginal likelihood. Formally, it can be defined as $p(y) = \int_\theta p(y | \theta) p(\theta) d\theta$. Fortunately, we will never need to compute this quantity.

*Remark: Computing the posterior distribution of the parameters will also be our goal in a real shape model fitting application.
The only difference is that the parameters $\theta$ are not mean and standard deviation, but the shape model parameters, and the data $y$ are
not simulated numbers, but measurements of the target object, such as a set of landmark points, a surface or even an image.*

### Metropolis Hastings Algorithm

The way we approach such fitting problem in Scalismo is by using the Metropolis Hastings algorithm. The Metropolis-Hastings algorithm allows us to draw samples from any distribution, given that the unnormalized distribution can be evaluated point-wise. This requirement is easy to fulfill for all shape modelling applications.

For setting up the Metropolis-Hastings algorithm, we need two things:

- The (unnormalized) target distribution, from which we want to sample. In our case this is the posterior distribution $p(\theta \mid y)$. In Scalismo
the corresponding class is called the
`Distribution Evaluator`

. - A proposal distribution $Q(\theta' \mid \theta)$, which generates for a given sample $\theta$ a new sample $\theta'$.

The Metropolis Hastings algorithm introduces an ingenious scheme for accepting and rejecting the samples from this proposal distribution, based on their probability under the target density, such that the resulting sequence of samples is guaranteed to be distributed according to the target distribution. In practice, the algorithm works as follows: It uses the proposal generator to perturb a given sample $\theta$ to obtain a new sample $\theta'$. Then it checks, using the evaluator, which of the two samples, $\theta$ or $\theta'$ is more likely and uses this ratio as a basis for rejecting or accepting the new sample.

### Implementation in Scalismo

##### Preparation

As in the previous tutorials, we start by importing some commonly used objects and initializing the system.

To test our method, we generate data from a normal distribution $N(-5, 17)$.

Before we discuss the two main components, the *Evaluator* and *Proposal generator* in detail, we first define a class for representing
the parameters $\theta = (\mu, \sigma)$:

We introduce a further class to represent a sample from the chain. A sample is simply a set of parameters together with a tag, which helps us to keep track later on, which proposal generator generated the sample:

#### Evaluators: Modelling the target density

In Scalismo, the target density is represented by classes, which we will refer to
as *Evaluators*. Any Evaluator is a subclass of the class `DistributionEvalutor`

,
defined as follows:

We see that the only thing we need to define is the log probability of a sample.

In our case, we will define separate evaluators for the prior distribution $p(\theta)$ and the likelihood $p(y | \theta)$. The evaluator for the likelihood is simple: Assuming a normal model, we define the normal distribution with the given parameters $\theta$, and use this model to evaluate the likelihood of the individual observations. We assume that the observations are i.i.d. and hence the joint probability factorises as $p(y |\theta) = p(y_1, \ldots, y_n |\theta) = \prod_{i=1}^n p(y_i |\theta)$. This leads to the following implementation of the liklihood function:

Notice that we work in Scalismo with log probabilities, and hence the product in above formula becomes a sum.

As a prior, we also use for both parameters a univariate normal distribution.

The target density (i.e. the posterior distribution) can be computed by taking the product of the prior and the likelihood.

Note that the posteriorEvaluator represents the unnormalized posterior, as we did not normalize by the probability of the data $p(y)$.

#### The proposal generator

In Scalismo, a proposal generator is defined by extending the trait
*ProposalGenerator*, which is defined as follows

In order to be able to use a proposal generator in the Metropolis-Hastings algorithm,
we also need to implement the trait `TransitionProbability`

:

To keep things simple, we use here a *random walk proposal*. This is a proposal
which updates the current state by taking a step of random length in a random direction.
It is defined as follows:

*Remark: the second constructor argument implicit rng : scalismo.utils.Random
is used to automatically pass the globally defined random generator object to the
class. If we always use this random generator to generate our random numbers, we can obtain reproducible runs,
by seeding this random generator at the beginning of our program.*

Let's define two random walk proposals with different step length:

Varying the step length allow us to sometimes take large step, in order to explore the global
landscape, and sometimes smaller steps, to explore a local environment. We can combine these proposal into a
`MixtureProposal`

, which chooses the individual proposals with a given
probability. Here We choose to take the large step 20% of the time, and the smaller
steps 80% of the time:

#### Building the Markov Chain

Now that we have all the components set up, we can assemble the Markov Chain.

To run the chain, we obtain an iterator, which we then consume to drive the sampling generation. To obtain the iterator, we need to specify the initial sample:

Our initial parameters might be far away from a high-probability area of our target density. Therefore it might take a few hundred or even a few thousand iterations before the produced samples start to follow the required distribution. We therefore have to drop the samples in this burn-in phase, before we use the samples:

As we have generated synthetic data, we can check if the expected value, computed from this samples, really corresponds to the parameters from which we sampled our data:

In the next tutorial, we see an example of how the exact same mechanism can be used for fitting shape models. Before we discuss this, we should, however, spend some time to discuss how the chain can be debugged in case something goes wrong. You can safely skip this section and come back to it later if you first want to see a practical example.

#### Debugging the markov Chain

Sometimes a chain does not work as expected. The reason is usually that our proposals
are not suitable for the target distribution. To diagnose the
behaviour of the chain we can introduce a logger. To write a logger, we need to extend
the trait `AcceptRejectLogger`

, which is defined as follows:

The two methods, `accept`

and `reject`

are called whenever a sample is
accepted or rejected. We can overwrite these methods to implement our debugging code.

The following, very simple logger counts all the accepted and rejected samples and computes the acceptance ratio. This acceptance ratio is a simple, but already useful indicator to diagnose if all proposal generators function as expected.

To use the logger, we simply rerun the chain, but pass the logger now as
a second argument to the `iterator`

method:

We can now check how often the individual samples got accepted.

We see that the acceptance ratio of the random walk proposal, which takes the smaller step is quite high, but that the larger step is often rejected. We might therefore want to reduce this step size slightly, as a proposal that is so often rejected is not very efficient.

In more complicated applications, this type of debugging is crucial for obtaining efficient fitting algorithms.