How to detect Farkas or MPEG4 FDP points on image with a face?

Question

Brief problem description

I'm using the Basel Morphable Face Model, which is labeled with MPEG4 and Farkas landmarks. I can generate different faces and vary the lighting conditions, rotations, perspective, etc.

I want to use it to train an algorithm that can then find these landmarks in real photographs.

More strictly

Given a labeled set of images (of fixed size, if that matters), I want to find a function $q\left( x \right)$ (where $x$ is an image) that minimizes the expected loss (the Bayes decision problem):

$$E = \sum\limits_{x \in X, k \in K} P\left( k \mid x \right) \cdot W\left( k, q\left( x \right) \right)$$

$k$ is a tuple of feature-point coordinates, such as $\langle (x_1, y_1), (x_2, y_2), \dots \rangle$.

The loss function is either

$$W\left(k, k'\right) = \sum\limits_{i=1}^n \|k_i - k_i'\|^2$$

or

$$W\left(k, k'\right) = \sum\limits_{i=1}^n \begin{cases} 1 & \text{if } \|k_i - k_i'\| > \Delta \\ 0 & \text{otherwise} \end{cases}$$

Or is there a better way to formulate the problem, for which expected-loss minimization is not the right framework?
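A minimal sketch of the two candidate losses, assuming each labelling is stored as an array of shape `(n, 2)` with one row per landmark. The function names `squared_loss` and `threshold_loss` and the sample coordinates are illustrative assumptions.

```python
import numpy as np

def squared_loss(k, k_pred):
    """Sum over landmarks of the squared Euclidean distance."""
    k = np.asarray(k, dtype=float)
    k_pred = np.asarray(k_pred, dtype=float)
    return float(np.sum((k - k_pred) ** 2))

def threshold_loss(k, k_pred, delta):
    """Number of landmarks predicted farther than delta from the truth."""
    k = np.asarray(k, dtype=float)
    k_pred = np.asarray(k_pred, dtype=float)
    dists = np.linalg.norm(k - k_pred, axis=1)  # one distance per landmark
    return int(np.sum(dists > delta))

# Three made-up landmarks: errors of 1, 5 and 0 pixels respectively.
k_true = [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]
k_pred = [(10.0, 21.0), (33.0, 44.0), (50.0, 60.0)]

print(squared_loss(k_true, k_pred))         # 1 + 25 + 0 = 26.0
print(threshold_loss(k_true, k_pred, 2.0))  # only the second landmark exceeds 2.0
```

The first loss penalizes every deviation smoothly, while the second only counts landmarks that miss by more than $\Delta$, so it is insensitive to small errors and not differentiable.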

Details

I suspect there are algorithms more specific to this problem than brute force (estimating the probabilities from the training set and using them to find landmarks in input images) or artificial neural networks.

Why not an ANN: the problem "as is" is incredibly hard. I have $44$ points, most of which have symmetric counterparts (so the total is somewhere between $60$ and $80$). Given a $100 \times 100$ image, there are at least $10\,000^{60} = 10^{240}$ candidate solutions, from which I must find the best one (or the weighted average, in the case of the $l^2$ loss function). That is a huge number. Add probability estimation to this math carnival, and you get the problem of estimating a probability function of $10\,000$ dependent random variables from the images of the training set. So even if some general approach runs fast, I'm not sure it will be accurate enough. A new question then arises: what exactly do we lose by applying an ANN?
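The size of the search space quoted above can be checked with exact integer arithmetic. This is only a sanity check of the count, assuming each of the $60$ points may fall on any pixel independently; `count_configurations` is a hypothetical helper name.

```python
def count_configurations(width, height, n_points):
    """Number of ways to place n_points on a width x height pixel grid,
    treating the points as distinguishable and independent."""
    return (width * height) ** n_points

total = count_configurations(100, 100, 60)
exponent = len(str(total)) - 1  # power of ten, since total is 1 followed by zeros

print(total == 10 ** 240)  # True
print(exponent)            # 240
```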

Maybe two-dimensional context-free grammars can solve this, and I just need a proper representation to take scale and rotations into account.

Maybe the problem can be stated as a constraint satisfaction problem and solved efficiently.

Question

Which algorithms can you recommend for this purpose?

score 1 · Answer 1 · answered Feb 10 '17 at 04:56

In the simplified special case where each image $x$ has a single correct labelling $k=\kappa(x)$, this becomes just a regression problem: you want to find a function $q$ that minimizes the risk

$$\mathbb{E}[||q(X)-\kappa(X)||^2],$$

where $X$ is a random variable distributed according to some distribution on faces. One way to solve this is to build up a training set of pairs $(x_i,k_i)$ where $x_i$ is a face and $k_i$ is its label, and then minimize the empirical risk

$$\sum_i ||q(x_i)-k_i||^2.$$

This then becomes an optimization problem: find parameters for $q$ that minimize the empirical risk. You could use any of a number of different methods for regression, based on what structure you think $q$ might have. For instance, you could use a convolutional neural network.
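As a minimal sketch of empirical risk minimization, the example below fits a linear model $q(x) = A^\top x$ by least squares instead of a convolutional network, on purely synthetic data. The names `X`, `K`, `A_hat`, `empirical_risk` and all sizes are illustrative assumptions; a real setup would use rendered face images and their landmark coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_pixels, n_landmarks = 200, 64, 5

# Synthetic stand-ins: each row of X is a flattened "image" x_i,
# each row of K is its flattened labelling k_i = (x_1, y_1, ..., x_n, y_n).
X = rng.normal(size=(n_images, n_pixels))
A_true = rng.normal(size=(n_pixels, 2 * n_landmarks))
K = X @ A_true + 0.01 * rng.normal(size=(n_images, 2 * n_landmarks))

def empirical_risk(A, X, K):
    """sum_i ||q(x_i) - k_i||^2 for the linear model q(x) = x @ A."""
    return float(np.sum((X @ A - K) ** 2))

# For a linear q, the empirical risk minimizer is the least-squares solution.
A_hat, *_ = np.linalg.lstsq(X, K, rcond=None)

print(empirical_risk(np.zeros_like(A_hat), X, K))  # risk of the trivial predictor
print(empirical_risk(A_hat, X, K))                 # risk after fitting
```

With a neural network the same objective would be minimized by gradient descent rather than in closed form, but the structure (a parametric $q$, a training set, a summed squared error) is the same.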

Your problem is a more general version of this, where we want to minimize the risk

$$\mathbb{E}[\ell(X,q(X))],$$

where

$$\ell(x,y) = \int p(k|x) ||y-k||^2 dk.$$

I will assume you have an efficient way to compute $\ell(x,y)$ for any particular $x,y$ (whether by integrating analytically using some expression for $p(k|x)$, or by sampling many $k$ from the distribution $p(k|x)$ and taking the average value of $||y-k||^2$). If this is the case, one approach is again to build a training set of faces $x_i$, each with a known distribution $p(k|x_i)$, and then minimize the empirical risk

$$\sum_i \ell(x_i,q(x_i)).$$

This is just an optimization problem, where you try to find parameters for $q$ that minimize the empirical risk. Assuming you have an efficient way to compute $\ell(\cdot,\cdot)$, you can proceed as before and use any standard method for regression (e.g., a convolutional neural network).
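The sampling-based way of computing $\ell(x,y)$ mentioned above can be sketched as follows. The Gaussian form of $p(k|x)$, the names `mc_loss` and `gaussian_sampler`, and the numbers are assumptions made only for illustration.

```python
import numpy as np

def mc_loss(sample_k, y, n_samples, rng):
    """Monte Carlo estimate of l(x, y) = E_{k ~ p(k|x)} ||y - k||^2.

    sample_k(rng, n_samples) must return an array of shape (n_samples, n, 2)
    holding draws from p(k | x) for one fixed image x.
    """
    ks = sample_k(rng, n_samples)
    sq_dists = np.sum((ks - np.asarray(y, dtype=float)) ** 2, axis=(1, 2))
    return float(np.mean(sq_dists))

# Hypothetical p(k | x): isotropic Gaussian noise around a mean labelling.
mean_k = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
sigma = 1.0

def gaussian_sampler(rng, n_samples):
    noise = rng.normal(size=(n_samples,) + mean_k.shape)
    return mean_k + sigma * noise

rng = np.random.default_rng(0)
loss_at_mean = mc_loss(gaussian_sampler, mean_k, 20000, rng)
loss_shifted = mc_loss(gaussian_sampler, mean_k + 1.0, 20000, rng)

print(loss_at_mean, loss_shifted)
```

For the squared loss, $\ell(x,y) = \|y - \mathbb{E}[k \mid x]\|^2 + c(x)$, where $c(x)$ does not depend on $y$, so the estimate is smallest near the conditional mean of $k$. In this toy setup the two printed values should be close to $6$ and $12$ respectively.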
