Find the best embedding for this gene/lipid graph

Question

I want to find a 'nice' drawing of the lipids and genes in my database. Each lipid belongs to one of several classes, while each gene belongs to one of several regions. Each gene/lipid pair has an associated strength value, which indicates how closely related the two are.

I have tried to define my dataset formally as follows. A gene-lipid embedding problem is a $5$-tuple $(\texttt{lipids}, \texttt{genes},\texttt{strength},\texttt{region},\texttt{class})$ where

$\texttt{lipids}$ and $\texttt{genes}$ are finite, non-empty, disjoint sets,
$\texttt{strength}$ is a map $\texttt{lipids} \times \texttt{genes} \to [0, 1]$,
$\texttt{region}$ is a map $\texttt{genes} \to \mathbb N$, and
$\texttt{class}$ is a map $\texttt{lipids} \to \mathbb N$.

Essentially, my data is a weighted bipartite graph: the vertices are the lipids and genes, the edges and edge weights are given by the $\texttt{strength}$ function, and each vertex is one of two types (lipid or gene) and carries a label (a class for a lipid, a region for a gene).

A candidate solution for such a problem is a map $\texttt{genes} \cup \texttt{lipids} \to \mathbb R^2$.

I want to find a 'best' map. By 'best' I mean that:

Lipids of the same class should be drawn close together, and lipids of different classes far apart.
Similarly, genes of the same region should be drawn close together, and genes of different regions far apart.
Lipid/gene pairs with a high strength value should be drawn close to one another, and pairs with a low strength value far apart.

My first question is: How can I mathematically define what it means for a candidate solution to be good? That is, how can I come up with a fitness function that assigns each candidate solution a score?
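To make the first question concrete, here is one possible shape such a function could take. It is only a sketch, modelled on the stress functions used in multidimensional scaling, and not a claim that this is the right choice. The target distances $a < b$ and the weight $\lambda > 0$ are assumptions introduced for illustration; they do not come from the data. For a candidate map $p$, define

$$E(p) = \sum_{\{l, l'\} \subseteq \texttt{lipids}} \bigl(\lVert p(l) - p(l') \rVert - t_{\texttt{class}}(l, l')\bigr)^2 + \sum_{\{g, g'\} \subseteq \texttt{genes}} \bigl(\lVert p(g) - p(g') \rVert - t_{\texttt{region}}(g, g')\bigr)^2 + \lambda \sum_{(l, g) \in \texttt{lipids} \times \texttt{genes}} \bigl(\lVert p(l) - p(g) \rVert - (1 - \texttt{strength}(l, g))\bigr)^2,$$

where $t_{\texttt{class}}(l, l') = a$ if $\texttt{class}(l) = \texttt{class}(l')$ and $b$ otherwise, and $t_{\texttt{region}}$ is defined in the same way using $\texttt{region}$. Lower values of $E$ would be better. Each term penalises the gap between an actual distance and a desired one, so the three requirements above become three sums, and using target distances rather than raw attraction keeps all points from collapsing onto a single location.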

My second question is: Once I have defined such a function, how can I use it to find the 'best' solution?

For the second question, I’d suggest combinatorial optimization methods such as simulated annealing. The function is likely to have many local optima, and gradient-based descent methods might not easily find a good solution. — joriki, Mar 19 '24 at 11:50
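The simulated annealing suggested in this comment could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: the stress-style `energy` function, the target distances `near` and `far`, the `weight` parameter, the geometric cooling schedule, and all function names are hypothetical choices that do not come from the question. Lower energy is treated as better.

```python
import math
import random


def target_distance(same_group, near=0.2, far=1.0):
    # Assumed target distances: small within a class/region, large across.
    return near if same_group else far


def energy(pos, lipids, genes, strength, region, lipid_class, weight=1.0):
    """Sum of squared gaps between actual and desired distances (lower is better)."""
    total = 0.0
    for i, a in enumerate(lipids):
        for b in lipids[i + 1:]:
            t = target_distance(lipid_class[a] == lipid_class[b])
            total += (math.dist(pos[a], pos[b]) - t) ** 2
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            t = target_distance(region[a] == region[b])
            total += (math.dist(pos[a], pos[b]) - t) ** 2
    for l in lipids:
        for g in genes:
            # High strength -> small desired distance.
            t = 1.0 - strength[(l, g)]
            total += weight * (math.dist(pos[l], pos[g]) - t) ** 2
    return total


def anneal(lipids, genes, strength, region, lipid_class,
           steps=20000, t_start=1.0, t_end=1e-3, step_size=0.1, seed=0):
    """Return (best positions, best energy) found by simulated annealing."""
    rng = random.Random(seed)
    vertices = lipids + genes
    pos = {v: (rng.uniform(0, 1), rng.uniform(0, 1)) for v in vertices}
    args = (lipids, genes, strength, region, lipid_class)
    current = energy(pos, *args)
    best_pos, best = dict(pos), current
    for k in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (k / steps)
        v = rng.choice(vertices)
        old = pos[v]
        # Propose moving one vertex by a small Gaussian step.
        pos[v] = (old[0] + rng.gauss(0, step_size),
                  old[1] + rng.gauss(0, step_size))
        candidate = energy(pos, *args)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if candidate <= current or rng.random() < math.exp((current - candidate) / temp):
            current = candidate
            if current < best:
                best_pos, best = dict(pos), current
        else:
            pos[v] = old  # reject the move
    return best_pos, best


if __name__ == "__main__":
    lipids = ["l1", "l2", "l3"]
    genes = ["g1", "g2"]
    lipid_class = {"l1": 0, "l2": 0, "l3": 1}
    region = {"g1": 0, "g2": 1}
    strength = {("l1", "g1"): 0.9, ("l1", "g2"): 0.1,
                ("l2", "g1"): 0.8, ("l2", "g2"): 0.2,
                ("l3", "g1"): 0.1, ("l3", "g2"): 0.9}
    layout, score = anneal(lipids, genes, strength, region, lipid_class)
    print(score, layout)
```

Recomputing the full energy after every move costs time quadratic in the number of vertices, so for a large database one would probably update only the terms involving the moved vertex. The schedule and step size would also need tuning against real data.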
