I want to find a 'nice' drawing of the lipids and genes in my database. Each lipid belongs to one of several classes, while each gene belongs to one of several regions. Each gene/lipid pair has an associated strength value, which indicates how closely related the two are.

I have tried to define my dataset formally as follows. A gene-lipid embedding problem is a $5$-tuple $(\texttt{lipids}, \texttt{genes},\texttt{strength},\texttt{region},\texttt{class})$ where

  • $\texttt{lipids}$ and $\texttt{genes}$ are finite, non-empty, non-overlapping sets,
  • $\texttt{strength}$ is a map $\texttt{lipids} \times \texttt{genes} \to [0, 1]$,
  • $\texttt{region}$ maps $\texttt{genes}$ to $\mathbb N$, and
  • $\texttt{class}$ maps $\texttt{lipids}$ to $\mathbb N$.

Essentially, my data is a weighted bipartite graph: the vertices are the lipids and genes, the edges and edge weights are given by the $\texttt{strength}$ function, and each vertex is of one of two types (lipid or gene), with every lipid belonging to a class and every gene belonging to a region.

A candidate solution for such a problem is a map $\texttt{genes} \cup \texttt{lipids} \to \mathbb R^2$.
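To make the definition concrete, here is a minimal Python sketch of how a problem instance and a candidate solution might be represented. The names `EmbeddingProblem`, `lipid_class` (used because `class` is a reserved word in Python) and `is_candidate` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingProblem:
    """Illustrative container for the 5-tuple (names are assumptions)."""
    lipids: frozenset
    genes: frozenset
    strength: dict      # (lipid, gene) -> value in [0, 1]
    region: dict        # gene -> natural number
    lipid_class: dict   # lipid -> natural number

    def __post_init__(self):
        # Check the conditions stated in the definition.
        if not self.lipids or not self.genes:
            raise ValueError("lipids and genes must be non-empty")
        if self.lipids & self.genes:
            raise ValueError("lipids and genes must not overlap")
        if not self.genes <= set(self.region):
            raise ValueError("every gene needs a region")
        if not self.lipids <= set(self.lipid_class):
            raise ValueError("every lipid needs a class")
        for l in self.lipids:
            for g in self.genes:
                if (l, g) not in self.strength:
                    raise ValueError(f"missing strength for {(l, g)}")
                if not 0.0 <= self.strength[(l, g)] <= 1.0:
                    raise ValueError(f"strength out of [0, 1] for {(l, g)}")


def is_candidate(problem, pos):
    """True if pos assigns a point (x, y) to every lipid and every gene."""
    vertices = problem.lipids | problem.genes
    return set(pos) == vertices and all(len(pos[v]) == 2 for v in vertices)
```

A candidate solution is then just a dictionary from vertex to a pair of coordinates, for example `{"PC": (0.0, 0.0), "g1": (1.0, 0.5)}` (made-up labels).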

I want to find a 'best' map. By best I mean that

  • Lipids of the same class should be close together, and lipids of different classes should be far apart.
  • Similarly, genes of the same region should be drawn close together, and far apart otherwise.
  • Lipid/gene pairs with a high strength value should be drawn close to one another, and far apart otherwise.
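One way these three criteria might be combined into a single score, offered as a sketch under assumptions rather than a canonical definition: for a candidate map $p$, write $d_p(u, v) = \lVert p(u) - p(v) \rVert$, pick a target separation $r > 0$ and weights $\alpha, \beta, \gamma \ge 0$ (none of which appear in the problem statement), and take lower values to be better:

$$
E(p) = \alpha \sum_{\{l, l'\} \subseteq \texttt{lipids}} \phi_{\texttt{class}}(l, l')
     + \beta \sum_{\{g, g'\} \subseteq \texttt{genes}} \phi_{\texttt{region}}(g, g')
     + \gamma \sum_{l \in \texttt{lipids}} \sum_{g \in \texttt{genes}} \bigl( d_p(l, g) - r\,(1 - \texttt{strength}(l, g)) \bigr)^2 ,
$$

where the first two sums run over unordered pairs of distinct elements and, for $f \in \{\texttt{class}, \texttt{region}\}$,

$$
\phi_f(u, v) =
\begin{cases}
d_p(u, v)^2 & \text{if } f(u) = f(v), \\
\max\bigl(0,\; r - d_p(u, v)\bigr)^2 & \text{otherwise.}
\end{cases}
$$

The hinge $\max(0, r - d)$ penalises differently labelled vertices only while they are closer than $r$, so the 'far apart' requirement cannot be satisfied by pushing points off to infinity. The last term asks a pair with strength $1$ to coincide and a pair with strength $0$ to sit at distance $r$.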

My first question is: How can I mathematically define what it means for a candidate solution to be good? That is, how can I come up with a fitness function that assigns each candidate solution a score?
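As one possible starting point, the following Python sketch scores a candidate by summing three penalties: squared distance for same-label pairs, a hinge penalty for different-label pairs closer than a target separation `r`, and a squared deviation of each lipid/gene distance from `r * (1 - strength)`. The function name `cost`, the parameter `r`, and the weights `alpha`, `beta`, `gamma` are all assumptions introduced for illustration; lower scores mean better drawings.

```python
import math
from itertools import combinations


def cost(pos, lipids, genes, strength, region, lipid_class,
         r=1.0, alpha=1.0, beta=1.0, gamma=1.0):
    """Illustrative penalty score for a candidate map pos: vertex -> (x, y)."""

    def group_term(items, label):
        total = 0.0
        for u, v in combinations(list(items), 2):
            d = math.dist(pos[u], pos[v])
            if label[u] == label[v]:
                total += d ** 2                  # same label: pull together
            else:
                total += max(0.0, r - d) ** 2    # different label: push apart up to r
        return total

    # Strong pairs should be close, weak pairs about r apart.
    pair_term = sum(
        (math.dist(pos[l], pos[g]) - r * (1.0 - strength[(l, g)])) ** 2
        for l in lipids
        for g in genes
    )
    return (alpha * group_term(lipids, lipid_class)
            + beta * group_term(genes, region)
            + gamma * pair_term)
```

The weights control the trade-off between the three criteria, which can conflict (for instance, two lipids of different classes that are both strongly tied to the same gene), so some tuning would likely be needed.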

My second question is: Once I have defined such a function, how can I use it to find the 'best' solution?

TRP
    For the second question, I’d suggest combinatorial optimization methods such as simulated annealing. The function is likely to have many local optima, and gradient-based descent methods might not easily find a good solution. – joriki Mar 19 '24 at 11:50
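The simulated-annealing suggestion in the comment might look roughly like the sketch below. It is a generic version, not tuned for this problem: `cost` is assumed to be any function that maps a dictionary of positions to a number (lower is better), and the cooling schedule, step size, and step count are arbitrary illustrative choices.

```python
import math
import random


def anneal(cost, pos, steps=20000, t_start=1.0, t_end=1e-3,
           step_size=0.1, seed=0):
    """Minimise cost(pos) over 2D positions; returns (best_pos, best_cost)."""
    rng = random.Random(seed)
    current = dict(pos)
    current_cost = cost(current)
    best, best_cost = dict(current), current_cost
    keys = list(current)

    for i in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / max(1, steps - 1))

        # Propose moving one randomly chosen vertex by a small Gaussian step.
        v = rng.choice(keys)
        old = current[v]
        current[v] = (old[0] + rng.gauss(0.0, step_size),
                      old[1] + rng.gauss(0.0, step_size))
        new_cost = cost(current)
        delta = new_cost - current_cost

        # Always accept improvements; accept worse moves with
        # probability exp(-delta / t) (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = dict(current), new_cost
        else:
            current[v] = old  # reject: undo the move

    return best, best_cost
```

Recomputing the full cost on every step is wasteful for large datasets; since only one vertex moves per proposal, an incremental update of the terms involving that vertex would be the natural optimisation.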

0 Answers