In a standard DNA system with $4$ bases ($\texttt{A}: 0, \texttt{T}: 1, \texttt{C}: 2, \texttt{G}: 3$), we enforce an $\ell$-RLL constraint (no more than $\ell$ consecutive identical bases, e.g., for $\ell= 2$, $\texttt{CCC}$ is forbidden) using a state-based transition matrix.

The sequence is transformed so that a zero represents a repetition and the non-zero symbols ($1, 2, 3$) represent changes. The $\ell$-RLL constraint then becomes an $\left ( \ell- 1 \right )$-zero-RLL constraint (no more than $\ell- 1$ consecutive zeros).
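As a small illustration of this transformation, here is a sketch that assumes the differences are taken modulo $4$ with the indexing $\texttt{A}: 0, \texttt{T}: 1, \texttt{C}: 2, \texttt{G}: 3$. The function names are mine and only for illustration.

```python
# Base indexing taken from the question; the mod-4 difference is an assumption
# about how the "zero = repetition" transformation is realised.
BASE_INDEX = {"A": 0, "T": 1, "C": 2, "G": 3}


def to_differential(seq):
    """Return d_i = (s_{i+1} - s_i) mod 4; a zero marks a repeated base."""
    idx = [BASE_INDEX[b] for b in seq]
    return [(b - a) % 4 for a, b in zip(idx, idx[1:])]


def longest_zero_run(diffs):
    """Length of the longest block of consecutive zeros."""
    longest = current = 0
    for d in diffs:
        current = current + 1 if d == 0 else 0
        longest = max(longest, current)
    return longest


def satisfies_rll(seq, ell):
    """ell-RLL on bases is (ell - 1)-zero-RLL on the differences."""
    return longest_zero_run(to_differential(seq)) <= ell - 1


print(to_differential("ACCCG"))   # [2, 0, 0, 1]
print(satisfies_rll("ACCCG", 2))  # False: CCC gives two consecutive zeros
print(satisfies_rll("ACCG", 2))   # True
```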

The transition matrix $B$ has $\ell$ states ($0$ to $\ell- 1$), where state $i$ indicates the last $i$ symbols were zeros. Transitions are:

  1. From state $i< \ell- 1$:
  • Output a zero: Move to state $i+ 1$ ($1$ way).
  • Output a non-zero: Move to state $0$ ($3$ ways).
  2. From state $\ell- 1$:
  • Output a non-zero: Move to state $0$ ($3$ ways).

The $\ell\times\ell$ matrix $B$ is:

  1. For $i= 0$ to $\ell- 2$:
  • $B\left [ i, 0 \right ]= 3, B\left [ i, i+ 1 \right ]= 1$.
  2. For $i= \ell- 1$:
  • $B\left [ \ell- 1, 0 \right ]= 3$.

For general $\ell$: $$B= \begin{bmatrix} 3 & 1 & 0 & \cdots & 0\\ 3 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 3 & 0 & 0 & \cdots & 1\\ 3 & 0 & 0 & \cdots & 0\\ \end{bmatrix}_{\ell\times\ell}$$ This maximizes the spectral radius for standard DNA.
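To make the construction concrete, here is a minimal pure-Python sketch that builds $B$ from the rules above, counts the admissible transformed sequences by walking the state graph, and estimates the spectral radius by power iteration. It uses the standard fact that the capacity of such a constraint is $\log_2$ of the spectral radius; the helper names and the iteration count are my own choices.

```python
import math


def build_B(ell):
    """B[i][0] = 3 for every state; B[i][i+1] = 1 for i < ell - 1."""
    B = [[0] * ell for _ in range(ell)]
    for i in range(ell):
        B[i][0] = 3              # any of the 3 non-zero symbols resets the run
        if i < ell - 1:
            B[i][i + 1] = 1      # one more zero extends the run
    return B


def count_differential(ell, m):
    """Count length-m sequences over {0,1,2,3} with at most ell-1 consecutive zeros."""
    B = build_B(ell)
    v = [0] * ell
    v[0] = 1                     # start in state 0 (no pending zeros)
    for _ in range(m):
        v = [sum(v[i] * B[i][j] for i in range(ell)) for j in range(ell)]
    return sum(v)


def spectral_radius(B, iters=500):
    """Power iteration with max-normalisation (B is non-negative and primitive)."""
    n = len(B)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = max(y)
        x = [t / lam for t in y]
    return lam


print(build_B(3))                # [[3, 1, 0], [3, 0, 1], [3, 0, 0]]
print(count_differential(2, 2))  # 15, i.e. 16 pairs minus the pair (0, 0)
print(math.log2(spectral_radius(build_B(2))))  # capacity in bits per symbol
```

For $\ell= 2$ the characteristic polynomial is $\lambda^{2}- 3\lambda- 3$, so the spectral radius is $\left ( 3+ \sqrt{21} \right )/2$, which the iteration should reproduce.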

For a DNA composite system with 6 letters ($\texttt{A}: 0, \texttt{T}: 1, \texttt{C}: 2, \texttt{G}: 3, \texttt{M}: 4\left ( \texttt{A}\mid\texttt{C} \right ), \texttt{K}: 5\left ( \texttt{T}\mid\texttt{G} \right )$), constructing a similar transition matrix is challenging due to composite letters. For example:

  • $\texttt{M}\rightarrow\texttt{A}$: if $\texttt{M}$ acts as $\texttt{A}$, the step can be a zero in the transformed sequence. For instance, $\texttt{AMA}$ can be read as $\texttt{AAA}$, so $\texttt{AMA}$ violates $2$-RLL.
  • $\texttt{C}\rightarrow\texttt{M}\rightarrow\texttt{A}$: [I don't have an idea yet, but take a look at the example...] $\texttt{CMA}$ can be read as $\texttt{CCA}$ or $\texttt{CAA}$, and neither reading violates $2$-RLL.

So, is there a way to construct a transition matrix $A$ for DNA composite?

There is, but it requires a $6\ell\times 6\ell$ matrix. With block rows and columns ordered $\texttt{A}, \texttt{T}, \texttt{C}, \texttt{G}, \texttt{M}, \texttt{K}$, the block is $U$ where the two letters can represent the same base (the run continues) and $L$ where they cannot (the run resets): $$A= \begin{bmatrix} U & L & L & L & U & L\\ L & U & L & L & L & U\\ L & L & U & L & U & L\\ L & L & L & U & L & U\\ U & L & U & L & U & L\\ L & U & L & U & L & U\\ \end{bmatrix},\,{\rm where}\,U= \begin{bmatrix} 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ 0 & 0 & 0 & \cdots & 0\\ \end{bmatrix}_{\ell\times\ell}, \quad L= \begin{bmatrix} 1 & 0 & \cdots & 0\\ 1 & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 1 & 0 & \cdots & 0\\ \end{bmatrix}_{\ell\times\ell}.$$
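A sketch of how such a block matrix could be assembled and sanity-checked, under the assumption that block $\left ( x, y \right )$ is $U$ exactly when letters $x$ and $y$ share a base and $L$ otherwise, with blocks ordered $\texttt{A}, \texttt{T}, \texttt{C}, \texttt{G}, \texttt{M}, \texttt{K}$. The state layout (letter index times $\ell$ plus run counter) and all names are illustrative choices of mine.

```python
from itertools import product

# Bases each letter can stand for (M = A|C, K = T|G as in the question).
LETTERS = {"A": {"A"}, "T": {"T"}, "C": {"C"}, "G": {"G"},
           "M": {"A", "C"}, "K": {"T", "G"}}
ORDER = "ATCGMK"


def may_coincide(x, y):
    """True when x and y can represent the same base (block U)."""
    return bool(LETTERS[x] & LETTERS[y])


def build_A(ell):
    """State (letter, r): current letter with run length r + 1, r in 0..ell-1."""
    n = 6 * ell
    A = [[0] * n for _ in range(n)]
    for bx, x in enumerate(ORDER):
        for by, y in enumerate(ORDER):
            for r in range(ell):
                if may_coincide(x, y):
                    if r + 1 < ell:                      # block U: shift the run counter
                        A[bx * ell + r][by * ell + r + 1] = 1
                else:                                    # block L: reset the run counter
                    A[bx * ell + r][by * ell] = 1
    return A


def count_by_matrix(ell, n):
    """Number of length-n composite sequences accepted by A."""
    size = 6 * ell
    A = build_A(ell)
    v = [0] * size
    for b in range(6):
        v[b * ell] = 1                                   # any first letter, run length 1
    for _ in range(n - 1):
        v = [sum(v[i] * A[i][j] for i in range(size)) for j in range(size)]
    return sum(v)


def count_by_brute_force(ell, n):
    """Same count by enumeration, using the same pairwise run rule."""
    total = 0
    for seq in product(ORDER, repeat=n):
        run, ok = 1, True
        for x, y in zip(seq, seq[1:]):
            run = run + 1 if may_coincide(x, y) else 1
            if run > ell:
                ok = False
                break
        total += ok
    return total


print(count_by_matrix(2, 3), count_by_brute_force(2, 3))  # 182 182
```

Note that this pairwise rule is conservative: it extends the run across $\texttt{C}\rightarrow\texttt{M}\rightarrow\texttt{A}$ as well, so for $\ell= 2$ it rejects $\texttt{CMA}$ along with $\texttt{AMA}$.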

To obtain an $\ell\times\ell$ transition matrix, we tried a differential mapping $d_{i}= s_{i+ 1}- s_{i}\pmod 6$, setting the row and column for difference $0$ to zero ($B\left [ 0, j \right ]= 0, B\left [ i, 0 \right ]= 0$) to manage repetitions. This restricts connectivity, because under the indexing $\texttt{A}: 0, \texttt{T}: 1, \texttt{C}: 2, \texttt{G}: 3, \texttt{M}: 4\left ( \texttt{A}\mid\texttt{C} \right ), \texttt{K}: 5\left ( \texttt{T}\mid\texttt{G} \right )$ a composite letter next to one of its own bases does not give difference $0$: for instance, $4- 0\pmod 6$ ($\texttt{A}\rightarrow\texttt{M}$) and $4- 2\pmod 6$ ($\texttt{C}\rightarrow\texttt{M}$) "should have been" $0$.
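The mismatch can be seen directly with a short sketch of the mod-$6$ differential mapping (the function name is hypothetical; the indexing is the one given above):

```python
# Indexing from the question: M = A|C, K = T|G.
INDEX = {"A": 0, "T": 1, "C": 2, "G": 3, "M": 4, "K": 5}


def composite_differential(seq):
    """Return d_i = (s_{i+1} - s_i) mod 6 for a composite sequence."""
    idx = [INDEX[s] for s in seq]
    return [(b - a) % 6 for a, b in zip(idx, idx[1:])]


print(composite_differential("MM"))   # [0]: the only case flagged as a repetition
print(composite_differential("AM"))   # [4]: non-zero although M may act as A
print(composite_differential("CM"))   # [2]: non-zero although M may act as C
print(composite_differential("AMA"))  # [4, 2]: no zeros, yet AMA can be read as AAA
```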

Question. So, how can we construct the logical/arithmetic structure for this $\ell\times\ell$ transition matrix?

My motivation is that, with an optimal-size transition matrix, we might be able to compare the capacity of DNA composites more effectively.

Dang Dang