$
\def\o{{\tt1}}\def\p{\partial}
\def\L{\left}\def\R{\right}
\def\LR#1{\L(#1\R)}
\def\BR#1{\Big(#1\Big)}
\def\bR#1{\big(#1\big)}
\def\vecc#1{\operatorname{vec}\LR{#1}}
\def\diag#1{\operatorname{diag}\LR{#1}}
\def\Sym#1{\operatorname{Sym}\!\BR{#1}}
\def\sym#1{\operatorname{Sym}\LR{#1}}
\def\Diag#1{\operatorname{Diag}\LR{#1}}
\def\trace#1{\operatorname{Tr}\LR{#1}}
\def\qiq{\quad\implies\quad}
\def\grad#1#2{\frac{\p #1}{\p #2}}
\def\m#1{\left[\begin{array}{r}#1\end{array}\right]}
\def\c#1{\color{red}{#1}}
$First, define some new variables
$$\eqalign{
B &= \tfrac 12D \\
y_k &= Mx_k \\
X &= \m{x_1&x_2&\ldots&x_n}\;\in{\mathbb R}^{m\times n} \\
Y &= \m{y_1&y_2&\,\ldots&\,y_n}\;= MX \qiq \c{dY = dM\,X} \\
J &=X\oslash X \qiq J_{ij}=\o\quad\big({\rm all\!-\!ones\;matrix}\big) \\
X &= J\odot X \\
{\cal D}_Y
&= {\rm Diag}\BR{\!\vecc{Y}}\,\in{\mathbb R}^{mn\times mn} \\
}$$
where $\{\odot,\oslash\}$ denote elementwise/Hadamard multiplication and division.
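If you'd like to sanity-check things numerically, here is a minimal `numpy` sketch of these definitions (the dimensions, the random test data, and the helper name `sym` are arbitrary choices of mine, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6                        # arbitrary test dimensions
X = rng.standard_normal((m, n))    # columns are the x_k
M = rng.standard_normal((m, m))
Y = M @ X                          # y_k = M x_k,  i.e.  Y = M X
J = np.ones((m, n))                # all-ones matrix

assert np.allclose(J, X / X)       # J = X ⊘ X  (valid here: X has no zero entries)
assert np.allclose(X, J * X)       # X = J ⊙ X

D_Y = np.diag(Y.ravel(order="F"))  # Diag(vec(Y)) ∈ R^{mn×mn}  (vec stacks columns)

def sym(A):
    """Sym(A) = (A + A^T)/2."""
    return 0.5 * (A + A.T)
```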
Then rewrite the problem in pure matrix notation
and calculate its differential.
$$\eqalign{
2B &= (X\odot Y)^TJ + J^T(X\odot Y) - X^TY - Y^TX \\
&= 2\,\Sym{J^T(X\odot{Y}) - X^T{Y}} \\
dB &= \Sym{J^T(X\odot{\c{dY}}) - X^T{\c{dY}}} \\
&= \Sym{J^T(X\odot\LR{\c{dM\,X}}) - X^T{\c{dM\,X}}} \\
}$$
where $\;\sym{A} \doteq \tfrac 12\LR{A+A^T}\;$ denotes the symmetric part of its argument.
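Continuing that sketch, a quick perturbation test confirms the matrix expression for $B$ and its differential (the function name `B_of` and the perturbation scale are again my own choices; since $B$ is linear in $M$, the agreement is exact up to roundoff):

```python
# Continues the sketch above (X, J, M, rng, sym already defined).
def B_of(M):
    """B = Sym(J^T (X ⊙ (M X)) - X^T (M X)),  i.e. half of the 2B expression."""
    Y = M @ X
    return sym(J.T @ (X * Y) - X.T @ Y)

dM = 1e-6 * rng.standard_normal((m, m))          # small perturbation of M
dB = sym(J.T @ (X * (dM @ X)) - X.T @ (dM @ X))  # the differential formula

assert np.allclose(B_of(M + dM) - B_of(M), dB)   # B is linear in M, so this is exact
```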
At this point, one can calculate a component-wise gradient
$$\eqalign{
\grad{B}{M_{ij}}
&= \Sym{J^T(X\odot\LR{E_{ij}\,X}) - X^T{E_{ij}\,X}} \\
}$$
where $E_{ij}$ is a single-entry matrix whose $(i,j)$ element is $\o$ and all others are zero.
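Here is the corresponding finite-difference check of the component-wise formula, still continuing the sketch (the helpers `E` and `dB_dM` are hypothetical names of mine):

```python
# Continues the sketch above (X, J, M, sym, B_of already defined).
def E(i, j, shape):
    """Single-entry matrix: 1 at (i, j), zeros elsewhere."""
    out = np.zeros(shape)
    out[i, j] = 1.0
    return out

def dB_dM(i, j):
    """dB/dM_ij = Sym(J^T (X ⊙ (E_ij X)) - X^T E_ij X)."""
    Eij = E(i, j, (m, m))
    return sym(J.T @ (X * (Eij @ X)) - X.T @ (Eij @ X))

# Central difference in the (i, j) entry of M (exact here, since B is linear in M).
i, j, h = 1, 2, 1e-4
fd = (B_of(M + h * E(i, j, (m, m))) - B_of(M - h * E(i, j, (m, m)))) / (2 * h)
assert np.allclose(dB_dM(i, j), fd)
```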
In terms of the original variable $D$
$$\eqalign{
\grad{B}{M_{ij}} &= \frac 12 \LR{\grad{D}{M_{ij}}} \\
}$$
The problem with the requested gradient $\LR{\grad DM}$ is that it's a fourth-order tensor, which cannot be written in standard matrix notation.
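If you do want the whole gradient in code anyway, the natural container is a fourth-order array rather than a matrix. Continuing the sketch:

```python
# Continues the sketch above (dB_dM already defined).
# grad_B_M[k, l, i, j] = dB_kl / dM_ij  -- a fourth-order tensor, not a matrix.
grad_B_M = np.empty((n, n, m, m))
for i in range(m):
    for j in range(m):
        grad_B_M[:, :, i, j] = dB_dM(i, j)

grad_D_M = 2.0 * grad_B_M   # since dB/dM_ij = (1/2) dD/dM_ij
```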
Update
You've added a constraint to your original question, i.e.
$$M=A^TA \qiq \c{dM=dA^TA+A^TdA}$$
The procedure to find the new gradient is straightforward. Write the differential, change the independent variable from $M\to A$, then isolate the gradient.
Here is the calculation for the component-wise gradient
$$\eqalign{
dB &= \Sym{J^T(X\odot\LR{\c{dM\,X}}) - X^T\LR{\c{dM\,X}}} \\
&= \Sym{J^T(X\odot\LR{\c{dA^TA\,X+A^TdA\,X}})
         - X^T\LR{\c{dA^TA\,X+A^TdA\,X}}} \\
&= 2 \Sym{J^T\bR{\!\LR{AX}\odot\LR{dA\,X}\!} - X^TA^TdA\,X} \\
\grad{B}{A_{ij}}
&= 2 \Sym{J^T\bR{\!\LR{AX}\odot\LR{E_{ij}X}\!}-X^TA^TE_{ij}X} \\
}$$
where the factor of $2$ arises because the column sums agree entry-by-entry,
$\sum_i X_{il}\LR{dA^TA\,X}_{il} = \sum_i \LR{AX}_{il}\LR{dA\,X}_{il},$
so the two Hadamard terms are equal after left multiplication by $J^T$, while
$X^TdA^TA\,X = \LR{X^TA^TdA\,X}^T$ and the two trailing terms share the same symmetric part.
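One last continuation of the numerical sketch checks the $A$-gradient (here I take a square $A$ and set $M=A^TA$; the helper name `dB_dA` is mine):

```python
# Continues the sketch above (X, J, rng, sym, B_of, E already defined).
A = rng.standard_normal((m, m))

def B_of_A(A):
    """B as a function of A, with the constraint M = A^T A imposed."""
    return B_of(A.T @ A)

def dB_dA(i, j):
    """dB/dA_ij = 2 Sym(J^T ((A X) ⊙ (E_ij X)) - X^T A^T E_ij X)."""
    Eij = E(i, j, (m, m))
    return 2.0 * sym(J.T @ ((A @ X) * (Eij @ X)) - X.T @ A.T @ (Eij @ X))

# Central difference in the (i, j) entry of A (exact here, since B is quadratic in A).
i, j, h = 0, 3, 1e-4
fd = (B_of_A(A + h * E(i, j, (m, m))) - B_of_A(A - h * E(i, j, (m, m)))) / (2 * h)
assert np.allclose(dB_dA(i, j), fd)
```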
The linked paper is pretty good compared to most Machine Learning papers, but the mathematics is still far too "hand-wavy". It's geared towards an audience that just wants to code the gradient expression directly into their Python programs without bothering to check the math.
Deriving their gradient result using matrix notation would be a herculean effort. Even converting their result into matrix notation would be very difficult, because the nested sum violates the usual summation convention: it contains an expression with four $i$-indices and four $j$-indices, whereas conventional index notation permits at most two occurrences of any single index.
If you want someone to decode that mess for you then, as Steph suggested, you should post a new question $-$ I'll let him answer that one.