$
\def\t{\times} \def\R#1{{\mathbb R}^{#1}}
\def\a{\alpha} \def\b{\beta}
\def\s{\sigma} \def\l{\lambda}
\def\lR#1{\Big(#1\Big)}
\def\bR#1{\Big[#1\Big]}
\def\LR#1{\left(#1\right)}
\def\BR#1{\left[#1\right]}
\def\CR#1{\left\lbrace #1 \right\rbrace}
\def\op#1{\operatorname{#1}}
\def\vc#1{\op{vec}\LR{#1}}
\def\trace#1{\op{Tr}\LR{#1}}
\def\cross#1{\,\left[\,#1\,\right]_\times}
\def\frob#1{\left\| #1 \right\|_F}
\def\q{\quad} \def\qq{\qquad}
\def\qif{\q\iff\q} \def\qiq{\q\implies\q}
\def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}}
\def\m#1{\left[\begin{array}{c}#1\end{array}\right]}
\def\mb#1{\left[\begin{array}{c|c}#1\end{array}\right]}
\def\rr#1{\color{red}{#1}}
\def\gg#1{\color{green}{#1}}
\def\bb#1{\color{blue}{#1}}
\def\RLR#1{\rr{\LR{#1}}}
\def\BLR#1{\bb{\LR{#1}}}
\def\GLR#1{\gg{\LR{#1}}}
\def\gradLR#1#2{\LR{\grad{#1}{#2}}}
\def\ai{\a^{-1}} \def\aa{\a^2}
\def\bi{\b^{-1}} \def\bbi{\b^{-2}}
\def\T{{\sf T}}
$Gather all of the indexed matrices into block matrices
$$\small\eqalign{
U = \m{U_1 \\ U_2 \\ \vdots \\ U_p}, \q
V = \m{V_1\;V_2\;\ldots\;V_p}, \q
A = \left[\begin{array}{cccc}
A_{11} & A_{12} & \ldots & A_{1p} \\
A_{21} & A_{22} & \ldots & A_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
A_{p1} & A_{p2} & \ldots & A_{pp}
\end{array}\right] \in\R{pn\t pn} \\
}$$
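For concreteness, here is a minimal `numpy` sketch of the block assembly. The block count $p$, the block size $n$, and the inner dimension $r$ (chosen so that $B=UV$ below has the same shape as $A$) are arbitrary illustrative values, not part of the original problem statement.

```python
import numpy as np

# Hypothetical sizes for illustration: p blocks, each A_ij is n x n,
# and an assumed inner dimension r so that B = U V has the same shape as A.
p, n, r = 3, 4, 2
rng = np.random.default_rng(0)

U_blocks = [rng.standard_normal((n, r)) for _ in range(p)]   # U_1 ... U_p
V_blocks = [rng.standard_normal((r, n)) for _ in range(p)]   # V_1 ... V_p
A_blocks = [[rng.standard_normal((n, n)) for _ in range(p)] for _ in range(p)]

U = np.vstack(U_blocks)   # shape (p*n, r):   the U_i stacked vertically
V = np.hstack(V_blocks)   # shape (r, p*n):   the V_i concatenated horizontally
A = np.block(A_blocks)    # shape (p*n, p*n): the p x p block matrix

assert (U @ V).shape == A.shape
```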
Then define these auxiliary variables
$$\eqalign{
B &= UV &\qiq dB = dU\:V + U\:dV \\
\a &= A:B &\qiq \gg{d\a = A:dB} \\
\b &= B:B &\qiq \rr{d\b = 2B:dB} \\
}$$
where a colon denotes the Frobenius product
which has the following properties
$$\eqalign{
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\
B:B &= \frob{B}^2 \qquad \{ {\sf Frobenius\;norm} \}\\
A:B &= B:A \;=\; B^T:A^T \\
\LR{PQ}:B &= P:\LR{BQ^T} \;=\; Q:\LR{P^TB} \\
}$$
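These identities are easy to check numerically. The sketch below verifies them on random matrices; the helper `frob` simply implements the colon product, and all names and sizes are arbitrary.

```python
import numpy as np

def frob(X, Y):
    """Frobenius inner product  X : Y  =  Tr(X^T Y)  =  sum_ij X_ij Y_ij."""
    return np.sum(X * Y)

rng = np.random.default_rng(1)
m, n, k = 4, 5, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
P = rng.standard_normal((m, k))
Q = rng.standard_normal((k, n))

assert np.isclose(frob(A, B), np.trace(A.T @ B))            # A:B = Tr(A^T B)
assert np.isclose(frob(B, B), np.linalg.norm(B, 'fro')**2)  # B:B = ||B||_F^2
assert np.isclose(frob(A, B), frob(B, A))                   # symmetry
assert np.isclose(frob(P @ Q, B), frob(P, B @ Q.T))         # (PQ):B = P:(B Q^T)
assert np.isclose(frob(P @ Q, B), frob(Q, P.T @ B))         # (PQ):B = Q:(P^T B)
```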
Rewrite the objective function using the above notation
and calculate its differential
$$\eqalign{
\l &= \bi\aa \\
d\l
&= \bi(2\a\,\gg{d\a}) \;-\; \aa\bbi\,\rr{d\b} \\
&= \bi(2\a\gg{A:dB}) \;-\; \aa\bbi\,\RLR{2B:dB} \\
&= \bb{2\bbi\a{\lR{\b A-\a B}}}:{dB} \\
&\bb{\equiv G}:{dB} \\
&= G:\LR{dU\:V + U\:dV} \\
&= GV^T:dU \;+\; U^TG:dV \\
}$$
from which the gradients are readily identified
$$\eqalign{
\grad{\l}{U} &= G V^T \qq\qq
\grad{\l}{V} &= U^T G \\
}$$
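A quick finite-difference check builds confidence in these formulas: perturb $U$ by a tiny random $dU$ and compare the predicted change $\LR{GV^T}:dU$ against the actual change in $\l$. The sketch below does this with arbitrary sizes and an arbitrary perturbation scale.

```python
import numpy as np

def frob(X, Y):
    """Frobenius product  X : Y."""
    return np.sum(X * Y)

def lam(A, U, V):
    """lambda = (A:B)^2 / (B:B)  with  B = U V."""
    B = U @ V
    return frob(A, B) ** 2 / frob(B, B)

def grads(A, U, V):
    """Closed-form gradients derived above:  dl/dU = G V^T,  dl/dV = U^T G."""
    B = U @ V
    a, b = frob(A, B), frob(B, B)
    G = (2 * a / b ** 2) * (b * A - a * B)
    return G @ V.T, U.T @ G

rng = np.random.default_rng(2)
m, r = 6, 2
A = rng.standard_normal((m, m))
U = rng.standard_normal((m, r))
V = rng.standard_normal((r, m))

gU, gV = grads(A, U, V)
dU = 1e-6 * rng.standard_normal(U.shape)      # small random perturbation of U
predicted = frob(gU, dU)                      # first-order prediction  (G V^T) : dU
actual = lam(A, U + dU, V) - lam(A, U, V)     # actual change in lambda
print(predicted, actual)                      # the two numbers should agree to several digits
```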
With the gradients in hand, we can apply gradient ascent
(since this is a $\max$ problem), alternately updating $U$, then $V$, then $U,\,\ldots$ until convergence is achieved
$$\eqalign{
U_{k+1} &= U_k + \tau_k \gradLR{\l_k}{U_k}
&\qiq U_{k+1}=\frac{U_{k+1}}{\|U_{k+1}\|} \\
V_{j+1} &= V_j + \s_j \gradLR{\l_j}{V_j}
&\qiq V_{j+1}=\frac{V_{j+1}}{\|V_{j+1}\|} \\
}$$
where $\CR{\s_j,\tau_k}$ denote the steplengths of the respective ascent iterations, and the subscripts now count iterations rather than indexing the blocks of the partitioned matrices.
Since the function value depends only on the matrix directions and not their magnitudes
$$\eqalign{
&\l(U,V) \;=\; \l\!\LR{\frac{U}{\|U\|},\,\frac{V}{\|V\|}} \\
}$$
normalizing $U$ and $V$ on every step
will prevent runaway growth of the matrices.
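A minimal sketch of this alternating, normalized ascent loop is given below; the fixed steplength `tau` and the iteration count are placeholders (a line search or adaptive step would be more robust in practice).

```python
import numpy as np

def frob(X, Y):
    return np.sum(X * Y)

def gradient_ascent(A, U, V, tau=1e-2, iters=1000):
    """Alternating normalized gradient ascent on  lambda = (A:UV)^2 / (UV:UV).
    The fixed steplength tau stands in for a proper line search."""
    for _ in range(iters):
        B = U @ V
        a, b = frob(A, B), frob(B, B)
        G = (2 * a / b**2) * (b * A - a * B)
        U = U + tau * (G @ V.T)        # ascend in U ...
        U = U / np.linalg.norm(U)      # ... then rescale to unit Frobenius norm

        B = U @ V
        a, b = frob(A, B), frob(B, B)
        G = (2 * a / b**2) * (b * A - a * B)
        V = V + tau * (U.T @ G)        # ascend in V ...
        V = V / np.linalg.norm(V)      # ... then rescale to unit Frobenius norm
    return U, V
```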
Another idea is to set the gradients to zero, and solve for each matrix variable in terms of the pseudoinverse of the other, i.e.
$$\eqalign{
\def\g{\gamma\,}
U &= \g AV^{\bf+}
&\qiq U=\frac{U}{\|U\|} \\
V &= \g U^{\bf+}\!A
&\qiq V=\frac{V}{\|V\|} \\
}$$
where $\g=\large\LR{\frac{UV\,:\,UV}{\:A\::\,UV}}\,$ makes each update satisfy the zero-gradient condition exactly, but since we normalize immediately afterward, setting $\g={\tt1}$ works just as well and avoids unnecessary computation.
Once again, hold $V$ constant and solve for $U$, then hold $U$ constant and solve for $V$, and keep alternating until convergence.
This is Alternating Least Squares, which typically converges faster than gradient ascent/descent.
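Here is a sketch of the resulting iteration with $\g={\tt1}$, using `np.linalg.pinv` for the pseudoinverses; the iteration count and the random test problem are arbitrary.

```python
import numpy as np

def frob(X, Y):
    return np.sum(X * Y)

def als(A, U, V, iters=100):
    """Alternating updates  U = A V^+,  V = U^+ A  with gamma = 1,
    each followed by normalization."""
    for _ in range(iters):
        U = A @ np.linalg.pinv(V)
        U = U / np.linalg.norm(U)
        V = np.linalg.pinv(U) @ A
        V = V / np.linalg.norm(V)
    return U, V

# Example usage on a random test problem (sizes are arbitrary)
rng = np.random.default_rng(3)
m, r = 8, 3
A = rng.standard_normal((m, m))
U = rng.standard_normal((m, r))
V = rng.standard_normal((r, m))
U, V = als(A, U, V)
B = U @ V
print(frob(A, B)**2 / frob(B, B))   # objective value at the computed point
```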
In tests on a few small systems, this problem behaves like
Nonnegative Matrix Factorization, i.e. it probably has a global optimum (which is $\sf NP$-hard to find) along with lots of practical optima which are very easy to compute.