I'm not sure I understand why your "mathematically" statement implies that you can delay feedback to the end of the circuit. However, there is a construction that allows you to delay feedback until the end of the circuit, but requires you to hold an additional ancilla qubit for every $T$ gate in your circuit. Provided your circuit is only polynomially long, this only requires a polynomial space overhead, but it could be significant.

The construction is from Fig. 25 of Litinski's "A Game of Surface Codes" paper. It lets you choose whether a teleported $\pi/8$ rotation (a generalized $T$ gate) is replaced by its Hermitian conjugate (a $-\pi/8$ rotation) by measuring an ancilla "control" qubit in either the X or Z basis. It also requires only a Pauli correction, not a Clifford correction.
This means you can execute your entire circuit without any Clifford corrections. The Pauli corrections will change some $T$ gates to $T^\dagger$ gates when you commute the Pauli corrections to the end of the circuit, but you can decide whether you implement a $T$ or $T^\dagger$ by changing the measurement basis of that gate's control qubit. The lack of Clifford corrections plus the ability to replace a $T$ gate with $T^\dagger$ after the teleportation allows you to delay feedback until the end.
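
As a quick sanity check of the claim that Pauli corrections turn some $T$ gates into $T^\dagger$ gates, here is a minimal sketch for the single-qubit case. It verifies numerically that $X T X = e^{i\pi/4} T^\dagger$ (so an $X$ correction flips $T$ to $T^\dagger$ up to a global phase) while $Z T Z = T$. The helper names (`matmul`, `scale`, `close`) are mine, not from the paper, and the multi-qubit Pauli-product rotations used in the paper are not covered here.

```python
import cmath

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def scale(c, a):
    """Multiply a 2x2 matrix by a scalar."""
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))

w = cmath.exp(1j * cmath.pi / 4)        # e^{i pi/4}
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
T = [[1, 0], [0, w]]
T_DAG = [[1, 0], [0, w.conjugate()]]

XTX = matmul(matmul(X, T), X)           # conjugate T by X
ZTZ = matmul(matmul(Z, T), Z)           # conjugate T by Z

x_flips_t = close(XTX, scale(w, T_DAG)) # X T X = e^{i pi/4} T^dagger
z_keeps_t = close(ZTZ, T)               # Z T Z = T
```

Both flags come out `True`: only a correction that anticommutes with the rotation axis changes which of $T$ or $T^\dagger$ you need.
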
Note, however, that you'll have to perform your end-of-the-line measurements sequentially, not in parallel. Measuring the control qubit of the first $T$ gate tells you what Pauli correction occurred, which tells you whether the subsequent $T$ gates should be replaced with $T^\dagger$s. So before you measure the control qubit of the $n$th $T$ gate, you must already have measured the control qubits of every earlier $T$ gate whose Pauli correction could affect the $n$th one; only then do you know whether it should be replaced by $T^\dagger$.
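
The sequential dependency can be sketched as a toy Pauli-frame tracker. This is only an illustration under assumptions of mine, not the procedure from the paper: each rotation is described by a Pauli-string axis, a measurement outcome of 1 is assumed to leave a Pauli correction equal to that gate's own axis, and a later rotation is flipped whenever its axis anticommutes with an odd number of accumulated corrections. The function names `anticommute` and `schedule` are hypothetical, and the output says "T" or "T_dagger" rather than a basis, since which of the X or Z basis selects which gate is fixed by the figure and not assumed here.

```python
def anticommute(p, q):
    """Return True if Pauli strings p and q (e.g. "XIZ") anticommute."""
    clashes = sum(1 for a, b in zip(p, q)
                  if a != "I" and b != "I" and a != b)
    return clashes % 2 == 1

def schedule(axes, outcomes):
    """Decide gate by gate whether to realize T or T_dagger.

    axes:     Pauli-string axis of each rotation, in circuit order.
    outcomes: simulated results (0 or 1) of each gate's measurement;
              outcome k is read only after the choice for gate k is made.
    """
    frame = []    # Pauli corrections accumulated from earlier gates
    choices = []
    for axis, outcome in zip(axes, outcomes):
        flips = sum(anticommute(axis, c) for c in frame)
        choices.append("T_dagger" if flips % 2 else "T")
        if outcome:
            # Assumption: the correction is the gate's own axis.
            frame.append(axis)
    return choices

example = schedule(["Z", "X", "Z"], [1, 0, 0])
```

In this example the first gate leaves a $Z$ correction, which anticommutes with the $X$ axis of the second gate, so the second choice becomes `"T_dagger"`, while the third gate (axis $Z$) is unaffected. The loop cannot choose for gate $n$ until all earlier outcomes that feed into `frame` are known, which is exactly why the measurements are sequential.
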