This is self-study, not homework. I am reviewing some slides I found online and came across the following question.
Question:
If the latency of an integer multiply is $3$ cycles and the cycles/issue is $1$, then:
- How fast can 10 independent int mults be executed? $$t_1 = a_1*b_1 \quad t_2 = a_2*b_2 \quad t_3 = a_3*b_3 \quad \cdots \quad t_{10} = a_{10}*b_{10}$$
- How fast can $10$ sequentially dependent int mults be executed? $$t_1 = a_1*b_1 \quad t_2 = t_1*b_2 \quad t_3 = t_2*b_3 \quad \cdots\quad t_{10} = t_9*b_{10}$$
Attempt:
Obviously the sequentially dependent case will take longer, because each multiply must wait for the previous multiply to finish. I'm not sure exactly how to interpret latency and cycles/issue in this context. Here is my attempt.
Independent case: each multiply takes $3$ cycles, but we can start a new multiply each cycle (can we?), so we need $3 + 10 - 1 = 12$ cycles?
Dependent case: naively, this would take $3 \cdot 10 = 30$ cycles if each multiply waits for the previous one to finish. It seems that we could do better, though. For instance:
- start $t_1 = a_1*b_1$ first
- start $b_2*b_3$ in the next cycle
- wait a cycle
- $t_2 = t_1*b_2$ (once $t_1$ is ready)
- $t_3 = t_1*(b_2*b_3)$
So the first $3$ multiplies can be done in $7$ cycles instead of $9$. I think I just need to see a problem worked out and I'll be able to pick up what's going on.
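To sanity-check the cycle counts above, here is a minimal sketch of how I am modelling the problem. It assumes a single, fully pipelined multiplier that issues in program order, which is my reading of "latency $3$, cycles/issue $1$" and not something the slides state. The function name `schedule` and the temporary `p` (standing for $b_2*b_3$) are made up for illustration.

```python
LATENCY = 3         # cycles from issue until the result is available (from the question)
ISSUE_INTERVAL = 1  # cycles between consecutive issues (from the question)

def schedule(ops):
    """Return the cycle at which the last result is available.

    ops is a list of (name, deps) pairs in program order, where deps lists
    the names of earlier results this multiply needs as operands.
    Assumes one pipelined multiplier and in-order issue.
    """
    ready = {}       # name -> cycle at which that result becomes available
    next_issue = 0   # earliest cycle the multiplier can accept a new op
    for name, deps in ops:
        # An op issues once the unit is free and all of its operands are ready.
        start = max([next_issue] + [ready[d] for d in deps])
        ready[name] = start + LATENCY
        next_issue = start + ISSUE_INTERVAL
    return max(ready.values())

# 10 independent multiplies: t_i = a_i * b_i
independent = [(f"t{i}", []) for i in range(1, 11)]

# 10 sequentially dependent multiplies: t_i = t_{i-1} * b_i
dependent = [("t1", [])] + [(f"t{i}", [f"t{i-1}"]) for i in range(2, 11)]

# My reassociated version of the first three dependent multiplies
reassociated = [("t1", []), ("p", []), ("t2", ["t1"]), ("t3", ["t1", "p"])]

print(schedule(independent))    # 12
print(schedule(dependent))      # 30
print(schedule(dependent[:3]))  # 9
print(schedule(reassociated))   # 7
```

Under these assumptions the model reproduces my numbers: $12$ cycles for the independent case, $30$ for the naive dependent chain, and $7$ instead of $9$ for the first three multiplies when $b_2*b_3$ is computed early. Whether this is the interpretation the slides intend is exactly what I am unsure about.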