Calculate the CPI of a program

Question

foo:
addi $t0,$zero,1
addi $v0,$zero,0
outer:
beq $t0,$zero,exitout
sll $t0,$t0,0
addi $t1,$zero,0
add $t2,$zero,$a0
addi $t0,$zero,0
inner: addi $t8,$a1,-1
slt $t9,$t1,$t8
beq $t9,$zero,outer
sll $t0,$t0,0
addi $v0,$v0,1
lw $t8,0($t2)
lw $t9,4($t2)
slt $t7,$t9,$t8
beq $t7,$zero,skip
sll $t0,$t0,0
lw $t8,0($t2)
lw $t9,4($t2)
sw $t9,0($t2)
sw $t8,4($t2)
addi $t0,$zero,1
skip: addi $t2,$t2,4
j inner
addi $t1,$t1,1
exitout:
jr $ra
sll $t0,$t0,0

Question: What is the clock cycles per instruction (CPI) when executing foo with the following C code call: foo(lst,1) (that is, the second argument is 1 instead of 3)? Do not include the time it takes to call the function, but you must include the clock cycles for returning. That is, count the clock cycles up until just before the next instruction is fetched after returning from foo. Include the branch delay slot in the instruction count. The first instruction is located at address 0x40003300.

int lst[] ={100,23,8};
int r = foo(lst,3);

Here is what I don't understand: don't we only execute 11 instructions, from the 1st to the 10th and then the 4th again? Or do we also include the branch penalty instructions, in which case we would have 15? Also, how can 4 misses give 40 cycles? And we never get far enough to execute 3 branches. Can someone help me?

Answers

By following the program, we can see that there will be no memory accesses. Hence, we do not have any data cache misses. Only 4 instruction cache blocks will be touched. Hence, we have 4 instruction cache misses, which imply a cost of 40 clock cycles. If we count the number of executed instructions, we get 12 instructions without penalties for hazards. We do not have any stalls due to data hazards, but we have 3 branch instructions that give a penalty of 3 + 3 + 1 = 7 clock cycles. Hence, we have in total 40 + 12 + 7 = 59 clock cycles. If we count, we get 15 instructions, including the delay slot instructions. Hence, the CPI is 59/15.

score 1 · Answer 1 · answered Jan 25 '24 at 19:27

Begin by drawing a table of the instructions executed:

#	INSN	OP1	OP2	OP3	COMMENT
1	addi	t0	zero	1	t0 = 1
2	addi	v0	zero	0	v0 = 0
3	beq	t0	zero	exitout	t0 != 0 (hit)
4	sll	t0	t0	0	(bds)
5	addi	t1	zero	0	t1 = 0
6	add	t2	zero	a0	t2 = a0 (address of lst)
7	addi	t0	zero	0	t0 = 0
8	addi	t8	a1	-1	t8 = a1 - 1 = 1 - 1 = 0
9	slt	t9	t1	t8	t9 = 0
10	beq	t9	zero	outer	t9 == 0 (miss)
11	sll	t0	t0	0	(bds)
12	beq	t0	zero	exitout	t0 == 0 (miss)
13	sll	t0	t0	0	(bds)
14	jr	ra
15	sll	t0	t0	0	(bds)

I've marked the instructions in the branch delay slots with (bds).
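The dynamic instruction count in the table can be cross-checked with a small interpreter. This is only a rough sketch and not part of the original exercise: the names `assemble`, `run`, and the `RETURN` sentinel are made up for illustration, the address 0x1000 for lst is arbitrary, and it assumes every branch or jump has exactly one delay slot that always executes.

```python
# Hypothetical mini-interpreter for the MIPS subset used by foo.
# Assumption: one branch delay slot per branch/jump, always executed.
PROGRAM = """
foo:
addi $t0,$zero,1
addi $v0,$zero,0
outer:
beq $t0,$zero,exitout
sll $t0,$t0,0
addi $t1,$zero,0
add $t2,$zero,$a0
addi $t0,$zero,0
inner: addi $t8,$a1,-1
slt $t9,$t1,$t8
beq $t9,$zero,outer
sll $t0,$t0,0
addi $v0,$v0,1
lw $t8,0($t2)
lw $t9,4($t2)
slt $t7,$t9,$t8
beq $t7,$zero,skip
sll $t0,$t0,0
lw $t8,0($t2)
lw $t9,4($t2)
sw $t9,0($t2)
sw $t8,4($t2)
addi $t0,$zero,1
skip: addi $t2,$t2,4
j inner
addi $t1,$t1,1
exitout:
jr $ra
sll $t0,$t0,0
"""


def assemble(text):
    """Split the listing into a label table and a list of (op, operands)."""
    labels, insns = {}, []
    for line in text.strip().splitlines():
        line = line.strip()
        if ":" in line:
            label, line = line.split(":", 1)
            labels[label.strip()] = len(insns)
            line = line.strip()
        if line:
            op, operands = line.split(None, 1)
            insns.append((op, [a.strip() for a in operands.split(",")]))
    return labels, insns


def run(a0, a1, memory):
    """Execute foo(a0, a1) and return (trace of executed instructions, registers)."""
    labels, insns = assemble(PROGRAM)
    RETURN = len(insns)  # hypothetical sentinel: reaching it means "returned"
    regs = {"$zero": 0, "$a0": a0, "$a1": a1, "$ra": RETURN}
    pc, pending, trace = 0, None, []
    while pc != RETURN:
        op, a = insns[pc]
        trace.append(op + " " + ",".join(a))
        target = None
        if op == "addi":
            regs[a[0]] = regs[a[1]] + int(a[2])
        elif op == "add":
            regs[a[0]] = regs[a[1]] + regs[a[2]]
        elif op == "sll":
            regs[a[0]] = regs[a[1]] << int(a[2])
        elif op == "slt":
            regs[a[0]] = int(regs[a[1]] < regs[a[2]])
        elif op in ("lw", "sw"):
            offset, base = a[1].rstrip(")").split("(")
            addr = regs[base] + int(offset)
            if op == "lw":
                regs[a[0]] = memory[addr]
            else:
                memory[addr] = regs[a[0]]
        elif op == "beq":
            if regs[a[0]] == regs[a[1]]:
                target = labels[a[2]]
        elif op == "j":
            target = labels[a[0]]
        elif op == "jr":
            target = regs[a[0]]
        else:
            raise ValueError("unsupported instruction: " + op)
        # A taken branch redirects only after the next (delay slot) instruction.
        pc, pending = (pending if pending is not None else pc + 1), target
    return trace, regs


trace, regs = run(a0=0x1000, a1=1, memory={})
print(len(trace))  # prints 15
```

With the second argument set to 1 the inner loop exits immediately, so no `lw` or `sw` is reached and the empty memory dictionary is never read.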

Your question doesn't describe the microarchitecture, so we assume it is the five-stage MIPS pipeline commonly used in education. Its stages are instruction fetch (F), decode (D), execute (X), memory (M), and writeback (W). We assume bypasses from execute to decode for arithmetic instructions and from decode to fetch for branches (so that branches never stall). I have indicated stalls with "!":

#	INSN	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21
1	`addi t0, zero, 1`	F	D	X	M	W
2	`addi v0, zero, 0`		F	D	X	M	W
3	`beq t0, zero, exitout`			F	D	X	M	W
4	`sll t0, t0, 0`				F	D	X	M	W
5	`addi t1, zero, 0`					F	D	X	M	W
6	`add t2, zero, a0`						F	D	X	M	W
7	`addi t0, zero, 0`							F	D	X	M	W
8	`addi t8, a1, -1`								F	D	X	M	W
9	`slt t9, t1, t8`									F	!	D	X	M	W
10	`beq t9, zero, outer`											F	!	D	X	M	W
11	`sll t0, t0, 0`													F	D	X	M	W
12	`beq t0, zero, exitout`														F	D	X	M	W
13	`sll t0, t0, 0`															F	D	X	M	W
14	`jr ra`																F	D	X	M	W
15	`sll t0, t0, 0`																	F	D	X	M	W

The answer claims that there are no stalls due to data hazards. As far as I can tell, that's incorrect: both instructions 9 and 10 need to stall due to RAW hazards (9 reads t8, which 8 writes, and 10 reads t9, which 9 writes). We can compute the total number of cycles with $n + (p - 1) + s$, where $n$ is the number of instructions, $p$ the number of pipeline stages, and $s$ the number of stall cycles, so $15 + (5 - 1) + 2 = 21$. I'm not sure why there are four instruction cache misses or why the instruction cache miss penalty is 10 cycles, but if we account for that we get $(40 + 21) / 15 \approx 4.07$ CPI.
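The two cycle counts can be put side by side with a few lines of Python. This is just a sketch of the arithmetic: `pipeline_cycles` is a made-up helper name, and the 10-cycle miss penalty is only what the quoted solution implies (4 misses costing 40 cycles), not something the question states.

```python
def pipeline_cycles(n, p, s):
    """Cycles for n instructions in a p-stage pipeline with s stall cycles."""
    return n + (p - 1) + s


n = 15                 # executed instructions, delay slots included
icache_cycles = 4 * 10  # assumption: 4 misses at 10 cycles each, per the quoted solution

# Pipeline view from this answer: 5 stages, 2 stall cycles.
pipeline_total = pipeline_cycles(n, p=5, s=2)
cpi_pipeline = (icache_cycles + pipeline_total) / n

# Quoted solution: 12 instruction cycles plus 3 + 3 + 1 branch penalty cycles.
quoted_total = icache_cycles + 12 + (3 + 3 + 1)
cpi_quoted = quoted_total / n

print(f"pipeline view:   {icache_cycles + pipeline_total} cycles, CPI = {cpi_pipeline:.3f}")
print(f"quoted solution: {quoted_total} cycles, CPI = {cpi_quoted:.3f}")
```

The two results differ only because the quoted solution charges 7 cycles of branch penalty on top of 12 instruction cycles, while the pipeline view charges 4 fill cycles and 2 stalls on top of 15.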
