On page 67 of the same spec, Algorithm 42 does exactly what you are asking for. Here it is again in Python:
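
    def strategy(n, p, q):
        # S[i]: best strategy for a subtree with i leaves; C[i]: its cost
        S = { 1: [] }
        C = { 1: 0 }
        for i in range(2, n+2):
            # try every split b and keep the cheapest one
            b, cost = min(((b, C[i-b] + C[b] + b*p + (i-b)*q) for b in range(1,i)),
                          key=lambda t: t[1])
            S[i] = [b] + S[i-b] + S[b]
            C[i] = cost
        return S[n+1]
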
The parameters p and q weight the strategy; they depend on your implementation of SIKE. Following the definitions of the spec, p is the cost of one multiplication step and q is the cost of one isogeny step.
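To see how the weights influence the result, here is a minimal sketch of the same recurrence that also returns the total cost. The name `strategy_with_cost` and the weights in the usage lines are made up for illustration; real values of p and q have to be measured on your own implementation.

```python
def strategy_with_cost(n, p, q):
    """Same recurrence as above, but also returns the cost of the result.

    p: cost of one multiplication step, q: cost of one isogeny step.
    (Hypothetical helper name, not taken from the spec.)
    """
    S = {1: []}   # S[i]: cheapest strategy found for i leaves
    C = {1: 0}    # C[i]: cost of S[i]
    for i in range(2, n + 2):
        # Pick the split b that minimizes the cost of the two subtrees
        # plus the b multiplication steps and i-b isogeny steps at the root.
        b, cost = min(
            ((b, C[i - b] + C[b] + b * p + (i - b) * q) for b in range(1, i)),
            key=lambda t: t[1],
        )
        S[i] = [b] + S[i - b] + S[b]
        C[i] = cost
    return S[n + 1], C[n + 1]


# Illustrative weights only: swapping them changes the chosen split.
print(strategy_with_cost(2, 1, 2))   # isogeny steps twice as expensive
print(strategy_with_cost(2, 2, 1))   # multiplication steps twice as expensive
```

With these toy weights the two calls pick different first splits, which is the whole point of tuning p and q. Note that the returned list always has n entries.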
When choosing a strategy, there may be other considerations than those related to the weights p and q. For example, you may prefer a strategy that is slightly more costly, but easier to parallelize. So the algorithm above is by no means a definitive answer: you should choose whatever strategy works best for you, as long as it is a valid one (see page 15 of the spec). If you have no idea what's best for you, choose the ones given in appendix C of the spec, or use a non-optimized strategy.
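If you want to compare a hand-picked strategy against another one under the same cost model, something like the sketch below may help. It assumes the list encoding produced by the code above (`[b]`, then the sub-strategy for i-b leaves, then the one for b leaves) and only checks consistency with that encoding; it is not the spec's formal validity definition. The name `strategy_cost` is hypothetical.

```python
def strategy_cost(strategy, p, q):
    """Cost of a strategy given as a list in the encoding [b] + S[i-b] + S[b].

    Raises ValueError if some entry cannot be a split of its subtree.
    Recursion depth can reach len(strategy), which is fine for SIKE-sized
    inputs but would need an explicit stack for very long lists.
    """
    def walk(pos, leaves):
        # Returns (cost of the subtree, position after its last entry).
        if leaves == 1:
            return 0, pos
        b = strategy[pos]
        if not 1 <= b < leaves:
            raise ValueError(f"entry {b} at index {pos} is not a split of {leaves} leaves")
        first_cost, pos = walk(pos + 1, leaves - b)   # sub-strategy for i-b leaves
        second_cost, pos = walk(pos, b)               # sub-strategy for b leaves
        return b * p + (leaves - b) * q + first_cost + second_cost, pos

    # A strategy with k entries describes a tree with k+1 leaves.
    cost, _ = walk(0, len(strategy) + 1)
    return cost


# Two candidate strategies for the same tree, with illustrative weights.
print(strategy_cost([2, 1], p=1, q=2))
print(strategy_cost([1, 1], p=1, q=2))
```

This makes it easy to quantify how much a "slightly more costly" strategy actually costs before deciding whether its other advantages are worth it.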
Regarding your "why" question, each "leaf point" defines the kernel of one ℓ-isogeny step. You want to compute all of them, so you can compute all isogenies, and compose them. This is not a theorem. It is just the best way we know.