In my ADS course we were given this pseudocode for the "exact string matching brute force" algorithm:
ESM-BF(P, S)
    m = length(P), n = length(S)
    k = 0   # number of matches
    for j = 1, ..., n-m+1 do
        i = 1
        while i ≤ m and P[i] == S[j+i] do
            i = i + 1
        if i == m+1 then
            k = k + 1
    return k
We were given the example pattern P = ACGTACT and the example string S = ACGGTACGTACGTACT.
Note that indexing starts at 1.
I started to write down with pen and paper what happens in the first three iterations of the for loop:
ESM-BF('ACGTACT', 'ACGGTACGTACGTACT')
    m = 7
    n = 16
    for j = 1   # first round
        i = 1
        while 1 ≤ 7 and P[1] == S[1+1]   # 'A' == 'C' is FALSE, skip body
        if 1 == 7+1                      # FALSE, skip body
    for j = 2   # second round
        i = 1
        while 1 ≤ 7 and P[1] == S[2+1]   # 'A' == 'G' is FALSE, skip body
        if 1 == 7+1                      # FALSE, skip body, back to the for
    for j = 3   # third round
        i = 1
        while 1 ≤ 7 and P[1] == S[3+1]   # 'A' == 'G' is FALSE, skip body
        if 1 == 7+1                      # FALSE, skip body, back to the for
I see how it would eventually find the right match once we reach S[6].
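To double-check my hand trace, here is a minimal Python sketch that reproduces the first comparison of each round under the literal reading of the pseudocode (P[i] against S[j+i]). The function name `first_comparisons` is my own and not from the slides, and since Python strings are 0-based, every 1-based index is shifted down by one.

```python
def first_comparisons(P, S, rounds):
    """Return (j, P[1], S[j+1]) for the first `rounds` values of j (1-based)."""
    result = []
    for j in range(1, rounds + 1):
        i = 1
        # 1-based P[i] and S[j+i] become P[i-1] and S[j+i-1] in Python
        result.append((j, P[i - 1], S[j + i - 1]))
    return result


for j, p, s in first_comparisons("ACGTACT", "ACGGTACGTACGTACT", 3):
    print(f"j={j}: P[1]={p!r} vs S[{j + 1}]={s!r} -> {p == s}")
```

This agrees with what I wrote down by hand: the first character ever read from S is S[2], never S[1].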
- Why do we not start comparing P[1] with S[1]?
- What would happen if the pattern P remained the same but the string was changed to S = ACGTACTACGGTACGT? As far as I can understand, the algorithm would just miss the match at the very start of S.
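To test the second question, I wrote a small Python sketch of the algorithm with the index into S made adjustable. The `shift` parameter is my own addition: `shift = 0` is the pseudocode exactly as I copied it (S[j+i]), and `shift = -1` is a hypothetical variant that reads S[j+i-1] instead. The pseudocode does not say what happens when j+i runs past the end of S, so I assume that counts as a mismatch.

```python
def esm_bf(P, S, shift):
    """Count occurrences of P in S, comparing P[i] with S[j + i + shift] (1-based).

    shift = 0  : literal reading of the pseudocode, S[j+i]
    shift = -1 : hypothetical variant, S[j+i-1]
    """
    m, n = len(P), len(S)
    k = 0  # number of matches
    for j in range(1, n - m + 2):  # j = 1, ..., n-m+1
        i = 1
        while i <= m:
            pos = j + i + shift  # 1-based position in S
            # Assumption: reading past the end of S counts as a mismatch.
            if pos > n or P[i - 1] != S[pos - 1]:
                break
            i += 1
        if i == m + 1:
            k += 1
    return k


P = "ACGTACT"
for S in ("ACGGTACGTACGTACT", "ACGTACTACGGTACGT"):
    print(S, "literal:", esm_bf(P, S, 0), "shifted:", esm_bf(P, S, -1))
```

If I transcribed the pseudocode correctly, the literal version finds the match in the first string but counts zero matches in the second one, whereas the S[j+i-1] variant finds one match in both.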
Our professor told us that ESM-BF compares the letters one by one, starting from position one in both the pattern and the string, and encouraged us to do it with pen and paper.
I asked some colleagues if they knew, but they are also confused about it.
What am I missing?
In case I made typos, I will add the original from the slides to rule out that I simply copied it down wrong:
