3

I have implemented in C++ the Needleman-Wunsch algorithm for pairwise sequence alignment using the following scores:

  • +1 for match (regardless of base)
  • -1 gap penalty
  • -1 for a mismatch.

Given two sequences of length $m$ and $n$ the algorithm then generates the $mn$ search space row by row as per the algorithm's specifications and backtracks generating all solutions of maximum score. Given two sequences in which the first two letters overlap ie:

GCATGCU

and

GATTACA

the algorithm succesfully generates all the alignments of maximum score. Howeever, given two sequences of the kind:

ACTGCUGCTA

and

UGCTAATCGU

the algorithm fails to find any of the solutions that begin with a gap in sequence 1 or 2, thus failing to find the optimal solution as well:

ACTGCUGCTA-----
-----UGCTAATCGU

Once I add the letter A to the second sequence the algorithm once again has no problem finding the optimal solution (which in this case begins with a match and not a gap)

ACTGCUGCTA-----
A----UGCTAATCGU

Could someone explain to me why the algorithm doesn't find this subset of solutions and perhaps how I can modify it to find these solutions as well?

andreas.vitikan
  • 215
  • 2
  • 6

1 Answers1

4

Your optimal alignment seems to be an optimal local alignment, the best substring match. Needleman-Wunsch is for global alignment. With the simple program by Eddy that you can find on the internet I have determined a global score of 0. This is better than the gobal score you get: -5 (for 10 gaps and 5 matches).

Sequence X: GCATGCU
Sequence Y: GATTACA
Scoring system: 1 for match; -1 for mismatch; -1 for gap

Dynamic programming matrix:
             G    A    T    T    A    C    A 
        0   -1   -2   -3   -4   -5   -6   -7 
   G   -1    1    0   -1   -2   -3   -4   -5 
   C   -2    0    0   -1   -2   -3   -2   -3 
   A   -3   -1    1    0   -1   -1   -2   -1 
   T   -4   -2    0    2    1    0   -1   -2 
   G   -5   -3   -1    1    1    0   -1   -2 
   C   -6   -4   -2    0    0    0    1    0 
   U   -7   -5   -3   -1   -1   -1    0    0 

Alignment:
X: GCA-TGCU
Y: G-ATTACA
Optimum alignment score: 0
Hendrik Jan
  • 31,459
  • 1
  • 54
  • 109