Ambiguous substring with mismatches

Question

I'm trying to use regular expressions to find a substring in a DNA string. The substring contains ambiguous bases, as in ATCGR, where R could be A or G. The script must also allow up to x mismatches. This is my code:

import regex

s = 'ACTGCTGAGTCGT'    
regex.findall(r"T[AG]T"+'{e<=1}', s, overlapped=True)

So, with one mismatch I would expect 3 substrings AC**TGC**TGAGTCGT and ACTGC**TGA**GTCGT and ACTGCTGAGT**CGT**. The expected result should be like this:

['TGC', 'TGA', 'AGT', 'CGT']

But the output is

['TGC', 'TGA']

Even using re.findall, the code doesn't recognize the last substring. On the other hand, if the code is set to allow 2 mismatches with {e<=2}, the output is

['TGC', 'TGA']

Is there another way to get all the substrings?

Welcome to SO: Please have a read of [MCVE] to help improve your questions and increase your chances of getting help. — Shawn Mehan, Sep 22 '17 at 02:30
@leleonp: works well for me except that you have to use `regex.findall` instead of `regex.search` if you want the two matches. As an aside writing *"the code doesn't work"* doesn't help anyone. — Casimir et Hippolyte, Sep 22 '17 at 03:33
Thanks for your suggestions. But findall doesn't seem to find all the occurrences of the substring — leleonp, Sep 22 '17 at 07:59

Casimir et Hippolyte · Accepted Answer · 2017-09-22T18:04:52.090

If I understand correctly, you are looking for all three-letter substrings that match the pattern T[GA]T, allowing at most one error. I think the only kind of error you want is a character substitution, since you never mentioned two-letter results (which a deletion would produce).

To obtain the expected result, change {e<=1} to {s<=1} (or {s<2}) and apply it to the whole pattern rather than only to the last letter, by enclosing the pattern in a group (capturing or non-capturing, as you prefer). Otherwise the constraint {s<=1} applies only to the last letter:

regex.findall(r'(T[AG]T){s<=1}', s, overlapped=True)
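As a cross-check that does not depend on the regex module, the same idea (fixed-length windows, substitutions only) can be sketched in plain Python. This is a brute-force sliding-window count, not the regex module's fuzzy matching. The helper name `find_with_substitutions` and the small `IUPAC` table are illustrative assumptions, and the table lists only a few of the ambiguity codes.

```python
# Partial table of IUPAC nucleotide codes (assumption: extend it as needed).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine: A or G
    "Y": "CT",    # pyrimidine: C or T
    "N": "ACGT",  # any base
}


def find_with_substitutions(motif, seq, max_subs=1):
    """Return (start, window) for each window of len(motif) in seq
    that differs from motif in at most max_subs positions."""
    k = len(motif)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # A position is a mismatch when the base is not allowed by the code.
        mismatches = sum(
            base not in IUPAC[code] for code, base in zip(motif, window)
        )
        if mismatches <= max_subs:
            hits.append((start, window))
    return hits


s = "ACTGCTGAGTCGT"
hits = find_with_substitutions("TRT", s, max_subs=1)
print([window for _, window in hits])  # ['TGC', 'TGA', 'AGT', 'CGT']
```

Because every window is checked independently, overlapping hits such as TGA and AGT are both reported, which is what `overlapped=True` is meant to achieve in the regex version.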
