I'm trying to use a genetic algorithm to optimize a 28x28 matrix so that it looks like an image of the digit 7 from the MNIST dataset. The idea is to use the probability that a CNN digit classifier assigns to my matrix being a 7 as the score (fitness) function, and to "breed" matrices until one looks like a 7. I'm implementing the CNN with PyTorch, and the GA with object-oriented programming, where I've defined a member and a population data structure.

My algorithm doesn't converge and I'm at a bit of a loss, any help is massively appreciated.

The important bits of the code follow. Score function:

    def score(self):
        # member method: classify the seed and use the softmax probability
        # of the target digit as the fitness
        seed_tensor = torch.FloatTensor(self.seed).unsqueeze(0).unsqueeze(0).to(
            'cuda' if torch.cuda.is_available() else 'cpu')
        logits = loaded_model(seed_tensor)
        probabilities = torch.nn.functional.softmax(logits, dim=1)
        output = probabilities[0, 7].item()  # class index 7 corresponds to the digit 7
        loss = 1 - output
        return np.exp(-loss)

Reproduction:

    def reproduction(self):
        # keep 75% of the population via selection + mutation,
        # and fill the remaining 25% with offspring of random parent pairs
        crossover_num = int(self.population_size * 0.75)
        crossovers = self.choose_parents(crossover_num)
        self.mutate(crossovers)
        reproduction_num = int(self.population_size * 0.25)
        for i in range(reproduction_num):
            parents = self.choose_parents_in_random()
            crossovers.extend(self.create_offsprings(*parents))
        self.members = crossovers

Mutation:

    def mutate(self, mutation_rate):
        # flip each pixel independently with probability mutation_rate
        mask = np.random.uniform(0, 1, size=(self.rows, self.columns)) < mutation_rate
        self.seed[mask] = 1 - self.seed[mask]

Crossover happens for 75% of the population; for the other 25%, two parents create a new seed. Here's my GitHub repo with the full code: https://github.com/Max-SF1/GA_Mnist_generator Thank you for your help!

I've also recently tried a grayscale matrix instead of a binary one, which similarly yielded no positive results: the algorithm does not converge.

kal_elk122

2 Answers

When you say "fail to converge", do you mean the loss is not changing, or that the resulting images are bad?

If the loss is not changing, something is up with your code. If the loss goes down but the images don't look good, this is expected.

Your classifier has no ability to distinguish "digit" from "non-digit" images. The expected outcome of your approach is images that look like noise but nevertheless achieve high-confidence classification scores. In other words, your approach is expected to create adversarial examples.

The process you are describing (optimizing an image towards a classification decision) is one of the early ways adversarial examples were discovered. You might find this lecture interesting.

Karl

mutation

        self.seed[mask] = 1 - self.seed[mask]

A mutation rate of half a percent means we're flipping about four pixels per image (784 × 0.005 ≈ 4). But they're scattered all over.

Maybe constrain flipping to focus on a randomly chosen 6x6 block?
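One way to sketch that localized mutation (a hypothetical helper, not from the question's repo; it assumes the seed is a binary NumPy array):

```python
import numpy as np

def mutate_block(seed, block_size=6, flip_prob=0.3, rng=None):
    """Flip pixels only inside one randomly placed block_size x block_size window."""
    if rng is None:
        rng = np.random.default_rng()
    rows, cols = seed.shape
    r = rng.integers(0, rows - block_size + 1)  # top-left corner of the window
    c = rng.integers(0, cols - block_size + 1)
    block = seed[r:r + block_size, c:c + block_size]
    mask = rng.uniform(size=block.shape) < flip_prob
    block[mask] = 1 - block[mask]  # in-place flip; `block` is a view of `seed`
    return seed
```

That keeps mutations spatially correlated, which matches how digit strokes differ locally rather than pixel-by-pixel.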

pixel density

Any given MNIST image will have ballpark 20% foreground pixels. I guess this flipping could eventually get us there? But I'm worried that it will tend to produce 50% television static and then stay there, because the recognition model sees nothing like a digit and cannot usefully guide us in the proper direction (no gradient).

Rather than exploring the space of all 784-bit binary strings, consider exploring the space of such bitstrings that have a popcount of roughly a hundred and fifty. (Count the median number of pixels in reference "7" images, rather than trust my numeric guess.) For generated images that start to have significantly too few or too many foreground pixels, just flip some at random to produce the desired population count.
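A sketch of that popcount repair (the 150-pixel target and the tolerance are placeholders you'd replace with values measured on real "7" images):

```python
import numpy as np

TARGET = 150  # rough foreground count; measure on reference "7"s instead

def repair_popcount(seed, target=TARGET, tolerance=20, rng=None):
    """If the foreground count drifts too far from target, flip random pixels back."""
    if rng is None:
        rng = np.random.default_rng()
    excess = int(seed.sum()) - target
    if abs(excess) <= tolerance:
        return seed
    if excess > 0:
        # too many foreground pixels: switch some off
        candidates = np.argwhere(seed == 1)
        n_flip = excess - tolerance
    else:
        # too few foreground pixels: switch some on
        candidates = np.argwhere(seed == 0)
        n_flip = -excess - tolerance
    pick = candidates[rng.choice(len(candidates), size=n_flip, replace=False)]
    seed[pick[:, 0], pick[:, 1]] = 1 - seed[pick[:, 0], pick[:, 1]]
    return seed
```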

As an aside, we see roughly double the foreground pixels in hand-drawn "0" images versus "1" images. So when asked to classify "0" vs "1", a classifier could resort to comparing population counts and picking a threshold that separates the two classes.

combining

When creating offspring, left side comes from member_2's pixels while right side comes from member_1. Rather than always picking a vertical dividing line, consider sometimes picking a horizontal dividing line. Or consider making offspring mostly resemble member_1, with a randomly chosen 9x9 patch copied from member_2.
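Those crossover variants could be sketched like this (function name and the patch size are illustrative, not from the repo):

```python
import numpy as np

def crossover(a, b, rng=None):
    """Combine two parents: a random vertical or horizontal dividing line,
    or a small patch from b copied into a copy of a."""
    if rng is None:
        rng = np.random.default_rng()
    rows, cols = a.shape
    child = a.copy()
    mode = rng.integers(3)
    if mode == 0:            # vertical dividing line
        cut = rng.integers(1, cols)
        child[:, cut:] = b[:, cut:]
    elif mode == 1:          # horizontal dividing line
        cut = rng.integers(1, rows)
        child[cut:, :] = b[cut:, :]
    else:                    # 9x9 patch from b, rest from a
        r = rng.integers(rows - 8)
        c = rng.integers(cols - 8)
        child[r:r + 9, c:c + 9] = b[r:r + 9, c:c + 9]
    return child
```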

last resort

If nothing is working there is still a last ditch "cheating" feature you could implement: occasionally copy a random patch from a reference "7" image directly into offspring. The idea is (A.) to give the recognition model a chance to steer the process toward convergence using a non-zero gradient, and (B.) to give you a better feel for what "healthy" evolution looks like, so you can diagnose / repair some troublesome logic and then delete the cheat.
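That "cheating" feature could look something like this (`reference_seven` would be a real MNIST "7" converted to the same binary format; all names here are hypothetical):

```python
import numpy as np

def seed_with_reference_patch(child, reference_seven, patch_size=8, rng=None):
    """Paste one randomly placed patch from a real '7' into an offspring."""
    if rng is None:
        rng = np.random.default_rng()
    rows, cols = child.shape
    r = rng.integers(rows - patch_size + 1)  # patch stays inside the image
    c = rng.integers(cols - patch_size + 1)
    out = child.copy()
    out[r:r + patch_size, c:c + patch_size] = \
        reference_seven[r:r + patch_size, c:c + patch_size]
    return out
```

Applying this to only an occasional offspring keeps the rest of the evolution honest while giving the classifier some digit-like structure to reward.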

J_H