
From what I gathered, data augmentation consists of increasing the number of instances in your dataset by applying some transformations. Let's say I want to classify images. If I apply a random rotation to every image in a dataset containing $n$ images, I will obtain a new dataset with $2n$ images: $n$ pairs of the original image plus its randomly rotated counterpart.
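For example, here is a minimal sketch of that understanding (the images tensor is a random placeholder, and tf.image.rot90 stands in for an actual random rotation):

import tensorflow as tf

# "Offline" augmentation as I understand it: transform every image once
# and keep both versions, growing the dataset from n to 2n samples.
images = tf.random.uniform((8, 32, 32, 3))        # placeholder for n images
rotated = tf.image.rot90(images)                  # stand-in for a random rotation
augmented = tf.concat([images, rotated], axis=0)  # originals + rotated copies
print(augmented.shape)                            # (16, 32, 32, 3), i.e. 2n images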

Assuming this is true, I don't understand what the Keras experimental preprocessing layers related to data augmentation are doing.

Take tf.keras.layers.experimental.preprocessing.RandomRotation. The image classification tutorial puts this kind of layer inside the Sequential model like this:

model = Sequential([
  layers.experimental.preprocessing.RandomFlip("horizontal", 
                                                 input_shape=(img_height, 
                                                              img_width,
                                                              3)),
  layers.experimental.preprocessing.Rescaling(1./255),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Dropout(0.2),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])

This is already kind of weird, because a layer produces an output from an input (obviously); it doesn't duplicate the image. Anyway, I decided to check the documentation and, in effect, this is what is happening:

Init signature: layers.experimental.preprocessing.RandomRotation(*args, **kwargs)
Docstring:     
Randomly rotate each image.

By default, random rotations are only applied during training. At inference time, the layer does nothing. If you need to apply random rotations at inference time, set training to True when calling the layer.

Input shape: 4D tensor with shape: (samples, height, width, channels), data_format='channels_last'.

Output shape: 4D tensor with shape: (samples, height, width, channels), data_format='channels_last'.
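To convince myself, I ran a quick check (my own snippet, not from the tutorial): the layer maps a batch of n images to a batch of n images, and with training=False it is just the identity:

import tensorflow as tf

layer = tf.keras.layers.experimental.preprocessing.RandomRotation(0.2)
batch = tf.random.uniform((4, 64, 64, 3))

# Same batch size in and out: nothing is duplicated.
print(layer(batch, training=True).shape)   # (4, 64, 64, 3)

# At inference time the layer does nothing, as the docstring says.
out = layer(batch, training=False)
print(bool(tf.reduce_all(out == batch)))   # True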

Therefore, I understand that I'm just randomly rotating each image, that is, changing every image in the dataset, but I'm not doing any data augmentation. However, that would make no sense; otherwise they wouldn't present this as a data augmentation procedure. So what am I missing?


1 Answer


If you look into the code, you can figure out exactly what is happening.

Take tf.keras.layers.experimental.preprocessing.RandomFlip for example (this is the layer used in the model above; RandomRotation works the same way). Its call method looks like this:

def call(self, inputs, training=True):
  ...
  def random_flipped_inputs():
    flipped_outputs = inputs
    if self.horizontal:
      flipped_outputs = image_ops.random_flip_left_right(flipped_outputs,
                                                         self.seed)
    ...
    return flipped_outputs

You will see it calls random_flip_left_right on your input images, whose docstring reads:

"""Randomly flip an image horizontally (left to right).
With a 1 in 2 chance, outputs the contents of `image` flipped along the
second dimension, which is `width`.  Otherwise output the image as-is.
When passing a batch of images, each image will be randomly flipped
independent of other images."""
Example usage:
...
>>> images = np.array(
... [
...     [[[1], [2]], [[3], [4]]],
...     [[[5], [6]], [[7], [8]]]
... ])
>>> tf.image.random_flip_left_right(images, 6).numpy().tolist()
[[[[2], [1]], [[4], [3]]], [[[5], [6]], [[7], [8]]]]

So basically, each image has a 50% chance of being flipped horizontally every time it passes through the layer. If you train for 10 epochs, image_1 may show up as [original, flipped, original, flipped, ...], and likewise for the other images. Seen from the outside, it is as if you trained on 10 different datasets (especially when several augmentation layers are stacked).
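To see this concretely, here is a small sketch (my own, with a toy 2x2 image) of how the same input can come out different on each pass, which is what effectively gives you a "new" dataset every epoch:

import tensorflow as tf

flip = tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal")
image = tf.reshape(tf.range(4, dtype=tf.float32), (1, 2, 2, 1))

# Each call during training re-draws the coin flip, so across epochs the
# model sees a different random variant of the same image.
for epoch in range(4):
    print(flip(image, training=True).numpy().squeeze())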
