I want to build a CNN model in Keras that can be fed images of different sizes. From other questions I understand how to define such a model, e.g. with Input(shape=(None, None, 3)). However, I'm not sure how to prepare the input/output datasets.
Concretely, I now want to combine a dataset of (100, 100) images with one of (240, 360) images, but I don't know how to combine these datasets.
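For reference, this is the kind of model I mean (a sketch; GlobalAveragePooling2D is one common way to avoid Flatten when the spatial size is unspecified):

from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense
from keras.models import Model

inputs = Input(shape=(None, None, 3))       # height and width left unspecified
x = Conv2D(32, (3, 3), activation='relu')(inputs)
x = GlobalAveragePooling2D()(x)             # collapses the variable spatial dims
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs, outputs)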
6 Answers
Conventionally, when dealing with images of different sizes in CNNs (which happens very often in real-world problems), we resize the images to a common size, often the size of the smallest image, with the help of an image-manipulation library (OpenCV, PIL, etc.), or sometimes pad the images of unequal size up to the desired size. Resizing is simpler and is used most often.
As Media mentioned in another answer, it is not possible to directly use images of different sizes. When you define a CNN architecture, you plan the layers based on the input size; without a fixed input shape, you cannot define the architecture of your model. It is therefore necessary to convert all your images to the same size.
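For example, a minimal resizing sketch with PIL (the target size, folder layout, and function name here are illustrative, not from the answer above):

import os
import numpy as np
from PIL import Image

target_size = (100, 100)  # e.g. resize everything to the smaller dataset's size

def load_resized(folder):
    # Load every image in `folder`, resize it, and stack into one array.
    images = []
    for name in os.listdir(folder):
        img = Image.open(os.path.join(folder, name)).convert('RGB')
        images.append(np.asarray(img.resize(target_size)))
    return np.stack(images)

After this, both the (100, 100) and the (240, 360) datasets have the same shape and can simply be concatenated along the batch axis.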
There is a way to include both image sizes: you can preprocess your images so that they are resized to the same dimensions.
Some freely available code that shows this:
from keras import backend as K
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator

# All images are resized to this common size by the generators below.
img_width, img_height = 150, 150

train_data_dir = '/yourdir/train'
validation_data_dir = '/yourdir/validation'
nb_train_samples = ...       # fill in your training sample count
nb_validation_samples = ...  # fill in your validation sample count
epochs = 50
batch_size = 16

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Augment the training data on the fly; only rescale the validation data.
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1. / 255)

# target_size resizes every image, whatever its original dimensions.
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
This uses the Keras image flow API for data augmentation on the fly, and the data generators at the bottom of the code will adjust your images to whatever dimensions you specify at the top.
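To actually train with these generators, a call like the following would come next (a sketch; the snippet above stops before the training step):

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)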
One way is to pad the images during training. Keras expects all tensors in a batch to be the same size, but at inference time a single image can be of any size (provided the architecture itself accepts variable input sizes). So during training you can pad your 100 x 100 images so that their new dimensions after padding become 240 x 360.
You can have a look at this tutorial.
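A minimal NumPy sketch of such padding (the function name and the choice to center the original image are mine):

import numpy as np

def pad_to(img, target_h=240, target_w=360):
    # Zero-pad an (H, W, C) image up to (target_h, target_w),
    # centering the original content in the padded canvas.
    h, w = img.shape[:2]
    pad_h, pad_w = target_h - h, target_w - w
    return np.pad(img,
                  ((pad_h // 2, pad_h - pad_h // 2),
                   (pad_w // 2, pad_w - pad_w // 2),
                   (0, 0)),
                  mode='constant')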
At least as far as I know, you can't. The reason is clear: in a neural network you search for appropriate values of a fixed set of weights that minimize a cost function. When you specify an input shape, the shapes of the downstream weights are determined by it, so you cannot change the input size of the network afterwards. In other words, you cannot feed a convolutional network inputs of different sizes. The typical solution for dealing with such situations is to resize the input.
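A sketch illustrating the point (my example, not from the answer): with an unspecified spatial size, Flatten yields a vector of unknown length, so Keras cannot create the weights of the following Dense layer and raises an error.

from keras.layers import Input, Conv2D, Flatten, Dense

inputs = Input(shape=(None, None, 3))   # height and width unknown
x = Conv2D(32, (3, 3))(inputs)          # fine: conv weights don't depend on H, W
x = Flatten()(x)                        # output length is unknown here
x = Dense(10)(x)                        # raises: Dense needs a defined last dimension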
There are some ways to deal with it, but none of them solves the problem well: padding with black pixels, marking missing regions with a special value such as NaN, resizing, or a separate mask layer that tells the network where the real image content is. Most likely they do not work well; otherwise image datasets would contain images of different sizes. A separate mask layer is used in the currently best image-recognition network (SENet, Hu et al., winner of ImageNet 2017), but there masking is used for zooming into the picture, not for handling different image sizes.
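One way to realize the mask idea (my sketch, not how SENet does it) is to pad with black pixels and append a binary mask as an extra input channel:

import numpy as np

def pad_with_mask(img, target_h=240, target_w=360):
    # Place the (H, W, C) image in a black canvas and append a channel
    # that is 1 over the real pixels and 0 over the padding.
    h, w, c = img.shape
    out = np.zeros((target_h, target_w, c + 1), dtype=np.float32)
    out[:h, :w, :c] = img
    out[:h, :w, c] = 1.0
    return out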