I want to build a CNN model in Keras that can be fed images of different sizes. From other questions I understand how to define such a model, e.g. with Input(shape=(None, None, 3)). However, I'm not sure how to prepare the input/output datasets.
Concretely, I now want to combine a dataset of (100, 100) images with one of (240, 360) images, but I don't know how to combine these datasets.
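For reference, this is the kind of model I mean (a sketch; GlobalAveragePooling2D is one common way to avoid Flatten when the spatial size is unspecified):

from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense
from keras.models import Model

inputs = Input(shape=(None, None, 3))       # height and width left unspecified
x = Conv2D(32, (3, 3), activation='relu')(inputs)
x = GlobalAveragePooling2D()(x)             # collapses the variable spatial dims
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs, outputs)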
6 Answers
Conventionally, when dealing with images of different sizes in CNNs (which happens very often in real-world problems), we resize the images to a common size, often the size of the smallest image, with the help of an image-manipulation library (OpenCV, PIL, etc.), or sometimes pad the images of unequal size up to the desired size. Resizing is simpler and is used most often.
As Media mentioned in another answer, it is not possible to directly use images of different sizes. When you define a CNN architecture, you plan the layers based on the input size; without a fixed input shape, you cannot define the architecture of your model. It is therefore necessary to convert all your images to the same size.
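For example, a minimal resizing sketch with PIL (the target size, folder layout, and function name here are illustrative, not from the answer above):

import os
import numpy as np
from PIL import Image

target_size = (100, 100)  # e.g. resize everything to the smaller dataset's size

def load_resized(folder):
    # Load every image in `folder`, resize it, and stack into one array.
    images = []
    for name in os.listdir(folder):
        img = Image.open(os.path.join(folder, name)).convert('RGB')
        images.append(np.asarray(img.resize(target_size)))
    return np.stack(images)

After this, both the (100, 100) and the (240, 360) datasets have the same shape and can simply be concatenated along the batch axis.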
There is a way to include both image sizes: you can preprocess your images so that they are resized to the same dimensions.
Some freely available code that shows this:
from keras import backend as K
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Activation, Dropout, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator

# All images are resized to this common size by the generators below.
img_width, img_height = 150, 150

train_data_dir = '/yourdir/train'
validation_data_dir = '/yourdir/validation'
nb_train_samples = ...       # fill in your training sample count
nb_validation_samples = ...  # fill in your validation sample count
epochs = 50
batch_size = 16

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Augment the training data on the fly; only rescale the validation data.
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1. / 255)

# target_size resizes every image, whatever its original dimensions.
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
This uses the Keras image flow API for data augmentation on the fly, and the data generators at the bottom of the code will adjust your images to whatever dimensions you specify at the top.
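To actually train with these generators, a call like the following would come next (a sketch; the snippet above stops before the training step):

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)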
One way is to pad the images during training. Keras expects all tensors in a batch to be the same size, but at inference time a single image can be of any size (provided the architecture itself accepts variable input sizes). So during training you can pad your 100 x 100 images so that their new dimensions after padding become 240 x 360.
You can have a look at this tutorial.
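A minimal NumPy sketch of such padding (the function name and the choice to center the original image are mine):

import numpy as np

def pad_to(img, target_h=240, target_w=360):
    # Zero-pad an (H, W, C) image up to (target_h, target_w),
    # centering the original content in the padded canvas.
    h, w = img.shape[:2]
    pad_h, pad_w = target_h - h, target_w - w
    return np.pad(img,
                  ((pad_h // 2, pad_h - pad_h // 2),
                   (pad_w // 2, pad_w - pad_w // 2),
                   (0, 0)),
                  mode='constant')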
At least as far as I know, you can't. The reason is clear: in a neural network you search for appropriate values of a fixed set of weights that minimize a cost function. When you specify an input shape, the shapes of the downstream weights are determined by it, so you cannot change the input size of the network afterwards. In other words, you cannot feed a convolutional network inputs of different sizes. The typical solution for dealing with such situations is to resize the input.
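A sketch illustrating the point (my example, not from the answer): with an unspecified spatial size, Flatten yields a vector of unknown length, so Keras cannot create the weights of the following Dense layer and raises an error.

from keras.layers import Input, Conv2D, Flatten, Dense

inputs = Input(shape=(None, None, 3))   # height and width unknown
x = Conv2D(32, (3, 3))(inputs)          # fine: conv weights don't depend on H, W
x = Flatten()(x)                        # output length is unknown here
x = Dense(10)(x)                        # raises: Dense needs a defined last dimension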
There are some ways to deal with it, but none of them solves the problem well: padding with black pixels, marking missing regions with a special value such as NaN, resizing, or a separate mask layer that tells the network where the real image content is. Most likely they do not work well; otherwise image datasets would contain images of different sizes. A separate mask layer is used in the currently best image-recognition network (SENet, Hu et al., winner of ImageNet 2017), but there masking is used for zooming into the picture, not for handling different image sizes.
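One way to realize the mask idea (my sketch, not how SENet does it) is to pad with black pixels and append a binary mask as an extra input channel:

import numpy as np

def pad_with_mask(img, target_h=240, target_w=360):
    # Place the (H, W, C) image in a black canvas and append a channel
    # that is 1 over the real pixels and 0 over the padding.
    h, w, c = img.shape
    out = np.zeros((target_h, target_w, c + 1), dtype=np.float32)
    out[:h, :w, :c] = img
    out[:h, :w, c] = 1.0
    return out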