
Recently I have been comparing VGG16 with ResNet v1 with 20 layers (ResNet20). I have found that although each epoch of VGG takes more time to complete, it generally needs fewer epochs to reach a given training accuracy than ResNet20. Why does VGG learn faster? Are my experiments correct? I have tried this on CIFAR-100 and on a subset of ImageNet (Tiny ImageNet, from the Stanford CV course).
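To be precise about "fewer epochs to reach a certain training accuracy", here is a minimal sketch of how this can be read off the Keras History objects; the 0.9 threshold and the history_* variable names are illustrative, not the exact values I used:

def epochs_to_accuracy(history, threshold=0.9, key='accuracy'):
    # Return the first (1-based) epoch at which training accuracy reaches `threshold`,
    # or None if it is never reached. `history` is the object returned by model.fit()
    # or fit_generator(); on older Keras versions use key='acc'.
    for epoch, acc in enumerate(history.history[key], start=1):
        if acc >= threshold:
            return epoch
    return None

# e.g. compare epochs_to_accuracy(history_vgg) vs. epochs_to_accuracy(history_resnet)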

VGG has nearly 14M parameters, while the ResNet has only about 0.3M. Here is my implementation of the ResNet:

# Imports (assuming tf.keras; plain Keras works the same way)
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                                     AveragePooling2D, Flatten, Dense)
from tensorflow.keras import layers, Model
from tensorflow.keras.callbacks import ModelCheckpoint


def resnet_layer1(inputs, initializer,
                  num_filters=16,
                  kernel_size=3,
                  strides=1,
                  activation='relu',
                  conv_first=True,
                  batch_normalization=True):
    # One building block: Conv2D, optionally preceded or followed by BN and activation
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer=initializer)

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        x = Activation(activation)(x)
        x = conv(x)
    return x
def resnet_1(model_number, x_train, y_train, x_test, y_test, datagen, initializer,
             epochs=20, bs=512, output_nodes=10, optim='adam', padding='same',
             dout=True, callbacks=None):

    depth = 20
    # Start model definition.
    num_filters = 16
    num_res_blocks = int((depth - 2) / 6)

    # Pick which resnet_layer variant to use
    if model_number == 1:
        resnet_layer = resnet_layer1
    elif model_number == 2:
        resnet_layer = resnet_layer2
    elif model_number == 3:
        resnet_layer = resnet_layer3
    elif model_number == 4:
        resnet_layer = resnet_layer4

    inputs = Input(shape=x_train.shape[1:])
    x = resnet_layer(inputs=inputs, initializer=initializer)
    # Instantiate the stack of residual units
    for stack in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stack > 0 and res_block == 0:  # first layer but not first stack
                strides = 2  # downsample
            y = resnet_layer(inputs=x, initializer=initializer,
                             num_filters=num_filters,
                             strides=strides)
            y = resnet_layer(inputs=y, initializer=initializer,
                             num_filters=num_filters,
                             activation=None)
            if stack > 0 and res_block == 0:  # first layer but not first stack
                # linear projection residual shortcut connection to match changed dims
                x = resnet_layer(inputs=x, initializer=initializer,
                                 num_filters=num_filters,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = layers.add([x, y])
            x = Activation('relu')(x)
        num_filters *= 2

    # Add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(output_nodes,
                    activation='softmax',
                    kernel_initializer=initializer)(y)

    # Instantiate and compile the model
    model = Model(inputs=inputs, outputs=outputs)
    model.summary()
    model.compile(loss='categorical_crossentropy', optimizer=optim, metrics=['accuracy'])

    # Save weights periodically (save_freq counts batches)
    checkpointer = ModelCheckpoint(filepath=str(model_number) + '_weights.hdf5',
                                   verbose=1, save_weights_only=True, save_freq=2000000)
    callbacks = list(callbacks) if callbacks is not None else []
    callbacks.append(checkpointer)

    if x_test is None:
        history = model.fit_generator(datagen.flow(x_train, y_train, batch_size=bs),
                                      callbacks=callbacks, epochs=epochs,
                                      steps_per_epoch=x_train.shape[0] // bs)
    else:
        history = model.fit_generator(datagen.flow(x_train, y_train, batch_size=bs),
                                      callbacks=callbacks, epochs=epochs,
                                      validation_data=(x_test, y_test),
                                      steps_per_epoch=x_train.shape[0] // bs)

    return history, model
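
For completeness, a minimal sketch of how resnet_1 might be called on CIFAR-100; the data pipeline and the initializer below are illustrative assumptions, not necessarily the exact setup I used:

from tensorflow.keras.datasets import cifar100
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.initializers import he_normal

# Load and normalize CIFAR-100, one-hot encode the labels
(x_train, y_train), (x_test, y_test) = cifar100.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 100)
y_test = to_categorical(y_test, 100)

# Simple augmentation pipeline (illustrative settings)
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)
datagen.fit(x_train)

history, model = resnet_1(model_number=1,
                          x_train=x_train, y_train=y_train,
                          x_test=x_test, y_test=y_test,
                          datagen=datagen,
                          initializer=he_normal(),
                          epochs=20, bs=512,
                          output_nodes=100,
                          callbacks=[])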
Moeinh77

3 Answers


For some reason VGG might be better suited for CIFAR-10 (maybe the kernel sizes, etc.). Generally speaking, however, this isn't the case: in my experience VGGs have trained much more slowly than even the largest ResNets (e.g. 150 layers).

Javier

I'm pretty new to deep learning, but I will try to give an answer. A short answer could be the number of features VGG has compared to the ResNet. That being said, only the relevant features matter for performing well. My guess is that the features relevant to your task are part of the VGG feature set, while some of them may be absent from the ResNet's.

Dr. H. Lecter

Now, looking back at this question more than a year later, I would say VGG learns faster (in terms of epochs) because it has many more parameters than ResNet20. The larger number of parameters allows it to fit, and overfit, the dataset in fewer epochs, and makes it a stronger feature extractor. Also, as I mentioned, VGG takes fewer epochs, but that does not mean the overall training time is lower.
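
As a rough sanity check of the parameter-count argument, something along these lines can be used (a sketch; vgg_model below is the stock keras.applications VGG16 convolutional base, and resnet_model stands for the model returned by resnet_1 above):

from tensorflow.keras.applications import VGG16

# Stock VGG16 convolutional base for 32x32 inputs, without the dense classifier head;
# the full VGG16 from the question is larger still once its head is included.
vgg_model = VGG16(include_top=False, weights=None, input_shape=(32, 32, 3))
print('VGG16 conv base parameters:', vgg_model.count_params())  # roughly 14.7M

# resnet_model = <the Model returned by resnet_1(...)>
# print('ResNet20 parameters:', resnet_model.count_params())     # roughly 0.27M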

Moeinh77