
I am trying to understand what is going on, so I built a simpler version of my project. I set X and Y to be identical and I'm trying to predict Y using X. This should be very simple, but my setup isn't working. Here is my code:

import numpy
import keras
import pandas


# I want to evaluate the model when X and Y are the same
# This should be very easy for the model to evaluate
X = numpy.random.randint(2, size=100000)
Y = X

# Setup the model
model = keras.models.Sequential()

model.add(keras.layers.Dense(1, input_dim=1, init='uniform', activation='relu'   ))
model.add(keras.layers.Dense(1,              init='uniform', activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

hist = model.fit(X, Y, nb_epoch=10, validation_split=.3, batch_size=10, verbose=1)

df             = pandas.DataFrame()
df['loss']     = hist.history['loss']
df['acc']      = hist.history['acc']
df['val_loss'] = hist.history['val_loss']
df['val_acc']  = hist.history['val_acc']
df.index       = df.index + 1
print(df)

And this is my output:

        loss       acc  val_loss   val_acc
1   0.693162  0.504357  0.693300  0.496233
2   0.693150  0.503100  0.693250  0.496233
3   0.693157  0.502357  0.693132  0.503767
4   0.693171  0.502214  0.693119  0.503767
5   0.693167  0.502043  0.693121  0.503767
6   0.693129  0.504014  0.693133  0.503767
7   0.693167  0.503243  0.693129  0.503767
8   0.693157  0.502357  0.693181  0.496233
9   0.693180  0.502614  0.693141  0.503767
10  0.693170  0.502300  0.693119  0.503767

I expected the accuracy to go to 100%, but that is not the case. What am I doing wrong?

This is the example that I was following.

user1367204

3 Answers


You should think about how the initial values affect the ReLU units. If, for example, you use init='one' for the activation='relu' layer, you'll get the desired result (in this simple setup).
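For illustration, the first layer with ones initialization would look like this (a minimal sketch, using the same Keras 1.x style API as the question):

model.add(keras.layers.Dense(1, input_dim=1, init='one',     activation='relu'   ))
model.add(keras.layers.Dense(1,              init='uniform', activation='sigmoid'))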

oW_

The problem is the ReLU unit. It is not a good choice in such a simple network. There is a good chance that the ReLU starts off "dead": if the initial weight for the neuron in the first layer is negative (roughly a 50/50 chance), then both the 0 and 1 inputs produce a 0 output and no gradient, so the network cannot learn to separate them.
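A small numpy sketch of that "dead" start (the weight value below is a hypothetical example): with a negative initial weight and zero bias, relu(w*x + b) is 0 for both x = 0 and x = 1, so the output and the gradient through the ReLU are both 0 and the weight never gets updated.

import numpy

relu = lambda z: numpy.maximum(z, 0.0)  # ReLU: max(z, 0)
w, b = -0.05, 0.0                       # hypothetical negative initial weight, zero bias
print(relu(w * 0 + b), relu(w * 1 + b)) # 0.0 0.0 -- both inputs collapse to the same output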

Changing to tanh instead will completely fix the problem, and the network will learn the relationship trivially. This will also work with "leaky" ReLU or any other unit without ReLU's hard cutoff.
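For example, the tanh variant would look like this (again a minimal sketch in the question's Keras 1.x style):

model.add(keras.layers.Dense(1, input_dim=1, init='uniform', activation='tanh'   ))
model.add(keras.layers.Dense(1,              init='uniform', activation='sigmoid'))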

A leaky ReLU version of your model would look like this:

model.add(keras.layers.Dense(1, input_dim=1, init='uniform' ))
model.add(keras.layers.advanced_activations.LeakyReLU(alpha=0.01))
model.add(keras.layers.Dense(1, init='uniform', activation='sigmoid'))

In larger/deeper networks with more complex input data, this disadvantage of ReLU units generally has lower impact and can be worked around more easily.

Neil Slater

The answer is that the code above works as I thought it should, most of the time. Each run of the program is slightly different because the weights are initialized randomly, and sometimes that randomness means the network never finds the link between X and Y. The way to work around this is to run the program several times. After running it 10 times, I got a successful result in 8 of them.
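A sketch of that re-run idea (a hypothetical helper, using the same Keras 1.x style API as the question): rebuild and refit the model several times and keep the best validation accuracy.

# Rebuild the model from scratch so each run gets fresh random weights
def build_model():
    m = keras.models.Sequential()
    m.add(keras.layers.Dense(1, input_dim=1, init='uniform', activation='relu'))
    m.add(keras.layers.Dense(1, init='uniform', activation='sigmoid'))
    m.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return m

best_val_acc = 0.0
for run in range(10):
    h = build_model().fit(X, Y, nb_epoch=10, validation_split=.3, batch_size=10, verbose=0)
    best_val_acc = max(best_val_acc, h.history['val_acc'][-1])

print(best_val_acc)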

user1367204