I am trying to figure out how to match activation=sigmoid and activation=softmax with the correct model.compile() loss parameters, specifically those associated with binary_crossentropy.
I have researched related topics and read the docs. I have also built a model that works with sigmoid but not with softmax, and I cannot get it working properly with the "from_logits" parameter.
Specifically, here it says:
Args:
from_logits: Whether output is expected to be a logits tensor. By default, we consider that output encodes a probability distribution.
This says to me that if you use a sigmoid activation you want "from_logits=True". And for softmax activation you want "from_logits=False" by default. Here I am assuming that sigmoid provides logits and softmax provides a probability distribution.
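As a quick numerical check of that assumption, here is a minimal pure-Python sketch. The helper names sigmoid and logit are illustrative, not Keras functions. It suggests that the sigmoid output already lies strictly between 0 and 1, like a probability, while the logit is the raw score before the activation, so the assumption above may be the wrong way round:

```python
import math

def sigmoid(x):
    """Map a raw score (a logit) to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Inverse of sigmoid: map a probability back to the raw score."""
    return math.log(p / (1.0 - p))

# Hypothetical raw scores, as a final Dense layer would produce
# before any activation is applied.
raw_scores = [-4.0, -1.0, 0.0, 2.5]
probabilities = [sigmoid(x) for x in raw_scores]
recovered = [logit(p) for p in probabilities]

for x, p, r in zip(raw_scores, probabilities, recovered):
    print(f"raw={x:+.1f}  sigmoid={p:.4f}  logit(sigmoid)={r:+.4f}")
```

If that reading is right, a layer with activation='sigmoid' would pair with the default from_logits=False, and from_logits=True would apply only when the last layer has no activation.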
Next is some code:
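from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

# n_timesteps and n_features are defined elsewhere from the shape of my data.
model = Sequential()
model.add(LSTM(units=128,
               input_shape=(n_timesteps, n_features),
               return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(units=64, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(units=32))
model.add(Dropout(0.3))
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.3))
# Single output unit for binary classification
model.add(Dense(1, activation='sigmoid'))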
Notice the last line is using the sigmoid activation. Then:
model.compile(optimizer=optimizer,
              loss='binary_crossentropy',
              metrics=['accuracy'])
This works fine, but it uses the default "from_logits=False", which expects a probability distribution.
If I do the following, it fails:
model.compile(optimizer=optimizer,
              loss='binary_crossentropy',
              metrics=['accuracy'],
              from_logits=True)  # For 'sigmoid' in above Dense
with this error message:
ValueError: Invalid argument "from_logits" passed to K.function with TensorFlow backend
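For comparison, here is a hedged sketch of where "from_logits" appears to belong. It assumes TensorFlow 2.x and its bundled tf.keras, which may differ from the Keras version that produced the error above. There, from_logits is an argument of the loss object tf.keras.losses.BinaryCrossentropy rather than of model.compile(). The labels and raw scores below are made-up values for illustration only:

```python
import tensorflow as tf

# Made-up labels and raw scores (logits), i.e. what Dense(1) would
# output with no activation.
y_true = tf.constant([[0.0], [1.0], [1.0], [0.0]])
logits = tf.constant([[-2.0], [3.0], [0.5], [-0.1]])

# Probabilities, i.e. what Dense(1, activation='sigmoid') would output.
probabilities = tf.sigmoid(logits)

# from_logits is set on the loss object itself.
bce_from_logits = tf.keras.losses.BinaryCrossentropy(from_logits=True)
bce_from_probabilities = tf.keras.losses.BinaryCrossentropy()  # from_logits=False

loss_a = float(bce_from_logits(y_true, logits))
loss_b = float(bce_from_probabilities(y_true, probabilities))
print(loss_a, loss_b)  # the two values should agree closely
```

Under the same assumption, such a loss object could be passed as loss=... to model.compile() in place of the string 'binary_crossentropy', with the final layer changed to Dense(1) without an activation when from_logits=True is used.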
If I try using the softmax activation as:
model.add(Dense(1, activation='softmax'))
It runs, but I get 50% accuracy. With sigmoid I get over 99% accuracy. (I am using a very contrived data set to debug my models and would expect very high accuracy. It is also a very small data set and will overfit, but that is OK for now.)
So I expected to be able to pass the "from_logits" parameter to the compile function, but it does not recognize that parameter.
I would also like to know why it works with the sigmoid activation but not with the softmax activation, and how to get it working with softmax.
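A possible explanation for the 50% figure, sketched in pure Python under the assumption that the labels are balanced between the two classes: softmax normalizes across the units of a layer, so with a single unit the output is always exactly 1.0, whatever the raw score. The softmax helper and the sample values are illustrative, not taken from my model:

```python
import math

def softmax(scores):
    """Normalize a list of raw scores so the outputs sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# With one unit there is nothing to normalize against: the output is always 1.0.
raw_scores = [-5.0, 0.0, 3.2, -0.7]
single_unit_outputs = [softmax([z])[0] for z in raw_scores]
print(single_unit_outputs)

# Thresholding at 0.5 therefore predicts class 1 for every sample.
labels = [0, 1, 0, 1]  # hypothetical balanced labels
predictions = [1 if out > 0.5 else 0 for out in single_unit_outputs]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)
```

If this is what is happening, softmax would need two output units, Dense(2, activation='softmax'), with a categorical loss, while a single unit would stay with sigmoid.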
Thank you,
Jon.