Questions tagged [loss-function]
A function that quantifies the difference between observed data and the values predicted by a model. Minimizing a loss function is one way to estimate the model's parameters.
522 questions
115 votes · 5 answers
Why do cost functions use the square error?
I'm just getting started with some machine learning, and until now I have been dealing with linear regression over one variable.
I have learnt that there is a hypothesis, which is:
$h_\theta(x)=\theta_0+\theta_1x$
To find out good values for the…
Golo Roden · 1,323
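The hypothesis and cost this question refers to can be sketched in a few lines of numpy (an illustrative reconstruction, not the asker's code; variable names are mine):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared-error cost J(theta) for h(x) = theta0 + theta1 * x.
    The 1/(2m) factor is conventional: the 2 cancels when differentiating."""
    m = len(x)
    h = theta0 + theta1 * x          # hypothesis predictions
    return np.sum((h - y) ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(cost(0.0, 2.0, x, y))  # exact fit -> 0.0
```

Squaring makes the cost differentiable everywhere and penalizes large errors more than small ones, which is the usual motivation given in answers here.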
70 votes · 2 answers
Sparse_categorical_crossentropy vs categorical_crossentropy (keras, accuracy)
Which is better for accuracy, or are they the same?
Of course, if you use categorical_crossentropy you use one hot encoding, and if you use sparse_categorical_crossentropy you encode as normal integers.
Additionally, when is one better than the…
Master M · 803
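The equivalence the question is circling can be checked directly in numpy: the two losses compute the same number, they only differ in how the labels are encoded (a sketch, not Keras internals):

```python
import numpy as np

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])   # predicted class probabilities
labels_int = np.array([0, 1])          # integer labels (sparse form)
labels_oh = np.eye(3)[labels_int]      # same labels, one-hot encoded

# categorical_crossentropy expects one-hot targets
cat = -np.sum(labels_oh * np.log(probs), axis=1)

# sparse_categorical_crossentropy expects integer targets
sparse = -np.log(probs[np.arange(len(labels_int)), labels_int])

print(np.allclose(cat, sparse))  # True: same loss, different label encoding
```

So neither is "better for accuracy"; sparse labels simply save memory when there are many classes.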
46 votes · 3 answers
What does from_logits=True do in the SparseCategoricalCrossentropy loss function?
The documentation says that y_pred needs to be in the range [-inf, inf] when from_logits=True. I don't understand what this means, since probabilities need to be in the range 0 to 1! Can someone please explain…
Nagendra Prasad · 573
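The point of `from_logits=True` is that the loss applies softmax itself, so you may pass raw unbounded scores. A numpy sketch of the idea (illustrative; not the Keras implementation):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, -1.0, 0.5]])   # any real numbers, (-inf, inf)
label = np.array([0])

# from_logits=True: the loss converts logits to probabilities internally
loss_from_logits = -np.log(softmax(logits)[0, label[0]])

# from_logits=False: you supply probabilities (e.g. from a softmax layer)
probs = softmax(logits)
loss_from_probs = -np.log(probs[0, label[0]])

print(np.isclose(loss_from_logits, loss_from_probs))  # True
```

The probabilities still end up between 0 and 1; the flag only moves the softmax step inside the loss, which is usually more numerically stable.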
46 votes · 5 answers
Intuitive explanation of Noise Contrastive Estimation (NCE) loss?
I read about NCE (a form of candidate sampling) from these two sources:
Tensorflow writeup
Original Paper
Can someone help me with the following:
A simple explanation of how NCE works (I found the above difficult to parse and get an understanding…
tejaskhot · 4,125
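The core intuition answers tend to give: NCE turns density estimation into binary classification between true samples and noise samples. A toy numpy sketch of that reduction (the scores and their values are hypothetical, purely for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_loss(score_data, scores_noise):
    """Binary-classification view of NCE: 'did this sample come from the
    data distribution or the noise distribution?' Scores are the model's
    log-odds in favour of the data distribution."""
    pos = -np.log(sigmoid(score_data))                   # true sample, label 1
    neg = -np.sum(np.log(1.0 - sigmoid(scores_noise)))   # noise samples, label 0
    return pos + neg

# well-separated scores give a small loss
print(nce_loss(5.0, np.array([-4.0, -3.0])))
```

The payoff is that only the sampled noise candidates are scored per step, instead of a full softmax over the vocabulary.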
32 votes · 7 answers
L2 loss vs. mean squared loss
I see literature that suggests L2 loss and mean squared error loss are two different kinds of loss functions.
However, it seems to me these two loss functions essentially compute the same thing (with a 1/n factor difference).
So I am wondering if I…
Edamame · 2,785
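The suspicion in the question is easy to confirm numerically: the two losses differ only by the 1/n factor (a minimal numpy check, arbitrary example values):

```python
import numpy as np

pred = np.array([1.0, 2.0, 3.0, 4.0])
true = np.array([1.5, 2.0, 2.0, 5.0])

l2 = np.sum((pred - true) ** 2)    # L2 loss: sum of squared errors
mse = np.mean((pred - true) ** 2)  # MSE: the same sum, divided by n

print(np.isclose(l2, mse * len(pred)))  # True: identical up to the 1/n factor
```

The constant factor does not change the location of the minimum, only the scale of the gradient.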
30 votes · 2 answers
What is the advantage of using log softmax instead of softmax?
Are there any advantages to using log softmax over softmax? What are the reasons to choose one over the other?
rawwar · 881
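The usual answer is numerical stability: computing log-softmax directly avoids forming a softmax that can underflow or overflow. A numpy sketch of the contrast (illustrative, not a framework implementation):

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax: shift by the max before
    exponentiating, so no exp() ever overflows."""
    z = z - z.max()
    return z - np.log(np.sum(np.exp(z)))

z = np.array([1000.0, 0.0, -1000.0])  # extreme logits

# The naive route overflows: exp(1000) = inf, and log(inf/inf) = nan.
with np.errstate(over='ignore', invalid='ignore', divide='ignore'):
    naive = np.log(np.exp(z) / np.sum(np.exp(z)))

print(naive)           # contains nan/-inf
print(log_softmax(z))  # finite: [0., -1000., -2000.]
```

Log-softmax also pairs naturally with the negative log-likelihood loss, collapsing log(exp(...)) terms analytically.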
20 votes · 2 answers
Parameterization regression of rotation angle
Let's say I have a top-down picture of an arrow, and I want to predict the angle this arrow makes. This would be between $0$ and $360$ degrees, or between $0$ and $2\pi$. The problem is that this target is circular, $0$ and $360$ degrees are exactly…
Jan van der Vegt · 9,448
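A common answer to this circular-target problem is to regress the point $(\sin\theta, \cos\theta)$ on the unit circle and decode with atan2, so $0$ and $2\pi$ map to the same target. A sketch of that parameterization (function names are mine):

```python
import numpy as np

def encode_angle(theta):
    """Map a circular target onto the unit circle so that 0 and 2*pi
    become the same point and the regression target is continuous."""
    return np.array([np.sin(theta), np.cos(theta)])

def decode_angle(v):
    return np.arctan2(v[0], v[1]) % (2 * np.pi)

theta = 0.01  # just past the 0 / 2*pi wrap-around
print(decode_angle(encode_angle(theta)))                          # ~0.01
print(np.allclose(encode_angle(0.0), encode_angle(2 * np.pi)))    # True
```

With this encoding an ordinary squared-error loss on the two components no longer sees a discontinuity at the wrap-around.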
17 votes · 3 answers
Keras Sequential model returns loss 'nan'
I'm implementing a neural network with Keras, but the Sequential model returns nan as loss value.
I use a sigmoid activation function in the output layer to squeeze the output between 0 and 1, but maybe it doesn't work properly.
This is the code:
def…
pairon · 433
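A frequent cause of `loss: nan` in this setup is log(0) when a sigmoid output saturates at exactly 0 or 1. The standard remedy is to clip predictions before the log, sketched here in numpy (illustrative; Keras's built-in losses do a similar clip internally):

```python
import numpy as np

def safe_binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Clip predictions away from 0 and 1 before taking logs.
    log(0) = -inf is a common source of 'loss: nan' during training."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0])
y_pred = np.array([1.0, 0.0])   # fully saturated sigmoid outputs
print(safe_binary_crossentropy(y_true, y_pred))  # finite instead of nan
```

Other usual suspects worth checking are nan values in the input data and a learning rate large enough to blow up the weights.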
17 votes · 2 answers
Custom loss function with additional parameter in Keras
I'm looking for a way to create a loss function that looks like this:
The function should then maximize the reward. Is this possible to achieve in Keras?
Any suggestions on how this can be achieved are highly appreciated.
def…
Nickpick · 661
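The pattern answers usually recommend is a closure: an outer factory captures the extra parameter and returns an inner function with the two-argument `(y_true, y_pred)` signature that Keras expects. A numpy sketch of the pattern (`reward_weight` and the weighted-MSE body are hypothetical placeholders, not the asker's loss):

```python
import numpy as np

def make_weighted_loss(reward_weight):
    """Factory: the outer function captures the extra parameter, the
    inner function keeps the (y_true, y_pred) signature a framework
    like Keras expects."""
    def loss(y_true, y_pred):
        return reward_weight * np.mean((y_true - y_pred) ** 2)
    return loss

loss_fn = make_weighted_loss(reward_weight=2.0)
print(loss_fn(np.array([1.0, 2.0]), np.array([1.0, 4.0])))  # 2.0 * mean([0, 4]) = 4.0
```

In Keras you would pass the result of the factory, e.g. `model.compile(loss=make_weighted_loss(2.0), ...)`; to *maximize* a reward you minimize its negative.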
15 votes · 2 answers
Interpreting the Root Mean Squared Error (RMSE)!
I read all about the pros and cons of RMSE vs. other absolute errors, namely mean absolute error (MAE). See the following references:
MAE and RMSE — Which Metric is Better?
What's the bottom line? How to compare models
Or this nice blogpost, or this…
TwinPenguins · 4,429
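The key behavioral difference the comparison articles describe is outlier sensitivity, which a two-line numpy example makes concrete (arbitrary illustrative errors):

```python
import numpy as np

errors = np.array([1.0, 1.0, 1.0, 10.0])  # one outlier among small errors

mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))

print(mae)   # 3.25
print(rmse)  # ~5.07: squaring lets the single outlier dominate
```

RMSE is always at least as large as MAE on the same residuals, and the gap widens as the error distribution gets heavier-tailed.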
14 votes · 3 answers
Why is there a $2$ at the denominator of the mean squared error function?
In the famous Deep Learning Book, in chapter 1, equation 6, the Quadratic Cost (or Mean Squared Error) in a neural network is defined as
$ C(w, b) = \frac{1}{2n}\sum_{x}||y(x)-a||^2 $
where $w$ is the set of all weights and $b$ the set of all…
Silas Berger · 161
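The standard answer: the 2 in the denominator exists so the gradient comes out clean. With $C = \frac{1}{2n}\sum_x \|y(x)-a\|^2$, differentiating the square produces a factor 2 that cancels it, leaving $\partial C/\partial a_i = (a_i - y_i)/n$. A numerical-gradient check of that cancellation (toy values, names mine):

```python
import numpy as np

# C(a) = 1/(2n) * sum (y - a)^2  ->  dC/da_i = (a_i - y_i) / n
y = np.array([1.0, 2.0, 3.0])
a = np.array([1.5, 1.5, 2.0])
n = len(y)

analytic = (a - y) / n  # the stray 2 has cancelled

# central-difference numerical gradient, one coordinate at a time
eps = 1e-6
numeric = np.empty(n)
for i in range(n):
    d = np.zeros(n)
    d[i] = eps
    c_plus = np.sum((y - (a + d)) ** 2) / (2 * n)
    c_minus = np.sum((y - (a - d)) ** 2) / (2 * n)
    numeric[i] = (c_plus - c_minus) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Since a constant factor never moves the minimum, the 1/2 is pure notational convenience.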
13 votes · 3 answers
Tensorflow Adjusting Cost Function for Imbalanced Data
I have a classification problem with highly imbalanced data. I have read that over- and undersampling, as well as changing the cost for underrepresented categorical outputs, will lead to a better fit. Before this was done, tensorflow would categorize…
Cole · 181
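Changing the cost for the underrepresented class typically means weighting its term in the loss. A numpy sketch of class-weighted binary cross-entropy (illustrative; in TensorFlow the analogous knobs are `class_weight` in `fit` or `tf.nn.weighted_cross_entropy_with_logits`):

```python
import numpy as np

def weighted_bce(y_true, y_pred, pos_weight):
    """Up-weight the rare positive class so its errors cost more.
    pos_weight ~ n_negative / n_positive is a common starting point."""
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    return -np.mean(pos_weight * y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 0.0, 0.0])  # 1 positive vs. 3 negatives
y_pred = np.array([0.3, 0.1, 0.1, 0.1])
print(weighted_bce(y_true, y_pred, pos_weight=3.0))  # larger than unweighted
```

The weight raises the gradient contribution of positive examples, discouraging the model from predicting the majority class everywhere.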
10 votes · 1 answer
XGBoost custom objective for regression in R
I implemented a custom objective and metric for an xgboost regression. In order to see if I'm doing this correctly, I started with a quadratic loss. The implementation seems to work well, but I cannot reproduce the results from a standard…
Peter · 7,896
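For reference, an XGBoost custom objective must return the per-point gradient and Hessian of the loss with respect to the prediction; for the quadratic loss $\tfrac12(\hat y - y)^2$ these are $\hat y - y$ and $1$. Sketched here with plain arrays (in the real API the second argument is the DMatrix/xgb.DMatrix from which you pull the labels):

```python
import numpy as np

def squared_error_objective(preds, labels):
    """Gradient and Hessian of 0.5 * (pred - label)^2, the pair an
    XGBoost custom objective returns for each data point."""
    grad = preds - labels        # first derivative w.r.t. the prediction
    hess = np.ones_like(preds)   # second derivative is the constant 1
    return grad, hess

g, h = squared_error_objective(np.array([0.5, 2.0]), np.array([1.0, 1.0]))
print(g)  # [-0.5  1. ]
print(h)  # [1. 1.]
```

A common reason results still differ from the built-in `reg:squarederror` is the base score / initial prediction, which the built-in objective sets for you.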
10 votes · 4 answers
Loss Function for Probability Regression
I am trying to predict a probability with a neural network, but I am having trouble figuring out which loss function is best. Cross entropy was my first thought, but other resources always talk about it in the context of a binary classification problem…
ahbutfore · 201
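Cross entropy is in fact still a valid choice here: with soft targets $p \in [0,1]$ it is a proper scoring rule, minimized exactly when the predicted probability equals the true one. A quick numpy check (names mine):

```python
import numpy as np

def soft_bce(p_true, p_pred, eps=1e-7):
    """Binary cross-entropy evaluated at soft targets: valid for any
    p_true in [0, 1], not just hard 0/1 labels."""
    p_pred = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(p_true * np.log(p_pred) + (1 - p_true) * np.log(1 - p_pred))

p_true = np.array([0.3])
losses = {p: soft_bce(p_true, np.array([p])) for p in (0.1, 0.3, 0.9)}
print(min(losses, key=losses.get))  # 0.3: the true probability minimizes the loss
```

MSE (the Brier score) is the other common proper scoring rule for this task; both push the prediction toward the true probability.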
10 votes · 1 answer
What is the difference between SGD classifier and Logistic regression?
To my understanding, the SGD classifier and Logistic regression seem similar.
An SGD classifier with loss = 'log' implements Logistic regression and loss = 'hinge' implements a linear SVM. I also understand that logistic regression uses gradient…
Akash Dubey · 696
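The short answer is that `SGDClassifier` names an *optimizer* (per-sample stochastic gradient steps on a chosen loss; `'log_loss'` in current scikit-learn, `'log'` in older versions) while `LogisticRegression` names a *model* fitted with batch solvers; on the log loss they target the same optimum. A bare-numpy sketch of the single SGD step involved (toy data, names mine):

```python
import numpy as np

def sgd_logreg_step(w, x, y, lr=0.1):
    """One stochastic gradient step on the log loss for one sample:
    the update SGDClassifier(loss='log_loss') performs per example."""
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))  # sigmoid prediction
    grad = (p - y) * x                        # gradient of the log loss
    return w - lr * grad

w = np.zeros(2)
for _ in range(200):                          # toy data: class 1 iff x[0] > 0
    w = sgd_logreg_step(w, np.array([1.0, 1.0]), 1.0)
    w = sgd_logreg_step(w, np.array([-1.0, 1.0]), 0.0)
print(w[0] > 0)  # True: the learned weight separates the two classes
```

With `loss='hinge'` the same optimizer minimizes the SVM objective instead, which is why one class covers both models.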