Questions tagged [cost-function]

75 questions
27 votes · 2 answers

Why do we have to divide by 2 in the ML squared error cost function?

I'm not sure why you need to multiply by $\frac{1}{2m}$ at the beginning. I understand dividing the whole sum by $m$, but why is $m$ multiplied by two? Is it because we have two $\theta$ in the example?
Marton Langa · 373
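
The usual resolution, sketched here: the extra 2 is purely for convenience when differentiating. With the cost written as

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

the 2 produced by differentiating the square cancels, leaving the clean gradient

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

and since scaling a cost by a positive constant does not move its minimiser, $J$ and $2J$ yield the same $\theta$.
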
24 votes · 1 answer

How do Gradient Descent and Backpropagation work together?

Please forgive me as I am new to this. I have attached a diagram trying to model my understanding of neural networks and backpropagation. From videos on Coursera and resources online I formed the following understanding of how neural network…
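
A toy NumPy sketch (illustrative shapes and names, not the poster's diagram) of how the two fit together: backpropagation is the chain-rule bookkeeping that produces the gradients, and gradient descent is the update rule that consumes them.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))               # 8 samples, 3 features
y = rng.integers(0, 2, size=(8, 1)).astype(float)

W1 = rng.normal(scale=0.1, size=(3, 4))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(4, 1))   # hidden -> output weights
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(200):
    # Forward pass: activations and the cross-entropy cost.
    h = sigmoid(X @ W1)
    p = sigmoid(h @ W2)
    cost = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Backward pass (backpropagation): chain rule from cost back to weights.
    dz2 = (p - y) / len(X)                # gradient at the output pre-activation
    dW2 = h.T @ dz2
    dz1 = (dz2 @ W2.T) * h * (1 - h)      # propagate through the hidden sigmoid
    dW1 = X.T @ dz1

    # Gradient descent: move each weight matrix against its gradient.
    W2 -= lr * dW2
    W1 -= lr * dW1
```
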
24 votes · 3 answers

Python implementation of cost function in logistic regression: why dot multiplication in one expression but element-wise multiplication in another

I have a very basic question which relates to Python, numpy and multiplication of matrices in the setting of logistic regression. First, let me apologise for not using math notation. I am confused about the use of matrix dot multiplication versus…
GhostRider · 353
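
A sketch of the vectorised cost that usually prompts this question, assuming `X` has shape `(m, n)`, `y` shape `(m,)`, and `theta` shape `(n,)`: the matrix product `X @ theta` genuinely sums over features, while `y * np.log(h)` must stay element-wise because each label multiplies only its own prediction. Since a sum follows anyway, `np.dot(y, np.log(h))` is an equivalent shortcut for that term.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)   # matrix-vector product: sums over the n features
    # Element-wise product: pairs each label y_i with its own log-probability.
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
```
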
21 votes · 1 answer

What do "compile", "fit", and "predict" do in Keras sequential models?

I am a little confused between these two parts of Keras sequential model functions. Could someone explain exactly what each one does? I mean, does compile do the forward pass and calculate the cost function, and then pass it through fit to do the backward…
user3486308 · 1,310
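
A minimal Sequential sketch (toy data, illustrative architecture) matching the standard answer: compile() only configures the model (optimizer, loss, metrics); fit() runs the training loop, i.e. forward pass, cost, backpropagation, and weight updates; predict() is a forward pass only.

```python
import numpy as np
from tensorflow import keras

X = np.random.rand(100, 4)
y = (X.sum(axis=1) > 2).astype("float32")

model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=16)   # training: forward + backward passes
probs = model.predict(X)                   # inference: forward pass only
```
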
12 votes · 1 answer

Cost function for Ordinal Regression using neural networks

What is the best cost function to train a neural network to perform ordinal regression, i.e. to predict a result whose value exists on an arbitrary scale where only the relative ordering between different values is significant (e.g., to predict…
xboard · 388
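
One common recipe (a sketch of the idea, not the only option): decompose an ordinal target with K levels into K−1 cumulative binary questions ("is y at least k?") and train with binary cross-entropy on each output, so that the loss respects the ordering and a near miss costs less than a distant one.

```python
import numpy as np

def ordinal_encode(y, num_classes):
    """E.g. y=2 with num_classes=4 -> [1, 1, 0]: at least 1, at least 2, not 3."""
    thresholds = np.arange(1, num_classes)
    return (thresholds[None, :] <= np.asarray(y)[:, None]).astype(float)

print(ordinal_encode([0, 2, 3], num_classes=4))
# [[0. 0. 0.]
#  [1. 1. 0.]
#  [1. 1. 1.]]
```
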
11 votes · 3 answers

What is the Time Complexity of Linear Regression?

I am working with linear regression and I would like to know its time complexity in big-O notation. The cost function of linear regression without an optimisation algorithm (such as gradient descent) needs to be computed over iterations of the…
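
For reference, with $m$ samples and $n$ features the closed-form solution

$$\hat{\theta} = (X^{\top} X)^{-1} X^{\top} y$$

costs $O(mn^2)$ to form $X^{\top} X$ plus $O(n^3)$ to solve the resulting system, while a single gradient-descent iteration costs only $O(mn)$; this is why iterative methods win when $n$ is large.
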
7 votes · 2 answers

What are the cases where it is fine to initialize all weights to zero

I've taken a few online courses in machine learning, and in general, the advice has been to choose random weights for a neural network to ensure that your neurons don't all learn the same thing, breaking symmetry. However, there were other cases…
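
A quick numeric check of the symmetry argument (toy numbers, plain NumPy): with every weight at zero, all hidden units compute the same value and receive identical gradients, so they remain copies of one another forever; a model with no hidden layer (linear or logistic regression) has no such symmetry to break, which is the usual "zeros are fine" case.

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([[1.0], [1.0], [0.0]])
W1 = np.zeros((2, 3))   # hidden layer, all-zero initialisation
W2 = np.zeros((3, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(5):
    h = sigmoid(X @ W1)
    p = sigmoid(h @ W2)
    dz2 = p - y
    dW2 = h.T @ dz2
    dz1 = (dz2 @ W2.T) * h * (1 - h)
    dW1 = X.T @ dz1
    W1 -= 0.5 * dW1
    W2 -= 0.5 * dW2

print(W1)   # all three columns identical: the hidden units never diverge
```
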
6 votes · 2 answers

Does an MLP always find a local minimum

In linear regression we use the following cost function, which is convex. We use a different cost function in logistic regression because the preceding one is not convex when the hypothesis $h$ is the logistic function (both functions are written out below). We…
Green Falcon · 14,308
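
The two cost functions the excerpt refers to appeared as images in the original post; presumably they are the standard ones:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

for linear regression, convex because $h_\theta$ is linear in $\theta$, and

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]$$

for logistic regression, which restores the convexity that squared error loses once $h_\theta$ is the logistic function.
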
5 votes · 2 answers

What is an intuitive explanation for the log loss cost function?

I would really appreciate it if someone could explain the log loss cost function and its use in measuring a classification model's performance. I have read a few articles, but most of them concentrate on the mathematics and not on an intuitive explanation…
Sai Kumar · 631
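
A small numeric illustration, which is most of the intuition: log loss punishes confident wrong answers far more than hesitant ones.

```python
import numpy as np

def log_loss_per_sample(y, p):
    # Binary log loss for a single prediction p of the true label y.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

for p in (0.9, 0.6, 0.1, 0.01):
    print(f"true y=1, predicted p={p:.2f} -> loss {log_loss_per_sample(1, p):.3f}")
# p=0.90 -> 0.105, p=0.60 -> 0.511, p=0.10 -> 2.303, p=0.01 -> 4.605
```
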
4 votes · 2 answers

Cost sensitive learning and class balancing

I am facing a classification problem with classes that are really imbalanced (roughly 1% positive cases). In addition, the "cost" of a False Negative (FN) is much higher than the cost of a False Positive (FP). Given this, I decided to…
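
A sketch of the usual scikit-learn lever (the weights below are illustrative and should be tuned): class_weight makes errors on the expensive class cost more in the training loss itself.

```python
from sklearn.linear_model import LogisticRegression

# FN ~10x worse than FP -> weight the positive class ~10x in the loss:
clf = LogisticRegression(class_weight={0: 1, 1: 10})

# Or let sklearn weight classes inversely proportional to their frequencies:
clf_balanced = LogisticRegression(class_weight="balanced")
```
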
4 votes · 2 answers

Regularization for intercept parameter

Why is the regularization parameter not applied to the intercept parameter? From what I have read about the cost functions for Linear and Logistic regression, the regularization parameter (λ) is applied to all terms except the intercept. For…
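
For concreteness, the regularised cost is conventionally written with the penalty sum starting at $j = 1$:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

The intercept $\theta_0$ multiplies no feature and only shifts predictions up or down, so shrinking it toward zero would bias the fit without reducing model complexity.
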
4 votes · 2 answers

Question about using gradient descent instead of calculus: I checked previous questions, but there are still points to clarify

First of all I checked http://stats.stackexchange.com/questions/23128/solving-for-regression-parameters-in-closed-form-vs-gradient-descent, http://stackoverflow.com/questions/26804656/why-do-we-use-gradient-descent-in-linear-regression,…
J.Smith · 468
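
A toy side-by-side (illustrative data) of the two routes the question contrasts: the normal-equation ("calculus") answer is exact but solving the $n \times n$ system costs $O(n^3)$, while gradient descent costs $O(mn)$ per step and converges to the same parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.normal(size=(50, 1))]   # bias column + one feature
y = X @ np.array([2.0, -3.0]) + rng.normal(scale=0.1, size=50)

# Closed form: solve the normal equations directly.
theta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent: iterate toward the same minimiser.
theta = np.zeros(2)
for _ in range(2000):
    theta -= 0.1 * (X.T @ (X @ theta - y)) / len(y)

print(theta_closed, theta)   # both close to [2, -3]
```
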
4 votes · 2 answers

Logistic regression cost function

In Aurélien Géron's book I found this line: "This cost function makes sense because $-\log(t)$ grows very large when $t$ approaches 0, so the cost will be large if the model estimates a probability close to 0 for a positive instance, and it will also be…"
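
The per-instance cost being quoted is

$$c(\theta) = \begin{cases} -\log(\hat{p}) & \text{if } y = 1 \\ -\log(1-\hat{p}) & \text{if } y = 0 \end{cases}$$

so $\hat{p} \to 0$ on a positive instance drives the cost to infinity, and symmetrically $\hat{p} \to 1$ does so on a negative one.
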
4 votes · 2 answers

XGBoost change loss function

I'm using XGBoost (through the sklearn API) and I'm trying to do binary classification. False Positives are much worse for me than False Negatives; how can I take this into account? The API confuses me a bit, and I found two arguments that might be…
cosec · 51
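
A hedged sketch of one common lever: scale_pos_weight rescales the loss gradient of positive examples. Since false positives are the expensive error here, a value below 1 downweights positives, making the model more reluctant to predict the positive class. The 0.2 below is illustrative, to be tuned by validation.

```python
from xgboost import XGBClassifier

clf = XGBClassifier(scale_pos_weight=0.2)   # < 1 discourages positive predictions

# Per-example costs can also be passed at fit time:
# clf.fit(X_train, y_train, sample_weight=weights)
```
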
4 votes · 1 answer

What cost optimisation problem is solved by F score?

I know the general expression of the F1-score: $$F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}$$ And its $F_{\beta}$ variants (see: https://en.wikipedia.org/wiki/F-score): $$F_{\beta} = (1+\beta^2) \cdot \frac{precision \cdot recall}{\beta^2 \cdot precision…
Lucas Morin · 2,775
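
For reference, the complete $F_{\beta}$ expression that the excerpt truncates is

$$F_{\beta} = (1+\beta^2) \cdot \frac{precision \cdot recall}{\beta^2 \cdot precision + recall}$$

where $\beta > 1$ weights recall more heavily (false negatives cost more), $\beta < 1$ weights precision more (false positives cost more), and $\beta = 1$ recovers $F_1$.
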