12

I saw the following result: $$ \dfrac{\mathrm{d}}{\mathrm{d}x} \left( \log\left( \dfrac{1}{1+\mathrm{e}^{-x}} \right) \right) = \dfrac{1}{\mathrm{e}^x+1} $$ What are the intermediary steps for obtaining this result?

bsky
  • Where are you stuck? – StubbornAtom Jun 13 '17 at 11:44
  • So I first differentiate the logarithm, according to the rule $\log(x)' = 1/x$, and then I differentiate the rest. From the first step I get $1 + e^{-x}$, which I then cannot simplify. – bsky Jun 13 '17 at 11:55
  • Many software engineers are not very familiar with the math concepts, or have forgotten some of them, and now we see machine learning everywhere! I would like moderators not to just downvote questions like this. Genuinely, many software developers don't have a background in machine learning math and apply ML as a tool. So I would appreciate a more helpful approach! – Exploring Oct 21 '20 at 01:11

6 Answers

15

Hint:

First, notice that $$ \begin{align} \dfrac{1}{1+e^{-x}} = \dfrac{\mathrm{e}^{x} \cdot 1}{\mathrm{e}^{x} \cdot 1 + \mathrm{e}^{x} \cdot e^{-x}} = \dfrac{\mathrm{e}^{x}}{\mathrm{e}^{x} + 1} \;. \end{align} $$

Second, notice that $$ \begin{align} \ln\left( \dfrac{\mathrm{e}^{x}}{\mathrm{e}^{x} + 1} \right) = \ln\left( \mathrm{e}^{x}\right) - \ln\left( \mathrm{e}^{x} + 1 \right) = x - \ln\left( \mathrm{e}^{x} + 1 \right) \;. \end{align} $$

So, we have $$ \begin{align} \dfrac{\mathrm{d}}{\mathrm{d}x} \ln\left( \dfrac{1}{1+\mathrm{e}^{-x}} \right) &= \dfrac{\mathrm{d}}{\mathrm{d}x} \left( x - \ln\left( \mathrm{e}^{x} + 1 \right) \right) = \dfrac{\mathrm{d}x}{\mathrm{d}x} - \dfrac{\mathrm{d}\ln\left( \mathrm{e}^{x} + 1 \right)}{\mathrm{d}x} \;. \end{align} $$

Can you go on from here using the chain rule?

  • Can you please complete the rest of this explanation? – stackoverflowuser2010 Apr 13 '21 at 19:26
  • @stackoverflowuser2010 The rest is mostly algebraic, similar to what is near the end of SomethingSomething's answer: $$ \dfrac{\mathrm{d}x}{\mathrm{d}x} - \dfrac{\mathrm{d}\ln\left( \mathrm{e}^{x} + 1 \right)}{\mathrm{d}x} \ \ = \ \ 1 \ - \ \frac{1}{e^x \ + \ 1}·e^x \ \ = \ \ \frac{(e^x \ + \ 1) \ - \ e^x}{e^x \ + \ 1} \ \ = \ \ \frac{ 1 }{e^x \ + \ 1} \ \ . $$ –  Jul 07 '22 at 06:51
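
For readers who want to sanity-check the hint numerically, here is a minimal Python sketch. It assumes the natural logarithm, and the function names (`log_sigmoid`, `log_sigmoid_rewritten`, `central_difference`) are made up for this illustration. It compares the original expression with the rewritten form $x - \ln(e^x + 1)$, and compares a finite-difference slope with the claimed derivative $1/(e^x+1)$.

```python
import math


def log_sigmoid(x: float) -> float:
    """log(1 / (1 + e^{-x})), written exactly as in the question."""
    return math.log(1.0 / (1.0 + math.exp(-x)))


def log_sigmoid_rewritten(x: float) -> float:
    """The rewritten form from the hint: x - ln(e^x + 1)."""
    return x - math.log(math.exp(x) + 1.0)


def claimed_derivative(x: float) -> float:
    """The result to be verified: 1 / (e^x + 1)."""
    return 1.0 / (math.exp(x) + 1.0)


def central_difference(f, x: float, h: float = 1e-6) -> float:
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 1.0, 4.0):
        print(x, central_difference(log_sigmoid, x), claimed_derivative(x))
```

The two printed columns should agree to several decimal places; a finite difference is only an approximation, so this is a plausibility check, not a proof.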
7

You just have to use the Chain Rule.

$\alpha = 1+e^{-x}$

$\beta = \alpha^{-1}$

$\frac{d\,log(\beta)}{d\,x} = \frac{d\,log(\beta)}{d\,\beta}\,\frac{d\,\beta}{d\,x} = \frac{d\,log(\beta)}{d\,\beta}\,\frac{d\,\alpha^{-1}}{d\,\alpha}\,\frac{d\,\alpha}{d\,x} = \left(\frac{1}{\beta}\right)\,\left(-\frac{1}{\alpha^2}\right)\,\left(-e^{-x}\right) = \frac{e^{-x}}{1+e^{-x}} = \boxed{\frac{1}{e^x + 1}}$

You don't have to worry about signs here, because $\alpha$ and $\beta$ are always strictly positive.
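
The three chain-rule factors can also be multiplied out numerically. The sketch below is only an illustration under the assumption that $\log$ is the natural logarithm; the function names are hypothetical. It computes each factor separately and compares the product with the boxed result.

```python
import math


def chain_rule_derivative(x: float) -> float:
    """Multiply the three chain-rule factors for log(beta), beta = 1/alpha."""
    alpha = 1.0 + math.exp(-x)        # alpha = 1 + e^{-x}
    beta = 1.0 / alpha                # beta = alpha^{-1}
    dlog_dbeta = 1.0 / beta           # d log(beta) / d beta
    dbeta_dalpha = -1.0 / alpha ** 2  # d alpha^{-1} / d alpha
    dalpha_dx = -math.exp(-x)         # d alpha / d x
    return dlog_dbeta * dbeta_dalpha * dalpha_dx


def boxed_result(x: float) -> float:
    """The boxed closed form 1 / (e^x + 1)."""
    return 1.0 / (math.exp(x) + 1.0)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 3.0):
        print(x, chain_rule_derivative(x), boxed_result(x))
```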

Daniel Cunha
4

You can use the relation $ \sigma'(x)=\sigma(x)[1-\sigma(x)] $, where $\sigma(x) = 1/(1+e^{-x})$ is the sigmoid. In your case, it follows that $$ \frac{d}{dx} \log[\sigma(x)] = \frac{\sigma'(x)}{\sigma(x)} = 1-\sigma(x) = \sigma(-x) $$
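
Both facts used here, $\sigma' = \sigma(1-\sigma)$ and $1-\sigma(x) = \sigma(-x)$, are easy to spot-check. A minimal Python sketch follows; the helper names are assumptions for this example, and the derivative is approximated by a central difference.

```python
import math


def sigmoid(x: float) -> float:
    """sigma(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime_numeric(x: float, h: float = 1e-6) -> float:
    """Central-difference approximation of sigma'(x)."""
    return (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)


def sigmoid_prime_identity(x: float) -> float:
    """sigma'(x) according to the relation sigma * (1 - sigma)."""
    s = sigmoid(x)
    return s * (1.0 - s)


if __name__ == "__main__":
    for x in (-4.0, -1.0, 0.0, 2.5):
        print(x, sigmoid_prime_numeric(x), sigmoid_prime_identity(x),
              1.0 - sigmoid(x), sigmoid(-x))
```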

Steph
0

The base of $\log$ depends on the field: $\log(x)$ usually means the base-2 logarithm in computer science, the base-$e$ logarithm in mathematical analysis, and the base-10 logarithm in logarithm tables.

In the most general form, the derivative of $y = \log_b\left(\frac{1}{1+e^{-x}}\right)$ is

$$\frac{dy}{dx} = \frac{1}{\ln(b)\,(1+e^{x})}$$

Of course, if the function uses the natural logarithm, then $b = e$ and the derivative is

$$\frac{dy}{dx} = \frac{1}{\ln(e)\,(1+e^{x})}$$

Since $\ln(e) = 1$, this reduces to

$$\frac{dy}{dx} = \frac{1}{1+e^{x}}$$

The natural logarithm of the sigmoid function is the form usually seen in neural networks. The activation function is computed in the feedforward step, whereas its derivative is computed in backpropagation. The derivative of the natural log of the sigmoid is also simpler than for other bases, since the $\ln(b)$ factor disappears.
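
As a rough check of the general-base formula, the sketch below differentiates $\log_b$ of the sigmoid $1/(1+e^{-x})$ numerically for a few bases and compares with $1/(\ln(b)(1+e^{x}))$. The function names are made up for this example; `math.log(value, base)` is the standard-library two-argument logarithm.

```python
import math


def log_b_sigmoid(x: float, b: float) -> float:
    """log base b of the sigmoid 1 / (1 + e^{-x})."""
    return math.log(1.0 / (1.0 + math.exp(-x)), b)


def derivative_base_b(x: float, b: float) -> float:
    """Closed form 1 / (ln(b) * (1 + e^x))."""
    return 1.0 / (math.log(b) * (1.0 + math.exp(x)))


def numeric_derivative(x: float, b: float, h: float = 1e-6) -> float:
    """Central-difference slope of log_b_sigmoid at x."""
    return (log_b_sigmoid(x + h, b) - log_b_sigmoid(x - h, b)) / (2.0 * h)


if __name__ == "__main__":
    for b in (2.0, math.e, 10.0):
        print(b, numeric_derivative(1.0, b), derivative_base_b(1.0, b))
```

For $b = e$ the closed form collapses to $1/(1+e^{x})$, matching the result in the question.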

sefiks
0

Bottom line:

$$\frac{d}{dx}log(\frac{1}{1+e^{-x}}) = 1 - \frac{1}{1+e^{-x}} = \frac{1}{e^{x} + 1}$$

For the other part of BCE (Binary Cross Entropy):

$$\frac{d}{dx}log(1 - \frac{1}{1+e^{-x}}) = -\frac{1}{1+e^{-x}}$$

For the multivariate case:

Suppose that our features are $x_1, x_2, ..., x_n$ and the weights of the model are $w_0, w_1, w_2, ..., w_n$, where $w_0$ is the bias, and we want to differentiate

$$log(\frac{1}{1+e^{-(w_0 + w_1x_1 + ... + w_nx_n)}})$$

For convenience, we will define a constant feature $x_0 = 1$, then rewrite the same expression as

$$log(\frac{1}{1+e^{-(w_0x_0 + w_1x_1 + ... + w_nx_n)}})$$

Then,

$$\frac{\partial}{\partial w_i}log(\frac{1}{1+e^{-(w_0x_0 + w_1x_1 + ... + w_nx_n)}}) = x_i\frac{1}{e^{(w_0x_0 + w_1x_1 + ... + w_nx_n)} + 1}$$

When we differentiate the other part of the BCE loss:

$$log(1 - \frac{1}{1+e^{-(w_0x_0 + w_1x_1 + ... + w_nx_n)}})$$

Then,

$$i \geq 0 : \frac{\partial}{\partial w_i}log(1 - \frac{1}{1+e^{-(w_0x_0 + w_1x_1 + ... + w_nx_n)}}) = -x_i\frac{1}{1 + e^{-(w_0x_0 + w_1x_1 + ... + w_nx_n)}}$$
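
The two partial-derivative formulas above can be compared against finite differences. This is a small sketch under stated assumptions: natural logarithm, plain Python lists for the weights and features, $x_0 = 1$ prepended by the caller, and function names invented for the illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def linear(w, x) -> float:
    """w_0*x_0 + w_1*x_1 + ... + w_n*x_n, with x_0 = 1 supplied by the caller."""
    return sum(wi * xi for wi, xi in zip(w, x))


def log_sigmoid_term(w, x) -> float:
    """First BCE term: log(sigmoid(w . x))."""
    return math.log(sigmoid(linear(w, x)))


def log_one_minus_sigmoid_term(w, x) -> float:
    """Other BCE term: log(1 - sigmoid(w . x))."""
    return math.log(1.0 - sigmoid(linear(w, x)))


def grad_log_sigmoid(w, x):
    """Closed form: x_i / (e^{w . x} + 1) for every i."""
    z = linear(w, x)
    return [xi / (math.exp(z) + 1.0) for xi in x]


def grad_log_one_minus_sigmoid(w, x):
    """Closed form: -x_i * sigmoid(w . x) for every i."""
    z = linear(w, x)
    return [-xi * sigmoid(z) for xi in x]


def numeric_grad(f, w, x, h: float = 1e-6):
    """Central differences with respect to each weight in turn."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grad.append((f(up, x) - f(down, x)) / (2.0 * h))
    return grad


if __name__ == "__main__":
    w = [0.1, -0.3, 0.8]   # w_0 is the bias
    x = [1.0, 0.5, -2.0]   # x_0 = 1 is the constant feature
    print(grad_log_sigmoid(w, x), numeric_grad(log_sigmoid_term, w, x))
    print(grad_log_one_minus_sigmoid(w, x), numeric_grad(log_one_minus_sigmoid_term, w, x))
```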

In more detail, with all the steps:

It is way easier than it might look at first sight, so try to enjoy the ride...

The Sigmoid function is $$f = \frac{1}{1+e^{-x}}$$

So its derivative must be (according to the quotient rule): $$f' = \frac{e^{-x}}{(1+e^{-x})^2}$$

But this expression can be written as:

$$f' = \frac{e^{-x}}{(1+e^{-x})^2} = \frac{1 + e^{-x} - 1}{(1+e^{-x})^2} = \frac{1 + e^{-x}}{(1+e^{-x})^2} - \frac{1}{(1+e^{-x})^2} = \frac{1}{(1+e^{-x})} - \frac{1}{(1+e^{-x})^2}$$

But notice that

$$\frac{1}{(1+e^{-x})} = f , \frac{1}{(1+e^{-x})^2} = f^2$$

So we actually get

$$f' = f - f^2 = f(1-f)$$

Now apply $log$ to the Sigmoid. Let us define:

$$g = log(f)$$

So $$g' = \frac{f'}{f}$$

But we already know that $f' = f(1-f)$, so we get:

$$g' = \frac{f'}{f} = \frac{f(1-f)}{f} = 1-f$$

So for sigmoid function $f$, the derivative of $g = log(f)$ is simply $1-f$:

$$g' = [log(\frac{1}{1+e^{-x}})]' = 1 - \frac{1}{1+e^{-x}}$$

If you want to simplify it even more, you can do this:

$$g' = 1 - \frac{1}{1+e^{-x}} = \frac{1+e^{-x}-1}{1+e^{-x}} = \frac{e^{-x}}{1+e^{-x}} = \frac{1}{e^{x} + 1}$$

Similarly, we can compute the derivative of the other BCE term:

$$h = log(1 - \frac{1}{1+e^{-x}}) = log(1 - f)$$

So, $$h' = [log(1-f)]' = \frac{-f'}{1-f} = \frac{-f(1-f)}{1-f} = -f = -\frac{1}{1+e^{-x}}$$

I didn't include the steps for the partial derivatives, but they are very similar to the above steps.
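
For completeness, here is one way the partial-derivative steps might go. The shorthand $z$ is introduced only for this sketch and is not part of the notation above:

$$z = w_0x_0 + w_1x_1 + ... + w_nx_n, \qquad \frac{\partial z}{\partial w_i} = x_i$$

By the chain rule, with $g' = 1 - f$ and $h' = -f$ from the single-variable case evaluated at $z$:

$$\frac{\partial}{\partial w_i}log(f(z)) = (1 - f(z))\frac{\partial z}{\partial w_i} = x_i\left(1 - \frac{1}{1+e^{-z}}\right) = x_i\frac{1}{e^{z} + 1}$$

$$\frac{\partial}{\partial w_i}log(1 - f(z)) = -f(z)\frac{\partial z}{\partial w_i} = -x_i\frac{1}{1+e^{-z}}$$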

0

We may also rearrange the equation and differentiate implicitly with respect to $x$:

$$ y = \ln\left( \frac{1}{1 + e^{-x}} \right) \ \ \Rightarrow \ \ e^y = \frac{1}{1 + e^{-x}} \ \ \Rightarrow \ \ e^y \cdot (1 + e^{-x}) = 1 $$

[this creates no difficulties since the denominator in the ratio is never zero];

$$ \frac{d}{dx} \left[ e^y \cdot (1 + e^{-x}) \right] = \frac{d}{dx} [ 1 ] \ \ \Rightarrow \ \ e^y \cdot y' \cdot (1 + e^{-x}) + e^y \cdot (- e^{-x}) = 0 $$

$$ \Rightarrow \ \ e^y \cdot y' \cdot (1 + e^{-x}) = e^y \cdot e^{-x} \ \ \Rightarrow \ \ y' \cdot (1 + e^{-x}) = e^{-x} $$

[the factor $e^y$ may be "divided out", as it is also never zero]

$$ \Rightarrow \ \ y' = \frac{e^{-x}}{1 + e^{-x}} = \frac{e^{-x}}{1 + e^{-x}} \cdot \frac{e^x}{e^x} = \frac{1}{e^x + 1} \ . $$
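
A quick way to see that the implicit equation is satisfied is to plug $y$ and the final $y'$ back into the differentiated relation and confirm that the left-hand side is numerically zero. The sketch below does that; `implicit_residual` is a name made up for this example.

```python
import math


def implicit_residual(x: float) -> float:
    """Left-hand side of e^y * y' * (1 + e^{-x}) + e^y * (-e^{-x}) = 0."""
    y = math.log(1.0 / (1.0 + math.exp(-x)))  # y = ln(1 / (1 + e^{-x}))
    y_prime = 1.0 / (math.exp(x) + 1.0)       # the derived y'
    return (math.exp(y) * y_prime * (1.0 + math.exp(-x))
            + math.exp(y) * (-math.exp(-x)))


if __name__ == "__main__":
    for x in (-3.0, 0.0, 2.0):
        print(x, implicit_residual(x))
```

The printed residuals should be zero up to floating-point rounding.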