
I understand the softmax equation is

$\boldsymbol{P}(y=j \mid x)=\frac{e^{x_{j}}}{\sum_{k=1}^{K} e^{x_{k}}}$

My question is: why use $e^x$ instead of, say, $3^x$? I understand $e^x$ is its own derivative, but how is that advantageous in this situation?

I'm generally trying to understand why Euler's number appears everywhere, especially in statistics and probability, but specifically in this case.

Codedorf

1 Answer


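Choosing a different base would only rescale the graph of the function uniformly in the horizontal direction (a stretch for $1 < a < e$, a squash for $a > e$), since $$ a^x = e^{x\cdot \ln(a)}. $$ In the softmax, using base $a$ is therefore the same as multiplying every input $x_k$ by $\ln(a)$ before applying the usual base-$e$ formula.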
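As a quick numerical check of that identity, here is a minimal sketch in plain Python. The `softmax` helper, its `base` parameter, and the sample `logits` are made up for illustration and do not come from any particular library.

```python
import math

def softmax(xs, base=math.e):
    """Softmax with an arbitrary base; base=e gives the standard definition."""
    m = max(xs)  # shifting by the max leaves the result unchanged and avoids overflow
    powers = [base ** (x - m) for x in xs]
    total = sum(powers)
    return [p / total for p in powers]

logits = [1.0, 2.0, 0.5]  # arbitrary example inputs

# Softmax computed with base 3 ...
base3 = softmax(logits, base=3.0)
# ... matches the standard base-e softmax on inputs scaled by ln(3).
rescaled = softmax([x * math.log(3.0) for x in logits])

assert all(math.isclose(p, q) for p, q in zip(base3, rescaled))
```

So switching the base changes how sharply the output concentrates on the largest input, but it does not give a different family of functions.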

The exponential function with base $e$ is widely considered the simplest exponential function. It has nice properties that no other base has, mainly:

  • The function $e^x$ is its own derivative.
  • It has a particularly simple power series expansion: $$ e^x = 1 + x + \frac12 x^2 + \frac16 x^3 + \cdots + \frac1{n!}x^n + \cdots $$ All of the coefficients are rational numbers. If the base had been something intuitively "nicer" than $e$, such as an integer greater than 1, every coefficient after the constant term would be irrational.
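To make both points concrete for a general base $a > 0$, substitute $x \ln a$ into the series for $e^x$: $$ a^x = e^{x \ln a} = \sum_{n=0}^{\infty} \frac{(\ln a)^n}{n!}\, x^n, \qquad \frac{d}{dx}\, a^x = (\ln a)\, a^x. $$ For example, with $a = 3$ the coefficient of $x$ is $\ln 3 \approx 1.0986$, and the derivative picks up that same factor. Only $a = e$ gives $\ln a = 1$, which removes the extra factor in both formulas.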

For these reasons, most mathematicians will pick $e^x$ when they need an exponential function and have no particular reason to choose one base over another. (The exceptions are computer scientists and information theorists, who sometimes prefer $2^x$.)