2

Use of smooth bump functions in probability: does it make any sense?

After reading this answer I started to wonder whether it makes sense to define a cumulative distribution function as follows: $$F(x)=\begin{cases} \dfrac{1}{2}\left(1+\tanh\left(\dfrac{2x}{1-x^2}\right)\right),\ |x|<1\\ 0,\ x\leq -1\\ 1,\ x\geq 1\end{cases}$$ which is a smooth transition function of class $C^\infty$. What I don't know is whether it makes sense, conceptually speaking, that before some point ($x=-1$) the cumulative probability is exactly zero, and after some point ($x=1$) it is exactly one.

This is better seen in its probability density function $f(x)=F'(x)$, which is given by: $$f(x)=\begin{cases} 0,\ |x|\geq 1\\ \left(x^2+1\right)\left(\dfrac{\text{sech}\left(\frac{2x}{1-x^2}\right)}{1-x^2}\right)^2,\ |x|<1\end{cases}$$ Here you can see that $f(x)\in C_c^\infty$, making it an example of a smooth bump function. These have some interesting properties, like being examples of non-analytic smooth functions: at their edges they smoothly become flat, something classic power series cannot do because of the Identity theorem. Notice that $f(x)\geq 0,\ \forall x$ and $\int\limits_{-\infty}^{\infty} f(x)\ dx = \int\limits_{-1}^{1} f(x)\ dx =1$ as shown in the mentioned answer, so it is indeed a probability density function, defined on the whole real line (the piecewise definition is chosen so that it excludes the undefined points).
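As a quick numerical sanity check of the two claims above ($f=F'$ and $\int f = 1$), here is a minimal sketch in plain Python. The helper `midpoint_integral`, the cutoff at $|u|>300$ and the test point $x_0=0.3$ are my own choices for illustration, not part of the original derivation.

```python
import math

def F(x):
    """CDF from the question: smooth transition from 0 to 1 on (-1, 1)."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 0.5 * (1.0 + math.tanh(2.0 * x / (1.0 - x * x)))

def f(x):
    """Density f = F' from the question; identically zero for |x| >= 1."""
    if abs(x) >= 1.0:
        return 0.0
    u = 2.0 * x / (1.0 - x * x)
    if abs(u) > 300.0:
        # Assumption for numerical safety: here sech(u)^2 is far below
        # double-precision resolution, so return 0 and keep cosh from overflowing.
        return 0.0
    return (x * x + 1.0) / (math.cosh(u) * (1.0 - x * x)) ** 2

def midpoint_integral(g, a, b, n=20000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# Total probability mass: should be close to 1.
total_mass = midpoint_integral(f, -1.0, 1.0)

# Central difference of F at x0: should be close to f(x0).
x0, step = 0.3, 1e-5
slope = (F(x0 + step) - F(x0 - step)) / (2.0 * step)

print("integral of f over [-1, 1]:", total_mass)
print("finite-difference F'(0.3):", slope, " f(0.3):", f(x0))
```

The midpoint rule works well here because $f$ and all its derivatives vanish at $x=\pm 1$.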

You can see their plots in Desmos: comparison with logistic

Even with the mentioned weirdness of these functions, for someone with just graduate-level math knowledge they don't seem strange at all: as an example, $F(x)$ is just a modification of a Logistic function, since: $$\dfrac{1}{2}\left(1+\tanh\left(\dfrac{2x}{1-x^2}\right)\right)=\dfrac{1}{1+\exp\left(\frac{-4x}{1-x^2}\right)}$$

where the blue dashed line in the plot shows, for comparison, the logistic function: $$L(x)=\dfrac{1}{1+e^{-4x}}$$

and they look quite similar, with the exception that the derivative of $F(x)$ is compactly supported.
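To put a number on "quite similar", the following sketch checks the $\tanh$/logistic identity on a grid and measures the largest gap between $F$ and $L$ on $[-1,1]$. The grid sizes and the names `F_logistic_form`, `identity_error` and `max_gap` are assumptions made only for this illustration.

```python
import math

def F(x):
    """CDF from the question, written with tanh."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 0.5 * (1.0 + math.tanh(2.0 * x / (1.0 - x * x)))

def F_logistic_form(x):
    """Same function for |x| < 1, written as a logistic of 4x / (1 - x^2)."""
    return 1.0 / (1.0 + math.exp(-4.0 * x / (1.0 - x * x)))

def L(x):
    """Plain logistic function used for comparison in the plot."""
    return 1.0 / (1.0 + math.exp(-4.0 * x))

# Identity check on [-0.9, 0.9] (kept away from the edges, where exp gets huge).
grid = [-0.9 + 1.8 * k / 200 for k in range(201)]
identity_error = max(abs(F(x) - F_logistic_form(x)) for x in grid)

# Largest vertical gap between F and L on [-1, 1].
wide_grid = [-1.0 + 2.0 * k / 2000 for k in range(2001)]
max_gap = max(abs(F(x) - L(x)) for x in wide_grid)

print("identity error:", identity_error)
print("max |F - L| on [-1, 1]:", max_gap)
print("F(1) =", F(1.0), " L(1) =", L(1.0))
```

Both curves have slope $1$ at the origin ($L'(0)=4\cdot\frac14=1$ and $f(0)=1$), which is why they agree so well near the center; the difference is that $L(1)<1$ while $F(1)=1$ exactly.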

Even so, looking at the Wikipedia article on Logistic regression, the logistic function can be used for Binary classification or as an activation function for artificial neural networks, but $F(x)$ doesn't appear among the examples there, even though it was recently added as an example of a Sigmoid function on Wikipedia.

So given that it looks inoffensive, its lack of presence in Wikipedia makes me wonder if there are some fundamental issues that make it undesirable as a CDF, as a Logistic regression fit, or as a neural activation function (somehow all of these are related).


Added later

After an answer that is somewhat right, but does not directly address the question, I want to reframe it:

Imagine the following thought experiment:

I have a square section of 2x2 meters (units are irrelevant here) aligned with the Cartesian coordinate system, with the center of the square at the point $(x,\ y)=(0,\ 0)$.

In this square, I am able to measure only the $x$ coordinate of water drops that fall into the square from a circular aperture of radius $1/2$ meters located over the square at some height (let's say 1 meter), such that the center of the circular aperture is also located at the point $(x,\ y)=(0,\ 0)$.

The water drops fall over the circular aperture randomly following a uniform distribution, so I am expecting, here inaccurately, to have some concentration of points near the center of the $x$ axis, thinking of how the Box–Muller transform builds a Gaussian distribution as the $x$ axis of a 2D uniformly distributed variable in polar form, but also accurately given the concentration inequalities.

As an observer, I only have access to the data points of the water drops on the $x$-axis, so I have no clue where the drops are falling from (I know neither the height, nor the aperture shape, nor the falling distribution before the aperture).
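For reference, the setup can be simulated. The sketch below rests on two assumptions that the thought experiment does not state explicitly: "uniform" is read as uniform over the area of the aperture, and the drops fall straight down. Under those assumptions the $x$ coordinate has the semicircle-shaped density $\frac{2}{\pi R^2}\sqrt{R^2-x^2}$ on $[-R,\ R]$ with $R=\frac12$, whose variance is $R^2/4$. The seed and sample size are arbitrary.

```python
import math
import random

R = 0.5  # aperture radius in meters, as in the thought experiment

def sample_drop_x(rng):
    """x coordinate of one drop, uniform over the disk of radius R.

    Uses rejection sampling from the enclosing square (an assumption about
    what "uniform" means here: uniform in area, drops falling vertically).
    """
    while True:
        x = rng.uniform(-R, R)
        y = rng.uniform(-R, R)
        if x * x + y * y <= R * R:
            return x

def marginal_density(x):
    """Density of x under the assumptions above: chord length / disk area."""
    if abs(x) >= R:
        return 0.0
    return 2.0 * math.sqrt(R * R - x * x) / (math.pi * R * R)

rng = random.Random(0)  # fixed seed so the run is reproducible
xs = [sample_drop_x(rng) for _ in range(20000)]

# The mean is 0 by symmetry, so the second moment estimates the variance.
sample_var = sum(x * x for x in xs) / len(xs)

print("largest |x| observed:", max(abs(x) for x in xs))
print("sample variance:", sample_var, " theory R^2/4:", R * R / 4.0)
```

Under these assumptions the data-generating density is compactly supported but not smooth: it behaves like a square root at $x=\pm R$, so it is continuous there but not differentiable.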

Now imagine the data is such, that it could be modeled through the following fitting distribution functions: $f_i(x)= c_i \hat{f}_i(x)$ where $c_i$ are normalizing constants and the shape functions $\hat{f}_i(x)$ are given by:

  1. A Gaussian shape $$\hat{f}_1(x)=e^{-16x^2}$$
  2. A compactly supported polynomial $$\hat{f}_2(x)=\left(\frac{1-4x^2+|1-4x^2|}{2}\right)^4$$
  3. A compactly supported trigonometric function $$\hat{f}_3(x)=\cos^4(\pi x)\ \theta(1-4x^2)$$ with $\theta(x)$ the Heaviside step function.
  4. A smooth compactly supported function $$\hat{f}_4(x)= e^{-\frac{1}{\left(x+\frac12\right)\left(\frac12-x\right)}}\ e^{\frac{1}{\left(\frac{1}{2}\right)^2}}$$ for $|x|<\frac12$, and $0$ otherwise.
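The four shape functions and their normalizing constants $c_i = 1/\int \hat{f}_i$ can be sketched numerically as follows. The names `f_hat_1` to `f_hat_4`, the integration windows and the helper `midpoint_integral` are illustrative choices of mine; I also assume $\hat{f}_4$ is set to $0$ for $|x|\geq\frac12$, since the formula alone does not vanish there. For the first three the integrals have closed forms, $\frac{\sqrt{\pi}}{4}$, $\frac{128}{315}$ and $\frac38$, which gives a check on the numerics.

```python
import math

def f_hat_1(x):
    """Gaussian shape exp(-16 x^2)."""
    return math.exp(-16.0 * x * x)

def f_hat_2(x):
    """Compactly supported polynomial: max(1 - 4x^2, 0)^4."""
    t = 1.0 - 4.0 * x * x
    return ((t + abs(t)) / 2.0) ** 4

def f_hat_3(x):
    """cos^4(pi x) cut off by a Heaviside step at |x| = 1/2."""
    return math.cos(math.pi * x) ** 4 if 1.0 - 4.0 * x * x > 0.0 else 0.0

def f_hat_4(x):
    """Smooth bump; assumed to be 0 for |x| >= 1/2 (not spelled out in the formula)."""
    if abs(x) >= 0.5:
        return 0.0
    return math.exp(4.0 - 1.0 / ((x + 0.5) * (0.5 - x)))

def midpoint_integral(g, a, b, n=20000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# Integration windows: the Gaussian is negligible (about exp(-64)) beyond |x| = 2.
windows = {
    "f1": (f_hat_1, -2.0, 2.0),
    "f2": (f_hat_2, -0.5, 0.5),
    "f3": (f_hat_3, -0.5, 0.5),
    "f4": (f_hat_4, -0.5, 0.5),
}

# Normalizing constants c_i = 1 / integral of the shape function.
c = {name: 1.0 / midpoint_integral(g, a, b) for name, (g, a, b) in windows.items()}

for name in sorted(c):
    print(name, "normalizing constant:", c[name])
```

All four shapes equal $1$ at $x=0$, so the constants $c_i$ are also the peak heights of the normalized densities.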

all of them can be seen here:

shape fns

Also assume that, from the data, I cannot dismiss any of these curves on the basis of p-values or the significance of the confidence intervals (or other classic curve-fitting figures).

Postulate 1: From what I know of the data I see, I should dismiss the Gaussian envelope $\hat{f}_1(x)$, since I know beforehand that I will only have data points strictly within $[-1,\ 1]$, so I will pick only compactly supported alternatives (maybe this postulate is mistaken, but I am using it here to develop what I ask next).

In the same sense, are there any reasons why I should pick the smooth alternative as the representative model for the data?

Joako
  • 1,957

2 Answers

8

There's a common misconception that parametric probability distributions arise because of a need to model data that appears to satisfy certain shape criteria--e.g., data that have a "bell-shaped" frequency should be modeled by a normal distribution. This is incorrect.

Most if not all distributions used in statistical practice arise from the investigation of the mathematical properties of particular random phenomena under various assumptions. The normal distribution arises from the central limit theorem. The binomial distribution is the sum of iid Bernoulli variates. The geometric distribution is a discrete stopping time distribution; etc. Their functional forms look the way they do because some underlying behavior is being modeled, not because we need the curve to look a particular way.

So, as to why such a CDF is "undesirable," that's not really the relevant question. Rather, the question should be, "what random process or property exists that would give rise to such a parametric model?" This is what makes parametric distributions useful--their relationship to the process that generates data.

heropup
  • 143,828
  • Thanks for the answer. I am aware of what you said, but I don't $100\%$ agree: sometimes you have data and later try a goodness-of-fit test in order to figure out what could possibly be happening (I remember using this for weird distributions of intermodal interference in laser light communication). Particular to this question, that kind of smooth transition function also resembles the probability functions used in fuzzy logic. (...) – Joako Aug 06 '24 at 20:16
  • (..) I have been outside academia for a decade so I don't have much insight into what has happened in between. Do you know of any process that could be, or has been, described with something smooth with compact support? (I have seen those triangular functions in Excel for approximating Gaussian functions in finance, but they are just not smooth, they are also a sin - lol) – Joako Aug 06 '24 at 20:18
1

As far as statistical modelling is concerned:

  1. This is a valid statistical model. This is not a high bar to pass. Any family of probability distributions is going to be a valid statistical model.

  2. This model makes a very specific prediction for the data: that the probability tends aggressively fast to 0 at the limits of the support. To me, that's a disqualifying flaw for this model, and any smooth bump model. Depending on how the model is fitted, this might be a gigantic issue or a minor issue.
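To make "aggressively fast" concrete, one can compare how much probability mass each model puts within a distance $\epsilon$ of the edge of the support. The sketch below is only an illustration under my own choices: it uses the normalized shapes $\hat{f}_2$ (polynomial) and $\hat{f}_4$ (smooth bump) from the question, a uniform density on $[-\frac12,\ \frac12]$ as the "approximately constant at the edge" reference, and the hypothetical helper `edge_mass`.

```python
import math

def poly_shape(x):
    """Compactly supported polynomial from the question: max(1 - 4x^2, 0)^4."""
    t = 1.0 - 4.0 * x * x
    return ((t + abs(t)) / 2.0) ** 4

def bump_shape(x):
    """Smooth bump from the question, taken as 0 for |x| >= 1/2 (assumption)."""
    if abs(x) >= 0.5:
        return 0.0
    return math.exp(4.0 - 1.0 / ((x + 0.5) * (0.5 - x)))

def uniform_shape(x):
    """Reference density that is constant right up to the edge."""
    return 1.0 if abs(x) <= 0.5 else 0.0

def midpoint_integral(g, a, b, n=4000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def edge_mass(shape, eps):
    """Probability that the normalized density puts on [1/2 - eps, 1/2]."""
    total = midpoint_integral(shape, -0.5, 0.5)
    return midpoint_integral(shape, 0.5 - eps, 0.5) / total

for eps in (0.1, 0.05, 0.01):
    print("eps =", eps,
          " uniform:", edge_mass(uniform_shape, eps),
          " polynomial:", edge_mass(poly_shape, eps),
          " smooth bump:", edge_mass(bump_shape, eps))
```

For the uniform reference the edge mass is exactly linear in $\epsilon$, for the polynomial it shrinks like a power of $\epsilon$, and for the smooth bump it shrinks faster than any power, which is the behavior the fitted model would impose on the data near the boundary.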

  • Thanks for taking the time to answer. I got curious about what you mention in point 2: when you say that a "probability tends aggressively fast to 0", are you thinking of the compactly supported examples in general or only of the smooth bump functions? Since they are smooth, in my mind they are less aggressive than a simple compactly supported alternative, which at some derivative will become discontinuous (so somehow it jumps), but maybe my intuition is mistaken. – Joako Aug 24 '24 at 03:41
  • Maybe I'm misunderstanding what you mean with "smooth bump" functions. Aren't these such that all their derivatives at the border are 0? That means that the behavior of the function just before the bound is going to be: the function goes extremely fast to 0. That's what I meant. Have I made a mistake? – Guillaume Dehaene Aug 26 '24 at 07:35
  • nope... maybe it's mine... with "fast" I thought a non-smooth compactly supported function goes faster, since its higher derivatives become discontinuous – Joako Aug 26 '24 at 09:15
  • Hey @Joako , I saw your follow-up question and I noticed I never answered this comment. I was thinking of the limit behavior of the integral just before the boundary, $\int_a^{a+\epsilon} f(x)\,dx$. Smooth functions are, unless I'm missing something, very very inflexible in their behavior there: those values are going to be extremely small. In contrast, you get linear decrease if the density is approximately constant at the edge. Properly modelling that edge could be quite important, depending on the statistical analysis – Guillaume Dehaene Oct 03 '24 at 20:54
  • Actually you raised a very interesting question for me, which I now have on bounty here: from one point of view there should be no problem, since the function is becoming a constant there, so every derivative must become null and even an almost-horizontal line would be rising faster; but from the other side, at different points near the flat point the derivatives could be reaching finite-sized peaks that grow faster than exponentially with the derivative order, so it is also kind of exploding there (but maybe it's just a numerical issue). – Joako Oct 03 '24 at 21:55
  • 1
    Lorenzo's answer is as good as it gets and captures all there is to understand. If you feel that you are stuck in understanding it, perhaps consider taking some time off from the question (a week, or a month?) and come back to it with fresh eyes. Maybe it'll make more sense? – Guillaume Dehaene Oct 07 '24 at 14:22