
So I was trying to come up with an inverse normal CDF (because reasons*), though I'm potentially in quite over my head. Anyway, after some not-so-working GPT-given samples, I went to GeoGebra and played with formulas until I got to "looks close enough to what I think it should look like". I got this:

$$ b(x) = \frac{(x - 0.5) \cdot \frac{1}{s}}{\sqrt{\frac{1}{4s} - (x - 0.5)^2 \cdot \frac{1}{s}}} $$
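(As an aside: pulling the common factor $\frac{1}{\sqrt{s}}$ out of the square root gives the same function in a slightly tidier, equivalent form,

$$ b(x) = \frac{x - 0.5}{\sqrt{s} \cdot \sqrt{\frac{1}{4} - (x - 0.5)^2}} $$

which makes it easier to see that $b$ is defined on $(0, 1)$ and diverges at both endpoints.)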

Now I wanted to convert sigma to s, and with the help of "Reasoning GPT" I got to this (which is not great, but something):

$$ s = \frac{2}{\pi \sigma^2} $$
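(One way to sanity-check this conversion, a reconstruction rather than the original derivation: match the slope of $b$ at the center, $b'(0.5) = 2/\sqrt{s}$, to the slope of the true normal quantile function at $p = 0.5$, which is $\sigma\sqrt{2\pi}$:

$$ \frac{2}{\sqrt{s}} = \sigma\sqrt{2\pi} \quad\Longrightarrow\quad s = \frac{4}{2\pi\sigma^2} = \frac{2}{\pi\sigma^2} $$

This only matches behavior at the center, which is consistent with the mapping being so-so further out.)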

Then I went to plot it (well, using it to map uniform random values to normal random values, which is what started all this) against the built-in normal distribution in NumPy, and there is a peculiarity... The approximation seems to "cluster significantly". Both plots had 1k data points and the requested number of buckets was 100, but my approximation function seems to hit only certain values from the expected range. Even though the distribution shape seems OK (it is also visible that the ${\sigma \to s}$ mapping is so-so, but whatever).

[Figure: comparison plot of the approximation vs. NumPy's built-in normal distribution]

Now, I like the shape, and I like that the equation for the approximation is relatively simple. But I cannot understand why it is clustering like that. I thought maybe it was floating-point math or some such. GPT says it shouldn't be, but if that's right, what the hell could it be? I believe the function should be smooth and monotonic, so how come it seems to be mapping multiple inputs to the same output?!
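(A quick numeric check, a minimal editorial sketch not in the original post: the function is strictly monotonic, so it is not mapping multiple inputs to one output, but it blows up near 0 and 1, which turns out to matter, see the answer below.)

    import numpy as np

    s = 2.0 / np.pi  # s for sigma = 1, via s = 2 / (pi * sigma^2)

    def b(x):
        # Same approximation as above, in the simplified form
        return (x - 0.5) / (np.sqrt(s) * np.sqrt(0.25 - (x - 0.5) ** 2))

    u = np.linspace(0.001, 0.999, 999)
    vals = b(u)
    print(np.all(np.diff(vals) > 0))  # True: strictly increasing, no repeated outputs
    print(vals[0], vals[-1])          # about -19.8 and +19.8, vs. -3.09 and +3.09 for a true normal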

Here is my geogebra thing that I used to come up with approximation: https://www.geogebra.org/graphing/atn3axmb

* I'm working on a game and doing some shader stuff where I want all kinds of random numbers, hence I wanted to write a function to map the output of a (hopefully) uniform PRNG I have to a normally distributed one. I'm not a mathematician (thus I'm not even 100% sure that the name "inverse normal CDF" applies to what I'm doing, but it seems so, and AI agreed too, so I hope I'm not lying in my title :) ), so I was kinda surprised that this inverse normal CDF was so complicated, with integrals and whatnot... Unable to find something simple, I tried to come up with something with some AI help.

Python used for plotting:

    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt
    import math

    def approx_inverted_normal(mean, std, size=10000):
        # Convert std to the steepness parameter s using s = 2 / (pi * std^2)
        s = 2. / (math.pi * std**2.)

        # Generate uniform random numbers in [0, 1)
        u = np.random.rand(size)

        numerator = (u - 0.5) * (1. / s)
        denominator = np.sqrt((1. / (4. * s)) - (u - 0.5)**2. * (1. / s))
        b = numerator / denominator

        return mean + std * b

    size = 1000
    mean = 0
    std = 1

    data_approx = approx_inverted_normal(mean, std, size)
    data_normal = np.random.normal(mean, std, size)

    plt.figure(figsize=(8, 6))
    sns.histplot(data_approx, bins=100, color='red', label='Approx Inverted Normal',
                 stat='density', kde=True, alpha=0.5)
    sns.histplot(data_normal, bins=100, color='green', label='Regular Normal',
                 stat='density', kde=True, alpha=0.5)

    plt.title("Comparison of Normal Distribution Approximation")
    plt.xlim(mean - 5 * std, mean + 5 * std)
    plt.xlabel("Value")
    plt.ylabel("Density")
    plt.legend()
    plt.show()

  • Looking at it some more, and matching uniform inputs to output values, it seems OK. So possibly it's just that it, combined with the uniform generator, somehow results in much smoother data? And thus bins differently? But why?! – morphles Mar 28 '25 at 12:34
  • When you say "Inverse Normal CDF" , do you mean the CDF of $Y=\frac1X$ where $X$ has a (standard?) normal distribution? If so, see Wikipedia on Inverse Gaussian distribution. Or do you mean the function $G$ where, if $F(x)$ is the CDF of (standard?) normally distributed $X$, you have $G(F(x))=x$ and $F(G(p))=p$? If so, Wikipedia on Quantile function and Normal distribution#Quantile function – Henry Mar 28 '25 at 18:09
  • I don't know the exact terminology and statistics notation. But as far as I understand, the CDF maps "random output" to "probability of that output and below", so $(-\infty,\infty) \to (0,1)$. I need the inverse of that: mapping $(0,1)$, so probability, to $(-\infty,\infty)$, so possible random output. Then I can use a uniform RNG outputting values in $(0,1)$ to get normally distributed values in $(-\infty,\infty)$ by just applying that function to the uniform RNG output. As for the linked articles, I saw them, and as mentioned the functions there seem very complex, with integrals, and I don't know how to program that. – morphles Mar 29 '25 at 06:24
  • OK - so you are talking about the quantile function. – Henry Mar 29 '25 at 07:01
  • Hm, it seems multiple terms can be used, and "inverse of the CDF" is also correct. Looking a bit more, I found https://en.wikipedia.org/wiki/Probit, which seems to be mostly what I wanted, but per the article there is no closed-form expression. The description says: "probit function is the quantile function" and also: "Mathematically, the probit is the inverse of the cumulative distribution function of the standard normal distribution". Regardless, my approximation was crap :) And it was simpler to just multi-sample the uniform and add things up, as I did not need too many samples. And this probit thing is way complicated. – morphles Mar 29 '25 at 07:27
  • It seems from the comments that you are looking for an approximation to the quantile function for the standard normal distribution. There are very good approximations in statistical packages, such as qnorm() in R. The source code for that illustrates the formulae used (essentially polynomials of degree $6$) and can be found at https://github.com/SurajGupta/r-source/blob/master/src/nmath/qnorm.c – Henry Mar 29 '25 at 22:22
  • If what you ultimately want is to generate standard normal pseudo-random numbers, then there are other ways, such as the Box–Muller transform. – Henry Mar 29 '25 at 22:23
  • https://math.stackexchange.com/questions/3279660/how-to-find-inverse-cdf-for-the-range-of-normal-values-probit-for-a-range/3279672#3279672 – Claude Leibovici Mar 30 '25 at 04:36
  • @Henry The Box–Muller thing seems very interesting, and appropriate; I will have to study it a bit more, but it will most likely be exactly what I need! Thanks! – morphles Mar 30 '25 at 09:43
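Putting the suggestions from the comments together, here is a minimal editorial sketch (not from the original thread) of three ways to turn uniform samples into normal ones: SciPy's quantile function norm.ppf (the probit discussed above), the Box–Muller transform, and the sum-of-12-uniforms trick the asker mentions:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    # 1. Inverse transform sampling via the quantile function (probit)
    z_ppf = norm.ppf(rng.random(10000))

    # 2. Box-Muller transform: two uniforms in (0, 1] -> two independent standard normals
    u1 = 1.0 - rng.random(5000)  # shift to (0, 1] so log() never sees 0
    u2 = rng.random(5000)
    r = np.sqrt(-2.0 * np.log(u1))
    z_bm = np.concatenate([r * np.cos(2 * np.pi * u2), r * np.sin(2 * np.pi * u2)])

    # 3. Sum of 12 uniforms minus 6: CLT-based approximation (tails truncated at +/-6)
    z_clt = rng.random((10000, 12)).sum(axis=1) - 6.0

    for name, z in [("ppf", z_ppf), ("box-muller", z_bm), ("12-uniforms", z_clt)]:
        print(f"{name}: mean={z.mean():.3f}, std={z.std():.3f}")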

1 Answer


OK, so after digging some more, it's quite simple. The approximation isn't super great, and the main problem is that its tails are way longer/more probable than a real normal's. That results in a much wider output range, so the bins get used up by data "out of frame". Guess I'll try tweaking my approximation some more...
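To quantify that (an editorial sketch assuming $\sigma = 1$, i.e. $s = 2/\pi$): comparing the approximation's quantiles to the true normal quantile function shows how quickly the tails run away, which is what pushes most histogram bins out of the visible range:

    import numpy as np
    from scipy.stats import norm

    s = 2.0 / np.pi  # sigma = 1

    def b(x):
        # The approximation, in simplified form
        return (x - 0.5) / (np.sqrt(s) * np.sqrt(0.25 - (x - 0.5) ** 2))

    for p in [0.9, 0.975, 0.999]:
        print(f"p={p}: approx={b(p):6.2f}  normal={norm.ppf(p):5.2f}")
    # p=0.9:   approx=  1.67  normal= 1.28   <- close-ish near the center
    # p=0.975: approx=  3.81  normal= 1.96   <- already far off
    # p=0.999: approx= 19.79  normal= 3.09   <- tails much heavier than a normal's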