7

Suppose I have a set of data points like this:

1;1
2;4
3;9
4;16
5;25
6;36
...

The first column is the input of the function and the second one is the result. Looking at it, I can tell that the function here is y = f(x^2). The problem is that the data points are not always this obvious, and I want to write a program that can tell me the definition of the function.

How can I do so?

Some context: I'm trying to write a program that can be used to analyze the asymptotic time complexity of algorithms. So in fact I'm only interested in a predefined set of functions:

y = f(n)

y = f(log(n))

y = f(n * log(n))

y = f(n ^ m)

y = f(2 ^ n)

4 Answers

5

In general, if you have a set of points, there are infinitely many different functions passing through those points. For example, if you had the data $(1;2),(2;4),(3;8),(4;16)$, it'd be "obvious" that it's the function $f(n)=2^n$... but it could just as well be $f(n)=\frac{1}{3}(n^3-3n^2+8n)$.
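As a quick sanity check of the example above, here is a small Python sketch. The function names `power_of_two` and `cubic` are just illustrative labels for the two formulas; exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction

def power_of_two(n):
    # f(n) = 2^n
    return Fraction(2) ** n

def cubic(n):
    # f(n) = (n^3 - 3n^2 + 8n) / 3
    return Fraction(n**3 - 3 * n**2 + 8 * n, 3)

# Both formulas agree on the four given data points ...
agree_on_data = all(power_of_two(n) == cubic(n) for n in (1, 2, 3, 4))

# ... but they differ at the next input.
next_power = power_of_two(5)
next_cubic = cubic(5)
print(agree_on_data, next_power, next_cubic)
```

So the four points alone cannot distinguish between the two functions; only additional data (or a restriction on the allowed functions) can.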

If you restrict yourself just to some specific "candidate" functions in advance, you can simply compare the data you have with the values predicted by the particular function -- and if they match, declare the function to be the "right" one.
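A minimal sketch of that comparison in Python, under a few assumptions of mine: the candidate list is taken from the question with $m = 2$ chosen for $n^m$, the helper name `matching_candidates` is made up, and constant factors are ignored (the data must match the candidate exactly, up to a small floating-point tolerance).

```python
import math

# Candidate functions from the question; m = 2 is an arbitrary choice for n^m.
CANDIDATES = {
    "n": lambda n: n,
    "log(n)": lambda n: math.log(n),
    "n*log(n)": lambda n: n * math.log(n),
    "n^2": lambda n: n ** 2,
    "2^n": lambda n: 2 ** n,
}

def matching_candidates(points, rel_tol=1e-9):
    """Return the names of all candidates whose values match every data point."""
    return [
        name
        for name, g in CANDIDATES.items()
        if all(math.isclose(g(n), y, rel_tol=rel_tol) for n, y in points)
    ]

points = [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36)]
matches = matching_candidates(points)
print(matches)
```

For the data in the question, only the `"n^2"` candidate survives. With measured data this exact test is too strict, which is where the fitting approach comes in.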

That being said, based on your description of the context, your problem might be a better fit for the least-squares fitting method. This is especially true if your data comes from experiments (such as measuring the actual time taken by the program), since it is likely to be affected by "noise" (measurement imprecision, the effect of smaller factors which become negligible for large values of the parameters, ...). The least-squares method allows you to find the "best" fit of a particular function (which contains some unknown parameters) to the data you have, and also to measure the "quality" of the fit (that is, how much the function values with the optimal choice of those unknown parameters differ from your data).
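To make this concrete for the complexity-analysis use case, here is a rough sketch, not a definitive implementation. It assumes each candidate has the form $y \approx a \cdot g(n)$ with a single unknown scale $a$, for which the least-squares solution is $a = \sum g(n_i) y_i / \sum g(n_i)^2$. The helper names, the choice $m = 2$, and the data are all made up for illustration: the "measurements" are synthetic values $3 n \log n$ with an artificial $\pm 1\%$ perturbation.

```python
import math

# Candidate shapes g(n); the fitted model is y ≈ a * g(n).
CANDIDATES = {
    "n": lambda n: n,
    "log(n)": lambda n: math.log(n),
    "n*log(n)": lambda n: n * math.log(n),
    "n^2": lambda n: n ** 2,
    "2^n": lambda n: 2.0 ** n,
}

def fit_scale(g, ns, ys):
    """Least-squares scale a for y ≈ a*g(n), plus the sum of squared residuals."""
    gs = [g(n) for n in ns]
    a = sum(gv * y for gv, y in zip(gs, ys)) / sum(gv * gv for gv in gs)
    sse = sum((a * gv - y) ** 2 for gv, y in zip(gs, ys))
    return a, sse

def best_candidate(ns, ys):
    """Fit every candidate and return (name, a, sse) of the smallest residual."""
    fits = {name: fit_scale(g, ns, ys) for name, g in CANDIDATES.items()}
    name = min(fits, key=lambda k: fits[k][1])
    return name, fits[name][0], fits[name][1]

# Synthetic data (not real measurements): 3*n*log(n), perturbed by +/- 1 %.
ns = [2, 4, 8, 16, 32, 64]
ys = [3 * n * math.log(n) * (1.01 if i % 2 == 0 else 0.99) for i, n in enumerate(ns)]

name, a, sse = best_candidate(ns, ys)
print(name, a, sse)
```

Comparing raw residuals like this favours whichever candidate fits the largest inputs best, which is arguably what you want for asymptotic behaviour, but other error measures (for example relative error) are equally defensible.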

For example, you can use it to fit the function $f(n)=an^3+bn^2+cn+d$, where $a$, $b$, $c$ and $d$ are unknown parameters. The method will give you the best possible choice of those parameters so that the function is as close to your data as it can get (in your example case, it'll give you $a=c=d=0$ and $b=1$), together with a value which measures the difference between your data and the resulting function (which will be zero in this example, since the data corresponds exactly to $f(n)=n^2$).
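The cubic fit can be reproduced with a short sketch that solves the normal equations $(A^T A)\,c = A^T y$ by Gauss-Jordan elimination. The function name `polyfit_exact` is my own, and exact fractions are used only so that the result for this example comes out as exact integers; a numerical library would normally do this job.

```python
from fractions import Fraction

def polyfit_exact(xs, ys, degree):
    """Least-squares polynomial coefficients (highest power first)."""
    m = degree + 1
    # Rows of the design matrix A: [x^degree, ..., x, 1]
    rows = [[Fraction(x) ** (degree - j) for j in range(m)] for x in xs]
    # Augmented matrix of the normal equations (A^T A) c = A^T y
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(m)]
        + [sum(r[i] * Fraction(y) for r, y in zip(rows, ys))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with exact rational arithmetic
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 4, 9, 16, 25, 36]
coeffs = polyfit_exact(xs, ys, 3)  # [a, b, c, d]

# Sum of squared differences between the fitted cubic and the data
residual = sum(
    (sum(c * Fraction(x) ** (3 - j) for j, c in enumerate(coeffs)) - y) ** 2
    for x, y in zip(xs, ys)
)
print(coeffs, residual)
```

For the data from the question, the coefficients come out as $a=0$, $b=1$, $c=0$, $d=0$ with a residual of zero, matching the description above.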

1

The best possible way is to use the Newton Interpolation formula (http://nptel.ac.in/courses/122104018/node109.html) or Lagrange Interpolation Formula (http://mathworld.wolfram.com/LagrangeInterpolatingPolynomial.html).

MATLAB and C implementations of these interpolation methods are also widely available on the internet.
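For illustration, here is a small Python sketch of the Lagrange form, $P(t) = \sum_i y_i \prod_{j \ne i} \frac{t - x_j}{x_i - x_j}$. The function name `lagrange_eval` is just a label I chose, and the sample points are the first four from the question. Note that this always yields a polynomial through the points, so it cannot by itself recognize functions such as $\log n$ or $2^n$.

```python
from fractions import Fraction

def lagrange_eval(xs, ys, t):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at t."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                # Basis factor (t - x_j) / (x_i - x_j)
                term *= Fraction(t - xj, xi - xj)
        total += term
    return total

xs = [1, 2, 3, 4]
ys = [1, 4, 9, 16]
predicted = lagrange_eval(xs, ys, 5)
print(predicted)
```

Because the sample data lies exactly on $y = x^2$, the interpolating polynomial reproduces it and predicts 25 at $x = 5$.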

Soham
  • 10,120
1

Here is a Python Jupyter script for that purpose.

If you can restrict yourself to polynomials, NumPy (a Python math library) will offer an easy solution:

import numpy as np
from numpy.polynomial import Polynomial as P
import matplotlib.pyplot as plt

x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]
y = [1,2,4,7,6,3,2,3,4,5,9,12,11,9,7,3,1,2,3,5,8,12,14,16,12,8,3]

plt.plot(x, y)  # plot the original dataset

polynomial = P.fit(x, y, 14)  # 14 is the degree of the polynomial
fx, fy = polynomial.linspace(100)  # generate 100 sample points on the fitted curve
plt.plot(fx, fy)  # plot the calculated polynomial

I am not sure whether this applies to your use case, but restricting oneself to polynomials might not be as bad as it seems: many smooth functions can be approximated locally by a Taylor series, and polynomials are usually easier to work with than other functions.

The provided code sample produces the following output (the blue line being the original dataset and the orange line being the calculated function):

[Figure: plot of the original dataset (blue) and the fitted polynomial (orange)]

0

How about using a neural network (you can find one in many software packages, such as MATLAB, Scilab, etc.)? It will "fit" your data points and give you function values at intermediate points. There is just one disadvantage: it won't give you a "function" that you can understand, like F = x^2 + 3x + 5. Instead, it will give you a "network" which can be used further for interpolation or optimization.

dhruv
  • 1