8

Okay, I need to develop an alorithm to take a collection of 3d points with x,y,and z components and find a line of best fit. I found a commonly referenced item from Geometric Tools but there doesn't seem to be a lot of information to get someone not already familiar with the method going. Does anyone know of an alorithm to accomplish this? Or can someone help me work the Geometric Tools approach out through an example so that I can figure it out from there? Any help would be greatly appreciated.

bjhuffine
  • 213
  • This question is not altogether clear. Usually in two dimensions if you have a scatterplot ${(x_i,y_i): i=1,\ldots,n}$ then you can fit a least-squares line for estimating the average $y$ value for a given $x$ value or vice-versa, and they are two different lines. In three dimensions you can similarly fit a plane for estimating the average value of one of the three variables given the other two. That would yield three different planes. There is another possibility that you may or may not have in mind, and that involves what are called principal components. – Michael Hardy Jan 13 '16 at 21:47
  • The algorithm to use depends on how you think these points are related to each other. Are you assuming that the coordinate pair $(y, z)$ is a "noisy" function of $x$ such that the resulting $(x,y,z)$ points lie approximately along a line? – David K Jan 13 '16 at 22:35
  • @Michael Hardy: I'm basically looking for as straight forward to understand and efficient algorithm possible. I know adding the z-component complicates it more so than would have occurred with a 2-dimensional analysis, but that's the system I have. – bjhuffine Jan 14 '16 at 03:55
  • @David K: not sure what you mean by "noisy" function, but if you're asking if I'm assuming whether or not the points to which the line would fit are more or less clustered around the line thereby being "noise" to the line's actual parametric equations, the answer would be kinda... This algorithm will be used in an API and therefore will be available for general purposes as well as some internal to the API where points in a space will be analyzed. So basically there'll be little governing on when it's used just depending on whether it was used in the right circumstances or not. – bjhuffine Jan 14 '16 at 03:57
  • @David K: Now that I'm thinking of it a measure of how well it fits would be nice for this as well. – bjhuffine Jan 14 '16 at 03:59
  • A related question is http://math.stackexchange.com/questions/370862/find-a-best-fit-line-through-a-point-cloud-that-goes-through-a-specific-point -- different from yours, because they have an additional constraint, but I think some of the discussion in the answers relates to your problem. – David K Jan 14 '16 at 05:25
  • When you say you need to develop an algorithm, do you mean programming your own software? Would you be able to use R or some other statistics software "off the shelf" if it could answer the question? The thing about the algorithms described by Geometric Tools is, they've been around a while and they've been implemented many times. Depending on what tools/languages/whatever you're in a position to use, you might find a "canned" solution or that either solves this problem directly or can be adapted to solve it. – David K Jan 14 '16 at 05:31
  • @David K: Yes, I'm developing my own so any "off-the-shelf" or plug-in solution is off the table so to speak... pardon the pun... I would like to use the Geometric Tools solution, but like I said, right now I don't understand his solution well enough to implement it. If someone can provide an example of how to do this or maybe ellaborate on his explanation to add more clarity, then I should be able to pick it up from there. I'm not too bad at the basic linear algebra or calculus concepts; I just couldn't make sense of his writeup. – bjhuffine Jan 14 '16 at 14:46

4 Answers4

6

Here's how you can do a principal component analysis:

Let $X$ be a $n\times 3$ matrix in which each $1\times3$ row is one of your points and there are $n$ of them.

Let $\displaystyle (\bar x,\bar y,\bar z) = \frac 1 n \sum_{i=1}^n (x_i,y_i,z_i)$ be the average location of your $n$ points. Replace each row $(x_i,y_i,z_i)$ with $(x_i,y_i,z_i) - (\bar x,\bar y,\bar z)$. This is a linear transformation that amounts to replacing $X$ with $PX$, where $P$ is the $n\times n$ matrix in which every diagonal entry is $1-(1/n)$ and ever off-diagonal entry is $-1/n$. Note that $PX\in\mathbb R^{n\times 3}$. Notice that $P^2 = P = P^T$, so that $P^T P = P$.

Now look at the $3\times3$ matrix $(PX)^T PX = (X^T P^T) (P X) = X^T (P^T P) X = X^T P X$.

This matrix $X^T PX$ is $n$ times the matrix of covariances of the three variables $x$, $y$, and $z$. It is a symmetric positive definite matrix (or nonnegative definite, but not strictly positive definite, if the $n$ points all lie in a common plane). Since this matrix symmetric and positive definite, the spectral theorem tells us that it has a basis of eigenvectors that are mutually orthogonal, and the eigenvalues are nonnegative. Let $\vec e$ be the (left) eigenvector with the largest of the three eigenvalues. The the line you seek is $$ \left\{ (\bar x,\bar y,\bar z) + t\vec e\ : \ t\in\mathbb R \right\} $$ where $t$ is a parameter that is different at different points on the line, and $t=0$ at the average point $(\bar x,\bar y,\bar z)$.

0

We start with a sequence of $m$ measurements $\left\{ x_{k}, y_{k}, z_{k}, f_{k} \right\}_{k=1}^{m}$, and the trial function $$ f(x,y,z) = a_{0} + a_{1} x + a_{2} y + a_{3} z. $$ The linear system is $$ \begin{align} \mathbf{A} a &= f \\ % \left[ \begin{array}{cccc} 1 & x_{1} & y_{1} & z_{1} \\ 1 & x_{2} & y_{2} & z_{2} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{m} & y_{m} & z_{m} \end{array} \right] % \left[ \begin{array}{cccc} a_{0} \\ a_{1} \\ a_{2} \\ a_{3} \end{array} \right] % &= % \left[ \begin{array}{cccc} f_{1} \\ f_{2} \\ \vdots \\ f_{m} \end{array} \right]. % \end{align} $$ The normal equations are $$ \begin{align} \mathbf{A}^{*} \mathbf{A} a &= \mathbf{A}^{*} y \\ % \left[ \begin{array}{cccc} \mathbf{1} \cdot \mathbf{1} & \mathbf{1} \cdot x & \mathbf{1} \cdot y & \mathbf{1} \cdot z \\ x \cdot \mathbf{1} & x \cdot x & x \cdot y & x \cdot z \\ y \cdot \mathbf{1} & y \cdot x & y \cdot y & y \cdot z \\ z \cdot \mathbf{1} & z \cdot x & z \cdot y & z \cdot z \\ \end{array} \right] % \left[ \begin{array}{cccc} a_{0} \\ a_{1} \\ a_{2} \\ a_{3} \end{array} \right] % &= % \left[ \begin{array}{cccc} \mathbf{1} \cdot f \\ x \cdot f \\ y \cdot f \\ z \cdot f \end{array} \right] % \end{align} $$ The solution is $$ \begin{align} a_{LS} &= \left( \mathbf{A}^{*} \mathbf{A} \right)^{-1} \mathbf{A}^{*} y \\ % \left[ \begin{array}{cccc} a_{0} \\ a_{1} \\ a_{2} \\ a_{3} \end{array} \right] % &= % \left[ \begin{array}{cccc} \mathbf{1} \cdot \mathbf{1} & \mathbf{1} \cdot x & \mathbf{1} \cdot y & \mathbf{1} \cdot z \\ x \cdot \mathbf{1} & x \cdot x & x \cdot y & x \cdot z \\ y \cdot \mathbf{1} & y \cdot x & y \cdot y & y \cdot z \\ z \cdot \mathbf{1} & z \cdot x & z \cdot y & z \cdot z \\ \end{array} \right]^{-1} % \left[ \begin{array}{cccc} \mathbf{1} \cdot f \\ x \cdot f \\ y \cdot f \\ z \cdot f \end{array} \right] % \end{align} $$

dantopa
  • 10,768
0

In $\mathbb{R}^3$ a line can be defined by the following set of points

$$ \{\vec \ell = \vec a + \vec n \, t \, : \, t \in \mathbb R\} $$

where $\vec a$ is some point on the line, and $\vec n$ is (normalized) line direction.

To find $\vec a$ from given set of points $\{(x_i,y_i,z_i)\}$ is quite easy, just calculate average position $(\overline x, \overline y, \overline z)$

$$ \vec a = (\overline x, \overline y, \overline z) = \frac 1N \sum_i (x_i, y_i, z_i). $$

To find line direction $\vec n$, one can solve eigen problem $K\,\vec v_k = k\,\vec v$ for the covariance matrix, expressed by

$$ \text{cov}(x,y,z) = K =\left[ \begin{matrix} \overline{x^2} - \overline x^2 & \overline{xy} - \overline x\, \overline y & \overline{xz} - \overline x\,\overline z\\[0.5em] \overline{xy} - \overline x\,\overline y & \overline{y^2} - \overline y^2 & \overline{yz} - \overline y\,\overline z\\[0.5em] \overline{xz} - \overline{x}\,\overline{z} & \overline{yz} - \overline y\,\overline z & \overline{z^2} - \overline z^2 \end{matrix}\right], $$

and then by selecting eigenvector $\vec v_k$, which corresponds to the largest eigenvalue $k$, one finds the $\vec n = \text{max}_k\,\vec v_k$.

An example code (C++) can be found here.

0

I think that I have found an alternative and easy approach. A 3D straight line would have:

xi=a.q+x'

yi=b.q+y'

zi=c.q+z'

here the variable a, b, c are the normal direction vector of 3D line. Each point on the line are (xi, yi, zi).

when average of all these points are taken for xi, yi, and zi separately; the xbar, ybar, zbar may be accepted as (x', y', z'). Now substract the averages from xi, yi, zi values to convert the equations into

xi=a.q

yi=b.q

zi=c.q

Now, all poins are alingned around (0, 0, 0) point. Therefore, all xi, yi, zi values are constant replicates of each other.

In other words, yi = C * xi; and zi = D * xi and zi = E * xi etc.

These ratios would provide us the direction vector of the line.

Just take average of all yi/xi.

Then take average of all zi/xi.

These two ratios will be the imperfect normal vector by assuming x direction value is one. i.e., (1, average(yi/xi), average(zi/xi)) is the direction vector.

At this stage, this vector's amplitude can be calculated by A=sqrt(1+ average(yi/xi)^2+average(zi/xi)^2)

The direction vector will be (1, average(yi/xi), average(zi/xi))/A = (a, b, c)

The line will pass from point (x', y', z')

Therefore, the equation of the best fit line is

xi=a.q+x'

yi=b.q+y'

zi=c.q+z'

I hope you find this easy to understand and useful. Ahmet Turer.