1

In this algorithmic analysis of least squares regression, we throw away the big-$O$ terms that are dominated by the largest one and keep only the dominant term.

On the other hand, in this algorithmic analysis of matrix multiplication followed by truncation, the matrix multiplication dominates the truncation, yet we keep the lesser big-$O$ term, and the final result is the product of the two.

Why is this? When should we throw away non-dominating terms, and when should we keep them?

caitlin
  • 125

2 Answers

1

We can always throw away non-dominating terms.

For the second question, the same reasoning applies.

Given the truncation operator $T:\mathbb{R}^m \to \mathbb{R}^m$ defined entrywise by $T(y)_i = \max\{\min\{y_i, 1\}, 0\}$, applying it takes $O(m)$ time.

The matrix multiplication takes $O(mn)$ and yields a new vector $Ay$; applying $T$ to it takes another $O(m)$.

Hence, it takes $O(mn + m) = O(mn)$ in total, throwing away the non-dominating term.

(An exception is when the computation is done on dedicated hardware, where the truncation operator can be parallelized, taking as little as $O(1)$ time.)
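To make the additive cost concrete, here is a minimal sketch in Python/NumPy of the sequential case, assuming the clamp-to-$[0,1]$ truncation written above (the function names are illustrative, not from the linked question):

```python
import numpy as np

def truncate(y):
    """Entrywise truncation T(y)_i = max(min(y_i, 1), 0): O(m) work total."""
    return np.clip(y, 0.0, 1.0)

def multiply_then_truncate(A, y):
    """A is m-by-n, y has length n."""
    Ay = A @ y           # O(mn): m dot products, each of length n
    return truncate(Ay)  # O(m): one pass over the m entries of Ay, O(1) each

# Example usage: the two costs add, so the total is O(mn + m) = O(mn).
rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(n)
print(multiply_then_truncate(A, y))
```

The key point is that the clamp is a single pass over the $m$ entries of $Ay$, so its cost adds to, rather than multiplies, the $O(mn)$ of the product.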

Peter HU
  • 139
0

In the latter, $T$ requires $O(m)$ work for each of the $O(mn)$ entries of $A\bf y$. That’s why they are multiplied.

  • As it happens, this answer is wrong. The $T$ defined in the link in the question only takes $O(1)$ work for each entry of $Ay$. It is not applied to each entry of $A$, but to each entry of $Ay$. The answer on Math.SE claiming it takes $O(m^2n)$ time is also wrong. – D.W. Nov 26 '23 at 06:10
  • @D.W., okay. Of course, in that case the running time is still a product ;-) – Paul Tanenbaum Nov 26 '23 at 13:10