Your definitions of the two condition numbers seem to be inconsistent with each other. A slight sticking point is that different authors define the Jacobian differently: some use (A) $J_{ij} = \partial f_i / \partial x_j$ and others use (B) $J_{ij} = \partial f_j / \partial x_i$. With definition (A), we have $f(x+\Delta x) = f(x) + J(x) \Delta x + O(\|\Delta x\|^2)$, and with definition (B), $f(x+\Delta x) = f(x) + J^\top(x) \Delta x + O(\|\Delta x\|^2)$. Your first condition number appears to use definition (A) of the Jacobian, while the second definitely requires definition (B) for the product $|J^\top(x)||x|$ to be well-defined. In the case that the norm $\|\cdot\|$ is transpose-invariant ($\|A\| = \|A^\top\|$), it doesn't matter which definition you use. There are enough notational inconsistencies between different authors that it's hard for me to disambiguate exactly what's happening here. I checked popular numerical linear algebra books (Golub and Van Loan; Trefethen and Bau; Demmel; Higham) and could not find one that explicitly uses this particular set of definitions. Perhaps if you could find another source with this set of definitions, I (or someone else) could help further.
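Note that transpose-invariance holds for the spectral and Frobenius norms but not for the operator $\infty$-norm, which is the norm I'll use below. Here is a quick NumPy check of this (my own sketch, not from your question):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# The operator infinity-norm (max absolute row sum) is not transpose-invariant:
print(np.linalg.norm(A, np.inf), np.linalg.norm(A.T, np.inf))    # 7.0 vs 6.0

# The spectral and Frobenius norms are transpose-invariant:
print(np.linalg.norm(A, 2), np.linalg.norm(A.T, 2))              # equal
print(np.linalg.norm(A, 'fro'), np.linalg.norm(A.T, 'fro'))      # equal
```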
Let me now address your main question. Suppose I want to solve the diagonal system of linear equations
\begin{equation}
\underbrace{\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}}_{=A} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\end{equation}
This corresponds to the function $f(a,b) = (a^{-1},b^{-1})$ with Jacobian
$$
J(a,b) = -\begin{bmatrix} a^{-2} & 0 \\ 0 & b^{-2} \end{bmatrix}
$$
which has norm $\|J(a,b)\| = \max(a^{-2},b^{-2})$ in the operator $\infty$-norm. Let's assume going forward that $a > b > 0$ so $\|J(a,b)\| = b^{-2}$. The first condition number is then
$$
\kappa_1(f(a,b);(a,b)) = \frac{\|J(a,b)\|}{\|f(a,b)\|/\|(a,b)\|} = \frac{b^{-2}}{b^{-1}/ a} = \frac{a}{b}.
$$
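If you'd like to verify this numerically, here is a minimal NumPy sketch; the values of $a$ and $b$ are hypothetical choices of mine, picked only to make $a \gg b$:

```python
import numpy as np

a, b = 1e6, 1.0               # hypothetical values with a >> b > 0
x = np.array([a, b])

f = 1.0 / x                   # f(a, b) = (a^{-1}, b^{-1})
J = -np.diag(1.0 / x**2)      # Jacobian of f

# Norm-wise condition number ||J|| ||x|| / ||f(x)|| in the infinity-norm:
kappa1 = (np.linalg.norm(J, np.inf)
          * np.linalg.norm(x, np.inf)
          / np.linalg.norm(f, np.inf))
print(kappa1, a / b)          # both print 1000000.0
```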
Thus if $a\gg b$, this problem is very ill-conditioned. Now, let's look at the component-wise condition number
$$
\kappa_2(f(a,b);(a,b)) = \frac{\begin{bmatrix} a^{-2} & 0 \\ 0 & b^{-2} \end{bmatrix}\begin{bmatrix}a\\ b\end{bmatrix}}{\begin{bmatrix} a^{-1} \\ b^{-1}\end{bmatrix}} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
$$
I haven't seen the definition you've given, with its elementwise division of vectors, and I believe the canonical component-wise condition number is a norm of this "vector condition number" (e.g., using the $\infty$-norm, $\kappa_2(f(a,b);(a,b)) = 1$). Either way, by the component-wise condition number, the problem seems perfectly well-conditioned! What is going on here?
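Before explaining the discrepancy, here is the analogous numerical check for the component-wise quantity (same hypothetical $a$ and $b$ as above):

```python
import numpy as np

a, b = 1e6, 1.0
x = np.array([a, b])
f = 1.0 / x
J = -np.diag(1.0 / x**2)

# Component-wise "vector condition number" |J| |x| / |f(x)|,
# with the division taken elementwise:
kappa_vec = (np.abs(J) @ np.abs(x)) / np.abs(f)
print(kappa_vec)                           # [1. 1.]
print(np.linalg.norm(kappa_vec, np.inf))   # 1.0
```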
The standard, vanilla norm-wise condition number measures approximately how large we expect the relative error between $f(x+\Delta x)$ and $f(x)$ to be, compared to the relative error between $x+\Delta x$ and $x$. Specifically,
$$
\mbox{relative error in $f$} \le \kappa \cdot (\mbox{relative error in $x$}) + \mbox{higher order terms}.
$$
If we say $(a+\Delta a, b+\Delta b)$ has a relative error of, say, $10^{-6}$ in the $\infty$-norm compared to the true value $(a,b)$, this means only that the errors $\Delta a$ and $\Delta b$ in each component are at most $10^{-6}\|(a,b)\| = 10^{-6}a$. Note that if $a$ is more than $10^6 b$, the error $\Delta b$ can be larger than $b$ itself! But when we actually evaluate $f$, $a^{-1}$ is much smaller than $b^{-1}$, while $b$ has been perturbed by a large error $\Delta b$, so the relative error in $f$ (largely dominated by the relative error in $b^{-1}$) is very high. In effect, if one considers norm-wise relative error, the relative error in the small components of a vector can be made very large, and these large component-wise errors are amplified whenever $f$ depends strongly on the small entries of its input.
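To make this concrete in the running example, here is a sketch (again with my hypothetical $a$ and $b$) of the worst-case norm-wise perturbation, which spends the entire error budget on $b$:

```python
import numpy as np

a, b = 1e6, 1.0
x = np.array([a, b])

# Worst case: the whole norm-wise budget 1e-6 * ||x|| = 1e-6 * a lands on b.
dx = np.array([0.0, 1e-6 * a])

# Norm-wise relative error in the input is only 1e-6 ...
print(np.linalg.norm(dx, np.inf) / np.linalg.norm(x, np.inf))           # 1e-6

# ... but the relative error in f is of order kappa_1 * 1e-6 = 1, not 1e-6:
f, f_pert = 1.0 / x, 1.0 / (x + dx)
print(np.linalg.norm(f_pert - f, np.inf) / np.linalg.norm(f, np.inf))   # 0.5
```

(The measured error $0.5$ is of the order predicted by $\kappa_1 \cdot 10^{-6} = 1$; it is not exactly $1$ because this perturbation is too large for the first-order bound to be sharp.)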
In many practical settings, we have an input vector in which every component has a small relative error. For example, if the errors $\Delta a$ and $\Delta b$ result from rounding arbitrary real numbers $a$ and $b$ to floating point, then $|\Delta a| \le \epsilon |a|$ and $|\Delta b| \le \epsilon |b|$ for a small constant $\epsilon$. The worst-case scenario above is then impossible, but there is no way to prove this using norms alone: from the assumption $\|(\Delta a, \Delta b)\| \le \epsilon \|(a,b)\|$, one cannot show that $\Delta b$ is small relative to $b$. Component-wise condition numbers fix exactly this. They measure the conditioning of a problem with respect to small component-wise perturbations of the input, which gives much better control over the relative error in the small entries of the input vector.
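And here is the contrasting experiment: a floating-point-style perturbation with $|\Delta a| \le \epsilon|a|$ and $|\Delta b| \le \epsilon|b|$ leaves every component of $f$ accurate to roughly $\epsilon$, exactly as $\kappa_2 = 1$ predicts (again a sketch with hypothetical values):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 1e6, 1.0
x = np.array([a, b])

# Floating-point-style perturbation: each component has relative error <= eps.
eps = 1e-6
dx = eps * x * rng.uniform(-1.0, 1.0, size=2)

f, f_pert = 1.0 / x, 1.0 / (x + dx)
# Every component of f is accurate to about eps, matching kappa_2 = 1:
print(np.abs(f_pert - f) / np.abs(f))   # both entries are ~1e-6 or smaller
```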
At the end of the day, I still have to stand by the line "we can employ the second condition number when the first one provides a 'pessimistic' result", because there is no catch-all heuristic that definitively tells you when component-wise conditioning will or won't give a substantially better error bound. However, I hope the example above is a revealing illustration of how norm-wise conditioning can produce misleadingly pessimistic error bounds for a problem and how component-wise conditioning can give more realistic ones.