Newton’s method works for convex real functions

Theorem 1.

Let $f\\colon I\\to\\mathbb{R}$ be a convex differentiable functionon an interval $I\\subseteq\\mathbb{R}$ , with at least one root.Then the following sequence $\\{x_{n}\\}$ obtained from Newton’s method,

x_{n+1}=x_{n}-\\frac{f(x_{n})}{f^{\\prime}(x_{n})}\\,,

will converge to a root of $f$ , provided that $f^{\\prime}(x_{0})\eq 0$ and $x_{1}\\in I$ for the given starting point $x_{0}\\in I$ .

Obviously, “convex” can be replaced by “concave” in Theorem 1.

The proof will proceed in several steps. First, a simple converse result:

Theorem 2.

Let $f\\colon I\\to\\mathbb{R}$ be a differentiable convex function,and $\\{x_{n}\\}$ be the sequence in Theorem 1.If it is convergent to a number $a\\in I$ ,then $a$ is necessarily a root of $f$ .

Proof.

We have

0=\\lim_{n\\to\\infty}f^{\\prime}(x_{n})\\cdot(x_{n}-x_{n+1})=\\lim_{n\\to\\infty}f(x_%{n})=f(a)\\,.

(The first limit is zero by local boundedness of $f^{\\prime}$ at $a$ ,which follows from $f^{\\prime}$ being finite and monotone¹¹Actually, a differentiable convex function must necessarily have a continuous derivative,for the derivative is increasing, and it cannot have any jump discontinuities byDarboux’s Theorem (http://planetmath.org/DarbouxsTheorem)..)∎

Proof of Theorem 1

Case A: when $I=\\mathbb{R}$ , $f^{\\prime}>0$ , and $f(x_{0})\\geq 0$

We claim that whenever $f(x_{n})\\geq 0$ , then $f(x_{n+1})\\geq 0$ too.Recall that the graph of a convex function $f$ always liesabove any of its tangent lines.So in particular, the point $(x_{n+1},f(x_{n+1}))$ is higher than the point $(x_{n+1},0)$ , which is on the tangent line by definition.By induction, we have $f(x_{n})\\geq 0$ for all $n$ .

We also have $x_{n+1}\\leq x_{n}$ for all $n$ .By hypothesis the slope $f^{\\prime}(x_{n})$ of the tangent line is positive,and $(x_{n},f(x_{n}))$ lies above the horizontal axis.Then the intersection point of the tangent line and the horizontal axis,which is $(x_{n+1},0)$ , must be to the left of $(x_{n},0)$ .

So the sequence $\\{x_{n}\\}$ is decreasing. It converges to a real number ordiverges to $-\\infty$ .If the limit is a real number, by Theorem 2 that numberis a root of $f$ , and we are done.

If the limit is $-\\infty$ , then we must have $f(x_{n})>0$ for all $x_{n}$ . Hence $f(x)>0$ for all $x$ , as $f$ is monotoneand $x_{n}$ can be arbitrarily large negative.In other words, $f$ has no root to begin with.

Figure 1: Case A: $\\{x_{n}\\}$ is monotonically decreasing

Case B: when $I=\\mathbb{R}$ , $f^{\\prime}<0$ , and $f(x_{0})\\geq 0$

This situation is the (left-right) mirror of Case A,except that because the slope of the curve is negative,the sequence $\\{x_{n}\\}$ is increasing this time. We still have $x_{n}\\geq 0$ ,so the sequence either converges to a root of $f$ , or diverges to $+\\infty$ (in which case $f$ has no root).

Figure 2: Case B: $\\{x_{n}\\}$ is monotonically increasing

Case C: when $I=\\mathbb{R}$ , $f^{\\prime}(x_{0})>0$ , and $f(x_{0})\\geq 0$

To begin, note that the first part of Case A, asserting that $f(x_{n})\\geq 0$ for all $n$ , still goes through only assuming $f^{\\prime}(x_{n})\eq 0$ — in which case the sequence would not be defined after the point $x_{n}$ .We show that, in fact, $f^{\\prime}(x_{n})>0$ for all $n$ ,so the rest of Case A goes through.

Suppose $n$ is the smallest integerfor which $f^{\\prime}(x_{n+1})\\leq 0$ .Also assume $f(x_{n}),f(x_{n+1})\eq 0$ , otherwise we would be done anyway.The reasoning in Case A shows $x_{n+1}<x_{n}$ .

Figure 3: Case C: a tangent line separates $f$ from the horizontal axis

The function $f$ is strictly positive on $[x_{n+1},x_{n}]$ ,for the graph of $f$ lies above the tangent line segment extendingfrom $(x_{n+1},0)$ to $(x_{n},f(x_{n}))$ , which in turn liesstrictly above the horizontal axis.Since $f$ is decreasing to the left of $[x_{n+1},x_{n}]$ and increasing to the right, $f$ never touches the horizontal axis anywhere.This contradicts our hypothesis that $f$ has roots.

Case D: when $I=\\mathbb{R}$ , $f^{\\prime}(x_{0})<0$ , and $f(x_{0})\\geq 0$

This situation is the (left-right) mirrorof Case C. The same argument shows that $f^{\\prime}(x_{n})<0$ for all $n$ (or $f(x_{n})=0$ for some $n$ ), and theargument of Case B applies.

Case E: for general intervals $I$

The only concern is that the iteration $\\{x_{n}\\}$ might go outside the interval $I$ . We show that it does not.

Suppose $f(x_{0})\\geq 0$ .Then as we have proved, $f(x_{n})\\geq 0$ for all $n$ .Suppose $x_{n}\\in I$ but $x_{n+1}\otin I$ .Then $f$ has no root, because the graph of $f$ lies abovethe tangent line from $(x_{n},f(x_{n}))$ to $(x_{n+1},0)$ which lies above the horizontal axis.

The limit of $x_{n}$ could conceivably go just outsidethe interval $I$ .But this case is similar to the case $x_{n}\\to\\pm\\infty$ already considered in Case A: $f$ possesses no root then.

Finally, suppose $f(x_{0})<0$ , the case we have ignored all along.Our hypotheses assume that $f(x_{1})$ is defined.We immediately have $f(x_{1})\\geq 0$ ,since the graph of $f$ lies above the tangent linefrom $(x_{0},f(x_{0}))$ to $(x_{1},0)$ .But this just brings us back to the other cases.

Figure 4: Case E: $f(x_{1})$ is always $\\geq 0$

Examples

Newton’s method for finding square roots of positive numbers $\\alpha$ (which reduces to the ancient divide-and-average method) alwaysconverges, for the target function $f(x)=x^{2}-\\alpha$ is convex.

References

1 Michael Spivak. Calculus, third edition. Publish or Perish, 1994.