extended discussion of the conjugate gradient method


Suppose we wish to solve the system

A𝐱=𝐛(1)

where A is a symmetric positive definite matrix. If we define the function

Φ(𝐱) = (1/2)𝐱TA𝐱 - 𝐱T𝐛  (2)

we realise that solving (1) is equivalent to minimizing Φ. This is because if 𝐱 is a minimum of Φ then we must have

∇Φ(𝐱) = A𝐱 - 𝐛 = 𝟎  (3)
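As a quick numerical illustration (the matrix and vector below are arbitrary choices of ours, not from the text), the gradient of Φ vanishes at the solution of (1), and Φ grows as we move away from it:

```python
import numpy as np

# An arbitrary small SPD example (our choice, for illustration only).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

def phi(x):
    """Phi(x) = (1/2) x^T A x - x^T b, as in (2)."""
    return 0.5 * x @ A @ x - x @ b

x_star = np.linalg.solve(A, b)   # exact solution of A x = b
grad = A @ x_star - b            # gradient of Phi at x_star, as in (3)
```

Here `grad` is zero up to roundoff, and evaluating `phi` at perturbed points confirms that `x_star` is a minimizer, since Φ(𝐱*+𝐝) - Φ(𝐱*) = (1/2)𝐝TA𝐝 > 0 for SPD A.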

These considerations give rise to the steepest descent algorithm. Given an approximation 𝐱~ to the solution 𝐱, the idea is to improve the approximation by moving in the direction in which Φ decreases most rapidly. This direction is given by the negative gradient of Φ at 𝐱~. Therefore we formulate our algorithm as follows.

Given an initial approximation 𝐱(0), for k = 1, 2, …, set

𝐱(k)=𝐱(k-1)+αk𝐫(k)

where 𝐫(k) = 𝐛 - A𝐱(k-1) = -∇Φ(𝐱(k-1)) and αk is a scalar to be determined.

𝐫(k) is traditionally called the residual vector. We wish to choose αk so as to reduce Φ as much as possible at each iteration; in other words, we wish to minimize the function ϕ(αk) = Φ(𝐱(k-1)+αk𝐫(k)) with respect to αk:

ϕ′(αk) = ∇Φ(𝐱(k-1)+αk𝐫(k))T𝐫(k) = [A𝐱(k-1) - 𝐛 + αkA𝐫(k)]T𝐫(k) = [-𝐫(k) + αkA𝐫(k)]T𝐫(k) = 0  ⟹  αk = (𝐫(k)T𝐫(k)) / (𝐫(k)TA𝐫(k))
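The iteration above can be sketched in a few lines of NumPy (a sketch, not from the original text; the function name and tolerances are our own):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Steepest descent for an SPD system A x = b, following the
    iteration above (function name and tolerances are our own)."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        r = b - A @ x                    # residual r^(k) = b - A x^(k-1)
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))  # optimal step length along r
        x = x + alpha * r                # x^(k) = x^(k-1) + alpha_k r^(k)
    return x
```

The `max_iter` cap is needed precisely because, as noted next, the method only converges in the limit.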

It is possible to show that the steepest descent algorithm described above converges to the solution 𝐱, but in general only in the limit, i.e. after infinitely many iterations. The conjugate gradient method improves on this by finding the exact solution (in exact arithmetic) after at most n iterations, where n is the dimension of the system. Let's see how we can achieve this.

We say that 𝐱~ is optimal with respect to a direction 𝐝 if λ = 0 is a local minimum for the function Φ(𝐱~+λ𝐝).

In the steepest descent algorithm, 𝐱(k) is optimal with respect to 𝐫(k), but in general it is not optimal with respect to 𝐫(0),…,𝐫(k-1). If we could modify the algorithm so that optimality with respect to the previous search directions is preserved, we might hope that 𝐱(n) is optimal with respect to n linearly independent directions, at which point we would have found the exact solution.

Let’s make the following modification

𝐩(k) = { 𝐫(1)                if k = 1
       { 𝐫(k) - βk𝐩(k-1)    if k > 1    (4)
𝐱(k)=𝐱(k-1)+αk𝐩(k)(5)

where αk and βk are scalar multipliers to be determined.

We choose αk in the same way as before, i.e. such that 𝐱(k) is optimal with respect to 𝐩(k)

αk = (𝐫(k)T𝐩(k)) / (𝐩(k)TA𝐩(k))  (6)

Now we wish to choose βk such that 𝐱(k) is also optimal with respect to 𝐩(k-1). We require that

∂Φ(𝐱(k)+λ𝐩(k-1))/∂λ |λ=0 = [A𝐱(k) - 𝐛]T𝐩(k-1) = 0  (7)

Since 𝐱(k)=𝐱(k-1)+αk𝐩(k), and assuming that 𝐱(k-1) is optimal with respect to 𝐩(k-1) (i.e. that [A𝐱(k-1)-𝐛]T𝐩(k-1)=0) we can rewrite this condition as

[A(𝐱(k-1)+αk𝐩(k))-𝐛]T𝐩(k-1)=αk[𝐫(k)-βk𝐩(k-1)]TA𝐩(k-1)=0(8)

and therefore we obtain the required value for βk,

βk = (𝐫(k)TA𝐩(k-1)) / (𝐩(k-1)TA𝐩(k-1))  (9)
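Putting (4), (5), (6) and (9) together gives the conjugate gradient iteration. The sketch below follows the update formulas above directly (variable names are our own; the residual recurrence 𝐫(k+1) = 𝐫(k) - αkA𝐩(k), which follows from (5), is used to avoid recomputing 𝐛 - A𝐱(k)):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-12):
    """Conjugate gradient iteration following eqs. (4)-(6) and (9);
    a sketch, with names and tolerance chosen here for illustration."""
    x = np.array(x0, dtype=float)
    r = b - A @ x                        # r^(1)
    p = r.copy()                         # p^(1) = r^(1), eq. (4)
    for _ in range(len(b)):              # at most n steps in exact arithmetic
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        alpha = (r @ p) / (p @ Ap)       # eq. (6)
        x = x + alpha * p                # eq. (5)
        r = r - alpha * Ap               # r^(k+1) = r^(k) - alpha_k A p^(k)
        beta = (r @ Ap) / (p @ Ap)       # eq. (9), with k replaced by k+1
        p = r - beta * p                 # eq. (4)
    return x
```

In exact arithmetic the loop terminates after at most n passes; in floating point the tolerance test does the remaining work.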

Now we want to show that 𝐱(k) is also optimal with respect to 𝐩(1),,𝐩(k-2), i.e. that

[A𝐱(k) - 𝐛]T𝐩(j) = -𝐫(k+1)T𝐩(j) = 0  for j = 1, …, k-2  (10)

We do this by strong induction on k, assuming that for all ℓ ≤ k-1 and all j ≤ ℓ, 𝐱(ℓ) is optimal with respect to 𝐩(j), or equivalently that

[A𝐱(ℓ) - 𝐛]T𝐩(j) = -𝐫(ℓ+1)T𝐩(j) = 0  (11)

Noticing that 𝐫(k+1) = 𝐫(k) - αkA𝐩(k), we want to show that

𝐫(k)T𝐩(j) - αk𝐩(k)TA𝐩(j) = 0  (12)

and therefore, since 𝐫(k)T𝐩(j) = 0 by the inductive hypothesis, it suffices to prove that

𝐩(k)TA𝐩(j) = 0  for j = 1, …, k-2  (13)

But, again by the definition of 𝐩(k), this is equivalent to proving

𝐫(k)TA𝐩(j) - βk𝐩(k-1)TA𝐩(j) = 0  (14)

and since 𝐩(k-1)TA𝐩(j) = 0 by the inductive hypothesis, all we need to prove is that

𝐫(k)TA𝐩(j) = 0  (15)

To proceed we require the following lemma.

For each ℓ ≥ 1, let Vℓ = span{A^(ℓ-1)𝐫(1), …, A𝐫(1), 𝐫(1)}.

  • 𝐫(ℓ), 𝐩(ℓ) ∈ Vℓ for every ℓ.

  • If 𝐲 ∈ Vℓ then 𝐫(k) ⊥ 𝐲 for all k > ℓ.

Since 𝐩(j) ∈ Vj, it follows that A𝐩(j) ∈ Vj+1, and since j+1 < k,

𝐫(k)T(A𝐩(j)) = 0  (16)

To finish, let's prove the lemma. The first point is by induction on ℓ. The base case ℓ = 1 holds, since 𝐩(1) = 𝐫(1) ∈ V1. Assuming that 𝐫(ℓ-1), 𝐩(ℓ-1) ∈ Vℓ-1, we have that

𝐫(ℓ) = 𝐫(ℓ-1) - αℓ-1A𝐩(ℓ-1) ∈ Vℓ,  𝐩(ℓ) = 𝐫(ℓ) - βℓ𝐩(ℓ-1) ∈ Vℓ  (17)

For the second point we need an alternative characterization of Vℓ. Since 𝐩(1), …, 𝐩(ℓ) ∈ Vℓ,

span{𝐩(1), …, 𝐩(ℓ)} ⊆ Vℓ  (18)

By (11), we have that if for some s > 1

𝐩(s) = λs-1𝐩(s-1) + ⋯ + λ1𝐩(1)  (19)

then 𝐩(s)T𝐫(s) = 0, but we know that

𝐩(s)T𝐫(s) = [𝐫(s) - βs𝐩(s-1)]T𝐫(s) = 𝐫(s)T𝐫(s)  (20)

but were this zero, we would have 𝐫(s) = 𝟎 and we would already have solved the original problem. Thus we conclude that 𝐩(s) ∉ span{𝐩(1), …, 𝐩(s-1)}, so the vectors 𝐩(1), …, 𝐩(s) are linearly independent. It follows that dim span{𝐩(1), …, 𝐩(ℓ)} = ℓ, which on the other hand is the maximum possible dimension of Vℓ, and thus we must have

Vℓ = span{𝐩(1), …, 𝐩(ℓ)}  (21)

which is the alternative characterization we were looking for. Now, again by (11), if 𝐲 ∈ Vℓ and ℓ < k, then

𝐫(k)T𝐲 = 0  (22)

thus the second point is proven.
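The optimality and conjugacy properties proved above are easy to check numerically. The sketch below (the SPD system is generated arbitrarily, for illustration only) runs the iteration (4)-(5), records all residuals and search directions, and verifies that every 𝐫(k+1)T𝐩(j) with j ≤ k and every 𝐩(k)TA𝐩(j) with j < k is zero up to roundoff:

```python
import numpy as np

# Arbitrary SPD test system (our choice, for illustration only).
rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite
b = rng.standard_normal(n)

# Run the conjugate gradient iteration, recording every vector.
x = np.zeros(n)
r = b - A @ x                        # r^(1)
p = r.copy()                         # p^(1) = r^(1)
directions, residuals = [p.copy()], []
for _ in range(n):                   # n steps suffice in exact arithmetic
    Ap = A @ p
    alpha = (r @ p) / (p @ Ap)       # eq. (6)
    x = x + alpha * p                # eq. (5)
    r = r - alpha * Ap               # r^(k+1) = r^(k) - alpha_k A p^(k)
    residuals.append(r.copy())
    beta = (r @ Ap) / (p @ Ap)       # eq. (9)
    p = r - beta * p                 # eq. (4)
    directions.append(p.copy())

# Each residual is orthogonal to all earlier search directions (eq. 10).
orth = max(abs(residuals[k] @ directions[j])
           for k in range(n) for j in range(k + 1))
# The search directions are mutually A-conjugate (eq. 13).
conj = max(abs(directions[k] @ A @ directions[j])
           for k in range(1, n) for j in range(k))
```

Both `orth` and `conj` come out at roundoff level, and `x` solves the system after exactly n steps, as the theory predicts.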
