gradient

Summary.

The gradient is a first-order differential operator that maps scalar functions to vector fields. It is a generalization of the ordinary derivative, and as such conveys information about the rate of change of a function relative to small variations in the independent variables. The gradient of a function $f$ is customarily denoted by $\nabla f$ or by $\operatorname{grad}f$.

1 Definition: Euclidean space

Let $f\colon\mathbb{R}^{n}\to\mathbb{R}$ be continuously differentiable. The gradient of $f$, denoted by $\nabla f$, is defined by the property:

$$\operatorname{D}_{\mathbf{v}}f=\nabla f\cdot\mathbf{v}\quad\text{for all vectors $\mathbf{v}\in\mathbb{R}^{n}$.}\tag{1}$$

The middle dot is the dot product, and $\operatorname{D}_{\mathbf{v}}$ is the directional derivative with respect to $\mathbf{v}$.

If $x^{1},\ldots,x^{n}$ are Euclidean coordinates, corresponding to the orthonormal basis $\mathbf{e}_{1},\ldots,\mathbf{e}_{n}$, then

$$\nabla f=\sum_{i=1}^{n}\frac{\partial f}{\partial x^{i}}\,\mathbf{e}_{i}\,.\tag{2}$$

Formula (2) is sometimes given as the definition of $\nabla f$. We prefer to define $\nabla f$ by the coordinate-free formula (1) instead, because then the geometric interpretations (see below) become obvious, and (1) also indicates how we would go about calculating the gradient in curvilinear coordinate systems. Formula (1) also makes it clear that the gradient is a physical vector, depending only on the inner product structure of $\mathbb{R}^{n}$, and not on the specific coordinate system used to calculate it.

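There is the issue of whether a vector field $\nabla f$ satisfying (1) exists; but this is proved easily enough, by substituting the concrete expression (2) and seeing that it satisfies (1). It is also unique, because a vector whose dot product with every $\mathbf{v}$ vanishes must be zero.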
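The existence argument can be checked numerically. The following is a minimal sketch, not a proof: it approximates the partial derivatives in (2) and the directional derivative in (1) by central differences and compares the two sides of (1). The test function `f`, the point `p`, the vector `v`, and the step size `h` are all hypothetical choices made for illustration.

```python
import math

def f(x):
    # Hypothetical test function f(x, y, z) = x^2 * y + sin(z)
    return x[0] ** 2 * x[1] + math.sin(x[2])

def gradient(f, x, h=1e-6):
    """Approximate formula (2): one central difference per coordinate axis."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        dn = list(x)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad

def directional_derivative(f, x, v, h=1e-6):
    """Approximate D_v f(x), the derivative of t -> f(x + t v) at t = 0."""
    up = [xi + h * vi for xi, vi in zip(x, v)]
    dn = [xi - h * vi for xi, vi in zip(x, v)]
    return (f(up) - f(dn)) / (2 * h)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

p = [1.0, 2.0, 0.5]
v = [0.3, -1.2, 2.0]

lhs = directional_derivative(f, p, v)  # left-hand side of (1)
rhs = dot(gradient(f, p), v)           # right-hand side of (1), using (2)
print(lhs, rhs)
```

The two printed numbers should agree up to discretization and rounding error. For this particular `f` the exact gradient is $(2xy,\,x^{2},\,\cos z)$, so both sides should be close to $2\cos(0.5)$.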

The gradient can be considered to be a vector-valued differential operator, written as

$$\nabla=\sum_{i=1}^{n}\mathbf{e}_{i}\,\frac{\partial}{\partial x^{i}}\,,$$

or, in the context of Euclidean 3-space, as

$$\nabla=\mathbf{i}\,\frac{\partial}{\partial x}+\mathbf{j}\,\frac{\partial}{\partial y}+\mathbf{k}\,\frac{\partial}{\partial z}\,,$$

where $\mathbf{i},\mathbf{j},\mathbf{k}$ are the unit vectors lying along the positive direction of the $x, y, z$ axes, respectively.

2 Geometric and physical interpretations

(a) The direction of the vector $\nabla f$ is the direction of the greatest positive change, or increase, in $f$. The magnitude of $\nabla f$ is the rate of this increase per unit distance. This follows immediately from (1):
$\operatorname{D}_{\mathbf{v}}f=\nabla f\cdot\mathbf{v}=\lVert\nabla f\rVert\,\lVert\mathbf{v}\rVert\,\cos\theta\,,$
where $\theta$ is the angle between $\nabla f$ and $\mathbf{v}$. So among all unit directions $\mathbf{v}$ of change, if $\mathbf{v}$ is perpendicular to $\nabla f$ then the change $\operatorname{D}_{\mathbf{v}}f$ is zero; if $\mathbf{v}$ points in the same direction as $\nabla f$ then the change is maximized and equals $\lVert\nabla f\rVert$.
Similarly, $-\nabla f$ is the direction of the greatest negative change, or decrease, in $f$.
(b) If $M$ is the hypersurface in $\mathbb{R}^{n}$ defined by
$M=\{p\in\mathbb{R}^{n}:f(p)=0\,,\>\operatorname{D}f(p)\neq 0\}\,,$
then $\nabla f(p)$ is the normal to the hypersurface $M$ at the point $p$. Indeed, $\ker\operatorname{D}f(p)$ is the tangent space $\mathrm{T}_{p}M$ to $M$ at $p$, that is, $\operatorname{D}_{\mathbf{v}}f(p)=0$ for all $\mathbf{v}\in\mathrm{T}_{p}M$, and by definition (1), $\nabla f(p)$ must be perpendicular to all $\mathbf{v}\in\mathrm{T}_{p}M$.
Note that $\operatorname{D}f\neq 0$ is equivalent to $\nabla f\neq 0$. Consequently, $\nabla f$ also gives an orientation to the hypersurface $M$.
For example, if $f(\mathbf{x})=\lVert\mathbf{x}\rVert-1$ for $\mathbf{x}\in\mathbb{R}^{n}$, $M$ is the $(n-1)$-dimensional sphere of unit radius, embedded in $\mathbb{R}^{n}$. Its normal, $\nabla f(\mathbf{x})=\mathbf{x}/\lVert\mathbf{x}\rVert$, as one would expect, points outward radially.
(c) As a simple case of (b), consider the surface $z=f(x,y)$ in $\mathbb{R}^{3}$, with Cartesian coordinates $(x,y,z)$. Think of this surface as describing a hill, with height $z$. Then the direction of the gradient vector $\nabla f$ is the direction of steepest ascent of the hill, while its magnitude
$\|\nabla f\|=\sqrt{\left(\frac{\partial f}{\partial x}\right)^{2}+\left(\frac{\partial f}{\partial y}\right)^{2}}$
is the slope or steepness in that direction.
If a ball is placed on the hill at a point $(x,y,z)$, theoretically it should roll down the hill in the direction of the negative gradient vector $-\nabla f(x,y)$. This may be easily derived by considering the mechanical forces on the ball. The direction of $-\nabla f(x,y)$ is, in fact, the projection to the xy-plane of the outward (upward-pointing) normal vector $(-\partial f/\partial x,\,-\partial f/\partial y,\,1)$ to the hill at $(x,y,z)$; the normal vector is involved because the movement of the ball arises from the normal force from the hill.
(d) Suppose the surface $z=f(x,y)$ in (c) describes a bowl instead of a hill, and we place a marble at any point $(x,y,z)$ on this bowl. We would expect the marble to roll down to a local minimum point of $f(x,y)$. Since the marble should roll down in the direction of $-\nabla f$, we might hope that we can find local minima of a given function $f$ by following the path mapped out by the negative gradients $-\nabla f$. Formally, this method of finding local minima (with some modifications) is called gradient descent.
(e) If $U$ is the potential function corresponding to a conservative physical force, then $\mathbf{F}=-\nabla U$ is the corresponding force field.
Consequently, the gradient theorem,
$\int_{\gamma}\mathbf{F}\cdot d\mathbf{s}=-\int_{\gamma}\nabla U\cdot d\mathbf{s}=-U(\gamma(b))+U(\gamma(a)),\quad\gamma\colon[a,b]\to\mathbb{R}^{3}$
simply says that the work done by a conservative force field $\mathbf{F}$ on an object moving along a path $\gamma$ equals the decrease in the potential energy $U$ between the endpoints of the path.
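The idea in (d) can be sketched in a few lines of code. This is only an illustration under stated assumptions: the bowl `f`, its hand-computed gradient `grad_f`, the starting point, the fixed step size, and the iteration count are all hypothetical, and practical gradient descent methods choose the step size more carefully.

```python
def f(x, y):
    # Hypothetical bowl with its unique minimum at (1, -2)
    return (x - 1.0) ** 2 + 2.0 * (y + 2.0) ** 2

def grad_f(x, y):
    # Gradient of f from formula (2), computed by hand
    return (2.0 * (x - 1.0), 4.0 * (y + 2.0))

def gradient_descent(grad, start, step=0.1, iterations=200):
    """Repeatedly move a small distance in the direction of -grad."""
    x, y = start
    for _ in range(iterations):
        gx, gy = grad(x, y)
        x -= step * gx
        y -= step * gy
    return x, y

x_min, y_min = gradient_descent(grad_f, (5.0, 5.0))
print(x_min, y_min, f(x_min, y_min))
```

For this quadratic bowl each step shrinks the distance to the minimum by a constant factor in each coordinate, so the iterates approach $(1,-2)$. With too large a step the iteration can overshoot and diverge, which is one reason the text says "with some modifications".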

3 Definition: Riemannian geometry

Definition (1) generalizes directly to the setting of Riemannian manifolds: the dot product of $\mathbb{R}^{n}$ must be replaced by the Riemannian metric, and the analogue of $\operatorname{D}_{\mathbf{v}}f$ is the directional derivative $\mathbf{v}[f]$, for tangent vectors $\mathbf{v}$ on the Riemannian manifold. Thus for a smooth scalar-valued function $f$ on a Riemannian manifold,

$$\mathbf{X}=\operatorname{grad}f\>\Leftrightarrow\>df_{p}(\mathbf{v})=\mathbf{v}[f]=\langle\mathbf{X},\mathbf{v}\rangle_{p}\,.\tag{3}$$

We can calculate $\mathbf{X}$ explicitly as follows. If $x^{i}$ are local coordinates on the manifold (not necessarily orthonormal), set $\mathbf{X}=X^{i}\frac{\partial}{\partial x^{i}}$ (the Einstein summation convention is being used). Let $g_{ij}$ and $g^{ij}$ be the covariant and contravariant metric tensors, respectively. Then from (3),

$$\frac{\partial f}{\partial x^{j}}=\left\langle X^{i}\frac{\partial}{\partial x^{i}},\frac{\partial}{\partial x^{j}}\right\rangle=X^{i}\,\left\langle\frac{\partial}{\partial x^{i}},\frac{\partial}{\partial x^{j}}\right\rangle=g_{ij}\,X^{i}\,,$$

and taking inverses,

$$X^{i}=g^{ij}\,\frac{\partial f}{\partial x^{j}}\,.\tag{4}$$
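As a sanity check of formula (4), the sketch below applies it in polar coordinates $(r,\theta)$ on the plane, where $g_{ij}=\operatorname{diag}(1,r^{2})$ and hence $g^{ij}=\operatorname{diag}(1,1/r^{2})$, and compares the result with the Cartesian gradient from (2). The function `f_cart`, the sample point, and the finite-difference step are hypothetical choices for illustration only.

```python
import math

def f_cart(x, y):
    # Hypothetical test function f(x, y) = x^2 * y
    return x ** 2 * y

def f_polar(r, theta):
    # The same function expressed in polar coordinates
    return f_cart(r * math.cos(theta), r * math.sin(theta))

def partial(fun, args, i, h=1e-6):
    """Central-difference partial derivative of fun in its i-th argument."""
    up = list(args)
    dn = list(args)
    up[i] += h
    dn[i] -= h
    return (fun(*up) - fun(*dn)) / (2 * h)

r, theta = 2.0, 0.7

# Inverse metric g^{ij} for polar coordinates
g_inv = [[1.0, 0.0], [0.0, 1.0 / r ** 2]]

# Formula (4): X^i = g^{ij} * (partial f / partial x^j)
df = [partial(f_polar, (r, theta), j) for j in range(2)]
X = [sum(g_inv[i][j] * df[j] for j in range(2)) for i in range(2)]

# Coordinate basis vectors d/dr and d/dtheta, written in Cartesian components
d_r = (math.cos(theta), math.sin(theta))
d_theta = (-r * math.sin(theta), r * math.cos(theta))
grad_from_polar = tuple(X[0] * d_r[k] + X[1] * d_theta[k] for k in range(2))

# Cartesian gradient from formula (2), computed by hand: (2xy, x^2)
x, y = r * math.cos(theta), r * math.sin(theta)
grad_cartesian = (2 * x * y, x ** 2)

print(grad_from_polar, grad_cartesian)
```

Both tuples should agree up to finite-difference error, illustrating that (4) yields the same vector as (2) even though the polar coordinate basis is not orthonormal.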

4 Duality with differential one-forms

Definitions (1) and (3) exhibit $\nabla f$ as the vector field dual to the differential form $df$. The isomorphism is given by applying the inner product or Riemannian metric. This isomorphism is, of course, linear; in particular it leads to the identity

$$\nabla f=\frac{\partial f}{\partial x^{i}}\,\nabla x^{i}\,,\tag{5}$$

which is the dual to the standard formula of differential one-forms:

$$df=\frac{\partial f}{\partial x^{i}}\,dx^{i}\,.$$

Using (3) and (4), we have

$$\nabla x^{i}=g^{ij}\frac{\partial}{\partial x^{j}}\,,\quad\frac{\partial}{\partial x^{i}}=g_{ij}\nabla x^{j}\,.\tag{6}$$

So the isomorphism between vector fields and one-forms is expressed by changing the $\nabla$’s in (6) to $d$’s, and vice versa. That is,

$$dx^{i}\leftrightarrow g^{ij}\frac{\partial}{\partial x^{j}}\,,\quad\frac{\partial}{\partial x^{i}}\leftrightarrow g_{ij}\,dx^{j}\,.\tag{7}$$

It is commonly said that this isomorphism is expressed by “raising and lowering the indices of a tensor field, using contractions with $g_{ij}$ and $g^{ij}$”.

Notice that when $x^{i}$ are orthonormal coordinates on $\mathbb{R}^{n}$, equation (5) reduces to equation (2), because $g_{ij}=\mathbf{e}_{i}\cdot\mathbf{e}_{j}=\delta_{ij}$ (Kronecker delta).

The formulae presented in this section are useful in the Euclidean setting as well, for deriving the formulae for the gradient in various curvilinear coordinate systems (http://planetmath.org/GradientInCurvilinearCoordinates).

5 Differential identities

Several properties of the one-dimensional derivative generalize to a multi-dimensional setting:

$\displaystyle\nabla(af+bg)$	$\displaystyle=a\nabla f+b\nabla g$	Linearity
$\displaystyle\nabla(fg)$	$\displaystyle=f\nabla g+g\nabla f$	Product rule
$\displaystyle\nabla(f\circ\phi)(p)$	$\displaystyle=\operatorname{D}\phi(p)^{*}\nabla f(\phi(p))$	Chain rule
$\displaystyle\nabla(h\circ f)(p)$	$\displaystyle=h^{\prime}(f(p))\,\nabla f(p)$	Another chain rule

Here $a$ and $b$ are constants, and $h$ is a differentiable function $h\colon\mathbb{R}\to\mathbb{R}$. The notation $(\operatorname{D}\phi)^{*}$ denotes the transpose of the Jacobian matrix, in Euclidean coordinates, of $\phi\colon\mathbb{R}^{m}\to\mathbb{R}^{n}$. In the abstract setting, $(\operatorname{D}\phi)^{*}$ is the adjoint to the tangent map $\operatorname{D}\phi$ between the tangent bundles of two Riemannian manifolds.
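The first chain rule is the one where the transpose is easy to get wrong, so a numerical check may help. The sketch below compares $\nabla(f\circ\phi)(p)$ with $\operatorname{D}\phi(p)^{*}\nabla f(\phi(p))$ using central differences. The maps `phi` (from $\mathbb{R}^{2}$ to $\mathbb{R}^{3}$) and `f`, the point `p`, and the step size are hypothetical examples, not taken from the text.

```python
import math

def phi(p):
    # Hypothetical map phi: R^2 -> R^3
    u, v = p
    return [u * v, u + v, math.sin(u)]

def f(q):
    # Hypothetical scalar function f: R^3 -> R
    x, y, z = q
    return x ** 2 + y * z

def partial(fun, p, i, h=1e-6):
    """Central-difference partial derivative of fun along coordinate i."""
    up = list(p)
    dn = list(p)
    up[i] += h
    dn[i] -= h
    return (fun(up) - fun(dn)) / (2 * h)

def gradient(fun, p):
    return [partial(fun, p, i) for i in range(len(p))]

def jacobian(fun, p, n_out):
    # J[k][i] = partial derivative of the k-th output along the i-th input
    return [[partial(lambda q, k=k: fun(q)[k], p, i) for i in range(len(p))]
            for k in range(n_out)]

p = [0.8, -1.5]

# Left-hand side: gradient of the composite function on R^2
lhs = gradient(lambda q: f(phi(q)), p)

# Right-hand side: transpose of the 3x2 Jacobian applied to grad f at phi(p)
J = jacobian(phi, p, 3)
grad_f = gradient(f, phi(p))
rhs = [sum(J[k][i] * grad_f[k] for k in range(3)) for i in range(2)]

print(lhs, rhs)
```

Note that the sum runs over the output index `k` of `phi`, which is exactly multiplication by the transposed Jacobian; the result lives in $\mathbb{R}^{2}$, the domain of $\phi$.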

These identities can be proved directly from the definition, but the first three are really just the duals of the following well-known identities for differential forms:

	$\displaystyle d(af+bg)$	$\displaystyle=a\,df+b\,dg$
	$\displaystyle d(fg)$	$\displaystyle=f\,dg+g\,df$
	$\displaystyle d(\phi^{*}f)$	$\displaystyle=\phi^{*}df$

and so may be derived by changing the $d$’s here to $\nabla$’s! (Though the third identity may take a bit of thought.)

The following identity

$$\operatorname{curl}\operatorname{grad}f=\nabla\times\nabla f=0\,,$$

is a special case of the differential forms identity $d^{2}=0$. Conversely, if a vector field $g$ satisfies $\operatorname{curl}g=0$ on a simply connected domain, then there exists a function $f$ such that $g=\operatorname{grad}f$. See laminar field for details.
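The identity $\nabla\times\nabla f=0$ can be illustrated numerically. In the sketch below the gradient of a hypothetical function $f(x,y,z)=x^{2}y+y\sin z$ is written out by hand, and its curl is then approximated by central differences using the usual Cartesian component formula. The function, the sample point, and the step size are assumptions made for the example.

```python
import math

def grad_f(p):
    # Hand-computed gradient of the hypothetical f(x, y, z) = x^2 y + y sin z
    x, y, z = p
    return (2 * x * y, x ** 2 + math.sin(z), y * math.cos(z))

def partial(component, p, i, h=1e-6):
    """Central-difference partial derivative of a scalar function along axis i."""
    up = list(p)
    dn = list(p)
    up[i] += h
    dn[i] -= h
    return (component(up) - component(dn)) / (2 * h)

def curl(A, p):
    def d(k, i):
        # Partial derivative of component k of the field A along coordinate i
        return partial(lambda q: A(q)[k], p, i)
    return (d(2, 1) - d(1, 2),
            d(0, 2) - d(2, 0),
            d(1, 0) - d(0, 1))

p = [1.0, 2.0, 0.5]
curl_of_grad = curl(grad_f, p)
print(curl_of_grad)
```

All three components should vanish up to rounding and discretization error, reflecting the equality of mixed second partial derivatives of $f$.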

6 The $\nabla$ symbolism


Using the $\nabla$ formalism, the divergence operator can be expressed as $\nabla\cdot$, the curl operator as $\nabla\times$, and the Laplacian operator as $\nabla^{2}$. To wit, for a given vector field

$$\mathbf{A}=A_{x}\,\mathbf{i}+A_{y}\,\mathbf{j}+A_{z}\,\mathbf{k},$$

and a given function $f$ we have

	$\displaystyle\nabla\cdot\mathbf{A}$	$\displaystyle=\frac{\partial A_{x}}{\partial x}+\frac{\partial A_{y}}{\partial y}+\frac{\partial A_{z}}{\partial z}$
	$\displaystyle\nabla\times\mathbf{A}$	$\displaystyle=\left(\frac{\partial A_{z}}{\partial y}-\frac{\partial A_{y}}{\partial z}\right)\mathbf{i}+\left(\frac{\partial A_{x}}{\partial z}-\frac{\partial A_{z}}{\partial x}\right)\mathbf{j}+\left(\frac{\partial A_{y}}{\partial x}-\frac{\partial A_{x}}{\partial y}\right)\mathbf{k}$
	$\displaystyle\nabla^{2}f$	$\displaystyle=\frac{\partial^{2}f}{\partial x^{2}}+\frac{\partial^{2}f}{\partial y^{2}}+\frac{\partial^{2}f}{\partial z^{2}}.$
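The divergence and Laplacian formulas translate directly into finite-difference code. The following is a rough sketch under stated assumptions: the vector field `A`, the scalar function `f`, the sample point, and the step size `h` are hypothetical, and the derivatives are only approximated.

```python
import math

def partial(fun, p, i, h=1e-4):
    """Central-difference first partial derivative along coordinate i."""
    up = list(p)
    dn = list(p)
    up[i] += h
    dn[i] -= h
    return (fun(up) - fun(dn)) / (2 * h)

def divergence(A, p):
    # Sum of dA_x/dx + dA_y/dy + dA_z/dz
    return sum(partial(lambda q, k=k: A(q)[k], p, k) for k in range(3))

def laplacian(f, p, h=1e-4):
    # Sum of second partial derivatives, each by a second central difference
    total = 0.0
    for i in range(3):
        up = list(p)
        dn = list(p)
        up[i] += h
        dn[i] -= h
        total += (f(up) - 2 * f(p) + f(dn)) / h ** 2
    return total

def A(p):
    # Hypothetical vector field A = (xy, yz, zx); its divergence is y + z + x
    x, y, z = p
    return (x * y, y * z, z * x)

def f(p):
    # Hypothetical function f = x^2 y + sin z; its Laplacian is 2y - sin z
    x, y, z = p
    return x ** 2 * y + math.sin(z)

p = [1.0, 2.0, 0.5]
div_A = divergence(A, p)
lap_f = laplacian(f, p)
print(div_A, lap_f)
```

At the sample point the exact values are $x+y+z=3.5$ for the divergence and $2y-\sin z=4-\sin(0.5)$ for the Laplacian, and the printed approximations should be close to these.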
