sufficient statistic

Let $\\{f_{\\theta}\\}$ be a statistical model with parameter $\\theta$. Let $\\boldsymbol{X}=(X_{1},\\ldots,X_{n})$ be a random vector of random variables representing $n$ observations. A statistic $T=T(\\boldsymbol{X})$ of $\\boldsymbol{X}$ for the parameter $\\theta$ is called a sufficient statistic, or a sufficient estimator, if the conditional probability distribution of $\\boldsymbol{X}$ given $T(\\boldsymbol{X})=t$ is not a function of $\\theta$ (equivalently, does not depend on $\\theta$).

In other words, all the information about the unknown parameter $\\theta$ that the sample contains is captured in the sufficient statistic $T$. If, say, we are interested in finding out the percentage of defective light bulbs in a shipment of new ones, it is enough, or sufficient, to count the number of defective ones (the sum of the $X_{i}$’s, where $X_{i}=1$ if the $i$-th bulb is defective and $X_{i}=0$ otherwise), rather than worrying about which individual light bulbs are the defective ones (the vector $(X_{1},\\ldots,X_{n})$). By taking the sum, a certain “reduction” of data has been achieved.
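The light bulb illustration can be checked by direct enumeration. The sketch below is only an illustration under stated assumptions: each bulb is coded as $1$ (defective) or $0$ (not defective), the bulbs are treated as independent with the same defect probability, and the function name `conditional_given_sum` is hypothetical, not part of any library. It computes the exact conditional distribution of the sample given its sum for two different defect probabilities and compares the results.

```python
from fractions import Fraction
from itertools import product


def conditional_given_sum(n, theta):
    """Exact conditional distribution of X = (X_1, ..., X_n) given T = sum(X),
    assuming i.i.d. 0/1 observations with P(X_i = 1) = theta, 0 < theta < 1."""
    joint = {}     # P(X = x) for every 0/1 vector x
    marginal = {}  # P(T = t) for every possible sum t
    for x in product((0, 1), repeat=n):
        t = sum(x)
        p = theta ** t * (1 - theta) ** (n - t)
        joint[x] = p
        marginal[t] = marginal.get(t, 0) + p
    # P(X = x | T = sum(x)) = P(X = x) / P(T = sum(x))
    return {x: p / marginal[sum(x)] for x, p in joint.items()}


cond_a = conditional_given_sum(4, Fraction(1, 10))
cond_b = conditional_given_sum(4, Fraction(7, 10))

print(cond_a == cond_b)      # True: the conditional law is the same for both
print(cond_a[(1, 0, 1, 0)])  # 1/6, one of the 6 ways to place 2 defectives
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the two conditional distributions can be compared with `==` rather than with a floating-point tolerance. Agreement for two parameter values is an illustration of sufficiency, not a proof of it.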

Examples

1.
Let $X_{1},\\ldots,X_{n}$ be $n$ independent observations from a uniform distribution on the integers $1,\\ldots,\\theta$. Let $T=\\max\\{X_{1},\\ldots,X_{n}\\}$ be a statistic for $\\theta$. Then the conditional probability distribution of $\\boldsymbol{X}=(X_{1},\\ldots,X_{n})$ given $T=t$ is
$P(\\boldsymbol{X}\\mid t)=\\frac{P(X_{1}=x_{1},\\ldots,X_{n}=x_{n},\\max\\{X_{i}\\}=t)}{P(\\max\\{X_{i}\\}=t)}.$
The numerator is $0$ if $\\max\\{x_{i}\\}\\neq t$. So in this case, $P(\\boldsymbol{X}\\mid t)=0$ and is not a function of $\\theta$. Otherwise, the numerator is $\\theta^{-n}$ and $P(\\boldsymbol{X}\\mid t)$ becomes
$\\frac{\\theta^{-n}}{P(\\max\\{X_{i}\\}=t)}=(\\theta^{n}P(X_{(1)}\\leq\\cdots\\leq X_{(n)}=t))^{-1},$
where the $X_{(i)}$’s are the $X_{i}$’s rearranged in non-decreasing order from $i=1$ to $n$. For the denominator, we first note that
$\\displaystyle P(X_{(1)}\\leq\\cdots\\leq X_{(n)}=t)$ $\\displaystyle=$ $\\displaystyle P(X_{(1)}\\leq\\cdots\\leq X_{(n)}\\leq t)-P(X_{(1)}\\leq\\cdots\\leq X_{(n)}<t)$
$\\displaystyle=$ $\\displaystyle P(X_{(1)}\\leq\\cdots\\leq X_{(n)}\\leq t)-P(X_{(1)}\\leq\\cdots\\leq X_{(n)}\\leq t-1).$
The event $X_{(n)}\\leq t$ occurs exactly when every $X_{i}$ lies in $\\{1,\\ldots,t\\}$, so for $1\\leq t\\leq\\theta$ it has probability $t^{n}\\theta^{-n}$. From the above equation, we find that there are $t^{n}-(t-1)^{n}$ sequences of $n$ positive integers whose maximum is $t$, each occurring with probability $\\theta^{-n}$. So
$(\\theta^{n}P(X_{(1)}\\leq\\cdots\\leq X_{(n)}=t))^{-1}=(\\theta^{n}(t^{n}-(t-1)^{n})\\theta^{-n})^{-1}=(t^{n}-(t-1)^{n})^{-1}$
again is not a function of $\\theta$. Therefore, $T=\\max\\{X_{i}\\}$ is a sufficient statistic for $\\theta$. Here, we see that a reduction of data has been achieved by taking only the largest member of the set of observations, not the entire set.
2.
If we set $T(X_{1},\\ldots,X_{n})=(X_{1},\\ldots,X_{n})$, then we see that $T$ is trivially a sufficient statistic for any parameter $\\theta$. The conditional probability distribution of $(X_{1},\\ldots,X_{n})$ given $T=t$ is degenerate: it assigns probability $1$ to the observed sample $t$, whatever the value of $\\theta$. Even though this is a sufficient statistic by definition (of course, the individual observations provide as much information about $\\theta$ as there is to know), and there is no loss of data in $T$ (which is simply a list of all observations), there is really no reduction of data to speak of here.
3.
The sample mean
$\\overline{X}=\\frac{X_{1}+\\cdots+X_{n}}{n}$
of $n$ independent observations from a normal distribution $N(\\mu,\\sigma^{2})$ (both $\\mu$ and $\\sigma^{2}$ unknown) is a sufficient statistic for $\\mu$. This is a result of the factorization criterion. Similarly, one sees that any partition of the sum of $n$ observations $X_{i}$ into $m$ subtotals is a sufficient statistic for $\\mu$. For instance,
$T(X_{1},\\ldots,X_{n})=(\\sum_{i=1}^{j}X_{i},\\sum_{i=j+1}^{k}X_{i},\\sum_{i=k+1}^{n}X_{i})$
is a sufficient statistic for $\\mu$.
4.
Again, assume there are $n$ independent observations $X_{i}$ from a normal distribution $N(\\mu,\\sigma^{2})$ with unknown mean and variance. The sample variance
$\\frac{1}{n-1}\\sum_{i=1}^{n}(X_{i}-\\overline{X})^{2}$
is not a sufficient statistic for $\\sigma^{2}$. However, if $\\mu$ is a known constant, then
$\\frac{1}{n-1}\\sum_{i=1}^{n}(X_{i}-\\mu)^{2}$
is a sufficient statistic for $\\sigma^{2}$.
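The closed form obtained in example 1, $(t^{n}-(t-1)^{n})^{-1}$, can be confirmed by brute-force enumeration for small cases. The following is a minimal sketch, not a proof: the function name `conditional_given_max` is hypothetical, and the values of $n$ and $\\theta$ are arbitrary small choices made so that all $\\theta^{n}$ samples can be listed.

```python
from fractions import Fraction
from itertools import product


def conditional_given_max(n, theta):
    """Exact conditional distribution of X given T = max(X) for n i.i.d.
    draws from the uniform distribution on the integers 1, ..., theta."""
    p = Fraction(1, theta ** n)  # every sample has probability theta^(-n)
    samples = list(product(range(1, theta + 1), repeat=n))
    marginal = {}  # P(T = t)
    for x in samples:
        marginal[max(x)] = marginal.get(max(x), 0) + p
    # P(X = x | T = max(x)) = P(X = x) / P(T = max(x))
    return {x: p / marginal[max(x)] for x in samples}


n = 3
for theta in (3, 4, 6):
    for x, prob in conditional_given_max(n, theta).items():
        t = max(x)
        # Matches the closed form, which contains no theta.
        assert prob == Fraction(1, t ** n - (t - 1) ** n)

print(conditional_given_max(3, 4)[(1, 2, 2)])  # 1/7, since 2**3 - 1**3 = 7
```

The loop checks every sample for each of the three parameter values. The conditional probability depends on the sample only through its maximum $t$, and the same value is obtained for every $\\theta\\geq t$.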

A sufficient statistic for a parameter $\\theta$ is called a minimal sufficient statistic if it can be expressed as a function of any sufficient statistic for $\\theta$.

Example. In example $3$ above, both the sample mean $\\overline{X}$ and the finite sum $S=X_{1}+\\cdots+X_{n}$ are minimal sufficient statistics for the mean $\\mu$. By the factorization criterion, any sufficient statistic $T$ for $\\mu$ is a vector whose coordinates form a partition of the finite sum, so adding up these coordinates gives the finite sum $S$. We have thus expressed $S$ as a function of $T$, and therefore $S$ is minimal. Similarly, $\\overline{X}=S/n$ is minimal.

Two sufficient statistics $T_{1},T_{2}$ for a parameter $\\theta$ are said to be equivalent provided that there is a bijection $g$ such that $g\\circ T_{1}=T_{2}$. $\\overline{X}$ and $S$ from the above example are two equivalent sufficient statistics. Two minimal sufficient statistics for the same parameter are equivalent.
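The bijection condition $g\\circ T_{1}=T_{2}$ can be tested mechanically on a finite set of samples. The sketch below assumes a small integer grid as a stand-in for the sample space, and the helper name `equivalent_on` is hypothetical. It checks only whether a bijection between the observed values exists on that grid. It does not check sufficiency, and passing it on a finite grid is an illustration rather than a proof.

```python
from fractions import Fraction
from itertools import product


def equivalent_on(samples, t1, t2):
    """Return True if some bijection g between the observed values of t1 and
    t2 satisfies g(t1(x)) == t2(x) for every x in `samples`."""
    forward, backward = {}, {}
    for x in samples:
        a, b = t1(x), t2(x)
        # setdefault keeps the first pairing seen; a later mismatch means
        # that g, or its inverse, would not be a well-defined function.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


n = 3
samples = list(product(range(4), repeat=n))  # illustrative grid of samples


def total(x):
    return sum(x)


def mean(x):
    return Fraction(sum(x), n)


def largest(x):
    return max(x)


print(equivalent_on(samples, total, mean))     # True: g(s) = s / n
print(equivalent_on(samples, total, largest))  # False: no such bijection
```

The first check succeeds because $\\overline{X}=S/n$ and $S=n\\overline{X}$ determine each other. The second fails because samples with the same sum can have different maxima, for instance $(0,0,2)$ and $(0,1,1)$.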

Title	sufficient statistic
Canonical name	SufficientStatistic
Date of creation	2013-03-22 15:02:42
Last modified on	2013-03-22 15:02:42
Owner	CWoo (3771)
Last modified by	CWoo (3771)
Numerical id	11
Author	CWoo (3771)
Entry type	Definition
Classification	msc 62B05
Synonym	sufficient estimator
Synonym	minimally sufficient statistic
Synonym	minimal sufficient
Synonym	minimally sufficient
Defines	minimal sufficient statistic
Defines	equivalent statistic