sufficient statistic
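Let $\{f_\theta\}$ be a statistical model with parameter $\theta$. Let $\boldsymbol{X}=(X_1,\ldots,X_n)$ be a random vector of random variables representing $n$ observations. A statistic $T=T(\boldsymbol{X})$ of $\boldsymbol{X}$ for the parameter $\theta$ is called a sufficient statistic, or a sufficient estimator, if the conditional probability distribution of $\boldsymbol{X}$ given $T(\boldsymbol{X})=t$ is not a function of $\theta$ (equivalently, does not depend on $\theta$).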
In other words, all the information about the unknown parameter $\theta$ is captured in the sufficient statistic $T$. If, say, we are interested in finding out the percentage of defective light bulbs in a shipment of new ones, it is enough, or sufficient, to count the number of defective ones (the sum of the $X_i$'s), rather than worrying about which individual light bulbs are the defective ones (the vector $(X_1,\ldots,X_n)$). By taking the sum, a certain "reduction" of data has been achieved.
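The light bulb illustration can be checked by brute force. The sketch below makes an assumption that is not stated in the entry: each $X_i$ is an independent Bernoulli variable equal to 1 when bulb $i$ is defective, with unknown probability $p$. The function name `conditional_given_sum` is hypothetical. Under that assumption, the conditional probability of a particular pattern of defects given the total count should come out the same for every $p$.

```python
from fractions import Fraction
from itertools import product


def conditional_given_sum(x, p):
    """P(X = x | sum(X) = sum(x)) for i.i.d. Bernoulli(p) coordinates.

    Assumption (not from the entry): X_i = 1 marks a defective bulb and
    the X_i are independent with common defect probability p.
    """
    n, s = len(x), sum(x)

    def joint(y):
        # Joint probability of one 0/1 pattern y.
        k = sum(y)
        return p**k * (1 - p) ** (n - k)

    # P(sum(X) = s): add up every pattern with the same number of defects.
    denom = sum(joint(y) for y in product((0, 1), repeat=n) if sum(y) == s)
    return joint(x) / denom


x = (1, 0, 0, 1, 0)  # bulbs 1 and 4 defective
results = [conditional_given_sum(x, Fraction(a, 10)) for a in (1, 3, 5, 9)]
print(results)  # the same value for every p
```

Exact rational arithmetic (`fractions.Fraction`) is used so that equality across values of $p$ is not blurred by floating-point rounding.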
Examples
- 1.
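Let $X_1,\ldots,X_n$ be $n$ independent observations from a uniform distribution on the integers $1,\ldots,\theta$. Let $T=\max\{X_1,\ldots,X_n\}$ be a statistic for $\theta$. Then the conditional probability distribution of $\boldsymbol{X}=(X_1,\ldots,X_n)$ given $T=t$ is

$$P(\boldsymbol{X}=\boldsymbol{x}\mid T=t)=\frac{P(X_1=x_1,\ldots,X_n=x_n,\ \max\{X_i\}=t)}{P(\max\{X_i\}=t)}.$$

The numerator is $0$ if $\max\{x_i\}\neq t$. So in this case, $P(\boldsymbol{X}=\boldsymbol{x}\mid T=t)=0$ and is not a function of $\theta$. Otherwise, the numerator is $\theta^{-n}$ and $P(\boldsymbol{X}=\boldsymbol{x}\mid T=t)$ becomes

$$\frac{\theta^{-n}}{P(\max\{X_i\}=t)}=\left(\theta^{n}\,P(X_{(1)}\le\cdots\le X_{(n)}=t)\right)^{-1},$$

where the $X_{(i)}$'s are the rearrangements of the $X_i$'s in non-decreasing order from $i=1$ to $n$. For the denominator, we first note that

$$P(X_{(1)}\le\cdots\le X_{(n)}=t)=P(X_{(n)}\le t)-P(X_{(n)}\le t-1)=\left(\frac{t}{\theta}\right)^{n}-\left(\frac{t-1}{\theta}\right)^{n}=\frac{t^{n}-(t-1)^{n}}{\theta^{n}}.$$

From the above equation, we find that there are $t^{n}-(t-1)^{n}$ sequences of $n$ positive integers whose maximum is $t$. So

$$\left(\theta^{n}\,P(X_{(1)}\le\cdots\le X_{(n)}=t)\right)^{-1}=\left(t^{n}-(t-1)^{n}\right)^{-1}$$

again is not a function of $\theta$. Therefore, $T=\max\{X_i\}$ is a sufficient statistic for $\theta$. Here, we see that a reduction of data has been achieved by taking only the largest member of the set of observations, not the entire set.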
- 2.
If we set $T(X_1,\ldots,X_n)=(X_1,\ldots,X_n)$, then we see that $T$ is trivially a sufficient statistic for any parameter $\theta$. The conditional probability of $(X_1,\ldots,X_n)=(x_1,\ldots,x_n)$ given $T=(x_1,\ldots,x_n)$ is 1. Even though this is a sufficient statistic by definition (of course, the individual observations provide as much information about $\theta$ as there is to know), and there is no loss of data in $T$ (which is simply a list of all observations), there is really no reduction of data to speak of here.
- 3.
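The sample mean

$$\overline{X}=\frac{1}{n}\sum_{i=1}^{n}X_i$$

of $n$ independent observations from a normal distribution $N(\mu,\sigma^2)$ (both $\mu$ and $\sigma^2$ unknown) is a sufficient statistic for $\mu$. This is a result of the factorization criterion. Similarly, one sees that any partition of the sum of the $n$ observations $X_i$ into $m$ subtotals is a sufficient statistic for $\mu$. For instance,

$$T(X_1,\ldots,X_n)=\Big(\sum_{i=1}^{j}X_i,\ \sum_{i=j+1}^{n}X_i\Big)$$

is a sufficient statistic for $\mu$.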
- 4.
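Again, assume there are $n$ independent observations $X_i$ from a normal distribution $N(\mu,\sigma^2)$ with unknown mean and variance. The sample variance

$$\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\overline{X})^2$$

is not a sufficient statistic for $\sigma^2$. However, if $\mu$ is a known constant, then

$$\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\mu)^2$$

is a sufficient statistic for $\sigma^2$.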
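Example 1 can also be checked by exhaustive enumeration for small cases. The following is only a sketch: the function name `conditional_given_max` and the chosen sample are illustrative, and the observed values are assumed to lie in $1,\ldots,\theta$. It computes the conditional probability of a sample given its maximum directly from the definition and compares it with $\left(t^{n}-(t-1)^{n}\right)^{-1}$ for several values of $\theta$.

```python
from fractions import Fraction
from itertools import product


def conditional_given_max(x, theta):
    """P(X = x | max(X) = max(x)) for i.i.d. uniform draws on 1..theta.

    Assumes every entry of x lies in 1..theta, so the joint
    probability of x is theta**(-n).
    """
    n, t = len(x), max(x)
    joint = Fraction(1, theta) ** n  # probability of any single sample point
    # P(max(X) = t): add the joint probability over all samples with maximum t.
    denom = sum(
        joint for y in product(range(1, theta + 1), repeat=n) if max(y) == t
    )
    return joint / denom


x = (2, 3, 1)  # n = 3 observations with maximum t = 3
values = [conditional_given_max(x, theta) for theta in (3, 4, 6)]
closed_form = Fraction(1, 3**3 - 2**3)  # 1 / (t^n - (t-1)^n)
print(values, closed_form)
```

The enumeration grows as $\theta^{n}$, so this is meant only as a sanity check on tiny inputs, not as a practical computation.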
A sufficient statistic for a parameter $\theta$ is called a minimal sufficient statistic if it can be expressed as a function of any sufficient statistic for $\theta$.
Example. In example 3 above, both the sample mean $\overline{X}$ and the finite sum $S=X_1+\cdots+X_n$ are minimal sufficient statistics for the mean $\mu$. Since, by the factorization criterion, any sufficient statistic $T$ for $\mu$ is a vector whose coordinates form a partition of the finite sum, taking the sum of these coordinates is just the finite sum $S$. So, we have just expressed $S$ as a function of $T$. Therefore, $S$ is minimal. Similarly, $\overline{X}$ is minimal.
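For reference, the factorization behind these claims can be written out explicitly. This is a sketch in which $\sigma^2$ is treated as fixed while $\mu$ varies, and the symbol $s=\sum_{i=1}^{n}x_i$ is introduced here for the observed sum. The joint density of the sample is

$$\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)=\underbrace{(2\pi\sigma^2)^{-n/2}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}x_i^2\right)}_{\text{does not involve }\mu}\cdot\underbrace{\exp\!\left(\frac{\mu s}{\sigma^2}-\frac{n\mu^2}{2\sigma^2}\right)}_{\text{depends on the data only through }s}.$$

The second factor involves the observations only through $s$, which is what the factorization criterion requires for the sum (and hence the sample mean $s/n$) to be sufficient for $\mu$.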
Two sufficient statistics $T_1,T_2$ for a parameter $\theta$ are said to be equivalent provided that there is a bijection $g$ such that $g\circ T_1=T_2$. The statistics $\overline{X}$ and $S$ from the above example are two equivalent sufficient statistics, since $S=n\overline{X}$. Two minimal sufficient statistics for the same parameter are equivalent.
Title | sufficient statistic |
Canonical name | SufficientStatistic |
Date of creation | 2013-03-22 15:02:42 |
Last modified on | 2013-03-22 15:02:42 |
Owner | CWoo (3771) |
Last modified by | CWoo (3771) |
Numerical id | 11 |
Author | CWoo (3771) |
Entry type | Definition |
Classification | msc 62B05 |
Synonym | sufficient estimator |
Synonym | minimally sufficient statistic |
Synonym | minimal sufficient |
Synonym | minimally sufficient |
Defines | minimal sufficient statistic |
Defines | equivalent statistic |