Entry: ChisquaredStatistic
Definition

chi-squared statistic


Let $X$ be a discrete random variable with $m$ possible outcomes $x_1,\ldots,x_m$, with probability of each outcome $P(X=x_i)=p_i$.

$n$ independent observations are obtained, where each observation has the same distribution as $X$. Bin the observations into $m$ groups, so that each group contains all observations having the same outcome $x_i$. Next, count the number of observations in each group to get $n_1,\ldots,n_m$ corresponding to the outcomes $x_1,\ldots,x_m$, so that $n=\sum n_i$. It is desired to find out how close the actual counts $n_i$ are to their expected values $np_i$.

Intuitively, this “closeness” depends on how big the sample is, and how large the deviations are between the observed and the expected, for all categories. The value

$$\chi^2=\sum_{i=1}^{m}\frac{(n_i-np_i)^2}{np_i},\qquad(1)$$

called the $\chi^2$ statistic, or the chi-squared statistic, is such a measure of “closeness”. It is also known as the Pearson chi-squared statistic, in honor of the English statistician Karl Pearson, who showed that (1) has approximately a chi-squared distribution (http://planetmath.org/ChiSquaredRandomVariable) with $m-1$ degrees of freedom. The number of degrees of freedom depends on the number of free variables in $\chi^2$, and is not always $m-1$, as we will see in Example 3.
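Formula (1) translates directly into a few lines of code. The following is a minimal sketch in Python, not part of the original entry; the function name `pearson_chi2` and the die counts are hypothetical and chosen only for illustration.

```python
def pearson_chi2(counts, probs):
    """Pearson chi-squared statistic (1).

    counts: observed counts n_1, ..., n_m
    probs:  hypothesised probabilities p_1, ..., p_m
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)  # total number of observations
    # Sum of (observed - expected)^2 / expected over all m outcomes.
    return sum((n_i - n * p_i) ** 2 / (n * p_i)
               for n_i, p_i in zip(counts, probs))


# Hypothetical data: 60 rolls of a die that is assumed to be fair.
counts = [8, 12, 9, 11, 10, 10]
probs = [1 / 6] * 6
stat = pearson_chi2(counts, probs)
print(round(stat, 3), "with", len(counts) - 1, "degrees of freedom")
```

Every expected count here is 10, so the statistic is (4+4+1+1+0+0)/10 = 1.0, to be compared with a chi-squared distribution with 5 degrees of freedom.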

Usually, the $\chi^2$ statistic is used in hypothesis testing, where the null hypothesis specifies that the actual counts equal the expected counts. A large value of $\chi^2$ means either the deviations from the expectations are large or the sample is small, and therefore either the null hypothesis should be rejected or there is not enough information to give a meaningful interpretation. How large a deviation, compared to the sample size, is enough to reject the null hypothesis depends on the degrees of freedom of the chi-squared distribution of $\chi^2$ and on the specified critical values.
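Instead of looking up critical values in a table, the upper-tail probability of a chi-squared distribution can be computed directly. The sketch below uses only the Python standard library and assumes a positive integer number of degrees of freedom. The name `chi2_sf` is hypothetical, and the recurrence it uses for the regularized incomplete gamma function is a standard identity, not something stated in this entry.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2,
    starting from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Step a up to df / 2, adding one term of the recurrence each time.
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Critical values quoted in the examples of this entry (1 degree of freedom).
print(round(chi2_sf(2.706, 1), 3))  # close to 0.100
print(round(chi2_sf(7.879, 1), 3))  # close to 0.005
```

A null hypothesis is then rejected at a chosen significance level when the tail probability of the observed statistic falls below that level, which is equivalent to the statistic exceeding the tabulated critical value.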

Examples.

  1.

    Suppose a coin is tossed 10 times and 7 heads are observed. We would like to know whether the coin is fair based on these observations. We have the following hypotheses about the probability $p$ of heads:

    $$H_0: p=\tfrac{1}{2} \qquad H_1: p\neq\tfrac{1}{2}.$$

    Break up the observations into two groups: heads and tails. Then, according to $H_0$,

    $$\chi^2=\frac{(7-5)^2}{5}+\frac{(3-5)^2}{5}=1.60.$$

    Checking the table of critical values of chi-squared distributions, we see that with 1 degree of freedom there is a 0.100 chance that the $\chi^2$ value is higher than 2.706. Since 1.600<2.706, we may not want to reject the null hypothesis. However, we may not want to accept it outright either, simply because the sample size is not very large.

  2.

    Now, a coin is tossed 100 times and 70 heads are observed. Using the same null hypothesis as above,

    $$\chi^2=\frac{(70-50)^2}{50}+\frac{(30-50)^2}{50}=16.00.$$

    Even at significance level 0.005, the corresponding critical value of 7.879 is considerably smaller than 16. So we reject the null hypothesis even at confidence level 99.5% (= 1 - significance level).

  3.

    The $\chi^2$ statistic can also be used in non-parametric situations, in particular with contingency tables. Three dice of varying sizes are each tossed 100 times and the top faces are recorded. The count of each possible value of the top face, for each die, is summarized in the following table:

    Die \ top face     1     2     3     4     5     6    all
    Die 1             16    19    17    15    19    14    100
    Die 2             17    18    14    13    22    16    100
    Die 3             12    20    19    18    20    11    100
    All dice          45    57    50    46    61    41    300

    Let $X_i$ = count of top face $=i$, and $Y_j$ = Die $j$. Next, we want to test the following hypotheses:

    $$H_0: X_i \text{ is independent of } Y_j \qquad H_1: \text{otherwise}.$$

    Since we do not know the exact distribution of the top faces, we approximate the distribution by using the last row. For example, the (marginal) probability that top face = 1 is $\frac{45}{300}=0.15$. This says that the probability that top face = 1 in Die $j$ is $0.15\times\frac{1}{3}=0.05$, so the expected count in that cell is $300\times 0.05=15.0$. Then, based on the null hypothesis, we have the following table of “expected counts”:

    Die \ top face      1      2      3      4      5      6
    Die 1            15.0   19.0   16.7   15.3   20.3   13.7
    Die 2            15.0   19.0   16.7   15.3   20.3   13.7
    Die 3            15.0   19.0   16.7   15.3   20.3   13.7

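    For each die, we can compute the $\chi^2$. For instance, for the first die,

    $$\chi^2=\frac{(16-15.0)^2}{15.0}+\frac{(19-19.0)^2}{19.0}+\frac{(17-16.7)^2}{16.7}+\frac{(15-15.3)^2}{15.3}+\frac{(19-20.3)^2}{20.3}+\frac{(14-13.7)^2}{13.7}=0.176,$$

    where the value 0.176 is obtained with the unrounded expected counts (such as $50/3$ rather than 16.7); the rounded counts shown give approximately 0.168.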

    The results are summarized in the following table:

                  χ2       degrees of freedom
    Die 1         0.176     5
    Die 2         1.636     5
    Die 3         1.969     0
    All dice      3.781    10

    Note that the number of degrees of freedom for the last die is 0 because the expected counts in the last row are completely determined by those in the first two rows (and the totals). Looking up the table for 10 degrees of freedom, we see that there is a 90% chance that the value of $\chi^2$ will be greater than 4.865, and since 3.781<4.865, we accept the null hypothesis: the outcomes of the tosses have no bearing on which die is tossed.
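Examples 1 and 2 have the same observed proportion of heads (0.7) yet lead to different conclusions. A short rearrangement of (1) suggests why. Writing $\hat p_i=n_i/n$ for the observed proportion of outcome $x_i$ (a symbol introduced here for illustration, not used in the original entry), the statistic grows linearly with the sample size $n$ when the proportions are held fixed:

```latex
\chi^2 = \sum_{i=1}^{m} \frac{(n\hat p_i - n p_i)^2}{n p_i}
       = n \sum_{i=1}^{m} \frac{(\hat p_i - p_i)^2}{p_i},
\qquad
\text{coin: } n\left(\frac{(0.7-0.5)^2}{0.5} + \frac{(0.3-0.5)^2}{0.5}\right) = 0.16\,n .
```

With $n=10$ this gives 1.60 and with $n=100$ it gives 16.00, matching the two coin examples.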

Remark. In general, for a $p\times q$ 2-way contingency table, the $\chi^2$ statistic is given by

$$\chi^2=\sum_{i=1}^{p}\sum_{j=1}^{q}\frac{(n_{ij}-m_{ij})^2}{m_{ij}},\qquad(2)$$

where $n_{ij}$ and $m_{ij}$ are the actual and expected counts in cell $(i,j)$. When the sample is large, $\chi^2$ has a chi-squared distribution with $(p-1)(q-1)$ degrees of freedom. In particular, when testing for the independence between two categorical variables, the expected count $m_{ij}$ is

$$m_{ij}=\frac{n_{i*}\,n_{*j}}{n},\quad\text{where } n_{i*}=\sum_{j=1}^{q}n_{ij},\quad n_{*j}=\sum_{i=1}^{p}n_{ij},\quad\text{and } n=\sum_{i=1}^{p}\sum_{j=1}^{q}n_{ij}.$$
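As a check on formula (2), the following Python sketch computes the expected counts from the row and column totals and applies it to the dice table of Example 3. The function name `chi2_independence` is hypothetical; this is one straightforward way to organise the computation, not a reference implementation.

```python
def chi2_independence(table):
    """Chi-squared statistic (2) for a p x q table of observed counts.

    Returns (per-row contributions, total statistic, degrees of freedom),
    with expected counts m_ij = (row total) * (column total) / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    row_stats = []
    for row, r in zip(table, row_totals):
        stat = 0.0
        for n_ij, c in zip(row, col_totals):
            m_ij = r * c / n  # expected count under independence
            stat += (n_ij - m_ij) ** 2 / m_ij
        row_stats.append(stat)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return row_stats, sum(row_stats), df


# Observed counts from Example 3 (rows: Die 1 to Die 3, columns: faces 1 to 6).
dice = [
    [16, 19, 17, 15, 19, 14],
    [17, 18, 14, 13, 22, 16],
    [12, 20, 19, 18, 20, 11],
]
row_stats, total, df = chi2_independence(dice)
print([round(s, 3) for s in row_stats], round(total, 3), df)
# [0.176, 1.636, 1.969] 3.781 10
```

The per-row contributions reproduce the values listed for each die, and the total of 3.781 with (3-1)(6-1) = 10 degrees of freedom matches the "All dice" row.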

Copyright © 2000-2023 Newdu.com.com All Rights Reserved
Last updated: 2025/5/4 16:43:37