
logistic regression


Given a binary response variable $Y$ with probability of success $p$, logistic regression is a non-linear regression model with the following model equation:

$$E[Y]=\frac{\exp(\boldsymbol{X}^{T}\boldsymbol{\beta})}{1+\exp(\boldsymbol{X}^{T}\boldsymbol{\beta})},$$

where $\boldsymbol{X}^{T}\boldsymbol{\beta}$ is the product of the transpose of the column matrix $\boldsymbol{X}$ of explanatory variables and the unknown column matrix $\boldsymbol{\beta}$ of regression coefficients. Rewriting this so that the right-hand side is $\boldsymbol{X}^{T}\boldsymbol{\beta}$, we arrive at a new equation

$$\ln\left(\frac{E[Y]}{1-E[Y]}\right)=\boldsymbol{X}^{T}\boldsymbol{\beta}.$$

The left-hand side of this new equation is known as the logit function, defined on the open unit interval $(0,1)$ with range the entire real line $\mathbb{R}$:

$$\operatorname{logit}(p):=\ln\left(\frac{p}{1-p}\right)\quad\text{where } p\in(0,1).$$

Note that the logit of $p$ is the same as the natural log of the odds of success (over failure) when the probability of success is $p$. Since $Y$ is a binary response variable, it has a binomial distribution with parameter (probability of success) $p=E[Y]$, so the logistic regression model equation can be rewritten as

$$\operatorname{logit}(E[Y])=\operatorname{logit}(p)=\boldsymbol{X}^{T}\boldsymbol{\beta}. \tag{1}$$
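The two forms of the model equation are inverses of each other, which a few lines of Python can make concrete. This is only a minimal sketch: the function names `logit`, `inverse_logit`, and `expected_response` are illustrative choices, not part of the original entry, and the explanatory variables are assumed to be passed as plain lists of floats.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in the open interval (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inverse_logit(eta: float) -> float:
    """Logistic function exp(eta) / (1 + exp(eta)), mapping the reals to (0, 1)."""
    # Branch on the sign of eta so that math.exp never overflows.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def expected_response(x, beta):
    """E[Y] for one observation: the logistic function applied to x^T beta."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor x^T beta
    return inverse_logit(eta)
```

For example, `expected_response([1.0, 2.0], [0.0, 0.0])` has linear predictor 0 and therefore returns 0.5, and `inverse_logit(logit(p))` recovers `p` up to rounding.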

Logistic regression is a particular type of generalized linear model (GLM). In addition, the associated logit function is the most appropriate and natural choice for a link function. By natural we mean that $\operatorname{logit}(p)$ is equal to the natural parameter $\theta$ appearing in the distribution function for the GLM. To see this, first note that the distribution function for a binomial random variable $Y$ is

$$P(Y=y)=\binom{n}{y}p^{y}(1-p)^{n-y},$$

where $n$ is the number of trials and $Y=y$ is the event that there are $y$ successes in these $n$ trials. The parameter $p$ is the probability of success. Let there be $N$ iid binomial random variables $Y_1,Y_2,\ldots,Y_N$, each corresponding to $n_i$ trials with probability of success $p_i$. Then the joint probability distribution of these $N$ random variables is simply the product of the individual binomial distributions. Equating this to the distribution for the GLM, which belongs to the exponential family of distributions, we have:

$$\prod_{i=1}^{N}\binom{n_i}{y_i}p_i^{y_i}(1-p_i)^{n_i-y_i}=\prod_{i=1}^{N}\exp\left[y_i\theta_i-b(\theta_i)+c(y_i)\right].$$

Taking the natural log on both sides, we have the equality of the log-likelihood function in two different forms:

$$\sum_{i=1}^{N}\left[\ln\binom{n_i}{y_i}+y_i\ln p_i+(n_i-y_i)\ln(1-p_i)\right]=\sum_{i=1}^{N}\left[y_i\theta_i-b(\theta_i)+c(y_i)\right].$$

Rearranging the left-hand side and comparing the $i$-th terms, we have

$$y_i\ln\left(\frac{p_i}{1-p_i}\right)+n_i\ln(1-p_i)+\ln\binom{n_i}{y_i}=y_i\theta_i-b(\theta_i)+c(y_i),$$

so that $\theta_i=\ln(p_i/(1-p_i))=\operatorname{logit}(p_i)$.
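The term-by-term comparison can be checked numerically. The sketch below assumes the identifications that the comparison suggests but that the text does not spell out: $b(\theta_i)=n_i\ln(1+e^{\theta_i})$, which equals $-n_i\ln(1-p_i)$, and $c(y_i)=\ln\binom{n_i}{y_i}$. The function names and the values $n=10$, $p=0.3$ are arbitrary illustrations.

```python
import math

def binomial_log_pmf(y: int, n: int, p: float) -> float:
    """ln P(Y = y) written directly from the binomial distribution."""
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1.0 - p)

def exponential_family_log_pmf(y: int, n: int, p: float) -> float:
    """The same quantity in the form y*theta - b(theta) + c(y)."""
    theta = math.log(p / (1.0 - p))          # natural parameter: logit(p)
    b = n * math.log(1.0 + math.exp(theta))  # assumed b(theta) = -n * ln(1 - p)
    c = math.log(math.comb(n, y))            # assumed c(y) = ln C(n, y)
    return y * theta - b + c

# Largest disagreement between the two forms over every possible outcome y.
max_gap = max(
    abs(binomial_log_pmf(y, 10, 0.3) - exponential_family_log_pmf(y, 10, 0.3))
    for y in range(11)
)
print(max_gap)  # expected to be zero up to floating-point rounding
```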

Next, setting the natural link function logit of the expected value of $Y_i$, which is $p_i$ for a binary response ($n_i=1$), to the linear portion of the GLM, we have

$$\operatorname{logit}(p_i)=\boldsymbol{X}_i^{T}\boldsymbol{\beta},$$

giving us the model formula for the logistic regression.
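The entry does not discuss how $\boldsymbol{\beta}$ is estimated. As one possible illustration of the model formula in use, the sketch below maximizes the binary ($n_i=1$) log-likelihood by plain gradient ascent, using the fact that with the logit link the gradient is $\sum_i (y_i-p_i)\boldsymbol{X}_i$. Everything here is an assumption made for the example: the function names, the step size, the iteration count, and the made-up toy data. Statistical software typically uses more refined algorithms than this.

```python
import math

def sigmoid(eta: float) -> float:
    """Inverse of the logit link, computed without overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def fit_logistic(X, y, step=0.1, iterations=5000):
    """Gradient ascent on the binary log-likelihood; returns the coefficient list."""
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            pi = sigmoid(sum(a * b for a, b in zip(xi, beta)))
            for j, xij in enumerate(xi):
                grad[j] += (yi - pi) * xij  # contribution (y_i - p_i) * X_ij
        # Move uphill, averaging the gradient over the observations.
        beta = [b + step * g / len(X) for b, g in zip(beta, grad)]
    return beta

# Made-up toy data: an intercept column plus one explanatory variable.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]

beta_hat = fit_logistic(X, y)
fitted = [sigmoid(sum(a * b for a, b in zip(xi, beta_hat))) for xi in X]
print(beta_hat, fitted)
```

Every fitted value lies strictly inside $(0,1)$, and because the model includes an intercept, the fitted probabilities sum to the number of observed successes once the iteration has converged.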

Remarks.

  • Comparing the model equation for logistic regression to that of the normal or Gaussian linear regression model, we see that the difference is in the choice of link function. In the normal linear model, the regression equation looks like

    $$E[Y]=\boldsymbol{X}^{T}\boldsymbol{\beta}. \tag{2}$$

    The link function in this case is the identity function. The model equation is consistent because the linear terms on the right-hand side allow $E[Y]$ on the left-hand side to vary over the reals. However, for a binary response variable, Equation (2) would not be appropriate, as the left-hand side is restricted to the unit interval, whereas the right-hand side can fall outside of $(0,1)$. Therefore, Equation (1) is more appropriate when we are dealing with a binary response variable.

  • The logit function is not the only choice of link function for the logistic regression. Other, “non-natural” link functions are available. Two such examples are the probit function, or the inverse cumulative normal distribution function $\Phi^{-1}(p)$, and the complementary log-log function $\ln(-\ln(1-p))$. Both of these functions map the open unit interval to $\mathbb{R}$.
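The remarks above can be illustrated with a short comparison of the link functions. This sketch uses only the Python standard library. The function names and the sample value `eta = 2.5` are illustrative assumptions, and `statistics.NormalDist` stands in for the standard normal distribution $\Phi$.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def probit(p: float) -> float:
    """Inverse cumulative normal distribution function."""
    return STANDARD_NORMAL.inv_cdf(p)

def cloglog(p: float) -> float:
    """Complementary log-log function ln(-ln(1 - p))."""
    return math.log(-math.log(1.0 - p))

# Each link sends probabilities near 0 and 1 far out along the real line.
for p in (0.001, 0.5, 0.999):
    print(p, logit(p), probit(p), cloglog(p))

# Going the other way, a linear predictor outside (0, 1) is not itself a
# valid probability, but each inverse link maps it back into (0, 1).
eta = 2.5
inverse_links = {
    "logit": 1.0 / (1.0 + math.exp(-eta)),
    "probit": STANDARD_NORMAL.cdf(eta),
    "cloglog": 1.0 - math.exp(-math.exp(eta)),
}
print(inverse_links)
```

Under the identity link of Equation (2), the same `eta = 2.5` would have to be read directly as $E[Y]$, which is impossible for a binary response.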
