Shannon’s entropy


Definition (Discrete)

Let $X$ be a discrete random variable on a finite set $\mathcal{X} = \{x_1, \dots, x_n\}$, with probability distribution function $p(x) = \Pr(X = x)$. The entropy $H(X)$ of $X$ is defined as

H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_b p(x).   (1)

The convention $0 \log 0 = 0$ is adopted in the definition. The logarithm is usually taken to base 2, in which case the entropy is measured in “bits,” or to base $e$, in which case $H(X)$ is measured in “nats.”
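As a concrete illustration (not part of the original entry), a short Python sketch of equation (1); the function name shannon_entropy and its inputs are chosen only for this example:

```python
# A minimal sketch of equation (1): H(X) = -sum p(x) log_b p(x),
# using the convention 0 * log 0 = 0 by skipping zero-probability terms.
import math

def shannon_entropy(probs, base=2):
    """Entropy of a probability vector, in bits by default."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))               # 1.0 bit for a fair coin
print(shannon_entropy([0.5, 0.5], base=math.e))  # ~0.693 nats
```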

If $X$ and $Y$ are random variables on $\mathcal{X}$ and $\mathcal{Y}$ respectively, the joint entropy of $X$ and $Y$ is

H(X, Y) = -\sum_{(x, y) \in \mathcal{X} \times \mathcal{Y}} p(x, y) \log_b p(x, y),

where $p(x, y)$ denotes the joint distribution of $X$ and $Y$.
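The joint entropy admits the same kind of sketch (illustrative only; representing $p(x, y)$ as a Python dict is an assumption of this example):

```python
# A minimal sketch of the joint entropy H(X, Y), assuming the joint
# distribution is given as a dict mapping (x, y) pairs to probabilities.
import math

def joint_entropy(joint_pmf, base=2):
    return -sum(p * math.log(p, base) for p in joint_pmf.values() if p > 0)

# Two independent fair coins: H(X, Y) = H(X) + H(Y) = 2 bits.
pmf = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(joint_entropy(pmf))  # 2.0
```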

Discussion

The Shannon entropy was first introduced by Shannon in 1948 in his landmark paper “A Mathematical Theory of Communication.” The entropy is a functional of the probability distribution function $p(x)$, and is sometimes written as

H(p(x_1), p(x_2), \dots, p(x_n)).

Note that the entropy of $X$ does not depend on the actual values of $X$; it depends only on $p(x)$. The definition of Shannon's entropy can be written as an expectation:

H(X) = -E[\log_b p(X)].

The quantity $-\log_b p(x)$ is interpreted as the information content of the outcome $x \in \mathcal{X}$, and is also called the Hartley information of $x$. Hence Shannon's entropy is the average amount of information contained in the random variable $X$; equivalently, it is the uncertainty removed once the actual outcome of $X$ is revealed.
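For instance (a worked example added here for illustration), a fair coin with $p = (\tfrac{1}{2}, \tfrac{1}{2})$ has

H(X) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit},

while a biased coin with $p = (0.9, 0.1)$ has

H(X) = -0.9\log_2 0.9 - 0.1\log_2 0.1 \approx 0.469 \text{ bits};

the more predictable the outcome, the less information its revelation carries on average.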

Characterization

We write $H(X)$ as $H_n(p_1, \dots, p_n)$. The Shannon entropy satisfies the following properties.

  1. For any $n$, $H_n(p_1, \dots, p_n)$ is a continuous and symmetric function of the variables $p_1, p_2, \dots, p_n$.

  2. An event of probability zero does not contribute to the entropy, i.e. for any $n$,

     H_{n+1}(p_1, \dots, p_n, 0) = H_n(p_1, \dots, p_n).

  3. Entropy is maximized when the probability distribution is uniform. For all $n$,

     H_n(p_1, \dots, p_n) \le H_n\!\left(\tfrac{1}{n}, \dots, \tfrac{1}{n}\right).

     This follows from Jensen's inequality:

     H(X) = E\!\left[\log_b \frac{1}{p(X)}\right] \le \log_b E\!\left[\frac{1}{p(X)}\right] = \log_b(n).

  4. If $p_{ij}$, $1 \le i \le m$, $1 \le j \le n$, are non-negative real numbers summing to one, and $q_i = \sum_{j=1}^{n} p_{ij}$, then

     H_{mn}(p_{11}, \dots, p_{mn}) = H_m(q_1, \dots, q_m) + \sum_{i=1}^{m} q_i H_n\!\left(\frac{p_{i1}}{q_i}, \dots, \frac{p_{in}}{q_i}\right).

     If we partition the $mn$ outcomes of the random experiment into $m$ groups, each containing $n$ elements, we can carry out the experiment in two steps: first determine the group to which the actual outcome belongs, and then find the outcome within that group. The probability of observing group $i$ is $q_i$, and the conditional probability distribution given group $i$ is $(p_{i1}/q_i, \dots, p_{in}/q_i)$. The entropy

     H_n\!\left(\frac{p_{i1}}{q_i}, \dots, \frac{p_{in}}{q_i}\right)

     is the entropy of the probability distribution conditioned on group $i$. Property 4 says that the total information is the sum of the information gained in the first step, $H_m(q_1, \dots, q_m)$, and a weighted sum of the entropies conditioned on each group (a numerical check appears in the sketch after this list).
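As a numerical illustration of property 4 (a sketch added here, not part of the original; the 2-by-3 table of probabilities is arbitrary):

```python
# Check the grouping property H_{mn}(p) = H_m(q) + sum_i q_i H_n(p_i / q_i)
# numerically for an arbitrary 2 x 3 table of probabilities p_ij.
import math

def H(probs):
    """Shannon entropy in bits, with the 0 * log 0 = 0 convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [[0.10, 0.20, 0.05],
     [0.25, 0.30, 0.10]]                 # m = 2 groups, n = 3 outcomes each
q = [sum(row) for row in p]              # group probabilities q_i

lhs = H([pij for row in p for pij in row])
rhs = H(q) + sum(qi * H([pij / qi for pij in row]) for qi, row in zip(q, p))
print(abs(lhs - rhs) < 1e-12)            # True
```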

Khinchin in 1957 showed that the only function satisfying the above assumptions is of the form

H_n(p_1, \dots, p_n) = -k \sum_{i=1}^{n} p_i \log p_i,

where $k$ is a positive constant, essentially a choice of unit of measure.
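For example (an illustrative remark, not in the original text): taking the logarithm above to be natural, the choice $k = 1/\ln 2$ yields $H_n(p_1, \dots, p_n) = -\sum_i p_i \log_2 p_i$, i.e. entropy measured in bits, while $k = 1$ gives the entropy in nats.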

Definition (Continuous)

Entropy in the continuous case is called differential entropy (http://planetmath.org/DifferentialEntropy).

Discussion—Continuous Entropy

Despite its seductively analogous form, continuous entropy cannot be obtained as a limiting case of discrete entropy.

We wish to obtain a generally finite measure as the “bin size” goes to zero. In the discrete case, the bin size is the (implicit) width of each of the $n$ (finite or infinite) bins/buckets/states whose probabilities are the $p_n$. As we generalize to the continuous domain, we must make this width explicit.

To do this, start with a continuous function $f$ discretized as shown in the figure:

Figure 1: Discretizing the function f into bins of width Δ

As the figure indicates, by the mean-value theorem there exists a value $x_i$ in each bin such that

f(x_i)\,\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx   (2)

and thus the integral of the function $f$ can be approximated (in the Riemann sense) by

\int_{-\infty}^{\infty} f(x)\,dx = \lim_{\Delta \to 0} \sum_{i=-\infty}^{\infty} f(x_i)\,\Delta   (3)

where this limit and “bin size goes to zero” are equivalent.

We will denote

H_\Delta := -\sum_{i=-\infty}^{\infty} \Delta f(x_i) \log\bigl(\Delta f(x_i)\bigr)   (4)

and expanding the log we have

H_\Delta = -\sum_{i=-\infty}^{\infty} \Delta f(x_i) \log\bigl(\Delta f(x_i)\bigr)   (5)
         = -\sum_{i=-\infty}^{\infty} \Delta f(x_i) \log f(x_i) - \sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \log \Delta.   (6)

As $\Delta \to 0$, we have

\sum_{i=-\infty}^{\infty} f(x_i)\,\Delta \to \int f(x)\,dx = 1, \quad\text{and}   (7)
\sum_{i=-\infty}^{\infty} \Delta f(x_i) \log f(x_i) \to \int f(x) \log f(x)\,dx.   (8)

This leads us to our definition of the differential entropy (continuous entropy):

h[f] = \lim_{\Delta \to 0} \bigl[ H_\Delta + \log \Delta \bigr] = -\int_{-\infty}^{\infty} f(x) \log f(x)\,dx.   (9)
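To make the limit in (9) concrete, here is a small numerical sketch (added for illustration; the function names and grid bounds are choices of this example): it discretizes the standard normal density into bins of width $\Delta$, computes $H_\Delta + \log \Delta$, and compares the result with the closed-form differential entropy $\tfrac{1}{2}\ln(2\pi e) \approx 1.4189$ nats.

```python
# Approximate h[f] = lim (H_Delta + log Delta) for a standard normal density,
# using bins of width delta on the interval [lo, hi] (natural log, i.e. nats).
import math

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def differential_entropy_estimate(pdf, delta, lo=-10.0, hi=10.0):
    h_delta = 0.0
    x = lo
    while x < hi:
        p = pdf(x) * delta               # bin mass f(x_i) * Delta
        if p > 0:
            h_delta -= p * math.log(p)   # contribution to H_Delta
        x += delta
    return h_delta + math.log(delta)     # H_Delta + log Delta

exact = 0.5 * math.log(2 * math.pi * math.e)  # closed form for N(0, 1)
for delta in (0.5, 0.1, 0.01):
    print(delta, differential_entropy_estimate(normal_pdf, delta), exact)
```

As $\Delta$ shrinks, the printed estimates approach the closed-form value, in line with equation (9).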