4. Measurement
This section adapts Definition 1 (http://planetmath.org/1introduction#Thmdefn1) to distributed stochastic systems. The first step is to replace elements $x$ of state space $X$ with stochastic maps $\delta_x : \{*\} \leadsto X$, or equivalently probability distributions on $X$, which are the system's inputs. Individual elements of $X$ correspond to Dirac distributions.
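To make this concrete, here is a minimal numerical sketch of inputs as Dirac distributions and of stochastic maps as column-stochastic matrices, assuming finite alphabets indexed by integers; the helper names are ours, not the paper's.

```python
import numpy as np

# Alphabet X is indexed 0..|X|-1; a probability distribution on X is a
# vector summing to 1, and the Dirac distribution delta_x is one-hot.
def dirac(x, size):
    d = np.zeros(size)
    d[x] = 1.0
    return d

# A stochastic map X ~> Y is a |Y| x |X| column-stochastic matrix:
# column x is the distribution on Y produced by the input delta_x.
m = np.array([[1.0, 0.0, 1.0],   # p(y = 0 | x)
              [0.0, 1.0, 0.0]])  # p(y = 1 | x)

# Applying the map to a Dirac input just selects the matching column.
print(m @ dirac(1, 3))           # -> [0. 1.]
```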
Second, replace function $f : X \to Y$ with mechanism $m_D : X_{D^{in}} \leadsto X_{D^{out}}$. Since we are interested in the compositional structure of measurements we also consider submechanisms $m_C : X_{C^{in}} \leadsto X_{C^{out}}$ for subsystems $C \subset D$. However, comparing mechanisms requires that they have the same domain and range, so we extend $m_C$ to the entire system as follows
(1) $m_C^\natural : X_{D^{in}} \overset{\pi}{\leadsto} X_{C^{in}} \overset{m_C}{\leadsto} X_{C^{out}} \overset{-\otimes\,\omega}{\leadsto} X_{D^{out}},$
where $\pi$ projects onto the coordinates in $C^{in}$ and $-\otimes\,\omega$ pads the remaining coordinates of $D^{out}$ with the uniform distribution $\omega$.
We refer to the extension as $m_C$ by abuse of notation. We extend mechanisms implicitly whenever necessary without further comment. Extending mechanisms in this way maps the quale into a cloud of points in the space of probability distributions on $X_{D^{in}}$, labeled by objects in the lattice of subsystems of $D$.
In the special case of the initial object $\emptyset$, define $m_\emptyset$ as the constant map taking every input to the uniform distribution $\omega$ on $X_{D^{out}}$.
Remark 3.
Subsystems differing by non-existent edges (Remark 2 (http://planetmath.org/3distributeddynamicalsystems#Thmrem2)) are mapped to the same mechanism by this construction, thus making the fact that the edges do not exist explicit within the formalism.
Composing an input with a submechanism yields an output $m_C \bullet \delta_{d^{in}}$, which is a probability distribution on $X_{D^{out}}$. We are now in a position to define
Definition 8.
A measuring device is the dual $m_C^\ddagger$ to the mechanism of a subsystem. An output is a stochastic map $\delta_{d^{out}} : \{*\} \leadsto X_{D^{out}}$. A measurement is a composition $m_C^\ddagger \bullet \delta_{d^{out}}$.
Recall that stochastic maps of the form $\{*\} \leadsto X$ correspond to probability distributions on $X$. Outputs as defined above are thus probability distributions on $X_{D^{out}}$, the output alphabet of the system. Individual elements of $X_{D^{out}}$ are recovered as Dirac vectors: $\delta_{d^{out}}$.
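A hedged sketch of how a measuring device could be computed numerically, assuming the dual of Corollary 2 amounts to Bayes' rule over the uniform distribution on inputs; the function names are ours.

```python
import numpy as np

def dirac(i, size):
    d = np.zeros(size)
    d[i] = 1.0
    return d

# Dual of a stochastic map (|Y| x |X| column-stochastic matrix m),
# assuming Bayes' rule over the uniform distribution on inputs.
def dual(m):
    joint = m / m.shape[1]                  # p(y, x) with p(x) = 1/|X|
    p_y = joint.sum(axis=1, keepdims=True)  # marginal p(y)
    return (joint / p_y).T                  # entry [x, y] = p(x | y)

# A measurement is the composition dual(m) . delta_y: the distribution
# over inputs given that the device reported output y.
m = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
print(dual(m) @ dirac(0, 2))                # -> [0.5 0.  0.5]
```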
Definition 9.
The effective information generated by output $d^{out}$ in the context of subsystem $C$ is
(2) $ei(m_C \to m_D, d^{out}) := H\big[\, m_D^\ddagger \bullet \delta_{d^{out}} \,\big\|\, m_C^\ddagger \bullet \delta_{d^{out}} \,\big].$
The null context, corresponding to the empty subsystem, is a special case where $m_C^\ddagger \bullet \delta_{d^{out}}$ is replaced by the uniform distribution on $X_{D^{in}}$. To simplify notation define
$ei(m_D, d^{out}) := ei(m_\emptyset \to m_D, d^{out}).$
Here, $H[p\,\|\,q]$ is the Kullback-Leibler divergence or relative entropy [1]. Eq. (2) expands as
(3) $ei(m_C \to m_D, d^{out}) = \sum_{x \in X_{D^{in}}} p_{m_D}(x \mid d^{out}) \cdot \log_2 \frac{p_{m_D}(x \mid d^{out})}{p_{m_C}(x \mid d^{out})}.$
When $C = \emptyset$, so that the context is the uniform distribution on $X_{D^{in}}$, we have
(4) $ei(m_D, d^{out}) = \log_2 \big| X_{D^{in}} \big| - H\big[\, m_D^\ddagger \bullet \delta_{d^{out}} \,\big],$
where $H[p]$ is the Shannon entropy.
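Eqs. (2)-(4) are straightforward to compute for distributions represented as vectors. A minimal sketch, with function names of our choosing:

```python
import numpy as np

# Effective information as KL divergence (Eqs. 2-3): p is the measurement
# in the larger context, q the measurement in the smaller context.
def ei(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# Null context (Eq. 4): q is uniform, so ei = log2|X| - H[p].
def ei_null(p):
    return ei(p, np.full(len(p), 1.0 / len(p)))

p = np.array([0.5, 0.0, 0.5])   # the measurement from the sketch above
print(ei_null(p))                # log2(3) - 1 ~ 0.585 bits
```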
Definition 8 requires some unpacking. To relate it to the classical notion of measurement, Definition 1 (http://planetmath.org/1introduction#Thmdefn1), we consider the system $v_1 \to v_2$, where the alphabets of $v_1$ and $v_2$ are the sets $X$ and $Y$ respectively, and the mechanism of $v_2$ is $f : X \to Y$. In other words, the system corresponds to a single deterministic function.
Proposition 5 (classical measurement).
The measurement performed when deterministic function $f : X \to Y$ outputs $y \in Y$ is equivalent to the preimage $f^{-1}(y)$. Effective information is $ei(f, y) = \log_2 \frac{|X|}{|f^{-1}(y)|}$.
Proof: By Corollary 2 (http://planetmath.org/2stochasticmaps#Thmthm2) the measurement is conditional distribution
$f^\ddagger \bullet \delta_y = \sum_{x \in X} p(x \mid y)\, \delta_x, \qquad p(x \mid y) = \begin{cases} \frac{1}{|f^{-1}(y)|} & \text{if } f(x) = y \\ 0 & \text{otherwise,} \end{cases}$
which generalizes the preimage. Effective information follows immediately.
Effective information can be interpreted as quantifying a measurement's precision. It is high if few inputs out of many cause $f$ to output $y$, i.e. if $f^{-1}(y)$ has few elements relative to $X$, and conversely is low if many inputs cause $f$ to output $y$, i.e. if the output is relatively insensitive to changes in the input. Precise measurements say a lot about what the input could have been, and conversely for vague measurements with low effective information.
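Proposition 5 is easy to check numerically. A sketch, with a hypothetical function chosen purely for illustration:

```python
import numpy as np

# Classical measurement (Proposition 5): the measurement is uniform on
# the preimage f^{-1}(y), and ei(f, y) = log2(|X| / |f^{-1}(y)|).
def classical_ei(f, X, y):
    preimage = [x for x in X if f(x) == y]
    return preimage, float(np.log2(len(X) / len(preimage)))

f = lambda x: x % 2                  # hypothetical function on X
pre, ei = classical_ei(f, range(6), 0)
print(pre, ei)                       # [0, 2, 4] 1.0 bit
```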
The point of this paper is to develop techniques for studying measurements constructed out of two or more functions. We therefore present computations for the simplest case, a distributed system with two inputs feeding a single function, in considerable detail. Let $G$ be the graph
$v_1 \longrightarrow v_3 \longleftarrow v_2$
with obvious assignments of alphabets and the mechanism of $v_3$ as $f : X \times Y \to Z$. To make the formulas more readable let $X$, $Y$ and $Z$ denote the alphabets of $v_1$, $v_2$ and $v_3$ respectively, and write $e_1 = v_1 \to v_3$ and $e_2 = v_2 \to v_3$ for the edges. We then obtain lattice
[Lattice of subsystems: $\emptyset \;\subset\; \{e_1\},\ \{e_2\} \;\subset\; \{e_1, e_2\}$.]
The remainder of this section and most of the next analyze measurements in the lattice.
Proposition 6 (partial measurement).
The measurement performed on $X$ when $f : X \times Y \to Z$ outputs $z$, treating $Y$ as extrinsic noise, is conditional distribution
(5) $m_{\{e_1\}}^\ddagger \bullet \delta_z = \sum_{x \in X} p(x \mid z)\, \delta_x,$
where $p(x \mid z) = \frac{|f_x^{-1}(z)|}{|f^{-1}(z)|}$ and $f_x^{-1}(z) := \{ y \in Y \mid f(x, y) = z \}$. The effective information generated by the partial measurement is
(6) $ei(m_{\{e_1\}}, z) = \log_2 |X| + \sum_{x \in X} p(x \mid z) \cdot \log_2 p(x \mid z).$
Proof: Treating $v_2$ as a source of extrinsic noise yields mechanism $m_{\{e_1\}} = f \bullet (I_X \otimes \omega_Y)$, which takes $\delta_x \mapsto \frac{1}{|Y|} \sum_{y \in Y} \delta_{f(x,y)}$. The dual is
$m_{\{e_1\}}^\ddagger \bullet \delta_z = \sum_{x \in X} \frac{|f_x^{-1}(z)|}{|f^{-1}(z)|}\, \delta_x.$
The computation of effective information follows immediately.
A partial measurement is precise if the preimage $f^{-1}(z)$ has small or empty intersection with $\{x\} \times Y$ for most $x \in X$, and large intersection for few $x$.
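A sketch of the partial measurement, marginalizing the unobserved $Y$ uniformly; the mechanism `f` below is an assumed toy example, not taken from the text:

```python
import numpy as np

# Partial measurement (Proposition 6): p(x | z) is proportional to
# |f_x^{-1}(z)| (Eq. 5), and ei = log2|X| + sum_x p(x|z) log2 p(x|z) (Eq. 6).
def partial_measurement(f, X, Y, z):
    counts = np.array([sum(1 for y in Y if f(x, y) == z) for x in X],
                      dtype=float)
    return counts / counts.sum()

def ei_null(p):
    mask = p > 0
    return float(np.log2(len(p)) + np.sum(p[mask] * np.log2(p[mask])))

f = lambda x, y: (x + y) % 3 if x < 2 else 0   # hypothetical mechanism
p = partial_measurement(f, range(4), range(3), 0)
print(p, ei_null(p))            # [0.125 0.125 0.375 0.375] ~0.189 bits
```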
Propositions 5 and 6 compute effective information of a measurement relative to the null context provided by complete ignorance (the uniform distribution). We can also compute the effective information generated by a measurement in the context of a submeasurement:
Proposition 7 (relative measurement).
The information generated by measurement $f^{-1}(z)$ in the context of the partial measurement, where $Y$ is unobserved noise, is
(7) $ei(m_{\{e_1\}} \to m_{\{e_1, e_2\}}, z) = \sum_{x \in X} p(x \mid z) \cdot \log_2 \frac{|Y|}{|f_x^{-1}(z)|}.$
Proof: Applying Propositions 5 and 6 obtains
$ei(m_{\{e_1\}} \to m_{\{e_1, e_2\}}, z) = \sum_{(x,y) \in f^{-1}(z)} \frac{1}{|f^{-1}(z)|} \cdot \log_2 \frac{|Y|}{|f^{-1}(z)| \cdot p(x \mid z)},$
which simplifies to the desired expression.
To interpret the result, decompose $f : X \times Y \to Z$ into a family of functions $\{ f_x : Y \to Z \mid x \in X \}$ labeled by elements of $X$, where $f_x(y) := f(x, y)$. The precision of the measurement performed by $f_x$ is $ei(f_x, z) = \log_2 \frac{|Y|}{|f_x^{-1}(z)|}$. It follows that the precision of the relative measurement, Eq. (7), is the expected precision of the measurements performed by family $\{f_x\}$, taken with respect to the probability distribution $p(x \mid z)$ generated by the noisy measurement.
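The expected-precision reading of Eq. (7) translates directly into code. The following sketch reuses the assumed toy mechanism from above:

```python
import numpy as np

# Relative measurement (Eq. 7) as an expected precision: decompose f into
# the family {f_x}, weight ei(f_x, z) = log2(|Y| / |f_x^{-1}(z)|) by the
# noisy partial measurement p(x | z).
def relative_ei(f, X, Y, z):
    counts = np.array([sum(1 for y in Y if f(x, y) == z) for x in X],
                      dtype=float)
    p = counts / counts.sum()                  # p(x | z), Eq. (5)
    mask = counts > 0
    return float(np.sum(p[mask] * np.log2(len(Y) / counts[mask])))

f = lambda x, y: (x + y) % 3 if x < 2 else 0   # same toy mechanism
print(relative_ei(f, range(4), range(3), 0))   # ~0.396 bits
```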
In this special case, relative precision is simply the difference of the precisions of the larger and smaller subsystems:
Corollary 8 (comparing measurements).
$ei(m_{\{e_1\}} \to m_{\{e_1, e_2\}}, z) = ei(m_{\{e_1, e_2\}}, z) - ei(m_{\{e_1\}}, z).$
Proof: Applying Propositions 5, 6 and 7 and simplifying obtains
$ei(m_{\{e_1, e_2\}}, z) - ei(m_{\{e_1\}}, z) = \log_2 \frac{|X \times Y|}{|f^{-1}(z)|} - \log_2 |X| - \sum_{x \in X} p(x \mid z) \cdot \log_2 p(x \mid z) = \sum_{x \in X} p(x \mid z) \cdot \log_2 \frac{|Y|}{|f_x^{-1}(z)|},$
the right-hand side of Eq. (7).
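As a sanity check, the following sketch verifies Corollary 8 numerically on the same assumed toy mechanism:

```python
import numpy as np
from itertools import product

# Check of Corollary 8 on the toy mechanism used above:
# ei(partial -> full, z) == ei(full, z) - ei(partial, z).
f = lambda x, y: (x + y) % 3 if x < 2 else 0
X, Y, z = range(4), range(3), 0

preimage = [(x, y) for x, y in product(X, Y) if f(x, y) == z]
ei_full = np.log2(len(X) * len(Y) / len(preimage))       # Proposition 5

counts = np.array([sum(1 for y in Y if f(x, y) == z) for x in X],
                  dtype=float)
p = counts / counts.sum()
mask = counts > 0
ei_partial = np.log2(len(X)) + np.sum(p[mask] * np.log2(p[mask]))
ei_relative = np.sum(p[mask] * np.log2(len(Y) / counts[mask]))

print(np.isclose(ei_relative, ei_full - ei_partial))     # True
```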
References
[1] E. T. Jaynes (1985): Entropy and Search Theory. In C. R. Smith & W. T. Grandy, editors: Maximum-entropy and Bayesian Methods in Inverse Problems, Springer.