2. Stochastic maps

Any conditional distribution $p(y|x)$ on finite sets $X$ and $Y$ can berepresented as a matrix as follows. Let ${\\mathcal{V}}X=\\{\\varphi:X\\rightarrow{\\mathbb{R}}\\}$ denote the vector space of real valued functions on $X$ and similarlyfor $Y$ . ${\\mathcal{V}}X$ is equipped with Dirac basis $\\{\\delta_{x}:X\\rightarrow{\\mathbb{R}}|x\\in X\\}$ , where

\\delta_{x}(x^{\\prime})=\\left\\{\\begin{matrix}1&\\mbox{if }x=x^{\\prime}\\\\0&\\mbox{else}.\\end{matrix}\\right.

Given a conditional distribution $p(y|x)$ construct matrix ${\\mathfrak{m}}_{p}$ withentry $p(y|x)$ in column $\\delta_{x}$ and row $\\delta_{y}$ . Matrix ${\\mathfrak{m}}_{p}$ isstochastic: it has nonnegative entries and its columns sum to 1.Alternatively, given a stochastic matrix ${\\mathfrak{m}}:{\\mathcal{V}}X\\rightarrow{\\mathcal{V}}Y$ , we can recover the conditional distribution. The Dirac basis inducesEuclidean metric

{\\langle}\\bullet|\\bullet{\\rangle}:{\\mathcal{V}}X\\otimes{\\mathcal{V}}X%\\rightarrow{\\mathbb{R}}:\\left\\langle\\sum\\alpha_{x}\\delta_{x}\\left|\\sum\\beta_{x%}\\delta_{x}\\right\\rangle\\right.=\\sum\\alpha_{x}\\beta_{x}

(1)

which identifies vector spaces with their duals ${\\mathcal{V}}X\\approx({\\mathcal{V}}X)^{*}$ . Let $p_{\\mathfrak{m}}(y|x):={\\langle}\\delta_{y}|{\\mathfrak{m}}(\\delta_{x}){\\rangle}$ .

Definition 2.

The category of stochastic maps ${\\mathtt{Stoch}}$ has function spaces ${\\mathcal{V}}X$ for objects and stochastic matrices ${\\mathfrak{m}}:{\\mathcal{V}}X\\rightarrow{\\mathcal{V}}Y$ with respect to Dirac bases for arrows. Weidentify of $({\\mathcal{V}}X)^{*}$ with ${\\mathcal{V}}X$ using the Dirac basiswithout further comment below.

Definition 3.

The dual of surjective stochastic map ${\\mathfrak{m}}:{\\mathcal{V}}X\\rightarrow{\\mathcal{V}}Y$ is the composition ${\\mathfrak{m}}^{\atural}:={\\mathcal{V}}Y\\xrightarrow{{\\mathfrak{m}}^{*}\\circren%}{\\mathcal{V}}X$ , where $r e n$ is the unique mapmaking diagram ${\\mathfrak{m}}^{\\perp}$ of ${\\mathfrak{m}}$ with columns renormalized to sum to 1. Thestochastic dual ${\\mathfrak{m}}^{\atural}$ is

commute. Precomposing ${\\mathfrak{m}}^{*}$ with $r e n$ renormalizes¹¹If ${\\mathfrak{m}}$ is not surjective, i.e. if one of the rows has allzero entries, then the renormalization is not well-defined. itscolumns to sum to 1. The stochastic dual of a stochastic transform isstochastic; further, if ${\\mathfrak{m}}$ is stochastic then $({\\mathfrak{m}}^{\atural})^{\atural}={\\mathfrak{m}}$ .

Category ${\\mathtt{Stoch}}$ is described in terms of braid-like generators andrelations in [2]. A more general, but also more complicated,category of conditional distributions was introduced by Giry [3],see [5].

Example 1 (deterministic functions).

Let $\\mathtt{FSet}$ be the category of finite sets. Define faithfulfunctor ${\\mathcal{V}}:\\mathtt{FSet}\\rightarrow{\\mathtt{Stoch}}$ taking set $X$ to ${\\mathcal{V}}X$ and function $f:X\\rightarrow Y$ to stochastic map ${\\mathcal{V}}f:{\\mathcal{V}}X\\rightarrow{\\mathcal{V}}Y:\\delta_{x}\\mapsto\\delta%_{f(x)}$ . It is easy to see that ${\\mathcal{V}}(X\\times Y)={\\mathcal{V}}X\\otimes{\\mathcal{V}}Y$ and ${\\mathcal{V}}(X\\cup Y)={\\mathcal{V}}X\\times{\\mathcal{V}}Y$ .

We introduce special notation for commonly used functions:

•
Set inclusion.For any inclusion $i:X\\hookrightarrow Y$ of sets, let $\\iota:={\\mathcal{V}}i:{\\mathcal{V}}X\\rightarrow{\\mathcal{V}}Y$ denote the corresponding stochasticmap. Two important examples are
- –
  Point inclusion.Given $x\\in X$ define $\\iota_{x}:{\\mathbb{R}}\\rightarrow{\\mathcal{V}}X:1\\mapsto\\delta_{x}$ .
- –
  Diagonal map.Inclusion $\\Delta:X\\hookrightarrow X\\times X:x\\mapsto(x,x)$ induces $\\iota_{\\Delta}:{\\mathcal{V}}X\\rightarrow{\\mathcal{V}}X\\otimes{\\mathcal{V}}X:%\\delta_{x}\\mapsto\\delta_{x}\\otimes\\delta_{x}$ .
•
Terminal map.Let $\\omega_{X}:{\\mathcal{V}}X\\rightarrow{\\mathbb{R}}:\\delta_{x}\\mapsto 1$ denote theterminal map induced by $X\\rightarrow\\{\\bullet\\}$ .
•
Projection.Let $\\pi_{XY,X}:{\\mathcal{V}}X\\otimes{\\mathcal{V}}Y\\rightarrow{\\mathcal{V}}X:\\delta%_{x}\\otimes\\delta_{y}\\mapsto\\delta_{x}$ denote the projectioninduced by $pr_{X\\times Y,X}:X\\times Y\\rightarrow X:(x,y)\\mapsto x$ .

Proposition 1 (dual is Bayes over uniform distribution).

The dual of a stochastic map applies Bayes rule to compute theposterior distribution ${\\langle}{\\mathfrak{m}}^{\atural}(\\delta_{y})|\\delta_{x}{\\rangle}=p_{%\\mathfrak{m}}(x|y)$ using the uniform probability distribution.

Proof:The uniform distribution is the dual $\\omega_{X}^{\atural}:{\\mathbb{R}}\\rightarrow{\\mathcal{V}}X:1\\mapsto\\frac{1}{|%X|}\\sum_{x}\\delta_{x}$ of the terminal map $\\omega_{X}:{\\mathcal{V}}X\\rightarrow{\\mathbb{R}}$ . It assignsequal probability $p_{\\omega^{\atural}}(x)=\\frac{1}{|X|}$ to all of $X$ ’selements, and can be characterized as the maximally uninformativedistribution [4]. Let ${\\mathfrak{m}}:{\\mathcal{V}}X\\rightarrow{\\mathcal{V}}Y$ .The normalized transpose is

{\\mathfrak{m}}^{\atural}(\\delta_{y})=\\sum_{x}\\frac{p_{\\mathfrak{m}}(y|x)}{%\\sum_{x^{\\prime}}p_{\\mathfrak{m}}(y|x^{\\prime})}\\delta_{x}=\\sum_{x}\\frac{p_{%\\mathfrak{m}}(y|x)\\cdot p_{\\omega^{\atural}}(x)}{\\sum_{x^{\\prime}}p_{%\\mathfrak{m}}(y|x^{\\prime})p_{\\omega^{\atural}}(x^{\\prime})}\\delta_{x}=\\sum_{%x}p_{\\mathfrak{m}}(x|y)\\cdot\\delta_{x}.\\,\\,\\blacksquare

Remark 1.

Note that $p_{\\mathfrak{m}}(x|y):={\\langle}{\\mathfrak{m}}^{\atural}(\\delta_{y})|\\delta_{%x}{\\rangle}\eq{\\langle}\\delta_{y}|{\\mathfrak{m}}(\\delta_{x}){\\rangle}=:p_{%\\mathfrak{m}}(y|x)$ .Dirac’s bra-ket notation must be used with care since stochasticmatrices are not necessarily symmetric [1].

Corollary 2 (preimages).

The dual $({\\mathcal{V}}f)^{\atural}:{\\mathcal{V}}Y\\rightarrow{\\mathcal{V}}X$ ofstochastic map ${\\mathcal{V}}f:{\\mathcal{V}}X\\rightarrow{\\mathcal{V}}Y$ isconditional distribution

p_{{\\mathcal{V}}f}(x|y)=\\left\\{\\begin{matrix}\\frac{1}{|f^{-1}(y)|}&\\mbox{if }f%(x)=y\\\\0&\\mbox{else}.\\end{matrix}\\right.

(2)

Proof:By the proof of Proposition 1

({\\mathcal{V}}f)^{\atural}(\\delta_{y})=\\frac{1}{|f^{-1}(y)|}\\sum_{\\{x|f(x)=y%\\}}\\delta_{x}.\\,\\,\\blacksquare

The support of $p_{{\\mathcal{V}}f}(X|y)$ is $f^{-1}(y)$ . Elements in thesupport are assigned equal probability, thereby treating them as anundifferentiated list. Dual $({\\mathcal{V}}f)^{\atural}$ thus generalizes theinverse image $f^{-1}:Y\\rightarrow\\underline{2}^{X}$ . Conveniently however,the dual $({\\mathcal{V}}X)^{\atural}$ simply flips the domain and range of ${\\mathcal{V}}f$ , whereas the inverse image maps to powerset $\\underline{2}^{X}$ ,an entirely new object.

Corollary 3 (marginalization with respect to uniform distribution).

Precomposing ${\\mathcal{V}}X\\otimes{\\mathcal{V}}Y\\xrightarrow{{\\mathfrak{m}}}{\\mathcal{V}}Z$ with the dual $\\pi_{X}^{\atural}$ to ${\\mathcal{V}}X\\otimes{\\mathcal{V}}Y\\xrightarrow{\\pi_{X}}{\\mathcal{V}}X$ marginalizes $p_{\\mathfrak{m}}(z|x,y)$ over the uniform distribution on $Y$ .

Proof:By Corollary 2 we have $\\pi^{\atural}_{X}:{\\mathcal{V}}X\\rightarrow{\\mathcal{V}}X\\otimes{\\mathcal{V}}%Y:\\delta_{y}\\mapsto\\frac{1}{|Y|}\\sum_{y\\in Y}\\delta_{x}\\otimes\\delta_{y}$ . It followsimmediately that

p_{{\\mathfrak{m}}\\circ\\pi^{\atural}_{X}}(z|x)=\\frac{1}{|Y|}\\sum_{y\\in Y}p_{%\\mathfrak{m}}(z|x,y).\\,\\,\\blacksquare

Precomposing with $\\pi_{X}^{\atural}$ treats inputs from $Y$ asextrinsic noise. Although duals can be defined so that theyimplement Bayes’ rule with respect to other probability distributions,this paper restricts attention to the simplest possible renormalizationof columns, Definition 2. The uniform distribution isconvenient since it uses minimal prior knowledge (it depends only on thenumber of elements in the set) to generalize pre-images to the stochasticcase, Proposition 2.

References

1 P A M Dirac (1958): ThePrinciples of Quantum Mechanics. Oxford University Press.
2 Tobias Fritz (2009): Apresentation of the category of stochastic matrices. arXiv:0902.2554v1 .
3 M Giry (1981): Acategorical approach to probability theory. In B Banaschewski, editor: Categorical Aspects of Topology and Analysis,Springer.
4 E T Jaynes (1957):Information theory and statistical mechanics. Phys. Rev.106(4), pp. 620–630.
5 P Panangaden (1998):Probabilistic relations. In C Baier, M Huth,M Kwiatkowska & M Ryan, editors: PROBMIV’98, pp. 59–74.