Markov Chains and partial views/Mutual Information of Coupled Markov Chains
From GusWiki
X and Y are Markov chains, with transition matrices P and Q, respectively; Z is a Markov chain, with transition matrix R, which we get by coupling them, and which lives on their product space. That is, X and Y are both partial views of Z. We are interested in measuring, information-theoretically, how much dependence the coupling introduces between X and Y, if any. We will do this by looking at the mutual information between the simultaneous partial states Xt and Yt, under the invariant distribution. (This is most relevant when we assume that Z is mixing, so there is only a single invariant distribution, and everything approaches it reasonably rapidly.)
[edit] Definitions
H[Xt] is the entropy of the partial state of X at time t. Similarly H[Zt] is the entropy of the complete state at t. By the assumption that Z is a chain on the product space, this is equal to the joint entropy H[Xt,Yt], since there is a 1-1 function between the complete state and the pair of partial states. The mutual information between the states of X and Y is thus H[Xt] + H[Yt] − H[Zt].
In the invariant distribution, the distributions of the states do not change (duh), so we can drop the time subscript in favor of one reminding us that we're in a steady state:
[edit] Calculation by Eigenvectors
For any Markov chain with only a single strongly-connected component in the state graph, we know from the Frobenius-Perron theorem that the transition matrix has a non-degenerate largest eigenvalue (namely, 1), and that the eigenvector corresponding to this has only non-negative entries. Scaled so that its entries sum to 1, this eigenvector is in fact the invariant probability (see here) (A degenerate eigenvalue of 1 corresponds to multiple strongly-connected components, each with an invariant distribution indicated by an eigenvector.)
Thus we can find the invariant distributions of X, Y and Z by taking the leading eigenvectors of P, Q and R, respectively. (Alternately, we can just take the leading eigenvector of R, and derive the invariant distributions for the each partial view by summing over the other view's states. This may be faster numerically.) Then we simply plug in to the Shannon formula for the entropy, getting H[Xinv], H[Yinv] and H[Zinv], and finally arithmetic gives us the desired mutual information.
