Abstract:
The field of embodied intelligence emphasises the importance of the morphology and environment with respect to the behaviour of a cognitive system. The contribution of the morphology to the behaviour, commonly known as morphological computation, is well-recognised in this community. We believe that the field would benefit from a formalisation of this concept as we would like to ask how much the morphology and the environment contribute to an embodied agent’s behaviour, or how an embodied agent can maximise the exploitation of its morphology within its environment. In this work we derive two concepts for measuring morphological computation, and we discuss their relation to the Information Bottleneck Method. The first concept asks how much the world contributes to the overall behaviour and the second concept asks how much the agent’s action contributes to a behaviour. Various measures are derived from the concepts and validated in two experiments that highlight their strengths and weaknesses.

Abstract:
Dynamics of information flow in adaptively interacting stochastic processes is studied. We give an extended form of game dynamics for Markovian processes and study its behavior to observe information flow through the system. Examples of the adaptive dynamics for two stochastic processes interacting through matching pennies game interaction are exhibited along with underlying causal structure.

Abstract:
Stochastic interdependence of a probability distribution on a product space is measured by its Kullback-Leibler distance from the exponential family of product distributions (called multi-information). Here we investigate low-dimensional exponential families that contain the maximizers of stochastic interdependence in their closure. Based on a detailed description of the structure of probability distributions with globally maximal multi-information we obtain our main result: The exponential family of pure pair-interactions contains all global maximizers of the multi-information in its closure.
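
The definition of multi-information used above can be illustrated numerically. The following is a minimal sketch, assuming the joint distribution is given as an n-dimensional NumPy array with one axis per variable; the function name `multi_information` is hypothetical and not taken from the paper. It relies on the standard fact that the Kullback-Leibler distance from the family of product distributions is attained at the product of the one-dimensional marginals.

```python
import numpy as np

def multi_information(p):
    """Multi-information in bits of a joint distribution.

    `p` is an n-dimensional array; axis i indexes the states of variable i.
    The value is the KL divergence between p and the product of its marginals.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()  # normalise defensively

    # Build the product of the one-dimensional marginals by broadcasting.
    product = np.ones_like(p)
    for axis in range(p.ndim):
        other_axes = tuple(a for a in range(p.ndim) if a != axis)
        marginal = p.sum(axis=other_axes)
        shape = [1] * p.ndim
        shape[axis] = p.shape[axis]
        product = product * marginal.reshape(shape)

    # Terms with p = 0 contribute nothing to the divergence.
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / product[mask])))

# Two perfectly correlated bits versus two independent uniform bits.
correlated = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]
print(multi_information(correlated), multi_information(independent))
```

For the two perfectly correlated bits the value is one bit, and for the independent pair it is zero.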

Abstract:
A directed acyclic graph (DAG) partially represents the conditional independence structure among observations of a system if the local Markov condition holds, that is, if every variable is independent of its non-descendants given its parents. In general, there is a whole class of DAGs that represents a given set of conditional independence relations. We are interested in properties of this class that can be derived from observations of a subsystem only. To this end, we prove an information theoretic inequality that allows for the inference of common ancestors of observed parts in any DAG representing some unknown larger system. More explicitly, we show that a large amount of dependence in terms of mutual information among the observations implies the existence of a common ancestor that distributes this information. Within the causal interpretation of DAGs our result can be seen as a quantitative extension of Reichenbach's Principle of Common Cause to more than two variables. Our conclusions are valid also for non-probabilistic observations such as binary strings, since we state the proof for an axiomatized notion of mutual information that includes the stochastic as well as the algorithmic version.

Abstract:
We study notions of robustness of Markov kernels and probability distributions of a system that is described by $n$ input random variables and one output random variable. Markov kernels can be expanded in a series of potentials that allow us to describe the system's behaviour after knockouts. Robustness imposes structural constraints on these potentials. Robustness of probability distributions is defined via conditional independence statements. These statements can be studied algebraically. The corresponding conditional independence ideals are related to binary edge ideals. The set of robust probability distributions lies on an algebraic variety. We compute a Gr\"obner basis of this ideal and study the irreducible decomposition of the variety. These algebraic results allow us to parametrize the set of all robust probability distributions.

Abstract:
The decomposition of channel information into synergies of different order is an open, active problem in the theory of complex systems. Most approaches to the problem are based on information theory, and propose decompositions of mutual information between inputs and outputs in several ways, none of which is generally accepted yet. We propose a new point of view on the topic. We model a multi-input channel as a Markov kernel. We can project the channel onto a series of exponential families which form a hierarchical structure. This is carried out with tools from information geometry, in a way analogous to the projections of probability distributions introduced by Amari. A Pythagorean relation leads naturally to a decomposition of the mutual information between inputs and outputs into terms which represent single node information, pairwise interactions, and in general n-node interactions. The synergy measures introduced in this paper can be easily evaluated by an iterative scaling algorithm, which is a standard procedure in information geometry.
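
Why a decomposition into interactions of different order is needed can be seen in the textbook XOR channel. The sketch below is not the projection-based decomposition proposed in the paper. It only computes ordinary mutual information, under the assumption of two independent uniform input bits and an output equal to their XOR. The helper `mutual_information` and the dictionary `xor_joint` are hypothetical names introduced for this illustration.

```python
import itertools
import math
from collections import defaultdict

def mutual_information(joint, a_idx, b_idx):
    """I(A;B) in bits.

    `joint` maps outcome tuples to probabilities; `a_idx` and `b_idx` are
    tuples of positions in the outcome tuple that make up A and B.
    """
    pa, pb, pab = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, prob in joint.items():
        a = tuple(outcome[i] for i in a_idx)
        b = tuple(outcome[i] for i in b_idx)
        pa[a] += prob
        pb[b] += prob
        pab[(a, b)] += prob
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in pab.items()
        if p > 0
    )

# Outcomes are (x1, x2, y) with y = x1 XOR x2 and uniform independent inputs.
xor_joint = {
    (x1, x2, x1 ^ x2): 0.25
    for x1, x2 in itertools.product((0, 1), repeat=2)
}

print(mutual_information(xor_joint, (0,), (2,)))    # single input vs output
print(mutual_information(xor_joint, (1,), (2,)))    # other input vs output
print(mutual_information(xor_joint, (0, 1), (2,)))  # both inputs vs output
```

Each input alone carries zero bits about the output, while both inputs together carry one bit, so all of the information in this channel sits in the pairwise interaction.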

Abstract:
We study a notion of robustness of a Markov kernel that describes a system of several input random variables and one output random variable. Robustness requires that the behaviour of the system does not change if one or several of the input variables are knocked out. If the system is required to be robust against too many knockouts, then the output variable cannot distinguish reliably between input states and must be independent of the input. We study how many input states the output variable can distinguish as a function of the required level of robustness. Gibbs potentials allow a mechanistic description of the behaviour of the system after knockouts. Robustness imposes structural constraints on these potentials. We show that interaction families of Gibbs potentials allow us to describe robust systems. Given a distribution of the input random variables and the Markov kernel describing the system, we obtain a joint probability distribution. Robustness implies a number of conditional independence statements for this joint distribution. The set of all probability distributions corresponding to robust systems can be decomposed into a finite union of components, and we find parametrizations of the components. The decomposition corresponds to a primary decomposition of the conditional independence ideal and can be derived from more general results about generalized binomial edge ideals.

Abstract:
We improve recently published results about resources of Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN) required to make them Universal Approximators. We show that any distribution p on the set of binary vectors of length n can be arbitrarily well approximated by an RBM with k-1 hidden units, where k is the minimal number of pairs of binary vectors differing in only one entry such that their union contains the support set of p. In important cases this number is half of the cardinality of the support set of p. We construct a DBN with 2^n/(2(n-b)), b ~ log(n), hidden layers of width n that is capable of approximating any distribution on {0,1}^n arbitrarily well. This confirms a conjecture presented by Le Roux and Bengio (2010).
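
The count k-1 can be checked in the simplest case, where the support of p is all of {0,1}^n. The sketch below is a minimal illustration under that assumption. It covers the cube with the pairs of vectors that differ only in the last entry, which gives 2^(n-1) pairs, half the cardinality of the support. No smaller cover exists, since each pair contains only two vectors. The function names are hypothetical and the code does not construct an RBM.

```python
from itertools import product

def pairs_covering_cube(n):
    """Pairs of binary vectors of length n that differ only in the last entry."""
    return [
        (prefix + (0,), prefix + (1,))
        for prefix in product((0, 1), repeat=n - 1)
    ]

def hidden_units_for_full_support(n):
    """Return k - 1, where k is the number of pairs in the cover above."""
    pairs = pairs_covering_cube(n)
    covered = {vector for pair in pairs for vector in pair}
    # Sanity check: the union of the pairs is the whole cube {0,1}^n.
    assert covered == set(product((0, 1), repeat=n))
    return len(pairs) - 1

for n in (2, 3, 4):
    print(n, hidden_units_for_full_support(n))
```

For n = 3 this gives k = 4 pairs and therefore 3 hidden units. For supports that are proper subsets of the cube, finding the minimal k is a covering problem that this sketch does not attempt.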

Abstract:
In all but special circumstances, measurements of time-dependent processes reflect internal structures and correlations only indirectly. Building predictive models of such hidden information sources requires discovering, in some way, the internal states and mechanisms. Unfortunately, there are often many possible models that are observationally equivalent. Here we show that the situation is not as arbitrary as one would think. We show that generators of hidden stochastic processes can be reduced to a minimal form and compare this reduced representation to that provided by computational mechanics--the epsilon-machine. On the way to developing deeper, measure-theoretic foundations for the latter, we introduce a new two-step reduction process. The first step (internal-event reduction) produces the smallest observationally equivalent sigma-algebra and the second (internal-state reduction) removes sigma-algebra components that are redundant for optimal prediction. For several classes of stochastic dynamical systems these reductions produce representations that are equivalent to epsilon-machines.

Abstract:
Information theory is a powerful tool to express principles to drive autonomous systems because it is domain invariant and allows for an intuitive interpretation. This paper studies the use of the predictive information (PI), also called excess entropy or effective measure complexity, of the sensorimotor process as a driving force to generate behavior. We study nonlinear and nonstationary systems and introduce the time-local predicting information (TiPI) which allows us to derive exact results together with explicit update rules for the parameters of the controller in the dynamical systems framework. In this way the information principle, formulated at the level of behavior, is translated to the dynamics of the synapses. We underpin our results with a number of case studies with high-dimensional robotic systems. We show the spontaneous cooperativity in a complex physical system with decentralized control. Moreover, a jointly controlled humanoid robot develops a high behavioral variety depending on its physics and the environment it is dynamically embedded into. The behavior can be decomposed into a succession of low-dimensional modes that increasingly explore the behavior space. This is a promising way to avoid the curse of dimensionality which prevents learning systems from scaling well.
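
As a point of reference for the predictive information, the sketch below computes it for a much simpler setting than the one studied in the paper: a stationary, finite-state, first-order Markov chain, where the mutual information between past and future reduces to the mutual information between two consecutive states. This is not the TiPI or the update rules derived by the authors. The function names are hypothetical, and the chain is assumed to be irreducible so that the stationary distribution is unique.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution of a row-stochastic matrix P (assumed irreducible)."""
    eigenvalues, eigenvectors = np.linalg.eig(P.T)
    # Pick the left eigenvector whose eigenvalue is closest to 1.
    v = np.real(eigenvectors[:, np.argmin(np.abs(eigenvalues - 1.0))])
    return v / v.sum()

def one_step_predictive_information(P):
    """I(X_t; X_{t+1}) in bits for the stationary chain with transition matrix P."""
    P = np.asarray(P, dtype=float)
    pi = stationary_distribution(P)
    joint = pi[:, None] * P              # p(x_t, x_{t+1})
    next_marginal = joint.sum(axis=0)    # p(x_{t+1})
    independent = pi[:, None] * next_marginal[None, :]
    mask = joint > 0                     # zero-probability terms contribute nothing
    return float(np.sum(joint[mask] * np.log2(joint[mask] / independent[mask])))

flip = [[0.0, 1.0], [1.0, 0.0]]    # deterministic alternation between two states
coin = [[0.5, 0.5], [0.5, 0.5]]    # independent fair coin flips
print(one_step_predictive_information(flip), one_step_predictive_information(coin))
```

The alternating chain is fully predictable and yields one bit, while the fair coin yields a value of zero up to floating-point error.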