Abstract:
A new method is proposed for exploiting causal independencies in exact Bayesian network inference. A Bayesian network can be viewed as representing a factorization of a joint probability into the multiplication of a set of conditional probabilities. We present a notion of causal independence that enables one to further factorize the conditional probabilities into a combination of even smaller factors and consequently obtain a finer-grain factorization of the joint probability. The new formulation of causal independence lets us specify the conditional probability of a variable given its parents in terms of an associative and commutative operator, such as ``or'', ``sum'' or ``max'', on the contribution of each parent. We start with a simple algorithm VE for Bayesian network inference that, given evidence and a query variable, uses the factorization to find the posterior distribution of the query. We show how this algorithm can be extended to exploit causal independence. Empirical studies, based on the CPCS networks for medical diagnosis, show that this method is more efficient than previous methods and allows for inference in larger networks than previous algorithms.

Abstract:
I introduce a temporal belief-network representation of causal independence that a knowledge engineer can use to elicit probabilistic models. Like the current, atemporal belief-network representation of causal independence, the new representation makes knowledge acquisition tractable. Unlike the atemproal representation, however, the temporal representation can simplify inference, and does not require the use of unobservable variables. The representation is less general than is the atemporal representation, but appears to be useful for many practical applications.

Abstract:
This paper reports experiments with the causal independence inference algorithm proposed by Zhang and Poole (1994b) on the CPSC network created by Pradhan et al. (1994). It is found that the algorithm is able to answer 420 of the 422 possible zero-observation queries, 94 of 100 randomly generated five-observation queries, 87 of 100 randomly generated ten-observation queries, and 69 of 100 randomly generated twenty-observation queries.

Abstract:
This work investigates the intersection property of conditional independence. It states that for random variables $A,B,C$ and $X$ we have that $X$ independent of $A$ given $B,C$ and $X$ independent of $B$ given $A,C$ implies $X$ independent of $(A,B)$ given $C$. Under the assumption that the joint distribution has a continuous density, we provide necessary and sufficient conditions under which the intersection property holds. The result has direct applications to causal inference: it leads to strictly weaker conditions under which the graphical structure becomes identifiable from the joint distribution of an additive noise model.

Abstract:
It is well known that conditional independence can be used to factorize a joint probability into a multiplication of conditional probabilities. This paper proposes a constructive definition of inter-causal independence, which can be used to further factorize a conditional probability. An inference algorithm is developed, which makes use of both conditional independence and inter-causal independence to reduce inference complexity in Bayesian networks.

Abstract:
In broad applications, it is routinely of interest to assess whether there is evidence in the data to refute the assumption of conditional independence of $Y$ and $X$ conditionally on $Z$. Such tests are well developed in parametric models but are not straightforward in the nonparametric case. We propose a general Bayesian approach, which relies on an encompassing nonparametric Bayes model for the joint distribution of $Y$, $X$ and $Z$. The framework allows $Y$, $X$ and $Z$ to be random variables on arbitrary spaces, and can accommodate different dimensional vectors having a mixture of discrete and continuous measurement scales. Using conditional mutual information as a scalar summary of the strength of the conditional dependence relationship, we construct null and alternative hypotheses. We provide conditions under which the correct hypothesis will be consistently selected. Computational methods are developed, which can be incorporated within MCMC algorithms for the encompassing model. The methods are applied to variable selection and assessed through simulations and criminology applications.

Abstract:
Conditional independence testing is an important problem, especially in Bayesian network learning and causal discovery. Due to the curse of dimensionality, testing for conditional independence of continuous variables is particularly challenging. We propose a Kernel-based Conditional Independence test (KCI-test), by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis of conditional independence. The proposed method is computationally efficient and easy to implement. Experimental results show that it outperforms other methods, especially when the conditioning set is large or the sample size is not very large, in which case other methods encounter difficulties.

Abstract:
Heckerman (1993) defined causal independence in terms of a set of temporal conditional independence statements. These statements formalized certain types of causal interaction where (1) the effect is independent of the order that causes are introduced and (2) the impact of a single cause on the effect does not depend on what other causes have previously been applied. In this paper, we introduce an equivalent a temporal characterization of causal independence based on a functional representation of the relationship between causes and the effect. In this representation, the interaction between causes and effect can be written as a nested decomposition of functions. Causal independence can be exploited by representing this decomposition in the belief network, resulting in representations that are more efficient for inference than general causal models. We present empirical results showing the benefits of a causal-independence representation for belief-network inference.

Abstract:
We postulate a principle stating that the initial condition of a physical system is typically algorithmically independent of the dynamical law. We argue that this links thermodynamics and causal inference. On the one hand, it entails behaviour that is similar to the usual arrow of time. On the other hand, it motivates a statistical asymmetry between cause and effect that has recently postulated in the field of causal inference, namely, that the probability distribution P(cause) contains no information about the conditional distribution P(effect|cause) and vice versa, while P(effect) may contain information about P(cause|effect).

Abstract:
Inferring the causal structure that links n observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when only single observations are present. We develop a theory how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information and describe the corresponding causal inference rules. We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that takes into account also the complexity of conditional probability densities, making it possible to select among Markov equivalent causal graphs. This insight provides a theoretical foundation of a heuristic principle proposed in earlier work. We also discuss how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that are based on implicit or explicit assumptions on the underlying distribution.