Abstract:
It is often stated in papers tackling the task of inferring Bayesian network structures from data that there are two distinct approaches: (i) apply conditional independence tests to check for the presence or absence of edges; (ii) search the model space using a scoring metric. Here I argue that, for complete data and a given node ordering, this division is a myth, by showing that cross entropy methods for checking conditional independence are mathematically identical to methods based upon discriminating between models by their overall goodness-of-fit logarithmic scores.
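
The equivalence can be checked numerically in a small discrete case. The sketch below is an illustration under stated assumptions, not the paper's own construction: it assumes the ordering X, Z before Y, uses the maximised multinomial log-likelihood as the logarithmic score, and the function names and toy counts are hypothetical. Under those assumptions, the score gained by adding the edge X -> Y equals N times the empirical conditional mutual information I(X;Y|Z).

```python
import math
from collections import Counter


def empirical_cmi(samples):
    """Empirical conditional mutual information I(X;Y|Z), in nats, from (x, y, z) triples."""
    n = len(samples)
    n_xyz = Counter(samples)
    n_xz = Counter((x, z) for x, y, z in samples)
    n_yz = Counter((y, z) for x, y, z in samples)
    n_z = Counter(z for x, y, z in samples)
    return sum(
        (c / n) * math.log(c * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
        for (x, y, z), c in n_xyz.items()
    )


def max_loglik_y(samples, with_edge):
    """Maximised log-likelihood of Y given parents {X, Z} (edge present) or {Z} (edge absent)."""
    if with_edge:
        parents = lambda x, z: (x, z)
    else:
        parents = lambda x, z: (z,)
    n_parent = Counter(parents(x, z) for x, y, z in samples)
    n_joint = Counter((y, parents(x, z)) for x, y, z in samples)
    return sum(c * math.log(c / n_parent[pa]) for (y, pa), c in n_joint.items())


# Hypothetical toy data: X and Y are dependent when z == 0 and independent when z == 1.
samples = (
    [(0, 0, 0)] * 30 + [(0, 1, 0)] * 10 + [(1, 0, 0)] * 10 + [(1, 1, 0)] * 30
    + [(0, 0, 1)] * 20 + [(0, 1, 1)] * 20 + [(1, 0, 1)] * 20 + [(1, 1, 1)] * 20
)

score_gain = max_loglik_y(samples, True) - max_loglik_y(samples, False)
scaled_cmi = len(samples) * empirical_cmi(samples)
print(score_gain, scaled_cmi)  # the two quantities agree up to rounding error
```

Penalised or Bayesian scores add further terms to the log-likelihood difference, so this identity covers only the unpenalised fit term.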

Abstract:
Several characterizations of the joint multinomial distribution of two discrete random vectors are derived assuming conditional multinomial distributions.

Abstract:
In broad applications, it is routinely of interest to assess whether there is evidence in the data to refute the assumption of conditional independence of $Y$ and $X$ conditionally on $Z$. Such tests are well developed in parametric models but are not straightforward in the nonparametric case. We propose a general Bayesian approach, which relies on an encompassing nonparametric Bayes model for the joint distribution of $Y$, $X$ and $Z$. The framework allows $Y$, $X$ and $Z$ to be random variables on arbitrary spaces, and can accommodate different dimensional vectors having a mixture of discrete and continuous measurement scales. Using conditional mutual information as a scalar summary of the strength of the conditional dependence relationship, we construct null and alternative hypotheses. We provide conditions under which the correct hypothesis will be consistently selected. Computational methods are developed, which can be incorporated within MCMC algorithms for the encompassing model. The methods are applied to variable selection and assessed through simulations and criminology applications.
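
For reference, the standard definition of conditional mutual information is given below. The notation is an assumption made here for illustration: $f$ denotes a joint or conditional density with respect to a suitable dominating measure, and the abstract does not fix this notation.

$$
I(X; Y \mid Z) \;=\; \mathbb{E}\!\left[ \log \frac{f(X, Y \mid Z)}{f(X \mid Z)\, f(Y \mid Z)} \right] \;\ge\; 0,
$$

with equality if and only if $Y$ and $X$ are conditionally independent given $Z$, which is what makes it usable as a scalar summary of conditional dependence.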

Abstract:
Conditional independence testing is an important problem, especially in Bayesian network learning and causal discovery. Due to the curse of dimensionality, testing for conditional independence of continuous variables is particularly challenging. We propose a Kernel-based Conditional Independence test (KCI-test), by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis of conditional independence. The proposed method is computationally efficient and easy to implement. Experimental results show that it outperforms other methods, especially when the conditioning set is large or the sample size is not very large, in which case other methods encounter difficulties.

Abstract:
This paper introduces the notions of independence and conditional independence in valuation-based systems (VBS). VBS is an axiomatic framework capable of representing many different uncertainty calculi. We define independence and conditional independence in terms of factorization of the joint valuation. The definitions of independence and conditional independence in VBS generalize the corresponding definitions in probability theory. Our definitions apply not only to probability theory, but also to Dempster-Shafer's belief-function theory, Spohn's epistemic-belief theory, and Zadeh's possibility theory. In fact, they apply to any uncertainty calculi that fit in the framework of valuation-based systems.

Abstract:
We introduce a new interpretation of two related notions - conditional utility and utility independence. Unlike the traditional interpretation, the new interpretation renders the notions the direct analogues of their probabilistic counterparts. To capture these notions formally, we appeal to the notion of utility distribution, introduced in a previous paper. We show that utility distributions, which have a structure that is identical to that of probability distributions, can be viewed as a special case of additive multiattribute utility functions, and show how this special case permits us to capture the novel senses of conditional utility and utility independence. Finally, we present the notion of utility networks, which do for utilities what Bayesian networks do for probabilities. Specifically, utility networks exploit the new interpretation of conditional utility and utility independence to compactly represent a utility distribution.

Abstract:
It is well-known that the notion of (strong) conditional independence (CI) is too restrictive to capture independencies that only hold in certain contexts. This kind of contextual independency, called context-strong independence (CSI), can be used to facilitate the acquisition, representation, and inference of probabilistic knowledge. In this paper, we suggest the use of contextual weak independence (CWI) in Bayesian networks. It should be emphasized that the notion of CWI is a more general form of contextual independence than CSI. Furthermore, if the contextual strong independence holds for all contexts, then the notion of CSI becomes strong CI. On the other hand, if the weak contextual independence holds for all contexts, then the notion of CWI becomes weak independence (WI), which is a more general noncontextual independency than strong CI. More importantly, complete axiomatizations are studied for both the class of WI and the class of CI and WI together. Finally, the interesting property of WI being a necessary and sufficient condition for ensuring consistency in granular probabilistic networks is shown.

Abstract:
Possibilistic conditional independence is investigated: we propose a definition of this notion similar to the one used in probability theory. The links between independence and non-interactivity are investigated, and properties of these relations are given. The influence of the conjunction used to define a conditional measure of possibility is also highlighted; we examine three types of conjunctions: Lukasiewicz-like T-norms, product-like T-norms and the minimum operator.
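
As a concrete reference point, the sketch below implements one standard representative of each of the three conjunction types on degrees in [0, 1]. The function names are hypothetical, and the abstract's "Lukasiewicz-like" and "product-like" families may be broader than these canonical members.

```python
def t_lukasiewicz(a, b):
    """Lukasiewicz t-norm: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def t_product(a, b):
    """Product t-norm: a * b."""
    return a * b


def t_min(a, b):
    """Minimum operator, the largest t-norm."""
    return min(a, b)


# For any a, b in [0, 1] the three conjunctions are ordered:
# Lukasiewicz <= product <= minimum.
a, b = 0.7, 0.6
print(t_lukasiewicz(a, b), t_product(a, b), t_min(a, b))
```

Because the conjunctions differ in how strongly they discount joint degrees, the same joint possibility measure can yield different conditional measures depending on which one is used.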

Abstract:
We consider a non-parametric Bayesian model for conditional densities. The model is a finite mixture of normal distributions with covariate dependent multinomial logit mixing probabilities. A prior for the number of mixture components is specified on positive integers. The marginal distribution of covariates is not modeled. We study asymptotic frequentist behavior of the posterior in this model. Specifically, we show that when the true conditional density has a certain smoothness level, then the posterior contraction rate around the truth is equal up to a log factor to the frequentist minimax rate of estimation. As our result holds without a priori knowledge of the smoothness level of the true density, the established posterior contraction rates are adaptive. Moreover, we show that the rate is not affected by inclusion of irrelevant covariates in the model.

Abstract:
Independence screening is a powerful method for variable selection for `Big Data' when the number of variables is massive. Commonly used independence screening methods are based on marginal correlations or variations thereof. In many applications, researchers have some prior knowledge that a certain set of variables is related to the response. In such a situation, a natural assessment of the relative importance of the other predictors is the conditional contribution of each individual predictor in the presence of the known set of variables. This results in conditional sure independence screening (CSIS). Conditioning helps reduce the false positive and false negative rates in the variable selection process. In this paper, we propose and study CSIS in the context of generalized linear models. For ultrahigh-dimensional statistical problems, we give conditions under which sure screening is possible and derive an upper bound on the number of selected variables. We also spell out the situations under which CSIS yields model selection consistency. Moreover, we provide two data-driven methods to select the thresholding parameter of conditional screening. The utility of the procedure is illustrated by simulation studies and analysis of two real data sets.
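
The idea of conditional screening can be sketched for the Gaussian linear model, the simplest generalized linear model. This is a minimal illustration under assumptions, not the paper's procedure: the function name, the simulated data, and the fixed `top_k` cutoff (standing in for the data-driven threshold selection mentioned above) are all hypothetical. Each candidate predictor is fitted jointly with the known conditioning set and ranked by the magnitude of its own coefficient.

```python
import numpy as np


def conditional_screening(X, y, cond_idx, top_k):
    """Rank predictors outside cond_idx by |coefficient| when fitted together with X[:, cond_idx]."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise so coefficients are comparable
    yc = y - y.mean()                          # centring removes the need for an intercept
    scores = {}
    for j in range(p):
        if j in cond_idx:
            continue
        design = Xs[:, list(cond_idx) + [j]]
        coef, *_ = np.linalg.lstsq(design, yc, rcond=None)
        scores[j] = abs(coef[-1])              # conditional contribution of predictor j
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Hypothetical simulated data: predictor 0 is known in advance, predictor 3 is the hidden signal.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + 1.5 * X[:, 3] + 0.5 * rng.standard_normal(n)

selected, scores = conditional_screening(X, y, cond_idx=[0], top_k=3)
print(selected)
```

For a non-Gaussian response, the least-squares fit would be replaced by the corresponding marginal maximum likelihood fit for each candidate.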