Abstract:
In this paper we study the computation of Markov bases for contingency tables whose cell entries have an upper bound. In general, a Markov basis for unbounded contingency tables under a given model differs from a Markov basis for bounded tables. Rapallo (2007) applied Lawrence lifting to compute a Markov basis for contingency tables whose cell entries are bounded. However, this requires computing the universal Gr\"obner basis of the ideal associated with the design matrix of the model, which is, in general, larger than any reduced Gr\"obner basis. Thus, this approach can be infeasible even for small- and medium-sized problems. In this paper we focus on bounded two-way contingency tables under the independence model and show that if the bounds on the cells are positive, i.e., there are no structural zeros, the set of basic moves of all $2 \times 2$ minors connects all tables with given margins. We end this paper with an open problem: given that the margins are positive, find a necessary and sufficient condition on the set of structural zeros under which the set of basic moves of all $2 \times 2$ minors connects all incomplete contingency tables with given margins.
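The connectivity claim can be checked by brute force on tiny examples. The Python sketch below enumerates a bounded fiber (all tables with the given margins and cellwise bounds, stored in a matrix `upper`) and runs a breadth-first search using the basic moves $\pm(e_{ij} + e_{kl} - e_{il} - e_{kj})$, keeping only moves that respect the bounds. The helper names `fiber`, `neighbors` and `is_connected` are hypothetical, and exhaustive enumeration is our own illustrative choice, not the method of the paper.

```python
from collections import deque
from itertools import combinations, product

def fiber(row_sums, col_sums, upper):
    """All nonnegative integer tables with the given margins and cell upper bounds.
    Brute-force enumeration: only suitable for tiny tables."""
    I, J = len(row_sums), len(col_sums)
    row_choices = [
        [r for r in product(*(range(upper[i][j] + 1) for j in range(J)))
         if sum(r) == row_sums[i]]
        for i in range(I)
    ]
    return [rows for rows in product(*row_choices)
            if all(sum(rows[i][j] for i in range(I)) == col_sums[j] for j in range(J))]

def neighbors(t, upper):
    """Tables reachable from t by one basic move +/-(e_ij + e_kl - e_il - e_kj)
    that keep every cell within 0 <= x <= upper."""
    I, J = len(t), len(t[0])
    for i, k in combinations(range(I), 2):
        for j, l in combinations(range(J), 2):
            for s in (1, -1):
                new = [list(row) for row in t]
                new[i][j] += s
                new[k][l] += s
                new[i][l] -= s
                new[k][j] -= s
                cells = ((i, j), (k, l), (i, l), (k, j))
                if all(0 <= new[a][b] <= upper[a][b] for a, b in cells):
                    yield tuple(tuple(row) for row in new)

def is_connected(row_sums, col_sums, upper):
    """Breadth-first search over the bounded fiber using the 2x2 basic moves."""
    tables = fiber(row_sums, col_sums, upper)
    if not tables:
        return True
    seen = {tables[0]}
    queue = deque([tables[0]])
    while queue:
        for nb in neighbors(queue.popleft(), upper):
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(tables)

# All bounds positive (no structural zeros).
print(is_connected((2, 2, 2), (2, 2, 2), [[1] * 3] * 3))
# Structural zeros on the diagonal, unit margins.
print(is_connected((1, 1, 1), (1, 1, 1), [[0, 1, 1], [1, 0, 1], [1, 1, 0]]))
```

With all bounds equal to 1, the six tables in the first fiber are linked by transpositions, so the search reaches every table. With structural zeros on the diagonal, the fiber consists of the two cyclic permutation matrices. Any single $2 \times 2$ move would place a 1 on the diagonal, so the two tables stay disconnected. This illustrates why the open problem concerns the pattern of structural zeros.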

Abstract:
Pearson's statistic is investigated for nominal-ordinal two-way contingency tables in which we wish to test for identical rows. The statistic is expressed as a sum whose first summand is a statistic given by Yates (1948) that examines location effects for the nominal categories. The second and subsequent summands reflect the corresponding higher moments, for example dispersion, skewness, and kurtosis. The summands are shown to be weakly optimal in that they are score statistics.
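The decomposition can be sketched numerically as follows. Let $p_j = n_{\cdot j}/n$ be the column proportions, and let $g_0 = 1, g_1, \dots, g_{c-1}$ be polynomials in the ordinal column scores that are orthonormal with respect to $p$. Then Pearson's statistic splits exactly as $X^2 = \sum_{u=1}^{c-1} \sum_i V_{ui}^2$ with $V_{ui} = \sum_j N_{ij} g_u(j) / \sqrt{n_{i\cdot}}$. The $u = 1$ (linear) terms capture location differences between rows, $u = 2$ captures dispersion, and so on. The code below is a minimal sketch that assumes equally spaced scores $1, \dots, c$ and positive column totals. The function names are hypothetical, and the paper's own normalisation of the Yates component may differ.

```python
import numpy as np

def orthonormal_polys(scores, p):
    """Rows g_0, ..., g_{c-1} evaluated at the c column scores, orthonormal with
    respect to the column proportions p (weighted Gram-Schmidt on 1, x, x^2, ...)."""
    c = len(scores)
    basis = np.vander(scores, N=c, increasing=True).T.astype(float)
    G = []
    for v in basis:
        for g in G:
            v = v - np.sum(p * v * g) * g
        G.append(v / np.sqrt(np.sum(p * v * v)))
    return np.array(G)

def pearson_components(N, scores=None):
    """Squared components V_{ui}^2: rows u = 1..c-1, columns i = 1..r.
    Their grand total equals Pearson's X^2."""
    N = np.asarray(N, dtype=float)
    r, c = N.shape
    n_row = N.sum(axis=1)
    p = N.sum(axis=0) / N.sum()
    if scores is None:
        scores = np.arange(1, c + 1, dtype=float)   # assumed equally spaced ordinal scores
    G = orthonormal_polys(np.asarray(scores, dtype=float), p)
    V = (G[1:] @ N.T) / np.sqrt(n_row)   # V[u-1, i] = sum_j N_ij g_u(j) / sqrt(n_i.)
    return V ** 2

def pearson_x2(N):
    N = np.asarray(N, dtype=float)
    E = np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()
    return ((N - E) ** 2 / E).sum()

table = [[10, 5, 3], [4, 8, 9], [6, 6, 6]]
comp = pearson_components(table)
print(comp.sum(axis=1))               # location (u = 1) and dispersion (u = 2) parts
print(comp.sum(), pearson_x2(table))  # the two totals agree
```

The identity holds because the $g_u$ form a complete orthonormal basis for functions on the $c$ columns. Projecting each row's standardized residuals onto this basis therefore loses nothing, and the $u = 0$ term vanishes because the expected counts reproduce the row totals.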

Abstract:
Spatial interaction between two or more classes or species has important implications in various fields and causes multivariate patterns such as segregation or association. Segregation occurs when members of a class or species are more likely to be found near members of the same class (i.e., near conspecifics), while association occurs when members of a class or species are more likely to be found near members of another class or species. The null patterns considered are random labeling (RL) and complete spatial randomness (CSR) of points from two or more classes; the latter is henceforth called \emph{CSR independence}. The clustering tests based on nearest neighbor contingency tables (NNCTs) currently used in the literature are two-sided. In this article, we consider the directional (i.e., one-sided) versions of the cell-specific NNCT-tests and introduce new directional NNCT-tests for the two-class case. We analyze the distributional properties of the tests and compare their empirical significance levels and empirical power estimates using extensive Monte Carlo simulations. We demonstrate that the new directional tests perform comparably to the currently available NNCT-tests in terms of empirical size and power. We use four example data sets for illustrative purposes and provide guidelines for using these NNCT-tests.
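The sketch below shows how an NNCT is built and how a one-sided cell-specific test can be run. It computes each point's nearest neighbor by brute force and counts class pairs (base class, NN class). It then tests a diagonal cell $N_{ii}$ in one direction: large values indicate segregation and small values indicate association. Instead of the asymptotic tests of the article, this sketch substitutes a Monte Carlo randomization test under RL. The locations stay fixed and the labels are permuted, so the NN map is computed only once. All function names and the toy data are hypothetical.

```python
import numpy as np

def nearest_neighbors(xy):
    """Index of each point's nearest neighbour (brute force, Euclidean)."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.argmin(axis=1)

def nnct(labels, nn, k):
    """k x k NNCT: entry (i, j) counts base points of class i whose NN is of class j."""
    table = np.zeros((k, k), dtype=int)
    np.add.at(table, (labels, labels[nn]), 1)
    return table

def one_sided_cell_test(xy, labels, i=0, n_perm=999, seed=None, alternative="greater"):
    """Monte Carlo RL test of the diagonal cell N_ii. 'greater' targets segregation
    (too many same-class NN pairs); 'less' targets association."""
    rng = np.random.default_rng(seed)
    k = labels.max() + 1
    nn = nearest_neighbors(xy)          # locations are fixed under RL: compute NNs once
    observed = nnct(labels, nn, k)[i, i]
    extreme = 0
    for _ in range(n_perm):
        stat = nnct(rng.permutation(labels), nn, k)[i, i]
        extreme += stat >= observed if alternative == "greater" else stat <= observed
    return observed, (extreme + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
xy = rng.random((100, 2))
labels = (xy[:, 0] >= 0.5).astype(int)      # strongly segregated toy pattern
print(one_sided_cell_test(xy, labels, i=0, n_perm=199, seed=2))
```

A randomization test was chosen here because it needs no variance formula. The trade-off is that it is conditional on the observed locations, which matches RL but not CSR independence.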

Abstract:
In this paper we study a new class of statistical models for contingency tables. We define this class of models through a subset of the binomial equations of the classical independence model. We use some notions from Algebraic Statistics to compute their sufficient statistics and to prove that they are log-linear. Moreover, we show how to compute maximum likelihood estimates and how to perform exact inference through the Diaconis-Sturmfels algorithm. Examples show that these models can be useful in a wide range of applications.
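To make the exact-inference step concrete, here is a minimal sketch of a Diaconis-Sturmfels style Metropolis walk. It is written for the classical two-way independence model, where the basic $2 \times 2$ moves form a Markov basis. The walk stays on the fiber of tables with the observed margins and targets the conditional (hypergeometric) law $\propto 1/\prod_{ij} t_{ij}!$. It estimates the exact p-value of Pearson's statistic. For the new models of the paper, the move set would have to be replaced by a Markov basis for those models, which this sketch does not compute. The function names are hypothetical, and the sketch assumes all margins are positive.

```python
import numpy as np
from math import lgamma

def log_weight(t):
    """log of 1 / prod_ij t_ij!, the unnormalised conditional (hypergeometric) law."""
    return -sum(lgamma(x + 1) for x in t.ravel())

def pearson_x2(t):
    e = np.outer(t.sum(axis=1), t.sum(axis=0)) / t.sum()
    return ((t - e) ** 2 / e).sum()

def diaconis_sturmfels_pvalue(table, n_steps=20000, burn_in=1000, seed=0):
    """Metropolis walk on the fiber of tables with the observed margins, using the
    basic 2x2 moves; returns (X^2_obs, estimated exact p-value)."""
    rng = np.random.default_rng(seed)
    t = np.array(table, dtype=int)
    I, J = t.shape
    observed = pearson_x2(t)
    hits = 0
    for step in range(burn_in + n_steps):
        i, k = rng.choice(I, size=2, replace=False)
        j, l = rng.choice(J, size=2, replace=False)
        move = np.zeros_like(t)
        move[i, j] = move[k, l] = 1
        move[i, l] = move[k, j] = -1
        proposal = t + rng.choice([-1, 1]) * move
        # reject moves leaving the fiber (negative cells); otherwise Metropolis acceptance
        if proposal.min() >= 0 and np.log(rng.random()) < log_weight(proposal) - log_weight(t):
            t = proposal
        if step >= burn_in:
            hits += pearson_x2(t) >= observed - 1e-9
    return observed, hits / n_steps

print(diaconis_sturmfels_pvalue([[10, 2], [3, 9]]))
```

The proposal is symmetric because each move is chosen together with a random sign. The Metropolis ratio therefore reduces to the ratio of the hypergeometric weights.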

Abstract:
The spatial clustering of points from two or more classes (or species) has important implications in many fields and may cause the spatial patterns of segregation and association, which are the two major types of spatial interaction between the classes. The null patterns we consider are random labeling (RL) and complete spatial randomness (CSR) of points from two or more classes; the latter is called CSR independence. The segregation and association patterns can be studied using a nearest neighbor contingency table (NNCT), which records the frequencies of nearest neighbor (NN) types in a contingency table. Among the NNCT-tests, Pielou's test is liberal under the null pattern, whereas Dixon's test has the desired significance level under the RL pattern. We propose three new multivariate clustering tests based on NNCTs. We compare the finite sample performance of these new tests with Pielou's and Dixon's tests and Cuzick & Edwards' k-NN tests in terms of empirical size under the null cases and empirical power under various segregation and association alternatives, and we provide guidelines for using the tests in practice. We demonstrate that the newly proposed NNCT-tests perform relatively well compared to their competitors and illustrate the tests using three example data sets. Furthermore, we compare the NNCT-tests with the second-order methods using these examples.
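As we understand it, Pielou's test applies the ordinary Pearson chi-square test of independence to the NNCT. The sketch below computes it for a hypothetical two-class NNCT with SciPy's `chi2_contingency` and checks the result against the closed form for $2 \times 2$ tables. The NNCT cell counts are not independent (for example, because of reflexive NN pairs). The chi-square reference distribution is therefore only approximate, which is consistent with the test being liberal. The table values and the helper name `pielou_test` are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2_contingency

def pielou_test(nnct):
    """Pearson chi-square test of independence applied to an NNCT (no continuity correction)."""
    stat, p_value, dof, expected = chi2_contingency(np.asarray(nnct), correction=False)
    return stat, p_value

# hypothetical two-class NNCT: rows = class of base point, columns = class of its NN
nnct = np.array([[30, 10],
                 [12, 28]])
stat, p_value = pielou_test(nnct)
# closed form for 2 x 2 tables: n (N11 N22 - N12 N21)^2 / (n1 n2 c1 c2)
n = nnct.sum()
closed_form = n * (nnct[0, 0] * nnct[1, 1] - nnct[0, 1] * nnct[1, 0]) ** 2 / (
    nnct.sum(axis=1).prod() * nnct.sum(axis=0).prod())
print(stat, closed_form, p_value)
```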

Abstract:
The spatial interaction between two or more classes of points may cause spatial clustering patterns such as segregation or association, which can be tested using a nearest neighbor contingency table (NNCT). An NNCT is constructed using the frequencies of class types of points in nearest neighbor (NN) pairs. For the NNCT-tests, the null pattern is either complete spatial randomness (CSR) of the points from two or more classes (called CSR independence) or random labeling (RL). The distributions of the NNCT-test statistics depend on the number of reflexive NNs (denoted by $R$) and the number of shared NNs (denoted by $Q$), both of which depend on the locations of the points. Since the locations are fixed under RL, $Q$ and $R$ are fixed quantities under RL but random variables under CSR independence. Using their observed values in NNCT analysis makes the distributions of the NNCT-test statistics conditional on $Q$ and $R$ under CSR independence. In this article, I use the empirically estimated expected values of $Q$ and $R$ under the CSR independence pattern to remove this conditioning from the NNCT-tests (henceforth, such a correction is called the \emph{QR-adjustment}). I present a Monte Carlo simulation study comparing the conditional NNCT-tests and the QR-adjusted tests under CSR independence and under segregation and association alternatives. I demonstrate that the QR-adjustment does not significantly improve the empirical size estimates under CSR independence or the power estimates under segregation or association alternatives. For illustrative purposes, I apply the conditional and empirically corrected tests to two example data sets.
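The sketch below computes $Q$ and $R$ from a point pattern and estimates their expectations under CSR by simulation, which is the ingredient the QR-adjustment needs. It follows Dixon's definitions as we understand them. $R$ counts the points that are the NN of their own NN, which is twice the number of reflexive pairs. $Q = 2(Q_2 + 3Q_3 + 6Q_4 + \dots) = \sum_k k(k-1) Q_k$, where $Q_k$ is the number of points that serve as the NN of exactly $k$ other points. The unit-square window, the absence of edge correction, and all function names are assumptions of this sketch.

```python
import numpy as np

def nn_index(xy):
    """Index of each point's nearest neighbour (brute force, Euclidean)."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.argmin(axis=1)

def q_and_r(xy):
    nn = nn_index(xy)
    n = len(nn)
    # R: points that are the NN of their own NN (twice the number of reflexive pairs)
    R = int(np.sum(nn[nn] == np.arange(n)))
    # k[x] = number of points having x as their NN; Q = sum_x k(k-1) = 2(Q_2 + 3Q_3 + ...)
    k = np.bincount(nn, minlength=n)
    Q = int(np.sum(k * (k - 1)))
    return Q, R

def expected_qr_under_csr(n, n_sim=200, seed=0):
    """Monte Carlo estimates of E[Q] and E[R] for n CSR points in the unit square
    (no edge correction)."""
    rng = np.random.default_rng(seed)
    draws = np.array([q_and_r(rng.random((n, 2))) for _ in range(n_sim)])
    return draws[:, 0].mean(), draws[:, 1].mean()

print(q_and_r(np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])))   # (2, 2)
print(expected_qr_under_csr(100))
```

In the three-point example, points 0 and 1 are each other's NN, so $R = 2$. Point 1 is also the NN of point 2, so two points share a NN and $Q = 2$.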

Abstract:
Multivariate interaction between two or more classes (or species) has important consequences in many fields and causes multivariate clustering patterns such as segregation or association. Spatial segregation occurs when members of a class tend to be found near members of the same class (i.e., near conspecifics), while spatial association occurs when members of a class tend to be found near members of the other class or classes. These patterns can be studied using a nearest neighbor contingency table (NNCT). The null hypothesis is randomness in the nearest neighbor (NN) structure, which may result from, among other patterns, random labeling (RL) or complete spatial randomness (CSR) of points from two or more classes (henceforth called CSR independence). In this article, we introduce new versions of overall and cell-specific tests based on NNCTs (i.e., NNCT-tests) and compare them with Dixon's overall and cell-specific tests. These NNCT-tests provide information on the spatial interaction between the classes at small scales (i.e., around the average NN distances between the points). Overall tests are used to detect any deviation from the null case, while the cell-specific tests are post hoc pairwise spatial interaction tests that are applied when the overall test yields a significant result. We analyze the distributional properties of these tests and assess their finite sample performance by an extensive Monte Carlo simulation study. Furthermore, we show that the new NNCT-tests perform better than Dixon's tests in terms of Type I error and power. We also illustrate these NNCT-tests on two real-life data sets.
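Cell-specific NNCT-tests standardize each cell count by its null mean and variance. Under RL the mean has a simple closed form, $E[N_{ij}] = n_i (n_j - \delta_{ij}) / (n-1)$. The reason is that, given a base point of class $i$, its fixed NN carries one of the other $n - 1$ labels uniformly at random. The sketch below computes these expectations and checks them by relabeling a fixed, hypothetical NN map. The variance, which also depends on $Q$ and $R$, is not reproduced here, and the function names are illustrative.

```python
import numpy as np

def expected_nnct_rl(sizes):
    """E[N_ij] = n_i (n_j - delta_ij) / (n - 1) under random labelling."""
    sizes = np.asarray(sizes, dtype=float)
    return (np.outer(sizes, sizes) - np.diag(sizes)) / (sizes.sum() - 1)

def mc_mean_nnct(nn, sizes, n_perm=5000, seed=0):
    """Monte Carlo average NNCT over random relabellings of a fixed NN map nn."""
    rng = np.random.default_rng(seed)
    k = len(sizes)
    labels = np.repeat(np.arange(k), sizes)
    total = np.zeros((k, k))
    for _ in range(n_perm):
        lab = rng.permutation(labels)
        np.add.at(total, (lab, lab[nn]), 1)
    return total / n_perm

sizes = [4, 6]
nn = np.array([1, 0, 3, 2, 5, 4, 7, 6, 9, 8])   # hypothetical NN map (all reflexive pairs)
print(expected_nnct_rl(sizes))
print(mc_mean_nnct(nn, sizes))
```

Each row of the expected table sums to the class size $n_i$, because every base point has exactly one NN.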

Abstract:
In contingency table analysis, the odds ratio is a commonly applied measure used to summarize the degree of association between two categorical variables, say R and S. Suppose now that for each individual in the table, a vector of continuous variables X is also observed. It is then vital to analyze whether and how the degree of association varies with X. In this work, we extend the classical odds ratio to the conditional case and develop nonparametric estimators of this "pointwise odds ratio" to summarize the strength of local association between R and S given X. To allow for maximum flexibility, we make this extension using kernel regression. We develop confidence intervals based on these nonparametric estimators. We demonstrate via simulation that our pointwise odds ratio estimators can outperform model-based counterparts from logistic regression and GAMs, without requiring a linearity or additivity assumption. Finally, we illustrate the method on a dataset of patients from an intensive care unit (ICU), offering greater insight into how the association between survival and admission type (emergency versus elective) varies with the patients' ages.
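A minimal sketch of a kernel-based pointwise odds ratio for binary R and S is shown below. It assumes a scalar X and a Gaussian kernel with a user-chosen bandwidth `h`. It estimates the four cell probabilities at $x_0$ by Nadaraya-Watson weighting, $\hat\pi_{rs}(x_0) = \sum_i K_h(x_0 - X_i)\,1\{R_i = r, S_i = s\} / \sum_i K_h(x_0 - X_i)$, and returns $\hat\pi_{11}\hat\pi_{00} / (\hat\pi_{10}\hat\pi_{01})$. The paper's estimators, bandwidth selection and confidence intervals may differ, and the function name and toy data are hypothetical.

```python
import numpy as np

def pointwise_odds_ratio(x0, X, R, S, h):
    """Kernel-weighted odds ratio of binary R and S at covariate value x0.
    Nadaraya-Watson estimates of the four cell probabilities, Gaussian kernel,
    scalar X, bandwidth h (all assumptions of this sketch)."""
    w = np.exp(-0.5 * ((np.asarray(X) - x0) / h) ** 2)
    def cell(r, s):
        return np.sum(w * ((R == r) & (S == s))) / np.sum(w)
    return cell(1, 1) * cell(0, 0) / (cell(1, 0) * cell(0, 1))

X = np.linspace(0.0, 1.0, 50)
R = np.array([1] * 30 + [0] * 20)
S = np.array([1] * 20 + [0] * 10 + [1] * 5 + [0] * 15)
# a huge bandwidth weights all subjects equally, recovering the classical odds ratio 20*15/(10*5)
print(pointwise_odds_ratio(0.5, X, R, S, h=1e6))
```

As the bandwidth grows, the estimator reduces to the classical odds ratio of the whole table. Smaller bandwidths localize it around $x_0$, at the cost of possibly empty cells (and division by zero) where few subjects are nearby.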

Abstract:
A reference set, or a fiber, of a contingency table is the space of all realizations of the table under a given set of constraints such as marginal totals. Understanding the geometry of this space is a key problem in algebraic statistics, important for conducting exact conditional inference, calculating cell bounds, imputing missing cell values, and assessing the risk of disclosure of sensitive information. Motivated primarily by disclosure limitation problems where constraints can come from summary statistics other than the margins, in this paper we study the space $\mathcal{F_T}$ of all possible multi-way contingency tables for a given sample size and set of observed conditional frequencies. We show that this space can be decomposed according to different possible marginals, which, in turn, are encoded by the solution set of a linear Diophantine equation. We characterize the difference between two fibers: $\mathcal{F_T}$ and the space of tables for a given set of corresponding marginal totals. In particular, we solve a generalization of an open problem posed by Dobra et al. (2008). Our decomposition of $\mathcal{F_T}$ has two important consequences: (1) we derive new cell bounds, some of which are connected to directed acyclic graphs, and (2) we describe a structure for the Markov bases of the space $\mathcal{F_T}$ that simplifies the calculation of Markov bases in this particular setting.
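For a two-way table with observed conditional frequencies $c_{j|i} = n_{ij}/n_{i+}$, the role of the Diophantine equation can be sketched as follows. Each row total must be a multiple $n_{i+} = k_i d_i$ of the least common denominator $d_i$ of that row's conditional frequencies. The feasible margins are therefore the solutions of $d_1 k_1 + \dots + d_r k_r = n$. The sketch assumes exact rational conditionals whose rows sum to 1, and it requires $k_i \ge 1$ so that every conditional frequency is defined. It enumerates the resulting tables, and the function names are hypothetical. It illustrates the idea only for the two-way case and is not the paper's general multi-way construction.

```python
from fractions import Fraction
from math import lcm

def row_steps(cond):
    """d_i: the smallest positive row total making n_i+ * c_{j|i} an integer for every j."""
    return [lcm(*(c.denominator for c in row)) for row in cond]

def tables_with_conditionals(n, cond):
    """All r x c tables with grand total n whose rows reproduce the conditional
    frequencies c_{j|i} exactly. Row totals are n_i+ = k_i d_i, so the feasible
    margins solve d_1 k_1 + ... + d_r k_r = n with every k_i >= 1."""
    cond = [[Fraction(c) for c in row] for row in cond]
    d = row_steps(cond)

    def solutions(i, remaining):
        # enumerate positive integer solutions (k_i, ..., k_r) of the Diophantine equation
        if i == len(d):
            if remaining == 0:
                yield []
            return
        for k in range(1, remaining // d[i] + 1):
            for rest in solutions(i + 1, remaining - k * d[i]):
                yield [k] + rest

    return [[[int(k * di * c) for c in row] for k, di, row in zip(ks, d, cond)]
            for ks in solutions(0, n)]

cond = [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 3), Fraction(2, 3)]]
print(tables_with_conditionals(10, cond))   # [[[2, 2], [2, 4]]]
print(len(tables_with_conditionals(20, cond)))
```

For $n = 10$, the equation $2k_1 + 3k_2 = 10$ has the single positive solution $(2, 2)$, which gives row totals 4 and 6. For $n = 20$ there are three solutions, so the fiber splits into three pieces, one for each feasible margin.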

Abstract:
We study the geometric structure of the statistical models for two-by-two contingency tables. One or two odds ratios are fixed and the corresponding models are shown to be a portion of a ruled quadratic surface or a segment. Some pointers to the general case of two-way contingency tables are also given and an application to case-control studies is presented.
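Fixing one odds ratio $\theta$ restricts the probabilities of a $2 \times 2$ table to the quadric $p_{11} p_{22} = \theta\, p_{12} p_{21}$. For given margins $r = p_{1+}$ and $c = p_{+1}$, substituting $p_{12} = r - p_{11}$, $p_{21} = c - p_{11}$, $p_{22} = 1 - r - c + p_{11}$ gives the quadratic $(1-\theta)p_{11}^2 + [1 - r - c + \theta(r+c)]\,p_{11} - \theta r c = 0$. This quadratic has exactly one root in the admissible range, so each margin pair picks out a single point of the surface. The sketch below solves for that point. The function name and tolerances are illustrative assumptions.

```python
import math

def p11_from_margins(r, c, theta):
    """Cell p11 of the 2x2 probability table with row margin r = p1+,
    column margin c = p+1 and odds ratio theta > 0."""
    lo, hi = max(0.0, r + c - 1.0), min(r, c)
    a = 1.0 - theta
    b = 1.0 - r - c + theta * (r + c)
    q = -theta * r * c
    if abs(a) < 1e-12:            # theta == 1: independence, the equation is linear
        return r * c
    disc = math.sqrt(b * b - 4 * a * q)
    for root in ((-b + disc) / (2 * a), (-b - disc) / (2 * a)):
        if lo - 1e-12 <= root <= hi + 1e-12:
            return root
    raise ValueError("no admissible root")

r, c, theta = 0.3, 0.6, 2.0
p11 = p11_from_margins(r, c, theta)
p12, p21, p22 = r - p11, c - p11, 1 - r - c + p11
print(p11, p11 * p22 / (p12 * p21))   # odds ratio of the reconstructed table equals theta
```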