Abstract:
This paper addresses the problem of inference for a
multinomial regression model in the presence of likelihood monotonicity. We
propose translating the multinomial regression problem into a conditional
logistic regression problem; using existing techniques to reduce this
conditional logistic regression problem to one with fewer observations and
fewer covariates, in such a way that the probabilities for the canonical
sufficient statistic of interest, conditional on the remaining sufficient
statistics, are unchanged; and translating the reduced conditional logistic
regression problem back to the multinomial regression setting. The reduced
multinomial regression problem does not exhibit monotonicity of its
likelihood, and so conventional asymptotic techniques can be used.

Abstract:
We obtain two theorems extending the use of a saddlepoint approximation to multiparameter problems for likelihood ratio-like statistics which allow their use in permutation and rank tests and could be used in bootstrap approximations. In the first, we show that in some cases when no density exists, the integral of the formal saddlepoint density over the set corresponding to large values of the likelihood ratio-like statistic approximates the true probability with relative error of order $1/n$. In the second, we give multivariate generalizations of the Lugannani--Rice and Barndorff-Nielsen or $r^*$ formulas for the approximations. These theorems are applied to obtain permutation tests based on the likelihood ratio-like statistics for the $k$ sample and the multivariate two-sample cases. Numerical examples are given to illustrate the high degree of accuracy, and these statistics are compared to the classical statistics in both cases.
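The univariate Lugannani--Rice formula that these theorems generalize can be sketched concretely. The following illustrative check (not the paper's multivariate method) applies it to the tail of a sum of $n$ standard exponential variables, whose cumulant generating function is $K(s) = -n\log(1-s)$, and compares against the exact gamma tail:

```python
import math

def phi(z):
    # Standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    # Standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lr_tail_exp_sum(n, t):
    """Lugannani-Rice approximation to P(S_n >= t) for S_n a sum of
    n independent Exponential(1) variables, with K(s) = -n*log(1-s)."""
    s_hat = 1.0 - n / t                  # solves K'(s) = n/(1-s) = t
    K = -n * math.log(1.0 - s_hat)
    Kpp = n / (1.0 - s_hat) ** 2
    w = math.copysign(math.sqrt(2.0 * (s_hat * t - K)), s_hat)
    u = s_hat * math.sqrt(Kpp)
    return 1.0 - Phi(w) + phi(w) * (1.0 / u - 1.0 / w)

def exact_tail(n, t):
    # P(Gamma(n,1) >= t) = exp(-t) * sum_{k<n} t^k / k!
    return math.exp(-t) * sum(t ** k / math.factorial(k) for k in range(n))

approx = lr_tail_exp_sum(5, 10.0)
exact = exact_tail(5, 10.0)
assert abs(approx - exact) / exact < 0.01   # relative error well under 1%
```

Even at $n = 5$ the relative error is a small fraction of a percent, which is the kind of accuracy the numerical examples in the paper report for the multivariate extensions.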

Abstract:
We extend known saddlepoint tail probability approximations to multivariate cases, including multivariate conditional cases. Our approximation applies to both continuous and lattice variables, and requires the existence of a cumulant generating function. The method is applied to several examples, including a real data set from a case-control study of endometrial cancer. The method involves fewer terms and is easier to implement than existing methods, while showing comparable accuracy.

Abstract:
Consider a model parameterized by a scalar parameter of interest and a nuisance parameter vector. Inference about the parameter of interest may be based on the signed root of the likelihood ratio statistic, R. The standard normal approximation to the conditional distribution of R typically has error of order O(n^{-1/2}), where n is the sample size. Several modifications of R reduce the order of error of the approximation. In this paper, we investigate Barndorff-Nielsen's modified directed likelihood ratio statistic, Severini's empirical adjustment, and DiCiccio and Martin's two modifications, involving the Bayesian approach and the conditional likelihood ratio statistic. For each modification, two formats were employed to approximate the conditional cumulative distribution function: the Barndorff-Nielsen format and the Lugannani and Rice format. All approximations were applied to inference on the ratio of means for two independent exponential random variables. We constructed one- and two-sided hypothesis tests and used the actual sizes of the tests as the measure of accuracy for comparing the approximations.
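A minimal sketch of the unmodified statistic R in the running example above (the ratio of means of two independent exponential samples) can be written from the profile likelihood; the data values below are hypothetical, and none of the modifications studied in the paper are implemented here:

```python
import math

def profile_loglik(psi, sx, m, sy, n):
    # X_1..X_m ~ Exp(mean mu), Y_1..Y_n ~ Exp(mean psi*mu); sx, sy are sums.
    # For fixed psi, the nuisance MLE is mu_hat = (sx + sy/psi) / (m + n).
    mu = (sx + sy / psi) / (m + n)
    return (-m * math.log(mu) - sx / mu
            - n * math.log(psi * mu) - sy / (psi * mu))

def signed_root(psi0, sx, m, sy, n):
    # R = sign(psi_hat - psi0) * sqrt(2 * (l_p(psi_hat) - l_p(psi0)))
    psi_hat = (sy / n) / (sx / m)        # MLE of the ratio of means
    l_hat = profile_loglik(psi_hat, sx, m, sy, n)
    l0 = profile_loglik(psi0, sx, m, sy, n)
    return math.copysign(math.sqrt(max(0.0, 2.0 * (l_hat - l0))), psi_hat - psi0)

# Hypothetical sufficient statistics: sum 12.0 over m=10 X's, sum 30.0 over n=10 Y's
sx, m, sy, n = 12.0, 10, 30.0, 10
r = signed_root(1.0, sx, m, sy, n)       # test psi = 1 (equal means)
p_one_sided = 0.5 * (1.0 - math.erf(r / math.sqrt(2.0)))  # P(Z > r)
```

The standard normal approximation used for `p_one_sided` here is exactly the O(n^{-1/2}) baseline whose error the modified statistics are designed to reduce.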

Abstract:
In this manuscript, we present a practical way to find the matching priors proposed by Welch & Peers (1963) and Peers (1965). We investigate the use of saddlepoint approximations combined with matching priors to obtain p-values for a test of a parameter of interest in the presence of nuisance parameters. The advantage of our procedure is the flexibility of choosing different initial conditions, so that one can adjust the performance of the test. Two examples are studied, with coverage verified via Monte Carlo simulation. One concerns the ratio of two exponential means, and the other concerns the logistic regression model. In particular, we are interested in small-sample settings.

Abstract:
Recently, Liu and Wang derived the likelihood ratio test (LRT) statistic and its asymptotic distribution for testing equality of two multinomial distributions against the alternative that the second distribution is larger in terms of increasing convex order (ICX). ICX is less restrictive than stochastic order and is a notion that has found applications in insurance and actuarial science. In this paper we propose a new test for ICX. The new test has several advantages over the LRT and over any test procedure that depends on asymptotic theory for implementation. The advantages include the following: (i) The test is exact (non-asymptotic). (ii) The test is performed by conditioning on marginal column totals (and row totals in a full multinomial model for a $2\times C$ table). (iii) The test has desirable monotonicity properties; that is, the test is monotone in all practical directions (to be formally defined). (iv) The test can be carried out with the aid of a computer program. (v) The test has good power properties against a wide variety of possible alternatives. (vi) The test is admissible. The basis of the new test is the directed chi-square methodology developed by Cohen, Madigan, and Sackrowitz.

Abstract:
This cross-sectional observational study used 1999 national VA (US Department of Veterans Affairs) pharmacy, inpatient and outpatient utilization, and laboratory data on diabetic veterans. We adjusted individual A1c levels for available domains of complexity: age, social support (marital status), comorbid illnesses, and severity of disease (insulin use). We used adjusted A1c values to generate VA medical center level performance measures, and compared medical center ranks using adjusted versus unadjusted A1c levels across several thresholds of A1c (8.0%, 8.5%, 9.0%, and 9.5%). The adjustment model had R2 = 8.3% with stable parameter estimates on thirty random 50% resamples. Adjustment for patient complexity resulted in the greatest rank differences in the best and worst performing deciles, with similar patterns across all tested thresholds. Adjustment for complexity resulted in large differences in identified best and worst performers at all tested thresholds. Current performance measures of glycemic control may not be reliably identifying quality problems, and tying reimbursements to such measures may compromise the care of complex patients. Patient complexity has recently been raised as an important issue in patient care and quality assessment [1-4]. While complexity from multiple medical conditions has been increasingly discussed [1-4], there are important additional sources of complexity that directly impact patient care. For example, patients' behavior and availability of psychosocial support mechanisms may directly impact clinical decision-making. The Vector Model of Complexity proposes that a patient's complexity arises out of interactions between six domains: biology/genetics, socioeconomics, culture, environment/ecology, behavior, and the medical system [5]. Currently, the only aspect of patient complexity included in quality assessments is patient age, because most performance measures for accountability exclude older individuals. The influence of complexity o

Abstract:
Open-sourcing modelling tools and code generators is becoming increasingly important as open source software as a whole gains importance. We evaluate the impact that the open source licenses of code generators have on the intellectual property (IP) of generated artifacts, comparing the most common open source licenses by the categories found in the literature. Restrictively licensed generators do have effects on the IP, and therefore on the usability, of the artifacts they produce. We then show how these effects can be shaped to the needs of the licensor and the licensee.

Abstract:
A fundamental unit of work in programming is the code contribution ("commit") that a developer makes to the code base of the project they work on. An author's commit frequency describes how often that author commits. Knowing the distribution of all commit frequencies is a fundamental part of understanding software development processes. This paper presents a detailed quantitative analysis of commit frequencies in open-source software development. The analysis is based on a large sample of open source projects and presents the overall distribution of commit frequencies. We analyze the data to show the differences between authors and projects by project size; we also include a comparison of successful and unsuccessful projects, and we derive an activity indicator from these analyses. By measuring a fundamental dimension of programming we help improve software development tools and our understanding of software development. We also validate some fundamental assumptions about software development.

Abstract:
A fundamental unit of work in programming is the code contribution ("commit") that a developer makes to the code base of the project they work on. We use statistical methods to derive a model of the probability distribution of commit sizes in open source projects, and we show that the model is applicable to different project sizes. We use both graphical and statistical methods to validate the goodness of fit of our model. By measuring and modeling a fundamental dimension of programming we help improve software development tools and our understanding of software development.
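As a hypothetical illustration of the kind of statistical goodness-of-fit validation described (the paper's actual model and data are not reproduced here, and the log-normal form below is an assumption for the sketch), one can compare an empirical commit-size distribution against a fitted candidate distribution via a Kolmogorov-Smirnov distance:

```python
import math
import random

random.seed(42)
# Synthetic "commit sizes" (e.g. lines changed) drawn from a log-normal model;
# this stands in for real repository data, which is not reproduced here.
sizes = [random.lognormvariate(3.0, 1.2) for _ in range(2000)]

# Fit log-normal parameters by the sample moments of log(size)
logs = [math.log(s) for s in sizes]
mu = sum(logs) / len(logs)
sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / (len(logs) - 1))

def lognorm_cdf(x):
    # CDF of the fitted log-normal: Phi((log x - mu) / sigma)
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

# Kolmogorov-Smirnov distance between the empirical and fitted CDFs
sizes.sort()
n = len(sizes)
ks = max(
    max((i + 1) / n - lognorm_cdf(x), lognorm_cdf(x) - i / n)
    for i, x in enumerate(sizes)
)
```

A small KS distance relative to the sample size supports the fitted model; a graphical check would overlay the two CDFs (or a quantile-quantile plot) on the same axes.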