Abstract:
Marginally specified models have recently become a popular tool for discrete longitudinal data analysis. Nonetheless, they introduce complex constraint equations and model fitting algorithms. Moreover, there is a lack of available software to fit these models. In this paper, we propose a three-level marginally specified model for the analysis of multivariate longitudinal binary response data. The implicit function theorem is introduced to approximately solve the marginal constraint equations explicitly. Furthermore, the use of the \textit{probit} link enables direct solutions to the convolution equations. We propose an R package, \textbf{pnmtrem}, to fit the model. A simulation study is conducted to examine the properties of the estimator. We illustrate the model on the Iowa Youth and Families Project data set.
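
The remark that the probit link gives direct solutions to the convolution equations can be illustrated with a well-known identity: if b ~ N(0, sigma^2), then E[Phi(eta + b)] = Phi(eta / sqrt(1 + sigma^2)). The sketch below checks this numerically for a single normal random intercept. It is only an illustration under that simplifying assumption, not the three-level model of the paper or the code of pnmtrem; all function names and the values of eta and sigma are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, i.e. the inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def marginal_prob_closed_form(eta, sigma):
    """Closed form of E[Phi(eta + b)] for b ~ N(0, sigma^2)."""
    return std_normal_cdf(eta / math.sqrt(1.0 + sigma ** 2))


def marginal_prob_numeric(eta, sigma, n_steps=20001, width=10.0):
    """Trapezoidal approximation of the same expectation over b."""
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (n_steps - 1)
    total = 0.0
    for k in range(n_steps):
        b = lo + k * h
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        # End points get half weight in the trapezoidal rule.
        weight = 0.5 if k in (0, n_steps - 1) else 1.0
        total += weight * std_normal_cdf(eta + b) * density
    return total * h


# Hypothetical conditional linear predictor and random-effect standard deviation.
closed = marginal_prob_closed_form(0.7, 1.5)
numeric = marginal_prob_numeric(0.7, 1.5)
print(closed, numeric)
```

With a logit link no such closed form exists, which is one reason the convolution step usually needs numerical integration in that case.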

Abstract:
Forecasting with longitudinal data has rarely been studied. Most of the available studies consider a continuous response, and all of them consider a univariate response. In this study, we consider forecasting multivariate longitudinal binary data. Five different models, ranging from simple ones (univariate and multivariate marginal models) to complex ones (marginally specified models), are studied to forecast such data. Model forecasting abilities are illustrated via a real-life data set and a simulation study. The simulation study includes a model-independent data generation scheme to provide a fair environment for model comparison. Independent variables are forecast as well as the dependent ones to mimic real-life cases as closely as possible. Several accuracy measures are considered to compare model forecasting abilities. Results show that complex models yield better forecasts.

Abstract:
This paper provides a theoretical and computational justification of the long-held claim of the similarity of the probit and logit link functions often used in binary classification. Despite the widespread recognition of the strong similarities between these two link functions, very few (if any) researchers have dedicated time to a formal study aimed at firmly establishing and characterizing all aspects of the similarities and differences. This paper proposes a definition of both structural and predictive equivalence of link-function-based binary regression models, and explores the various ways in which they are either similar or dissimilar. From a predictive analytics perspective, it turns out that not only are probit and logit perfectly predictively concordant, but other link functions such as cauchit and complementary log-log also enjoy a very high percentage of predictive equivalence. Throughout this paper, simulated and real-life examples demonstrate all the equivalence results that we prove theoretically.
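
As a rough numerical illustration of how close the two links are, the sketch below compares the logistic CDF with the standard normal CDF after rescaling. The constant 1.702 is a commonly used logit-to-probit scaling factor and is an assumption here, not something taken from the abstract; the grid and variable names are also hypothetical. The second check only shows that both symmetric links cross 0.5 at a zero linear predictor; it does not reproduce the paper's formal notion of predictive equivalence, which concerns fitted models.

```python
import math


def logistic_cdf(x):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x):
    """Inverse probit link."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


SCALE = 1.702  # assumed scaling constant aligning the logit with the probit

# Evaluate both curves on a grid of linear-predictor values in [-8, 8].
grid = [i / 100.0 for i in range(-800, 801)]
max_gap = max(abs(logistic_cdf(SCALE * x) - normal_cdf(x)) for x in grid)

# With a 0.5 cut-off, both links assign the same label to every grid point,
# because each inverse link equals 0.5 exactly when the linear predictor is 0.
same_label = all((logistic_cdf(x) >= 0.5) == (normal_cdf(x) >= 0.5) for x in grid)

print(max_gap, same_label)
```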

Abstract:
This paper develops nonparametric estimation for discrete choice models based on the mixed multinomial logit (MMNL) model. It has been shown that MMNL models encompass all discrete choice models derived under the assumption of random utility maximization, subject to the identification of an unknown distribution $G$. Noting the mixture model description of the MMNL, we employ a Bayesian nonparametric approach, using nonparametric priors on the unknown mixing distribution $G$, to estimate choice probabilities. We provide important theoretical support for the proposed methodology by investigating consistency of the posterior distribution for a general nonparametric prior on the mixing distribution. Consistency is defined according to an $L_1$-type distance on the space of choice probabilities and is achieved by extending to a regression model framework a recent approach to strong consistency based on the summability of square roots of prior probabilities. Moving to estimation, slightly different techniques for non-panel and panel data models are discussed. For practical implementation, we describe efficient and relatively easy-to-use blocked Gibbs sampling procedures. These procedures are based on approximations of the random probability measure by classes of finite stick-breaking processes. A simulation study is also performed to investigate the performance of the proposed methods.
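
To make the finite stick-breaking approximation concrete, the following minimal sketch draws the weights of a truncated stick-breaking process. It assumes Beta(1, alpha) stick fractions, which corresponds to the Dirichlet process special case; the abstract refers to more general classes, and the blocked Gibbs sampler itself is not shown. The function name, truncation level and concentration value are hypothetical.

```python
import random


def stick_breaking_weights(n_atoms, concentration, rng):
    """Draw weights of a stick-breaking process truncated at n_atoms atoms."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for k in range(n_atoms):
        # The last fraction is fixed at 1 so that the weights sum to one.
        if k == n_atoms - 1:
            frac = 1.0
        else:
            frac = rng.betavariate(1.0, concentration)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights


rng = random.Random(0)  # fixed seed for reproducibility
weights = stick_breaking_weights(20, 2.0, rng)
print(sum(weights), max(weights))
```

Pairing each weight with an atom drawn from a base distribution would give one draw of the approximating random mixing distribution $G$.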

Abstract:
This study discusses the effects of correlation among responses on estimator properties in the mixed logit model for multivariate binary responses. It is assumed that each respondent is observed for T responses. Yit is the tth response for the ith individual/subject, and each response is binary. Each subject has a covariate Xi (individual characteristic) and a covariate Zijt (characteristic of alternative j). The responses of individual i are represented by Yi = (Yi1,...,YiT). For simplicity, one individual characteristic is used along with the alternative characteristics. We study the effects of correlation using simulated data. The estimation methods used in this study are Generalized Estimating Equations (GEE) and Maximum Likelihood Estimation (MLE). We generate data and estimate parameters using R 2.10. From the simulated data, we conclude that MLE for the mixed logit model performs better than GEE. The higher the correlation among utilities, the larger the deviation of the estimator from the true parameter.

Abstract:
Motivated by generating personalized recommendations using ordinal (or preference) data, we study the question of learning a mixture of MultiNomial Logit (MNL) models, a parameterized class of distributions over permutations, from partial ordinal or preference data (e.g. pair-wise comparisons). Despite its long-standing importance across disciplines including social choice, operations research and revenue management, little is known about this question. In the case of single MNL models (no mixture), computationally and statistically tractable learning from pair-wise comparisons is feasible. However, even learning a mixture with two MNL components is infeasible in general. Given this state of affairs, we seek conditions under which it is feasible to learn the mixture model in a computationally and statistically efficient manner. We present a sufficient condition as well as an efficient algorithm for learning mixed MNL models from partial preferences/comparisons data. In particular, a mixture of $r$ MNL components over $n$ objects can be learnt using samples whose size scales polynomially in $n$ and $r$ (concretely, $r^{3.5}n^3(\log n)^4$, with $r\ll n^{2/7}$ when the model parameters are sufficiently incoherent). The algorithm has two phases: first, learn the pair-wise marginals for each component using tensor decomposition; second, learn the model parameters for each component using Rank Centrality introduced by Negahban et al. In the process of proving these results, we obtain a generalization of existing analysis for tensor decomposition to a more realistic regime where only partial information about each sample is available.
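
For readers unfamiliar with the pair-wise marginals mentioned above: under a single MNL model with positive item weights w, item i is preferred to item j with probability w_i / (w_i + w_j), and under a mixture this probability is averaged over components. The sketch below computes both quantities for a toy example. It is not the tensor decomposition or Rank Centrality step of the paper; the function names, weights and mixing proportions are hypothetical.

```python
def mnl_pairwise_prob(weights, i, j):
    """P(item i is preferred to item j) under one MNL component."""
    return weights[i] / (weights[i] + weights[j])


def mixture_pairwise_prob(mixing, components, i, j):
    """Pair-wise marginal of a mixture: component marginals averaged by mixing weight."""
    return sum(q * mnl_pairwise_prob(w, i, j) for q, w in zip(mixing, components))


# Two hypothetical components with opposite preferences over three items.
components = [[4.0, 2.0, 1.0], [1.0, 2.0, 4.0]]
mixing = [0.5, 0.5]

p01_first = mnl_pairwise_prob(components[0], 0, 1)
p01_second = mnl_pairwise_prob(components[1], 0, 1)
p01_mixture = mixture_pairwise_prob(mixing, components, 0, 1)
print(p01_first, p01_second, p01_mixture)
```

In this toy case the mixture's pair-wise marginal for items 0 and 1 is one half even though each component has a clear preference, which gives some intuition for why the aggregate comparisons alone can hide the component structure.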

Abstract:
Most of the available multivariate statistical models require fitting different parameters for the covariate effects on each of the multiple responses. This might be unnecessary and inefficient in some cases. In this article, we propose a modeling framework for multivariate marginal models to analyze multivariate longitudinal data, which provides flexible model building strategies. We show that the model handles several response families such as binomial, count and continuous. We illustrate the model on the Mother's Stress and Children's Morbidity data set. A simulation study is conducted to examine the parameter estimates. An R package, mmm2, is proposed to fit the model.

Abstract:
The study of longitudinal data plays a significant role in medicine, epidemiology and the social sciences. Typically, the interest is in the dependence of an outcome variable on the covariates. Generalized Linear Models (GLMs) were proposed to unify the regression approach for a wide variety of discrete and continuous longitudinal data. The responses (outcomes) in longitudinal data are usually correlated. Hence, we need to use an extension of GLMs that accounts for such correlation. This can be done by including random effects in the linear predictor, which yields the Generalized Linear Mixed Models (GLMMs), also called random effects models. The maximum likelihood estimates (MLE) are obtained for the regression parameters of a logit model when the traditional assumption of normal random effects is relaxed. In this case a more convenient distribution, such as the lognormal distribution, is used. However, adding non-normal random effects to the GLMM considerably complicates the likelihood estimation, so direct numerical evaluation techniques (such as Newton-Raphson) become analytically and computationally tedious. To overcome such problems, we propose and develop a Monte Carlo EM (MCEM) algorithm to obtain the maximum likelihood estimates. The proposed method is illustrated using simulated data.

Abstract:
We focus on the development of model selection criteria in linear
mixed models. In particular, we propose the model selection criteria following
the Mallows’ Conceptual Predictive Statistic (Cp) [1] [2] in linear mixed
models. When correlation exists between the observations in data, the normal
Gauss discrepancy in univariate case is not appropriate to measure the distance
between the true model and a candidate model. Instead, we define a marginal
Gauss discrepancy which takes the correlation into account in the mixed models.
The model selection criterion, marginal Cp, called MCp, serves as an
asymptotically unbiased estimator of the expected marginal Gauss discrepancy.
An improvement of MCp, called IMCp, is then derived and proved to be a more
accurate estimator of the expected marginal Gauss discrepancy than MCp. The
performance of the proposed criteria is investigated in a simulation study. The
simulation results show that in small samples, the proposed criteria outperform
the Akaike Information Criterion (AIC) [3] [4] and Bayesian Information
Criterion (BIC) [5] in selecting the correct model; in large samples, their
performance is competitive. Further, the proposed criteria perform
significantly better for highly correlated response data than for weakly
correlated data.

Abstract:
A new class of Marginal Structural Models (MSMs), History-Restricted MSMs (HRMSMs), was recently introduced for longitudinal data for the purpose of defining causal parameters which may often be better suited for public health research or at least more practicable than MSMs \cite{joffe,feldman}. HRMSMs allow investigators to analyze the causal effect of a treatment on an outcome based on a fixed, shorter and user-specified history of exposure compared to MSMs. By default, the latter represent the treatment causal effect of interest based on a treatment history defined by the treatments assigned between the study's start and outcome collection. We lay out in this article the formal statistical framework behind HRMSMs. Beyond allowing a more flexible causal analysis, HRMSMs improve computational tractability and mitigate statistical power concerns when designing longitudinal studies. We also develop three consistent estimators of HRMSM parameters under sufficient model assumptions: the Inverse Probability of Treatment Weighted (IPTW), G-computation and Double Robust (DR) estimators. In addition, we show that the assumptions commonly adopted for identification and consistent estimation of MSM parameters (existence of counterfactuals, consistency, time-ordering and sequential randomization assumptions) also lead to identification and consistent estimation of HRMSM parameters.