Abstract:
This paper describes the core features of the R package geepack, which implements the generalized estimating equations (GEE) approach for fitting marginal generalized linear models to clustered data. Clustered data arise in many applications such as longitudinal data and repeated measures. The GEE approach focuses on models for the mean of the correlated observations within clusters without fully specifying the joint distribution of the observations. It has been widely used in statistical practice. This paper illustrates the application of the GEE approach with geepack through an example of clustered binary data.
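
As a rough illustration of the idea behind GEE for clustered binary data, the following is a minimal sketch in plain Python, not geepack's implementation. It assumes a logistic marginal model with one covariate and an independence working correlation, and it computes a cluster-robust (sandwich) covariance. The function names `fit_gee_independence` and `sandwich_covariance` and the toy data are hypothetical, chosen only for this example.

```python
import math

# Toy clustered binary data (hypothetical): each cluster is a list of (x, y) pairs.
# The data were chosen so the fitted means at x = 0, 1, 2 are 1/3, 1/2, 2/3.
clusters = [
    [(0, 0), (1, 0), (2, 1)],
    [(0, 0), (1, 1), (2, 1)],
    [(0, 1), (1, 0), (2, 1)],
    [(0, 0), (1, 0), (2, 0)],
    [(0, 0), (1, 1), (2, 0)],
    [(0, 1), (1, 1), (2, 1)],
]


def expit(eta):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_gee_independence(clusters, n_iter=25):
    """Solve sum_i X_i^T (y_i - mu_i) = 0 for logit(mu) = b0 + b1 * x.

    With an independence working correlation and the logit link, the GEE
    reduces to the ordinary logistic score equations, solved here by
    Newton-Raphson on the two parameters.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for cluster in clusters:
            for x, y in cluster:
                mu = expit(b0 + b1 * x)
                w = mu * (1.0 - mu)  # binomial variance function
                s0 += y - mu
                s1 += x * (y - mu)
                h00 += w
                h01 += w * x
                h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} s for the 2x2 system
        b0 += (h11 * s0 - h01 * s1) / det
        b1 += (h00 * s1 - h01 * s0) / det
    return b0, b1


def sandwich_covariance(clusters, b0, b1):
    """Cluster-robust covariance H^{-1} M H^{-1} at the given estimate."""
    h00 = h01 = h11 = 0.0
    m00 = m01 = m11 = 0.0
    for cluster in clusters:
        u0 = u1 = 0.0  # score contribution of this cluster
        for x, y in cluster:
            mu = expit(b0 + b1 * x)
            w = mu * (1.0 - mu)
            h00 += w
            h01 += w * x
            h11 += w * x * x
            u0 += y - mu
            u1 += x * (y - mu)
        m00 += u0 * u0
        m01 += u0 * u1
        m11 += u1 * u1
    det = h00 * h11 - h01 * h01
    a00, a01, a11 = h11 / det, -h01 / det, h00 / det  # entries of H^{-1}
    am00 = a00 * m00 + a01 * m01
    am01 = a00 * m01 + a01 * m11
    am10 = a01 * m00 + a11 * m01
    am11 = a01 * m01 + a11 * m11
    return [
        [am00 * a00 + am01 * a01, am00 * a01 + am01 * a11],
        [am10 * a00 + am11 * a01, am10 * a01 + am11 * a11],
    ]


b0, b1 = fit_gee_independence(clusters)
cov = sandwich_covariance(clusters, b0, b1)
robust_se = (math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))
```

The point estimates here coincide with ordinary logistic regression; only the standard errors change, because the middle term of the sandwich sums score contributions within each cluster before squaring. Other working correlation structures (exchangeable, AR(1)) would change the estimating equation itself and are not covered by this sketch.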

Abstract:
We propose a two-step estimating procedure for generalized additive partially linear models with clustered data using estimating equations. Our proposed method applies to the case that the number of observations per cluster is allowed to increase with the number of independent subjects. We establish oracle properties for the two-step estimator of each function component such that it performs as well as the univariate function estimator by assuming that the parametric vector and all other function components are known. Asymptotic distributions and consistency properties of the estimators are obtained. Finite-sample experiments with both simulated continuous and binary response variables confirm the asymptotic results. We illustrate the methods with an application to a U.S. unemployment data set.

Abstract:
The NEXT Generation Health study investigates the dating violence of adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his/her dating relationship. There is, however, evidence suggesting that students not in a relationship responded to the survey, resulting in excessive zeros in the responses. This paper proposes likelihood-based and estimating equation approaches to analyze the zero-inflated clustered binary response data. We adopt a mixed model method to account for the cluster effect, and the model parameters are estimated using a maximum-likelihood (ML) approach that requires a Gauss-Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption on the random effects distribution may bias the results, we construct generalized estimating equations (GEE) that do not require the correct specification of within-cluster correlation. In a series of simulation studies, we examine the performance of ML and GEE methods in terms of their bias, efficiency and robustness. We illustrate the importance of properly accounting for this zero inflation by reanalyzing the NEXT data where this issue has previously been ignored.

Abstract:
Clustered binary data with a large number of covariates have become increasingly common in many scientific disciplines. This paper develops an asymptotic theory for generalized estimating equations (GEE) analysis of clustered binary data when the number of covariates grows to infinity with the number of clusters. In this "large $n$, diverging $p$" framework, we provide appropriate regularity conditions and establish the existence, consistency and asymptotic normality of the GEE estimator. Furthermore, we prove that the sandwich variance formula remains valid. Even when the working correlation matrix is misspecified, the use of the sandwich variance formula leads to an asymptotically valid confidence interval and Wald test for an estimable linear combination of the unknown parameters. The accuracy of the asymptotic approximation is examined via numerical simulations. We also discuss the "diverging $p$" asymptotic theory for general GEE. The results in this paper extend the recent elegant work of Xie and Yang [Ann. Statist. 31 (2003) 310--347] and Balan and Schiopu-Kratina [Ann. Statist. 32 (2005) 522--541] in the "fixed $p$" setting.
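
For orientation, the estimating equation and sandwich variance referred to above can be written in the standard Liang-Zeger form. The notation below ($X_i$, $A_i$, $R(\alpha)$, $H_n$, $M_n$) is an assumption made for this sketch and is not taken from the paper itself.

```latex
% Cluster i has response vector Y_i and design matrix X_i (m_i x p).
% For binary data with the logit link:
%   \mu_{ij}(\beta) = \frac{\exp(x_{ij}^\top \beta)}{1 + \exp(x_{ij}^\top \beta)}, \quad
%   A_i = \operatorname{diag}\{\mu_{ij}(1 - \mu_{ij})\}, \quad D_i = A_i X_i .
\begin{align*}
  V_i &= A_i^{1/2} R(\alpha) A_i^{1/2}, \\
  \sum_{i=1}^{n} D_i^\top V_i^{-1} \{ Y_i - \mu_i(\beta) \} &= 0, \\
  H_n &= \sum_{i=1}^{n} D_i^\top V_i^{-1} D_i, \\
  M_n &= \sum_{i=1}^{n} D_i^\top V_i^{-1}
         \{ Y_i - \mu_i(\hat\beta) \} \{ Y_i - \mu_i(\hat\beta) \}^\top
         V_i^{-1} D_i, \\
  \widehat{\operatorname{Cov}}(\hat\beta) &= H_n^{-1} M_n H_n^{-1}.
\end{align*}
```

Here $R(\alpha)$ is the working correlation matrix. The sandwich form stays valid when $R(\alpha)$ is misspecified because $M_n$ estimates the true covariance of the cluster-level scores empirically rather than relying on $V_i$.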

Abstract:
This study discusses the effects of correlation among responses on the properties of estimators in the mixed logit model for multivariate binary responses. Each respondent is assumed to be observed for T responses. Yit is the tth response for the ith individual/subject, and each response is binary, so the responses of individual i are represented by Yi = (Yi1, ..., YiT). Each subject has a covariate Xi (individual characteristic) and a covariate Zijt (characteristic of alternative j). For simplicity, one individual characteristic and the alternative characteristics were used. We studied the effects of correlation using simulated data. The estimation methods used in this study are Generalized Estimating Equations (GEE) and Maximum Likelihood Estimation (MLE). We generated data and estimated parameters using the software R 2.10. From the simulated data, we conclude that MLE for the mixed logit model performs better than GEE. The higher the correlation among utilities, the larger the deviation of the estimator from the parameter.

Abstract:
This study focuses on estimating the parameters of a marginal model for repeated binary responses through the Generalized Estimating Equations (GEE) methodology. GEE is applied to examine how certain covariates relate to changes in disease status over time. In addition, we focus on the methodology of GEE using conditional and unconditional residuals along with common correlation structures seen in longitudinal studies. Here, GEE is applied to data comprising four repeated binary observations on registered patients at BIRDEM. We demonstrate that the estimator of the correlation based on conditional residuals is nearly efficient when compared with maximum likelihood. This estimator also yields more efficient estimates of the correlation than the usual GEE estimator that is based on unconditional residuals. Finally, the results of applying the method to the data set are presented.

Abstract:
Longitudinal studies involving binary responses, which are widely applied in medical, health and economic research, have focused increasingly on how various independent variables affect responses over time. These studies involve repeated observations on a subject, and thus correlation within each subject is expected. Correct inferences can only be obtained by correctly specifying the within-subject correlation structure between repeated observations. In recent years, non-normal longitudinal data have been analyzed by the Generalized Estimating Equations (GEE) method. Goodness-of-fit statistics have been suggested for selecting an appropriate working correlation structure in GEE with longitudinal binary data. The purpose of this article is to provide an overview of the GEE approach for analyzing correlated binary data and to choose the structure of the correlation matrix between repeated observations for model comparison, using data from the Istanbul Stock Exchange (ISE) on increases in the return.

Abstract:
We characterize the expected statistical errors with which the parameters of black-hole binaries can be measured from gravitational-wave (GW) observations of their inspiral, merger and ringdown by a network of second-generation ground-based GW observatories. We simulate a population of black-hole binaries with uniform distribution of component masses in the interval $(3,80)~M_\odot$, distributed uniformly in comoving volume, with isotropic orientations. From signals producing signal-to-noise ratio $\geq 5$ in at least two detectors, we estimate the posterior distributions of the binary parameters using the Bayesian parameter estimation code LALInference. The GW signals will be redshifted due to the cosmological expansion and we measure only the "redshifted" masses. By assuming a cosmology, it is possible to estimate the gravitational masses by inferring the redshift from the measured posterior of the luminosity distance. We find that the measurement of the gravitational masses will be in general dominated by the error in measuring the luminosity distance. In spite of this, the component masses of more than $50\%$ of the population can be measured with accuracy better than $\sim 25\%$ using the Advanced LIGO-Virgo network. Additionally, the mass of the final black hole can be measured with median accuracy $\sim 18\%$. Spin of the final black hole can be measured with median accuracy $\sim 5\% ~(17\%)$ for binaries with non-spinning (aligned-spin) black holes. Additional detectors in Japan and India significantly improve the accuracy of sky localization, and moderately improve the estimation of luminosity distance, and hence, that of all mass parameters. We discuss the implication of these results on the observational evidence of intermediate-mass black holes and the estimation of cosmological parameters using GW observations.
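
The relation between the measured ("redshifted") mass and the gravitational mass, and the reason the distance error can dominate, can be sketched as follows. The symbols $m_z$, $m$ and $z$ are notation assumed for this sketch, and the error formula is a first-order propagation that treats the two errors as uncorrelated, which is an approximation rather than the paper's Bayesian analysis.

```latex
\begin{align*}
  m_z &= (1 + z)\, m
  \quad\Longrightarrow\quad
  m = \frac{m_z}{1 + z}, \\
  \left( \frac{\delta m}{m} \right)^{2}
  &\approx \left( \frac{\delta m_z}{m_z} \right)^{2}
         + \left( \frac{\delta z}{1 + z} \right)^{2}.
\end{align*}
```

Because $z$ is inferred from the luminosity distance under an assumed cosmology, a large distance uncertainty feeds into $\delta z$ and hence into the second term.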

Abstract:
An R package for specifying and estimating linear latent variable models is presented. The philosophy of the implementation is to separate the model specification from the actual data, which leads to a dynamic and easy way of modeling complex hierarchical structures. Several advanced features are implemented including robust standard errors for clustered correlated data, multigroup analyses, non-linear parameter constraints, inference with incomplete data, maximum likelihood estimation with censored and binary observations, and instrumental variable estimators. In addition an extensive simulation interface covering a broad range of non-linear generalized structural equation models is described. The model and software are demonstrated in data of measurements of the serotonin transporter in the human brain.

Abstract:
Generalized Estimating Equation (GEE) is a marginal model popularly applied for longitudinal/clustered data analysis in clinical trials or biomedical studies. We provide a systematic review of GEE, including basic concepts as well as several recent developments prompted by practical challenges in real applications. Topics including the selection of the “working” correlation structure, sample size and power calculation, and the issue of informative cluster size are covered because these aspects play important roles in the use of GEE and its statistical inference. A brief summary and discussion of potential research interests regarding GEE are provided at the end.

1. Introduction

Generalized Estimating Equation (GEE) is a general statistical approach to fit a marginal model for longitudinal/clustered data analysis, and it has been popularly applied in clinical trials and biomedical studies [1–3]. One longitudinal data example can be taken from a study of orthodontic measurements on children including 11 girls and 16 boys. The response is the measurement of the distance (in millimeters) from the center of the pituitary to the pterygomaxillary fissure, which is repeatedly measured at ages 8, 10, 12, and 14 years. The primary goal is to investigate whether there exists a significant gender difference in dental growth measures and in the temporal trend as age increases [4]. For such data analysis, it is obvious that the responses from the same individual tend to be “more alike”; thus incorporating within-subject and between-subject variations into model fitting is necessary to improve the efficiency of the estimation and the power [5]. Several simple methods exist for repeated data analysis, such as ANOVA/MANOVA for repeated measures, but their limitation is the inability to incorporate covariates. Two types of approaches, mixed-effect models and GEE [6, 7], are traditional and are now widely used in practice.
Of note is that these two methods have different tendencies in model fitting depending on the study objectives. In particular, the mixed-effect model is an individual-level approach that adopts random effects to capture the correlation between the observations of the same subject [7]. On the other hand, GEE is a population-level approach based on a quasilikelihood function and provides the population-averaged estimates of the parameters [8]. In this paper, we focus on the latter to provide a review and recent developments of GEE. As is well known, GEE has several defining features [9–11]. The variance-covariance matrix of responses