In deriving a regression model, analysts often have to use variable
selection, despite the problems introduced by data-dependent model
building. Resampling approaches have been proposed to handle some of the
critical issues. To assess and compare several strategies, we conduct a
simulation study with 15 predictors and a complex correlation structure in
the linear regression model. Using sample sizes of 100 and 400 and estimates of
the residual variance corresponding to R2 of 0.50 and 0.71, we consider four scenarios with varying amounts of information.
We also consider two examples with 24 and 13 predictors, respectively. We
discuss the value of cross-validation, shrinkage and backward
elimination (BE) with varying significance levels. We assess whether two-step
approaches using global or parameterwise shrinkage factors (PWSF) can improve selected models, and we compare the results to
models derived with the LASSO procedure. Besides MSE, we use model
sparsity and further criteria for model assessment. The amount of information
in the data influences the selected models and the comparison of the
procedures. None of the approaches was best in all scenarios. The
performance of backward elimination with a suitably chosen significance level
was not worse than that of the LASSO, and the models selected by BE were much sparser,
an important advantage for interpretation and transportability. Compared to
global shrinkage, PWSF performed better. Provided that the amount of
information is not too small, we conclude that BE followed by PWSF is a suitable
approach when variable selection is a key part of data analysis.
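To make the BE procedure concrete, the following is a minimal numpy-only sketch that repeatedly drops the predictor with the smallest |t|-statistic while that statistic falls below a threshold (|t| of about 2 roughly corresponds to a significance level of 0.05). The threshold value and data are illustrative assumptions; the simulation design of the study is not reproduced here.

```python
import numpy as np

def backward_elimination(X, y, t_thresh=2.0):
    """Greedy backward elimination on OLS t-statistics (illustrative sketch).

    Repeatedly refits OLS and drops the predictor with the smallest |t|
    as long as it falls below t_thresh; stops when all remaining
    predictors exceed the threshold. Returns surviving column indices.
    """
    cols = list(range(X.shape[1]))
    while cols:
        Xs = X[:, cols]
        n, p = Xs.shape
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        sigma2 = resid @ resid / (n - p)          # residual variance estimate
        cov = sigma2 * np.linalg.inv(Xs.T @ Xs)   # covariance of OLS estimates
        t = beta / np.sqrt(np.diag(cov))
        weakest = int(np.argmin(np.abs(t)))
        if abs(t[weakest]) >= t_thresh:
            break                                 # all survivors significant
        cols.pop(weakest)                         # drop weakest predictor
    return cols
```

Choosing a larger `t_thresh` (a smaller significance level) yields sparser models, which is the tuning knob the abstract refers to as the "suitably chosen significance level".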

Variable selection with a large number of predictors is a very challenging and important
problem in educational and social domains. However, relatively little attention
has been paid to issues of variable selection in longitudinal data with
application to education. Using longitudinal educational data (Test of
English for International Communication, TOEIC), this study compares multiple regression,
backward elimination, the group least absolute shrinkage and selection operator
(group LASSO), and linear mixed models in terms of their performance in variable selection.
The results show that the four statistical methods retain
different sets of predictors in their models. The linear mixed model (LMM)
retains the smallest number of predictors (4 out of a total of 19
predictors). In addition, LMM is the only method appropriate for the repeated
measurements and is the best method with respect to the principle of parsimony.
The study also provides an interpretation of the model selected by LMM in the
conclusion, using the marginal R2.
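The marginal R2 mentioned above is commonly computed with the Nakagawa-Schielzeth decomposition: the share of total variance attributable to the fixed effects alone. A minimal sketch, assuming the variance components have already been estimated from a fitted LMM:

```python
def marginal_r2(var_fixed, var_random, var_resid):
    """Nakagawa-Schielzeth marginal R^2 for a linear mixed model.

    var_fixed:  variance of the fixed-effect linear predictor
    var_random: total variance of the random effects (e.g. random intercepts)
    var_resid:  residual variance
    """
    return var_fixed / (var_fixed + var_random + var_resid)

# e.g. fixed-effect variance 2.0, random-intercept variance 1.0, residual 1.0
print(marginal_r2(2.0, 1.0, 1.0))  # → 0.5
```

The conditional R2 variant, which adds `var_random` to the numerator, describes fixed and random effects together; the marginal version used in the study isolates the selected fixed predictors.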
We consider the problem of variable selection
for single-index random-effects models with longitudinal data. An automatic
variable selection procedure is developed using a smooth-threshold method. The proposed
method shares some of the desirable features of existing variable selection
methods: the resulting estimator enjoys the oracle property, and the proposed
procedure avoids the convex optimization problem and is flexible and easy to
implement. Moreover, we use a penalized weighted deviance criterion for a
data-driven choice of the tuning parameters. Simulation studies are carried out
to assess the performance of our method, and a real dataset is analyzed.

In addition, a simulation on grouped variable selection is performed. Finally, the model is applied to two real datasets: the US Crime Data and the Gasoline Data. In terms of prediction error and estimation error, the empirical studies show the efficiency of LqCP.
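For reference, the standard LASSO fits that these abstracts compare against can be computed by coordinate descent with soft-thresholding. The sketch below covers only the plain L1 penalty (not the group or Lq variants discussed above) and assumes the objective (1/2)||y - Xb||^2 + lam * n * ||b||_1; the penalty parameterization and data are illustrative assumptions.

```python
import numpy as np

def soft(z, g):
    """Soft-thresholding operator: shrink z toward zero by g."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO (sketch) for (1/2)||y-Xb||^2 + lam*n*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)               # per-column sums of squares
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft(X[:, j] @ r, lam * n) / col_ss[j]
    return beta
```

Larger `lam` zeroes out more coefficients, which is how the LASSO performs selection and shrinkage in one step; the non-zero survivors are the selected variables.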