Search Results: 1 - 10 of 1002 matches for " Torsten Hothorn "
All listed articles are free for downloading (OA Articles)
Flexible boosting of accelerated failure time models
Matthias Schmid, Torsten Hothorn
BMC Bioinformatics, 2008, DOI: 10.1186/1471-2105-9-269
Abstract: We introduce a new boosting algorithm for censored time-to-event data which is suitable for fitting parametric accelerated failure time models. Estimation of the predictor function is carried out simultaneously with the estimation of the scale parameter, so that the negative log-likelihood of the survival distribution can be used as a loss function for the boosting algorithm. The estimation of the scale parameter does not affect the favorable properties of boosting with respect to variable selection. The analysis of a high-dimensional set of microarray data demonstrates that the new algorithm is able to outperform boosting with the Cox partial likelihood when the proportional hazards assumption is questionable. In low-dimensional settings, i.e., when classical likelihood estimation of a parametric accelerated failure time model is possible, simulations show that the new boosting algorithm closely approximates the estimates obtained from the maximum likelihood method. Predicting the expected time to event from a high-dimensional set of predictor variables has become increasingly important in recent years. A particularly interesting problem in this context is the analysis of studies relating patients' genotypes, for example measured via gene expression levels, to a clinical outcome such as "disease-free survival" or "time to progression". Survival models of this type share the common problems that are typical for the analysis of gene expression data: sample sizes are small while the number of potential predictors (i.e., gene expression levels) is extremely large. As a consequence, standard estimation techniques can no longer be applied. For these reasons, a variety of new methods for obtaining survival predictions from high-dimensional data have been suggested in the literature. Most of these methods focus on the Cox proportional hazards model [1], while some other methods have been developed for fitting semiparametric accelerated failure time (AFT) models [2].
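The boosted AFT approach described in this abstract is implemented in the authors' R package mboost via the Weibull() (and related log-normal and log-logistic) families. A minimal sketch, using simulated placeholder data rather than the microarray data from the paper:

```r
## Sketch: component-wise boosting of a Weibull accelerated failure
## time model with mboost. The Weibull() family estimates the scale
## parameter alongside the predictor function, as described above.
library(mboost)
library(survival)

## toy censored survival data (placeholders, for illustration only)
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
d  <- data.frame(
  time  = rweibull(n, shape = 2, scale = exp(0.5 * x1)),
  event = rbinom(n, 1, 0.8),
  x1 = x1, x2 = x2)

fit <- glmboost(Surv(time, event) ~ x1 + x2, data = d,
                family  = Weibull(),
                control = boost_control(mstop = 100))

coef(fit)  # covariates selected by boosting (nonzero coefficients)
```

Because estimation of the scale parameter does not interfere with variable selection, the coefficient vector stays sparse in high-dimensional settings, which is the main point of the method.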
Trend tests for the evaluation of exposure-response relationships in epidemiological exposure studies
Ludwig A. Hothorn, Michael Vaeth, Torsten Hothorn
Epidemiologic Perspectives and Innovations, 2009, DOI: 10.1186/1742-5573-6-1
Abstract: One possibility for the statistical evaluation of trends in epidemiological exposure studies is the use of a trend test for data organized in a 2 × k contingency table. Commonly, the exposure data are naturally grouped, or continuous exposure data are appropriately categorized. The trend test should be sensitive to any shape of the exposure-response relationship. A global trend test, however, only determines whether there is a trend or not; once a trend is seen, it is important to identify the likely shape of the exposure-response relationship. This paper introduces a best-contrast approach and an alternative approach based on order-restricted information criteria for the model selection of a particular exposure-response relationship. For the simple change-point alternative H1: π1 = ... = πq < πq+1 = ... = πk, an appropriate approach for the identification of a global trend, as well as of the most likely shape of that exposure-response relationship, is characterized by simulation and demonstrated on real data examples. Power and simultaneous confidence intervals can be estimated as well. If the conditions are fulfilled to transform the exposure-response data into a 2 × k table, a simple approach for identification of a global trend and its elementary shape is available to epidemiologists.
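For orientation, the classical global trend test for a 2 × k table that this paper improves upon is the Cochran-Armitage test, available in base R; the counts below are made up for illustration. The paper's best-contrast and order-restricted approaches go further by also identifying the shape of the relationship.

```r
## Sketch: a global trend test for a 2 x k contingency table using
## the Cochran-Armitage test from base R (stats::prop.trend.test).
events <- c(10, 12, 18, 25)        # cases per exposure category
totals <- c(100, 100, 100, 100)    # subjects per exposure category
prop.trend.test(events, totals)    # tests for a monotone trend in
                                   # event proportions across categories
```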
Boosting Algorithms: Regularization, Prediction and Model Fitting
Peter Bühlmann, Torsten Hothorn
Statistics, 2008, DOI: 10.1214/07-STS242
Abstract: We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions.
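The mboost workflow the abstract refers to can be sketched as follows; the data here are simulated placeholders, and the base-learners bols() (linear effects) and bbs() (P-spline smooth effects) are the package's building blocks:

```r
## Sketch: model fitting, tuning and prediction with mboost.
library(mboost)

set.seed(1)
d <- data.frame(x1 = runif(100), x2 = runif(100))
d$y <- 2 * d$x1 + sin(2 * pi * d$x2) + rnorm(100, sd = 0.3)

## gradient boosting with a linear effect for x1, smooth for x2
fit <- gamboost(y ~ bols(x1) + bbs(x2), data = d,
                control = boost_control(mstop = 200))

cvr <- cvrisk(fit)     # resampled risk to choose the stopping iteration
fit[mstop(cvr)]        # set the model to the selected iteration
head(predict(fit))     # fitted values at the chosen mstop
```

Early stopping via cvrisk() is what provides the regularization and intrinsic variable selection discussed in the abstract.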
Rejoinder: Boosting Algorithms: Regularization, Prediction and Model Fitting
Peter Bühlmann, Torsten Hothorn
Statistics, 2008, DOI: 10.1214/07-STS242REJ
Abstract: Rejoinder to "Boosting Algorithms: Regularization, Prediction and Model Fitting" [arXiv:0804.2752]
A Robust Procedure for Comparing Multiple Means under Heteroscedasticity in Unbalanced Designs
Esther Herberich, Johannes Sikorski, Torsten Hothorn
PLOS ONE, 2010, DOI: 10.1371/journal.pone.0009788
Abstract: Investigating differences between means of more than two groups or experimental conditions is a routine research question addressed in biology. In order to assess differences statistically, multiple comparison procedures are applied. The most prominent procedures of this type, the Dunnett and Tukey-Kramer tests, control the probability of reporting at least one false positive result when the data are normally distributed and when the sample sizes and variances do not differ between groups. All three assumptions are unrealistic in biological research, and any violation leads to an increased number of reported false positive results. Based on a general statistical framework for simultaneous inference and robust covariance estimators, we propose a new statistical multiple comparison procedure for assessing multiple means. In contrast to the Dunnett or Tukey-Kramer tests, no assumptions regarding the distribution, sample sizes or variance homogeneity are necessary. The performance of the new procedure is assessed by means of its familywise error rate and power under different distributions. The practical merits are demonstrated by a reanalysis of fatty acid phenotypes of the bacterium Bacillus simplex from the "Evolution Canyons" I and II in Israel. The simulation results show that even under severely varying variances, the procedure controls the number of false positive findings very well. Thus, the procedure presented here works well under biologically realistic scenarios of unbalanced group sizes, non-normality and heteroscedasticity.
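The procedure combines simultaneous inference from the multcomp package with a heteroscedasticity-consistent covariance estimator from the sandwich package. A minimal sketch on simulated unbalanced, heteroscedastic data (group sizes, means and variances are placeholders):

```r
## Sketch: robust all-pairwise comparisons of means via multcomp::glht
## with a heteroscedasticity-consistent covariance (sandwich::vcovHC),
## in place of the classical Tukey-Kramer test.
library(multcomp)
library(sandwich)

set.seed(1)
g <- factor(rep(c("A", "B", "C"), times = c(10, 25, 40)))  # unbalanced
y <- rnorm(length(g), mean = as.numeric(g), sd = as.numeric(g))
d <- data.frame(y, g)

mod <- lm(y ~ g, data = d)
cmp <- glht(mod, linfct = mcp(g = "Tukey"), vcov = vcovHC)

summary(cmp)   # familywise-error-controlled pairwise comparisons
confint(cmp)   # simultaneous confidence intervals
```

Passing vcov = vcovHC replaces the homoscedastic covariance of the least-squares fit, which is what removes the equal-variance assumption of the classical procedures.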
Testing the additional predictive value of high-dimensional molecular data
Anne-Laure Boulesteix, Torsten Hothorn
BMC Bioinformatics, 2010, DOI: 10.1186/1471-2105-11-78
Abstract: We suggest an intuitive permutation-based testing procedure for assessing the additional predictive value of high-dimensional molecular data. Our method combines two well-known statistical tools: logistic regression and boosting regression. We give clear advice for the choice of the only method parameter (the number of boosting iterations). In simulations, our novel approach is found to have very good power in different settings, e.g. few strong predictors or many weak predictors. For illustrative purposes, it is applied to two publicly available cancer data sets. Our simple and computationally efficient approach can be used to globally assess the additional predictive power of a large number of candidate predictors given that a few clinical covariates or a known prognostic index are already available. It is implemented in the R package "globalboosttest", which is publicly available from R-Forge and will be submitted to CRAN. While high-dimensional molecular data such as microarray gene expression data have been used for disease outcome prediction or diagnosis purposes for about ten years [1] in biomedical research, the question of the additional predictive value of such data, given that classical predictors are already available, has long been under-considered in the bioinformatics literature. This issue can be summarized as follows. For a given prediction problem (for example, tumor subtype diagnosis or long-term outcome prediction), we consider two types of predictors. On the one hand, conventional clinical covariates such as age, sex, disease duration or tumor stage are available as potential predictors. They have often been extensively investigated and validated in previous studies. On the other hand, we have molecular predictors, which are generally much more difficult to measure and collect than conventional clinical predictors, and not yet well established. In the context of translational biomedical research, investigators are therefore interested in whether such molecular data add predictive value beyond the established clinical predictors.
Prediction intervals for future BMI values of individual children - a non-parametric approach by quantile boosting
Andreas Mayr, Torsten Hothorn, Nora Fenske
BMC Medical Research Methodology, 2012, DOI: 10.1186/1471-2288-12-6
Abstract: We avoid distributional assumptions by directly modelling the borders of PIs by additive quantile regression, estimated by boosting. We point out the concept of conditional coverage to prove the accuracy of PIs. As conditional coverage can hardly be evaluated in practical applications, we conduct a simulation study before fitting child- and covariate-specific PIs for future BMI values and BMI patterns for the present data. The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child. Quantile boosting is a promising approach to construct PIs with correct conditional coverage in a non-parametric way. It is particularly suitable for the prediction of BMI patterns depending on covariates, since it provides an interpretable predictor structure, inherent variable selection properties, and can even account for longitudinal data structures. Childhood obesity is increasingly becoming a problem of epidemic proportions in modern societies [1,2]. The body mass index (BMI) has proved to be a reliable measure to assess childhood obesity and can also be seen as an indicator for obesity in adulthood [3,4]. Therefore, the prediction of future BMI values for individual children may be used as a warning bell for clinicians, parents and children. Predicting future BMI values raises awareness for problems to come - as long as they are still avoidable - and can thus lower the risk of later obesity. In this setting, we focus on obtaining reliable predictions for future BMI values of children. Prediction intervals (PIs) offer information on the expected variability by providing not only a point prediction but a whole range of plausible future values.
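Quantile boosting of the kind described here is available through mboost's QuantReg() family: the lower and upper PI borders are fitted as separate conditional quantile models. A sketch with simulated stand-in data (the variable names mimic the BMI application but are not the study data):

```r
## Sketch: a 90% prediction interval via additive quantile boosting,
## modelling the 5% and 95% conditional quantiles of BMI given age.
library(mboost)

set.seed(1)
d <- data.frame(age = runif(300, 0, 18))
d$bmi <- 15 + 0.3 * d$age + rnorm(300, sd = 1 + 0.1 * d$age)  # heteroscedastic

lower <- gamboost(bmi ~ bbs(age), data = d, family = QuantReg(tau = 0.05))
upper <- gamboost(bmi ~ bbs(age), data = d, family = QuantReg(tau = 0.95))

newd <- data.frame(age = c(5, 10, 15))
cbind(lower = predict(lower, newdata = newd),   # PI borders widen with
      upper = predict(upper, newdata = newd))   # age, as in the abstract
```

Because each border is a boosted additive model, covariates are selected separately for the lower and upper quantile, which is how the interval adapts to age-specific skewness.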
Conditional Transformation Models
Torsten Hothorn, Thomas Kneib, Peter Bühlmann
Statistics, 2012, DOI: 10.1111/rssb.12017
Abstract: The ultimate goal of regression analysis is to obtain information about the conditional distribution of a response given a set of explanatory variables. This goal is, however, seldom achieved because most established regression models only estimate the conditional mean as a function of the explanatory variables and assume that higher moments are not affected by the regressors. The underlying reason for such a restriction is the assumption of additivity of signal and noise. We propose to relax this common assumption in the framework of transformation models. The novel class of semiparametric regression models proposed herein allows transformation functions to depend on explanatory variables. These transformation functions are estimated by regularised optimisation of scoring rules for probabilistic forecasts, e.g. the continuous ranked probability score. The corresponding estimated conditional distribution functions are consistent. Conditional transformation models are potentially useful for describing possible heteroscedasticity, comparing spatially varying distributions, identifying extreme events, deriving prediction intervals and selecting variables beyond mean regression effects. An empirical investigation based on a heteroscedastic varying coefficient simulation model demonstrates that semiparametric estimation of conditional distribution functions can be more beneficial than kernel-based non-parametric approaches or parametric generalised additive models for location, scale and shape.
A Unified Framework of Constrained Regression
Benjamin Hofner, Thomas Kneib, Torsten Hothorn
Statistics, 2014, DOI: 10.1007/s11222-014-9520-y
Abstract: Generalized additive models (GAMs) play an important role in modeling and understanding complex relationships in modern applied statistics. They allow for flexible, data-driven estimation of covariate effects. Yet researchers often have a priori knowledge of certain effects, which might be monotonic or periodic (cyclic) or should fulfill boundary conditions. We propose a unified framework to incorporate these constraints for both univariate and bivariate effect estimates and for varying coefficients. As the framework is based on component-wise boosting methods, variables can be selected intrinsically, and effects can be estimated for a wide range of different distributional assumptions. Bootstrap confidence intervals for the effect estimates are derived to assess the models. We present three case studies from environmental sciences to illustrate the proposed seamless modeling framework. All discussed constrained effect estimates are implemented in the comprehensive R package mboost for model-based boosting.
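As the abstract notes, the constrained effect estimates are available in mboost; monotonicity is imposed through the bmono() base-learner. A brief sketch on simulated data (all names are placeholders):

```r
## Sketch: a monotonically increasing effect of x combined with an
## unconstrained smooth effect of z, using mboost's bmono() base-learner.
library(mboost)

set.seed(1)
d <- data.frame(x = runif(150), z = runif(150))
d$y <- plogis(5 * (d$x - 0.5)) + rnorm(150, sd = 0.1)  # increasing in x

fit <- gamboost(y ~ bmono(x, constraint = "increasing") + bbs(z),
                data = d)
## plot(fit) would show the fitted effect of x respecting monotonicity
```

bmono() also supports decreasing, convex/concave and cyclic constraints, covering the periodic and boundary-condition cases mentioned in the abstract.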
Large-Scale Model-Based Assessment of Deer-Vehicle Collision Risk
Torsten Hothorn, Roland Brandl, Jörg Müller
PLOS ONE, 2012, DOI: 10.1371/journal.pone.0029510
Abstract: Ungulates, in particular the Central European roe deer Capreolus capreolus and the North American white-tailed deer Odocoileus virginianus, are economically and ecologically important. The two species are risk factors for deer–vehicle collisions and as browsers of palatable trees have implications for forest regeneration. However, no large-scale management systems for ungulates have been implemented, mainly because of the high efforts and costs associated with attempts to estimate population sizes of free-living ungulates living in a complex landscape. Attempts to directly estimate population sizes of deer are problematic owing to poor data quality and lack of spatial representation on larger scales. We used data on 74,000 deer–vehicle collisions observed in 2006 and 2009 in Bavaria, Germany, to model the local risk of deer–vehicle collisions and to investigate the relationship between deer–vehicle collisions and both environmental conditions and browsing intensities. An innovative modelling approach for the number of deer–vehicle collisions, which allows nonlinear environment–deer relationships and assessment of spatial heterogeneity, was the basis for estimating the local risk of collisions for specific road types on the scale of Bavarian municipalities. Based on this risk model, we propose a new “deer–vehicle collision index” for deer management. We show that the risk of deer–vehicle collisions is positively correlated to browsing intensity and to harvest numbers. Overall, our results demonstrate that the number of deer–vehicle collisions can be predicted with high precision on the scale of municipalities. In the densely populated and intensively used landscapes of Central Europe and North America, a model-based risk assessment for deer–vehicle collisions provides a cost-efficient instrument for deer management on the landscape scale. 
The measures derived from our model provide valuable information for planning road protection and defining hunting quotas. Open-source software implementing the model can be used to transfer our modelling approach to wildlife–vehicle collisions elsewhere.

Copyright © 2008-2017 Open Access Library. All rights reserved.