Search Results: 1 - 10 of 633256 matches for "Mark A van de Wiel"
All listed articles are free for downloading (OA Articles)
Effects of dependence in high-dimensional multiple testing problems
Kyung In Kim, Mark A van de Wiel
BMC Bioinformatics, 2008, DOI: 10.1186/1471-2105-9-114
Abstract: We study the robustness against dependence of several FDR procedures that are popular in microarray studies, such as Benjamini-Hochberg FDR, Storey's q-value, SAM and resampling-based FDR procedures. False Non-discovery Rates and estimates of the number of null hypotheses are computed from those methods and compared. Our simulation study shows that methods such as SAM and the q-value do not adequately control the FDR to the level claimed under dependence conditions. On the other hand, the adaptive Benjamini-Hochberg procedure seems to be most robust while remaining conservative. Finally, the estimates of the number of true null hypotheses under various dependence conditions are variable. We discuss a new method for efficient guided simulation of dependent data, which satisfy imposed network constraints as conditional independence structures. Our simulation set-up allows for a structural study of the effect of dependencies on multiple testing criteria and is useful for testing a potentially new method on π0 or FDR estimation in a dependency context. Scientists nowadays regularly face multiple testing of a large number of hypotheses. Typically in microarray data, one performs hypothesis testing for each gene, and the number of genes usually runs into the thousands. In this situation, direct application of single hypothesis testing thousands of times produces a large number of false discoveries. Hence, alternative testing criteria for controlling errors of false discoveries have been introduced. It is widely recognized that dependencies are omnipresent in many high-throughput studies. Such dependencies may be regulatory or functional as in gene pathways, but also spatial such as in SNP or DNA copy number arrays because of the genomic order. Although attempts to infer such interactions from data have been made, it is a notoriously difficult problem. 
Usually solutions focus on some modules with relatively few elements and many samples, in particular for model organisms (see e
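For readers unfamiliar with the procedures compared in the abstract above, the following is a minimal Python sketch of the plain (non-adaptive) Benjamini-Hochberg step-up rule. It is an illustration only, not the authors' code: the function name `benjamini_hochberg` and the default `alpha` are assumptions, and the adaptive variant discussed in the abstract additionally estimates the proportion of true nulls, which is not shown here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Plain Benjamini-Hochberg step-up rule (illustrative sketch):
    reject the k smallest p-values, where k is the largest rank with
    p_(k) <= k * alpha / m.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest 1-based rank whose p-value falls under its threshold.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Step-up: everything ranked at or below k_max is rejected,
    # even if an individual p-value missed its own threshold.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Note the step-up behaviour: with p-values `[0.03, 0.03, 0.03, 0.04]` and `alpha=0.05`, the two smallest ranks miss their own thresholds, yet all four hypotheses are rejected because the third and fourth ranks pass.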
A nonparametric control chart based on the Mann-Whitney statistic
Subhabrata Chakraborti, Mark A. van de Wiel
Statistics, 2008, DOI: 10.1214/193940307000000112
Abstract: Nonparametric or distribution-free charts can be useful in statistical process control problems when knowledge about the underlying process distribution is limited or lacking. In this paper, a phase II Shewhart-type chart is considered for location, based on reference data from phase I analysis and the well-known Mann-Whitney statistic. Control limits are computed using Lugannani-Rice-saddlepoint, Edgeworth, and other approximations along with Monte Carlo estimation. The derivations take account of estimation and the dependence from the use of a reference sample. An illustrative numerical example is presented. The in-control performance of the proposed chart is shown to be much superior to the classical Shewhart $\bar{X}$ chart. Further comparisons on the basis of some percentiles of the out-of-control conditional run length distribution and the unconditional out-of-control ARL show that the proposed chart is almost as good as the Shewhart $\bar{X}$ chart for the normal distribution, but is more powerful for a heavy-tailed distribution such as the Laplace, or for a skewed distribution such as the Gamma. Interactive software, enabling a complete implementation of the chart, is made available on a website.
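As a rough illustration of the charting statistic named in the abstract above, the sketch below counts Mann-Whitney pairs between a phase I reference sample and a phase II test sample. The function names, the signalling convention, and the control limits are all hypothetical; in the paper the limits come from saddlepoint, Edgeworth, or Monte Carlo calculations that are not reproduced here, and ties are ignored on the assumption of continuous data.

```python
def mann_whitney_count(reference, test):
    """Count pairs (x, y), x from reference and y from test, with y > x."""
    return sum(1 for x in reference for y in test if y > x)


def chart_signals(reference, test, lower_limit, upper_limit):
    """Return True if the test sample plots on or outside the limits.

    lower_limit and upper_limit are placeholders supplied by the caller;
    how to choose them is the subject of the paper and is not shown here.
    """
    stat = mann_whitney_count(reference, test)
    return stat <= lower_limit or stat >= upper_limit
```

With a reference sample of size m and a test sample of size n the count ranges from 0 to m * n, so values near either extreme suggest a shift in location relative to the reference data.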
CGHregions: Dimension Reduction for Array CGH Data with Minimal Information Loss
Mark A. van de Wiel, Wessel N. van Wieringen
Cancer Informatics, 2007
Abstract: An algorithm to reduce multi-sample array CGH data from thousands of clones to tens or hundreds of clone regions is introduced. This reduction of the data is performed such that little information is lost, which is possible due to the high dependencies between neighboring clones. The algorithm is explained using a small example. The potential beneficial effects of the algorithm for downstream analysis are illustrated by re-analysis of previously published colorectal cancer data. Using multiple testing corrections suitable for these data, we provide statistical evidence for genomic differences on several clone regions between MSI+ and CIN+ tumors. The algorithm, named CGHregions, is available as an easy-to-use script in R.
Stepwise classification of cancer samples using clinical and molecular data
Askar Obulkasim, Gerrit A Meijer, Mark A van de Wiel
BMC Bioinformatics, 2011, DOI: 10.1186/1471-2105-12-422
Abstract: We introduce a novel classification method, a stepwise classifier, which takes advantage of the distinct classification power of clinical data and high-dimensional molecular data. We apply classification algorithms to two data types independently, starting with the traditional clinical risk factors. We only turn to relatively expensive molecular data when the uncertainty of the prediction from clinical data exceeds a predefined limit. Experimental results show that our approach is adaptive: the proportion of samples that needs to be re-classified using molecular data depends on how much we expect the predictive accuracy to increase when re-classifying those samples. Our method renders a more cost-efficient classifier that is at least as good as, and sometimes better than, one based on clinical or molecular data alone. Hence our approach is not just a classifier that minimizes a particular loss function. Instead, it aims to be cost-efficient by avoiding molecular tests for a potentially large subgroup of individuals; moreover, for these individuals a test result would be quickly available, which may lead to reduced waiting times (for diagnosis) and hence lower the patients' distress. Stepwise classification is implemented in the R package stepwiseCM, available at the Bioconductor website. Accurate prognosis of relevant cancer-related endpoints, such as relapse, recurrence or metastasis, may lead to more targeted treatment and avoid unnecessary chemotherapy or surgery. One example is breast cancer recurrence. A major clinical problem of breast cancer recurrence is that by the time the primary tumor is diagnosed, microscopic metastases may have already occurred. For this reason, patients at high risk receive more intensive chemotherapy, endocrine therapy or radiotherapy. 
Yet, the ability to predict metastasis still remains one of the greatest clinical challenges in oncology. Classifying cancer subtypes with high precision and predicting treatment outcome are intensive research topics. Traditi
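The two-stage decision rule described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the stepwiseCM implementation: the function name, the use of distance from 0.5 as the uncertainty measure, and the default `threshold` are all hypothetical. The molecular classifier is passed as a callable so that the (expensive) molecular prediction is only computed when it is actually needed.

```python
def stepwise_classify(clinical_prob, molecular_classifier, threshold=0.2):
    """Classify one sample and report which data type was used.

    clinical_prob: predicted probability of class 1 from clinical data.
    molecular_classifier: zero-argument callable returning the predicted
        probability of class 1 from molecular data (hypothetical interface).
    threshold: minimum distance from 0.5 at which the clinical prediction
        is considered certain enough (assumed uncertainty measure).
    """
    # Confident clinical prediction: stop here, no molecular test needed.
    if abs(clinical_prob - 0.5) >= threshold:
        return int(clinical_prob >= 0.5), "clinical"

    # Otherwise fall back on the molecular data for this sample.
    molecular_prob = molecular_classifier()
    return int(molecular_prob >= 0.5), "molecular"
```

For example, `stepwise_classify(0.9, lambda: 0.1)` stays with the clinical prediction, whereas `stepwise_classify(0.55, lambda: 0.1)` is too uncertain and defers to the molecular classifier.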
Normalized, Segmented or Called aCGH Data?
Wessel N. van Wieringen, Mark A. van de Wiel, Bauke Ylstra
Cancer Informatics, 2007
Abstract: Array comparative genomic hybridization (aCGH) is a high-throughput lab technique to measure genome-wide chromosomal copy numbers. Data from aCGH experiments require extensive pre-processing, which consists of three steps: normalization, segmentation and calling. Each of these pre-processing steps yields a different data set: normalized data, segmented data, and called data. Publications using aCGH base their findings on data from all stages of the pre-processing. Hence, there is no consensus on which should be used for further downstream analysis. This consensus is however important for correct reporting of findings, and comparison of results from different studies. We discuss several issues that should be taken into account when deciding on which data are to be used. We express the belief that called data are best used, but would welcome opposing views.
Implementing a Class of Permutation Tests: The coin Package
Torsten Hothorn, Kurt Hornik, Mark A. van de Wiel, Achim Zeileis
Journal of Statistical Software, 2008
Abstract: The R package coin implements a unified approach to permutation tests providing a huge class of independence tests for nominal, ordered, numeric, and censored data as well as multivariate data at mixed scales. Based on a rich and flexible conceptual framework that embeds different permutation test procedures into a common theory, a computational framework is established in coin that likewise embeds the corresponding R functionality in a common S4 class structure with associated generic functions. As a consequence, the computational tools in coin inherit the flexibility of the underlying theory and conditional inference functions for important special cases can be set up easily. Conditional versions of classical tests---such as tests for location and scale problems in two or more samples, independence in two- or three-way contingency tables, or association problems for censored, ordered categorical or multivariate data---can easily be implemented as special cases using this computational toolbox by choosing appropriate transformations of the observations. The paper gives a detailed exposition of both the internal structure of the package and the provided user interfaces along with examples on how to extend the implemented functionality.
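To make the idea of a conditional (permutation) test concrete, here is a minimal Python sketch of an exact two-sample location test. It does not use or mimic the coin package's R interface; the function name and the choice of the absolute difference in means as test statistic are assumptions made for illustration. Full enumeration is only feasible for small samples.

```python
from itertools import combinations


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Conditions on the pooled observations and enumerates every way of
    assigning len(group_a) of them to the first group.
    """
    pooled = list(group_a) + list(group_b)
    n, n_a = len(pooled), len(group_a)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        # |mean of first group - mean of second group| from the first sum.
        return abs(sum_a / n_a - (total - sum_a) / (n - n_a))

    observed = abs_mean_diff(sum(group_a))

    extreme = 0
    n_splits = 0
    for idx in combinations(range(n), n_a):
        n_splits += 1
        stat = abs_mean_diff(sum(pooled[i] for i in idx))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits
```

Replacing the raw observations by ranks or scores before calling the function gives other members of the same family of tests, which is the kind of "appropriate transformation of the observations" the abstract refers to.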
Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines
Gwenaël G. R. Leday, Aad W. van der Vaart, Wessel N. van Wieringen, Mark A. van de Wiel
Statistics, 2013, DOI: 10.1214/12-AOAS605
Abstract: DNA copy number and mRNA expression are widely used data types in cancer studies, which combined provide more insight than separately. Whereas in existing literature the form of the relationship between these two types of markers is fixed a priori, in this paper we model their association. We employ piecewise linear regression splines (PLRS), which combine good interpretation with sufficient flexibility to identify any plausible type of relationship. The specification of the model leads to estimation and model selection in a constrained, nonstandard setting. We provide methodology for testing the effect of DNA on mRNA and choosing the appropriate model. Furthermore, we present a novel approach to obtain reliable confidence bands for constrained PLRS, which incorporates model uncertainty. The procedures are applied to colorectal and breast cancer data. Common assumptions are found to be potentially misleading for biologically relevant genes. More flexible models may bring more insight into the interaction between the two markers.
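A piecewise linear regression spline can be written as a straight line plus hinge terms that change the slope at each knot. The sketch below only evaluates such a function for given coefficients; it is not the authors' model or fitting procedure, and the function name, parameter names, and example knot are hypothetical. Estimation under constraints and model selection, which are the substance of the paper, are not shown.

```python
def piecewise_linear(x, intercept, slope, knots, slope_changes):
    """Evaluate intercept + slope*x + sum_j change_j * max(0, x - knot_j).

    knots and slope_changes are parallel sequences; each knot marks a
    point where the slope changes by the corresponding amount.
    """
    value = intercept + slope * x
    for knot, change in zip(knots, slope_changes):
        # Hinge basis function: zero left of the knot, linear right of it.
        value += change * max(0.0, x - knot)
    return value
```

With `intercept=1.0`, `slope=0.5`, one knot at `1.0` and a slope change of `1.5`, the function has slope 0.5 to the left of the knot and slope 2.0 to the right of it.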
Better prediction by use of co-data: Adaptive group-regularized ridge regression
Mark A. van de Wiel, Tonje G. Lien, Wina Verlaat, Wessel N. van Wieringen, Saskia M. Wilting
Statistics, 2014
Abstract: For many high-dimensional studies, additional information on the variables, like (genomic) annotation or external p-values, is available. In the context of binary and continuous prediction, we develop a method for adaptive group-regularized (logistic) ridge regression, which makes structural use of such 'co-data'. Here, 'groups' refer to a partition of the variables according to the co-data. We derive empirical Bayes estimates of group-specific penalties, which possess several nice properties: i) they are analytical; ii) they adapt to the informativeness of the co-data for the data at hand; iii) only one global penalty parameter requires tuning by cross-validation. In addition, the method allows use of multiple types of co-data at little extra computational effort. We show that the group-specific penalties may lead to a larger distinction between 'near-zero' and relatively large regression parameters, which facilitates post-hoc variable selection. The method, termed GRridge, is implemented in an easy-to-use R-package. It is demonstrated on two cancer genomics studies, which both concern the discrimination of precancerous cervical lesions from normal cervix tissues using methylation microarray data. For both examples, GRridge clearly improves the predictive performances of ordinary logistic ridge regression and the group lasso. In addition, we show that for the second study the relatively good predictive performance is maintained when selecting only 42 variables.
CGHpower: exploring sample size calculations for chromosomal copy number experiments
Ilari Scheinin, José A Ferreira, Sakari Knuutila, Gerrit A Meijer, Mark A van de Wiel, Bauke Ylstra
BMC Bioinformatics, 2010, DOI: 10.1186/1471-2105-11-331
Abstract: Here we explore power calculations for aCGH experiments comparing two groups. In a pilot experiment CGHpower estimates the biological diversity between groups and provides a statistical framework for estimating average power as a function of sample size. As the method requires pilot data, it can be used either in the planning stage of larger studies or in estimating the power achieved in past experiments. The proposed method relies on certain assumptions. According to our evaluation with public and simulated data sets, they do not always hold true. Violation of the assumptions typically leads to unreliable sample size estimates. Despite its limitations, this method is, at least to our knowledge, the only one currently available for performing sample size calculations in the context of aCGH. Moreover, the implementation of the method provides diagnostic plots that allow critical assessment of the assumptions on which it is based and hence on the feasibility and reliability of the sample size calculations in each case. The CGHpower web application and the program outputs from evaluation data sets can be freely accessed at http://www.cangem.org/cghpower/. Array comparative genomic hybridization (aCGH) is a technique that uses microarrays to perform high-resolution and genome-wide screening of DNA copy number changes. Its most important applications are in cancer research [1] and clinical genetics [2]. In this paper we focus on aCGH experiments comparing two groups of cancer samples. Previously, we introduced the Wilcoxon test with ties to identify chromosomal copy number differences when comparing two groups [3]. The goal of comparing two groups is generally to identify disease biomarkers, chromosomal regions (or genes therein) for survival, therapy, progression, et cetera. An important problem that arises in the planning of aCGH experiments is the choice of the sample size, which we explore here. 
Data analysis of microarray experiments comparing two groups general
Intensity-based analysis of dual-color gene expression data as an alternative to ratio-based analysis to enhance reproducibility
Koen Bossers, Bauke Ylstra, Ruud H Brakenhoff, Serge J Smeets, Joost Verhaagen, Mark A van de Wiel
BMC Genomics, 2010, DOI: 10.1186/1471-2164-11-112
Abstract: By analyzing three distinct and technically replicated datasets with either ratio- or intensity-based models, we determined that, when applied to the same dataset, intensity-based analysis of dual-color gene expression experiments 1) yields more reproducible results, and 2) is more sensitive in the detection of differentially expressed genes. These effects were most pronounced in experiments with large biological variation and complex hybridization designs. Furthermore, a power analysis revealed that for direct two-group comparisons above a certain sample size, ratio-based models have higher power, although the difference with intensity-based models is very small. Intensity-based analysis of dual-color datasets yields more reproducible results and increased sensitivity in the detection of differential gene expression than ratio-based analysis of the same dataset. Complex dual-color setups such as interwoven loop designs benefit most from ignoring the array factor. The applicability of our approach to array platforms other than dual-color needs to be further investigated. During the last decade, microarray technology has evolved into an indispensable tool for high-throughput gene expression studies. For example, microarrays are now routinely applied to identify differentially expressed genes between paired sample series, classify tumors in prognostic groups, and identify transcriptional alterations during development [1-3]. Two main types of commercial high density microarray platforms have emerged: one-color oligonucleotide platforms such as Affymetrix and Illumina, and dual-color oligonucleotide platforms such as Agilent and Nimblegen. Dual-color gene expression platforms are very efficient in directly comparing two conditions, by hybridizing the two conditions together on the same array. 
This greatly reduces the possible confounding effects of inter-array variability and local array effects. The outcome of comparative microarray experiments is a