oalib

Search Results: 1 - 10 of 224700 matches for "Jeffrey C. Miecznikowski"
All listed articles are free for downloading (OA Articles)
Multidimensional Median Filters for Finding Bumps in Chemical Sensor Datasets
Jeffrey C. Miecznikowski, Kimberly F. Sellers, William F. Eddy
Journal of Sensor Technology (JST) , 2012, DOI: 10.4236/jst.2012.21005
Abstract: Feature detection in chemical sensor images falls under the general topic of mathematical morphology, where the goal is to detect “image objects”, e.g., peaks or spots in an image. Here, we propose a novel method for object detection that can be generalized to a k-dimensional object obtained from an analogous higher-dimensional technology source. Our method is based on the smoothing decomposition Data = Smooth + Rough, where the “rough” (i.e., residual) object from a k-dimensional cross-shaped smoother provides information for object detection. We demonstrate properties of this procedure with chemical sensor applications from various biological fields, including genetic and proteomic data analysis.
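Below is a minimal 2-D sketch of the Data = Smooth + Rough idea. It makes several assumptions: the smoother is a cross-shaped median window of hypothetical half-width radius, and a cell is called a bump when its rough exceeds an arbitrary threshold. The function names are illustrative. This is not the authors' implementation. A k-dimensional version would extend the cross along each of the k axes.

```python
from statistics import median


def cross_median_smooth(image, radius=1):
    """Smooth a 2-D grid with a cross-shaped median window.

    The window at (i, j) holds the cell itself plus up to `radius`
    neighbours above, below, left and right (clipped at the edges).
    """
    n_rows, n_cols = len(image), len(image[0])
    smooth = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            window = [image[i][j]]
            for d in range(1, radius + 1):
                for ii, jj in ((i - d, j), (i + d, j), (i, j - d), (i, j + d)):
                    if 0 <= ii < n_rows and 0 <= jj < n_cols:
                        window.append(image[ii][jj])
            smooth[i][j] = median(window)
    return smooth


def rough_and_bumps(image, radius=1, threshold=1.0):
    """Return the rough (Data - Smooth) and cells whose rough exceeds `threshold`."""
    smooth = cross_median_smooth(image, radius)
    rough = [[x - s for x, s in zip(row, srow)] for row, srow in zip(image, smooth)]
    bumps = [(i, j) for i, row in enumerate(rough)
             for j, r in enumerate(row) if r > threshold]
    return rough, bumps


# A flat 5 x 5 image with one bright spot in the middle.
img = [[0.0] * 5 for _ in range(5)]
img[2][2] = 10.0
rough, bumps = rough_and_bumps(img)
print(bumps)  # [(2, 2)]
```

The median of a cross window ignores a spike narrower than the window. An isolated spot therefore survives almost entirely in the rough, while smooth background trends are absorbed by the smoother.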
Estimating the Empirical Null Distribution of Maxmean Statistics in Gene Set Analysis
Xing Ren, Jianmin Wang, Song Liu, Jeffrey C. Miecznikowski
Open Journal of Statistics (OJS) , 2017, DOI: 10.4236/ojs.2017.75053
Abstract: Gene Set Analysis (GSA) is a framework for testing the association between a set of genes and an outcome, e.g., disease status or treatment group. The method relies on computing a maxmean statistic and estimating the null distribution of the maxmean statistics via a restandardization procedure. In practice, the pre-determined gene sets have stronger intra-set correlation than genes across sets, which may bias the estimated null distribution. We derive an asymptotic null distribution of the maxmean statistics under a sparsity assumption. We propose a flexible two-group mixture model for the maxmean statistics, which allows us to estimate the null parameters empirically via maximum likelihood. Our empirical method is compared with the restandardization procedure of GSA in simulations. We show that our method is more accurate in null density estimation when the genes are strongly correlated within gene sets.
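The abstract does not define the maxmean statistic. The sketch below assumes the usual Efron and Tibshirani definition. For a gene set with scores z_1, ..., z_m, it averages the positive parts max(z_i, 0) and, separately, the negative parts max(-z_i, 0), then keeps the larger of the two. Returning it with a sign is a presentational choice here.

```python
from statistics import mean


def maxmean(z_scores):
    """Signed maxmean statistic for one gene set.

    s_plus averages the positive parts max(z, 0) over all genes in the set;
    s_minus averages the negative parts max(-z, 0). The larger one is kept,
    with a minus sign when the negative side dominates.
    """
    s_plus = mean(max(z, 0.0) for z in z_scores)
    s_minus = mean(max(-z, 0.0) for z in z_scores)
    return s_plus if s_plus >= s_minus else -s_minus


print(maxmean([2.0, -1.0, 0.5, -0.5]))  # 0.625
print(maxmean([-3.0, 1.0]))             # -1.5
```

For the first set, the positive parts average 0.625 and the negative parts 0.375, so the statistic is 0.625. Unlike a plain mean, it does not let opposite-signed genes cancel out.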
Feature Detection Techniques for Preprocessing Proteomic Data
Kimberly F. Sellers, Jeffrey C. Miecznikowski
International Journal of Biomedical Imaging , 2010, DOI: 10.1155/2010/896718
Abstract: Numerous gel-based and nongel-based technologies are used to detect protein changes potentially associated with disease. The raw data, however, are abundant with technical and structural complexities, making statistical analysis a difficult task. Low-level analysis issues (including normalization, background correction, gel and/or spectral alignment, feature detection, and image registration) are substantial problems that need to be addressed, because any high-level data analyses are contingent on appropriate and statistically sound low-level procedures. Feature detection approaches are particularly interesting due to the increased computational speed associated with subsequent calculations. Such summary data corresponding to image features provide a significant reduction in overall data size and structure while retaining key information. In this paper, we focus on recent advances in feature detection as a tool for preprocessing proteomic data. This work highlights existing and newly developed feature detection algorithms for proteomic datasets, particularly relating to time-of-flight mass spectrometry and two-dimensional gel electrophoresis. Note, however, that the associated data structures (i.e., spectral data and images containing spots) used as input for these methods are obtained via all gel-based and nongel-based methods discussed in this manuscript, and thus the discussed methods are likewise applicable.
1. Introduction
One of the major goals for scientists is to identify biomarkers for patients, thus ultimately providing them with personalized medicine. Personalized medicine provides a patient-specific means by which to target one's disposition to a disease or condition. Recent developments in this area include molecular profiling technologies, which may include metabolomic analysis, genomic expression analysis, and proteomic profiling. 
Specifically, within proteomic profiling, there are several different techniques used to isolate and quantify the proteins within a subject's proteome. The raw data, however, are abundant with technical and structural complexities, making statistical analysis a difficult task. “Preprocessing” (including normalization, background correction, gel and/or spectral alignment, feature detection, and image registration) is therefore often required to account for the systematic biases present in the technology and to reduce the noise in the data. Feature detection (i.e., the detection and quantification of data features, such as peaks in spectral data, or spots in two-dimensional images) is a particularly important
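The following is a generic illustration of feature detection in spectral data, not one of the specific algorithms the paper reviews. It is a local-maximum peak picker with a robust noise estimate. The half_window and snr parameters, and the use of a MAD-based noise scale, are all assumptions.

```python
from statistics import median


def detect_peaks(intensities, snr=3.0, half_window=2):
    """Indices of local maxima that rise more than `snr` noise units above baseline.

    Baseline is the spectrum median; noise is the median absolute deviation
    (MAD) scaled by 1.4826 so it estimates a Gaussian standard deviation.
    """
    baseline = median(intensities)
    noise = 1.4826 * median([abs(x - baseline) for x in intensities])
    threshold = baseline + snr * noise
    peaks = []
    for i, x in enumerate(intensities):
        lo, hi = max(0, i - half_window), min(len(intensities), i + half_window + 1)
        # keep the point only if it clears the threshold and tops its window
        if x > threshold and x == max(intensities[lo:hi]):
            peaks.append(i)
    return peaks


# Alternating +/-0.1 noise with two peaks at positions 10 and 30.
spectrum = [0.1 * (-1) ** i for i in range(40)]
spectrum[10], spectrum[30] = 5.0, 3.0
print(detect_peaks(spectrum))  # [10, 30]
```

Each peak is summarized by a single index (and, in practice, its height or area). This is the data reduction the abstract describes.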
Putative null distributions corresponding to tests of differential expression in the Golden Spike dataset are intensity dependent
Daniel P Gaile, Jeffrey C Miecznikowski
BMC Genomics , 2007, DOI: 10.1186/1471-2164-8-105
Abstract: We replicate and extend the analyses of Dabney and Storey and present our results in the context of a two stage analysis. We provide evidence that the Stage I pre-processing algorithms considered in Dabney and Storey fail to provide expression values that are adequately centered or scaled. Furthermore, we demonstrate that the distributions of the p-values, test statistics, and probabilities associated with the relative locations and variabilities of the Stage II expression values vary with signal intensity. We provide diagnostic plots and a simple logistic regression based test statistic to detect these intensity related defects in the processed data. We agree with Dabney and Storey that the null p-values considered in Choe et al. are indeed non-uniform. We also agree with the conclusion that, given current pre-processing technologies, the Golden Spike dataset should not serve as a reference dataset to evaluate false discovery rate controlling methodologies. However, we disagree with the assessment that the non-uniform p-values are merely the byproduct of testing for differential expression under the incorrect assumption that chip data are approximate to biological replicates. Whereas Dabney and Storey attribute the non-uniform p-values to violations of the Stage II model assumptions, we provide evidence that the non-uniformity can be attributed to the failure of the Stage I analyses to correct for systematic biases in the raw data matrix. Although we do not speculate as to the root cause of these systematic biases, the observations made in Irizarry et al. appear to be consistent with our findings. Whereas Irizarry et al. describe the effect of the experimental design on the feature level data, we consider the effect on the underlying multivariate distribution of putative null p-values. We demonstrate that the putative null distributions corresponding to the pre-processing algorithms considered in Choe et al. are all intensity dependent. 
This dependence serves to inv
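The abstract does not spell out the logistic regression test. One hypothetical formulation, assumed here, works as follows. For each putative null probe set, an indicator records whether its p-value falls below a chosen level. That indicator is regressed on intensity, and the Wald z for the slope is reported. Under intensity-independent nulls, the slope should be near zero. The sketch fits the model by Newton-Raphson in plain Python; all names are illustrative.

```python
import math


def _score_and_information(b0, b1, xs, ys):
    """Gradient of the log-likelihood and the 2x2 Fisher information."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def intensity_slope_test(intensity, flagged, iterations=25):
    """Fit logit P(flagged) = b0 + b1 * intensity; return (b1, Wald z for b1).

    `flagged` holds 0/1 indicators, e.g. 1 when a putative null p-value
    falls below 0.05. A large |z| suggests the flag rate drifts with intensity.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = _score_and_information(b0, b1, intensity, flagged)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: beta <- beta + I^{-1} * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _score_and_information(b0, b1, intensity, flagged)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b1, b1 / se_b1


xs = [i / 10 for i in range(100)]
drifting = [1 if i % 10 < i // 10 else 0 for i in range(100)]  # flag rate rises with intensity
flat = [1 if i % 2 == 0 else 0 for i in range(100)]           # flag rate constant
print(intensity_slope_test(xs, drifting))
print(intensity_slope_test(xs, flat))
```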
Preferred analysis methods for Affymetrix GeneChips. II. An expanded, balanced, wholly-defined spike-in dataset
Qianqian Zhu, Jeffrey C Miecznikowski, Marc S Halfon
BMC Bioinformatics , 2010, DOI: 10.1186/1471-2105-11-285
Abstract: We generated a new wholly defined Affymetrix spike-in dataset consisting of 18 microarrays. Over 5700 RNAs are spiked in at relative concentrations ranging from 1- to 4-fold, and the arrays from each condition are balanced with respect to both total RNA amount and degree of positive versus negative fold change. We use this new "Platinum Spike" dataset to evaluate microarray analysis routes and contrast the results to those achieved using our earlier Golden Spike dataset. We present updated best-route methods for Affymetrix GeneChip analysis and demonstrate that the degree of "imbalance" in gene expression has a significant effect on the performance of these methods. As a result of their ability to detect the expression levels of tens of thousands of genes simultaneously, DNA microarrays have quickly become a leading tool in diverse areas of biological and biomedical research. Given this popularity and the associated accumulation of numerous microarray analysis methods, there is a critical need to know the accuracy of microarray technology and the best ways of analyzing microarray data. Important advances toward this goal were made by the MicroArray Quality Control (MAQC) project launched by the US Food and Drug Administration [1]. For the MAQC study, two distinct reference RNA samples were mixed together at specified ratios and then hybridized to different microarray platforms at multiple test sites. This design enabled the MAQC consortium to evaluate the reproducibility of microarray technology and the consistency between platforms. The study demonstrated that high levels of both intraplatform and interplatform concordance can be achieved in detecting differentially expressed genes (DEGs) when the microarray experiment is performed appropriately. 
However, as the exact identities of the individual RNAs in the reference samples were unknown, the MAQC project was not able to address questions regarding the overall accuracy of microarray technology and analysis methods. Spike
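In a wholly defined dataset, every spiked-in RNA is known. An analysis route can therefore be scored directly against the truth. The minimal sketch below assumes that each route yields one p-value per probe set. The sensitivity and false-discovery-proportion metrics and the alpha cutoff are illustrative choices, not necessarily the paper's.

```python
def route_performance(p_values, spiked, alpha=0.05):
    """Sensitivity and false-discovery proportion of one analysis route.

    p_values maps probe set -> p-value; spiked is the set of probe sets
    known to be differentially expressed by design.
    """
    called = {probe for probe, p in p_values.items() if p <= alpha}
    true_pos = len(called & spiked)
    sensitivity = true_pos / len(spiked)
    fdp = (len(called) - true_pos) / len(called) if called else 0.0
    return sensitivity, fdp


p = {"g1": 0.001, "g2": 0.02, "g3": 0.2, "g4": 0.04, "g5": 0.9}
print(route_performance(p, spiked={"g1", "g2", "g3"}))  # sensitivity 2/3, FDP 1/3
```

Repeating this over many cutoffs, and over several pre-processing and testing routes, gives the kind of side-by-side comparison a spike-in study is designed for.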
A New Normalizing Algorithm for BAC CGH Arrays with Quality Control Metrics
Jeffrey C. Miecznikowski, Daniel P. Gaile, Song Liu, Lori Shepherd, Norma Nowak
Journal of Biomedicine and Biotechnology , 2011, DOI: 10.1155/2011/860732
Abstract: The main focus in pin-tip (or print-tip) microarray analysis is determining which probes, genes, or oligonucleotides are differentially expressed. Specifically, in array comparative genomic hybridization (aCGH) experiments, researchers search for chromosomal imbalances in the genome. To model these data, scientists apply statistical methods to the structure of the experiment and assume that the data consist of the signal plus random noise. In this paper we propose “SmoothArray”, a new method to preprocess comparative genomic hybridization (CGH) bacterial artificial chromosome (BAC) arrays, and we show its effects on a cancer dataset. As part of our R software package “aCGHplus,” this freely available algorithm removes the variation due to intensity effects, pin/print-tip, the spatial location on the microarray chip, and the relative location on the well plate. Removal of this variation improves the downstream analysis and subsequent inferences made on the data. Further, we present measures to evaluate the quality of the dataset according to the arrayer pins, 384-well plates, plate rows, and plate columns. We compare our method against competing methods using several metrics to measure the biological signal. With this novel normalization algorithm and quality control measures, the user can improve their inferences on datasets and pinpoint problems that may arise in their BAC aCGH technology.
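SmoothArray removes intensity, pin/print-tip, spatial and well-plate effects. The sketch below is only a much simpler stand-in. It median-centres log ratios within one grouping factor (for example, arrayer pin or 384-well plate). It then reports each group's median absolute deviation as a spread-based quality score. The functions are hypothetical and are not part of aCGHplus.

```python
from collections import defaultdict
from statistics import median


def _by_group(values, groups):
    """Bucket values by their group label (pin, plate, row, ...)."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return buckets


def center_by_group(log_ratios, groups):
    """Subtract each group's median log ratio."""
    centres = {g: median(v) for g, v in _by_group(log_ratios, groups).items()}
    return [x - centres[g] for x, g in zip(log_ratios, groups)]


def group_mad(log_ratios, groups):
    """Median absolute deviation of log ratios within each group."""
    scores = {}
    for g, vals in _by_group(log_ratios, groups).items():
        m = median(vals)
        scores[g] = median([abs(v - m) for v in vals])
    return scores


ratios = [0.5, 0.7, 0.6, -0.2, 0.0, -0.1]
pins = ["pinA"] * 3 + ["pinB"] * 3
print(center_by_group(ratios, pins))  # roughly [-0.1, 0.1, 0.0, -0.1, 0.1, 0.0]
print(group_mad(ratios, pins))        # roughly {'pinA': 0.1, 'pinB': 0.1}
```

A pin or plate whose MAD stands well above the others is a natural candidate for closer inspection.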
Comparative survival analysis of breast cancer microarray studies identifies important prognostic genetic pathways
Jeffrey C Miecznikowski, Dan Wang, Song Liu, Lara Sucheston, David Gold
BMC Cancer , 2010, DOI: 10.1186/1471-2407-10-573
Abstract: Five microarray datasets related to breast cancer were examined using gene set analysis, and the cancers were categorized into different subtypes using a scoring system based on genetic pathway activity. We have observed that significant genes in the individual studies show little reproducibility across the datasets. From our comparative analysis, using gene pathways with clinical variables is more reliable across studies and shows promise in assessing a patient's prognosis. This study concludes that, in light of clinical variables, there are significant gene pathways in common across the datasets. Specifically, several pathways can further significantly stratify patients for survival. These candidate pathways should help to develop a panel of significant biomarkers for the prognosis of breast cancer patients in a clinical setting. Developing genomic-based biomarkers for breast cancer prognosis is an active research area, with clinicians and researchers considering genomic expression data as a potentially valuable source of information to be mined for such markers. In addition to considering the BRCA mutation status of a patient, three genetic markers, estrogen receptors (ER) [1], progesterone receptors (PR) [2], and the HER2/neu receptor (HER2) [3], are commonly used for assessing prognosis and/or assigning treatment. More recently, TGF-β has also been considered as a potential prognostic biomarker [4]. One of the biggest challenges in developing valid prognostic genomic-based biomarkers for breast cancer is obtaining large enough datasets with sufficient patient follow-up time [5,6]. To address this, we employ a comparative analysis approach. In a comparative analysis, several datasets gathered to test related hypotheses are combined to obtain more powerful estimates for a common hypothesis. We combine five genomic studies examining prognosis in breast cancer patients to assess the ability of the genetic biomarkers to stratify or distinguish patient survival. 
Datasets under c
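The abstract does not give the pathway scoring system. The minimal sketch below makes three assumptions. A patient's pathway activity is taken as the mean z-score of the pathway's genes. Patients are split at the median score. Survival in each group is then compared with a Kaplan-Meier estimate. All names are hypothetical.

```python
from statistics import mean, median, pstdev


def pathway_scores(expr, pathway_genes):
    """Per-patient pathway score: mean z-score of the pathway's genes.

    expr maps gene -> list of expression values, one per patient;
    pathway_genes is a list of gene names present in expr.
    """
    z = {}
    for gene in pathway_genes:
        vals = expr[gene]
        mu, sd = mean(vals), pstdev(vals)
        z[gene] = [(v - mu) / sd for v in vals]
    n_patients = len(expr[pathway_genes[0]])
    return [mean(z[g][i] for g in pathway_genes) for i in range(n_patients)]


def median_split(scores):
    """Indices of patients above, and at or below, the median score."""
    cut = median(scores)
    high = [i for i, s in enumerate(scores) if s > cut]
    low = [i for i, s in enumerate(scores) if s <= cut]
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier curve as [(event time, survival)]; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))  # [(1, 0.75), (3, 0.375), (4, 0.0)]
```

In the example, the patient censored at time 2 leaves the risk set without lowering the curve. In practice, the two median-split groups would each get their own curve and be compared, for example with a log-rank test.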
A comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data
Jeffrey C Miecznikowski, Senthilkumar Damodaran, Kimberly F Sellers, Donald E Coling, Richard Salvi, Richard A Rabin
Proteome Science , 2011, DOI: 10.1186/1477-5956-9-14
Abstract: Co-authors Donald E. Coling and Richard Salvi were omitted in error from the original published author list. The Authors' contributions section should read: JCM designed the study, performed the statistical analysis, and wrote the manuscript. SD designed the study, performed the data analysis, and wrote the manuscript. KFS assisted in the statistical analysis and the writing of the manuscript. DEC and RS provided the datasets for analysis. RAR provided the materials and contributed to the conception of the study. All authors read and approved the final manuscript.
A comparison of imputation procedures and statistical tests for the analysis of two-dimensional electrophoresis data
Jeffrey C Miecznikowski, Senthilkumar Damodaran, Kimberly F Sellers, Richard A Rabin
Proteome Science , 2010, DOI: 10.1186/1477-5956-8-66
Abstract: This work highlights the existing algorithms for handling missing data in two-dimensional gel analysis and performs a thorough comparison of the various algorithms and statistical tests on simulated and real datasets. For imputation methods, the best results in terms of root mean squared error are obtained using the least squares method of imputation along with the expectation maximization (EM) algorithm approach to estimate missing values with an array covariance structure. The bootstrapped versions of the statistical tests offer the most liberal option for determining protein spot significance, while the generalized familywise error rate (gFWER) should be considered for controlling the multiple testing error. In summary, we advocate for a three-step statistical analysis of two-dimensional gel electrophoresis (2-DE) data with a data imputation step, choice of statistical test, and lastly an error control method in light of multiple testing. When determining the choice of statistical test, it is worth considering whether the protein spots will be subjected to mass spectrometry. If this is the case, a more liberal test such as the percentile-based bootstrap t can be employed. For error control in electrophoresis experiments, we advocate that gFWER be controlled for multiple testing rather than the false discovery rate. Analysis of quantitative changes in a specific proteome (i.e., the complement of proteins expressed in a particular tissue or cell at a given time) is commonly carried out using two-dimensional gel electrophoresis (2-DE). With this procedure, proteins are separated in the first dimension based on isoelectric point, followed by separation based on molecular mass in the second dimension. Subsequently, protein spots are visualized, and the scanned gel images are analyzed using image analysis programs (e.g., ImageMaster, PDQuest). Once the relevant protein spots have been determined, these specific proteins are identified using mass spectrometry. 
Because quantit
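The abstract recommends gFWER control without giving the procedure. One standard choice, assumed here, is the single-step generalized Bonferroni rule of Lehmann and Romano. It rejects H_i when p_i <= k * alpha / m, which keeps the probability of k or more false rejections at or below alpha. The paper may use a different variant, such as a step-down procedure.

```python
def k_fwer_reject(p_values, k=1, alpha=0.05):
    """Single-step generalized Bonferroni (Lehmann and Romano).

    Rejects H_i whenever p_i <= k * alpha / m, which bounds the probability
    of k or more false rejections by alpha; k = 1 is ordinary Bonferroni.
    """
    cutoff = k * alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p <= cutoff]


spots = [0.001, 0.004, 0.008, 0.015, 0.5]
print(k_fwer_reject(spots, k=1))  # [0, 1, 2]
print(k_fwer_reject(spots, k=2))  # [0, 1, 2, 3]
```

Tolerating k - 1 false positives raises the cutoff from alpha/m to k * alpha/m. This suits gel studies where candidate spots are confirmed later by mass spectrometry.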
Comparing Imputation Procedures for Affymetrix Gene Expression Datasets Using MAQC Datasets
Sreevidya Sadananda Sadasiva Rao, Lori A. Shepherd, Andrew E. Bruno, Song Liu, Jeffrey C. Miecznikowski
Advances in Bioinformatics , 2013, DOI: 10.1155/2013/790567
Abstract: Introduction. The microarray datasets from the MicroArray Quality Control (MAQC) project have enabled the assessment of the precision, comparability of microarrays, and other various microarray analysis methods. However, to date no studies that we are aware of have reported the performance of missing value imputation schemes on the MAQC datasets. In this study, we use the MAQC Affymetrix datasets to evaluate several imputation procedures in Affymetrix microarrays. Results. We evaluated several cutting-edge imputation procedures and compared them using different error measures. We randomly deleted 5% and 10% of the data and imputed the missing values using imputation tests. We performed 1000 simulations and averaged the results. The results for both 5% and 10% deletion are similar. Among the imputation methods, we observe the local least squares method with is most accurate under the error measures considered. The k-nearest neighbor method with has the highest error rate among imputation methods and error measures. Conclusions. We conclude that, for imputing missing values in Affymetrix microarray datasets using the MAS 5.0 preprocessing scheme, the local least squares method with has the best overall performance and the k-nearest neighbor method with has the worst overall performance. These results hold true for both 5% and 10% missing values.
1. Introduction
In microarray experiments, randomly missing values may occur due to scratches on the chip, spotting errors, dust, or hybridization errors. Other nonrandom missing values may be biological in nature, for example, probes with low intensity values or intensity values that may exceed a readable threshold. These missing values will create incomplete gene expression matrices where the rows refer to genes and the columns refer to samples. 
These incomplete expression matrices will make it difficult for researchers to perform downstream analyses such as differential expression inference, clustering or dimension reduction methods (e.g., principal components analysis), or multidimensional scaling. Hence, it is critical to understand the nature of the missing values and to choose an accurate method to impute them. There have been several methods put forth to impute missing data in microarray experiments. In one of the first papers related to microarrays, Troyanskaya et al. [1] examine several methods of imputing missing data and ultimately suggest a k-nearest neighbors approach. Researchers also explored applying previously developed schemes for microarrays such as the nonlinear iterative partial least
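The evaluation design in the abstract can be sketched as follows: delete a random 5% or 10% of entries, impute them, and score the error. The imputer here is an assumption. It is a plain unweighted k-nearest-neighbour average, standing in for the methods compared in the paper, whose kNN and local least squares variants may differ. RMSE on the hidden entries is used as the single error measure. All names are hypothetical.

```python
import math
import random


def knn_impute(matrix, k=3):
    """Fill None entries of a genes-by-samples matrix by k-nearest-neighbour averaging.

    For gene g missing sample j, each other gene observing sample j is ranked
    by RMS difference over the samples both genes observe (excluding j); the
    plain average of the k closest genes' values at j is used.
    """
    filled = [row[:] for row in matrix]
    for gi, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for hi, other in enumerate(matrix):
                if hi == gi or other[j] is None:
                    continue
                diffs = [(a - b) ** 2 for c, (a, b) in enumerate(zip(row, other))
                         if c != j and a is not None and b is not None]
                if diffs:
                    candidates.append((math.sqrt(sum(diffs) / len(diffs)), other[j]))
            candidates.sort()
            nearest = [v for _, v in candidates[:k]]
            observed = [v for v in row if v is not None]
            if nearest:
                filled[gi][j] = sum(nearest) / len(nearest)
            else:  # no usable neighbour: fall back to the gene's own mean
                filled[gi][j] = sum(observed) / len(observed) if observed else 0.0
    return filled


def deletion_rmse(matrix, fraction=0.05, k=3, seed=0):
    """Hide a random fraction of entries, impute them, and return the RMSE."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    hidden = rng.sample(cells, max(1, round(fraction * len(cells))))
    damaged = [row[:] for row in matrix]
    for i, j in hidden:
        damaged[i][j] = None
    imputed = knn_impute(damaged, k)
    return math.sqrt(sum((imputed[i][j] - matrix[i][j]) ** 2 for i, j in hidden) / len(hidden))


genes = [[g + s for s in range(8)] for g in range(12)]
print(deletion_rmse(genes, fraction=0.05), deletion_rmse(genes, fraction=0.10))
```

Averaging deletion_rmse over many seeds mirrors the repeated-simulation design described in the abstract.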
Copyright © 2008-2017 Open Access Library. All rights reserved.