Search Results: 1 - 10 of 6487 matches for "Tiejun Tong"
All listed articles are free for downloading (OA Articles)
Optimal variance estimation without estimating the mean function
Tiejun Tong, Yanyuan Ma, Yuedong Wang
Statistics, 2013, DOI: 10.3150/12-BEJ432
Abstract: We study the least squares estimator in the residual variance estimation context. We show that the mean squared differences of paired observations are asymptotically normally distributed. We further establish that, by regressing the mean squared differences of these paired observations on the squared distances between paired covariates via a simple least squares procedure, the resulting variance estimator is not only asymptotically normal and root-$n$ consistent, but also reaches the optimal bound in terms of estimation variance. We also demonstrate the advantage of the least squares estimator over existing methods in terms of second-order asymptotic properties.
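To make the construction concrete, here is a minimal sketch in Python (our own illustration, not the authors' code). Under y_i = g(x_i) + e_i with Var(e_i) = sigma^2, half the squared difference of a close pair has expectation roughly sigma^2 plus a term proportional to the squared covariate distance, so the intercept of a simple least squares fit estimates sigma^2. The pair cutoff h is an ad-hoc choice for a covariate scaled to [0, 1]; the paper's estimator selects pairs more carefully to reach the optimal bound.

```python
import numpy as np

def ls_variance_estimate(x, y, h=0.1):
    """Estimate the residual variance sigma^2 without estimating the mean
    function: regress half squared differences of paired responses on the
    squared distances between paired covariates; the intercept estimates
    sigma^2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    i, j = np.triu_indices(len(x), k=1)   # all pairs with i < j
    d2 = (x[i] - x[j]) ** 2               # squared covariate distances
    s = 0.5 * (y[i] - y[j]) ** 2          # half squared response differences
    keep = d2 <= h ** 2                   # local pairs only, so the unknown
                                          # mean function enters via the slope
    X = np.column_stack([np.ones(keep.sum()), d2[keep]])
    beta, *_ = np.linalg.lstsq(X, s[keep], rcond=None)
    return beta[0]                        # intercept ~ sigma^2

# toy check: smooth mean function with noise standard deviation 0.3
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 300)
print(ls_variance_estimate(x, y))  # expect a value near 0.09
```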
Some Rearrangement Inequalities on Space of Homogeneous Type
Tiejun Chen
Open Journal of Applied Sciences (OJAppS), 2014, DOI: 10.4236/ojapps.2014.49042
Abstract: Let ω be a Muckenhoupt weight. In this paper we obtain an estimate of the rearrangement f*ω on spaces of homogeneous type; a similar estimate was previously known only on R^n.
Inferring Epidemic Network Topology from Surveillance Data
Xiang Wan, Jiming Liu, William K. Cheung, Tiejun Tong
PLOS ONE , 2014, DOI: 10.1371/journal.pone.0100661
Abstract: The transmission of infectious diseases can be affected by many factors, some of them hidden, making it difficult to accurately predict when and where outbreaks may emerge. One current approach is to develop and deploy surveillance systems in an effort to detect outbreaks as early as possible. This enables policy makers to modify and implement strategies for the control of transmission. The accumulated surveillance data, including temporal, spatial, clinical, and demographic information, can provide valuable information with which to infer the underlying epidemic networks. Such networks can be quite informative and insightful as they characterize how infectious diseases transmit from one location to another. The aim of this work is to develop a computational model that allows inferences to be made regarding epidemic network topology in heterogeneous populations. We apply our model to the surveillance data from the 2009 H1N1 pandemic in Hong Kong. The inferred epidemic network displays a significant effect on the propagation of infectious diseases.
Optimally estimating the sample mean from the sample size, median, mid-range and/or mid-quartile range
Dehui Luo, Xiang Wan, Jiming Liu, Tiejun Tong
Statistics, 2015
Abstract: With the era of big data upon us, meta-analysis is attracting increasing attention as a way to analytically combine the results of several similar clinical trials into an overall estimate of a treatment's effectiveness. The sample mean and standard deviation are the two statistics most commonly pooled in meta-analysis, but some trials report the median, the minimum and maximum values, or sometimes the first and third quartiles instead. Thus, to pool results in a consistent format, researchers need to transform this information back to the sample mean and standard deviation. In this paper, we investigate the optimal estimation of the sample mean for meta-analysis from both theoretical and empirical perspectives. A major drawback in the literature is that the sample size, despite its importance, is either ignored or used in a stepwise and somewhat arbitrary manner, as in the well-known method proposed by Hozo et al. (2005). We resolve this issue by incorporating the sample size into the estimators through a smoothly changing weight, which yields the optimal estimation. Our proposed estimators not only improve significantly on the existing ones but also retain their virtue of simplicity. The real data application indicates that our proposed estimators can serve as "rules of thumb" and could be widely applied in meta-analysis.
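As an illustration of the smoothly changing weight, here is a sketch for the scenario where the minimum, median, maximum and sample size are reported. The weight form 4 / (4 + n^0.75) is the one we associate with this line of work, but treat it as an assumption and verify against the paper before relying on it.

```python
def mean_from_min_median_max(a, m, b, n):
    """Estimate the sample mean from minimum a, median m, maximum b and
    sample size n, weighting the mid-range by a factor that changes
    smoothly with n (the extremes carry less information as n grows).
    The weight w = 4 / (4 + n**0.75) is an assumed form -- check the paper."""
    w = 4.0 / (4.0 + n ** 0.75)
    return w * (a + b) / 2.0 + (1.0 - w) * m

# By contrast, Hozo et al. (2005) switch abruptly between (a + 2m + b)/4
# and m at fixed sample-size cutoffs rather than weighting smoothly.
print(mean_from_min_median_max(a=1.2, m=5.0, b=10.3, n=50))
```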
NBLDA: Negative Binomial Linear Discriminant Analysis for RNA-Seq Data
Kai Dong, Hongyu Zhao, Xiang Wan, Tiejun Tong
Statistics, 2015
Abstract: RNA-sequencing (RNA-Seq) has become a powerful technology for characterizing gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods developed for microarray data can be applied to RNA-Seq data, they are not ideal due to the discrete nature of RNA-Seq data. The Poisson distribution and the negative binomial distribution are commonly used to model count data. Recently, Witten (2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may not be as appropriate as the negative binomial distribution when biological replicates are available and overdispersion is present (i.e., when the variance is larger than the mean). However, negative binomial variables are more complicated to model because they involve a dispersion parameter that needs to be estimated. In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. Using Bayes' rule, we construct the classifier by fitting a negative binomial model, and we propose plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze four real RNA-Seq data sets to demonstrate the advantage of our method in real-world applications.
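A minimal sketch of the classifier's structure (our illustration, using a single fixed assumed dispersion phi rather than the paper's plug-in estimates, and omitting size-factor normalisation):

```python
import numpy as np
from scipy.stats import nbinom

def nb_params(mu, phi):
    """Map mean mu and dispersion phi (variance = mu + phi * mu**2)
    to scipy's (n, p) parameterisation of the negative binomial."""
    n = 1.0 / phi
    return n, n / (n + mu)

def nblda_predict(X_train, y_train, X_test, phi=0.1):
    """Classify each test sample by the class maximising the summed
    per-gene NB log-likelihood plus the log prior (Bayes' rule with
    independent genes). phi is a fixed assumption here; the paper
    instead estimates dispersions with dedicated plug-in rules."""
    classes = np.unique(y_train)
    # plug-in class means per gene (small offset avoids zero means)
    mus = [X_train[y_train == k].mean(axis=0) + 1e-8 for k in classes]
    log_priors = [np.log(np.mean(y_train == k)) for k in classes]
    preds = []
    for x in X_test:
        scores = [nbinom.logpmf(x, *nb_params(mu, phi)).sum() + lp
                  for mu, lp in zip(mus, log_priors)]
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)
```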
Non-parametric shrinkage mean estimation for quadratic loss functions with unknown covariance matrices
Cheng Wang, Tiejun Tong, Longbing Cao, Baiqi Miao
Statistics, 2012, DOI: 10.1016/j.jmva.2013.12.012
Abstract: In this paper, a shrinkage estimator for the population mean is proposed under quadratic loss functions with unknown covariance matrices. The new estimator is non-parametric in the sense that it neither assumes a specific parametric distribution for the data nor requires prior information on the population covariance matrix. Analytical results on the improvement offered by the proposed shrinkage estimator are provided, and some corresponding asymptotic properties are derived. Finally, we demonstrate the practical improvement of the proposed method over existing methods through extensive simulation studies and real data analysis.
Keywords: High-dimensional data; Shrinkage estimator; Large $p$ small $n$; $U$-statistic.
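For intuition, here is a toy shrinkage-toward-target estimator in the same spirit. The shrinkage intensity below is a simple James-Stein-flavoured plug-in of our own, purely for illustration; the paper derives an optimal weight via U-statistics without such shortcuts.

```python
import numpy as np

def shrinkage_mean(X, target=None):
    """Shrink the sample mean vector toward a target (default: the grand
    mean replicated across coordinates). The intensity is an illustrative
    plug-in, not the paper's U-statistic-based optimal weight."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    if target is None:
        target = np.full(p, xbar.mean())
    var_xbar = X.var(axis=0, ddof=1).sum() / n   # estimated total variance of xbar
    dist2 = np.sum((xbar - target) ** 2)         # squared distance to target
    w = 1.0 - var_xbar / dist2 if dist2 > var_xbar else 0.0
    return w * xbar + (1.0 - w) * target
```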
Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range
Xiang Wan, Wenqian Wang, Jiming Liu, Tiejun Tong
Statistics, 2014
Abstract: In systematic reviews and meta-analysis, researchers often pool the sample means and standard deviations from a set of similar clinical trials. A number of trials, however, report the median, the minimum and maximum values, and/or the first and third quartiles instead. Hence, in order to combine results, one may have to estimate the sample mean and standard deviation for such trials. In this paper, we propose to improve the existing literature in several directions. First, we show that the sample standard deviation estimation in Hozo et al. (2005) has some serious limitations and is less satisfactory in practice. Motivated by this, we propose a new estimation method that incorporates the sample size. Second, we systematically study the sample mean and standard deviation estimation problem under the more general setting where the first and third quartiles are also available for the trials. Through simulation studies, we demonstrate that the proposed methods greatly improve the existing methods and enrich the literature. We conclude our work with a summary table that serves as a comprehensive guide for performing meta-analysis in different situations.
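A sketch of the sample-size-aware estimators for the two settings discussed (range and interquartile range), assuming approximate normality. The normal-quantile divisors are the forms we associate with this paper and should be checked against the original before use.

```python
from scipy.stats import norm

def sd_from_range(a, b, n):
    """Estimate the SD from minimum a, maximum b and sample size n,
    assuming approximate normality: divide the range by (roughly) its
    expected width in SD units. The divisor 2 * Phi^{-1}((n - 0.375) /
    (n + 0.25)) is an assumed form -- check the paper."""
    xi = 2.0 * norm.ppf((n - 0.375) / (n + 0.25))
    return (b - a) / xi

def sd_from_iqr(q1, q3, n):
    """Analogous estimator from the first and third quartiles."""
    eta = 2.0 * norm.ppf((0.75 * n - 0.125) / (n + 0.25))
    return (q3 - q1) / eta

# Hozo et al.'s range/4 rule ignores n; the divisor above grows with n,
# reflecting the fact that the sample range widens in larger samples.
print(sd_from_range(a=1.0, b=9.0, n=100))  # about 1.6
```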
A survey of statistical software for analysing RNA-seq data
Dexiang Gao, Jihye Kim, Hyunmin Kim, Tzu L Phang, Heather Selby, Aik Tan, Tiejun Tong
Human Genomics, 2010, DOI: 10.1186/1479-7364-5-1-56
Abstract: High-throughput genome-wide RNA profiling by deep sequencing (RNA-seq) is rapidly emerging as a favourite method for gene expression studies. Compared with hybridisation-based technology (oligonucleotide and cDNA microarrays), RNA-seq provides more precise measurement of transcript levels over a wide dynamic range and the ability to quantitate and detect both known and novel isoforms. In every sequencing run, tens of millions of short reads are simultaneously sequenced in each lane by the next-generation sequencer. After pre-processing and mapping against a reference genome, the total number of counts for each mappable transcript is reported. The sequencing results have been reported to be highly reproducible [1]. One of the main applications of RNA-seq is to identify differentially expressed (DE) genes under two or more phenotypes (e.g., cancer versus normal samples). Several statistical methods have been proposed to identify DE genes [1-5]. When choosing a statistical analysis approach, several aspects need to be considered:
(a) Normalisation. The observed number of reads for a gene depends not only on its expression level but also on the length of the gene and on the RNA composition of the sample [6,7]. The purpose of normalisation is to minimise the influences of gene length and total sample RNA composition so that the normalised read counts directly reflect the targeted gene expression level. The normalisation procedure has been shown to have a great impact on DE detection [2,7]. Depending on the experimental design, different normalisation methods are required (a minimal sketch of one common choice appears below).
(b) Statistical model. The Poisson distribution is commonly used to model count data. Due to biological and genetic variations, however, the variance of a read count in sequencing data is often much greater than its mean; that is, the data are over-dispersed. In such cases, one natural alternative to the Poisson is the negative binomial (NB) model. In addition to these two…
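The sketch referenced in item (a): a median-of-ratios size-factor computation (DESeq-style), one common answer to the normalisation issue. Offered purely as an illustration, since the survey reviews several normalisation methods.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Per-sample size factors by the median-of-ratios method.
    counts: genes x samples array of read counts."""
    log_counts = np.log(np.asarray(counts, float))
    # use only genes with nonzero counts in all samples
    finite = np.all(np.isfinite(log_counts), axis=1)
    # geometric mean per gene acts as a pseudo-reference sample
    log_geo_mean = log_counts[finite].mean(axis=1)
    # per-sample median log-ratio to the pseudo-reference
    ratios = log_counts[finite] - log_geo_mean[:, None]
    return np.exp(np.median(ratios, axis=0))

counts = np.array([[100, 200, 150],
                   [ 30,  60,  45],
                   [ 10,  25,  15]])
print(median_of_ratios_size_factors(counts))  # roughly [0.69, 1.39, 1.04]
```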
Analysing breast cancer microarrays from African Americans using shrinkage-based discriminant analysis
Herbert Pang, Keita Ebisu, Emi Watanabe, Laura Y Sue, Tiejun Tong
Human Genomics, 2010, DOI: 10.1186/1479-7364-5-1-5
Abstract: Breast cancer is the most commonly diagnosed cancer in women of all ethnic groups in the United States. It is also the second leading cause of cancer deaths in women. The Surveillance, Epidemiology, and End Results (SEER) database of the National Cancer Institute shows that African-American women have a higher mortality rate for breast cancer than Caucasian women, despite a lower incidence. Between 2000 and 2004, the age-adjusted breast cancer incidence rates were 118.3 cases per 100,000 African-American women and 132.5 cases per 100,000 Caucasian women [1]. By contrast, mortality rates were worse for African Americans, with 33.8 deaths per 100,000 women compared with 25.0 deaths per 100,000 Caucasian women [1]. In addition, a greater proportion of African-American women are diagnosed at a younger age compared with Caucasian women. The median age at breast cancer diagnosis is 57 years for African-American women and 62 years for Caucasian women [2]. Between 1996 and 2004, the five-year breast cancer survival rates were 77.1 per cent for African-American women and 89.9 per cent for Caucasian women [1]. These statistics highlight the disproportionate burden of breast cancer among African-American women [3]. One reason for this ethnic cancer disparity may be lower socioeconomic status. Roetzheim et al. noted that the lower rate of health insurance coverage among African Americans has led to late-stage diagnosis, which results in higher mortality rates [4]. In their review article, Gerend and Pai suggested that, in addition to socioeconomic status, cultural factors may also play a role [5]. Another potential reason may be a lack of access to mammography [6]. Smigal et al. also reported that the rate of mammography uptake varies among ethnic groups [7]. These previous reports collectively suggest that disparities in breast cancer survival may be attributed to lower socioeconomic status. Multivariate modelling approaches show that ethnic differences re…
A short survey of computational analysis methods in analysing ChIP-seq data
Hyunmin Kim, Jihye Kim, Heather Selby, Dexiang Gao, Tiejun Tong, Tzu Lip Phang, Aik Choon Tan
Human Genomics, 2011, DOI: 10.1186/1479-7364-5-2-117
Abstract: The regulation of gene expression is tightly controlled by transcription factors (TFs) that bind to specific DNA regulatory regions, by histone modifications and by positioned nucleosomes in the genome. High-throughput chromatin immunoprecipitation followed by massively parallel next-generation sequencing (ChIP-seq) represents a current approach to profiling genome-wide protein-DNA interactions, histone modifications and nucleosome positions. This technology has marked advantages over microarray-based (ChIP-chip) approaches, offering higher specificity, sensitivity and coverage for locating TF occupancy or epigenetic markers across the genome. ChIP-seq experiments generate large amounts of data (on the order of tens of millions of reads), creating a bottleneck for data analysis and interpretation. Consequently, effective bioinformatics tools are needed to process, analyse and interpret these data in order to uncover the underlying biological regulatory mechanisms. In essence, the ChIP-seq analysis workflow can be divided into the following steps:
(i) Pre-processing. The goal of this step is to filter out erroneous and low-quality reads so that only the highest-quality sequencing reads are retained for the subsequent mapping step.
(ii) Mapping. In this key step, mapped reads are converted to an integer count of 'tags' at each position in the genome with fixed read length. The flexibility options chosen for mapping multiple reads to multiple sites affect sensitivity and specificity, depending on the volume and complexity of the target genome sequences. The user can increase specificity by using unique reads only, or increase sensitivity by allowing multiple alignments of reads.
(iii) Peak finding. This is the most challenging step in the analysis workflow, as the goal is to identify significant peak signals among background signals. This includes not only finding the strong signals, but also finding the statistically reproducible weak signals ob… (see the sketch below)
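The sketch referenced in item (iii): a minimal Poisson-background peak caller of our own construction. Real tools such as MACS use local, dynamic background rates and multiple-testing control, all omitted here; the window size and threshold are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import poisson

def call_peaks(tag_positions, genome_length, window=200, alpha=1e-5):
    """Bin tag positions into fixed windows and flag windows whose
    counts are improbable under a uniform Poisson background."""
    n_bins = genome_length // window + 1
    counts = np.bincount(np.asarray(tag_positions) // window,
                         minlength=n_bins)
    lam = counts.mean()                  # genome-wide background rate
    pvals = poisson.sf(counts - 1, lam)  # P(X >= count) under background
    peaks = np.flatnonzero(pvals < alpha)
    return [(b * window, (b + 1) * window, int(counts[b]), pvals[b])
            for b in peaks]

# toy data: uniform background plus one enriched 200-bp region
rng = np.random.default_rng(1)
tags = np.concatenate([rng.integers(0, 100_000, 5_000),
                       rng.integers(40_000, 40_200, 300)])
print(call_peaks(tags, genome_length=100_000))
```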

