Spectral Analysis on Time-Course Expression Data: Detecting Periodic Genes Using a Real-Valued Iterative Adaptive Approach
Kwadwo S. Agyepong,Fang-Han Hsu,Edward R. Dougherty,Erchin Serpedin
Advances in Bioinformatics , 2013, DOI: 10.1155/2013/171530
Abstract: Time-course expression profiles and methods for spectrum analysis have been applied for detecting transcriptional periodicities, which are valuable patterns for unraveling genes associated with cell cycle and circadian rhythm regulation. However, most of the proposed methods suffer from restrictions and, to a certain extent, from large numbers of false positives. Additionally, in some experiments, arbitrarily irregular sampling times as well as the presence of high noise and small sample sizes make accurate detection a challenging task. A novel scheme for detecting periodicities in time-course expression data is proposed, in which a real-valued iterative adaptive approach (RIAA), originally proposed for signal processing, is applied for periodogram estimation. The inferred spectrum is then analyzed using Fisher's hypothesis test. With a proper p-value threshold, periodic genes can be detected. A periodic signal, two nonperiodic signals, and four sampling strategies were considered in the simulations, including both bursts and drops. In addition, two real yeast datasets were used for validation. The simulations and real data analysis reveal that RIAA can perform competitively with the existing algorithms. The advantage of RIAA is manifested when the expression data are highly irregularly sampled, and when the number of cycles covered by the sampling time points is very small.
1. Introduction
Patterns of periodic gene expression have been found to be associated with essential biological processes such as cell cycle and circadian rhythm [1], and the detection of periodic genes is crucial to advance our understanding of gene function, disease pathways, and, ultimately, therapeutic solutions. Using high-throughput technologies such as microarrays, gene expression profiles at discrete time points can be derived, and hundreds of cell cycle regulated genes have been reported in a variety of species. For example, Spellman et al. applied cell synchronization methods and conducted time-course gene expression experiments on Saccharomyces cerevisiae [2]. The authors identified 800 cell cycle regulated genes using DNA microarrays. Also, Rustici et al. and Menges et al. identified 407 and about 500 cell cycle regulated genes in Schizosaccharomyces pombe and Arabidopsis, respectively [3, 4]. Signal processing in the frequency domain simplifies the analysis, and a growing number of studies have demonstrated the power of spectrum analysis in the detection of periodic genes. Considering the common issues of missing values and noise in microarray experiments, Ahdesmäki et al. proposed a
Simplex, associahedron, and cyclohedron
Martin Markl
Mathematics , 1997,
Abstract: The aim of the paper is to give an 'elementary' introduction to the theory of modules over operads and discuss three prominent examples of these objects: the simplex, the associahedron (= the Stasheff polyhedron) and the cyclohedron (= the compactification of the space of configurations of points on the circle). Keywords: (right) module over an operad, module associated to a cyclic operad, Koszul module over an operad.
Finding Communities of Related Genes
Dennis Wilkinson,Bernardo A. Huberman
Physics , 2002,
Abstract: We present an automated method of identifying communities of functionally related genes from the biomedical literature. These communities encapsulate human gene and protein interactions and identify groups of genes that are complementary in their function. We use graphs to represent the network of gene cooccurrences in articles mentioning particular keywords, and find that these graphs consist of one giant connected component and many small ones. In addition, the vertex degree distribution of the graphs follows a power law, whose exponent we determine. We then use an algorithm based on betweenness centrality to identify community structures within the giant component. The different structures are then aggregated into a final list of communities, whose members are weighted according to how strongly they belong to them. Our method is efficient enough to be applicable to the entire Medline database, and yet the information it extracts is significantly detailed, applicable to a particular problem, and interesting in and of itself. We illustrate the method in the case of colon cancer and demonstrate important features of the resulting communities.
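The betweenness-based step mentioned in the abstract can be illustrated with a plain Girvan-Newman-style split. This is a minimal sketch, not the authors' algorithm: the function names (`edge_betweenness`, `components`, `split_once`) and the toy graph are assumptions made for illustration, the graph is assumed to be undirected, unweighted and free of self-loops, and the aggregation and weighting of communities described in the abstract are not reproduced.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph.

    adj maps each vertex to the set of its neighbours. Every vertex pair is
    counted from both endpoints, so values are twice the usual undirected
    score; the ranking of edges is unaffected.
    """
    bet = {}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                bet[edge] = bet.get(edge, 0.0) + share
                delta[v] += share
    return bet


def components(adj):
    """Return the connected components of the graph as a list of sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def split_once(adj):
    """Remove highest-betweenness edges until one more component appears."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        bet = edge_betweenness(adj)
        if not bet:  # no edges left to remove
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Toy graph (hypothetical): two triangles joined by the bridge c-d.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
communities = split_once(toy)
```

On the toy graph the bridge carries every shortest path between the two triangles, so it is removed first and the two triangles come out as separate communities.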
Finding flavor genes
Philippe Reymond
Genome Biology , 2000, DOI: 10.1186/gb-2000-1-2-reports0057
Abstract: Aharoni et al. randomly isolated 1,701 cDNA clones from a strawberry fruit cDNA library and 480 clones from petunia corolla (as control) and printed the PCR-amplified clones on chemically modified glass slides using a robotic device. They used these microarrays to monitor changes in gene expression at three fruit developmental stages (from green to red). Using a rigorous statistical analysis, the authors found that 401 clones were differentially expressed between all three stages, with 177 clones being upregulated between the green and red stages. Sequences of the latter group of genes revealed that more than 50% were related to primary and secondary metabolism. From the other sequences potentially involved in flavor formation, Aharoni et al. identified a novel gene (SAAT) for an alcohol acetyltransferase, an enzyme that catalyzes the final step in the synthesis of volatile esters. This gene shows 16-fold greater expression during the red stage than the green stage of fruit development. The authors expressed recombinant SAAT in Escherichia coli and confirmed that the enzyme has alcohol acetyltransferase activity. Analysis of a series of potential substrates suggests that SAAT is responsible for formation of the predominant esters found in ripe strawberries. Access to Arabidopsis cDNA microarrays is provided by the Arabidopsis Functional Genomics Consortium (AFGC). Links to information on plant microarrays can also be found via the Virtual library: plant-arrays. Large-scale cDNA microarrays are now used with model systems to investigate global patterns of gene expression at the level of the whole organism. The utility of microarrays that cover a restricted portion of the genome, like that described in this paper, will become increasingly recognized, however. This paper is a first example of the use of customized plant cDNA microarrays from a non-model system. It provides a good example of how a small selected array can be used to study a particular developmental proces
Realizations of the associahedron and cyclohedron
Christophe Hohlweg,Carsten Lange
Mathematics , 2005,
Abstract: We describe many different realizations with integer coordinates for the associahedron (i.e. the Stasheff polytope) and for the cyclohedron (i.e. the Bott-Taubes polytope) and compare them to the permutahedron of type A_n and B_n respectively. The coordinates are obtained by an algorithm which uses an oriented Coxeter graph of type A_n or B_n respectively as only input and which specialises to a procedure presented by J.-L. Loday for a certain orientation of A_n. The described realizations have cambrian fans of type A and B as normal fans. This settles a conjecture of N. Reading for cambrian fans of these types.
Finding the Core-Genes of Chloroplasts
Bassam AlKindy,Jean-François Couchot,Christophe Guyeux,Arnaud Mouly,Michel Salomon,Jacques M. Bahi
Computer Science , 2014,
Abstract: Due to the recent evolution of sequencing techniques, the number of available genomes is rising steadily, making large-scale genomic comparisons between sets of closely related species possible. An interesting question is which functional genes are common to a collection of species or, conversely, what is specific to a given species when compared to others belonging to the same genus, family, etc. Investigating this problem means finding both the core and pan genomes of a collection of species, i.e., the genes common to all the species vs. the set of all genes in all species under consideration. However, obtaining trustworthy core and pan genomes is not an easy task: it involves a large amount of computation and requires a rigorous methodology. Surprisingly, as far as we know, this methodology for finding core and pan genomes has not been deeply investigated. This research work tries to fill this gap by focusing only on chloroplastic genomes, whose reasonable sizes allow a deep study. To achieve this goal, a collection of 99 chloroplasts is considered in this article. Two methodologies have been investigated, based respectively on sequence similarities and on gene names taken from annotation tools. The obtained results are finally evaluated in terms of biological relevance.
Associahedron, cyclohedron, and permutohedron as compactifications of configuration spaces
P. Lambrechts,V. Tourtchine,I. Volic
Mathematics , 2006,
Abstract: As in the case of the associahedron and cyclohedron, the permutohedron can also be defined as an appropriate compactification of a configuration space of points on an interval or on a circle. The construction of the compactification endows the permutohedron with a projection to the cyclohedron, and the cyclohedron with a projection to the associahedron. We show that the preimages of any point via these projections might not be homeomorphic to (a cell decomposition of) a disk, but are still contractible. We briefly explain an application of this result to the study of knot spaces from the point of view of the Goodwillie-Weiss manifold calculus.
Time-Course Analysis of Cyanobacterium Transcriptome: Detecting Oscillatory Genes
Carla Layana, Luis Diambra
PLOS ONE , 2011, DOI: 10.1371/journal.pone.0026291
Abstract: The microarray technique allows the simultaneous measurement of the expression levels of thousands of mRNAs. By mining these data one can identify the dynamics of the gene expression time series. The detection of genes that are periodically expressed is an important step that allows us to study the regulatory mechanisms associated with the circadian cycle. The problem of finding periodicity in biological time series poses many challenges. These challenges arise because the observed time series usually exhibit non-idealities, such as noise, short length, outliers and unevenly sampled time points. Consequently, the method for finding periodicity should preferably be robust against such anomalies in the data. In this paper, we propose a general and robust procedure for identifying genes with a periodic signature at a given significance level. This identification method is based on autoregressive models and information theory. Using simulated data, we show that the suggested method is capable of identifying rhythmic profiles even in the presence of noise and when the number of data points is small. With this analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles from the cyanobacterium Synechocystis.
Periodic Active Case Finding for TB: When to Look?
Peter J. Dodd, Richard G. White, Elizabeth L. Corbett
PLOS ONE , 2011, DOI: 10.1371/journal.pone.0029130
Abstract: Objective: To investigate the factors influencing the performance and cost-efficacy of periodic rounds of active case finding (ACF) for TB. Methods: A mathematical model of TB dynamics and periodic ACF (PACF) in the HIV era, simplified by assuming constant prevalence of latent TB infection, is analyzed for features that control intervention outcome, measured as cases averted and cases found. Explanatory variables include baseline TB incidence, interval between PACF rounds, and different routine and PACF case-detection rates among HIV-infected and uninfected TB cases. Findings: PACF can be cost-saving over a 10-year time frame if the cost-per-round is lower than a threshold proportional to initial incidence and cost-per-case-treated. More cases are averted at higher baseline incidence rates, when more potent PACF strategies are used, when intervals between PACF rounds are shorter, and when the ratio of HIV-negative to HIV-positive TB cases detected is higher. More costly approaches, e.g. radiographic screening, can be as cost-effective as less costly alternatives if PACF case-detection is higher and/or implementation less frequent. Conclusion: Periodic ACF can both improve control and save medium-term health care costs in high TB burden settings. Greater costs of highly effective PACF at frequent (e.g. yearly) intervals may be offset by higher numbers of cases averted in populations with high baseline TB incidence, higher prevalence of HIV-uninfected cases, higher costs per-case-treated, and more effective routine case-detection. Less intensive approaches may still be cost-neutral or cost-saving in populations lacking one or more of these key determinants.
Finding Anomalous Periodic Time Series: An Application to Catalogs of Periodic Variable Stars
Umaa Rebbapragada,Pavlos Protopapas,Carla E. Brodley,Charles Alcock
Physics , 2009, DOI: 10.1007/s10994-008-5093-3
Abstract: Catalogs of periodic variable stars contain large numbers of periodic light-curves (photometric time series data from the astrophysics domain). Separating anomalous objects from well-known classes is an important step towards the discovery of new classes of astronomical objects. Most anomaly detection methods for time series data assume either a single continuous time series or a set of time series whose periods are aligned. Light-curve data precludes the use of these methods as the periods of any given pair of light-curves may be out of sync. One may use an existing anomaly detection method if, prior to similarity calculation, one performs the costly act of aligning two light-curves, an operation that scales poorly to massive data sets. This paper presents PCAD, an unsupervised anomaly detection method for large sets of unsynchronized periodic time-series data, that outputs a ranked list of both global and local anomalies. It calculates its anomaly score for each light-curve in relation to a set of centroids produced by a modified k-means clustering algorithm. Our method is able to scale to large data sets through the use of sampling. We validate our method on both light-curve data and other time series data sets. We demonstrate its effectiveness at finding known anomalies, and discuss the effect of sample size and number of centroids on our results. We compare our method to naive solutions and existing time series anomaly detection methods for unphased data, and show that PCAD's reported anomalies are comparable to or better than all other methods. Finally, astrophysicists on our team have verified that PCAD finds true anomalies that might be indicative of novel astrophysical phenomena.
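The idea of scoring unsynchronized periodic series against a set of centroids can be sketched as follows. This is only an illustration under stated assumptions, not PCAD itself: the centroids are taken as given (the abstract says PCAD obtains them from a modified k-means, which is not reproduced here), every series is assumed to be resampled to the same number of points over one period, and the names `cyclic_shift_distance`, `anomaly_scores` and `rank_anomalies` are hypothetical.

```python
import math


def cyclic_shift_distance(a, b):
    """Smallest Euclidean distance between a and any cyclic shift of b.

    Trying every shift makes the comparison insensitive to the phase at
    which each periodic series happens to start.
    """
    n = len(a)
    if n != len(b):
        raise ValueError("series must have equal length")
    best = math.inf
    for shift in range(n):
        d = math.sqrt(sum((a[i] - b[(i + shift) % n]) ** 2 for i in range(n)))
        best = min(best, d)
    return best


def anomaly_scores(series, centroids):
    """Score each series by its phase-invariant distance to the nearest centroid."""
    return [min(cyclic_shift_distance(x, c) for c in centroids) for x in series]


def rank_anomalies(series, centroids):
    """Return series indices ordered from most to least anomalous."""
    scores = anomaly_scores(series, centroids)
    return sorted(range(len(series)), key=lambda i: scores[i], reverse=True)


# Toy data (hypothetical): one centroid and three series of four points each.
centroids = [[0, 1, 0, -1]]
series = [
    [1, 0, -1, 0],   # same shape as the centroid, shifted in phase
    [0, 1, 0, -1],   # identical to the centroid
    [3, 3, 3, 3],    # flat series, unlike the centroid at any phase
]
scores = anomaly_scores(series, centroids)
ranking = rank_anomalies(series, centroids)
```

The brute-force search over all shifts costs O(n^2) per comparison, which is the kind of alignment cost the abstract describes as scaling poorly to massive data sets.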
