Semi-parametric estimation of the hazard function in a model with covariate measurement error
Marie-Laure Martin-Magniette,Marie-Luce Taupin
Mathematics, 2006
Abstract: We consider a model where the failure hazard function, conditional on a covariate $Z$, is given by $R(t,\theta^0|Z)=\eta_{\gamma^0}(t)f_{\beta^0}(Z)$, with $\theta^0=(\beta^0,\gamma^0)^\top\in \mathbb{R}^{m+p}$. The baseline hazard function $\eta_{\gamma^0}$ and the relative risk $f_{\beta^0}$ both belong to parametric families. The covariate $Z$ is measured through the error model $U=Z+\epsilon$, where $\epsilon$ is independent of $Z$ and has known density $f_\epsilon$. We observe an $n$-sample $(X_i, D_i, U_i)$, $i=1,\ldots,n$, where $X_i$ is the minimum of the failure time and the censoring time, and $D_i$ is the censoring indicator. We aim to estimate $\theta^0$ in the presence of the unknown density $g$ of $Z$. Our estimation procedure, based on a least squares criterion, provides two estimators. The first one minimizes an estimate of the least squares criterion in which $g$ is estimated by density deconvolution. Its rate depends on the smoothness of $f_\epsilon$ and of $f_\beta(z)$ as a function of $z$. We derive sufficient conditions that ensure $\sqrt{n}$-consistency. The second estimator is constructed under conditions ensuring that the least squares criterion can be directly estimated at the parametric rate. These estimators, studied in depth through examples, are in particular $\sqrt{n}$-consistent and asymptotically Gaussian in the Cox model and in the excess risk model, whatever $f_\epsilon$ is.
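The observation scheme in this abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator; it only generates triples $(X_i, D_i, U_i)$ under choices that are assumptions made here for illustration: a constant baseline hazard $\eta_\gamma(t)=\gamma$, a Cox relative risk $f_\beta(z)=\exp(\beta z)$, standard Gaussian $Z$, Gaussian error $\epsilon$, exponential censoring, and the convention $D_i=1$ when the failure is observed. The function name and all parameter values are hypothetical.

```python
import math
import random


def simulate_sample(n, beta=0.5, gamma=1.0, sigma_eps=0.3, censor_rate=0.5, seed=0):
    """Draw n triples (X, D, U) from an illustrative version of the model.

    Assumptions (not taken from the paper): constant baseline hazard gamma,
    Cox relative risk exp(beta * z), Z ~ N(0, 1), eps ~ N(0, sigma_eps^2),
    censoring time ~ Exponential(censor_rate), D = 1 if failure is observed.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)               # true covariate, never observed
        u = z + rng.gauss(0.0, sigma_eps)     # error-prone measurement U = Z + eps
        hazard = gamma * math.exp(beta * z)   # R(t | Z) is constant in t here
        t = rng.expovariate(hazard)           # failure time given Z
        c = rng.expovariate(censor_rate)      # censoring time
        sample.append((min(t, c), 1 if t <= c else 0, u))
    return sample


if __name__ == "__main__":
    for x, d, u in simulate_sample(5):
        print(f"X={x:.3f}  D={d}  U={u:.3f}")
```

Only $U$ is returned, never $Z$, which is the reason a naive fit that treats $U$ as the covariate is generally biased and a correction such as deconvolution is needed.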
Hidden Markov Models with mixtures as emission distributions
Stevenn Volant,Caroline Bérard,Marie-Laure Martin-Magniette,Stéphane Robin
Computer Science, 2012
Abstract: In unsupervised classification, Hidden Markov Models (HMM) are used to account for a neighborhood structure between observations. The emission distributions are often assumed to belong to some parametric family. In this paper, a semiparametric model in which each emission distribution is a mixture of parametric distributions is proposed to gain flexibility. We show that the classical EM algorithm can be adapted to infer the model parameters. For the initialisation step, starting from a large number of components, a hierarchical method to combine them into the hidden states is proposed. Three likelihood-based criteria to select the components to be combined are discussed. To estimate the number of hidden states, BIC-like criteria are derived. A simulation study is carried out both to determine the best combination of merging criterion and model selection criterion and to evaluate the accuracy of classification. The proposed method is also illustrated using a biological dataset from the model plant Arabidopsis thaliana. An R package, HMMmix, is freely available on CRAN.
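To make the idea of mixture emissions concrete, the following sketch evaluates the log-likelihood of an HMM whose state-wise emission densities are Gaussian mixtures, using a scaled forward recursion. It is a minimal illustration under assumptions made here (univariate Gaussian components, hypothetical function names); it is neither the adapted EM algorithm of the paper nor the interface of the HMMmix package.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Emission density of one hidden state: a list of (weight, mean, sd)."""
    return sum(w * normal_pdf(x, mu, sd) for w, mu, sd in components)


def hmm_log_likelihood(obs, init, trans, emissions):
    """Log-likelihood of obs via the forward recursion with rescaling.

    init[k]      : initial probability of state k
    trans[j][k]  : transition probability from state j to state k
    emissions[k] : mixture components of state k
    """
    n_states = len(init)
    alpha = [init[k] * mixture_pdf(obs[0], emissions[k]) for k in range(n_states)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for x in obs[1:]:
        alpha = [
            mixture_pdf(x, emissions[k])
            * sum(alpha[j] * trans[j][k] for j in range(n_states))
            for k in range(n_states)
        ]
        scale = sum(alpha)              # rescale to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik
```

With a single component per state this reduces to the usual Gaussian HMM likelihood, so the mixture form strictly extends the parametric case.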
Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering
Gilles Celeux,Marie-Laure Martin-Magniette,Cathy Maugis-Rabusseau,Adrian E. Raftery
Statistics, 2013
Abstract: We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state-of-the-art model selection method. We select the method of Witten and Tibshirani (2010) as a current state-of-the-art regularization method. We compared the methods by simulation in terms of their accuracy in both classification and variable selection. In the first simulation experiment, all the variables were conditionally independent given cluster membership. We found that variable selection (of either kind) yielded substantial gains in classification accuracy when the clusters were well separated, but few gains when the clusters were close together. We found that the two variable selection methods had comparable classification accuracy, but that the model selection approach had substantially better accuracy in selecting variables. In our second simulation experiment, there were correlations among the variables given the cluster memberships. We found that the model selection approach was substantially more accurate in terms of both classification and variable selection than the regularization approach, and that both gave more accurate classifications than $K$-means without variable selection.
Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome
Caroline Bérard,Marie-Laure Martin-Magniette,Véronique Brunaud,Sébastien Aubourg,Stéphane Robin
Quantitative Biology, 2011
Abstract: Tiling arrays make large-scale exploration of the genome possible thanks to probes that cover the whole genome at very high density, up to 2,000,000 probes. The biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of the region classification.
Normalization for triple-target microarray experiments
Marie-Laure Martin-Magniette, Julie Aubert, Avner Bar-Hen, Samira Elftieh, Frederic Magniette, Jean-Pierre Renou, Jean-Jacques Daudin
BMC Bioinformatics, 2008, DOI: 10.1186/1471-2105-9-216
Abstract: We propose a two-step normalization procedure for triple-target experiments. First the dye bleeding is evaluated and corrected if necessary. Then the signal in each channel is normalized using a generalized lowess procedure to correct a global dye bias. The normalization procedure is validated using triple-self experiments and by comparing the results of triple-target and two-color experiments. Although the focus is on triple-target microarrays, the proposed method can be used to normalize p differently labelled targets co-hybridized on the same array, for any value of p greater than 2. The proposed normalization procedure is effective: the technical biases are reduced, the number of false positives is under control in the analysis of differentially expressed genes, and the triple-target experiments are more powerful than the corresponding two-color experiments. There is room for improving the microarray experiments by simultaneously hybridizing more than two samples. DNA microarray technology is a high throughput technique by which the expression of the whole genome is studied in a single experiment. In dual label experiments the fluorescent dyes Cy3 and Cy5 are used to label the two RNA samples co-hybridized on the same array. Recently two more dyes have been proposed (Alexa 488 and Alexa 594) allowing the simultaneous hybridization of three or four samples. Forster et al. [2] have evaluated triple-target microarray by comparing results of single-target, dual-target and triple-target microarrays. They have concluded that the use of triple-target microarray is valid from an experimental point of view. One year later, Staal et al. [7] have investigated the four-target microarray experiments. Their approach differs from that of [2], but their conclusions are in fair agreement. Their study has shown that Alexa 594 is best suited as a third dye and that Alexa 488 can be applied as a fourth dye on some microarray types.
These extensions of the microarray technology are promis
Search for the genes involved in oocyte maturation and early embryo development in the hen
Sebastien Elis, Florence Batellier, Isabelle Couty, Sandrine Balzergue, Marie-Laure Martin-Magniette, Philippe Monget, Elisabeth Blesbois, Marina S Govoroun
BMC Genomics, 2008, DOI: 10.1186/1471-2164-9-110
Abstract: The in silico approach allowed us to identify 18 chicken homologs of mouse potential oocyte genes found by digital differential display. Using the chicken Affymetrix microarray, we identified 461 genes overexpressed in granulosa cells (GCs) and 250 genes overexpressed in the germinal disc (GD) of the hen oocyte. Six genes were identified using both in silico and microarray approaches. Based on GO annotations, GC and GD genes were differentially involved in biological processes, reflecting different physiological destinations of these two cell layers. Finally we studied the spatial and temporal dynamics of the expression of 21 chicken genes. According to their expression patterns all these genes are involved in different stages of final follicular maturation and/or early embryogenesis in the chicken. Among them, 8 genes (btg4, chkmos, wee, zpA, dazL, cvh, zar1 and ktfn) were preferentially expressed in the maturing oocyte and cvh, zar1 and ktfn were also highly expressed in the early embryo. We showed that in silico and Affymetrix microarray approaches were relevant and complementary in order to find new avian genes potentially involved in oocyte maturation and/or early embryo development, and allowed the discovery of new potential chicken mature oocyte and chicken granulosa cell markers for future studies. Moreover, detailed study of the expression of some of these genes revealed promising candidates for maternal effect genes in the chicken. Finally, the finding concerning the different state of rRNA compared to that of mRNA during the postovulatory period shed light on some mechanisms through which oocyte to embryo transition occurs in the hen. The activation of molecular pathways underlying oocyte to embryo transition (OET) depends exclusively on maternal RNAs and proteins accumulated during growth of the oocyte [1]. During OET and preimplantation development in mice, the embryo becomes almost autonomous, and may gradually eliminate maternal components.
Indeed, by t
Comparative transcriptomics of drought responses in Populus: a meta-analysis of genome-wide expression profiling in mature leaves and root apices across two genotypes
David Cohen, Marie-Béatrice Bogeat-Triboulot, Emilie Tisserant, Sandrine Balzergue, Marie-Laure Martin-Magniette, Gaëlle Lelandais, Nathalie Ningre, Jean-Pierre Renou, Jean-Philippe Tamby, Didier Le Thiec, Irène Hummel
BMC Genomics, 2010, DOI: 10.1186/1471-2164-11-630
Abstract: Using a multi-species designed microarray, a genomic DNA-based selection of probesets provided an unambiguous between-genotype comparison. Analyses of functional group enrichment enabled the extraction of processes physiologically relevant to drought response. The drought-driven changes in gene expression occurring in root apices were consistent across treatments and genotypes. For mature leaves, the transcriptome response varied weakly but in accordance with the duration of water deficit. A differential clustering algorithm revealed similar and divergent gene co-expression patterns among the two genotypes. Since moderate stress levels induced similar physiological responses in both genotypes, the genotype-dependent transcriptional responses could be considered as intrinsic divergences in genome functioning. Our meta-analysis detected several candidate genes and processes that are differentially regulated in root and leaf, potentially under developmental control, and preferentially involved in early and long-term responses to drought. In poplar, the well-known drought-induced activation of sensing and signalling cascades was specific to the early response in leaves but was found to be general in root apices. Comparing our results to what is known in Arabidopsis, we found that transcriptional remodelling included signalling and a response to energy deficit in roots in parallel with transcriptional indices of hampered assimilation in leaves, particularly in the drought-sensitive poplar genotype. Water deficit is recognised as one of the main environmental constraints restricting natural and agro-ecosystem productivity [1,2]. The influence of water availability on plant productivity suggests that water limitation has shaped the natural variation and evolution of many physiological traits [3]. Biotechnology has investigated the genetic basis of drought tolerance by targeting relevant genes [4,5].
However, manipulating a single gene at a time, even genes encoding transcrip
Arabidopsis TFL2/LHP1 Specifically Associates with Genes Marked by Trimethylation of Histone H3 Lysine 27
Franziska Turck (equal contributor),François Roudier (equal contributor),Sara Farrona,Marie-Laure Martin-Magniette,Elodie Guillaume,Nicolas Buisine,Séverine Gagnot,Robert A Martienssen,George Coupland,Vincent Colot
PLOS Genetics, 2007, DOI: 10.1371/journal.pgen.0030086
Abstract: TERMINAL FLOWER 2/LIKE HETEROCHROMATIN PROTEIN 1 (TFL2/LHP1) is the only Arabidopsis protein with overall sequence similarity to the HETEROCHROMATIN PROTEIN 1 (HP1) family of metazoans and S. pombe. TFL2/LHP1 represses transcription of numerous genes, including the flowering-time genes FLOWERING LOCUS T (FT) and FLOWERING LOCUS C (FLC), as well as the floral organ identity genes AGAMOUS (AG) and APETALA 3 (AP3). These genes are also regulated by proteins of the Polycomb repressive complex 2 (PRC2), and it has been proposed that TFL2/LHP1 represents a potential stabilizing factor of PRC2 activity. Here we show by chromatin immunoprecipitation and hybridization to an Arabidopsis Chromosome 4 tiling array (ChIP-chip) that TFL2/LHP1 associates with hundreds of small domains, almost all of which correspond to genes located within euchromatin. We investigated the chromatin marks to which TFL2/LHP1 binds and show that, in vitro, TFL2/LHP1 binds to histone H3 di- or tri-methylated at lysine 9 (H3K9me2 or H3K9me3), the marks recognized by HP1, and to histone H3 trimethylated at lysine 27 (H3K27me3), the mark deposited by PRC2. However, in vivo TFL2/LHP1 association with chromatin occurs almost exclusively and co-extensively with domains marked by H3K27me3, but not H3K9me2 or -3. Moreover, the distribution of H3K27me3 is unaffected in lhp1 mutant plants, indicating that unlike PRC2 components, TFL2/LHP1 is not involved in the deposition of this mark. Rather, our data suggest that TFL2/LHP1 recognizes specifically H3K27me3 in vivo as part of a mechanism that represses the expression of many genes targeted by PRC2.
Transcriptome Analysis Describing New Immunity and Defense Genes in Peripheral Blood Mononuclear Cells of Rheumatoid Arthritis Patients
Vitor Hugo Teixeira, Robert Olaso, Marie-Laure Martin-Magniette, Sandra Lasbleiz, Laurent Jacq, Catarina Resende Oliveira, Pascal Hilliquin, Ivo Gut, François Cornelis, Elisabeth Petit-Teixeira
PLOS ONE, 2009, DOI: 10.1371/journal.pone.0006803
Abstract: Background: Large-scale gene expression profiling of peripheral blood mononuclear cells from Rheumatoid Arthritis (RA) patients could provide a molecular description that reflects the contribution of diverse cellular responses associated with this disease. The aim of our study was to identify peripheral blood gene expression profiles for RA patients, using Illumina technology, to gain insights into RA molecular mechanisms. Methodology/Principal Findings: The Illumina Human-6v2 Expression BeadChips were used for a complete genome-wide transcript profiling of peripheral blood mononuclear cells (PBMCs) from 18 RA patients and 15 controls. Differential analysis per gene was performed with one-way analysis of variance (ANOVA) and P values were adjusted to control the False Discovery Rate (FDR<5%). Genes differentially expressed at a significant level between patients and controls were analyzed using Gene Ontology (GO) in the PANTHER database to identify biological processes. A differential expression of 339 Reference Sequence genes (238 down-regulated and 101 up-regulated) between the two groups was observed. We identified a remarkably elevated expression of a spectrum of genes involved in Immunity and Defense in PBMCs of RA patients compared to controls. This result is confirmed by GO analysis, suggesting that these genes could be activated systemically in RA. No significant down-regulated ontology groups were found. Microarray data were validated by real time PCR in a set of nine genes showing a high degree of correlation. Conclusions/Significance: Our study highlighted several new genes that could contribute to the identification of innovative clinical biomarkers for diagnostic procedures and therapeutic interventions.
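The abstract states that P values were adjusted to control the False Discovery Rate but does not name the procedure. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up adjustment, which is a common choice and is an assumption here rather than a statement of what the authors used; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the p-value of rank r among m tests (sorted ascending), the raw
    adjustment is p * m / r; a running minimum taken from the largest
    rank downwards makes the adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    adj = benjamini_hochberg(raw)
    # Genes whose adjusted p-value is below 0.05 are declared significant.
    print([i for i, q in enumerate(adj) if q < 0.05])
```

Declaring a gene differentially expressed when its adjusted value falls below 0.05 corresponds to the FDR<5% threshold quoted in the abstract.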
A Gene-Phenotype Network Based on Genetic Variability for Drought Responses Reveals Key Physiological Processes in Controlled and Natural Environments
David Rengel, Sandrine Arribat, Pierre Maury, Marie-Laure Martin-Magniette, Thibaut Hourlier, Marion Laporte, Didier Varès, Sébastien Carrère, Philippe Grieu, Sandrine Balzergue, Jérôme Gouzy, Patrick Vincourt, Nicolas B. Langlade
PLOS ONE, 2012, DOI: 10.1371/journal.pone.0045249
Abstract: Identifying the connections between molecular and physiological processes underlying the diversity of drought stress responses in plants is key for basic and applied science. Drought stress response involves a large number of molecular pathways and subsequent physiological processes. Therefore, it constitutes an archetypical systems biology model. We first inferred a gene-phenotype network exploiting differences in drought responses of eight sunflower (Helianthus annuus) genotypes to two drought stress scenarios. Large transcriptomic data were obtained with the sunflower Affymetrix microarray, comprising 32423 probesets, and were associated with nine morpho-physiological traits (integrated transpired water, leaf transpiration rate, osmotic potential, relative water content, leaf mass per area, carbon isotope discrimination, plant height, number of leaves and collar diameter) using sPLS regression. Overall, we could associate the expression patterns of 1263 probesets with six phenotypic traits and identify whether correlations were due to treatment, genotype and/or their interaction. We also identified genes whose expression is affected at moderate and/or intense drought stress together with genes whose expression variation could explain phenotypic and drought tolerance variability among our genetic material. We then used the network model to study phenotypic changes in less tractable agronomical conditions, i.e. sunflower hybrids subjected to different watering regimes in field trials. Mapping this new dataset onto the gene-phenotype network allowed us to identify genes whose expression was robustly affected by water deprivation in both controlled and field conditions. The enrichment in genes correlated with relative water content and osmotic potential provides evidence of the importance of these traits in agronomical conditions.
