Adaptive density estimation for clustering with Gaussian mixtures
Cathy Maugis, Bertrand Michel
Statistics, 2011
Abstract: Gaussian mixture models are widely used to study clustering problems. These model-based clustering methods require an accurate estimation of the unknown data density by Gaussian mixtures. In Maugis and Michel (2009), a penalized maximum likelihood estimator is proposed for automatically selecting the number of mixture components. In the present paper, a collection of univariate densities whose logarithm is locally $\beta$-Hölder, with moment and tail conditions, is considered. We show that this penalized estimator is minimax adaptive to the $\beta$ regularity of such densities in the Hellinger sense.
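For orientation, a penalized maximum likelihood criterion of the kind referred to in this abstract typically takes the generic form below. This is only a sketch under assumed notation: $\mathcal{M}$ (the candidate numbers of components), $\hat{s}_m$ (the maximum likelihood estimator within the $m$-component Gaussian mixture model) and $\mathrm{pen}(m)$ are illustrative symbols, and the specific penalty used by the authors is not given in the abstract.

```latex
% Generic penalized maximum likelihood selection (notation assumed, not taken from the paper)
\hat{m} \in \operatorname*{arg\,min}_{m \in \mathcal{M}}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n} \log \hat{s}_m(X_i) + \mathrm{pen}(m) \right\},
\qquad
\hat{s} = \hat{s}_{\hat{m}} .
```

The penalty term grows with the model dimension, so the selected $\hat{m}$ trades goodness of fit against the number of mixture components.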
Non-asymptotic detection of two-component mixtures with unknown means
Béatrice Laurent, Clément Marteau, Cathy Maugis-Rabusseau
Statistics, 2013
Abstract: This work is concerned with the detection of a mixture distribution from an $\mathbb{R}$-valued sample. Given a sample $X_1,\dots, X_n$ and an even density $\phi$, our aim is to detect whether the sample distribution is $\phi(\cdot-\mu)$ for some unknown mean $\mu$, or is defined as a two-component mixture based on translations of $\phi$. First, a non-asymptotic testing procedure is proposed and we determine conditions under which the power of the test can be controlled. Second, the performance of our testing procedure is investigated in 'benchmark' asymptotic settings. A simulation study provides comparisons with classical procedures.
Multidimensional two-component Gaussian mixtures detection
Béatrice Laurent, Clément Marteau, Cathy Maugis-Rabusseau
Statistics, 2015
Abstract: Let $(X_1,\ldots,X_n)$ be a $d$-dimensional i.i.d. sample from a distribution with density $f$. The problem of detection of a two-component mixture is considered. Our aim is to decide whether $f$ is the density of a standard Gaussian random $d$-vector ($f=\phi_d$) against the alternative that $f$ is a two-component mixture: $f=(1-\varepsilon)\phi_d +\varepsilon \phi_d (\cdot-\mu)$, where $(\varepsilon,\mu)$ are unknown parameters. Optimal separation conditions on $\varepsilon$, $\mu$, $n$ and the dimension $d$ are established, under which the two hypotheses can be separated with prescribed errors. Several testing procedures are proposed and two alternative subsets are considered.
Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering
Gilles Celeux, Marie-Laure Martin-Magniette, Cathy Maugis-Rabusseau, Adrian E. Raftery
Statistics, 2013
Abstract: We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state-of-the-art model selection method. We select the method of Witten and Tibshirani (2010) as a current state-of-the-art regularization method. We compare the methods by simulation in terms of their accuracy in both classification and variable selection. In the first simulation experiment, all the variables were conditionally independent given cluster membership. We found that variable selection (of either kind) yielded substantial gains in classification accuracy when the clusters were well separated, but few gains when the clusters were close together. We found that the two variable selection methods had comparable classification accuracy, but that the model selection approach had substantially better accuracy in selecting variables. In the second simulation experiment, there were correlations among the variables given the cluster memberships. We found that the model selection approach was substantially more accurate in terms of both classification and variable selection than the regularization approach, and that both gave more accurate classifications than $K$-means without variable selection.
Event Conditional Correlation: Or How Non-Linear Linear Dependence Can Be
P-A. G. Maugis
Statistics, 2014
Abstract: Given two random variables, we study their correlation conditional on a given event, and call this parameter "Event Conditional Correlation". This parameter can be used to describe conditional dependence and to produce local linear approximations of the overall dependence. To this end, we introduce a new estimator of event conditional correlation, and a new estimator of the unconditional correlation based on a partial sample. In both cases we prove consistency and asymptotic normality, and present simulations where the proposed estimators have mean square errors close to that induced by the Cramér-Rao bound.
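To make the parameter concrete, the following is a minimal sketch of a naive plug-in estimate: the ordinary sample correlation computed only over observations for which the event occurs. It is not the estimator proposed in the paper, whose construction is not described in the abstract; the function names, the simulated data and the event `x > 1.0` are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def event_conditional_correlation(xs, ys, event):
    """Naive plug-in estimate: correlation over pairs where event(x, y) is true.

    `event` is a hypothetical predicate standing in for the conditioning event.
    """
    kept = [(x, y) for x, y in zip(xs, ys) if event(x, y)]
    if len(kept) < 2:
        raise ValueError("event occurred fewer than twice in the sample")
    kx, ky = zip(*kept)
    return pearson(kx, ky)


def simulate(n, rho, seed=0):
    """Draw n pairs from a standard bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1.0 - rho ** 2) * e)
    return xs, ys


xs, ys = simulate(20000, 0.8)
rho_all = pearson(xs, ys)
# Conditioning on the upper tail of X shrinks the spread of X, so the
# conditional correlation is lower than the unconditional one.
rho_tail = event_conditional_correlation(xs, ys, lambda x, y: x > 1.0)
print(f"unconditional: {rho_all:.3f}, conditional on X > 1: {rho_tail:.3f}")
```

Even for jointly Gaussian variables, the correlation conditional on an event can differ markedly from the unconditional one, which is why it is treated as a parameter in its own right.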
An Econometric Study of Vine Copulas
Pierre-André Maugis, Dominique Guégan
International Journal of Economics and Finance, 2010, DOI: 10.5539/ijef.v2n5p2
Abstract: We present a new recursive algorithm to construct vine copulas based on an underlying tree structure. This new structure is useful for computing multivariate distributions of dependent random variables. We prove the asymptotic normality of the vine copula parameter estimator and show that all vine copula parameter estimators have comparable variance. Both results are crucial to motivate any econometric work based on vine copulas. We provide an application of vine copulas to estimate the VaR of a portfolio, and show that they offer a significant improvement over a benchmark estimator based on a GARCH model.
Educating Students’ Privacy Decision Making through Information Ethics Curriculum
Cathy S. Lin
Creative Education (CE), 2016, DOI: 10.4236/ce.2016.71017
Abstract: Increasingly sophisticated technologies now provide powerful capabilities to obtain and exploit consumers’ private information on the Internet. Contemporary privacy protection techniques seem to fail to guard information privacy. Besides technological protections, information ethics education is described as the ideal way to increase people’s awareness. This study proposes a privacy decision making model which posits that attitudes toward privacy protection, privacy self-efficacy for protection, and privacy self-efficacy for non-acquisition are critical factors in behavioral intention. Further, a longitudinal model explores whether information ethics education plays a role in influencing students’ concepts of protecting information privacy. A survey of 111 senior-level undergraduate students in the department of Information Management was conducted to test the hypothesized model. The findings offer important insights: through information ethics education, students demonstrate significant changes in the model paths relating attitude, privacy self-efficacy for protection, and privacy self-efficacy for non-acquisition to intention. The implications for the ethics curriculum concerning information privacy are discussed.
Differentiation is death
Cathy Holding
Genome Biology, 2002, DOI: 10.1186/gb-2002-3-11-reports0058
Abstract: Several lines of evidence led Fernando et al. to investigate whether apoptosis and myoblast differentiation might initially share a common pathway. At the molecular level, caspase 3 has been linked with activation of the mitogen-activated protein (MAP) kinases (MAPKs), Jun N-terminal kinase (JNK) and p38, which are involved in the initiation and continuation of myogenesis. Certain cellular events, namely cell-membrane blebbing and fusion and actin fiber reorganization, appear to be common to both myogenesis and apoptosis. Finally, caspase-3-knockout mice appear to be quite underweight and have visibly less muscle mass than heterozygous individuals. In wild-type cell lines, raised levels of caspase 3 were detected after initiation of differentiation (which is achieved by placing actively growing cells in low-serum medium). Immunocytochemistry was used to show that the observed increase in caspase 3 was associated with differentiating myoblasts and not apoptotic cells. After initiation of differentiation in cell lines derived from caspase-3-knockout mice, there was a measurable lack of myotube formation compared with cell lines derived from heterozygous and wild-type mice, even though cell proliferation was comparable before the initiation of differentiation. Lower levels of differentiation-specific gene products, such as myogenin and hypophosphorylated MyoD, were found in these cells, while levels of cyclin D1, a marker for cellular proliferation, were higher. To rule out the possibility that apoptosis was somehow removing inhibitory cells, or that it could have some kind of cell-autonomous effect leading to the triggering of myogenesis, apoptosis was measured in both normal and knockout cell lines. Surprisingly, no difference in the degree of apoptosis was found between the two, and, therefore, the raised levels of caspase 3 in the normal cell lines did not appear to be involved entirely with apoptosis. To complement these experiments, caspase 3 was chemically inhibit
Gut response
Cathy Holding
Genome Biology, 2003, DOI: 10.1186/gb-spotlight-20030616-01
Abstract: Steidler et al. replaced the thymidylate synthase gene thyA in Lactococcus lactis with an expression cassette for the IL-10 gene, simultaneously enabling the microorganism to produce the cytokine and rendering it dependent on thymidine or thymine for survival. They reasoned that a recombination event to restore the thyA gene, if it should occur at all, would simply replace the expression cassette and return the bacterial genome to its premodification state. They demonstrated survival dependence of the organism on thymidine and thymine, its viability, and the secretion of functional IL-10 both in vitro and in vivo in pig intestine, which closely resembles the human gut. One of their strains is currently undergoing clinical trials in Holland. "The thyA-deficient bacteria cannot accumulate in the environment. Our approach thus provides a simple and robust system for biological containment," conclude the authors.
Cross-species transfer is last straw
Cathy Holding
Genome Biology, 2003, DOI: 10.1186/gb-spotlight-20031128-01
Abstract: Weigel et al. confirmed the identity as S. aureus by sequence analysis of specific genes gyrA and gyrB and rDNA and ruled out contamination with enterococci by the inability to amplify enterococcal ligases by polymerase chain reaction. Pulse field analysis confirmed the VRSA to be type USA100, the most common type in US hospitals. Minimal inhibitory concentration was determined to be 1024 μg/ml for vancomycin, and resistance to aminoglycosides, β-lactams, fluoroquinolones, macrolides, rifampin, and tetracycline showed that it had retained its MRSA phenotype. Vancomycin resistance was observed to be conferred by vanA, one of several gene clusters found in Enterococcus faecium. Analysis of plasmids from the VRSA and E. faecalis co-isolates identified two plasmids, 45 and 95 kb long, and Southern blot analysis revealed a 7.1-kb fragment containing vanA. Filter mating studies identified the resistance plasmid as conjugative. The complete sequence has been placed in GenBank. "Genetic analyses suggest that the long-anticipated transfer of vancomycin resistance to a methicillin-resistant S. aureus occurred in vivo by interspecies transfer on Tn1546 from a co-isolate of Enterococcus faecalis. The VRSA plasmid was transferable to other strains of S. aureus, reinforcing concerns of potential widespread resistance to one of the few classes of agents still active against multidrug-resistant S. aureus," conclude the authors.
