Search Results: 1 - 10 of 100 matches
All listed articles are free for downloading (OA Articles)
Scatter Component Analysis: A Unified Framework for Domain Adaptation and Domain Generalization  [PDF]
Muhammad Ghifary, David Balduzzi, W. Bastiaan Kleijn, Mengjie Zhang
Computer Science , 2015,
Abstract: This paper addresses classification tasks on a particular target domain in which labeled training data are only available from source domains different from (but related to) the target. Two closely related frameworks, domain adaptation and domain generalization, are concerned with such tasks, where the only difference between them is the availability of unlabeled target data: domain adaptation can leverage unlabeled target information, while domain generalization cannot. We propose Scatter Component Analysis (SCA), a fast representation learning algorithm that can be applied to both domain adaptation and domain generalization. SCA is based on a simple geometrical measure, i.e., scatter, which operates on a reproducing kernel Hilbert space. SCA finds a representation that trades off maximizing the separability of classes, minimizing the mismatch between domains, and maximizing the separability of the data, each of which is quantified through scatter. The optimization problem of SCA reduces to a generalized eigenvalue problem, which yields a fast and exact solution. Comprehensive experiments on benchmark cross-domain object recognition datasets verify that SCA runs much faster than several state-of-the-art algorithms while also providing state-of-the-art classification accuracy in both domain adaptation and domain generalization. We also show that scatter can be used to establish a theoretical generalization bound in the case of domain adaptation.
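The reduction to a generalized eigenvalue problem is what makes SCA fast. The abstract does not give the concrete scatter matrices, so the sketch below only illustrates the computational core with stand-in matrices A (a scatter to maximize) and B (a regularized scatter to minimize), using SciPy's generalized symmetric eigensolver:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Hypothetical stand-ins: in SCA these would combine class, domain
# and total scatter terms computed in an RKHS.
A = X.T @ X / len(X)                # scatter to maximize
B = A + 0.1 * np.eye(5)             # scatter to minimize, regularized

# Generalized eigenvalue problem A v = lambda B v; eigh returns
# eigenvalues in ascending order, so the best directions come last.
vals, vecs = eigh(A, B)
W = vecs[:, -2:]                    # top-2 projection directions
Z = X @ W                           # learned representation
print(Z.shape)
```

Solving the generalized eigenproblem directly (rather than iterating) is what gives an exact, one-shot solution.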
A Probabilistic Framework for Structural Analysis in Directed Networks  [PDF]
Cheng-Shang Chang, Duan-Shin Lee, Li-Heng Liou, Sheng-Min Lu, Mu-Huan Wu
Computer Science , 2015,
Abstract: In our recent works, we developed a probabilistic framework for structural analysis in undirected networks. The key idea of that framework is to sample a network by a symmetric bivariate distribution and then use that distribution to formally define various notions, including centrality, relative centrality, community, and modularity. The main objective of this paper is to extend the probabilistic framework to directed networks, where the sampling bivariate distributions could be asymmetric. Our main finding is that we can relax the assumption from symmetric bivariate distributions to bivariate distributions that have the same marginal distributions. Under this weaker assumption, we show that various notions for structural analysis in directed networks can be defined in the same manner as before. However, since the bivariate distribution could be asymmetric, the community detection algorithms proposed in our previous work cannot be directly applied. To address this, we show that one can construct another sampled graph with a symmetric bivariate distribution so that, for any partition of the network, the modularity index remains the same as that of the original sampled graph. Based on this, we propose a hierarchical agglomerative algorithm that returns a partition of communities when it converges.
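The modularity construction in this framework can be sketched concretely. Assuming edges of a small hypothetical directed graph are sampled uniformly (one possible bivariate distribution; the paper's general definition is not reproduced here), modularity of a partition is the probability mass inside communities minus the product of the marginals:

```python
import numpy as np

# Hypothetical directed graph as an edge list; sampling edges uniformly
# induces a bivariate distribution P(u, v) over ordered endpoints.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 3), (2, 3)]
n = 5
P = np.zeros((n, n))
for u, v in edges:
    P[u, v] += 1.0 / len(edges)

row, col = P.sum(axis=1), P.sum(axis=0)   # marginal distributions

def modularity(partition):
    """Q = sum over communities of P(both endpoints in the community)
    minus the product of the marginal community probabilities."""
    q = 0.0
    for c in set(partition):
        members = [i for i in range(n) if partition[i] == c]
        q += P[np.ix_(members, members)].sum() - row[members].sum() * col[members].sum()
    return q

print(round(modularity([0, 0, 0, 1, 1]), 3))
```

With asymmetric P (a directed graph), the two marginals differ, which is exactly why the same-marginals assumption matters.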
Probabilistic principal component analysis for metabolomic data
Gift Nyamundanda, Lorraine Brennan, Isobel Gormley
BMC Bioinformatics , 2010, DOI: 10.1186/1471-2105-11-571
Abstract: Here, probabilistic principal component analysis (PPCA), which addresses some of the limitations of PCA, is reviewed and extended. A novel extension of PPCA, called probabilistic principal component and covariates analysis (PPCCA), is introduced which provides a flexible approach to jointly model metabolomic data and additional covariate information. The use of a mixture of PPCA models for discovering the number of inherent groups in metabolomic data is demonstrated. The jackknife technique is employed to construct confidence intervals for estimated model parameters throughout. The optimal number of principal components is determined through the use of the Bayesian Information Criterion model selection tool, which is modified to address the high dimensionality of the data. The methods presented are illustrated through an application to metabolomic data sets. Jointly modeling metabolomic data and covariates was successfully achieved and has the potential to provide deeper insight into the underlying data structure. Examination of confidence intervals for the model parameters, such as loadings, allows for principled and clear interpretation of the underlying data structure. A software package called MetabolAnalyze, freely available through the R statistical software, has been developed to facilitate implementation of the presented methods in the metabolomics field. Metabolomics is the term used to describe the study of small molecules, or metabolites, present in biological samples. Examples of such metabolites include lipids, amino acids, bile acids, and keto-acids. Studies of the concentration levels of these molecules in biological samples aim to enhance understanding of the effect of a particular stimulus or treatment [1-3]. The most commonly applied analytical technologies in metabolomic studies are nuclear magnetic resonance (NMR) spectroscopy [4] and mass spectrometry (MS) [5].
With respect to NMR-based metabolomics, the data are usually in the form of spectra, which are binned ...
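The PPCCA extension is not spelled out in the abstract; as background, plain PPCA has a closed-form maximum likelihood solution (Tipping and Bishop): loadings come from the top eigenvectors of the sample covariance, and the noise variance is the mean of the discarded eigenvalues. A minimal NumPy sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 500, 10, 2                  # samples, variables, latent dimension

# Simulate from a PPCA model: x = W z + noise, noise sd 0.5
W_true = rng.normal(size=(p, q))
Z = rng.normal(size=(n, q))
X = Z @ W_true.T + 0.5 * rng.normal(size=(n, p))

# Closed-form ML estimates from the sample covariance
S = np.cov(X, rowvar=False)
evals, evecs = np.linalg.eigh(S)      # ascending order
evals, evecs = evals[::-1], evecs[:, ::-1]

sigma2 = evals[q:].mean()             # noise variance: mean of discarded eigenvalues
W = evecs[:, :q] * np.sqrt(evals[:q] - sigma2)   # ML loadings (up to rotation)
print(W.shape, round(float(sigma2), 3))
```

PPCCA adds covariates to this generative model; the closed-form structure above is the starting point it builds on.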
A stochastic algorithm for probabilistic independent component analysis  [PDF]
Stéphanie Allassonnière, Laurent Younes
Statistics , 2012, DOI: 10.1214/11-AOAS499
Abstract: The decomposition of a sample of images on a relevant subspace is a recurrent problem in many different fields from Computer Vision to medical image analysis. We propose in this paper a new learning principle and implementation of the generative decomposition model generally known as noisy ICA (for independent component analysis) based on the SAEM algorithm, which is a versatile stochastic approximation of the standard EM algorithm. We demonstrate the applicability of the method on a large range of decomposition models and illustrate the developments with experimental results on various data sets.
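The SAEM-based estimation scheme itself is not reproduced here; purely for orientation, the standard (non-probabilistic) FastICA algorithm from scikit-learn recovers independent sources from noisy mixtures in the same problem setting:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
n = 2000
# Two independent, non-Gaussian sources
S = np.c_[np.sign(rng.normal(size=n)), rng.laplace(size=n)]
A = np.array([[1.0, 0.5], [0.5, 1.0]])           # mixing matrix
X = S @ A.T + 0.05 * rng.normal(size=(n, 2))     # noisy observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)                     # estimated sources
print(S_hat.shape)
```

Noisy ICA, as addressed in the paper, models the additive noise explicitly rather than treating it as a perturbation, which is where the SAEM approximation of EM comes in.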
A General Framework For Consistency of Principal Component Analysis  [PDF]
Dan Shen, Haipeng Shen, J. S. Marron
Statistics , 2012,
Abstract: A general asymptotic framework is developed for studying consistency properties of principal component analysis (PCA). Our framework includes several previously studied domains of asymptotics as special cases and allows one to investigate interesting connections and transitions among the various domains. More importantly, it enables us to investigate asymptotic scenarios that have not been considered before, and to gain new insights into the consistency, subspace consistency and strong inconsistency regions of PCA and the boundaries among them. We also establish the corresponding convergence rate within each region. Under general spike covariance models, the dimension (or the number of variables) discourages the consistency of PCA, while the sample size and spike information (the relative size of the population eigenvalues) encourage PCA consistency. Our framework nicely illustrates the relationship among these three types of information in terms of dimension, sample size and spike size, and rigorously characterizes how their relationships affect PCA consistency.
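A quick simulation illustrates the dimension-versus-sample-size tension the abstract describes. Under a hypothetical single-spike covariance model, the sample leading eigenvector aligns well with the truth when p is much smaller than n but degrades when p greatly exceeds n (all parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def spike_alignment(n, p, spike=5.0):
    """|cos| of the angle between the true and sample leading
    eigenvectors under a single-spike covariance model."""
    X = rng.normal(size=(n, p))
    X[:, 0] *= np.sqrt(1.0 + spike)        # variance 1 + spike along e1
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return abs(Vt[0, 0])                   # alignment with the true direction e1

low_dim = spike_alignment(n=200, p=10)     # p << n: near-perfect alignment
high_dim = spike_alignment(n=200, p=2000)  # p >> n: degraded alignment
print(round(low_dim, 2), round(high_dim, 2))
```

The degradation in the second call, at fixed sample size and spike strength, is the inconsistency phenomenon whose regions and boundaries the paper characterizes.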
Class-Conditional Probabilistic Principal Component Analysis: Application to Gender Recognition
Juan Bekios-Calfa, José M. Buenaposada, Luis Baumela
Computación y Sistemas , 2011,
Abstract: This paper presents a solution to the problem of recognizing the gender of a human face from an image. We adopt a holistic approach by using the cropped and normalized texture of the face as input to a naïve Bayes classifier. First, the class-conditional probabilistic principal component analysis (CC-PPCA) technique is introduced to reduce the dimensionality of the classification attribute vector and enforce the independence assumption of the classifier. This new approach has the desirable property of a simple parametric model for the marginals. Moreover, this model can be estimated with very few data. The experiments conducted show that CC-PPCA achieves 90% classification accuracy, which is similar to the best results in the literature. The proposed method is very simple to train and implement.
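CC-PPCA fits one probabilistic model per class; that exact construction is not reproduced here. A rough stand-in, using plain PCA for dimensionality reduction followed by a Gaussian naïve Bayes classifier on synthetic two-class data, shows the overall pipeline shape:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(6)
# Hypothetical stand-in for face-texture vectors: two classes whose
# means differ in a 100-dimensional feature space
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 100)),
               rng.normal(0.4, 1.0, size=(200, 100))])
y = np.array([0] * 200 + [1] * 200)

# Plain PCA stands in for the class-conditional PPCA reduction step
Z = PCA(n_components=10).fit_transform(X)
acc = GaussianNB().fit(Z, y).score(Z, y)        # training accuracy only
print(round(acc, 2))
```

Fitting the reduction per class, as CC-PPCA does, makes the per-dimension independence assumption of naïve Bayes a better fit to the projected features.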
PUMA: A Unified Framework for Penalized Multiple Regression Analysis of GWAS Data  [PDF]
Gabriel E. Hoffman, Benjamin A. Logsdon, Jason G. Mezey
PLOS Computational Biology , 2013, DOI: 10.1371/journal.pcbi.1003101
Abstract: Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests, and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods, including computational speed, poor performance on genome-scale simulated data, and identification of too many associations for real data to be biologically plausible. The framework includes a new minorize-maximization (MM) algorithm for generalized linear models (GLM) combined with heuristic model selection and testing methods for identification of robust associations. The PUMA framework implements the penalized maximum likelihood penalties previously proposed for GWAS analysis (i.e. Lasso, Adaptive Lasso, NEG, MCP), as well as a penalty that has not been previously applied to GWAS (i.e. LOG). Using simulations that closely mirror real GWAS data, we show that our framework has high performance and reliably increases power to detect weak associations, while existing PMR methods can perform worse than single marker testing in overall performance. To demonstrate the empirical value of PUMA, we analyzed GWAS data for type 1 diabetes, Crohn's disease, and rheumatoid arthritis, three autoimmune diseases from the original Wellcome Trust Case Control Consortium.
Our analysis replicates known associations for these diseases and we discover novel etiologically relevant susceptibility loci that are invisible to standard single marker tests, including six novel associations implicating genes involved in pancreatic function, insulin pathways and immune-cell function in type 1 diabetes; three novel associations implicating genes in pro- and anti-inflammatory pathways in Crohn's disease; and one novel association implicating a gene involved in apoptosis pathways in rheumatoid arthritis. We provide software for applying our PUMA analysis framework.
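PUMA's MM algorithm and heuristic model selection are not reproduced here. As a minimal illustration of the underlying idea only (penalized multiple regression selecting a few truly associated predictors among many candidate markers), a plain Lasso on simulated data:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, p = 200, 1000                       # far more markers than samples
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0                         # only 5 truly associated markers
y = X @ beta + rng.normal(size=n)

model = Lasso(alpha=0.1, max_iter=5000).fit(X, y)
selected = np.flatnonzero(model.coef_)  # markers with nonzero coefficients
print(len(selected))
```

Controlling how many spurious markers survive selection, which plain Lasso handles poorly at genome scale, is precisely the problem PUMA's heuristic testing layer addresses.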
On estimation of the noise variance in high-dimensional probabilistic principal component analysis  [PDF]
Damien Passemier, Zhaoyuan Li, Jian-Feng Yao
Statistics , 2013,
Abstract: In this paper, we develop new statistical theory for probabilistic principal component analysis models in high dimensions. The focus is the estimation of the noise variance, which is an important and unresolved issue when the number of variables is large in comparison with the sample size. We first unveil the reasons for a widely observed downward bias of the maximum likelihood estimator of the variance when the data dimension is high. We then propose a bias-corrected estimator using random matrix theory and establish its asymptotic normality. The superiority of the new (bias-corrected) estimator over existing alternatives is first checked by Monte-Carlo experiments with various combinations of $(p, n)$ (dimension and sample size). In order to demonstrate further potential benefits from the results of the paper to probabilistic PCA analysis in general, we provide evidence of net improvements in two popular procedures (Ulfarsson and Solo, 2008; Bai and Ng, 2002) for determining the number of principal components when the respective variance estimator proposed by these authors is replaced by the bias-corrected estimator. The new estimator is also used to derive new asymptotics for the related goodness-of-fit statistic under the high-dimensional scheme.
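The downward bias described here is easy to reproduce by simulation. The ML noise-variance estimator in PPCA is the mean of the discarded sample-covariance eigenvalues; when the dimension is comparable to the sample size it underestimates the true value (1.0 below). Parameter choices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

def ml_sigma2(n, p, q, spike=10.0, reps=100):
    """Average ML noise-variance estimate in a PPCA model with true
    noise variance 1 and q spiked coordinates of variance 1 + spike."""
    est = []
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        X[:, :q] *= np.sqrt(1.0 + spike)
        evals = np.linalg.eigvalsh(np.cov(X, rowvar=False))  # ascending
        est.append(evals[: p - q].mean())   # mean of the discarded eigenvalues
    return float(np.mean(est))

low_dim = ml_sigma2(n=2000, p=10, q=2)      # p << n: nearly unbiased
high_dim = ml_sigma2(n=50, p=40, q=5)       # p comparable to n: biased downward
print(round(low_dim, 3), round(high_dim, 3))
```

Intuitively, the top sample eigenvalues absorb more than their population share when p/n is not small, so the remaining eigenvalues, and hence their mean, are pulled down; the paper's correction quantifies and removes this effect.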
Multilayer Network of Language: a Unified Framework for Structural Analysis of Linguistic Subsystems  [PDF]
Domagoj Margan, Ana Meštrović, Sanda Martinčić-Ipšić
Computer Science , 2015,
Abstract: Recently, the focus of complex networks research has shifted from the analysis of isolated properties of a system toward a more realistic modeling of multiple phenomena: multilayer networks. Motivated by the success of the multilayer approach in social, transport and trade systems, we propose multilayer networks for language. The multilayer network of language is a unified framework for modeling linguistic subsystems and their structural properties, enabling the exploration of their mutual interactions. Various aspects of natural language systems can be represented as complex networks whose vertices depict linguistic units and whose links model their relations. The multilayer network of language is defined by three aspects: the network construction principle, the linguistic subsystem and the language of interest. More precisely, we construct word-level (syntax, co-occurrence and its shuffled counterpart) and subword-level (syllable and grapheme) network layers from five variations of an original text (in the modeled language). The obtained results suggest that there are substantial differences between the network structures of different language subsystems, which are hidden during the exploration of an isolated layer. The word-level layers share structural properties regardless of the language (e.g. Croatian or English), while the syllabic subword-level layer exhibits more language-dependent structural properties. The preserved weighted overlap quantifies the similarity of word-level layers in weighted and directed networks. Moreover, the analysis of motifs reveals a close topological structure of the syntactic and syllabic layers for both languages. The findings corroborate that the multilayer network framework is a powerful, consistent and systematic approach to modeling several linguistic subsystems simultaneously, and hence provides a more unified view of language.
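The layer-construction principle can be sketched without any graph library: each layer links adjacent units, with edge weights given by frequency. A toy word-level co-occurrence layer and a grapheme (subword-level) layer over the same hypothetical sentence:

```python
from collections import defaultdict

text = "the quick brown fox jumps over the lazy dog".split()

# Word-level co-occurrence layer: link adjacent words, weight = frequency
cooccur = defaultdict(int)
for a, b in zip(text, text[1:]):
    cooccur[(a, b)] += 1

# Grapheme (subword-level) layer over the same text: link adjacent characters
graphemes = defaultdict(int)
for word in text:
    for a, b in zip(word, word[1:]):
        graphemes[(a, b)] += 1

print(len(cooccur), len(graphemes))
```

Building every layer from the same underlying text, as here, is what makes cross-layer comparisons (overlap, motifs) meaningful in the multilayer framework.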
A Tutorial on Principal Component Analysis with the Accord.NET Framework  [PDF]
César Roberto de Souza
Computer Science , 2012,
Abstract: This document aims to clarify frequent questions on using the Accord.NET Framework to perform statistical analyses. Here, we reproduce all steps of the well-known Lindsay tutorial on Principal Component Analysis, in an attempt to give the reader a complete hands-on overview of the framework's basics, while also discussing some of the results and sources of divergence between the results generated by Accord.NET and by other software packages.
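The tutorial's PCA steps (center, compute the covariance, eigendecompose, project) translate directly to NumPy rather than Accord.NET's C# API; the 2-D data below are the commonly cited example values from Lindsay Smith's tutorial, reproduced from memory and best treated as illustrative:

```python
import numpy as np

# 2-D example data (treat as illustrative)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

Xc = X - X.mean(axis=0)                    # 1. center the data
C = np.cov(Xc, rowvar=False)               # 2. covariance matrix (n-1 divisor)
evals, evecs = np.linalg.eigh(C)           # 3. eigendecomposition (ascending)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
scores = Xc @ evecs                        # 4. project onto the components
print(evals.round(3))
```

Differences between packages usually come down to step 2 (n versus n-1 divisor) and sign conventions on the eigenvectors, which is a common source of the divergences the tutorial paper discusses.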

Copyright © 2008-2017 Open Access Library. All rights reserved.