All listed articles are free for downloading (OA Articles)
A Multivariate Student’s t-Distribution
Daniel T. Cassidy
Open Journal of Statistics (OJS), 2016, DOI: 10.4236/ojs.2016.63040
Abstract: A multivariate Student’s t-distribution is derived by analogy to the derivation of a multivariate normal (Gaussian) probability density function. This multivariate Student’s t-distribution can have different shape parameters for the marginal probability density functions of the multivariate distribution. Expressions for the probability density function, for the variances, and for the covariances of the multivariate t-distribution with arbitrary shape parameters for the marginals are given.
Analyzing Latent Topics in Student Confessions Communities on Facebook
Soubhik Barari
Computer Science, 2015
Abstract: In recent years, confessions pages have grown popular on social media sites such as Facebook and Twitter, particularly within college communities. Such pages allow users to anonymously submit confessions related to the collegiate experience, which are subsequently broadcast publicly. Because of the anonymous nature of disclosure, we believe that confessions pages are novel data sources from which to discover trends and issues in a collegiate community. Aggregating a dataset of more than 20,000 posts from one such page, we analyze natural language characteristics of the originating community with LDA, pointwise mutual information and sentiment analysis. Using a Markov topic model, we examine the latent topics in our corpus and find that loneliness is a highly regular pattern. Our findings on student confession communities support contemporary sociological theories contextualizing student loneliness in the framework of social networks.
Student Motivation in STEM Careers at Three Northwest Universities of Mexico
María Amparo Oliveros, Lidia Esther Vargas, Benjamín Valdez, Eduardo Cabrera, Miguel Schoor, José Luis Arcos
Creative Education (CE), 2016, DOI: 10.4236/ce.2016.718262
Abstract: Mexico hosts a large number of modern firms, notably in the aerospace, automobile, and food and beverage sectors, which employ highly skilled and well-educated workers. Graduates from Science, Technology, Engineering, and Mathematics (STEM) fields are therefore both in high demand in the labor market and among the most highly paid. Even so, 30.9% of Mexican employers report having faced difficulties finding people with the necessary skills to fill vacancies in STEM areas. Three universities in the northwest region of Mexico formed a STEM network aiming to promote enrollment, retention and gender equality in STEM careers. An instrument based on the ROSE-Q (“Relevance of Science Education”) questionnaire made it possible to gather information for measuring relevant indicators to support the design of actions and strategies. The project was carried out with funds granted in 2016 by the National Council on Science and Technology (CONACYT). The main indicators affecting students' choice of a STEM career concern cultural training, youth identity, and gender equity.
Spectral Methods for Learning Multivariate Latent Tree Structure
Animashree Anandkumar, Kamalika Chaudhuri, Daniel Hsu, Sham M. Kakade, Le Song, Tong Zhang
Computer Science, 2011
Abstract: This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics.
Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data
Yarin Gal, Yutian Chen, Zoubin Ghahramani
Statistics, 2015
Abstract: Multivariate categorical data occur in many applications of machine learning. One of the main difficulties with these vectors of categorical variables is sparsity. The number of possible observations grows exponentially with vector length, but dataset diversity might be poor in comparison. Recent models have gained significant improvement in supervised tasks with this data. These models embed observations in a continuous space to capture similarities between them. Building on these ideas we propose a Bayesian model for the unsupervised task of distribution estimation of multivariate categorical data. We model vectors of categorical variables as generated from a non-linear transformation of a continuous latent space. Non-linearity captures multi-modality in the distribution. The continuous representation addresses sparsity. Our model ties together many existing models, linking the linear categorical latent Gaussian model, the Gaussian process latent variable model, and Gaussian process classification. We derive inference for our model based on recent developments in sampling based variational inference. We show empirically that the model outperforms its linear and discrete counterparts in imputation tasks of sparse data.
Assessing phenotypic correlation through the multivariate phylogenetic latent liability model
Gabriela B. Cybis, Janet S. Sinsheimer, Trevor Bedford, Alison E. Mather, Philippe Lemey, Marc A. Suchard
Quantitative Biology, 2014, DOI: 10.1214/15-AOAS821
Abstract: Understanding which phenotypic traits are consistently correlated throughout evolution is a highly pertinent problem in modern evolutionary biology. Here, we propose a multivariate phylogenetic latent liability model for assessing the correlation between multiple types of data, while simultaneously controlling for their unknown shared evolutionary history informed through molecular sequences. The latent formulation enables us to consider in a single model combinations of continuous traits, discrete binary traits and discrete traits with multiple ordered and unordered states. Previous approaches have entertained a single data type generally along a fixed history, precluding estimation of correlation between traits and ignoring uncertainty in the history. We implement our model in a Bayesian phylogenetic framework, and discuss inference techniques for hypothesis testing. Finally, we showcase the method through applications to columbine flower morphology, antibiotic resistance in Salmonella and epitope evolution in influenza.
Joint modelling of repeated multivariate cognitive measures and competing risks of dementia and death: a latent process and latent class approach
Cécile Proust-Lima, Jean-François Dartigues, Hélène Jacqmin-Gadda
Statistics, 2014
Abstract: Joint models initially dedicated to a single longitudinal marker and a single time-to-event need to be extended to account for the rich longitudinal data of cohort studies. Multiple causes of clinical progression are indeed usually observed, and multiple longitudinal markers are collected when the true latent trait of interest is hard to capture (e.g. quality of life, functional dependency, cognitive level). These multivariate and longitudinal data also usually have nonstandard distributions (discrete, asymmetric, bounded, ...). We propose a joint model based on a latent process and latent classes to analyze simultaneously such multiple longitudinal markers of different natures, and multiple causes of progression. A latent process model describes the latent trait of interest and links it to the observed longitudinal outcomes using flexible measurement models adapted to different types of data, and a latent class structure links the longitudinal and the cause-specific survival models. The joint model is estimated in the maximum likelihood framework. A score test is developed to evaluate the assumption of conditional independence of the longitudinal markers and each cause of progression given the latent classes. In addition, individual dynamic cumulative incidences of each cause of progression based on the repeated marker data are derived. The methodology is validated in a simulation study and applied to real data on cognitive aging from a large population-based study. The aim is to predict the risk of dementia, accounting for the competing risk of death, according to the profiles of semantic memory measured by two asymmetric psychometric tests.
Exploiting TIMSS and PIRLS combined data: multivariate multilevel modelling of student achievement
Leonardo Grilli, Fulvia Pennoni, Carla Rampichini, Isabella Romeo
Statistics, 2014
Abstract: We exploit a multivariate multilevel model for the analysis of the Italian sample of the TIMSS & PIRLS 2011 Combined International Database on fourth grade students. The multivariate approach jointly considers educational achievement on Reading, Mathematics and Science, thus allowing us to test for differential associations of the covariates with the three outcomes, and to estimate the residual correlations between pairs of outcomes at student and class levels. Multilevel modelling allows us to disentangle student and contextual factors affecting achievement. We also account for territorial differences in wealth by means of an index from an external source. The model residuals point out classes with high or low performance. As educational achievement is measured by plausible values, the estimates are obtained through multiple imputation formulas. The results, while confirming the role of traditional student and contextual factors, reveal interesting patterns of achievement in Italian primary schools.
Multilevel latent Gaussian process model for mixed discrete and continuous multivariate response data
Erin M. Schliep, Jennifer A. Hoeting
Statistics, 2012, DOI: 10.1007/s13253-013-0136
Abstract: We propose a Bayesian model for mixed ordinal and continuous multivariate data to evaluate a latent spatial Gaussian process. Our proposed model can be used in many contexts where mixed continuous and discrete multivariate responses are observed in an effort to quantify an unobservable continuous measurement. In our example, the latent, or unobservable measurement is wetland condition. While predicted values of the latent wetland condition variable produced by the model at each location do not hold any intrinsic value, the relative magnitudes of the wetland condition values are of interest. In addition, by including point-referenced covariates in the model, we are able to make predictions at new locations for both the latent random variable and the multivariate response. Lastly, the model produces ranks of the multivariate responses in relation to the unobserved latent random field. This is an important result as it allows us to determine which response variables are most closely correlated with the latent variable. Our approach offers an alternative to traditional indices based on best professional judgment that are frequently used in ecology. We apply our model to assess wetland condition in the North Platte and Rio Grande River Basins in Colorado. The model facilitates a comparison of wetland condition at multiple locations and ranks the importance of in-field measurements.
Efficient inference about the tail weight in multivariate Student $t$ distributions
Christophe Ley, Anouk Neven
Statistics, 2013
Abstract: We propose a new testing procedure for the tail weight parameter of multivariate Student $t$ distributions, based on the Le Cam methodology. Our test is asymptotically as efficient as the classical likelihood ratio test, but outperforms the latter in flexibility and simplicity: indeed, our approach allows the location and scatter nuisance parameters to be estimated by any root-$n$ consistent estimators, thereby avoiding numerically complex maximum likelihood estimation. The finite-sample properties of our test are analyzed in a Monte Carlo simulation study, and we apply our method to a financial data set. We conclude the paper by indicating how to use this framework for efficient point estimation.
Copyright © 2008-2017 Open Access Library. All rights reserved.