Search Results: 1 - 10 of 21905 matches for " Dae-Won Kim "
All listed articles are free for downloading (OA Articles)
Applied and Translational Genomics for Human Genetics and Clinical Science
Dae-Won Kim
Frontiers in Bioengineering and Biotechnology , 2014, DOI: 10.3389/fbioe.2014.00011
Abstract: A book review based on Applied Computational Genomics (Translational Bioinformatics Volume 1) Edited by Yin Yao Shugart Springer; 2012; ISBN: 978-94-007-5557-4; Hardcover; 184 pp.; $189.00. Translational bioinformatics is an emerging field of study that addresses the computational challenges encountered in biological and clinical research as well as in the analysis and interpretation of the data generated from it. Applied Computational Genomics, as part of the Springer Series on Translational Bioinformatics, thoroughly discusses the most relevant issues in the development of novel techniques for the integration of human genetic, biological, and clinical data. This book also provides an example of theories and research being practically applied to inform translational medical research in clinical diagnosis. This book covers numerous experimental and computational methods related to statistical development and their applications in the field of human genomics, including candidate gene mapping, linkage analysis, population-based, genome-wide association, exon sequencing, and whole genome sequencing analysis. This book consists of ten chapters. The first chapter focuses on an overview of the current human genome science. It reviews the history of using machine-learning algorithms for studies on disease prediction and provides highlights for the other nine chapters, which have been collected in this book. Chapter 2 provides a broad overview of the most important concepts in genetic epidemiology. In this chapter, the authors provide a precise definition for complex traits and a thorough introduction to genetic epidemiology as a tool for understanding the role of genetic factors. In addition, the essential study designs used to accomplish this goal, including family, twin, adoption, and migration studies, are summarized.
Chapter 3 focuses on integrated linkage analysis and its results in the design, execution, and interpretation of whole genome or whole exome sequencing studies. It includes experiments, knowledge, and specific example data. This chapter also presents new statistical algorithms to identify rare variants in pedigree settings for both qualitative and quantitative traits. Chapter 4 briefly reviews the methods used to combine functional genomic data to detect complex diseases. Chapter 4 examines the progress of research on a specific rare disorder, nasopharyngeal carcinoma (NPC). Dr. Jorgensen et al. conducted a thorough review of all candidate genes related to NPC and explained the findings of two genome-wide association studies (GWAS), one by a
Review of “Bioinformatics: A Computing Perspective” edited by Shuba Gopal, Anne Haake, Rhys Price Jones and Paul Tymann
Dae-Won Kim, Hong-Seog Park
Algorithms for Molecular Biology , 2009, DOI: 10.1186/1748-7188-4-9
Abstract: This book consists of nine chapters. The first chapter provides a history of scientific works and a definition of bioinformatics. Topics include the organization and roles of a bioinformatics team and the challenges of computational algorithms in molecular biology. Chapter 2 illustrates some overall basics of biology, including cellular organization and complexity, evolution, and how the language of cells is encoded at various steps through DNA, RNA and protein. This chapter also provides a description of the management of genetic data, including replication and verification of DNA. Chapter 3 deals with fundamental wet lab techniques, describing the principles of hybridization, cDNA synthesis, DNA sequencing methods and proteomics techniques. This chapter emphasizes the importance of development of appropriate computational approaches to working with biological data to provide an understanding and characterization of those data. Chapters 4 and 5 provide a broad overview of fragment assembly, aimed especially at answering two main questions: What is the problem of biological sequence assembly? and What are the various computational approaches used for sequence alignment? Chapter 4 addresses the issue of the nature of genome sequences. It provides the reader with a basic example and simple pattern matching and graph algorithms for solving the sample problem. Chapter 5 provides descriptions of more extensive and advanced alignment algorithms for more exact similarity detection, such as deterministic finite-state automata (DFAs), the Needleman-Wunsch algorithm, and the Smith-Waterman algorithm. This chapter also deals with evolutionary and heuristic approaches, including point accepted mutation (PAM). Chapter 6 introduces the concept of evolution and computational methods for the discovery of evolutionary relationships based on the tree of life.
This chapter offers a detailed introduction regarding how best to build phylogenies, including discussion of neighbor joining, m
Review of “Bioinformatics for vaccinology” edited by Darren R. Flower
Dae-Won Kim, Hong-Seog Park
Virology Journal , 2009, DOI: 10.1186/1743-422x-6-85
Abstract: The study of vaccines has captured the attention of the biomedical and public communities for the past hundred years, largely because vaccines play a key role in affecting lifespan and well-being, as well as economics. A positive trend in the field of vaccinology has been inspired by recent major breakthroughs in the fields of molecular biology, physiology, genomics, proteomics, and computers that promise a bright future for the prevention of infectious diseases and allergies. This book focuses on a fundamental understanding of the primary role of vaccines from a historical perspective, the immune system in molecular cell biology and computational approaches that show the link between experiment and theory. The major portion of the book is divided into seven chapters. Chapter one provides a historical background of vaccination that acts as a framework for the remainder of the book. Overall, this chapter provides a comprehensive overview of several interesting subjects based on effective and efficient efforts at developing vaccines. Chapter two focuses on the introductory need and opportunity for vaccines, followed by the role of unprecedented phenomena, such as rapid changes in economic status, climate and infectious disease. Chapter three describes how a rigorous biological perspective of the molecular immune system is a fundamental requirement for the development of computational algorithms to explain how vaccines work. It further provides a discussion of how informatics approaches can give a glimpse of vaccine discovery. Chapter four contains an introduction to '-omics' and reviews the most important biological sequence databases, such as the host databases, immunological databases, pathogen databases, and T cell and B cell databases. Chapter five describes the computational challenges in predicting T cell and B cell epitopes and in allergen discovery.
Moreover, it provides a description of computational algorithms, including artificial neural networks (ANN), h
A Package for the Automated Classification of Periodic Variable Stars
Dae-Won Kim, Coryn A. L. Bailer-Jones
Physics , 2015
Abstract: We present a machine learning package for the classification of periodic variable stars. Our package is intended to be general: it can classify any single band optical light curve comprising at least a few tens of observations covering durations from weeks to years, with arbitrary time sampling. We use light curves of periodic variable stars taken from OGLE and EROS-2 to train the model. To make our classifier relatively survey-independent, it is trained on 16 features extracted from the light curves (e.g. period, skewness, Fourier amplitude ratio). The model classifies light curves into one of seven superclasses - Delta Scuti, RR Lyrae, Cepheid, Type II Cepheid, eclipsing binary, long-period variable, non-variable - as well as subclasses of these, such as ab, c, d, and e types for RR Lyraes. When trained to give only superclasses, our model achieves 0.98 for both recall and precision as measured on an independent validation dataset (on a scale of 0 to 1). When trained to give subclasses, it achieves 0.81 for both recall and precision. In order to assess classification performance of the subclass model, we applied it to the MACHO, LINEAR, and ASAS periodic variables, which gave recall/precision of 0.92/0.98, 0.89/0.96, and 0.84/0.88, respectively. We also applied the subclass model to Hipparcos periodic variable stars of many other variability types that do not exist in our training set, in order to examine how much those types degrade the classification performance of our target classes. In addition, we investigate how the performance varies with the number of data points and duration of observations. We find that recall and precision do not vary significantly if the number of data points is larger than 80 and the duration is more than a few weeks. The classifier software of the subclass model is available from the GitHub repository (https://goo.gl/xmFO6Q).
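The feature-based approach described in this abstract can be sketched in a few lines. The abstract names skewness among the 16 features but does not give the package's definitions, so the function names and formulas below (moment-based skewness, half peak-to-peak amplitude) are assumptions for illustration only and are not taken from the package itself.

```python
def skewness(magnitudes):
    """Moment-based (biased) skewness of a list of magnitudes."""
    n = len(magnitudes)
    mean = sum(magnitudes) / n
    m2 = sum((m - mean) ** 2 for m in magnitudes) / n  # second central moment
    m3 = sum((m - mean) ** 3 for m in magnitudes) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant light curve: define skewness as zero
    return m3 / m2 ** 1.5


def amplitude(magnitudes):
    """Half of the peak-to-peak range (an assumed, simple amplitude measure)."""
    return (max(magnitudes) - min(magnitudes)) / 2.0


def extract_features(magnitudes):
    """Return a small, hypothetical feature dictionary for one light curve."""
    return {
        "skewness": skewness(magnitudes),
        "amplitude": amplitude(magnitudes),
    }
```

Features of this kind depend only on the distribution of magnitudes, not on the survey's cadence, which is one plausible reason a feature-based classifier can be made relatively survey-independent.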
Supervised detection of anomalous light-curves in massive astronomical catalogs
Isadora Nun, Karim Pichara, Pavlos Protopapas, Dae-Won Kim
Physics , 2014, DOI: 10.1088/0004-637X/793/1/23
Abstract: The development of synoptic sky surveys has led to a massive amount of data for which resources needed for analysis are beyond human capabilities. To process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new method to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all the information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered an outlier insofar as it has a low joint probability. Our method is suitable for exploring massive datasets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. We divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up for a deeper analysis.
Statistical Properties of Galactic δ Scuti Stars: Revisited
Seo-Won Chang, Pavlos Protopapas, Dae-Won Kim, Yong-Ik Byun
Physics , 2013, DOI: 10.1088/0004-6256/145/5/132
Abstract: We present statistical characteristics of 1,578 δ Scuti stars including nearby field stars and cluster member stars within the Milky Way. We obtained 46% of these stars (718 stars) from the works of Rodríguez and collected the remaining 54% (860 stars) from the literature. We updated the entries with the latest information on sky coordinates, color, rotational velocity, spectral type, period, amplitude and binarity. The majority of our sample is well characterized in terms of typical period range (0.02-0.25 days), pulsation amplitudes (<0.5 mag) and spectral types (A-F type). Given this list of δ Scuti stars, we examined relations between their physical properties (i.e., periods, amplitudes, spectral types and rotational velocities) for field stars and cluster members, and confirmed that the correlations of properties are not significantly different from those reported in Rodríguez's works. All the δ Scuti stars are cross-matched with several X-ray and UV catalogs, resulting in 27 X-ray and 41 UV-only counterparts. These counterparts are interesting targets for further study because of their rarity and uniqueness in showing δ Scuti-type variability and X-ray/UV emission at the same time. The compiled catalog can be accessed through the web interface http://stardb.yonsei.ac.kr/DeltaScuti
QSO Selection Algorithm Using Time Variability and Machine Learning: Selection of 1,620 QSO Candidates from MACHO LMC Database
Dae-Won Kim, Pavlos Protopapas, Yong-Ik Byun, Charles Alcock, Roni Khardon, Markos Trichas
Physics , 2011, DOI: 10.1088/0004-637X/735/2/68
Abstract: We present a new QSO selection algorithm using a Support Vector Machine (SVM), a supervised classification method, on a set of extracted time series features including period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars and microlensing events using 58 known QSOs, 1,629 variable stars and 4,288 non-variables from the MAssive Compact Halo Object (MACHO) database as a training set. To estimate the efficiency and the accuracy of the model, we perform a cross-validation test using the training set. The test shows that the model correctly identifies ~80% of known QSOs with a 25% false positive rate. The majority of the false positives are Be stars. We applied the trained model to the MACHO Large Magellanic Cloud (LMC) dataset, which consists of 40 million lightcurves, and found 1,620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false positive rate, we crossmatched the candidates with astronomical catalogs including the Spitzer Surveying the Agents of a Galaxy's Evolution (SAGE) LMC catalog and a few X-ray catalogs. The results further suggest that the majority of the candidates, more than 70%, are QSOs.
Assessment of stochastic and deterministic models of 6304 quasar lightcurves from SDSS Stripe 82
Rene Andrae, Dae-Won Kim, Coryn A. L. Bailer-Jones
Physics , 2013, DOI: 10.1051/0004-6361/201321335
Abstract: The optical light curves of many quasars show variations of tenths of a magnitude or more on time scales of months to years. This variation often cannot be described well by a simple deterministic model. We perform a Bayesian comparison of over 20 deterministic and stochastic models on 6304 QSO light curves in SDSS Stripe 82. We include the damped random walk (or Ornstein-Uhlenbeck [OU] process), a particular type of stochastic model which recent studies have focused on. Further models we consider are single and double sinusoids, multiple OU processes, higher order continuous autoregressive processes, and composite models. We find that only 29 out of 6304 QSO lightcurves are described significantly better by a deterministic model than a stochastic one. The OU process is an adequate description of the vast majority of cases (6023). Indeed, the OU process is the best single model for 3462 light curves, with the composite OU process/sinusoid model being the best in 1706 cases. The latter model is the dominant one for brighter/bluer QSOs. Furthermore, a non-negligible fraction of QSO lightcurves show evidence that not only the mean is stochastic but the variance is stochastic, too. Our results confirm earlier work that QSO light curves can be described with a stochastic model, but place this on a firmer footing, and further show that the OU process is preferred over several other stochastic and deterministic models. Of course, there may well exist yet better (deterministic or stochastic) models which have not been considered here.
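To make the damped random walk concrete, here is a minimal simulation sketch of an Ornstein-Uhlenbeck process sampled at arbitrary times. It uses the standard exact update for an OU process: the next value is Gaussian, with a mean that decays toward the long-term mean and a variance that grows toward the stationary variance. The function name, the parameter names (`mean`, `tau`, `sigma`) and the choice to draw the first point from the stationary distribution are assumptions made for this illustration; the paper's own parameterization and fitting procedure are not given in the abstract.

```python
import math
import random


def simulate_ou(times, mean, tau, sigma, seed=0):
    """Simulate an OU process (damped random walk) at the given sorted times.

    mean  : long-term mean of the process
    tau   : relaxation time scale (same units as `times`)
    sigma : stationary standard deviation of the process
    """
    rng = random.Random(seed)
    # Assumption: start from the stationary distribution N(mean, sigma^2).
    values = [rng.gauss(mean, sigma)]
    for t_prev, t_next in zip(times, times[1:]):
        decay = math.exp(-(t_next - t_prev) / tau)
        # Conditional mean relaxes toward the long-term mean.
        cond_mean = mean + (values[-1] - mean) * decay
        # Conditional spread approaches sigma for gaps much longer than tau.
        cond_std = sigma * math.sqrt(1.0 - decay ** 2)
        values.append(rng.gauss(cond_mean, cond_std))
    return values
```

Because the update depends only on the time gap between consecutive observations, the same function handles irregular sampling, for example `simulate_ou([0.0, 1.0, 2.5, 7.0], mean=19.0, tau=100.0, sigma=0.2)`.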
An improved quasar detection method in EROS-2 and MACHO LMC datasets
Karim Pichara, Pavlos Protopapas, Dae-Won Kim, Jean-Baptiste Marquette, Patrick Tisserand
Physics , 2013, DOI: 10.1111/j.1365-2966.2012.22061.x
Abstract: We present a new classification method for quasar identification in the EROS-2 and MACHO datasets based on a boosted version of the Random Forest classifier. We use a set of variability features including parameters of a continuous autoregressive model. We prove that continuous autoregressive parameters are very important discriminators in the classification process. We create two training sets (one for EROS-2 and one for MACHO datasets) using known quasars found in the LMC. Our model achieves about 90% precision and 86% recall on both the EROS-2 and MACHO training sets, improving on the accuracy of state-of-the-art models in quasar detection. We apply the model to the complete EROS-2 and MACHO LMC datasets, which include 28 million objects, finding 1160 and 2551 candidates respectively. To further validate our list of candidates, we crossmatched it with 663 previously known strong candidates, obtaining 74% of matches for MACHO and 40% for EROS-2. The difference in matching level arises mainly because EROS-2 is a slightly shallower survey, which translates to significantly lower signal-to-noise ratio lightcurves.
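The precision and recall figures quoted in these abstracts follow the usual definitions: precision is the fraction of predicted quasars that are true quasars, and recall is the fraction of true quasars that are recovered. A minimal sketch of that calculation is given below; the function name and the `"QSO"`/`"star"` label strings are hypothetical and chosen only for this example.

```python
def precision_recall(true_labels, predicted_labels, positive="QSO"):
    """Compute (precision, recall) for one class from paired label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no predictions or no true positives).
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall
```

For example, with true labels `["QSO", "QSO", "star", "star", "QSO"]` and predictions `["QSO", "star", "QSO", "star", "QSO"]`, two of the three predicted quasars are real and two of the three real quasars are found, so both precision and recall are 2/3.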
Detecting Variability in Massive Astronomical Time-Series Data II: Variable Candidates in the Northern Sky Variability Survey
Min-Su Shin, Hahn Yi, Dae-Won Kim, Seo-Won Chang, Yong-Ik Byun
Physics , 2011, DOI: 10.1088/0004-6256/143/3/65
Abstract: We present variability analysis of data from the Northern Sky Variability Survey (NSVS). Using a clustering method that defines variable candidates as outliers from large clusters, we cluster 16,189,040 light curves, each having data points at more than 15 epochs, into variable and non-variable candidates in 638 NSVS fields. Variable candidates are selected depending on how strongly they are separated from the largest cluster and how rarely they are grouped together in the eight-dimensional space spanned by variability indices. All NSVS light curves are also cross-correlated with the Infrared Astronomical Satellite, AKARI, Two Micron All Sky Survey, Sloan Digital Sky Survey (SDSS), and Galaxy Evolution Explorer objects as well as known objects in the SIMBAD database. The variability analysis and cross-correlation results are provided in a public online database which can be used to select interesting objects for further investigation. Adopting conservative selection criteria for variable candidates, we find about 1.8 million light curves as possible variable candidates in the NSVS data, corresponding to about 10% of our entire NSVS sample. Multi-wavelength colors help us find specific types of variability among the variable candidates. Moreover, we also use morphological classification from other surveys such as SDSS to suppress spurious cases caused by blended objects or extended sources due to the low angular resolution of the NSVS.