Search Results: 1 - 6 of 6 matches for "Kyungsook An"
All listed articles are free for downloading (OA Articles)
Prediction of RNA-binding amino acids from protein and RNA sequences
Choi Sungwook, Han Kyungsook
BMC Bioinformatics, 2011, DOI: 10.1186/1471-2105-12-s13-s7
Abstract: Background: Many learning approaches to predicting RNA-binding residues in a protein sequence construct a non-redundant training dataset based on sequence similarity. The sequence similarity-based method either keeps a whole sequence in the training dataset or discards it. However, similar or even identical sequences can have different interaction sites depending on their interaction partners, and this information is lost when the sequences are removed. Furthermore, a training dataset constructed by the sequence similarity-based method may contain redundant data when a remaining sequence contains similar subsequences within itself. In addition to the problem with the training dataset, most approaches do not consider the interacting partner (i.e., RNA) of a protein when they predict RNA-binding amino acids. Thus, they always predict the same RNA-binding sites for a given protein sequence even if the protein binds to different RNA molecules. Results: We developed a feature vector-based method that removes data redundancy for a non-redundant training dataset. The feature vector-based method constructed a larger training dataset than the standard sequence similarity-based method, yet the dataset contained no redundant data. We identified effective features of protein and RNA (the interaction propensity of amino acid triplets, global features of the protein sequence, and RNA feature) for predicting RNA-binding residues. Using the method and features, we built a support vector machine (SVM) model that predicted RNA-binding residues in a protein sequence. Our SVM model showed an accuracy of 84.2%, an F-measure of 76.1%, and a correlation coefficient of 0.41 with 5-fold cross validation on a non-redundant dataset from 3,149 protein-RNA interacting pairs. On an independent test dataset that does not include the 3,149 pairs and was not used in training, the SVM model achieved an accuracy of 90.3%, an F-measure of 72.8%, and a correlation coefficient of 0.24. Comparison with other methods on the same datasets demonstrated that our model outperformed the others. Conclusions: The feature vector-based redundancy reduction method is powerful for constructing a non-redundant training dataset for a learning model, since it generates a larger dataset with non-redundant data than the standard sequence similarity-based method. Including the features of both RNA and protein sequences in a feature vector results in better performance than using protein features alone when predicting the RNA-binding residues in a protein sequence.
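The three scores quoted in the abstract above (accuracy, F-measure, correlation coefficient) can all be derived from a binary confusion matrix. The following is a minimal sketch, assuming the "correlation coefficient" is the Matthews correlation coefficient, which the abstract does not state explicitly. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F-measure, Matthews correlation coefficient).

    tp, fp, tn, fn are counts of true/false positives/negatives,
    e.g. residues predicted as RNA-binding vs. non-binding.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    # MCC is defined as 0 here when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f_measure, mcc


# Hypothetical counts, chosen only for illustration.
acc, f1, mcc = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"accuracy={acc:.3f} F-measure={f1:.3f} MCC={mcc:.3f}")
```

On an imbalanced dataset, where non-binding residues far outnumber binding ones, accuracy can stay high while the correlation coefficient drops, which is one reason these scores are usually reported together.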
An Algorithm for Finding Functional Modules and Protein Complexes in Protein-Protein Interaction Networks
Guangyu Cui, Yu Chen, De-Shuang Huang, Kyungsook Han
Journal of Biomedicine and Biotechnology, 2008, DOI: 10.1155/2008/860270
Abstract: Biological processes are often performed by a group of proteins rather than by individual proteins, and proteins in the same biological group form a densely connected subgraph in a protein-protein interaction network. Therefore, finding a densely connected subgraph provides useful information for predicting the function or protein complex of uncharacterized proteins in that subgraph. We have developed an efficient algorithm and program for finding cliques and near-cliques in a protein-protein interaction network. Analysis of the interaction network of yeast proteins using the algorithm demonstrates that 59% of the near-cliques identified by our algorithm have at least one function shared by all the proteins within a near-clique, and that 56% of the near-cliques show good agreement with the experimentally determined protein complexes catalogued in MIPS.
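To illustrate what finding cliques in an interaction network involves, here is a minimal sketch of the standard Bron-Kerbosch enumeration with pivoting. It is not the authors' algorithm, which the abstract does not describe. The `density` helper is one possible way to score near-cliques and is an assumption, as are the protein names in the toy network.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph.

    adj maps each node to the set of its neighbours (no self-loops).
    """
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def density(nodes, adj):
    """Fraction of possible edges present among nodes (1.0 for a clique)."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(len(adj[v] & nodes) for v in nodes) / 2
    return edges / (n * (n - 1) / 2)


# Toy network with hypothetical proteins A-D: a triangle plus one pendant edge.
network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
for clique in maximal_cliques(network):
    print(sorted(clique), density(clique, network))
```

A near-clique could then be defined as a node set whose density exceeds some threshold below 1.0; the threshold choice is left open here because the abstract gives no definition.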
A semi-supervised learning approach to predict synthetic genetic interactions by combining functional and topological properties of functional gene network
Zhu-Hong You, Zheng Yin, Kyungsook Han, De-Shuang Huang, Xiaobo Zhou
BMC Bioinformatics, 2010, DOI: 10.1186/1471-2105-11-343
Abstract: In this work, we introduce a computational systems biology approach for the accurate prediction of pairwise synthetic genetic interactions (SGI). First, a high-coverage and high-precision functional gene network (FGN) is constructed by integrating protein-protein interaction (PPI), protein complex and gene expression data; then, a graph-based semi-supervised learning (SSL) classifier is utilized to identify SGI, where the topological properties of protein pairs in the weighted FGN are used as input features of the classifier. We compare the proposed SSL method with a state-of-the-art supervised classifier, the support vector machine (SVM), on a benchmark dataset in S. cerevisiae to validate our method's ability to distinguish synthetic genetic interactions from non-interacting gene pairs. Experimental results show that the proposed method can accurately predict genetic interactions in S. cerevisiae (with a sensitivity of 92% and specificity of 91%). Notably, the SSL method is more efficient than SVM, especially for very small training sets and large test sets. We developed a graph-based SSL classifier for predicting SGI. The classifier employs topological properties of the weighted FGN as input features and simultaneously employs information induced from labelled and unlabelled data. Our analysis indicates that the topological properties of the weighted FGN can be employed to accurately predict SGI. Also, the graph-based SSL method outperforms the standard supervised approach, especially when used with small training sets. The proposed method can alleviate the experimental burden of exhaustive testing and provide a useful guide for biologists in narrowing down the candidate gene pairs with SGI. The data and source code implementing the method are available from the website: http://home.ustc.edu.cn/~yzh33108/GeneticInterPred.htm Genetic interaction analysis, in which two mutations have a combined effect not exhibited by either mutation alone, can reveal fu
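The idea of graph-based semi-supervised learning mentioned in the abstract above, where a few labelled items inform many unlabelled ones through a weighted graph, can be sketched with generic iterative label propagation. This is an illustrative stand-in and not the authors' classifier. The function name, the 0.5 starting score, and the toy chain graph are all assumptions.

```python
def propagate_labels(weights, labels, iterations=100):
    """Spread known labels over a weighted graph.

    weights: {node: {neighbour: weight}}, assumed symmetric.
    labels:  {node: 0.0 or 1.0} for the labelled nodes only.
    Returns {node: score in [0, 1]}; higher means closer to label 1.
    """
    scores = {v: labels.get(v, 0.5) for v in weights}
    for _ in range(iterations):
        updated = {}
        for v, nbrs in weights.items():
            if v in labels:
                updated[v] = labels[v]  # labelled nodes stay clamped
                continue
            total = sum(nbrs.values())
            # Unlabelled nodes take the weighted mean of their neighbours.
            updated[v] = (sum(w * scores[u] for u, w in nbrs.items()) / total
                          if total else scores[v])
        scores = updated
    return scores


# Toy chain a - b - c - d with unit weights; only the two ends are labelled.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
print(propagate_labels(graph, {"a": 1.0, "d": 0.0}))
```

Thresholding the resulting scores (for example at 0.5) gives a binary prediction for each unlabelled node. With very few labelled nodes the graph structure carries most of the information, which is consistent with the abstract's remark about small training sets.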
A Comprehensive GALEX Ultraviolet Catalog of Star Clusters in M31 and a Study of the Young Clusters
Yongbeom Kang, Soo-Chang Rey, Luciana Bianchi, Kyungsook Lee, YoungKwang Kim, Sangmo Tony Sohn
Physics, 2011, DOI: 10.1088/0067-0049/199/2/37
Abstract: We present a comprehensive catalog of 700 confirmed star clusters in the field of M31 compiled from three major existing catalogs. We detect 418 and 257 star clusters in Galaxy Evolution Explorer (GALEX) near-ultraviolet (NUV) and far-ultraviolet (FUV) imaging, respectively. Our final catalog includes photometry of star clusters in up to 16 passbands ranging from FUV to NIR as well as ancillary information such as reddening, metallicity, and radial velocities. In particular, this is the most extensive and updated catalog of UV integrated photometry for M31 star clusters. Ages and masses of star clusters are derived by fitting the multi-band photometry with model spectral energy distributions (SEDs); UV photometry enables more accurate age estimation of young clusters. Our catalog includes 182 young clusters with ages less than 1 Gyr. Our estimated ages and masses of young clusters are in good agreement with previously determined values in the literature. The mean age and mass of young clusters are about 300 Myr and 10^4 M_sun, respectively. We found that the compiled [Fe/H] values of young clusters included in our catalog are systematically lower (by more than 1 dex) than those from recent high-quality spectroscopic data and our SED fitting result. We confirm that the kinematics of most young clusters show systematic rotation around the minor axis and association with the thin disk of M31. The distribution of young clusters exhibits a distinct peak in the M31 disk around 10 - 12 kpc from the center and follows a spatial distribution similar to other tracers of disk structure such as OB stars, UV star-forming regions, and dust. Some young clusters are also concentrated around the ring-splitting regions found in the southern part of the M31 disk, and most of them have systematically younger (< 100 Myr) ages.
Identification and Functional Analysis of Light-Responsive Unique Genes and Gene Family Members in Rice
Ki-Hong Jung, Jinwon Lee, Chris Dardick, Young-Su Seo, Peijian Cao, Patrick Canlas, Jirapa Phetsom, Xia Xu, Shu Ouyang, Kyungsook An, Yun-Ja Cho, Geun-Cheol Lee, Yoosook Lee, Gynheung An, Pamela C. Ronald
PLOS Genetics, 2008, DOI: 10.1371/journal.pgen.1000164
Abstract: Functional redundancy limits detailed analysis of genes in many organisms. Here, we report a method to efficiently overcome this obstacle by combining gene expression data with analysis of gene-indexed mutants. Using a rice NSF45K oligo-microarray to compare 2-week-old light- and dark-grown rice leaf tissue, we identified 365 genes that showed significant 8-fold or greater induction in the light relative to dark conditions. We then screened collections of rice T-DNA insertional mutants to identify rice lines with mutations in the strongly light-induced genes. From this analysis, we identified 74 different lines comprising two independent mutant lines for each of 37 light-induced genes. This list was further refined by mining gene expression data to exclude genes that had potential functional redundancy due to co-expressed family members (12 genes) and genes that had inconsistent light responses across other publicly available microarray datasets (five genes). We next characterized the phenotypes of rice lines carrying mutations in ten of the remaining candidate genes and then carried out co-expression analysis associated with these genes. This analysis effectively provided candidate functions for two genes of previously unknown function and for one gene not directly linked to the tested biochemical pathways. These data demonstrate the efficiency of combining gene family-based expression profiles with analyses of insertional mutants to identify novel genes and their functions, even among members of multi-gene families.
Refinement of Light-Responsive Transcript Lists Using Rice Oligonucleotide Arrays: Evaluation of Gene-Redundancy
Ki-Hong Jung, Christopher Dardick, Laura E. Bartley, Peijian Cao, Jirapa Phetsom, Patrick Canlas, Young-Su Seo, Michael Shultz, Shu Ouyang, Qiaoping Yuan, Bryan C. Frank, Eugene Ly, Li Zheng, Yi Jia, An-Ping Hsia, Kyungsook An, Hui-Hsien Chou, David Rocke, Geun Cheol Lee, Patrick S. Schnable, Gynheung An, C. Robin Buell, Pamela C. Ronald
PLOS ONE, 2008, DOI: 10.1371/journal.pone.0003337
Abstract: Studies of gene function are often hampered by gene-redundancy, especially in organisms with large genomes such as rice (Oryza sativa). We present an approach for using transcriptomics data to focus functional studies and address redundancy. To this end, we have constructed and validated an inexpensive and publicly available rice oligonucleotide near-whole genome array, called the rice NSF45K array. We generated expression profiles for light- vs. dark-grown rice leaf tissue and validated the biological significance of the data by analyzing sources of variation and confirming expression trends with reverse transcription polymerase chain reaction. We examined trends in the data by evaluating enrichment of gene ontology terms at multiple false discovery rate thresholds. To compare data generated with the NSF45K array with published results, we developed publicly available, web-based tools (www.ricearray.org). The Oligo and EST Anatomy Viewer enables visualization of EST-based expression profiling data for all genes on the array. The Rice Multi-platform Microarray Search Tool facilitates comparison of gene expression profiles across multiple rice microarray platforms. Finally, we incorporated gene expression and biochemical pathway data to reduce the number of candidate gene products putatively participating in the eight steps of the photorespiration pathway from 52 to 10, based on expression levels of putatively functionally redundant genes. We confirmed the efficacy of this method to cope with redundancy by correctly predicting participation in photorespiration of a gene with five paralogs. Applying these methods will accelerate rice functional genomics.
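The abstract above mentions evaluating gene ontology enrichment at multiple false discovery rate thresholds without naming the correction procedure. As a sketch of how such thresholds are commonly applied, the following uses the Benjamini-Hochberg adjustment. The choice of that procedure is an assumption, and the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical enrichment p-values for four GO terms.
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
for threshold in (0.05, 0.10):
    passing = sum(q <= threshold for q in qvals)
    print(f"FDR <= {threshold}: {passing} term(s) pass")
```

Comparing which terms survive at several thresholds shows how sensitive an enrichment result is to the chosen cutoff.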
Copyright © 2008-2017 Open Access Library. All rights reserved.