Computational identification of ubiquitylation sites from protein sequences
Chun-Wei Tung, Shinn-Ying Ho
BMC Bioinformatics , 2008, DOI: 10.1186/1471-2105-9-310
Abstract: We established a ubiquitylation dataset consisting of 157 ubiquitylation sites and 3676 putative non-ubiquitylation sites extracted from 105 proteins in the UbiProt database. This study first evaluates promising sequence-based features and classifiers for the prediction of ubiquitylation sites by assessing three kinds of features (amino acid identity, evolutionary information, and physicochemical properties) and three classifiers (support vector machine, k-nearest neighbor, and naïve Bayes). Results show that the set of 531 physicochemical properties and the support vector machine (SVM) are the best kind of features and the best classifier, respectively; their combination has a prediction accuracy of 72.19% using leave-one-out cross-validation. Consequently, an informative physicochemical property mining algorithm (IPMA) is proposed to select an informative subset of the 531 physicochemical properties. A prediction system, UbiPred, was implemented using an SVM with a feature set of 31 informative physicochemical properties selected by IPMA, which improves the accuracy from 72.19% to 84.44%. To further analyze the informative physicochemical properties, the decision tree method C5.0 was used to acquire if-then rule-based knowledge for predicting ubiquitylation sites. UbiPred can screen promising ubiquitylation sites from putative non-ubiquitylation sites using prediction scores. By applying UbiPred, 23 promising ubiquitylation sites were identified from an independent dataset of 3424 putative non-ubiquitylation sites, which were also validated using the obtained prediction rules. We have proposed an algorithm, IPMA, for mining informative physicochemical properties from protein sequences to build an SVM-based prediction system, UbiPred. UbiPred can predict ubiquitylation sites, each accompanied by a prediction score, to help biologists identify promising sites for experimental verification.
UbiPred has been implemented as a web server and is available at http://iclab.lif
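The leave-one-out cross-validation mentioned in the UbiPred abstract can be illustrated with a minimal sketch. It uses a plain k-nearest-neighbor classifier (one of the three classifiers the abstract compares) rather than an SVM, purely to keep the example dependency-free. The feature vectors, labels, and function names are hypothetical and are not taken from the UbiProt data or the UbiPred implementation.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=3):
    """Hold out each sample once, train on the rest, score the held-out prediction."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical 2-D feature vectors; 1 = site, 0 = non-site (toy labels only).
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 1.1), (0.9, 1.2), (1.2, 0.9)]
y = [0, 0, 0, 1, 1, 1]
print(leave_one_out_accuracy(X, y, k=3))
```

With n samples the procedure trains n models, which is affordable for a dataset of a few thousand sites but grows quickly with dataset size.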
Prediction and Analysis of Antibody Amyloidogenesis from Sequences
Chyn Liaw, Chun-Wei Tung, Shinn-Ying Ho
PLOS ONE , 2013, DOI: 10.1371/journal.pone.0053235
Abstract: Antibody amyloidogenesis is the aggregation of soluble proteins into amyloid fibrils and is one of the major causes of the failure of humanized antibodies. The prediction and prevention of antibody amyloidogenesis are helpful for restoring and enhancing therapeutic effects. Because of the large number of possible germlines, the existing method, which establishes an individual model for each known germline, is not practical for predicting sequences of novel germlines. This study proposes the first automatic and across-germline prediction method (named AbAmyloid) capable of predicting antibody amyloidogenesis from sequences. Since amyloidogenesis is determined by the whole sequence of an antibody rather than by germline-dependent properties such as mutated residues, this study assesses three types of germline-independent sequence features (amino acid composition, dipeptide composition and physicochemical properties). AbAmyloid, using a Random Forests classifier with dipeptide composition, performs well on a data set of 12 germlines. The within- and across-germline prediction accuracies are 83.10% and 83.33% using Jackknife tests, respectively, and the novel-germline prediction accuracy using a leave-one-germline-out test is 72.22%. A thorough analysis of sequence features is conducted to identify informative properties that provide further insights into antibody amyloidogenesis. Some identified informative physicochemical properties are amphiphilicity, hydrophobicity, reverse turn, helical structure, isoelectric point, net charge, mutability, coil, turn, linker, nuclear protein, etc. Additionally, the numbers of ubiquitylation sites in amyloidogenic and non-amyloidogenic antibodies are found to be significantly different. This reveals that antibodies less likely to be ubiquitylated tend to be amyloidogenic.
The AbAmyloid method, capable of automatically predicting antibody amyloidogenesis of novel germlines, is implemented as a publicly available web server at http://iclab.life.nctu.edu.tw/abamyloid.
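The dipeptide composition feature named in the AbAmyloid abstract can be sketched as follows. This is a minimal illustration that assumes the common definition (frequency of each of the 400 ordered pairs of adjacent standard amino acids); the function name, the handling of non-standard residues, and the toy sequence are assumptions, not details from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of adjacent-pair frequencies."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


features = dipeptide_composition("ACACA")  # toy sequence, not a real antibody
```

Because the vector length is fixed at 400 regardless of sequence length or germline, the same classifier can be applied across germlines.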
POPISK: T-cell reactivity prediction using support vector machines and string kernels
Chun-Wei Tung, Matthias Ziehm, Andreas Kämper, Oliver Kohlbacher, Shinn-Ying Ho
BMC Bioinformatics , 2011, DOI: 10.1186/1471-2105-12-446
Abstract: This work establishes a large dataset by collecting immunogenicity data from three major immunology databases. In order to consider the effect of MHC restriction, peptides are classified by their associated MHC alleles. Subsequently, a computational method (named POPISK) using support vector machine with a weighted degree string kernel is proposed to predict T-cell reactivity and identify important recognition positions. POPISK yields a mean 10-fold cross-validation accuracy of 68% in predicting T-cell reactivity of HLA-A2-binding peptides. POPISK is capable of predicting immunogenicity with scores that can also correctly predict the change in T-cell reactivity related to point mutations in epitopes reported in previous studies using crystal structures. Thorough analyses of the prediction results identify the important positions 4, 6, 8 and 9, and yield insights into the molecular basis for TCR recognition. Finally, we relate this finding to physicochemical properties and structural features of the MHC-peptide-TCR interaction. A computational method POPISK is proposed to predict immunogenicity with scores which are useful for predicting immunogenicity changes made by single-residue modifications. The web server of POPISK is freely available at http://iclab.life.nctu.edu.tw/POPISK. Immunogenicity is the ability to induce an immune response. For the major histocompatibility complex (MHC) class I-mediated immune response, this immune activation entails a successful processing of the antigen, its presentation by an MHC class I molecule and finally its recognition by a T-cell receptor (Figure 1). The predictions of antigen processing and MHC-peptide binding are well-studied problems in immunoinformatics. The prediction of T-cell reactivity, in contrast, is less well studied and much more difficult. For computer-aided vaccine designs [1-3], the prediction of the immunogenicity is an important step.
Computational methods for immunogenicity prediction accelerate the de
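The weighted degree string kernel named in the POPISK abstract can be sketched as below. The sketch follows the commonly used formulation of that kernel (position-wise matches of substrings up to a maximum length, with longer substrings weighted less); the degree of 3, the function name, and the toy peptides are assumptions, and the settings used by POPISK itself are not given in the abstract.

```python
def weighted_degree_kernel(x, y, degree=3):
    """Weighted degree kernel between two equal-length strings.

    Counts substrings of length 1..degree that match at the same position,
    weighting length d by beta_d = 2 * (degree - d + 1) / (degree * (degree + 1)).
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    total = 0.0
    for d in range(1, degree + 1):
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        matches = sum(1 for i in range(len(x) - d + 1) if x[i:i + d] == y[i:i + d])
        total += beta * matches
    return total


# Toy nonamers (hypothetical, not real epitopes); the second differs at position 4.
wild_type = "ACDEFGHIK"
mutant = "ACDWFGHIK"
k_self = weighted_degree_kernel(wild_type, wild_type)
k_mut = weighted_degree_kernel(wild_type, mutant)
```

Because the kernel compares residues position by position, a single substitution lowers the similarity only through the substrings that overlap the mutated position, which is what makes position-specific analysis possible.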
ProLoc-GO: Utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization
Wen-Lin Huang, Chun-Wei Tung, Shih-Wen Ho, Shiow-Fen Hwang, Shinn-Ying Ho
BMC Bioinformatics , 2008, DOI: 10.1186/1471-2105-9-80
Abstract: This study proposes an efficient sequence-based method (named ProLoc-GO) by mining informative GO terms for predicting protein subcellular localization. For each protein, BLAST is used to find a homolog with a known accession number, which is then used to retrieve the GO annotation. A large number n of all annotated GO terms that have ever appeared are then obtained from a large set of training proteins. A novel genetic algorithm based method (named GOmining) combined with a classifier of support vector machine (SVM) is proposed to simultaneously identify a small number m out of the n GO terms as input features to SVM, where m <
NeurphologyJ: An automatic neuronal morphology quantification method and its application in pharmacological discovery
Shinn-Ying Ho, Chih-Yuan Chao, Hui-Ling Huang, Tzai-Wen Chiu, Phasit Charoenkwan, Eric Hwang
BMC Bioinformatics , 2011, DOI: 10.1186/1471-2105-12-230
Abstract: This study proposes an effective quantification method, called NeurphologyJ, capable of automatically quantifying neuronal morphologies such as soma number and size, neurite length, and neurite branching complexity (which is highly related to the numbers of attachment points and ending points). NeurphologyJ is implemented as a plugin to ImageJ, an open-source Java-based image processing and analysis platform. The high performance of NeurphologyJ arises mainly from an elegant image enhancement method. Consequently, some morphology operations of image processing can be efficiently applied. We evaluated NeurphologyJ by comparing it with both the computer-aided manual tracing method NeuronJ and an existing ImageJ-based plugin method NeuriteTracer. Our results reveal that NeurphologyJ is comparable to NeuronJ, with a correlation coefficient between the estimated neurite lengths as high as 0.992. NeurphologyJ can accurately measure neurite length, soma number, neurite attachment points, and neurite ending points from a single image. Furthermore, the quantification result of nocodazole perturbation is consistent with its known inhibitory effect on neurite outgrowth. We were also able to calculate the IC50 of nocodazole using NeurphologyJ. This reveals that NeurphologyJ is effective enough to be utilized in applications of pharmacological discoveries. This study proposes an automatic and fast neuronal quantification method, NeurphologyJ. The ImageJ plugin, with support for batch processing, is easily customized for high-content screening applications. The source codes of NeurphologyJ (interactive and high-throughput versions) and the images used for testing are freely available (see Availability). Recent advancements in automated fluorescence microscopy have made high-content screening an essential technique for discovering novel molecular pathways in diseases [1] or potential new therapeutic treatments [2,3].
However, high-content screenings on biological or p
SCMCRYS: Predicting Protein Crystallization Using an Ensemble Scoring Card Method with Estimating Propensity Scores of P-Collocated Amino Acid Pairs
Phasit Charoenkwan, Watshara Shoombuatong, Hua-Chin Lee, Jeerayut Chaijaruwanich, Hui-Ling Huang, Shinn-Ying Ho
PLOS ONE , 2013, DOI: 10.1371/journal.pone.0072368
Abstract: Existing methods for predicting protein crystallization obtain high accuracy using various types of complementary features and complex ensemble classifiers, such as support vector machine (SVM) and Random Forest classifiers. It is desirable to develop a simple and easily interpretable prediction method with informative sequence features to provide insights into protein crystallization. This study proposes an ensemble method, SCMCRYS, to predict protein crystallization, in which each classifier is built using a scoring card method (SCM) that estimates propensity scores of p-collocated amino acid (AA) pairs (p = 0 for a dipeptide). The SCM classifier determines the crystallization of a sequence according to a weighted-sum score. The weights are the composition of the p-collocated AA pairs, and the propensity scores of these AA pairs are estimated using a statistical approach with optimization. SCMCRYS predicts the crystallization using a simple voting method from a number of SCM classifiers. The experimental results show that the single SCM classifier utilizing dipeptide composition with accuracy of 73.90% is comparable to the best previously-developed SVM-based classifier, SVM_POLY (74.6%), and our proposed SVM-based classifier utilizing the same dipeptide composition (77.55%). The SCMCRYS method with accuracy of 76.1% is comparable to the state-of-the-art ensemble methods PPCpred (76.8%) and RFCRYS (80.0%), which used the SVM and Random Forest classifiers, respectively. This study also performs a mutagenesis analysis based on SCM; the result suggests the hypothesis that mutagenesis of the surface residues Ala and Cys has a large and a small probability, respectively, of enhancing protein crystallizability, considering the estimated scores of crystallizability and solubility, melting point, molecular weight and conformational entropy of amino acids in a generalized condition.
The propensity scores of amino acids and dipeptides for estimating the protein crystallizability can aid biologists in designing mutations of surface residues to enhance protein crystallizability. The source code of SCMCRYS is available at http://iclab.life.nctu.edu.tw/SCMCRYS/.
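The weighted-sum scoring and simple voting described in the SCMCRYS abstract can be sketched as follows. This is an illustrative reading of the abstract only: the propensity tables, thresholds, function names, and toy sequence are hypothetical, and the optimization step that estimates the propensity scores is not shown.

```python
from collections import Counter


def collocated_pair_fractions(sequence, p=0):
    """Fractions of residue pairs separated by p residues (p = 0 is a dipeptide)."""
    pairs = [sequence[i] + sequence[i + p + 1] for i in range(len(sequence) - p - 1)]
    counts = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in counts.items()}


def scm_score(sequence, propensity, p=0):
    """Weighted sum: composition of each pair times its propensity score."""
    fractions = collocated_pair_fractions(sequence, p)
    return sum(frac * propensity.get(pair, 0.0) for pair, frac in fractions.items())


def scm_vote(sequence, cards):
    """Majority vote over scoring cards given as (p, propensity table, threshold)."""
    votes = sum(1 for p, propensity, threshold in cards
                if scm_score(sequence, propensity, p) > threshold)
    return votes > len(cards) / 2


# Hypothetical scoring cards; real propensity scores would be estimated from data.
cards = [
    (0, {"AA": 800.0, "AK": 300.0, "KA": 500.0}, 500.0),
    (1, {"AK": 200.0, "AA": 400.0}, 300.0),
    (0, {"AA": 1000.0}, 300.0),
]
prediction = scm_vote("AAKAAK", cards)
```

Each card stays interpretable because the score is a direct sum over pair propensities, so the contribution of any single amino acid pair can be read off the table.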
Protein-Protein Interaction Site Predictions with Three-Dimensional Probability Distributions of Interacting Atoms on Protein Surfaces
Ching-Tai Chen, Hung-Pin Peng, Jhih-Wei Jian, Keng-Chang Tsai, Jeng-Yih Chang, Ei-Wen Yang, Jun-Bo Chen, Shinn-Ying Ho, Wen-Lian Hsu, An-Suei Yang
PLOS ONE , 2012, DOI: 10.1371/journal.pone.0037706
Abstract: Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779, respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence.
The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
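The residue-based measures reported in the abstract above (Matthews correlation coefficient, accuracy, precision, sensitivity, specificity) all derive from a binary confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not the paper's confusion matrix, which the abstract does not give.

```python
from math import sqrt


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": safe_div(tp, tp + fp),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts of interface (positive) and non-interface (negative) residues.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

The Matthews correlation coefficient is often reported alongside accuracy for this kind of task because interface residues are usually a minority class, and accuracy alone can look high for a predictor that rarely predicts the positive class.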
Increased Risk of Major Depression in the Three Years following a Femoral Neck Fracture–A National Population-Based Follow-Up Study
Chih-Yu Chang, Wen-Liang Chen, Yi-Fan Liou, Chih-Chi Ke, Hua-Chin Lee, Hui-Ling Huang, Li-Ping Ciou, Chu-Chung Chou, Mei-Chueh Yang, Shinn-Ying Ho, Yan-Ren Lin
PLOS ONE , 2014, DOI: 10.1371/journal.pone.0089867
Abstract: Femoral neck fracture is common in the elderly, and its impact has increased in aging societies. Comorbidities, poor levels of activity and pain may contribute to the development of depression, but these factors have not been well addressed. This study aims to investigate the frequency and risk of major depression after a femoral neck fracture using a nationwide population-based study. The Taiwan Longitudinal Health Insurance Database was used in this study. A total of 4,547 patients who were hospitalized for femoral neck fracture from 2003 to 2007 were recruited as a study group; 13,641 matched non-fracture participants were enrolled as a comparison group. Each patient was prospectively followed for 3 years to monitor the occurrence of major depression. Cox proportional-hazards models were used to compute the risk of major depression between members of the study and comparison groups after adjusting for residence and socio-demographic characteristics. The most common physical comorbidities that were present after the fracture were also analyzed. The incidences of major depression were 1.2% (n = 55) and 0.7% (n = 95) in the study and comparison groups, respectively. The stratified Cox proportional analysis showed a covariate-adjusted hazard ratio of major depression among patients with femoral neck fracture that was 1.82 times greater (95% CI, 1.30–2.53) than that of the comparison group. Most major depressive episodes (34.5%) presented within the first 200 days following the fracture. In conclusion, patients with a femoral neck fracture are at an increased risk of subsequent major depression. Most importantly, major depressive episodes mainly occurred within the first 200 days following the fracture.
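The incidence figures in the abstract above can be checked with simple arithmetic. The sketch below recomputes the two cumulative incidences from the reported counts and derives a crude (unadjusted) risk ratio. Note that this crude ratio is not the covariate-adjusted hazard ratio of 1.82 reported from the Cox model, which also accounts for follow-up time and covariates; the function and variable names are illustrative.

```python
def cumulative_incidence(events, cohort_size):
    """Proportion of a cohort that experienced the event during follow-up."""
    return events / cohort_size


# Counts as reported in the abstract.
fracture = cumulative_incidence(55, 4547)       # study group
comparison = cumulative_incidence(95, 13641)    # matched comparison group

# Crude ratio of the two proportions; NOT the adjusted hazard ratio from the paper.
crude_risk_ratio = fracture / comparison

print(f"{fracture:.1%} vs {comparison:.1%}, crude risk ratio {crude_risk_ratio:.2f}")
```

The comparison group is exactly three times the size of the study group (13,641 = 3 × 4,547), which is consistent with 1:3 matching.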
Role of the Peripheral Sympathetic Innervations in Controlling Cerebral Blood Flow after the Transection of Bilateral Superior Cervical Sympathetic Ganglia Two Weeks Later
Cheng-Ta Hsieh, Shinn-Zong Lin, Ming-Ying Liu
Surgical Science (SS) , 2011, DOI: 10.4236/ss.2011.24041
Abstract: Background: Cerebral blood vessels are mainly supplied by sympathetic nerves arising from the superior cervical ganglia, and cerebral blood volume may be influenced by bilateral superior cervical ganglionectomy (SCG). The stage of cerebral blood volume change depends on the time elapsed since bilateral excision of the SCG. In this study, we emphasize the subacute effect (two weeks) on the local cerebral blood flow (LCBF). Material and Methods: Sprague-Dawley rats weighing 250–400 g (n = 20) were divided into two groups. At an ambient temperature of 20°C, the animals in the first group (n = 10) received a sham operation and the animals in the other group (n = 10) underwent bilateral SCG. The LCBF and O2 delivery of 14 brain structures were measured for each animal using the 14C-iodoantipyrine technique two weeks after the operation. Results: The average LCBF decreased from 150 ml/100 g/min to 129 ml/100 g/min after bilateral SCG. Only the LCBF at the basal ganglia increased, from 108 ml/100 g/min in the sham-operated group to 118 ml/100 g/min in the SCG group. A mean LCBF reduction of 14% was estimated. In the 14 brain structures, O2 delivery decreased in all except the basal ganglia. However, these changes in LCBF and O2 delivery in the 14 brain structures did not reach statistical significance. Conclusions: The present results show that the chronic effect (two weeks) of bilateral SCG was not only a decrease in LCBF but also a decrease in local cerebral O2 delivery. However, the changes did not reach statistical significance.
Ultraviolet Radiative Transfer Modeling of Nearby Galaxies with Extraplanar Dusts
Jong-Ho Shinn, Kwang-Il Seon
Physics , 2015,
Abstract: In order to examine its relation to the host galaxy, the extraplanar dust of six nearby galaxies is modeled, employing a three-dimensional Monte Carlo radiative transfer code. The targets are highly inclined galaxies that show dust-scattered ultraviolet halos, and the archival Galaxy Evolution Explorer (GALEX) FUV band images were fitted with the model. The observed images are in general well reproduced by two dust layers and one light-source layer, whose vertical and radial distributions have exponential profiles. We obtained several important physical parameters, such as star formation rate (SFR_UV), face-on optical depth, and scale-heights. Three galaxies (NGC 891, NGC 3628, and UGC 11794) show clear evidence for the existence of an extraplanar dust layer. However, it is found that the remaining three targets (IC 5249, NGC 24, and NGC 4173) do not necessarily need a thick dust disk to model the ultraviolet (UV) halo, because its contribution is too small and the UV halo may be caused by the wing part of the GALEX point spread function. This indicates that the galaxy samples reported to have UV halos may be contaminated by galaxies with negligible extraplanar (halo) dust. The galaxies showing evidence of the extraplanar dust layer fall within a narrow range on the scatter plots between physical parameters such as SFR_UV and extraplanar dust mass. Several possible mechanisms for producing the extraplanar dust are discussed. We also found a hint that the extraplanar dust scale-height might not be much different from the polycyclic aromatic hydrocarbon emission characteristic height.
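The exponential vertical and radial profiles mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual double-exponential disk form, with density falling off exponentially in both galactocentric radius R and height |z|, and models two dust layers as a sum of a thin and a thick component. All parameter values and names are hypothetical; the fitted values from the paper are not given in the abstract.

```python
from math import exp


def exponential_disk_density(R, z, rho0, scale_length, scale_height):
    """rho(R, z) = rho0 * exp(-R / scale_length - |z| / scale_height)."""
    return rho0 * exp(-R / scale_length - abs(z) / scale_height)


def two_layer_dust_density(R, z, thin, thick):
    """Sum of a thin-disk and a thick (extraplanar) exponential dust layer."""
    return (exponential_disk_density(R, z, **thin)
            + exponential_disk_density(R, z, **thick))


# Hypothetical parameters in arbitrary units (lengths in kpc, density unitless).
thin = {"rho0": 1.0, "scale_length": 8.0, "scale_height": 0.2}
thick = {"rho0": 0.05, "scale_length": 8.0, "scale_height": 1.5}

midplane = two_layer_dust_density(0.0, 0.0, thin, thick)
halo = two_layer_dust_density(0.0, 2.0, thin, thick)
```

With parameters like these, the thin layer dominates near the midplane while the thick layer dominates a few scale-heights above it, which is the regime where dust-scattered UV halo light would be expected.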
