This work addresses the prediction of the inhibitory activity of a series of 208 structurally diverse PKCθ inhibitors using Random Forest (RF) with Mold2 molecular descriptors. The RF model proved to be a robust predictor of the experimental pIC50 values, giving an external R²pred of 0.72 and a standard error of prediction (SEP) of 0.45 for an external prediction set of 51 inhibitors that were not used in the development of the QSAR models. Using the RF built-in measure of relative descriptor importance, one descriptor, the number of group donor atoms for H-bonds (with N and O), was identified as playing a crucial role in PKCθ inhibitory activity. We hope that the developed RF model will be helpful in the screening and activity prediction of novel PKCθ inhibitors.
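As an illustration of the two external-validation statistics quoted above, the following minimal sketch computes R²pred and SEP for a prediction set. It assumes one common convention (R²pred = 1 − PRESS/SS, with SS taken about the training-set mean, and SEP = sqrt(PRESS/n)); the text does not state which definitions were used. The function name and the toy pIC50 values are hypothetical and are not data from this study.

```python
import math


def external_validation_stats(y_obs, y_pred, y_train_mean):
    """Return (R2_pred, SEP) for an external prediction set.

    Assumed conventions (not confirmed by the source text):
      R2_pred = 1 - PRESS / sum((y_obs - mean of training activities)^2)
      SEP     = sqrt(PRESS / n)
    """
    if len(y_obs) != len(y_pred) or len(y_obs) == 0:
        raise ValueError("y_obs and y_pred must be non-empty and equally long")

    n = len(y_obs)
    # PRESS: sum of squared prediction errors on the external set
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    # Total sum of squares of the external set about the training-set mean
    ss_total = sum((o - y_train_mean) ** 2 for o in y_obs)

    r2_pred = 1.0 - press / ss_total
    sep = math.sqrt(press / n)
    return r2_pred, sep


# Toy values for illustration only (hypothetical pIC50 data)
toy_obs = [5.0, 6.0, 7.0, 8.0]
toy_pred = [5.5, 5.5, 7.5, 7.5]
toy_train_mean = 6.5

r2_pred, sep = external_validation_stats(toy_obs, toy_pred, toy_train_mean)
print(f"R2_pred = {r2_pred:.2f}, SEP = {sep:.2f}")
```

Some authors divide PRESS by n − 1 rather than n when computing SEP; for a prediction set of 51 compounds the difference is small, but the convention should be checked before comparing values across studies.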