[1] | Grünwald P (2000) Model selection based on Minimum Description Length. J Math Psychol 44 (1): 133–152. doi: 10.1006/jmps.1999.1280
|
[2] | Grünwald P (2007) The Minimum Description Length principle. MIT Press. 703 p.
|
[3] | Grunwald P, Myung IJ, Pitt MA, eds. (2005) Advances in Minimum Description Length: theory and applications. MIT Press. 452 p.
|
[4] | Kearns M, Mansour Y, Ng AY, Ron D (1997) An experimental and theoretical comparison of model selection methods. Mach Learn 27 (1): 7–50. doi: 10.1145/225298.225301
|
[5] | Myung IJ (2000) The importance of complexity in model selection. J Math Psychol 44 (1): 190–204. doi: 10.1006/jmps.1999.1283
|
[6] | Van Allen T, Greiner R (2000) Model selection criteria for learning belief nets: an empirical comparison. Proc Int Conf Mach Learn 17: 1047–1054.
|
[7] | Zucchini W (2000) An introduction to model selection. J Math Psychol 44 (1): 41–61. doi: 10.1006/jmps.1999.1276
|
[8] | Bozdogan H (2000) Akaike’s information criterion and recent developments in information complexity. J Math Psychol 44 (1): 62–91. doi: 10.1006/jmps.1999.1277
|
[9] | Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. San Mateo, California: Morgan Kaufmann. 552 p.
|
[10] | Wasserman L (2000) Bayesian model selection and model averaging. J Math Psychol 44 (1): 92–107. doi: 10.1006/jmps.1999.1278
|
[11] | Cooper GF (1999) An overview of the representation and discovery of causal relationships using Bayesian networks. In: Glymour C and Cooper GF, editors. Computation, causation & discovery. AAAI Press/MIT Press. 3–62.
|
[12] | Geiger D, Heckerman D, Meek C (1998) Asymptotic model selection for directed networks with hidden variables. Learning in graphical models, NATO ASI series 89: 461–477. doi: 10.1007/978-94-011-5014-9_16
|
[13] | Heckerman D (1998) A tutorial on learning with Bayesian networks. Learning in graphical models, NATO ASI series 89: 301–354. doi: 10.1007/978-94-011-5014-9_11
|
[14] | Friedman JH (1997) On bias, variance, 0/1-loss, and the curse of dimensionality. Data Min Knowl Discov 1 (1): 55–77.
|
[15] | Geman S, Bienenstock E, Doursat R (1992) Neural Networks and the bias/variance dilemma. Neural Comput 4 (1): 1–58. doi: 10.1162/neco.1992.4.1.1
|
[16] | Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. New York: Springer. 533 p.
|
[17] | Bouckaert RR (1993) Probabilistic network construction using the Minimum Description Length principle. In: Clarke M, Kruse R and Moral S, editors. Symbolic and quantitative approaches to reasoning and uncertainty. Springer-Verlag. 41–48.
|
[18] | Lam W, Bacchus F (1994) Learning Bayesian belief networks: an approach based on the MDL principle. Comput Intell 10 (3): 269–293. doi: 10.1111/j.1467-8640.1994.tb00166.x
|
[19] | Suzuki J (1996) Learning Bayesian belief networks based on the Minimum Description Length principle: an efficient algorithm using the B & B technique. Proc Int Conf Mach Learn 13: 462–470.
|
[20] | Suzuki J (1999) Learning Bayesian belief networks based on the Minimum Description Length Principle: basic properties. IEICE transactions on fundamentals of electronics, communications and computer science E82-A (10): 2237–2245.
|
[21] | Robinson RW (1977) Counting unlabeled acyclic digraphs. Combinatorial mathematics V, Lecture notes in mathematics 622: 28–43. doi: 10.1007/bfb0069178
|
[22] | Chickering DM, Heckerman D, Meek C (1997) A Bayesian approach to learning Bayesian networks with local structure. Uncertain Artif Intell 13: 80–89.
|
[23] | Cooper GF, Herskovits E (1992) A Bayesian method for the induction of probabilistic networks from data. Mach Learn 9: 309–347. doi: 10.1007/bf00994110
|
[24] | Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29 (2–3): 131–163. doi: 10.1023/a:1007465528199
|
[25] | Glymour C, Cooper GF, eds. (1999) Computation, causation & discovery. AAAI Press/MIT Press. 552 p.
|
[26] | Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn 20 (3): 197–243. doi: 10.1007/bf00994016
|
[27] | Jordan MI, ed. (1998) Learning in graphical models. Dordecht, The Netherlands: Kluwer Academic Publishers. 634 p.
|
[28] | Neapolitan RE (1990) Probabilistic reasoning in expert systems: theory and algorithms. New York: John Wiley & Sons, Inc. 433 p.
|
[29] | Pearl J (2000) Causality: models, reasoning and inference. New York: Cambridge University Press. 384 p.
|
[30] | Spirtes P, Glymour C, Scheines R (1993) Causation, prediction and search. New York: Springer-Verlag. 526 p.
|
[31] | Spirtes P, Meek C (1995) Learning Bayesian networks with discrete variables from data. KDD 1: 294–299.
|
[32] | Whittaker J (1990) Graphical Models in Applied Mathematical Multivariate Statistics. New York: John Wiley & Sons. 448 p.
|
[33] | Friedman N, Goldszmidt M (1998) Learning Bayesian networks from data. Proc Conf AAAI Artif Intell 15.
|
[34] | Buntine W (1996) A guide to the literature on learning probabilistic networks from data. IEEE Trans Knowl Data Eng 8 (2): 195–210. doi: 10.1109/69.494161
|
[35] | Diez FJ, Mira J, Iturralde E, Zubillaga S (1997) DIAVAL, a Bayesian expert system for echocardiography. Artif Intell Med 10 (1): 59–73. doi: 10.1016/s0933-3657(97)00384-9
|
[36] | Chickering DM (1996) Learning Bayesian Networks is NP-Complete. Learning from data, Lecture notes in statistics 112: 121–130. doi: 10.1007/978-1-4612-2404-4_12
|
[37] | Russell S, Norvig P (2002) Artificial intelligence: a modern approach. Prentice Hall. 1179 p.
|
[38] | Grossman D, Domingos P (2004) Learning Bayesian network classifiers by maximizing conditional likelihood. Proc Int Conf Mach Learn 21: 46. doi: 10.1145/1015330.1015339
|
[39] | Kelner R, Lerner B (2012) Learning Bayesian network classifiers by risk minimization. Int J Approx Reason 53: 248–272. doi: 10.1016/j.ijar.2011.10.006
|
[40] | Acid S, de Campos LM, Fernández M (2013) Score-based methods for learning Markov boundaries by searching in constrained spaces. Data Min Knowl Discov 26 (1): 174–212. doi: 10.1007/s10618-011-0247-5
|
[41] | Chow CK, Liu CN (1968) Approximating discrete probability distributions with dependence trees. IEEE Trans Inf Theory 14 (3): 462–467. doi: 10.1109/tit.1968.1054142
|
[42] | Friedman N, Goldszmidt M (1996) Discretizing continuous attributes while learning Bayesian networks. Proc Int Conf Mach Learn 13: 157–165.
|
[43] | Cheng J, Greiner R (1999) Comparing Bayesian network classifiers. Uncertain Artif Intell 15: 101–108.
|
[44] | Kontkanen P, Silander T, Myllymaki P, Tirri H (1999) On supervised selection of Bayesian networks. Uncertain Artif Intell 15: 334–342.
|
[45] | Kontkanen P, Myllymaki P, Silander T, Tirri H, Grünwald P (2000) On predictive distributions and Bayesian networks. Stat Comput 10 (1): 39–54. doi: 10.1023/a:1008984400380
|
[46] | Kleiner A, Sharp B (2000) A new algorithm for learning Bayesian classifiers from data. Artificial Intelligence and Soft Computing: 191–197.
|
[47] | Clarke EJ, Barton BA (2000) Entropy and MDL discretization of continuous variables for Bayesian belief networks. International Journal of Intelligent Systems 15 (1): 61–92. doi: 10.1002/(sici)1098-111x(200001)15:1<61::aid-int4>3.0.co;2-o
|
[48] | Madden MG (2002) A new Bayesian network structure for classification tasks. Artificial intelligence and cognitive science, Lecture notes in computer science 2464: 203–208. doi: 10.1007/3-540-45750-x_27
|
[49] | Wong ML, Lee SY, Leung KS (2002) A hybrid data mining approach to discover Bayesian networks using evolutionary programming. GECCO 2: 214–222. doi: 10.1109/icdm.2002.1183994
|
[50] | Wong ML, Lee SY, Leung KS (2002) A hybrid approach to discover Bayesian networks from databases using evolutionary programming. Proc IEEE Int Conf Data Min 2002: 498–505. doi: 10.1109/icdm.2002.1183994
|
[51] | Madden M (2003) The performance of Bayesian network classifiers constructed using different techniques. Proceedings of European conference on machine learning, workshop on probabilistic graphical models for classification 14: 59–70.
|
[52] | Santos E, Hussein A (2004) Case-based Bayesian network classifiers. FLAIRS conference 2004: 538–543.
|
[53] | Roos R, Wettig H, Grünwald P, Myllyn?ki P, Tirri H (2005) On discriminative Bayesian network classifiers and logistic regression. Mach Learn 59 (3): 267–296. doi: 10.1007/s10994-005-0471-6
|
[54] | Castillo G, Gama J (2005) Bias management of Bayesian network classifiers. Discovery science, Lecture notes in computer science 3735: 70–83. doi: 10.1007/11563983_8
|
[55] | Jing Y, Pavlovic V, Rehg JM (2005) Efficient discriminative learning of Bayesian network classifier via boosted augmented naive Bayes. Proc Int Conf Mach Learn 22: 369–376. doi: 10.1145/1102351.1102398
|
[56] | Langseth H, Nielsen TD (2006) Classification using hierarchical na?ve Bayes models. Mach Learn 63 (2): 135–159. doi: 10.1007/s10994-006-6136-2
|
[57] | Su J, Zhang H (2006) Full Bayesian network classifiers. Proc Int Conf Mach Learn 23: 897–904.
|
[58] | O’Donnell RT, Allison L, Korb KB (2006) Learning hybrid Bayesian networks by MML. Advances in artificial intelligence, Lecture notes in computer science 4304: 192–203. doi: 10.1007/11941439_23
|
[59] | Yehezkel R, Lerner B (2006) Bayesian network structure learning by recursive autonomy identification. Structural, syntactic, and statistical pattern recognition, Lecture notes in computer science 4109: 154–162. doi: 10.1007/11815921_16
|
[60] | Carvalho AM, Oliveira AL, Sagot MF (2007) Efficient learning of Bayesian network classifiers: an extension to the TAN classifier. Advances in artificial intelligence, Lecture notes in computer science 4830: 16–25. doi: 10.1007/978-3-540-76928-6_4
|
[61] | Boullé M (2007) Compression-based averaging of selective naive Bayes classifiers. J Mach Learn Res 8: 1659–1685.
|
[62] | Francois OCH (2008) Efficient Bayesian network learning using EM or pairwise deletion. European Workshop on Probabilistic Graphical Models 4: 121–128.
|
[63] | Jing Y, Pavlovic V, Rehg JM (2008) Boosted Bayesian network classifiers. Mach Learn 73 (2): 155–184. doi: 10.1007/s10994-008-5065-7
|
[64] | Madden MG (2009) On the classification performance of TAN and general Bayesian networks. Knowledge-Based Systems 22 (7): 489–495. doi: 10.1016/j.knosys.2008.10.006
|
[65] | Silander T, Roos T, Myllymaki P (2010) Learning locally minimax optimal Bayesian networks. Int J Approx Reason 51 (5): 544–557. doi: 10.1016/j.ijar.2010.01.012
|
[66] | Drugan MM, Wiering MA (2010) Feature selection for Bayesian network classifiers using the MDL-FS score. Int J Approx Reason 51 (6): 695–717. doi: 10.1016/j.ijar.2010.02.001
|
[67] | Lee N, Kim JM (2010) Conversion of categorical variables into numerical variables via Bayesian network classifiers for binary classifications. Comput Stat Data Anal 54 (5): 1247–1265. doi: 10.1016/j.csda.2009.11.003
|
[68] | Flores MJ, Nicholson AE, Brunskill A, Korb KB, Mascaro S (2011) Incorporating expert knowledge when learning Bayesian network structure: a medical case study. Artif Intell Med 53 (3): 181–204. doi: 10.1016/j.artmed.2011.08.004
|
[69] | Flores MJ, Gámez JA, Martínez AM (2011) Handling numeric attributes when comparing Bayesian network classifiers: does the discretization method matter? Applied Intelligence 34 (3): 372–385. doi: 10.1007/s10489-011-0286-z
|
[70] | Larra?aga P, Karshenas H, Bielza C, Santana R (2013) A review on evolutionary algorithms in Bayesian network learning and inference tasks. Information Sciences 233: 109–125. doi: 10.1016/j.ins.2012.12.051
|
[71] | Friedman N, Goldszmidt M (1998) Learning Bayesian networks with local structure. Learning in graphical models, NATO ASI series 89: 421–459. doi: 10.1007/978-94-011-5014-9_15
|
[72] | Quinlan JR (1986) Induction of decision trees. Mach Learn 1: 81–106. doi: 10.1007/bf00116251
|
[73] | Neapolitan RE (2004) Learning Bayesian networks. New Jersey: Pearson-Prentice Hall. 674 p.
|
[74] | Isozaki T (2012) Learning causal Bayesian networks using minimum free energy principle. New Generation Computing 30 (1): 17–52. doi: 10.1007/s00354-012-0103-1
|
[75] | Drugan MM, Wiering MA (2010) Feature selection for Bayesian network classifiers using the MDL-FS score. Int J Approx Reason 51 (6): 695–717. doi: 10.1016/j.ijar.2010.02.001
|
[76] | Lee N, Kim JM (2010) Conversion of categorical variables into numerical variables via Bayesian network classifiers for binary classifications. Comput Stat Data Anal 54 (5): 1247–1265. doi: 10.1016/j.csda.2009.11.003
|
[77] | Alonso-Barba JI, delaOssa L, Puerta JM (2011) Structural learning of Bayesian networks using local algorithms based on the space of orderings. Soft Computing 15 (10): 1881–1895. doi: 10.1007/s00500-010-0623-x
|
[78] | Freno A, Trentin E (2011) Hybrid random fields: a scalable approach to structure and parameter learning in probabilistic graphical models. Berlin, Germany: Springer Berlin-Heidelberg. 207 p.
|
[79] | Lima CF, Lobo FG, Pelikan M, Goldberg DE (2011) Model accuracy in the Bayesian optimization algorithm. Soft Computing 15 (7): 1351–1371. doi: 10.1007/s00500-010-0675-y
|
[80] | Liu Y, Han JDJ (2010) Application of Bayesian networks on large-scale biological data. Front Biol (Beijing) 5 (2): 98–104. doi: 10.1007/s11515-010-0023-8
|
[81] | de Campos CP, Zeng Z, Ji Q (2009) Structure learning of Bayesian networks using constraints. Proc Int Conf Mach Learn 26: 113–120.
|
[82] | Yang Y, Webb G, Korb K, Ting KM (2007) Classifying under computational resource constraints: anytime classification using probabilistic estimators. Mach Learn 69 (1): 35–53. doi: 10.1007/s10994-007-5020-z
|
[83] | Sahin F, Tillett J, Raghuveer R, Rao T (2004) An evolutionary algorithmic approach to learning a Bayesian network from complete data. Proc. SPIE 5433, Data Mining and Knowledge Discovery: Theory, Tools, and Technology VI: 88–99. doi: 10.1117/12.542371
|
[84] | de Campos LM (2006) A scoring function for learning Bayesian networks based on mutual information and conditional independence tests.. J Mach Learn Res 7: 2149–2187.
|
[85] | Cheng J, Greiner R, Kelly J, Bell D, Liu W (2002) Learning Bayesian networks from data: an information theory based approach. Artif Intell 137: 43–90. doi: 10.1016/s0004-3702(02)00191-1
|
[86] | Ide JS, Cozman FG (2002) Random generation of Bayesian networks. Advances in artificial intelligence, Lecture notes in computer science 2507: 366–376. doi: 10.1007/3-540-36127-8_35
|
[87] | Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1999) Numerical recipes in C: the art of scientific computing. New York: Cambridge University Press. 994 p.
|
[88] | Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. New York: Springer. 533 p.
|
[89] | Friedman N, Yakhini Z (1996) On the sample complexity of learning Bayesian networks. Uncertain Artif Intell 12: 274–282.
|