
Learning from Heterogeneous Temporal Data Based on Electronic Health Records

DOI: 10.12677/CSA.2020.101001, PP. 1-10

Keywords: Electronic Health Record, Random Subsequences, Clustering Sequences, Machine Learning


Abstract:

Electronic health records contain large amounts of longitudinal data that are valuable for biomedical informatics research. However, the complex structure of these data, including clinical events that are unevenly distributed over time, poses challenges for standard learning algorithms. Some approaches to modeling temporal data rely on extracting a single value from each time series, which discards potentially valuable sequential information. How best to account for the temporality of clinical data therefore remains an important research question. This paper studies new representations of temporal data in electronic health records that preserve sequential information and can be processed directly by standard machine learning algorithms. The methods studied build on symbolic representations of time series data in several different ways. Empirical results on datasets of clinical measurements drawn from a real-world electronic health record database show that applying distance measures to random subsequences significantly improves predictive performance compared with using the original sequences or clustered sequences. The proposed representations account better for the temporality of clinical events, which is crucial for prediction tasks in the biomedical domain.
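The abstract does not spell out how the random-subsequence representation is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a SAX-like discretization with a four-letter alphabet, edit distance as the distance measure, and hypothetical helper names (`symbolize`, `subsequence_distance`, `sample_subsequences`, `featurize`). Each measurement series becomes a symbol string, and each patient is then described by a fixed-length vector of distances to randomly drawn subsequences, which a standard learner can consume directly.

```python
import random
import statistics
from bisect import bisect_right

# Standard-normal quartile breakpoints for a 4-letter alphabet (assumed alphabet size).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def symbolize(values):
    """Z-normalize one non-empty measurement sequence and map each value to a letter."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # A constant sequence has zero spread; treat every value as sitting at the mean.
    z_scores = [(v - mean) / std if std > 0 else 0.0 for v in values]
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, z)] for z in z_scores)


def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def subsequence_distance(sequence, subsequence):
    """Smallest edit distance between the subsequence and any equal-length window."""
    width = len(subsequence)
    if len(sequence) <= width:
        return edit_distance(sequence, subsequence)
    return min(edit_distance(sequence[i:i + width], subsequence)
               for i in range(len(sequence) - width + 1))


def sample_subsequences(sequences, count, rng):
    """Draw random contiguous subsequences from randomly chosen symbol strings."""
    samples = []
    for _ in range(count):
        source = rng.choice(sequences)
        length = rng.randint(1, len(source))
        start = rng.randint(0, len(source) - length)
        samples.append(source[start:start + length])
    return samples


def featurize(sequences, subsequences):
    """One row per sequence, one column per sampled subsequence."""
    return [[subsequence_distance(s, sub) for sub in subsequences] for s in sequences]


# Toy, made-up measurement series of different lengths (not data from the paper).
measurements = [[4.1, 4.3, 6.8, 7.2], [5.0, 5.0, 5.1], [9.0, 7.5, 6.0, 4.5, 3.0]]
symbolic = [symbolize(m) for m in measurements]
subsequences = sample_subsequences(symbolic, count=3, rng=random.Random(0))
features = featurize(symbolic, subsequences)
```

Because every row of `features` has the same length regardless of how many measurements a patient has, series of unequal length become directly comparable. The sketch symbolizes by value order only; unevenly spaced timestamps would need additional handling that the abstract does not describe.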

