Effects of Censoring Levels on Survival Data Analysis: Big Data

DOI: 10.4236/ojs.2025.153016, PP. 312-322

Keywords: Survival, Censoring, Big Data

Abstract:

In addition to non-normality, censoring is one of the defining characteristics of survival data. All traditional procedures and models for survival data analysis take censoring into account. However, no studies have examined the effect of the censoring level itself on survival data analysis. The main objective of this paper is to examine the effect of censoring levels on survival data analysis in the context of big data. Datasets of sizes n = 10,000, n = 50,000 and n = 100,000 were simulated, each at censoring levels of p = 0.1, p = 0.5 and p = 0.9. For comparison, small/moderate-sized survival datasets were also simulated. Censoring levels had a small effect on small/moderate-sized datasets and a significant effect on big datasets, as depicted by the plots of the survivor function. Visually, it was evident that as the level of censoring increases, there is a tendency to overestimate survival prospects. Model fit was much better for small/moderate datasets than for big datasets. This supports the view of many researchers that traditional statistical survival models are inferior when handling big data. Surprisingly, the high censoring level (p = 0.9) gave a much better model fit on both small/moderate and big datasets.
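
The simulation design described in the abstract (three sample sizes crossed with three censoring levels) can be illustrated with a minimal sketch. The abstract does not state the event-time distribution, the censoring mechanism, or the software used, so everything below is an assumption made for illustration: exponential event times with rate `event_rate`, independent exponential censoring times, and a hand-written Kaplan-Meier estimator. With these choices the probability of censoring is censor_rate / (event_rate + censor_rate), so a target level p is obtained with censor_rate = event_rate * p / (1 - p). The function names `simulate_censored`, `kaplan_meier` and `survival_at` are hypothetical and do not come from the paper.

```python
import numpy as np


def simulate_censored(n, p_censor, event_rate=1.0, seed=0):
    """Simulate right-censored data with expected censoring fraction p_censor.

    Assumption (not from the paper): exponential event times and
    independent exponential censoring times.
    """
    rng = np.random.default_rng(seed)
    # P(C < T) = censor_rate / (event_rate + censor_rate) = p_censor
    censor_rate = event_rate * p_censor / (1.0 - p_censor)
    t_event = rng.exponential(1.0 / event_rate, size=n)
    t_censor = rng.exponential(1.0 / censor_rate, size=n)
    time = np.minimum(t_event, t_censor)   # observed follow-up time
    event = t_event <= t_censor            # True if the event was observed
    return time, event


def kaplan_meier(time, event):
    """Return sorted times and the Kaplan-Meier survivor estimate at each.

    Assumes no tied times, which holds almost surely for continuous data.
    """
    order = np.argsort(time)
    time, event = time[order], event[order]
    n = len(time)
    at_risk = n - np.arange(n)             # subjects still at risk at each time
    surv = np.cumprod(1.0 - event / at_risk)
    return time, surv


def survival_at(times, surv, t):
    """Evaluate the Kaplan-Meier step function at time t."""
    idx = np.searchsorted(times, t, side="right") - 1
    return 1.0 if idx < 0 else float(surv[idx])


# Sample sizes and censoring levels taken from the abstract.
for n in (10_000, 50_000, 100_000):
    for p in (0.1, 0.5, 0.9):
        time, event = simulate_censored(n, p, seed=42)
        times, surv = kaplan_meier(time, event)
        print(f"n={n:>7} p={p}: observed censoring={1 - event.mean():.3f}, "
              f"S_hat(0.5)={survival_at(times, surv, 0.5):.3f}")
```

Plotting `surv` against `times` for each combination gives survivor-function curves of the kind the abstract refers to. Results under a different event-time distribution or a dependent censoring mechanism may differ from this sketch.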

