%0 Journal Article
%T Effects of Censoring Levels on Survival Data Analysis: Big Data
%A Evans Manjoro
%A Isheanesu Munyira
%A Charles Chimedza
%J Open Journal of Statistics
%P 312-322
%@ 2161-7198
%D 2025
%I Scientific Research Publishing
%R 10.4236/ojs.2025.153016
%X In addition to non-normality, censoring is one of the defining characteristics of survival data. All traditional procedures and models account for censoring in survival data analysis. However, no studies have examined the effect of censoring levels on survival data analysis. The main objective of this paper is to examine the effect of censoring levels on survival data analysis in relation to big data. Datasets of sizes n = 10,000, n = 50,000 and n = 100,000 were simulated, each at censoring levels of p = 0.1, p = 0.5 and p = 0.9. For comparison's sake, small/moderate-sized survival datasets were also simulated. Censoring levels had little effect on small/moderate-sized datasets but a significant effect on big datasets. This was depicted by plots of the survivor function. Visually, it was evident that as the level of censoring increases, there is a tendency to overestimate survival prospects. Model fit was much better for small/moderate datasets than for big datasets. This supports the view held by many researchers that traditional survival statistical models are inferior when handling big data. Surprisingly, model fit at the high censoring level (p = 0.9) was much better on both small/moderate and big datasets.
%K Survival
%K Censoring
%K Big Data
%U http://www.scirp.org/journal/PaperInformation.aspx?PaperID=143707