Area Skewness for Random Samples Drawn from an Unknown or Specified Distribution

DOI: 10.4236/ojs.2025.151007, PP. 93-128

Keywords: Skewness, Quantitative Variable, Bootstrap Confidence Interval, Asymptotic Confidence Interval, Bootstrap-vs-Asymptotic Error Comparison, R Program


Abstract:

Singh, Gewali, and Khatiwada proposed a skewness measure for probability distributions called Area Skewness (AS), which has desirable properties but has not been widely applied in practice. One reason may be that this new measure has been little disseminated or explored further. Additionally, the authors focused mainly on specific probability distributions. One advantage of AS is that it considers the entire shape of the distribution, rather than only moments or the linear distance between central tendency statistics or quantiles. This holistic approach makes AS particularly robust when the distribution deviates from normality or contains outliers. This paper aims to generalize its use to random samples drawn from either known or unknown distributions. The study has three objectives: 1) to develop an R script for point and interval estimation of AS; 2) to provide interpretive norms of normality by examining normality in bootstrap sampling distributions; and 3) to compare asymptotic and bootstrap standard errors. Interval estimation is approached both asymptotically and through the bootstrap. The script is illustrated with two examples: one with generated data and another with real-world data. Interpretive norms of normality are derived from 40 samples of various sizes, generated by inverse transform sampling to follow a standard normal distribution. Bootstrap intervals at three confidence levels (0.90, 0.95, and 0.99) were obtained using the normal method, with two exceptions: the bias-corrected and accelerated percentile method was used for the sample of 60 observations and the percentile method for the sample of 600 observations, because these deviated from normality. Asymptotic 95% confidence intervals are also provided. The asymptotic standard error was larger than the bootstrap standard error, and the difference decreased as the sample size increased. It is concluded that the script has practical and educational utility for estimating AS, whose asymptotic sampling distribution is normal.
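The resampling workflow described in the abstract can be sketched in a few lines. The sketch below is in Python, not the authors' R script, and it does not compute AS: the abstract does not state the AS formula, so a quartile-based skewness coefficient (`quartile_skewness`) is used purely as a stand-in statistic. The function names, the seed, the sample size of 100, and the 2000 bootstrap replicates are all illustrative assumptions. It shows inverse transform sampling from a standard normal distribution, followed by bootstrap intervals by the normal method (with a bias correction, one common convention) and by the percentile method.

```python
import random
from statistics import NormalDist, mean, stdev, quantiles


def inverse_transform_normal(n, rng):
    """Draw n standard-normal values by applying the normal quantile
    function to uniform random numbers (inverse transform sampling)."""
    std_normal = NormalDist(0.0, 1.0)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if 0.0 < u < 1.0:  # the quantile function is undefined at 0 and 1
            draws.append(std_normal.inv_cdf(u))
    return draws


def quartile_skewness(sample):
    """Stand-in statistic (NOT Area Skewness): quartile skewness,
    (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = quantiles(sample, n=4, method="inclusive")
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


def bootstrap_replicates(sample, statistic, n_boot, rng):
    """Recompute the statistic on n_boot resamples drawn with replacement."""
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_boot)]


def normal_interval(estimate, replicates, level):
    """Normal-method interval: bias-corrected estimate +/- z * bootstrap SE."""
    se = stdev(replicates)
    bias = mean(replicates) - estimate
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    center = estimate - bias
    return center - z * se, center + z * se


def percentile_interval(replicates, level):
    """Percentile-method interval: empirical quantiles of the replicates,
    using linear interpolation between order statistics."""
    ordered = sorted(replicates)

    def q(p):
        pos = p * (len(ordered) - 1)
        low = int(pos)
        high = min(low + 1, len(ordered) - 1)
        return ordered[low] + (pos - low) * (ordered[high] - ordered[low])

    alpha = (1.0 - level) / 2.0
    return q(alpha), q(1.0 - alpha)


rng = random.Random(2025)  # arbitrary seed, for reproducibility only
sample = inverse_transform_normal(100, rng)
estimate = quartile_skewness(sample)
replicates = bootstrap_replicates(sample, quartile_skewness, 2000, rng)

print("point estimate:", round(estimate, 4))
print("bootstrap SE:  ", round(stdev(replicates), 4))
for level in (0.90, 0.95, 0.99):
    print(level, normal_interval(estimate, replicates, level),
          percentile_interval(replicates, level))
```

Replacing `quartile_skewness` with an implementation of AS would leave the rest of the sketch unchanged. Whether the normal method is appropriate would still depend on checking the normality of the bootstrap replicates, as the abstract describes.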

References

[1]  Singh, A.K., Gewali, L.P. and Khatiwada, J. (2019) New Measures of Skewness of a Probability Distribution. Open Journal of Statistics, 9, 601-621.
https://doi.org/10.4236/ojs.2019.95039
[2]  Pearson, K. (1895) Contributions to the Mathematical Theory of Evolution. II. Skew Variation in Homogeneous Material. Philosophical Transactions of the Royal Society of London, Series A (Mathematical, Physical and Engineering Sciences), 186, 343-414.
[3]  Fisher, R.A. (1930) The Moments of the Distribution for Normal Samples of Measures of Departure from Normality. Proceedings of the Royal Society of London, Series A Mathematical and Physical Sciences, 130, 16-28.
[4]  Wald, A. (1939) Contributions to the Theory of Statistical Estimation and Testing Hypotheses. The Annals of Mathematical Statistics, 10, 299-326.
https://doi.org/10.1214/aoms/1177732144
[5]  Zepeda-Tello, R., Schomaker, M., Maringe, C., Smith, M.J., Belot, A., Rachet, B., Schnitzer, M.E. and Luque-Fernandez, M.A. (2022) The Delta-Method and Influence Function in Medical Statistics: A Reproducible Tutorial.
[6]  Efron, B. and Narasimhan, B. (2020) The Automatic Construction of Bootstrap Confidence Intervals. Journal of Computational and Graphical Statistics, 29, 608-619.
https://doi.org/10.1080/10618600.2020.1714633
[7]  Wang, N. (2023) Conducting Meta-Analyses of Proportions in R. Journal of Behavioral Data Science, 3, 64-126.
https://doi.org/10.35566/jbds/v3n2/wang
[8]  Waudby-Smith, I., Arbour, D., Sinha, R., Kennedy, E.H. and Ramdas, A. (2021) Time-Uniform Central Limit Theory and Asymptotic Confidence Sequences.
[9]  Hahn, J. and Liao, Z. (2021) Bootstrap Standard Error Estimates and Inference. Econometrica, 89, 1963-1977.
https://doi.org/10.3982/ecta17912
[10]  Chu, B.M., Jacho-Chávez, D.T. and Linton, O.B. (2020) Standard Errors for Nonparametric Regression. Econometric Reviews, 39, 674-690.
https://doi.org/10.1080/07474938.2020.1772563
[11]  Stapor, K. (2020) Descriptive and Inferential Statistics. In: Intelligent Systems Reference Library, Springer, 63-131.
https://doi.org/10.1007/978-3-030-45799-0_2
[12]  R Core Team and Contributors Worldwide (2024) R Documentation. Sample Quantiles.
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
[13]  McGrath, S., Zhao, X., Steele, R., Thombs, B.D., Benedetti, A., Levis, B., et al. (2020) Estimating the Sample Mean and Standard Deviation from Commonly Reported Quantiles in Meta-Analysis. Statistical Methods in Medical Research, 29, 2520-2537.
https://doi.org/10.1177/0962280219889080
[14]  Hyndman, R.J. and Fan, Y. (1996) Sample Quantiles in Statistical Packages. The American Statistician, 50, 361-365.
https://doi.org/10.1080/00031305.1996.10473566
[15]  Tukey, J.W. (1977) Exploratory Data Analysis. Addison-Wesley.
[16]  Bruce, P., Bruce, A. and Gedeck, P. (2020) Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python. 2nd Edition, O’Reilly Media.
[17]  Linden, A. (2023) CENTILE2: Stata Module to Enhance Centile Command and Provide Additional Definitions for Computing Sample Quantiles. Statistical Software Components. S459262. Boston College Department of Economics.
[18]  Sukhoplyuev, D.I. and Nazarov, A.N. (2024) Methods of Descriptive Statistics in Telemetry Tasks. 2024 Systems of Signals Generating and Processing in the Field of on Board Communications, Moscow, 12-14 March 2024, 1-5.
https://doi.org/10.1109/ieeeconf60226.2024.10496798
[19]  Ramachandran, K.M. and Tsokos, C.P. (2020) Mathematical Statistics with Applications in R. Academic Press.
[20]  Chihara, L.M. and Hesterberg, T.C. (2022) Mathematical Statistics with Resampling and R. 3rd Edition, Wiley.
[21]  Schwarzer, G. and Rücker, G. (2021) Meta-Analysis of Proportions. In: Methods in Molecular Biology, Springer, 159-172.
https://doi.org/10.1007/978-1-0716-1566-9_10
[22]  Eberl, A. and Klar, B. (2020) Asymptotic Distributions and Performance of Empirical Skewness Measures. Computational Statistics & Data Analysis, 146, Article 106939.
https://doi.org/10.1016/j.csda.2020.106939
[23]  Wald, A. and Wolfowitz, J. (1943) An Exact Test for Randomness in the Non-Parametric Case Based on Serial Correlation. The Annals of Mathematical Statistics, 14, 378-388.
https://doi.org/10.1214/aoms/1177731358
[24]  Coker, B., Rudin, C. and King, G. (2021) A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results. Management Science, 67, 6174-6197.
https://doi.org/10.1287/mnsc.2020.3818
[25]  Grubbs, F.E. (1969) Procedures for Detecting Outlying Observations in Samples. Technometrics, 11, 1-21.
https://doi.org/10.1080/00401706.1969.10490657
[26]  D’Agostino, R.B. (1970) Transformation to Normality of the Null Distribution of G1. Biometrika, 57, 679-681.
https://doi.org/10.1093/biomet/57.3.679
[27]  Anscombe, F.J. and Glynn, W.J. (1983) Distribution of the Kurtosis Statistic B2 for Normal Samples. Biometrika, 70, 227-234.
https://doi.org/10.1093/biomet/70.1.227
[28]  Shapiro, S.S. and Francia, R.S. (1972) An Approximate Analysis of Variance Test for Normality. Journal of the American Statistical Association, 67, 215-216.
https://doi.org/10.1080/01621459.1972.10481232
[29]  Royston, P. (1993) A Toolkit for Testing for Non-Normality in Complete and Censored Samples. The Statistician, 42, 37-43.
https://doi.org/10.2307/2348109
[30]  D’Agostino, R.B., Belanger, A. and D’Agostino Jr., R.B. (1990) A Suggestion for Using Powerful and Informative Tests of Normality. The American Statistician, 44, 316-321.
https://doi.org/10.2307/2684359
[31]  Hu, K. (2020) Become Competent within One Day in Generating Boxplots and Violin Plots for a Novice without Prior R Experience. Methods and Protocols, 3, Article 64.
https://doi.org/10.3390/mps3040064
[32]  Moral de la Rubia, J. (2024) Determination of the Number and Width of Class Intervals Using R. Annals of Environmental Science and Toxicology, 8, 22-42.
https://doi.org/10.17352/aest.000077
[33]  Scott, D.W. (2015) Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley.
https://doi.org/10.1002/9781118575574
[34]  Freedman, D. and Diaconis, P. (1981) On the Histogram as a Density Estimator: L2 Theory. Probability Theory and Related Fields, 57, 453-476.
https://doi.org/10.1007/BF01025868
[35]  Epanechnikov, V.A. (1969) Non-Parametric Estimation of a Multivariate Probability Density. Theory of Probability & Its Applications, 14, 153-158.
https://doi.org/10.1137/1114019
[36]  Sheather, S.J. and Jones, M.C. (1991) A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation. Journal of the Royal Statistical Society Series B: Statistical Methodology, 53, 683-690.
https://doi.org/10.1111/j.2517-6161.1991.tb01857.x
[37]  Kvam, P., Vidakovic, B. and Kim, S.J. (2022) Density Estimation. In: Nonparametric Statistics with Applications to Science and Engineering with R, Wiley, 223-234.
https://doi.org/10.1002/9781119268178.ch11
[38]  Hernández, H. (2021) Testing for Normality: What is the Best Method. ForsChem Research Reports, 6, 1-38.
https://www.forschem.org/
[39]  Efron, B. and Tibshirani, R.J. (1993) An Introduction to the Bootstrap. Chapman & Hall/CRC Press.
https://doi.org/10.1007/978-1-4899-4541-9
[40]  Efron, B. (2022) Exponential Families in Theory and Practice. Cambridge University Press.
https://doi.org/10.1017/9781108773157
[41]  Efron, B. and Narasimhan, B. (2022) Package ‘Bcaboot’. Bias Corrected Bootstrap Confidence Intervals.
https://cran.r-project.org/web/packages/bcaboot/bcaboot.pdf
[42]  Rousselet, G., Pernet, C.R. and Wilcox, R.R. (2023) An Introduction to the Bootstrap: A Versatile Method to Make Inferences by Using Data-Driven Simulations. Meta-Psychology, 7, 1-24.
https://doi.org/10.15626/mp.2019.2058
[43]  Di Leo, G. and Sardanelli, F. (2020) Statistical Significance: P Value, 0.05 Threshold, and Applications to Radiomics—Reasons for a Conservative Approach. European Radiology Experimental, 4, Article 18.
https://doi.org/10.1186/s41747-020-0145-y
[44]  Ding, P. (2014) Three Occurrences of the Hyperbolic-Secant Distribution. The American Statistician, 68, 32-35.
https://doi.org/10.1080/00031305.2013.867902
[45]  Snedecor, G.W. and Cochran, W.G. (1989) Statistical Methods. 8th Edition, Iowa State University Press.
[46]  Caeiro, F. and Mateus, A. (2024) “Randtests”: Testing Randomness in R.
https://doi.org/10.32614/CRAN.package.randtests
[47]  Wickham, H., Chang, W., Henry, L., Pedersen, T.L., Takahashi, K., Wilke, C., Woo, K., Yutani, H., Dunnington, D. and van den Brand, T. (2024) “Ggplot2”: Create Elegant Data Visualisations Using the Grammar of Graphics.
https://doi.org/10.32614/CRAN.package.ggplot2
[48]  Komsta, L. (2022) “Outliers”: Tests for Outliers.
https://doi.org/10.32614/CRAN.package.outliers
[49]  Komsta, L. and Novomestky, F. (2022) “Moments”: Moments, Cumulants, Skewness, Kurtosis and Related Tests.
https://doi.org/10.32614/CRAN.package.moments
[50]  Gross, J. and Ligges, U. (2022) Package ‘Nortest’.
https://cran.r-project.org/web/packages/nortest/nortest.pdf
[51]  Canty, A., Ripley, B. and Brazzale, A.R. (2024) Package ‘Boot’.
https://cran.r-project.org/web/packages/boot/boot.pdf
[52]  Angrist, J.D., Imbens, G.W. and Rubin, D.B. (1996) Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association, 91, 444-455.
https://doi.org/10.2307/2291629
[53]  Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences. 2nd Edition, Erlbaum.
[54]  Dul, J., van der Laan, E. and Kuik, R. (2018) A Statistical Significance Test for Necessary Condition Analysis. Organizational Research Methods, 23, 385-395.
https://doi.org/10.1177/1094428118795272
[55]  Khan, I.A., Bickel, J.E. and Hammond, R.K. (2023) Determining the Accuracy of the Triangular and PERT Distributions. Decision Analysis, 20, 151-165.
https://doi.org/10.1287/deca.2022.0464
[56]  Khakifirooz, M., Tercero-Gómez, V.G. and Woodall, W.H. (2021) The Role of the Normal Distribution in Statistical Process Monitoring. Quality Engineering, 33, 497-510.
https://doi.org/10.1080/08982112.2021.1909731
[57]  Habibzadeh, F. (2024) Data Distribution: Normal or Abnormal? Journal of Korean Medical Science, 39, e35.
https://doi.org/10.3346/jkms.2024.39.e35
[58]  Demir, S. (2022) Comparison of Normality Tests in Terms of Sample Sizes under Different Skewness and Kurtosis Coefficients. International Journal of Assessment Tools in Education, 9, 397-409.
https://doi.org/10.21449/ijate.1101295
[59]  Moral de la Rubia, J. (2020) Propiedades métricas de la Escala de Autoritarismo de Ala Derecha en estudiantes de medicina mexicanos. Revista de Psicología y Ciencias del Comportamiento de la Unidad Académica de Ciencias Jurídicas y Sociales, 11, 52-76.
https://doi.org/10.29059/rpcc.20200617-103
[60]  Charpentier, A. and Flachaire, E. (2022) Pareto Models for Top Incomes and Wealth. The Journal of Economic Inequality, 20, 1-25.
https://doi.org/10.1007/s10888-021-09514-6
[61]  Vaclavik, M., Sikorova, Z. and Barot, T. (2020) Skewness in Applied Analysis of Normality. In: Advances in Intelligent Systems and Computing, Springer, 927-937.
https://doi.org/10.1007/978-3-030-63319-6_86
[62]  Orcan, F. (2020) Parametric or Non-Parametric: Skewness to Test Normality for Mean Comparison. International Journal of Assessment Tools in Education, 7, 255-265.
https://doi.org/10.21449/ijate.656077
