Sample size estimation of skewed distributions in medical research
Abstract
The primary goal of a sample size calculation is to determine the minimum number of observations needed to detect meaningful differences in treatment outcomes, clinical parameters, or associations once the data have been collected. Determining the sample size is a crucial early step in planning a clinical trial. A poorly chosen sample size can lead to the approval of an ineffective medication or the rejection of an effective one. Sample size estimates should match the intended method of analysis. We assume the data will be analyzed with generalized linear models (GLMs), where normal approximations are frequently used for non-normal distributions. For the Binomial, Negative Binomial, Poisson, and Gamma families, we use GLM theory to derive sample size formulas for comparing two means. We evaluated the performance of normal approximations by simulating various distributions under the log-link and identity-link functions. First, we examined the size of the error in normal approximations for discrete probability distributions. Next, we applied GLM theory to derive sample size equations, which we evaluated through case studies and simulations. For the Negative Binomial and Gamma distributions studied, calculations on the link-function (log) scale are well suited and are often more accurate than normal approximations. For the Binomial and Poisson distributions, however, the log-scale calculation offers minimal advantage. The proposed method effectively calculates sample sizes when comparing the means of highly skewed outcome variables.
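To make the contrast between the two scales concrete, the sketch below computes a per-group sample size for comparing two means on either the log scale or the identity scale. It is a minimal illustration based on the standard delta-method variance of a log mean with equal group sizes and a two-sided Wald-type test, and it is not claimed to reproduce the formulas derived in the article. The function names `variance` and `n_per_group`, the `dispersion` argument, and the default `alpha` and `power` values are all assumptions made for this example.

```python
import math
from statistics import NormalDist


def variance(family, mu, dispersion=None):
    """Variance of one observation with mean mu (assumed parameterizations)."""
    if family == "poisson":
        return mu
    if family == "binomial":
        # Bernoulli outcome: mu is a proportion between 0 and 1.
        return mu * (1.0 - mu)
    if family == "negative_binomial":
        # Assumed form Var = mu + k * mu^2, with k passed as `dispersion`.
        return mu + dispersion * mu ** 2
    if family == "gamma":
        # Assumed form Var = mu^2 / shape, with shape passed as `dispersion`.
        return mu ** 2 / dispersion
    raise ValueError(f"unknown family: {family}")


def n_per_group(family, mu0, mu1, dispersion=None,
                alpha=0.05, power=0.9, scale="log"):
    """Approximate per-group sample size for a two-sided comparison of means."""
    normal = NormalDist()
    z = normal.inv_cdf(1 - alpha / 2) + normal.inv_cdf(power)
    v0 = variance(family, mu0, dispersion)
    v1 = variance(family, mu1, dispersion)
    if scale == "log":
        # Delta method: Var(log of sample mean) is about Var(Y) / (n * mu^2).
        var_sum = v0 / mu0 ** 2 + v1 / mu1 ** 2
        effect = math.log(mu1 / mu0)
    elif scale == "identity":
        # Usual normal approximation on the original scale.
        var_sum = v0 + v1
        effect = mu1 - mu0
    else:
        raise ValueError(f"unknown scale: {scale}")
    return math.ceil(z ** 2 * var_sum / effect ** 2)


if __name__ == "__main__":
    # Hypothetical planning values, chosen only for illustration.
    for scale in ("log", "identity"):
        n = n_per_group("gamma", mu0=1.0, mu1=2.0, dispersion=1.0, scale=scale)
        print(scale, n)
```

Under these assumed parameterizations, the log-scale variance term for the Gamma family depends only on the shape parameter, and for the Negative Binomial it is 1/mu + k, so the result is driven by the ratio of the means rather than by their difference.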
Article details

This work is licensed under a Creative Commons Attribution 4.0 International License.