Sample size estimation of skewed distributions in medical research


Sadanandam Vemula
https://orcid.org/0000-0002-5325-5125
Kalpanapriya Dhakshanamoorthy
https://orcid.org/0000-0001-5984-3131

Abstract

The primary goal of sample size calculation is to determine the minimum number of observations required to detect meaningful differences in treatment outcomes, clinical parameters, or associations in the collected data. Determining the sample size is a crucial first step in planning a clinical trial. An incorrect estimate could lead to the approval of an ineffective medication or the rejection of an effective one. Sample size estimates should align with the intended method of analysis. We analyze the data with generalized linear models (GLMs), for which normal approximations are frequently employed even when the outcome distribution is not normal. For the Binomial, Negative Binomial, Poisson, and Gamma families, we use GLM theory to derive sample size formulas for comparing two means. We evaluated the performance of the normal approximations by simulating these distributions under log-link and identity-link functions. First, we examined the magnitude of the errors in normal approximations for discrete probability distributions. Next, we applied GLM theory to derive sample size equations, which we evaluated through case studies and simulations. For the Negative Binomial and Gamma distributions studied, calculations on the link-function (log) scale are well suited and often more accurate than normal approximations; for the Binomial and Poisson distributions, however, they offer little advantage. The proposed method effectively calculates sample sizes when comparing the means of highly skewed outcome variables.
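The contrast between a normal approximation on the original scale and a calculation on the log-link scale can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' derivation: it assumes two equal-sized groups, a two-sided Wald test, and variances evaluated at each group's mean, and all function names (n_identity, n_log, var_gamma, simulated_power_gamma_log and so on) are hypothetical. On the identity scale the per-group size is n = (z_{1-α/2} + z_{1-β})² (V(μ0) + V(μ1)) / (μ1 − μ0)². On the log scale the delta method gives Var(log ȳ) ≈ V(μ)/(n μ²), so n = (z_{1-α/2} + z_{1-β})² (V(μ0)/μ0² + V(μ1)/μ1²) / (log(μ1/μ0))². The variance functions use common parameterisations (Negative Binomial V = μ + φμ², Gamma with shape k V = μ²/k), which may differ from those used in the article.

```python
import math
import random
from statistics import NormalDist, mean, variance


def z(p):
    """Standard normal quantile function."""
    return NormalDist().inv_cdf(p)


# Variance functions V(mu) = Var(Y) under assumed parameterisations.
def var_poisson(mu):
    return mu


def var_negbin(mu, phi):
    # phi is the negative binomial dispersion parameter (assumed form)
    return mu + phi * mu ** 2


def var_gamma(mu, k):
    # k is the gamma shape parameter; coefficient of variation = 1/sqrt(k)
    return mu ** 2 / k


def var_binom(mu):
    # single Bernoulli trial with success probability mu
    return mu * (1 - mu)


def n_identity(mu0, mu1, v0, v1, alpha=0.05, power=0.80):
    """Per-group n from the normal approximation on the identity scale."""
    zsum = z(1 - alpha / 2) + z(power)
    return math.ceil(zsum ** 2 * (v0 + v1) / (mu1 - mu0) ** 2)


def n_log(mu0, mu1, v0, v1, alpha=0.05, power=0.80):
    """Per-group n from a Wald test on the log-link scale.

    Uses the delta method: Var(log ybar) ~= V(mu) / (n * mu**2).
    """
    zsum = z(1 - alpha / 2) + z(power)
    log_var = v0 / mu0 ** 2 + v1 / mu1 ** 2
    return math.ceil(zsum ** 2 * log_var / math.log(mu1 / mu0) ** 2)


def simulated_power_gamma_log(mu0, mu1, k, n, alpha=0.05, reps=1000, seed=1):
    """Monte Carlo power of the log-scale Wald test for two gamma samples."""
    rng = random.Random(seed)
    crit = z(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        # gammavariate(shape, scale) has mean shape * scale = mu
        y0 = [rng.gammavariate(k, mu0 / k) for _ in range(n)]
        y1 = [rng.gammavariate(k, mu1 / k) for _ in range(n)]
        m0, m1 = mean(y0), mean(y1)
        # delta-method standard error of log(m1) - log(m0), from sample variances
        se = math.sqrt(variance(y0) / (n * m0 ** 2) + variance(y1) / (n * m1 ** 2))
        if abs(math.log(m1 / m0)) / se > crit:
            hits += 1
    return hits / reps
```

For a Gamma outcome with shape k = 0.5, μ0 = 10 and μ1 = 15 (α = 0.05, power 0.80), n_identity returns 205 and n_log returns 191 per group. A simulation such as simulated_power_gamma_log is one rough way to check whether a chosen n attains the nominal power under these assumptions.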

Article Details

How to Cite
Vemula, S., & Dhakshanamoorthy, K. (2025). Sample size estimation of skewed distributions in medical research. Brazilian Journal of Biometrics, 43(3), e-43754. https://doi.org/10.28951/bjb.v43i3.754
Section
Articles
Author Biography

Sadanandam Vemula, Vellore Institute of Technology

Department of Mathematics

Ph.D. Research Scholar
