Advanced Bayesian Survival Modeling for Lung Adenocarcinoma Prognosis: The afthd R Package and Shiny Application

Gajendra Vishwakarma
https://orcid.org/0000-0002-2804-4334
Atanu Bhattacharjee
https://orcid.org/0000-0002-5757-5513
Pragya Kumari
https://orcid.org/0000-0001-8170-404X

Abstract

High-dimensional variable selection in time-to-event analysis is a critical area in biostatistics, especially in the context of complex diseases like lung adenocarcinoma (LUAD). LUAD, the most common subtype of lung cancer, presents unique diagnostic and prognostic challenges due to its molecular and genetic diversity. This study introduces an integrated framework for high-dimensional survival analysis, combining feature selection, advanced survival modeling, and robust missing data handling techniques. We developed the afthd R package, designed specifically for Bayesian survival analysis using the Accelerated Failure Time (AFT) model. This package facilitates efficient variable selection in high-dimensional settings, employing regularized methods such as LASSO and Elastic Net, as well as Bayesian approaches for model stability. An accompanying Shiny web application provides an accessible platform for non-programmers, allowing researchers to perform high-dimensional analysis and view results interactively. Using a LUAD dataset from The Cancer Genome Atlas (TCGA), our results identify key biomarkers associated with patient survival, highlighting the practical utility of this framework in LUAD prognosis. This integrated approach lays the groundwork for more precise prognostic modeling, with potential extensions to other cancers and high-dimensional biomedical datasets.
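
For readers unfamiliar with the model class named in the abstract, the block below sketches a generic AFT model with an Elastic Net penalty under right censoring. It uses standard textbook forms; the symbols ($\beta_0$, $\beta$, $\sigma$, $\lambda$, $\alpha$, $\delta_i$) are notational assumptions and are not taken from the article, and the parameterization and priors actually used in afthd may differ.

```latex
% AFT model: covariates act linearly on the log survival time
\log T_i = \beta_0 + x_i^{\top}\beta + \sigma\,\varepsilon_i ,
\qquad i = 1,\dots,n ,

% Log-likelihood under right censoring
% (\delta_i = 1 if the event is observed, 0 if censored;
%  f and S are the density and survival function implied by \varepsilon_i)
\ell(\beta_0,\beta,\sigma)
  = \sum_{i=1}^{n} \Big[ \delta_i \log f(t_i \mid x_i)
  + (1-\delta_i) \log S(t_i \mid x_i) \Big] ,

% Elastic Net estimate: \alpha = 1 gives the LASSO, \alpha = 0 gives ridge
(\hat\beta_0,\hat\beta,\hat\sigma)
  = \arg\min_{\beta_0,\beta,\sigma}
  \Big\{ -\ell(\beta_0,\beta,\sigma)
  + \lambda \Big[ \alpha \lVert\beta\rVert_1
  + \tfrac{1-\alpha}{2} \lVert\beta\rVert_2^2 \Big] \Big\} .
```

In a Bayesian reading, the LASSO case ($\alpha = 1$) corresponds to the posterior mode under independent Laplace priors on the components of $\beta$, which is one common way penalized and Bayesian variable selection are related.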

How to Cite
Vishwakarma, G., Bhattacharjee, A., & Kumari, P. (2025). Advanced Bayesian Survival Modeling for Lung Adenocarcinoma Prognosis: The afthd R Package and Shiny Application. Brazilian Journal of Biometrics, 43(3), e-43794. https://doi.org/10.28951/bjb.v43i3.794
