Big Data and Epidemiology: Predictive Models for Future Infectious Disease Outbreaks

Munkhzul Ganbat (1), Baatar Tserendorj (2), Selenge Batbold (3), Rustiyana Rustiyana (4)
(1) Mongolian State University of Education, Mongolia,
(2) National University of Mongolia, Mongolia,
(3) Mongolian University of Science and Technology, Mongolia,
(4) Universitas Bale Bandung, Indonesia

Abstract

Intensifying global mobility, climate variability, and urban density have increased the frequency and complexity of infectious disease outbreaks, prompting the need for more accurate and timely epidemiological surveillance. Big Data analytics has emerged as a transformative approach capable of integrating heterogeneous datasets to detect patterns that traditional surveillance systems often miss. This study aims to examine the effectiveness of predictive modeling techniques leveraging Big Data sources—such as social media activity, electronic health records, mobility data, and environmental indicators—in forecasting potential infectious disease outbreaks. A mixed-methods analytical design was employed, combining machine learning–based predictive modeling with retrospective epidemiological validation using multi-country datasets covering the past ten years. The results show that ensemble learning models, especially random forest and gradient boosting algorithms, significantly outperform conventional statistical models in predicting outbreak onset and trajectory, achieving higher accuracy, sensitivity, and early-warning lead time. The findings demonstrate that Big Data–driven predictive models can enhance public health preparedness by providing earlier and more reliable outbreak alerts. The study concludes that integrating Big Data analytics into national and global epidemiological systems is essential for strengthening proactive disease prevention, although ethical governance and data privacy protections must be prioritized.
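The abstract names accuracy, sensitivity, and early-warning lead time as its evaluation criteria but does not define them. The following is a minimal sketch of how such metrics are commonly computed from weekly alert flags, assuming the standard textbook definitions. The function names, the `window` parameter, and the example flags are hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Optional, Sequence


def accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Share of periods where the alert flag matches the reference label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(1 for p, a in zip(predicted, actual) if p == a) / len(actual)


def sensitivity(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Share of true outbreak periods that were flagged: TP / (TP + FN)."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if a and not p)
    if true_pos + false_neg == 0:
        raise ValueError("reference data contains no outbreak periods")
    return true_pos / (true_pos + false_neg)


def lead_time(alerts: Sequence[bool], onset_index: int, window: int) -> Optional[int]:
    """Periods between the earliest alert inside the look-back window and onset.

    Returns None when no alert was raised in the window (assumed convention).
    """
    start = max(0, onset_index - window)
    for i in range(start, onset_index + 1):
        if alerts[i]:
            return onset_index - i
    return None


# Hypothetical weekly flags, invented purely for illustration.
actual_outbreak = [False, False, False, False, True, True, True, False]
model_alerts = [False, False, True, True, True, True, False, False]

print(accuracy(model_alerts, actual_outbreak))
print(sensitivity(model_alerts, actual_outbreak))
print(lead_time(model_alerts, onset_index=4, window=3))
```

Under these assumed definitions, a model that raises alerts earlier within the look-back window gains lead time, while alerts raised outside outbreak weeks lower accuracy, so the three metrics are best read together.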


 


 


Authors

Munkhzul Ganbat
munkzul@gmail.com (Primary Contact)
Baatar Tserendorj
Selenge Batbold
Rustiyana Rustiyana
Ganbat, M., Tserendorj, B., Batbold, S., & Rustiyana, R. (2025). Big Data and Epidemiology: Predictive Models for Future Infectious Disease Outbreaks. Journal of World Future Medicine, Health and Nursing, 3(3), 415–428. https://doi.org/10.70177/health.v3i3.2805
