Big Data and Epidemiology: Predictive Models for Future Infectious Disease Outbreaks
Abstract
Intensifying global mobility, climate variability, and urban density have increased the frequency and complexity of infectious disease outbreaks, creating a need for more accurate and timely epidemiological surveillance. Big Data analytics has emerged as a transformative approach capable of integrating heterogeneous datasets to detect patterns that traditional surveillance systems often miss. This study examines how effectively predictive modeling techniques that draw on Big Data sources (such as social media activity, electronic health records, mobility data, and environmental indicators) forecast potential infectious disease outbreaks. A mixed-methods analytical design was employed, combining machine learning–based predictive modeling with retrospective epidemiological validation on multi-country datasets covering the past ten years. The results show that ensemble learning models, especially random forest and gradient boosting algorithms, significantly outperform conventional statistical models in predicting outbreak onset and trajectory, achieving higher accuracy, higher sensitivity, and longer early-warning lead time. The findings indicate that Big Data–driven predictive models can enhance public health preparedness by providing earlier and more reliable outbreak alerts. The study concludes that integrating Big Data analytics into national and global epidemiological systems is essential for strengthening proactive disease prevention, although ethical governance and data privacy protections must be prioritized.
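The abstract reports three evaluation measures (accuracy, sensitivity, and early-warning lead time) without defining them. The sketch below shows one common way such measures could be computed from weekly outbreak labels and model alerts. It is not the authors' evaluation code: the function names, the weekly granularity, the toy data, and the definition of lead time as "weeks between the first alert and the true onset" are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def accuracy(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Share of weeks where the alert flag matches the outbreak label."""
    matches = sum(1 for a, p in zip(actual, predicted) if bool(a) == bool(p))
    return matches / len(actual)


def sensitivity(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """True positives divided by all true outbreak weeks (recall)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def lead_time(alerts: Sequence[int], onset_index: int) -> Optional[int]:
    """Weeks between the first alert and the true onset week.

    Simplifying assumption: any alert at or before onset counts as a
    warning, so an early false alarm would inflate this value.
    Returns None when no alert is raised by the onset week.
    """
    for week, flag in enumerate(alerts[: onset_index + 1]):
        if flag:
            return onset_index - week
    return None


# Hypothetical toy series: 1 = outbreak week (actual) or alert (predicted).
actual = [0, 0, 0, 1, 1, 1, 0, 0]
predicted = [0, 0, 1, 1, 1, 0, 0, 0]

print(accuracy(actual, predicted))     # 6 of 8 weeks match
print(sensitivity(actual, predicted))  # 2 of 3 outbreak weeks flagged
print(lead_time(predicted, onset_index=3))
```

In this toy series the model raises its first alert one week before the true onset, so the lead time is one week. A production evaluation would also need to handle multiple outbreaks per series and penalize false alarms.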
Copyright (c) 2025 Munkhzul Ganbat, Baatar Tserendorj, Selenge Batbold, Rustiyana Rustiyana

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.