Digital Epidemiology: Using Social Media Data and Machine Learning to Forecast Influenza Outbreaks and Inform Public Health Responses

Keywords: Digital Epidemiology, Influenza Forecasting, Machine Learning, Social Media, LSTM

Authors

  • Benny Novico Zani
    bennynovico.phd@gmail.com
    Sekolah Tinggi Ilmu Kesehatan Raflesia, Indonesia
  • Ravi Raj Pandey
    Pokhara University, Nepal
  • Le Thi Lan Anh
    Hanoi Medical University, Viet Nam
December 5, 2024
June 20, 2025

Background. Traditional influenza surveillance systems carry a reporting lag of one to two weeks, which delays public health interventions and resource allocation.

Purpose. This study aims to develop and validate a hybrid digital epidemiology model that combines unstructured social media data with machine learning (ML) to produce accurate, multi-week influenza outbreak forecasts.

Method. Long Short-Term Memory (LSTM) and XGBoost models were trained on five years of social media data for quantitative time-series forecasting, and their forecasts were benchmarked against official clinical reports.
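
To make the forecasting setup concrete, the following is a minimal sketch of how a weekly social media signal could be turned into supervised training pairs for a fixed forecasting horizon. It is an illustration under stated assumptions, not the study's pipeline: the function name `make_supervised_windows`, the 8-week lookback, and the input layout are all hypothetical; only the four-week horizon is taken from the abstract.

```python
from typing import List, Sequence, Tuple


def make_supervised_windows(
    signal: Sequence[float],
    incidence: Sequence[float],
    lookback: int = 8,   # assumed window length, not stated in the source
    horizon: int = 4,    # four-week horizon, as reported in the abstract
) -> Tuple[List[List[float]], List[float]]:
    """Pair each `lookback`-week window of the social media signal with the
    clinical incidence observed `horizon` weeks after the window ends."""
    if len(signal) != len(incidence):
        raise ValueError("signal and incidence must cover the same weeks")

    features: List[List[float]] = []
    targets: List[float] = []
    # `end` is the exclusive end of the window; the last observed week is end - 1.
    for end in range(lookback, len(signal) - horizon + 1):
        features.append(list(signal[end - lookback:end]))
        targets.append(incidence[end - 1 + horizon])
    return features, targets
```

The resulting pairs could then be fed to any sequence or tabular learner, such as an LSTM or a gradient-boosted model.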

Results. The optimized LSTM model clearly outperformed the traditional ARIMA baseline, recording a Root Mean Square Error (RMSE) of 0.145 at the four-week forecasting horizon, less than half the baseline's error. This predictive performance indicates that social media is a statistically reliable, non-clinical leading indicator.
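
For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between observed and forecast values. A minimal sketch of the calculation follows; the function name `rmse` and its argument names are illustrative choices, and the code does not reproduce the study's data or evaluation procedure.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error between observed and forecast values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Lower values indicate forecasts closer to the official clinical reports, and the value is expressed in the same units as the incidence rate itself.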

Conclusion. The study establishes a transparent policy translation framework that links predicted incidence rates (e.g., a rate exceeding 0.20) directly to operational responses (e.g., hospital surge activation). The model offers an actionable template for moving public health surveillance from a reactive system to a proactive intelligence platform for epidemic preparedness.
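
The threshold-to-response idea can be sketched as a simple rule. The example below encodes only the single pairing given in the abstract (a predicted incidence rate exceeding 0.20 triggers hospital surge activation); the function name `recommend_response` and the fallback label "continue routine monitoring" are assumptions added for illustration, and the full framework may define further tiers.

```python
SURGE_THRESHOLD = 0.20  # example threshold quoted in the abstract


def recommend_response(
    predicted_incidence: float,
    surge_threshold: float = SURGE_THRESHOLD,
) -> str:
    """Map a forecast incidence rate to an operational response."""
    # "Exceeding" is read as strictly greater than the threshold.
    if predicted_incidence > surge_threshold:
        return "activate hospital surge"
    # Hypothetical default response; not specified in the source.
    return "continue routine monitoring"
```

Keeping the threshold as an explicit parameter leaves the rule auditable and lets each health authority adjust it to local capacity.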