Digital Epidemiology: Using Social Media Data and Machine Learning to Forecast Influenza Outbreaks and Inform Public Health Responses

Keywords: Digital Epidemiology, Influenza Forecasting, Machine Learning, Social Media, LSTM

December 26, 2025
June 20, 2025

Background. Traditional influenza surveillance systems carry a reporting lag of one to two weeks, which delays public health interventions and resource allocation.

Purpose. This research develops and validates a hybrid digital epidemiology model that uses unstructured social media data and machine learning (ML) to produce accurate, long-range influenza outbreak forecasts.

Method. The study used quantitative time-series forecasting: Long Short-Term Memory (LSTM) and XGBoost models were trained on five years of social media data and benchmarked against official clinical reports.
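
To make the forecasting setup concrete, the following is a minimal sketch of how a weekly signal could be cut into supervised training pairs for a fixed forecasting horizon. The function name `make_windows`, the `lookback` parameter, and the toy series are illustrative assumptions and are not taken from the study, which does not describe its windowing scheme here.

```python
def make_windows(series, lookback, horizon):
    """Pair each `lookback`-week input window with the value observed
    `horizon` weeks after the window's last point.

    Hypothetical helper for illustration only; not the study's pipeline.
    """
    inputs, targets = [], []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback                      # index just past the window
        inputs.append(series[start:end])            # model input
        targets.append(series[end + horizon - 1])   # value `horizon` weeks ahead
    return inputs, targets


# Toy weekly signal (made-up values), 3-week lookback, 4-week horizon.
weekly_signal = list(range(10))
X, y = make_windows(weekly_signal, lookback=3, horizon=4)
```

Each pair in `X` and `y` could then be fed to a sequence model such as an LSTM or flattened into features for a tree-based model such as XGBoost.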

Results. The optimized LSTM model recorded a Root Mean Square Error (RMSE) of 0.145 at the four-week forecasting horizon, less than half the error of the traditional ARIMA baseline. This predictive performance indicates that social media can serve as a statistically reliable, non-clinical leading indicator.
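
For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between observed and forecast values, so it is expressed in the same units as the incidence rate. A minimal sketch follows; the function name `rmse` and the sample values are illustrative assumptions, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up observed and forecast incidence rates, for illustration only.
observed_rates = [0.10, 0.15, 0.22, 0.30]
forecast_rates = [0.12, 0.14, 0.25, 0.27]
error = rmse(observed_rates, forecast_rates)
```

Computing this value for each model over the same held-out weeks is what allows a comparison such as the LSTM versus the ARIMA baseline.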

Conclusion. The study establishes a transparent policy translation framework, linking predicted incidence rates (e.g., exceeding 0.20) directly to required operational responses (e.g., hospital surge activation). This model offers a robust, actionable template for transforming public health surveillance from a reactive system into a proactive intelligence platform for epidemic preparedness.
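
The policy translation idea can be sketched as a simple threshold rule. The 0.20 threshold and the hospital surge response come from the example above; the function name `response_for` and the fallback label "routine monitoring" are hypothetical, and a real framework would likely define several tiers.

```python
SURGE_THRESHOLD = 0.20  # example threshold quoted in the abstract


def response_for(predicted_incidence, threshold=SURGE_THRESHOLD):
    """Map a predicted incidence rate to an operational response.

    Illustrative two-tier rule; the fallback label is an assumption.
    """
    if predicted_incidence > threshold:
        return "activate hospital surge"
    return "routine monitoring"  # hypothetical default tier


# Made-up forecasts for three upcoming weeks.
planned = [response_for(rate) for rate in (0.12, 0.20, 0.27)]
```

Because the rule uses a strict comparison, a forecast exactly at 0.20 does not trigger the surge response, which matches the wording "exceeding 0.20".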