Narrating Minimal Data: Rethinking Cohort-Based GPA Prediction in Low-Resource Higher Education Contexts
Abstract
Background. Student performance prediction has become a major topic in educational data mining and learning analytics. However, most previous studies rely on high-dimensional datasets such as attendance records, course-level grades, and learning management system logs, which are often unavailable in institutions with limited digital infrastructure.
Purpose. This study aims to evaluate the feasibility of predicting student academic performance using minimal institutional data and to establish a practical baseline for machine learning implementation in low-resource higher education contexts. Rather than maximizing predictive accuracy, this research examines the lower boundary of predictive capability when only simple academic variables are available.
Method. A quantitative descriptive–predictive design was applied to 355 student records from the Christian Religious Education Study Program at IAKN Manado, Indonesia. GPA values were grouped into four classes (Poor, Fair, Good, and Very Good). The dataset was split into 75% training and 25% testing subsets, and class imbalance was addressed using the Synthetic Minority Over-sampling Technique (SMOTE). Four models were evaluated: a Dummy Classifier, a Decision Tree, a Random Forest, and a multilayer perceptron (MLP) neural network. Performance was assessed using accuracy and 5-fold cross-validation.
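The preprocessing steps named in the Method paragraph can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the GPA cut points, the two feature columns, and the synthetic records are hypothetical, and `smote_like` is a simplified stand-in for SMOTE that interpolates between same-class neighbours. The majority-class baseline is only one possible dummy strategy, since the abstract does not say which one was used.

```python
import math
import random
from collections import Counter

LABELS = ["Poor", "Fair", "Good", "Very Good"]
# Hypothetical cut points on a 4.0 scale; the abstract does not state the real ones.
CUTS = [2.0, 2.75, 3.5]


def gpa_class(gpa):
    """Map a GPA to one of the four labels using the assumed cut points."""
    return LABELS[sum(gpa >= c for c in CUTS)]


def split_75_25(rows, rng):
    """Shuffle, then hold out 25% for testing (test size rounded up)."""
    rows = rows[:]
    rng.shuffle(rows)
    n_test = math.ceil(0.25 * len(rows))
    return rows[n_test:], rows[:n_test]


def smote_like(train, rng, k=5):
    """Grow each minority class to the majority count by interpolating between
    a sample and one of its k nearest neighbours from the same class."""
    by_class = {}
    for x, y in train:
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out = list(train)
    for y, xs in by_class.items():
        if len(xs) < 2:
            continue  # a single point has no neighbour to interpolate with
        for _ in range(target - len(xs)):
            a = rng.choice(xs)
            others = [p for p in xs if p is not a]
            neighbours = sorted(others, key=lambda p: math.dist(a, p))[:k]
            b = rng.choice(neighbours)
            t = rng.random()
            synthetic = tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
            out.append((synthetic, y))
    return out


def majority_baseline_accuracy(train, test):
    """Accuracy of always predicting the most frequent training class."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return sum(y == majority for _, y in test) / len(test)


rng = random.Random(0)
records = []
for _ in range(355):
    gpa = min(4.0, max(0.0, rng.gauss(3.0, 0.6)))  # synthetic GPA, not real data
    # Hypothetical features: semester number and credits earned.
    features = (rng.uniform(1, 8), rng.uniform(0, 150))
    records.append((features, gpa_class(gpa)))

train, test = split_75_25(records, rng)
balanced = smote_like(train, rng)  # oversample the training subset only
counts = Counter(y for _, y in balanced)
baseline = majority_baseline_accuracy(train, test)

print("train/test sizes:", len(train), len(test))
print("balanced training class counts:", dict(counts))
print("majority-class baseline accuracy:", round(baseline, 4))
```

Oversampling is applied only to the training subset here, so the held-out records stay untouched by synthetic data. In practice, library implementations such as imbalanced-learn's `SMOTE` and scikit-learn's `DummyClassifier` would typically replace the hand-written helpers.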
Results. The Dummy Classifier achieved an accuracy of 15.73%, establishing a realistic baseline under balanced class conditions. The Decision Tree and Random Forest models shared the highest accuracy at 46.06%, while the Neural Network achieved 40.44%. Cross-validation scores were lower than these accuracies, indicating limited generalization and possible overfitting under minimal-feature conditions.
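To put these percentages in terms of students, the short calculation below converts each reported accuracy into an approximate count of correctly classified test records. It assumes the 25% test subset was rounded up to 89 of the 355 records; the abstract does not state the exact subset sizes, so the counts are estimates.

```python
import math

n_records = 355
n_test = math.ceil(0.25 * n_records)  # assumed round-up convention for the test subset
n_train = n_records - n_test

# Accuracies as reported in the abstract, in percent.
reported = {
    "Dummy Classifier": 15.73,
    "Decision Tree / Random Forest": 46.06,
    "Neural Network (MLP)": 40.44,
}

# Nearest whole number of correct predictions implied by each accuracy.
implied = {name: round(pct / 100 * n_test) for name, pct in reported.items()}

for name, correct in implied.items():
    print(f"{name}: about {correct} of {n_test} test records")
```

Under that assumption, the gap between the best-performing models and the Neural Network corresponds to about five test records, which is worth keeping in mind when comparing models on a dataset of this size.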
Conclusion. This study shows that simple institutional data can still provide non-trivial predictive signals, but predictive performance remains moderate. The main contribution of this study lies in positioning minimal-data prediction as a baseline methodological framework for institutions with constrained academic datasets, rather than as a high-accuracy predictive solution.
Authors
Copyright (c) 2026 Berdinata Massang, Rolty Glendy Wowiling, Allin Junikhah, Firmanians Romula Tuerah, Andrew Nathanael Ratag, Febri Kurnia Manoppo

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.