The Effectiveness of Audio-Visual Learning Media in Improving Arabic Speaking, Listening, and Vocabulary Skills: Evidence from Indonesian Students
Abstract
Background. The increasing demands of global communication and the integration of technology in education have highlighted the importance of mastering foreign languages, including Arabic. However, many university students continue to demonstrate low proficiency in Arabic, particularly in speaking and listening skills. This issue is often associated with limited variation in instructional media and methods.
Purpose. This study aims to evaluate the effectiveness of audio-visual media as a tool for strengthening Arabic language competency, particularly in speaking, listening, and vocabulary mastery.
Method. This research employed a quantitative quasi-experimental design involving two groups: an experimental group taught with audio-visual media and a control group taught with conventional methods.
Results. The findings revealed that students in the experimental group showed significant improvement in speaking, listening, and vocabulary skills compared to those in the control group. Statistical analysis confirmed a significant difference in the mean scores, indicating that the use of audio-visual media had a positive and measurable impact on enhancing Arabic language competency.
Conclusion. The study concludes that audio-visual media is an effective and engaging learning strategy for improving Arabic language skills. Its implementation not only enhances students’ comprehension and participation but also supports their readiness to meet the challenges of modern, technology-driven learning environments.
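The Results section reports a significant difference in mean scores between the two groups. A minimal sketch of the kind of mean-comparison test that could underlie such a claim is shown below; the score lists are hypothetical placeholders, not data from this study, and the specific test (Welch's t) is an assumption, since the abstract does not name the statistical procedure used.

```python
import math
from statistics import mean, variance

# Hypothetical post-test scores (illustrative placeholders only)
experimental = [78, 85, 82, 90, 88, 76, 84, 91, 79, 86]  # audio-visual group
control = [70, 72, 68, 75, 71, 69, 74, 73, 66, 70]       # conventional group

def welch_t(a, b):
    """Welch's t statistic for two independent samples
    (does not assume equal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

t = welch_t(experimental, control)
# With roughly 18 degrees of freedom, |t| > 2.10 corresponds
# to p < .05 (two-tailed)
print(f"t = {t:.2f}, significant at .05: {abs(t) > 2.10}")
```

With these placeholder scores the experimental mean (83.9) exceeds the control mean (70.8), and the t statistic falls well past the critical value, mirroring the direction of the effect the abstract describes.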