The Effectiveness of Audio-Visual Learning Media in Improving Arabic Speaking, Listening, and Vocabulary Skills: Evidence from Indonesian Students
Abstract
Background. The increasing demands of global communication and the integration of technology in education have highlighted the importance of mastering foreign languages, including Arabic. However, many university students continue to demonstrate low proficiency in Arabic, particularly in speaking and listening skills. This issue is often associated with limited variation in instructional media and methods.
Purpose. This study aims to evaluate the effectiveness of audio-visual media as a tool for strengthening Arabic language competency, particularly in speaking, listening, and vocabulary mastery.
Method. This research employed a quantitative quasi-experimental design involving two groups: an experimental group taught with audio-visual media and a control group taught with conventional methods.
Results. The findings revealed that students in the experimental group showed significant improvement in speaking, listening, and vocabulary skills compared to those in the control group. Statistical analysis confirmed a significant difference in the mean scores, indicating that the use of audio-visual media had a positive and measurable impact on enhancing Arabic language competency.
Conclusion. The study concludes that audio-visual media is an effective and engaging learning strategy for improving Arabic language skills. Its implementation not only enhances students’ comprehension and participation but also supports their readiness to meet the challenges of modern, technology-driven learning environments.
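The Results section reports a significant difference in mean scores between the two groups. A minimal sketch of the kind of mean-comparison test that could underlie such a claim is shown below; the score lists are hypothetical placeholders, not data from this study, and the specific test (Welch's t) is an assumption, since the abstract does not name the statistical procedure used.

```python
import math
from statistics import mean, variance

# Hypothetical post-test scores (illustrative placeholders only)
experimental = [78, 85, 82, 90, 88, 76, 84, 91, 79, 86]  # audio-visual group
control = [70, 72, 68, 75, 71, 69, 74, 73, 66, 70]       # conventional group

def welch_t(a, b):
    """Welch's t statistic for two independent samples
    (does not assume equal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

t = welch_t(experimental, control)
# With roughly 18 degrees of freedom, |t| > 2.10 corresponds
# to p < .05 (two-tailed)
print(f"t = {t:.2f}, significant at .05: {abs(t) > 2.10}")
```

With these placeholder scores the experimental mean (83.9) exceeds the control mean (70.8), and the t statistic falls well past the critical value, mirroring the direction of the effect the abstract describes.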