Abstract
This study explores the potential of Generative AI (GenAI) chatbots as tools for assessing first-language (L1) reading comprehension in Computer-Assisted Language Learning (CALL) environments, focusing on their effectiveness in providing feedback across three reading genres: classical literature, technical writing, and modern fiction. Using a mixed-methods approach, the study analyzed the responses of 360 junior secondary students in China to constructed-response items in reading assessments, comparing GenAI-generated scores and feedback with those provided by human evaluators. Six expert teachers further assessed the quality of the chatbot's evaluative and revision feedback. Results indicated that GenAI aligned significantly more closely with human raters when scoring low-level responses but struggled with high-level samples. Among the genres, interview data suggested that revision feedback for technical writing received the highest ratings for its clarity, rationality, and actionable recommendations. In contrast, feedback for classical literature was often overly complex for junior-level learners and poorly aligned with examination rubrics. For fiction, GenAI struggled with interpretive nuance, thematic complexity, and variability in question types, revealing its limitations in fostering deep critical literary analysis. Overall, this study underscores the genre-specific strengths and limitations of GenAI in supporting reading comprehension.

