Abstract
This study explores the potential of Generative AI (GenAI) chatbots as tools for assessing first-language (L1) reading comprehension in Computer-Assisted Language Learning (CALL) environments, focusing on their effectiveness in providing feedback across three reading genres: classical literature, technical writing, and modern fiction. Using a mixed-methods approach, the study analyzed the responses of 360 junior secondary students in China to constructed-response items in reading assessments, comparing GenAI-generated scores and feedback with those provided by human evaluators. Six expert teachers further assessed the quality of the chatbot's evaluative and revision feedback. Results indicated that GenAI aligned significantly more closely with human raters when scoring low-level responses but struggled with high-level samples. Among the genres, interview data suggested that revision feedback for technical writing received the highest ratings for its clarity, rationality, and actionable recommendations. In contrast, feedback for classical literature was often overly complex for junior-level learners and poorly aligned with examination rubrics. For fiction, GenAI struggled with interpretive nuance, thematic complexity, and variability in question types, revealing its limitations in fostering deep critical literary analysis. Overall, this study underscores the genre-specific strengths and limitations of GenAI in supporting reading comprehension.

