Text-to-Speech in High-Variability Phonetic Training: Focus on L2 Phonological Awareness

Forcan Al-Shami; Walcir Cardoso

doi:10.54855/callej.123123

Authors

Forcan Al-Shami, Concordia University, https://orcid.org/0009-0003-6893-0220
Walcir Cardoso, Concordia University, https://orcid.org/0000-0001-6376-185X

Keywords:

High-Variability Phonetic Training (HVPT), Text-to-Speech (TTS), Second Language Acquisition, L2 Pronunciation, Phonological Awareness

Abstract

Time and space constraints in foreign/second language (L2) instruction often restrict learners’ exposure to phonetic variability, a key factor in pronunciation development. High-Variability Phonetic Training (HVPT) offers a promising solution by exposing learners to phonetic variation; however, its implementation in instructional settings remains underexplored. This study investigates the integration of Text-to-Speech (TTS) technology with HVPT to provide varied L2 input in a semi-autonomous (beyond-the-classroom) environment. A mixed-methods pretest-posttest design examined discrete aspects of English pronunciation development, focusing on learners’ phonological awareness of the allomorphy of the regular past-tense -ed suffix (/t/, /d/, /ɪd/). Thirty Arabic-speaking adult ESL learners in Kuwait were divided into a Treatment Group (exposed to varied TTS voices) and a Control Group (exposed to a single TTS voice), and both groups engaged in self-paced listening, categorization, and form-focused activities over four weeks. Results revealed significant improvements in phonological awareness for both groups, with no statistically significant difference between them. These findings contribute to ongoing debates about HVPT’s added value in semi-autonomous settings and suggest that TTS technology alone, whether or not it is combined with HVPT, can effectively support phonological awareness, offering a flexible and accessible tool for L2 pronunciation practice.
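The abstract names past -ed allomorphy and a voice-variability manipulation without spelling out either one. As a minimal illustration only (not the study's actual materials or software), the sketch below encodes the standard allomorphy rule (/ɪd/ after /t/ and /d/, /t/ after other voiceless sounds, /d/ after voiced sounds) and builds a categorization trial list in which each verb is heard the same number of times, either rotated across several voices (high variability) or in one voice (single talker). The function names, voice labels, IPA input convention, and verb list are hypothetical.

```python
"""Illustrative sketch: past -ed allomorphy and a high- vs single-voice trial list.

All names, voice labels, and verbs are hypothetical examples,
not the materials used in the study.
"""
import random

# Stem-final sounds (IPA) that condition the regular past-tense suffix.
ALVEOLAR_STOPS = {"t", "d"}                       # -> /ɪd/, e.g. "wanted"
VOICELESS = {"p", "k", "f", "θ", "s", "ʃ", "tʃ"}  # -> /t/, e.g. "walked"


def past_ed_allomorph(final_sound: str) -> str:
    """Return the -ed allomorph ("ɪd", "t" or "d") for a stem-final IPA sound."""
    if final_sound in ALVEOLAR_STOPS:
        return "ɪd"
    if final_sound in VOICELESS:
        return "t"
    return "d"  # voiced consonants and vowels, e.g. "played", "lived"


def build_trials(verbs: dict[str, str], voices: list[str],
                 reps: int, seed: int = 0) -> list[tuple[str, str, str]]:
    """Build shuffled (verb, voice, expected allomorph) categorization trials.

    Each verb is presented `reps` times. With several voices, successive
    presentations rotate through them (high variability); with one voice,
    every presentation uses that voice (single-talker condition).
    """
    trials = [
        (verb, voices[i % len(voices)], past_ed_allomorph(final))
        for verb, final in verbs.items()
        for i in range(reps)
    ]
    random.Random(seed).shuffle(trials)  # reproducible random order
    return trials


if __name__ == "__main__":
    verbs = {"walk": "k", "play": "eɪ", "want": "t",
             "wash": "ʃ", "live": "v", "need": "d"}
    treatment = build_trials(verbs, ["voice_A", "voice_B", "voice_C"], reps=3)
    control = build_trials(verbs, ["voice_A"], reps=3)
    print(len(treatment), len(control))  # same exposure: 18 trials each
    print(sorted({voice for _, voice, _ in treatment}))
```

Holding the number of presentations per verb constant across both lists leaves talker variability as the only difference between them, which mirrors the contrast between the Treatment and Control Groups described above.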

Author Biographies

Forcan Al-Shami, Concordia University

Forcan Al-Shami is a PhD student in the Department of Education at Concordia University, Canada. Her research explores the integration of text-to-speech (TTS) technology with high-variability phonetic training (HVPT) to enhance L2 pronunciation. Since 2011, she has specialized in teaching English as a second language to Arabic speakers in Kuwait.

Walcir Cardoso, Concordia University

Walcir Cardoso is a Professor of Applied Linguistics at Concordia University, Canada. He conducts research on the L2 acquisition of phonology, morphosyntax and vocabulary, and the effects of computer technology (e.g., clickers, text-to-speech synthesizers, automatic speech recognition, intelligent personal assistants) on L2 learning.

References

Anthony, J. L., & Francis, D. J. (2005). Development of phonological awareness. Current Directions in Psychological Science, 14(5), 255–259. https://doi.org/10.1111/j.0963-7214.2005.00376.x

Barcomb, M., & Cardoso, W. (2020). Rock or lock? Gamifying an online course management system for pronunciation instruction: Focus on English /r/ and /l/. CALICO Journal, 37(2), 127–147. https://doi.org/10.1558/cj.36996

Barcroft, J., & Sommers, M. S. (2005). Effects of acoustic variability on second language vocabulary learning. Studies in Second Language Acquisition, 27(3), 387–414. https://doi.org/10.1017/S0272263105050175

Barriuso, T. A., & Hayes-Harb, R. (2018). High variability phonetic training as a bridge from research to practice. CATESOL Journal, 30(1), 177–194.

Barros, A. M. V. (2003). Pronunciation difficulties in the consonant system experienced by Arabic speakers when learning English after the age of puberty [Unpublished Master’s thesis]. West Virginia University, Morgantown. https://doi.org/10.33915/etd.766

Bione, T., & Cardoso, W. (2020). Synthetic voices in the foreign language context. Language Learning & Technology, 24(1), 169–186. https://hdl.handle.net/10125/44715

Bione, T., Grimshaw, J., & Cardoso, W. (2016). An evaluation of text-to-speech synthesizers in the foreign language classroom: learners’ perceptions. In S. Papadima-Sophocleous, L. Bradley & S. Thouësny (Eds.), CALL communities and culture – short papers from EUROCALL 2016 (pp. 50–54). Research-publishing.net. https://doi.org/10.14705/rpnet.2016.eurocall2016.537

Bione, T., Grimshaw, J., & Cardoso, W. (2017). An evaluation of TTS as a pedagogical tool for pronunciation instruction: the ‘foreign’ language context. In K. Borthwick, L. Bradley & S. Thouësny (Eds.), CALL in a climate of change: adapting to turbulent global conditions – short papers from EUROCALL 2017 (pp. 56–61). Research-publishing.net. https://doi.org/10.14705/rpnet.2017.eurocall2017.689

Bradlow, A. R., Akahane-Yamada, R., Pisoni, D. B., & Tohkura, Y. (1999). Training Japanese listeners to identify English /r/ and /l/: Long-term retention of learning in perception and production. Perception & Psychophysics, 61(5), 977–985. https://doi.org/10.3758/bf03206911

Bradlow, A. R., & Bent, T. (2008). Perceptual adaptation to non-native speech. Cognition, 106(2), 707–729. https://doi.org/10.1016/j.cognition.2007.04.005

Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. The Journal of the Acoustical Society of America, 101(4), 2299–2310. https://doi.org/10.1121/1.418276

Cardoso, W. (2011). The development of coda perception in second language phonology: A variationist perspective. Second Language Research, 27(4), 433–465. https://doi.org/10.1177/0267658311413540

Cardoso, W. (2018). Learning L2 pronunciation with a text-to-speech synthesizer. In P. Taalas, J. Jalkanen, L. Bradley, & S. Thouësny (Eds.), Proceedings of the European Association for Computer-Assisted Language Learning – EUROCALL 2018 (pp. 16–21). https://doi.org/10.14705/rpnet.2018.26.806

Cardoso, W. (2022). Technology for speaking development. In T. Derwing, M. Munro, & R. Thomson (Eds.), Routledge Handbook on Second Language Acquisition and Speaking (pp. 299–313). Routledge.

Cardoso, W., Smith, G., & Garcia Fuentes, C. (2015). Evaluating text-to-speech synthesizers. In F. Helm, L. Bradley, M. Guarda, & S. Thouësny (Eds.), Critical CALL – Proceedings of the 2015 EUROCALL Conference, Padova, Italy (pp. 108–113). Research-publishing.net. https://doi.org/10.14705/rpnet.2015.000318

Celce-Murcia, M., Brinton, D., & Goodwin, J. (2010). Teaching pronunciation: A reference for teachers of English to speakers of other languages. Cambridge University Press.

Collins, L., & Muñoz, C. (2016). The foreign language classroom: Current perspectives and future considerations. The Modern Language Journal, 100(S1), 133–147. https://doi.org/10.1111/modl.12305

Collins, L., Trofimovich, P., White, J., Cardoso, W., & Horst, M. (2009). Some input on the easy/difficult grammar question: An empirical study. The Modern Language Journal, 93(3), 336–353. https://doi.org/10.1111/j.1540-4781.2009.00894.x

Crosby, C. (2020). Adding production to high variability phonetic training [Honors thesis, University of Mississippi]. https://egrove.olemiss.edu/hon_thesis/1471/

De Araújo Gomes, A. A., Cardoso, W., & De Lucena, R. M. (2018). Can TTS help L2 learners develop their phonological awareness? In P. Taalas, J. Jalkanen, L. Bradley & S. Thouësny (Eds.), Future-proof CALL: language learning as exploration and encounters – short papers from EUROCALL 2018 (pp. 29–34). Research-publishing.net. https://doi.org/10.14705/rpnet.2018.26.808

Delatorre, F. (2010). The role of orthography on the production of regular verbs ending in -ed by Brazilian EFL learners. In Proceedings of the 9th Seminário do Círculo de Estudos Linguísticos do Sul (pp. 1–13). Florianópolis: Federal University of Santa Catarina.

Derwing, T. M., & Munro, M. J. (2005). Second language accent and pronunciation teaching: A research-based approach. TESOL Quarterly, 39(3), 379–397. https://doi.org/10.2307/3588486

Dwight, V. (2012). Regular past tense acquisition in L2 English: The roles of perceptual salience and readiness [Unpublished Master’s thesis]. Concordia University.

Eksi, G. Y., & Yesilcinar, S. (2016). An investigation of the effectiveness of online text-to-speech tools in improving EFL teacher trainees’ pronunciation. English Language Teaching, 9(2), 205–214. https://doi.org/10.5539/elt.v9n2p205

Farhat, P., & Dzakiria, H. (2017). Pronunciation barriers and computer assisted language learning (CALL) coping the demands of 21st century in second language learning classroom in Pakistan. International Journal of Research in English Education, 2(2), 53–62. https://doi.org/10.18869/acadpub.ijree.2.2.53

Flege, J. E. (1988). The production and perception of speech sounds in a foreign language. In H. Winitz (Ed.), Human communication and its disorders: A review (pp. 224–401). Ablex.

Flege, J. E. (1991). Perception and production: The relevance of phonetic input to L2 phonological learning. In T. Huebner & C. A. Ferguson (Eds.), Cross Currents in Second Language Acquisition and Linguistic Theory (pp. 249–289). John Benjamins. https://doi.org/10.1075/lald.2.15fle

Flege, J. E. (1995). Second-language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 233–277). York Press.

Flege, J. E. (1999). Age of learning and second language speech. In D. Birdsong (Ed.), Second language learning and the critical period hypothesis (pp. 101–131). Erlbaum.

Flege, J. E., & Liu, S. (2001). The effect of experience on adults’ acquisition of a second language. Studies in Second Language Acquisition, 23(4), 527–552. https://doi.org/10.1017/s0272263101004041

Fraser, H. (2000). Coordinating improvements in pronunciation teaching for adult learners of English as a second language. DETYA (ANTA Innovative project).

González, D. (2007). Text-to-speech applications used in EFL contexts to enhance pronunciation. TESL-EJ, 11(2), 1–11.

Honeybone, P. (2011). Variation and linguistic theory. In W. Maguire & A. McMahon (Eds.), Analysing Variation in English (pp. 151–177). Cambridge University Press. https://doi.org/10.1017/cbo9780511976360.008

Ingvalson, E. M., Ettlinger, M., & Wong, P. C. M. (2014). Bilingual speech perception and learning: A review of recent trends. International Journal of Bilingualism, 18(1), 35–47. https://doi.org/10.1177/1367006912456586

Iverson, P., & Evans, B. G. (2009). Learning English vowels with different first-language vowel systems II: Auditory training for native Spanish and German speakers. The Journal of the Acoustical Society of America, 126(2), 866–877. https://doi.org/10.1121/1.3148196

Jackson, S., & Cardoso, W. (2022). Orthographic interference in the acquisition of English /h/ by francophones. Second Language Pronunciation, 229–248. https://doi.org/10.1515/9783110736120-009

Jenkins, J. (2000). The phonology of English as an international language. Oxford University Press.

Jia, G., & Aaronson, D. (2003). A longitudinal study of Chinese children and adolescents learning English in the United States. Applied Psycholinguistics, 24(1), 131–161. https://doi.org/10.1017/s0142716403000079

Jing, Z. (2010). A new approach to college English pronunciation teaching. Shandong Foreign Language Teaching Journal, 31(03), 60–63.

John, P., & Cardoso, W. (2017). A comparative study of text-to-speech and native speaker output. In J. Demperio, E. Rosales & S. Springer (Eds.), Proceedings of the meeting on English language teaching (pp. 78–96). Université du Québec à Montréal Press.

Johns, T. (1991). Should you be persuaded: Two examples of data-driven learning. In T. Johns & P. King (Eds.), Classroom concordancing. English Language Research, 4, 1–16.

Kharma, N., & Hajjaj, A. (1997). Errors in English among Arabic speakers: Analysis and remedy. York Press.

Kiliçkaya, F. (2008). Improving pronunciation via accent reduction and text-to-speech software. In T. Koyama (Ed.), Proceedings of the WorldCALL 2008 conference (pp. 135–137). Nagoya, Japan: The Japan Association for Language Education and Teaching.

Kim, S. (2018). Exploring media literacy: Enhancing English oral proficiency and autonomy using media technology. Studies in English Education, 23(2), 473–500. https://doi.org/10.22275/see.23.2.03

Krashen, S. (1985). The input hypothesis: Issues and implications. Longman.

Levis, J. M. (2016). Research into practice: How research appears in pronunciation teaching materials. Language Teaching, 49(3), 423–437. https://doi.org/10.1017/s0261444816000045

Liakin, D., Cardoso, W., & Liakina, N. (2017a). Mobilizing instruction in a second-language context: Learners’ perceptions of two speech technologies. Languages, 2(3), 11. https://doi.org/10.3390/languages2030011

Liakin, D., Cardoso, W., & Liakina, N. (2017b). The pedagogical use of mobile speech synthesis (TTS): Focus on French liaison. Computer Assisted Language Learning, 30(3–4), 325–342. https://doi.org/10.1080/09588221.2017.1312463

Linebaugh, G., & Roche, T. (2015). Evidence that L2 production training can enhance perception. Journal of Academic Language & Learning, 9(1), 1–17.

Lively, S. E., Logan, J. S., & Pisoni, D. B. (1993). Training Japanese listeners to identify English /r/ and /l/. II: The role of phonetic environment and talker variability in learning new perceptual categories. The Journal of the Acoustical Society of America, 94(3), 1242–1255. https://doi.org/10.1121/1.408177

Lively, S. E., Pisoni, D. B., Yamada, R. A., Tohkura, Y., & Yamada, T. (1994). Training Japanese listeners to identify English /r/ and /l/. III. Long-term retention of new phonetic categories. The Journal of the Acoustical Society of America, 96(4), 2076–2087. https://doi.org/10.1121/1.410149

Logan, J. S., Lively, S. E., & Pisoni, D. B. (1991). Training Japanese listeners to identify English /r/ and /l/: A first report. The Journal of the Acoustical Society of America, 89(2), 874–886. https://doi.org/10.1121/1.1894649

McCandliss, B. D., Fiez, J. A., Protopapas, A., Conway, M., & McClelland, J. L. (2002). Success and failure in teaching the [r]-[l] contrast to Japanese adults: Tests of a Hebbian model of plasticity and stabilization in spoken language perception. Cognitive, Affective, & Behavioral Neuroscience, 2(2), 89–108. https://doi.org/10.3758/cabn.2.2.89

Moon, D. (2012). Web-based text-to-speech technologies in foreign language learning: Opportunities and challenges. In T. Kim, J. Ma, W. Fang, Y. Zhang, & A. Cuzzocrea (Eds.), Computer Applications for Database, Education, and Ubiquitous Computing (pp. 120–125). Springer. https://doi.org/10.1007/978-3-642-35603-2_19

Moyer, A. (2009). Input as a critical means to an end: Quantity and quality of experience in L2 phonological attainment. In T. Piske & M. Young-Scholten (Eds.), Input Matters in SLA (pp. 159–174). Multilingual Matters.

Pérez-Paredes, P., & Boulton, A. (2025). Data-driven learning in and out of the language classroom. Cambridge University Press.

Perrachione, T. K., Lee, J., Ha, L. Y., & Wong, P. C. (2011). Learning a novel phonological contrast depends on interactions between individual differences and training paradigm design. The Journal of the Acoustical Society of America, 130(1), 461–472. https://doi.org/10.1121/1.3593366

Prashant, P. D. (2018). Importance of pronunciation in English language communication. Pronunciation and Communication, 7(2), 16–17.

Sadakata, M., & McQueen, J. M. (2014). Individual aptitude in Mandarin lexical tone perception predicts effectiveness of high-variability training. Frontiers in Psychology, 5. https://doi.org/10.3389/fpsyg.2014.01318

Sakai, M., & Moorman, C. (2018). Can perception training improve the production of second language phonemes? A meta-analytic review of 25 years of perception training research. Applied Psycholinguistics, 39(1), 187–224. https://doi.org/10.1017/s0142716417000418

Salim, E. A. E., & Mohammed, F. A. H. (2023). Mother tongue interference in teaching English. Multicultural Education, 9(6). https://doi.org/10.5281/zenodo.8025208

Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics, 11(2), 129–158. https://doi.org/10.1093/applin/11.2.129

Schmidt, R., & Frota, S. (1986). Developing basic conversational ability in a second language: A case study of an adult learner. In R. Day (Ed.), Talking to learn: Conversation in second language acquisition (pp. 237–326). Newbury House.

Shin, D. J., & Iverson, P. (2013). Training Korean second language speakers on English vowels and prosody. Proceedings of Meetings on Acoustics, 19(1). https://doi.org/10.1121/1.4801046

Soler-Urzúa, F. (2011). The acquisition of English /ɪ/ by Spanish speakers via text-to-speech synthesizers: A quasi-experimental study [Unpublished Master’s thesis]. Concordia University.

Thomson, R. I. (2012). Improving L2 listeners’ perception of English vowels: A computer‐mediated approach. Language Learning, 62(4), 1231–1258. https://doi.org/10.1111/j.1467-9922.2012.00724.x

Thomson, R. I. (2018). High variability [pronunciation] training (HVPT): A proven technique about which every language teacher and learner ought to know. Journal of Second Language Pronunciation, 4(2), 208–231.

Thomson, R. I., & Derwing, T. M. (2016). Is phonemic training using nonsense or real words more effective? In J. Levis, H. Le, I. Lucic, E. Simpson, & S. Vo (Eds.), Proceedings of the 7th Pronunciation in Second Language Learning and Teaching Conference, October 2015 (pp. 88–97). Iowa State University.

Wang, Y., Spence, M. M., Jongman, A., & Sereno, J. A. (1999). Training American listeners to perceive Mandarin tones. The Journal of the Acoustical Society of America, 106(6), 3649–3658. https://doi.org/10.1121/1.428217

Wong, J. W. (2014). The effects of high and low variability phonetic training on the perception and production of English vowels /e/-/æ/ by Cantonese ESL learners with high and low L2 proficiency levels. Interspeech 2014. https://doi.org/10.21437/interspeech.2014-129

Zhang, X., Cheng, B., & Zhang, Y. (2021). The role of talker variability in nonnative phonetic learning: A systematic review and meta-analysis. Journal of Speech, Language, and Hearing Research, 64(12), 4802–4825. https://doi.org/10.1044/2021_JSLHR-21-00181

Zimmer, M., Alves, U. K., & Silveira, R. (2009). Pronunciation instruction for Brazilians: Bringing theory and practice together. Cambridge Scholars.
