Using Bigrams to Detect Written Errors Made by Learners of Spanish as a Foreign Language

Abstract

Building on the positive experience with Grammar Checker (UNED, 2017), a spelling and grammar checking package that figured among the finalists of the 2016 ELTons awards, the Universidad Nacional de Educación a Distancia (UNED) is developing a Prototype of Grammar Checker (PGC) specifically designed to correct grammatical errors made by learners of Spanish as a Foreign Language (SFL). The PGC relies on a reference corpus of 100 million words and uses Sinclair's well-known algorithm, which analyses word pairs (bigrams), to detect errors. Since errors made by native speakers often differ from those made by Second Language (L2) learners, this study's core objective was to determine where to set the thresholds that the software will use to locate incorrect bigrams and to highlight them differently depending on their probability of being an error. To do so, each bigram found in a sample of 21 compositions written by L2 learners of SFL was first analysed. Three thresholds were provisionally recommended for the PGC, depending on the bigram's frequency and its probability of occurring randomly. The capacity of these thresholds to detect grammatical errors was then tested on another sample of 21 compositions. Results show that bigrams are a powerful tool for detecting L2 learners' grammatical errors. For word pairs that do not usually occur together (R ≤ 0.1), the threshold reaches an accuracy of 90%. These results draw attention to the importance of using real data to better adapt learning tools to L2 learners' needs.
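As a rough illustration of threshold-based bigram flagging, the sketch below scores each bigram in a learner text against a reference corpus. It is a minimal sketch under stated assumptions, not the PGC implementation: the abstract does not define R, so R is assumed here to be the ratio of a bigram's observed frequency to the frequency expected if its two words co-occurred by chance. The function names `bigram_ratio` and `flag_bigrams` and the toy corpus are hypothetical.

```python
from collections import Counter


def bigram_ratio(w1, w2, unigrams, bigrams, n_tokens):
    """Observed / expected frequency of (w1, w2) in the reference corpus.

    ASSUMPTION: this ratio stands in for R; the abstract does not define R.
    Returns None when either word is absent from the reference corpus,
    because the expected frequency is then zero.
    """
    expected = unigrams[w1] * unigrams[w2] / n_tokens
    if expected == 0:
        return None
    return bigrams[(w1, w2)] / expected


def flag_bigrams(learner_tokens, reference_tokens, threshold=0.1):
    """Return learner bigrams whose ratio is at or below the threshold."""
    unigrams = Counter(reference_tokens)
    bigrams = Counter(zip(reference_tokens, reference_tokens[1:]))
    n_tokens = len(reference_tokens)
    flagged = []
    for w1, w2 in zip(learner_tokens, learner_tokens[1:]):
        r = bigram_ratio(w1, w2, unigrams, bigrams, n_tokens)
        if r is not None and r <= threshold:
            flagged.append((w1, w2))
    return flagged


# Toy reference corpus (hypothetical; the PGC uses 100 million words).
reference = "la casa es grande la casa es roja el perro es grande".split()
learner = "la perro es grande".split()
print(flag_bigrams(learner, reference))
```

In this toy example both words of "la perro" appear in the reference corpus but never next to each other, so the pair falls below the threshold, while the pairs that do occur in the corpus are left alone. A fuller version would need several thresholds and a policy for words missing from the corpus, which this sketch simply skips.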


Copyright of articles is retained by authors and CALL-EJ. As CALL-EJ is an open-access journal, articles are free to use, with proper attribution, in educational and other non-commercial settings. Sources must be acknowledged appropriately.