Evaluating the Effectiveness of Large Language Models in Identifying Communicatively Significant Errors in Papers of Students Learning Russian as a Foreign Language (article)
Citation information for this article obtained from Scopus
The article is published in a journal indexed in Web of Science and/or Scopus
Date of the last search for the article in external sources: April 15, 2026
Abstract: This paper examines the capability of contemporary large language models (LLMs), such as GPT-5 and DeepSeek-R1, to identify and classify communicative errors in papers of students learning Russian as a foreign language (RFL). While existing tools primarily focus on formal errors, this study emphasizes the communicative aspect, evaluating the extent to which an error disrupts comprehension (i.e., communicatively significant errors) or merely violates linguistic norms (i.e., communicatively insignificant errors). To this end, a corpus of papers by B2-level students (TORFL-2) was created and annotated by experts, and a multi-stage pipeline for testing LLMs was developed, incorporating structured prompting and heuristic voting methods to enhance result reliability. The experiment revealed that while the models can localize errors with reasonable accuracy, they have considerable difficulty classifying them communicatively: they systematically underestimate the degree to which an error impairs comprehension, confuse error types, and struggle to identify multiple errors within a single fragment. The study demonstrates both the potential and the current limitations of LLMs as tools for automated, communicatively oriented feedback in educational technologies.

Link: https://rdcu.be/e9mZC
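The abstract mentions a "heuristic voting" step for aggregating repeated model judgments but does not specify the rule used. A minimal sketch of one plausible aggregation scheme follows; the label names, the severity ordering, and the tie-breaking heuristic (preferring the more severe label, since under-flagging a communicatively significant error is the costlier mistake) are illustrative assumptions, not the paper's actual method.

```python
from collections import Counter

# Hypothetical label set and severity order (not taken from the paper):
# higher index = more severe impact on comprehension.
SEVERITY = ["no_error", "insignificant", "significant"]

def vote(labels: list[str]) -> str:
    """Aggregate repeated LLM classifications of one text fragment.

    Majority vote over the labels; ties are broken in favor of the
    more severe label per the SEVERITY ordering above.
    """
    counts = Counter(labels)
    top = max(counts.values())
    tied = [label for label, c in counts.items() if c == top]
    return max(tied, key=SEVERITY.index)
```

For example, three runs returning `["significant", "insignificant", "significant"]` would be aggregated to `"significant"`, and a 1-1 tie between `"insignificant"` and `"significant"` would also resolve to `"significant"` under this tie-breaking choice.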