ClusterVote: Automatic Summarization Dataset Construction with Document Clusters

Authors: Chernyshev Daniil, Dobrov Boris
Proceedings: International Conference on Speech and Computer
Series: Lecture Notes in Computer Science
Volume: 13721
Year of publication: 2022
Publisher: Springer International Publishing AG
Publisher location: Cham, Switzerland
First page: 99
Last page: 113
DOI: 10.1007/978-3-031-20980-2_10
Abstract: Creating a summarization dataset is a costly task due to the amount of expertise and human work required to compose quality summaries. To alleviate the issue, several pseudo-summary approaches have been developed, but because they lack a domain adaptation mechanism, they have not been applied beyond language model pretraining. We find that this shortcoming can be overcome by leveraging document clusters. We propose ClusterVote, a pseudo-summarization approach that accounts for domain summarization patterns by studying links between related documents. The method can be configured for different levels of granularity and can produce both extractive and abstractive summaries. We evaluate the approach by collecting a Telegram news summarization dataset and testing state-of-the-art models on it. The experimental results show that the most refined variant of ClusterVote has extractive properties similar to those of the CNN/Daily Mail dataset and proves challenging for summarization systems.
Added to the system by: Chernyshev Daniil Ivanovich
