MorphoBabushka: Simple and Fast Baselines your Granny would use for Part-Of-Speech Tagging of Russian - conference talk | ISTINA (Intelligent System for Thematic Investigation of Scientometric Data)

Authors: Ermolaev P.A., Arefyev N.V.
International conference: Dialogue 2017
Conference dates: May 31 - June 3, 2017
Talk date: June 1, 2017
Talk type: Oral
Speaker: Arefyev N.V.
Venue: Moscow, Russia
Talk abstract:
The first shared task of MorphoRuEval-17 is to determine the part of speech and several grammatical categories, such as case, number, and gender, for each word of a Russian text. We propose solving the task with NB-SVM over a bag-of-character-n-grams input representation. We compare several methods, including CRF (Conditional Random Fields), SVM (Support Vector Machines), and NB-SVM (Naive Bayes SVM), and show that NB-SVM outperforms the other classifiers. The proposed model is the 5th best among 12 other models in the first shared task (per-token accuracy ranking). We also experimented with category grouping, in which a single classifier determines several grammatical categories at once, and showed that it improves model performance even further.
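
To make the core idea of the abstract concrete, here is a minimal sketch of an NB-SVM over a bag of character n-grams. It is not the authors' implementation. The assumptions are all ours: a binary VERB/NOUN toy task instead of the full multi-class tag set, a hypothetical eight-word training list, boundary markers `^` and `$`, n-grams of length 1 to 3, and a plain subgradient-descent linear SVM with arbitrary hyperparameters. The NB part follows the usual log-count-ratio formulation: each binary n-gram feature is scaled by the smoothed log ratio of its frequency in the two classes before the SVM is trained.

```python
import math
from collections import Counter

def char_ngrams(word, n_min=1, n_max=3):
    """Bag (set) of character n-grams of a word padded with boundary markers."""
    padded = "^" + word + "$"
    return {padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)}

def log_count_ratios(samples, alpha=1.0):
    """Smoothed Naive Bayes log-count ratio for every feature (labels are +1 / -1)."""
    pos, neg = Counter(), Counter()
    for feats, y in samples:
        (pos if y > 0 else neg).update(feats)
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + alpha * len(vocab)
    neg_total = sum(neg.values()) + alpha * len(vocab)
    return {f: math.log((pos[f] + alpha) / pos_total)
               - math.log((neg[f] + alpha) / neg_total)
            for f in vocab}

def train_nbsvm(words, labels, epochs=20, lr=0.1, reg=0.01):
    """Linear SVM (hinge loss, subgradient descent) on NB-scaled binary features."""
    samples = [(char_ngrams(w), y) for w, y in zip(words, labels)]
    r = log_count_ratios(samples)
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, y in samples:
            # Each present n-gram contributes its NB ratio r[f] as the feature value.
            score = bias + sum(weights.get(f, 0.0) * r[f] for f in feats)
            for f in weights:                      # L2 regularisation (weight decay)
                weights[f] *= 1.0 - lr * reg
            if y * score < 1.0:                    # margin violated: hinge subgradient step
                for f in feats:
                    weights[f] = weights.get(f, 0.0) + lr * y * r[f]
                bias += lr * y
    return r, weights, bias

def decision(model, word):
    """Signed score of a word; n-grams unseen in training are ignored."""
    r, weights, bias = model
    return bias + sum(weights.get(f, 0.0) * r[f]
                      for f in char_ngrams(word) if f in r)

def predict(model, word):
    return "VERB" if decision(model, word) > 0 else "NOUN"

# Hypothetical toy data: four infinitives (+1) and four nouns (-1).
WORDS = ["читать", "писать", "бежать", "думать", "стол", "книга", "окно", "город"]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]

model = train_nbsvm(WORDS, LABELS)
print(predict(model, "летать"))
```

In this toy setting the suffix n-grams shared by the infinitives (for example `ть$`) receive large positive ratios, which is the kind of signal character n-grams are meant to capture for a morphologically rich language. A tagger for the shared task would need one such classifier per tag or per category group (for instance in a one-vs-rest arrangement) and, presumably, context features from neighbouring words.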
