The first shared task of MorphoRuEval-17 is to determine the part of speech and several grammatical categories, such as case, number, and gender, for each word of a Russian text. We propose solving the task with NB-SVM (Naive Bayes SVM) over a bag-of-character-n-grams input representation. We compare several methods, including CRF (Conditional Random Fields), SVM (Support Vector Machines), and NB-SVM, and show that NB-SVM outperforms the other classifiers. The proposed model is the 5th best among 12 other models in the first shared task, as ranked by per-token accuracy. We also experimented with category grouping, in which a single classifier determines several grammatical categories at once, and showed that it further improves model performance.
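To make the NB-SVM idea concrete, here is a minimal sketch of a binary NB-SVM over character n-grams. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the function names, the `^`/`$` boundary padding, the 1 to 3 n-gram range, the smoothing value `alpha`, the toy word lists, and the simple subgradient trainer without a bias term. The toy task is also hypothetical (infinitive vs. noun). The sketch follows the common NB-SVM formulation: binary n-gram indicators are scaled by the smoothed Naive Bayes log-count ratio, and a linear SVM is trained on the scaled features.

```python
import math
from collections import Counter


def char_ngrams(word, n_min=1, n_max=3):
    """Set of character n-grams of a word, padded with boundary markers."""
    padded = "^" + word.lower() + "$"
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


def log_count_ratio(pos_docs, neg_docs, alpha=1.0):
    """Smoothed NB log-count ratio r for every n-gram seen in training."""
    vocab = set().union(*pos_docs, *neg_docs)
    p = Counter(g for doc in pos_docs for g in doc)
    q = Counter(g for doc in neg_docs for g in doc)
    p_norm = sum(p[g] + alpha for g in vocab)
    q_norm = sum(q[g] + alpha for g in vocab)
    return {
        g: math.log(((p[g] + alpha) / p_norm) / ((q[g] + alpha) / q_norm))
        for g in vocab
    }


def nb_features(doc, r):
    """Scale binary n-gram indicators by r; unseen n-grams are dropped."""
    return {g: r[g] for g in doc if g in r}


def train_linear_svm(samples, labels, epochs=20, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss (no bias term)."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * sum(w.get(g, 0.0) * v for g, v in x.items())
            for g in w:  # L2 shrinkage step
                w[g] *= 1.0 - lr * lam
            if margin < 1.0:  # hinge loss is active for this sample
                for g, v in x.items():
                    w[g] = w.get(g, 0.0) + lr * y * v
    return w


def predict(word, r, w):
    """Return +1 or -1 for a word using the NB-scaled linear model."""
    x = nb_features(char_ngrams(word), r)
    score = sum(w.get(g, 0.0) * v for g, v in x.items())
    return 1 if score > 0 else -1


# Toy usage with made-up data: +1 = infinitive, -1 = noun.
infinitives = ["читать", "писать", "говорить"]
nouns = ["книга", "стол", "окно"]
pos = [char_ngrams(wd) for wd in infinitives]
neg = [char_ngrams(wd) for wd in nouns]
r = log_count_ratio(pos, neg)
samples = [nb_features(doc, r) for doc in pos + neg]
labels = [1] * len(pos) + [-1] * len(neg)
w = train_linear_svm(samples, labels)
print(predict("делать", r, w))
```

A full tagger along these lines would train one such classifier per value of each grammatical category (one-vs-rest). Under category grouping, the label would instead be a combination of categories, for example case and number together. A library SVM would normally replace the hand-written trainer.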