Citation information for this article was obtained from
Web of Science,
Scopus
The article was published in a journal indexed in Web of Science and/or Scopus
Date of the last search for the article in external sources: 18 July 2013
Abstract: We have analyzed protein sequences, treating them as texts written in a "protein" language. We have shown that repeating patterns (words) of various lengths can be identified in these sequences. The maximum word lengths were found to differ between proteins belonging to different classes; therefore, the corresponding values can be used to characterize the protein type. The suggested technique was first applied to decompose into words ordinary (literary) texts written as a gapless symbolic sequence without spaces and punctuation marks. Tests using fiction, scientific, and popular scientific English texts demonstrated the relative efficiency of the technique.
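The idea of a "word" as a repeating pattern in a gapless sequence can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that a word is any substring occurring at least twice (overlapping occurrences allowed), and the function names `to_gapless`, `repeated_words`, and `max_repeated_word_length` are hypothetical.

```python
from collections import Counter


def to_gapless(text: str) -> str:
    """Drop spaces, digits, and punctuation; keep lowercase letters only."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def repeated_words(seq: str, length: int, min_count: int = 2) -> dict[str, int]:
    """Return each substring of the given length seen at least min_count times.

    Assumption: a "word" is simply a repeated substring; overlaps are counted.
    """
    counts = Counter(seq[i:i + length] for i in range(len(seq) - length + 1))
    return {word: n for word, n in counts.items() if n >= min_count}


def max_repeated_word_length(seq: str) -> int:
    """Length of the longest substring that occurs at least twice.

    If a substring of length k + 1 repeats, its prefix of length k repeats
    too, so the search can stop at the first length with no repeats.
    """
    length = 0
    while repeated_words(seq, length + 1):
        length += 1
    return length


if __name__ == "__main__":
    sample = to_gapless("The cat, the hat.")
    print(sample)
    print(repeated_words(sample, 3))
    print(max_repeated_word_length(sample))
```

The same functions accept a protein sequence written in one-letter amino acid codes, since they treat the input as a plain string of symbols. The brute-force scan re-counts substrings for every length, which is adequate for short sequences; a suffix array or suffix automaton would be a more economical choice for long ones.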