The synergy of high-throughput experimental methods and machine learning has substantially driven recent advances in deciphering the structure and predicting the activity of eukaryotic gene regulatory regions. The volume of data obtained from massively parallel reporter assays is growing fast, and existing datasets already include activity estimates for hundreds of thousands or even tens of millions of diverse regulatory elements. Machine learning, in particular deep neural networks, has become an essential tool for making proper use of such large-scale data to model gene regulatory regions. In previous studies, we showed how to adapt the EfficientNetV2 architecture [1], originally used for image classification, to the analysis of nucleotide sequences. Our LegNet model [2] outperformed multiple competing solutions, including those based on recurrent and attention-based networks, at predicting reporter protein expression in yeast cells solely from the promoter sequence. Here we show that LegNet is applicable to a wide range of genomic problems, from predicting the activity of regulatory sequences in various human cell types to assessing the effects of single-nucleotide variants and the rational design of regulatory sequences with desired properties.