Methodology of Data Popularity Forecasting in High-Energy Physics Experiments on Unbalanced and Irregular Time-series Data
Article, research article
The article is published in a journal from the RSCI Web of Science list
The article is published in a journal from the VAK (Higher Attestation Commission) list
The article is published in a journal from the Web of Science and/or Scopus list
Abstract: This study introduces a method for forecasting data popularity in high-energy physics (HEP) experiments, focusing on unbalanced and irregular time-series data. The goal is to accurately predict the popularity of specific datasets over time, which is crucial for optimizing data replication and placement strategies and for improving the efficiency of distributed computing in HEP experiments. The methodology uses machine learning techniques and time-series analysis to address the challenges posed by the unbalanced nature of the data. The paper outlines the key components of the methodology, including data preprocessing and balancing techniques, filtration, and model selection. To evaluate the effectiveness of the approach, the authors conduct experiments on real-world HEP datasets, comparing their predictions against actual data. The findings have important implications for resource management and decision-making in distributed computing for large-scale scientific projects. With forecasts of data popularity, researchers and administrators can allocate resources efficiently, optimize data storage and retrieval mechanisms, and improve overall data processing efficiency.