Nowadays, the development of most leading web services is controlled by online experiments that qualify and quantify the steady stream of their updates. The challenging problem is to define an appropriate online metric of user behavior, the so-called Overall Evaluation Criterion (OEC), which is both interpretable and sensitive. The state-of-the-art approach is to choose a type of entity to observe in the behavior data, to define a key metric for these observations, and to estimate the average value of this metric over the observations in each of the system versions. A significant disadvantage of an OEC obtained in this way is that the {\it average value} of the key metric does not necessarily change even if its {\it distribution} changes significantly. The reason is that the difference between the mean values of the key metric over the two variants of the system does not necessarily reflect the character of the change in the distribution. We develop a novel method of quantifying the change in the distribution of the key metric that is (1) interpretable and (2) based on the analysis of the two distributions as a whole and, for this reason, sensitive to more of the ways in which the two distributions may actually differ. We provide a thorough theoretical analysis of our approach and show experimentally that, other things being equal, it produces a more sensitive OEC than the average.
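To make the motivating claim concrete, the following minimal simulation sketches the failure mode the abstract describes: two variants whose key metric has the same mean but a different distribution. It is an illustration only, not the paper's method; the two-sample Kolmogorov-Smirnov test stands in for any whole-distribution comparison, and the sample sizes and distribution parameters are arbitrary assumptions.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical per-user values of a key metric in two system variants.
    # Both variants have the same mean (1.0), but the treatment changes the
    # shape of the distribution (here, its variance).
    control = rng.normal(loc=1.0, scale=0.5, size=10_000)
    treatment = rng.normal(loc=1.0, scale=1.0, size=10_000)

    # A mean-based OEC compares averages: Welch's t-test sees no difference.
    t_stat, t_p = stats.ttest_ind(control, treatment, equal_var=False)

    # A comparison of the two distributions as a whole (illustrated with the
    # Kolmogorov-Smirnov test) detects the change easily.
    ks_stat, ks_p = stats.ks_2samp(control, treatment)

    print(f"t-test p-value:  {t_p:.3f}")   # large: means indistinguishable
    print(f"KS-test p-value: {ks_p:.3g}")  # tiny: distributions clearly differ

Running the sketch shows the t-test failing to reject while the distribution-level test rejects decisively, which is precisely why a mean-based OEC can miss treatment effects that a whole-distribution OEC captures.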