Experimental networks are routinely characterized by a variety of structural parameters, from local ones, such as degree distributions, to collective ones, such as network spectra. These characteristics are then used to explain various phenomena related to the material object that the network represents. However, in most cases the experimental data used to construct a network is noisy, or some relevant data is missing. Moreover, constructing the network often means discarding part of the information obtained in the experiment, for example by imposing arbitrary thresholds that separate bond from no-bond, by replacing weighted networks with unweighted ones, or by removing vertices that are deemed irrelevant for one reason or another. All these factors, generally speaking, influence the network characteristics mentioned above. One therefore faces the question: how stable and robust are various local and global characteristics of a network with respect to experimental noise and to human interference at the stage where the experimental dataset is represented as a network?

In this work we consider two datasets on free associations between words of the English language. The underlying experimental setup is as follows [1]. Volunteers are presented with a predetermined list of N words (seeds) and asked to give their first association to each seed (the response). Every such experiment results in a weighted directed graph over the set of all English words, where an edge (a -> b) means that word b has been reported as a response to seed word a, and the corresponding weight represents the frequency with which this association was reported. Importantly, for practical reasons the list of experimentally tested seeds is always much smaller than the total set of English words and is typically significantly smaller than the set of obtained responses.
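The construction described above can be illustrated with a minimal Python sketch. It assumes the raw data is a list of (seed, response) pairs, one per volunteer answer, and that the edge weight is the relative frequency of a response among all answers to the same seed. The function name `build_association_graph` and the toy words are hypothetical and are not taken from [1].

```python
from collections import Counter, defaultdict


def build_association_graph(responses):
    """Build a weighted directed graph from (seed, response) pairs.

    Returns a dict mapping seed -> {response: weight}, where the weight is
    the fraction of answers to that seed naming that response
    (an assumed normalization, chosen here for illustration).
    """
    counts = defaultdict(Counter)
    for seed, response in responses:
        counts[seed][response] += 1

    graph = {}
    for seed, counter in counts.items():
        total = sum(counter.values())
        graph[seed] = {resp: n / total for resp, n in counter.items()}
    return graph


# Hypothetical toy data: two seeds, four answers to "cat", two to "dog".
toy_responses = [
    ("cat", "dog"), ("cat", "dog"), ("cat", "mouse"), ("cat", "fur"),
    ("dog", "cat"), ("dog", "bone"),
]
graph = build_association_graph(toy_responses)

# Only seeds have outgoing edges; response-only words appear as targets.
seeds = set(graph)
targets = {resp for out in graph.values() for resp in out}
```

Even in this toy case the set of responses is larger than the set of seeds, which mirrors the asymmetry noted in the text.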
Also, the average degree of the network reflects the size of the test group rather than any underlying property of the language: the more test subjects there are, the more associations are revealed and measured. The two datasets we use are taken from [1] and [2]. The first (the volunteers were students of Florida University in <year>) is a directed weighted network that includes only nodes with out-degree greater than zero, i.e., all response words that were not in the original list drawn up by the experimentalists have been removed. The second dataset comes from <source description> and includes nodes with out-degree zero. However, it is unweighted, and all bidirectional edges have been replaced by directed ones, apparently at random. Moreover, the lists of test words used in the two experiments are different, as is the cultural background of the volunteers. We compare various structural characteristics of these two networks, as well as of several others obtained from these datasets by partial deletion of the data. We show that while some structural properties are very robust (e.g., the in-degree distribution), others differ quite radically (e.g., the structure of the eigenvalue spectrum). We discuss the reasons for and possible implications of these instabilities.
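The kinds of data deletion mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure used in this work: the helper names (`threshold_edges`, `keep_seed_nodes_only`, `in_degree_distribution`) and the toy weights are hypothetical, and the eigenvalue spectrum is left out because it would require a linear-algebra library.

```python
from collections import Counter


def threshold_edges(weighted_edges, min_weight):
    """Drop weights: keep an unweighted edge wherever weight >= min_weight."""
    return {edge for edge, w in weighted_edges.items() if w >= min_weight}


def keep_seed_nodes_only(edges):
    """Remove every node with out-degree zero (response-only words)."""
    edges = set(edges)
    seeds = {u for u, _ in edges}
    return {(u, v) for u, v in edges if v in seeds}


def in_degree_distribution(edges):
    """Return a Counter mapping in-degree -> number of nodes with that in-degree."""
    edges = set(edges)
    nodes = {u for u, _ in edges} | {v for _, v in edges}
    in_degree = Counter(v for _, v in edges)
    return Counter(in_degree.get(node, 0) for node in nodes)


# Hypothetical toy network: (seed, response) -> weight.
toy_weights = {
    ("cat", "dog"): 0.5, ("cat", "mouse"): 0.25, ("cat", "fur"): 0.25,
    ("dog", "cat"): 0.5, ("dog", "bone"): 0.5,
    ("bone", "dog"): 1.0,
}

full = threshold_edges(toy_weights, 0.0)   # unweighted, all edges kept
pruned = keep_seed_nodes_only(full)        # response-only words removed
dist_full = in_degree_distribution(full)
dist_pruned = in_degree_distribution(pruned)
```

Comparing `dist_full` with `dist_pruned` on real data gives a first indication of how sensitive a characteristic is to the chosen representation; the same comparison can be repeated for any other structural measure.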