Reviewer has chosen to be Anonymous
Overall Impression: Undecided
Technical Quality of the paper: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right
Summary of paper in a few sentences:
Reasons to accept:
Reasons to reject:
The authors present a supervised-learning technique for selecting a network model from a set of candidate network models. They extract a set of features from 1000 synthetic networks generated with each of the six models under consideration. Next, they construct a "triplet" for each synthetic network, consisting of the feature vector of that instance, the feature vector of another instance of the same class, and the feature vector of an instance of a different class. The labelled triplets are used to train a "triplet network", which embeds the feature vectors into a Euclidean space and thus "learns" the distance metric. The learned metric is then used with a supervised learning algorithm to train a network classifier on the generated dataset.
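For concreteness, the triplet construction described above can be sketched as follows. This is my own minimal illustration, not the authors' code: the toy feature vectors, the function names, and the use of the standard margin-based triplet loss (with a hypothetical margin of 1.0) are assumptions, since the exact loss is not restated in this review.

```python
import math
import random


def make_triplets(features_by_class, rng):
    """Build one (anchor, positive, negative) triplet per instance.

    features_by_class maps a model label to a list of feature vectors
    (each class needs at least two instances)."""
    labels = list(features_by_class)
    triplets = []
    for label in labels:
        instances = features_by_class[label]
        others = [other for other in labels if other != label]
        for i, anchor in enumerate(instances):
            # Positive: a different instance of the same class.
            j = rng.choice([k for k in range(len(instances)) if k != i])
            # Negative: an instance of a randomly chosen different class.
            negative = rng.choice(features_by_class[rng.choice(others)])
            triplets.append((anchor, instances[j], negative))
    return triplets


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss on embedded vectors: zero once the negative
    is at least `margin` farther from the anchor than the positive."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy two-class example with hypothetical 2-D feature vectors.
data = {
    "ER": [[0.10, 0.20], [0.15, 0.25], [0.12, 0.22]],
    "BA": [[0.90, 0.80], [0.85, 0.75], [0.95, 0.82]],
}
triplets = make_triplets(data, random.Random(0))
print(len(triplets))  # one triplet per instance: 6
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0
```

During training, the loss is applied to the outputs of the embedding network, so that same-class instances are pulled together and different-class instances pushed apart.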
A real network is classified by:
(i) extracting the feature vector of the real network;
(ii) using the trained triplet network to embed that feature vector; and
(iii) inputting the embedded feature vector into the trained classifier, which returns the generative model that most closely resembles the real network.
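The three steps can be sketched as below, purely as an illustration. Assumptions: the trained triplet network is replaced by a hypothetical fixed linear map (`embed`), the feature vector of step (i) is taken as already extracted, and a 1-nearest-neighbour rule stands in for the supervised classifier, whose type is not specified here.

```python
import math


def embed(features, weights):
    """Placeholder for the trained triplet network: a fixed linear map.
    (Hypothetical; the real embedding is a learned neural network.)"""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def classify(real_features, weights, train_features, train_labels):
    """Steps (ii)-(iii): embed the real network, then return the label of
    the nearest embedded training instance (1-nearest-neighbour rule)."""
    z = embed(real_features, weights)
    best = min(
        range(len(train_features)),
        key=lambda i: math.dist(z, embed(train_features[i], weights)),
    )
    return train_labels[best]


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical 3 -> 2 projection
train_X = [[0.1, 0.2, 5.0], [0.9, 0.8, 5.0]]  # hypothetical feature vectors
train_y = ["ER", "BA"]
print(classify([0.8, 0.7, 0.0], W, train_X, train_y))  # BA
```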
The authors use a grab-bag of network features in the hope of capturing network topology. My concern is with one of these features: the assortativity coefficient has been shown to depend on network size (Nelly Litvak and Remco van der Hofstad, Physical Review E 87, 022801 (2013)). A size-dependent feature is at odds with the authors' aim of a "distance metric that is agnostic" to network size. This is especially relevant for at least two of their generative models: the Barabasi-Albert model and the Watts-Strogatz model.
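The size dependence is easy to check. The following sketch is my own illustration under stated assumptions: a simple pure-Python preferential-attachment generator (m = 2, seeded with a triangle) and the standard Pearson degree assortativity coefficient, with no library implementation used. It computes the coefficient for Barabasi-Albert graphs of increasing size, which is the kind of check the authors could report; no particular values are claimed here.

```python
import random


def barabasi_albert_edges(n, m, rng):
    """Simple preferential attachment: each new node links to m distinct
    existing nodes chosen with probability proportional to degree."""
    # Seed graph: a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `pool` once per incident edge endpoint.
    pool = [v for e in edges for v in e]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for t in chosen:
            edges.append((new, t))
            pool.extend((new, t))
    return edges


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of an edge."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # Count each undirected edge in both directions (symmetric measure).
    xs = [deg[u] for u, v in edges] + [deg[v] for u, v in edges]
    ys = [deg[v] for u, v in edges] + [deg[u] for u, v in edges]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / k
    vx = sum((x - mx) ** 2 for x in xs) / k
    vy = sum((y - my) ** 2 for y in ys) / k
    return cov / (vx * vy) ** 0.5


rng = random.Random(42)
for n in (100, 1000, 10000):
    r = degree_assortativity(barabasi_albert_edges(n, 2, rng))
    print(n, round(r, 3))
```

Averaging over several seeds per size would separate genuine size dependence from sampling noise.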
In the results section, Figure 3 is a convincing low-dimensional demonstration of why the model selection method as presented is effective. The figure clearly shows that the learned distance measure is an effective discriminator between network models. However, their evaluation of the model selection approach (Table 1) seems to indicate that their generated model instances do not have enough variability (I am speculating here, as the range of model parameters used is not given).
While the authors take some care to present their method, the main result for real networks, called Case Study, is simply given as a table listing the closest generative model selected for each real network. There is no way for the reader to assess how well the generative model fits the data. The method developed by the authors learns a metric, so it would be useful to see how closely each generative model fits each real network. This could be achieved by reporting the distance of each real network from each generative model.
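A minimal sketch of the requested report, assuming the real network and the synthetic instances have already been embedded with the trained triplet network. For each generative model it gives the Euclidean distance from the real network to the centroid of that model's instances and the mean distance to the instances themselves. The function name and toy coordinates are hypothetical.

```python
import math


def distances_to_models(z_real, embedded_by_model):
    """Map each model to (distance to centroid, mean distance to instances),
    measured from the embedded real network z_real."""
    report = {}
    for model, points in embedded_by_model.items():
        dim = len(points[0])
        centroid = [sum(p[k] for p in points) / len(points) for k in range(dim)]
        mean_dist = sum(math.dist(z_real, p) for p in points) / len(points)
        report[model] = (math.dist(z_real, centroid), mean_dist)
    return report


embedded = {  # hypothetical embedded synthetic instances
    "ER": [[0.0, 0.0], [0.0, 2.0]],
    "BA": [[4.0, 0.0], [4.0, 2.0]],
}
report = distances_to_models([0.0, 1.0], embedded)
# Print the models from closest to farthest centroid.
for model, (d_c, d_m) in sorted(report.items(), key=lambda kv: kv[1][0]):
    print(f"{model}: centroid distance {d_c:.3f}, mean distance {d_m:.3f}")
```

A table of such distances for every real network in Table 2 would let the reader see whether the selected model is a close fit or merely the least bad one.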
In the current form, I would not accept the paper for publication. Perhaps with significant changes, the paper may be acceptable.
The reasons for the decision: in Table 2, the generative model selected for one of the real networks, citHepTh, is the Erdős-Rényi random graph model. This result indicates only that none of the other five models fits the data well; a fit to the random graph model is essentially a "baseline" fit. There is no discussion of the results, and I am not sure what the "Discussion" section is trying to convey.
The work presented in the paper is of marginal interest. The authors compare their work only with other works published between 7 and 15 years ago. The most useful contribution of the paper is the learned Euclidean distance between the feature vectors of networks.
The paper has many typos and grammatical errors. Examples (not exhaustive) are:
page 2, line 13: ‘ ...to perform an effective model selection.....’, remove ‘an’.
page 2, lines 15-17: A run-on sentence that is out of place.
page 3, line 4: ‘estimte’
page 11, line 20: ‘1000 of network instances......’
page 11, line 22: Missing ‘is’ in the first sentence.
page 11, line 30: ‘ecah iteartion’
page 12, line 8: Missing space before ‘More’.
page 12, line 37: ‘....the randomly chosen pair of nodes.’
page 12, line 37: ‘....the the....’
page 13, line 34: ‘....we computes.....’
page 13, line 34: ‘The question is, Is the euclidean....’ should be: The question: is the Euclidean......
page 13, line 39: ‘heatmap’ should not be capitalized.
page 13, line 40: ‘....feture....’
page 13, line 40: ‘.....diffrent.....’
page 15, line 34: ‘Despite most of the existing methods [19, 25, 27], the proposed distance based method.....’ I am not sure what the authors mean.
page 15, line 42: ‘perhaps smaller from the size of the target network’ the ‘from’ should be ‘than’
The paper needs a thorough proof-reading.
Review Document: review_ds-paper-624.pdf