Deep Learning based Network Similarity for Model Selection

Tracking #: 624-1604

Authors:

	Name	ORCID
	Kushal Veer	https://orcid.org/0000-0003-1406-215X
	Ajay Kumar Verma	https://orcid.org/0000-0002-4163-0100
	Lovekesh Vig	https://orcid.org/0000-0001-9834-3308

Responsible editor:

Michael Maes

Submission Type:

Research Paper

Abstract:

Capturing data in the form of networks is becoming an increasingly popular approach for modeling, analyzing and visualizing complex phenomena, and for understanding the important properties of the underlying complex processes. Access to many large-scale network datasets is restricted due to privacy and security concerns. Also, for several applications (such as functional connectivity networks), generating large-scale real data is expensive. For these reasons, there is a growing need for advanced mathematical and statistical models (also called generative models) that can account for the structure of these large-scale networks without having to materialize them in the real world. The objective is to provide a comprehensible description of the network properties and to be able to infer previously unobserved properties. Researchers have developed various models that generate synthetic networks adhering to the structural properties of real networks. However, selecting the appropriate generative model for a given real-world network remains an important challenge. In this paper, we investigate this problem and provide a novel technique (named TripletFit) for model selection (or network classification) and estimation of structural similarities of complex networks. The goal of network model selection is to select a generative model that is able to generate a structurally similar synthetic network for a given real-world (target) network. We consider six outstanding generative models as the candidate models. Existing model selection methods mostly suffer from sensitivity to network perturbations, dependency on the size of the networks, and low accuracy.
To overcome these limitations, we consider a broad array of network features, with the aim of representing different structural aspects of the network, and employ deep learning techniques, namely a deep triplet network architecture and a simple feed-forward network, for model selection and estimation of structural similarities of complex networks. Our proposed method outperforms existing methods with respect to accuracy, noise tolerance, and size independence on a number of gold-standard datasets used in previous studies.

Manuscript:

ds-paper-624.pdf

Supplementary Files (optional):

ds-supplementary-624-964.zip

Revised Version:

Deep Learning based Network Similarity for Model Selection

Data repository URLs:

http://ce.sharif.edu/~aliakbary/datasets.html
http://snap.stanford.edu/
http://deim.urv.cat/~alexandre.arenas/data/welcome.htm

Date of Submission:

Saturday, March 28, 2020

Date of Decision:

Wednesday, July 8, 2020

Nanopublication URLs:

Decision:

Undecided

Solicited Reviews:

Review #1 submitted on 20/Apr/2020

Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Good
Presentation: Average
Reviewer's confidence: Low
Significance: High significance
Background: Unable to judge
Novelty: Unable to judge
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The authors suggest a novel method for generative model selection in order to simulate realistic network data.
They suggest using a supervised deep learning approach to learn the similarity metric of various generative models and using the outcome within a classification approach to select the best generative model.

Reasons to accept:

The authors engage with existing research in the field and establish the limitations of existing methods of generative model selection, which they seek to address.
They suggest a new sophisticated methodology, which appears to be sound and results in convincing outcomes.

Reasons to reject:

The descriptions are not always clear. In particular, it is not entirely clear to me exactly how the two processes, (1) learning of network similarity and (2) model selection based on classification, interact with and inform each other.
The authors are encouraged to proof-read their paper again and correct typos etc.

Nanopublication comments:

Further comments:

I am not necessarily an expert in this field, so I cannot fully assess the novelty and significance of the contribution.

Review #2 submitted on 08/Jul/2020

Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Weak
Suggested Decision: Undecided
Technical Quality of the paper: Weak
Presentation: Weak
Reviewer's confidence: Medium
Significance: Moderate significance
Background: Reasonable
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

See below

Reasons to accept:

See below

Reasons to reject:

See below

Nanopublication comments:

Further comments:

The authors present a technique for selecting a network model (from a set of network models) using supervised learning. They do this by extracting a set of features from 1000 synthetic networks generated using each of the 6 models under consideration. Next, they construct “triplets” for each synthetic network (each triplet consists of the feature vector of the instance, an instance of the same class and an instance of a different class). The labelled triplets are used to train a “triplet network”, which embeds the feature vectors into a Euclidean space and thus “learns” the distance metric. The learned metric of the generated dataset, as described above, is then used with a supervised learning algorithm to train a network classifier.
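The triplet construction summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: the hinge-style triplet loss, the margin value, the function names and the toy feature vectors are all assumptions made for illustration, and the embedding network itself is omitted, so the loss is evaluated directly on raw feature vectors.

```python
import math
import random


def build_triplets(features, labels, rng):
    """Pair every instance with a same-class and a different-class instance."""
    triplets = []
    for i, (anchor, label) in enumerate(zip(features, labels)):
        positives = [j for j, l in enumerate(labels) if l == label and j != i]
        negatives = [j for j, l in enumerate(labels) if l != label]
        if not positives or not negatives:
            continue  # a triplet needs both a positive and a negative
        positive = features[rng.choice(positives)]
        negative = features[rng.choice(negatives)]
        triplets.append((anchor, positive, negative))
    return triplets


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form): zero once the negative
    is at least `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


# Toy feature vectors for two hypothetical classes of synthetic networks.
toy_features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
toy_labels = ["BA", "BA", "ER", "ER"]
toy_triplets = build_triplets(toy_features, toy_labels, random.Random(0))
toy_losses = [triplet_loss(a, p, n) for a, p, n in toy_triplets]
```

In a full pipeline the three vectors would first pass through the shared embedding network, and the loss would be minimised over its weights.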

The classification of a real network is found by:
(i) extracting the feature vector for the real network,
(ii) using the trained triplet neural network to embed the feature vector of the real network, and
(iii) inputting the embedded feature vector into a trained classifier, which returns the generative model that most closely resembles the real network.
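The three steps can be sketched as a simple composition of functions. Everything here is a stand-in chosen for illustration: the two-element feature vector, the identity `embed` placeholder (where the trained triplet network would go) and the k-nearest-neighbour classifier are assumptions, not the components used in the paper.

```python
import math
from collections import Counter


def extract_features(edges):
    """Step (i): a tiny stand-in feature vector [mean degree, density]."""
    nodes = {node for edge in edges for node in edge}
    n, m = len(nodes), len(edges)
    mean_degree = 2 * m / n
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return [mean_degree, density]


def embed(features):
    """Step (ii): placeholder for the trained triplet network (identity)."""
    return list(features)


def classify(embedded, training_set, k=3):
    """Step (iii): majority vote among the k nearest labelled embeddings."""
    nearest = sorted(training_set,
                     key=lambda item: math.dist(embedded, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def select_model(edges, training_set, k=3):
    """Run steps (i) to (iii) on one network given as an edge list."""
    return classify(embed(extract_features(edges)), training_set, k)


# Hypothetical labelled embeddings of synthetic networks.
toy_training_set = [
    ([2.0, 1.0], "WS"), ([2.1, 0.9], "WS"), ([2.2, 0.8], "WS"),
    ([8.0, 0.1], "BA"), ([7.5, 0.2], "BA"),
]
triangle = [(0, 1), (1, 2), (0, 2)]
chosen = select_model(triangle, toy_training_set)
```

Keeping the three steps as separate functions makes it explicit that the classifier only ever sees embedded vectors, which is the interaction between metric learning and classification that the pipeline relies on.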

The authors use a grab-bag of network features, hoping to capture the topology of networks. My concern is with one of their features: the assortativity coefficient has been shown to depend upon network size (Nelly Litvak and Remco van der Hofstad, Physical Review E 87, 022801 (2013)). A size-dependent feature is at odds with the authors’ aim to have a “distance metric that is agnostic” to network size. This is especially relevant for at least two of the models used in their set of generative models: the Barabasi-Albert model and the Watts-Strogatz model.
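One way to probe this concern empirically is to compute the degree assortativity of Barabasi-Albert graphs of different sizes and compare the values. The sketch below is an illustration under stated assumptions: the generator is a simplified preferential-attachment procedure seeded with a star, the assortativity is the Pearson correlation of the degrees at the two ends of each edge, and the sizes and seed are arbitrary. It makes no claim about the parameter ranges used in the paper.

```python
import random


def barabasi_albert_edges(n, m, rng):
    """Simplified preferential attachment: each new node links to m
    distinct existing nodes chosen with probability proportional to degree."""
    edges = []
    pool = []  # every node appears once per unit of degree
    for v in range(1, m + 1):  # seed graph: a star centred on node 0
        edges.append((0, v))
        pool.extend([0, v])
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for t in targets:
            edges.append((new, t))
            pool.extend([new, t])
    return edges


def degree_assortativity(edges):
    """Pearson correlation between the degrees at both ends of each edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:  # count each edge in both orientations
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return cov / var if var > 0 else float("nan")


rng = random.Random(0)
by_size = {n: degree_assortativity(barabasi_albert_edges(n, 2, rng))
           for n in (100, 1000, 5000)}
```

Averaging over many seeds per size would be needed before drawing any conclusion from the values in `by_size`.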

In the results section, figure 3 is a convincing, low-dimensional demonstration of why the model selection method as presented is effective. The figure clearly shows that the learned distance measure is an effective discriminator between network models. However, their evaluation of the model selection approach (Table 1) seems to indicate that their generated model instances (and I am speculating here in the absence of any indication of the range of model parameters used) do not have enough variability.

While the authors take some care to present their method, the main result for real networks, called Case Study, is simply given as a table with the closest generative model selected for each real network. There is no way for the reader to assess how well the generative model fits the data. The method developed by the authors learns a metric. It would be useful to see the closeness of the fit of each generative model to each real network. This could be achieved by giving a measure of the distance of each real network from each generative model.
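A possible form for such a measure is sketched here: the Euclidean distance, in the learned embedding space, between the embedded real network and the centroid of the embedded synthetic instances of each generative model. Using class centroids is an assumption made for this illustration (nearest-instance or average pairwise distances would be alternatives), and the embeddings are made-up toy values.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def distance_report(target, embeddings_by_model):
    """Distance from the target embedding to each model's centroid,
    returned as a dict ordered from closest to farthest model."""
    report = {model: math.dist(target, centroid(vectors))
              for model, vectors in embeddings_by_model.items()}
    return dict(sorted(report.items(), key=lambda item: item[1]))


# Toy embeddings for two hypothetical generative models.
toy_embeddings = {
    "BA": [[10.0, 10.0], [10.0, 12.0]],
    "ER": [[0.0, 0.0], [2.0, 0.0]],
}
toy_report = distance_report([1.0, 0.0], toy_embeddings)
for model, dist in toy_report.items():
    print(f"{model}: {dist:.3f}")
```

Reporting the full row of distances for each real network, rather than only the closest model, would let readers judge whether the selected model is a clear winner or only marginally better than the alternatives.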

In the current form, I would not accept the paper for publication. Perhaps with significant changes, the paper may be acceptable.
The reasons for the decision: in Table 2, the generative model selected for one of the real networks, citHepTh, is the Erdos-Renyi random graph model. This result only indicates that none of the other 5 models fit the data well; fitting to a random graph model is like a “base-line” fit. There is no discussion of the results. I am not sure what the 1 2 “Discussion” section is trying to convey.

The work presented in the paper is of marginal interest. The authors compare their work with other works done between 7 and 15 years ago. The most useful contribution of the paper is the learned Euclidean distance between the feature vectors of networks.

The paper has many typos and grammatical errors. Examples (not exhaustive) are:

page 2, line 13: ‘ ...to perform an effective model selection.....’, remove ‘an’.
page 2, lines 15-17: A run-on sentence that is out of place.
page 3, line 4: ‘estimte’
page 11, line 20: ‘1000 of network instances......’
page 11, line 22: Missing ‘is’ in the first sentence.
page 11, line 30: ‘ecah iteartion’
page 12, line 8: Missing space before ‘More’.
page 12, line 37: ‘....the randomly chosen pair of nodes.’
page 12, line 37: ‘....the the....’
page 13, line 34: ‘....we computes.....’
page 13, line 34: ‘The question is, Is the euclidean....’ should be: The question: is the Euclidean......
page 13, line 39: heatmap should not be capitalized.
page 13, line 40: ‘....feture....’
page 13, line 40: ‘.....diffrent.....’
page 15, line 34: ‘ Despite most of the existing methods [19, 25, 27], the proposed distance based method.....’ I am not sure of what the authors mean.
page 15, line 42: ‘perhaps smaller from the size of the target network’ the ‘from’ should be ‘than’

The paper needs a thorough proof-reading.

Review Document: review_ds-paper-624.pdf

1 Comment

Meta-Review by Editor

Submitted by Tobias Kuhn on Wed, 07/08/2020 - 07:21

Reviewer 1 is positive about your paper but points to concepts that need to be clarified. Please note that the reviewer, like parts of the journal’s target audience, is from the social sciences. While these readers have a very solid background in formal methods, they are often unaware of disciplinary details. Please try to make your paper more accessible also to these readers, following the reviewer’s suggestions. Reviewer 2 is less positive but provides a detailed list of aspects that need to be clarified or defended. In addition, the reviewer points out that the analyses leading to Table 1 and Table 2 are not demonstrating the main point of the paper. Please address this with great care, in order to better drive home the message of your paper.

Michael Maes (https://orcid.org/0000-0001-9416-3211)

Data Science