Heterogeneous Multi-layered Network for Modeling Complex Graph-Data

Tracking #: 874-1854

Authors:

	Name	ORCID
	Shraban Chatterjee	https://orcid.org/0000-0001-8935-9201

Responsible editor:

Michael Maes

Submission Type:

Research Paper

Abstract:

The present paper provides a generalized model of network, namely, Heterogeneous Multi-layered Network (HMN), which can simultaneously be multi-layered and heterogeneous. We proved that the sets of all homogeneous, heterogeneous and multi-layered networks are subsets of the set of all HMNs, demonstrating the model's generalizability. The proposed HMN is more efficient in encoding different types of nodes and edges when compared to representing the same information through heterogeneous or multilayered networks. It is found experimentally that the HMN model, when used with GNNs, improves tasks such as link prediction. In addition, we present a novel parameterized algorithm (with complexity analysis) for generating synthetic HMNs. The networks generated from our proposed algorithm are more consistent in modelling the layer-wise degree distribution of a real-world Twitter network (represented as HMN) than those generated by existing models. Moreover, we also show that our algorithm is more effective in modelling an air-transportation multiplex network when compared to an algorithm designed specifically for the task. Further, we define different structural measures for HMNs, namely multilayer neighborhood, degree centrality, closeness centrality and betweenness centrality. Accordingly, we established the equivalency of the proposed structural measures of HMNs with that of homogeneous, heterogeneous, and multi-layered networks.

Manuscript:

ds-paper-874.pdf

Supplementary Files (optional):

ds-supplementary-874-1417.pdf

Previous Version:

Heterogeneous Multi-layered Network for Modeling Complex Graph-Data

Data repository URLs:

The datasets used in the paper: https://github.com/Shraban123/HMNData/tree/main

Network Dataset: https://networkrepository.com/

European Air Transportation Network: https://github.com/CompNet/MultiplexCentrality/blob/master/data/EUAir_Mu...

Date of Submission:

Friday, August 30, 2024

Date of Decision:

Wednesday, October 30, 2024

Nanopublication URLs:

Decision:

Reject

Solicited Reviews:

Review #1 submitted on 22/Sep/2024

By Fabio Sartori

https://orcid.org/0009-0006-5376-5631

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Weak
Suggested Decision: Undecided
Technical Quality of the paper: Weak
Presentation: Weak
Reviewer's confidence: Medium
Significance: Low significance
Background: Incomplete or inappropriate
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences (summary of changes and improvements for second round reviews):

The authors expanded the positioning of their work within the existing literature and explained which parameters they used to generate the networks used in the manuscript.

Reasons to accept:

I thank the authors for mentioning some of the existing literature. Furthermore, I appreciate the work done in homogenizing the fonts used for figure 5, and clarifying for each example what set of parameters was used, to allow reproducibility of your results.

Reasons to reject:

I still have several concerns; see the attachment for details.

Nanopublication comments:

Further comments:

Page 16, I am happy to read the values used for the layers. I think that expanding on the process that led you to use 37 layers and not 35 or 40 would be helpful for a reader wanting to use your model to generate their own HMN with your algorithm.

Review Document: Review_2.pdf

Review #2 submitted on 01/Oct/2024

Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Average
Suggested Decision: Undecided
Technical Quality of the paper: Good
Presentation: Average
Reviewer's confidence: Low
Significance: Moderate significance
Background: Reasonable
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences (summary of changes and improvements for second round reviews):

The authors have made significant progress in addressing some of the suggested improvements. Specifically, they have provided more detailed descriptions of their experiments, clarified the rationale behind their methodological choices, and enhanced the presentation of data through more refined tables and figures. Moreover, the results have been better contextualized, allowing for a clearer interpretation of their findings within the broader literature. However, the limitations discussion section could be further improved. Although the authors briefly mention Unknown Node Correspondence (UNC), they fail to sufficiently explain its relevance and do not elaborate on why it constitutes a limitation in their work. Greater coherence and depth in this section would strengthen the overall discussion.

Reasons to accept:

The authors present a novel algorithm for generating Heterogeneous Multilayer Networks (HMN) and demonstrate its use in capturing the characteristics of real-life networks when compared to traditional network generators. They have situated their work more robustly within the existing literature and offer an argument for the contribution of their algorithm to the study of heterogeneous multilayer networks.

Reasons to reject:

Despite the advances made, the added value of the proposed approach for the research community and its potential application areas remain somewhat vague.

Nanopublication comments:

Further comments:

RESPONSE TO REVIEWERS

We would like to thank both reviewers for their detailed comments on our work. We have incorporated changes addressing all of the reviewers' observations and highlighted them in blue in the updated manuscript. These reviews have helped us improve the manuscript.

## Reviewer 1

### Reasons to accept:

The authors provide an innovative modelling approach that is of broader scientific significance and interest and the data sources are made publicly available for reproducibility.
Notably, various experiments with data sources spanning multiple domains are utilized to test for generalizability of their network approach that is situated with the scientific literature. The contribution is rigorously carved out and the results are presented in an adequate quality and elaboration degree.

**Answer:** Thank you for your detailed feedback on our work.

### Abstract:

**You introduce the HMN as an abbreviation, please refer to the generalized network model in the same way for better clarity. Please provide an example of the structural measures that you deliver.**

**Answer:** We have added the structural measures to the abstract and have also modified the text for better clarity.

**You state that heterogeneous multi-layered networks (HMN) are more efficient, but the comparison lacks context. Specify the benchmark or state-of-the-art (SOTA) methods against which you are comparing your approach.**

**Answer:** Our proposed HMN is more efficient when compared to representing the same information through existing heterogeneous or multilayered networks like \[90, 11\]. Also, we discuss this in more detail with relevant comparison in Section 4.1, Page 8, where we compare our representation with multiple existing representations \[10, 41, 97, 101, 1\]. The citations can be referred to in the updated manuscript.

**Explain the rationale behind choosing the Twitter network and air-transportation network as your use cases. What specific characteristics of these networks make them suitable for your research goals?**

**Answer:** We consider a Twitter dataset as an example of a real-life network that can be better modeled as an HMN with heterogeneity in its layers. The air transportation network, on the other hand, is a popular multilayer network; hence, we decided to experiment with it. In practice, our model will be essential for modeling networks like the Fediverse (https://en.wikipedia.org/wiki/Fediverse), where each server can be considered as a layer, and each layer is a heterogeneous network of its own. As the Fediverse allows cross-server (in other words, cross-layer) following, it constitutes a true HMN in the real world.
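The Fediverse description above can be made concrete with a small sketch. It is only an illustration under assumptions, not the formal HMN definition from the manuscript: the class names `Node` and `HMNSketch`, the server names, and the edge types are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A typed node that lives in exactly one layer (hypothetical representation)."""
    node_id: str
    node_type: str  # e.g. "user" or "post"
    layer: str      # e.g. the name of a Fediverse server


@dataclass
class HMNSketch:
    """Toy container: typed nodes, typed edges, and a layer label per node."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (source, target, edge_type)

    def add_edge(self, source, target, edge_type):
        self.nodes.update((source, target))
        self.edges.append((source, target, edge_type))

    def cross_layer_edges(self):
        # Edges whose endpoints sit in different layers (e.g. cross-server follows).
        return [e for e in self.edges if e[0].layer != e[1].layer]

    def node_types_in_layer(self, layer):
        # More than one type in a layer means that layer is heterogeneous.
        return {n.node_type for n in self.nodes if n.layer == layer}


# Hypothetical example: two servers, users and posts, one cross-server follow.
alice = Node("alice", "user", "server_a")
post = Node("p1", "post", "server_a")
bob = Node("bob", "user", "server_b")

hmn = HMNSketch()
hmn.add_edge(alice, post, "authored")  # intra-layer edge between two node types
hmn.add_edge(bob, alice, "follows")    # cross-layer edge between two users

print(hmn.node_types_in_layer("server_a"))
print(len(hmn.cross_layer_edges()))
```

In this toy instance, `server_a` holds two node types (a heterogeneous layer) and the single `follows` edge crosses layers, which is the combination the answer attributes to an HMN.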

**You conducted multiple experiments with various datasets (e.g., molecule dataset, brain networks) to prove generalizability. Clearly state the research question/hypothesis and corresponding methodological choices for these experiments**

**Answer:** We have generated networks belonging to different categories, such as brain, crime, chemical, and biological networks, to show that our generation algorithm can be used to model different types of networks. We have also updated the manuscript to include a clear methodology for generating each network in Tables 2-5 (Sections 7 and 7.3, Pages 15-19).

### Section 4.1.1:

**You use the MovieLens dataset for link prediction, which was not introduced earlier. Provide a clear introduction to the methods used and lay out all of the experiments beforehand to ensure coherence.**

**Answer:** We have updated the manuscript to provide a clear experiment design and more details about the MovieLens dataset in Section 4.1.1, Page 9 of the modified manuscript.

**You mention "state-of-the-art GNN architectures available" (line 31, p.9) but do not specify which ones. Clarify this and explain why you chose to use t-SNE. Additionally, detail how you fine-tuned the hyperparameters of t-SNE.**

**Answer:** We have updated the manuscript, highlighting the GNN architectures and the hyperparameters used for t-SNE in Section 4.1.1, Page 10.

### Table Results:

**Report the results presented in the tables (e.g., Table 4 and Table 5\) within the text to ensure they are clearly contextualized.**

**Answer:** We have incorporated the reviewer's suggestions and added clear explanations for Tables 2-5 in Section 7.3, Pages 18-19 of the modified manuscript.

### Limitations Section:

**Include a section discussing the limitations of your study to provide a balanced view and acknowledge areas for future research.**
**Answer:** We have added a limitations section addressing the limitations of our work (Section 8, Page 20).

### Figures:

**Figure 2: Improve the interpretation. The text mentions, "The first layer contains tweets and the second layer contains users," but it is unclear which layer is which in the figure.**
**Answer:** We thank the reviewer for pointing this out. We have addressed this ambiguity in the updated manuscript in Section 4.1. The caption of Figure 2 is also modified in Page 8 of the new manuscript.

**Figure 3: The description says "with each rectangle representing a layer. The first layer contains tweets and the second layer contains users," but as a reader, I only see one layer (rectangle). You seem to express layers through edge types, which is challenging to understand. Clarify this representation.**
**Answer:** We have addressed this ambiguity in the updated manuscript. The caption of Figure 3 was incorrect and has been rectified in the modified manuscript. Thank you for pointing this out.

**Figure 5: This figure is never mentioned or discussed in the text. Ensure that every figure is referenced and discussed. Additionally, the captions of the subplots are too small to read, and the font differs from the rest of the manuscript. Ensure consistency in formatting.**
**Answer:** We thank the reviewer for pointing this out. We have updated the manuscript to include references and discussion on Figure 5 in Section 7.2 (Page 17\). Also, we have updated the figures to make the caption more visible in the text.

**Figure 6: The figure lacks an interpretation in the text. The caption does not adequately explain what subplots (a-d) specifically represent. Provide a detailed explanation in both the text and caption.**
**Answer:** We have updated the manuscript to include more discussion on Figure 6 in Section 7.3, Page 18.

## Reviewer 2

### The manuscript contains several extremely strong statements that are either false or conflicting with each other:

1. "A multi-layered network cannot support heterogeneity in a layer due to the absence of node or edge types." (Adding heterogeneity to layers is the main contribution of their manuscript)

We have added appropriate citations in support of this statement. Also, we request the reviewer to kindly note that the main contribution of this paper is not only adding heterogeneity to a layer but also developing a generalized definition of network data structure (in addition to providing a generalized network generation algorithm). The same is proved through Lemma 4.1 to 4.3. In other words, the majority of other network data structures can be considered as a special case of our proposed model.

2. "it is difficult to obtain heterogeneous multi-layered networks despite a lot of real-world networks being HMN" (contradicting what was said before, and easy to argue against it)

We thank the reviewer for pointing this out. It was a typo; we missed the keyword “dataset”. We meant to say that it is difficult to obtain heterogeneous multi-layered network **datasets** despite a lot of networks being HMNs. We have now corrected the typo in the modified manuscript. Furthermore, with the availability of the Fediverse \[1\] and the ActivityPub \[2\] protocol, which allow different social networking apps to communicate with each other, we expect to see more heterogeneous multilayer datasets in the future. In fact, once Fediverse-like datasets are available for research, we think generalized network data structures such as the proposed HMN will become necessary.

1. https://www.fediverse.to
2. ActivityPub by Evan Prodromou, Released August 2024, Publisher(s): O'Reilly Media, Inc, ISBN: 9781098169466\.

3. "except this work, there is no mention of heterogeneous multilayered networks in the literature" (from a quick search from Google Scholar, more than 200 papers contain the wording "multilayer heterogeneous network", \[1\] itself propose a framework very similar to the one described in the Manuscript, but they don't mention it, nor they explain why their model is better, or where it differs from it)

We agree with the reviewer that the term “heterogeneous multi-layered” has been used in several works in the literature; however, the statement in the paper refers not to the literal name but to the definition of the data structure. We already highlighted (in the Introduction and Related Work) that in several works, the name multilayer heterogeneous (or heterogeneous multilayer) is used synonymously for the data structure of a multilayer network or multiplex network. These definitions of multilayer networks and multiplex networks are quite well known in the field \[3\], and one can verify that the definitions used for “multilayer-heterogeneous” networks in the literature are nothing but multilayer or multiplex data structures used for specific applications in these papers \[1,2,4,5,6\]. The same is evident from the statement of the reviewer in the “Reason for Accept: (One could reduce each network generated by the authors manuscript to a network as described in \[1\], but that would multiply the number of layers)”. This is exactly the limitation of the multilayer networks \[3\] in the literature, which are widely used for many applications as mentioned before \[1,2,4,5,6\].

Furthermore, we agree that we overlooked some of these papers in our literature review. We have now included several of them in the literature review Section 3, Page 4 in the modified manuscript based on the reviewer’s suggestion.

Regarding the comparison, as the definition of the data structure itself is different, the comparison would not be fairly applicable. The paper provides a generalized data structure and is not targeted at any particular application (but we do mention some advantages of using our representation over existing heterogeneous or multilayer definitions in Section 4.1, Page 8).

1. Wan, M. Zhang, X. Li, L. Sun, X. Wang and K. Liu, Identification of Important Nodes in Multilayer Heterogeneous Networks Incorporating Multirelational Information, IEEE Transactions on Computational Social Systems 9(6) (2022), 1715–1724. doi:10.1109/TCSS.2022.3161305.
2. L. Gyanendro Singh, A. Mitra and S. Ranbir Singh, Sentiment Analysis of Tweets using Heterogeneous Multi-layer Network Representation and Embedding, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), ACL, Online, 2020, pp. 8932–8946. doi:10.18653/v1/2020.emnlp-main.718.
3. G. Bianconi, Multilayer Networks: Structure and Function, Oxford University Press, Oxford, 2018, p. 416\. ISBN 9780198753919\. doi:10.1093/oso/9780198753919.001.0001.
4. Y. Tian and O. Yağan, Spreading Processes With Layer-Dependent Population Heterogeneity Over Multilayer Networks, IEEE Transactions on Network Science and Engineering 11(5) (2024), 4106–4119. doi:10.1109/TNSE.2024.3396730.
5. Liu, A. Li, A. Zeng, J. Zhou, Y. Fan and Z. Di, Motif-based community detection in heterogeneous multilayer networks, Scientific Reports 14(1) (2024), 8769\. doi:10.1038/s41598-024-59120-5.
6. M. Bazzi, L.G.S. Jeub, A. Arenas, S.D. Howison and M.A. Porter, A framework for the construction of generative models for mesoscale structure in multilayer networks, Phys. Rev. Research 2(2) (2020), 023100\. doi:10.1103/PhysRevResearch.2.023100.

### The most problematic section is Section 7\.

**The authors introduced in Sec 6 an algorithm to generate HMN, and here they compare the network generated with their algorithm to real-life networks and show that their network is better suited to capture their characteristics compared to classical network generators. (Which is a lovely idea per se)**
**However, they fail to specify the parameters they used to generate the other network or why they chose such parameters. The result is an Erdős–Rényi graph (they call it erdos-reyni) with a probability of connection p \> 0.5, which leads to an average number of links per node above 10^4. I have more concerns about Figure 6: the degree distribution of an ER graph in a log-log scale should be extremely narrow, the one they generate is not shown in its entirety, and it's very flat, spanning an entire order of magnitude. The Barabasi-Albert degree distribution is not a power-law.**

**Answer:** We have reported the parameters of our generated networks in the revised manuscript. Regarding the ER plot, we use a regression plot with a log scale on both axes. We chose regression plots with log scale to clearly distinguish between the degree distributions of the models. To illustrate this, we show the log-log plots of the ER model (20000 nodes, p \= 0.5) and the BA model (20000 nodes, m \= 3), with and without regression, in the figures linked below (also added to the supplementary material).

![ER with regression](https://drive.google.com/file/d/1t0Ds6MFcOJDUm0jKMohTD-qQnCxdMOIA/view?u...)
![ER without regression](https://drive.google.com/file/d/1FLQHulXvmLtyu-hcFklQSX24118Fvt0c/view?u...)
![BA with regression](https://drive.google.com/file/d/1eLMkxO5FkQhNWa0pM19EBlP4cBdzzRjP/view?u...)
![BA without regression](https://drive.google.com/file/d/1BunSBgfu5gftb7PeLDA2fnuY4lRihsOF/view?u...)

If we represent ER using the log-log plot without regression, then it is extremely narrow, as mentioned by the reviewer. This is one of the reasons we went with the regression plot for clarity in demonstration.

Kindly note that in the regression plot the degree distribution is smoothed, which may exclude some edge-case data points, but it retains the overall characteristics of the degree distribution.
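How narrow the unsmoothed ER distribution is can be estimated without generating the graph, since each node's degree in G(n, p) follows a Binomial(n - 1, p) distribution. The sketch below uses the ER parameters stated in this response (20000 nodes, p = 0.5); the helper name `er_degree_stats` and the three-standard-deviation band are illustrative choices, not part of the manuscript.

```python
import math


def er_degree_stats(n, p):
    """Mean and standard deviation of a node's degree in an Erdős–Rényi G(n, p) graph.

    Each node has n - 1 potential neighbours, each present independently with
    probability p, so the degree is Binomial(n - 1, p).
    """
    mean = (n - 1) * p
    std = math.sqrt((n - 1) * p * (1 - p))
    return mean, std


n, p = 20000, 0.5  # parameters quoted in the response above
mean, std = er_degree_stats(n, p)

# Band of +/- 3 standard deviations, which contains almost all degrees.
low, high = mean - 3 * std, mean + 3 * std

# Width of that band on a log10 axis, in decades (orders of magnitude).
decades = math.log10(high / low)

print(f"mean degree = {mean:.1f}, std = {std:.1f}")
print(f"3-sigma band = [{low:.0f}, {high:.0f}], width = {decades:.3f} decades")
```

Under these assumptions the mean degree is close to 10^4 and the band covers only a small fraction of one decade, which is consistent with the unsmoothed log-log plot looking like a narrow spike.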

**There is also a problem with inconsistencies in the labels. They are comparing their model to generate HMN with the TWITT dataset; in the plot, they refer to the TWITT dataset as "user-user", the network they generate is named "synthetic 20000" in the legend, and "synthetic" in the main body of the manuscript, the Erdos-Reny network is called "erdos-reyni", and "Internet as graph" in the main text became "random internet" in the figure legend.**

We thank the reviewer for pointing out the inconsistencies in the labels. We have updated the manuscript to correct these inconsistencies.

### Tables

**In Tables 2-5, the authors compared the network generated with their algorithm with several real-life networks. (I like the idea) But they fail even at generating networks with the same number of nodes. In table 2, for example, they generated both a HMN network and a BINBALL network to mimic the EATN network. The original network had 55 nodes and 97 edges, while the one they generated with their algorithm had 67 nodes and 208 edges, and the one generated with the classical model BINBALL had 106 nodes and 22 edges. Not only the BINBALL air network is not connected, but more than half of the airports (nodes) they generated have 0 connections with other airports.**
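The remark about unconnected airports follows from a counting argument: each edge touches at most two nodes, so a graph with m edges has at least n - 2m isolated nodes. The sketch below applies this to the node and edge counts quoted in the comment, assuming they describe simple undirected graphs; the function names are illustrative.

```python
def min_isolated_nodes(num_nodes, num_edges):
    """Lower bound on the number of degree-0 nodes: each edge covers at most 2 nodes."""
    return max(0, num_nodes - 2 * num_edges)


def mean_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: every edge contributes to two degrees."""
    return 2 * num_edges / num_nodes


# (nodes, edges) as quoted in the reviewer's comment on Table 2.
networks = {
    "EATN (original)": (55, 97),
    "HMN (generated)": (67, 208),
    "BINBALL (generated)": (106, 22),
}

for name, (n, m) in networks.items():
    isolated = min_isolated_nodes(n, m)
    print(f"{name}: mean degree = {mean_degree(n, m):.2f}, "
          f"at least {isolated} of {n} nodes isolated")
```

For the BINBALL counts the bound gives at least 62 isolated nodes out of 106, which is more than half, while the bound is zero for the other two networks.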

**Answer:** The objective was to generate networks that are realistic and exhibit several properties of the real network. In fact, when we generate a synthetic network from the statistics of a real-world network, matching the number of nodes is arguably the statistic of least concern. The reason is that our target was to generate a network with comparable global properties and not with the same number of nodes, as shown in \[10\]. The same can be understood from the study of Barabási on ER networks \[1\]; this is also why ER graphs were not able to explain many properties of real-world networks \[2,11\]. The reviewer can refer to the following literature on synthetic network dataset generators for similar approaches \[3-9\]. In our experiments, we consider properties like centrality, clustering coefficient, and triangles, among others, so we did not match the exact number of nodes but tried to keep them comparable. This is in line with Unknown node-correspondence (UNC) methods, where two networks are compared based on global structural properties \[10\]. The reviewer has correctly mentioned that we have not tried to exactly replicate the same network but modeled it using our algorithm which is a Known node-correspondence (KNC) method. Exact replication arguably does not make sense, since rewiring network edges \[11\] would then provide much better results. Our final objective is to generate networks that can be utilized to develop algorithms applicable to HMNs. In the experiments, we tried to show that we can generate heterogeneous, homogeneous, and multilayer networks that follow real-world network properties. Analogously, the algorithm can then be used to generate HMNs as well.
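A comparison based on global structural properties, in the spirit of the UNC methods surveyed in \[10\], can be sketched as follows. This is a minimal illustration and not the evaluation pipeline used in the manuscript: the two toy graphs are hypothetical, and density and transitivity are chosen only as examples of size-independent statistics.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets; nodes without edges are not represented."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(neighbours) for neighbours in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def transitivity(adj):
    """Global clustering: closed two-paths divided by all two-paths."""
    closed = 0
    triples = 0
    for neighbours in adj.values():
        for a, b in combinations(neighbours, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


def global_profile(edges):
    adj = build_adjacency(edges)
    return {"nodes": len(adj), "density": density(adj), "transitivity": transitivity(adj)}


# Two hypothetical graphs with different node counts and no node correspondence.
graph_a = [(0, 1), (1, 2), (0, 2), (2, 3)]
graph_b = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]

profile_a = global_profile(graph_a)
profile_b = global_profile(graph_b)
for key in ("density", "transitivity"):
    print(key, profile_a[key], profile_b[key], abs(profile_a[key] - profile_b[key]))
```

Because the statistics are normalized by graph size, the two profiles can be compared even though the graphs have different numbers of nodes and unrelated node labels.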

Following the reviewer's suggestion, we have updated all the tables in our manuscript with the parameters to make the results reproducible.

Furthermore, for BINBALL, we implemented the algorithm ourselves since the authors did not make the code available. We rechecked our implementation in light of the concern raised by the reviewer and found the results to be the same. So, we believe this is a problem with the BINBALL algorithm itself.

1. L. Barabási and R. Albert, Emergence of Scaling in Random Networks, Science 286(5439) (1999), 509–512. doi:10.1126/science.286.5439.509.
2. P. Erdös and A. Rényi. On random graphs, i. Publicationes Mathematicae Debrecen, 6:290, 1959\.
3. Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109–137, June 1983\. doi:10.1016/0378-8733(83)90021-7.
4. Stephen J. Young and Edward R. Scheinerman. Random Dot Product Graph Models for Social Networks. In *Algorithms and Models for the Web-Graph*, pages 138–149. Springer, Berlin, Germany, 2007\. [doi:10.1007/978-3-540-77004-6\_11](https://doi.org/10.1007/978-3-540-77004-6\_11).
5. Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi. Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78, 046110, 2008.
6. Watts, D. J. ‘Networks, Dynamics, and the Small-World Phenomenon.’ Amer. J. Soc. 105, 493-527, 1999\.
7. Hakimi S., On Realizability of a Set of Integers as Degrees of the Vertices of a Linear Graph. I, Journal of SIAM, 10(3), pp. 496-506 (1962)
8. Albert, R., & Barabási, A. L. (2000) Topology of evolving networks: local events and universality Physical review letters, 85(24), 5234\.
9. M. Bazzi, L.G.S. Jeub, A. Arenas, S.D. Howison and M.A. Porter, A framework for the construction of generative models for mesoscale structure in multilayer networks, Phys. Rev. Research 2(2) (2020), 023100\. doi:10.1103/PhysRevResearch.2.023100.
10. Tantardini, M., Ieva, F., Tajoli, L. *et al.* Comparing methods for comparing networks. *Sci Rep* **9**, 17557 (2019). https://doi.org/10.1038/s41598-019-53708-y
11. The Structure and Function of Complex Networks, M. E. J. Newman, SIAM Review 2003 45:2, 167-256

2 Comments

meta-review by editor

Submitted by Tobias Kuhn on Wed, 10/30/2024 - 11:10

Both reviewers acknowledge improvements but also express that they consider the contribution of your work to be either unclear or limited. Considering the very skeptical evaluation of the reviewers, I expect the process of turning the manuscript into a paper publishable in Data Science to be very long. Accordingly, I decided to reject your submission.

Michael Maes (https://orcid.org/0000-0001-9416-3211)

Attachment not accessible

Submitted by Shraban Chatterjee on Tue, 11/05/2024 - 04:42

Kindly allow access to the attachment; we are not able to access it.