REVIEWER 1
Comment: How did you identify the number of principal components to keep during PCA? How did you set t-SNE hyperparams (perplexity, learning rate, number of iterations)? Can you show that your results are not just a consequence of the particular PCA / t-SNE settings you used?
Response: We retained 95\% of the variance through PCA to minimize information loss before applying t-SNE. We then used the following t-SNE hyperparameters: TSNE(n_components=2, perplexity=30, learning_rate="auto", n_iter=300).
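For clarity, a minimal sketch of this pipeline in scikit-learn follows; the embedding matrix X and its shape are placeholders, not the actual data.

```python
# Minimal sketch of the PCA + t-SNE pipeline described above (scikit-learn).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.rand(1000, 384)  # placeholder for the embedding matrix

# Keep the smallest number of components that explains 95% of the variance.
X_pca = PCA(n_components=0.95).fit_transform(X)

# t-SNE with the settings reported above (in scikit-learn >= 1.5 the
# n_iter argument is renamed max_iter).
X_2d = TSNE(n_components=2, perplexity=30, learning_rate="auto",
            n_iter=300).fit_transform(X_pca)
```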
Comment: Fig. 1 mentions MultiHead Self-Attention and the use of a Standard Scaler. Neither is mentioned in the manuscript.
Response: We acknowledge the inconsistency; this information has been added to the Methodology section.
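For reference, a minimal sketch of the standardization step follows; placing it before dimensionality reduction is our assumption for illustration here, and the exact placement is described in the Methodology.

```python
# Minimal sketch of the Standard Scaler step (scikit-learn); applying it
# before PCA is an assumption for illustration.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.random.rand(1000, 384)                 # placeholder embedding matrix
X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
```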
Comment: There is no information on how embeddings are constructed for each dataset, nor a description of the contents of the datasets or the expected clustering results. You mention using a sentence embedding model from Sentence-BERT. Is this applied to all five datasets? If so, please specify how you format the text for embedding in each case. If you use tabular features as embeddings, please specify the sources of each feature.
Response: Yes, Sentence-BERT was applied to all text datasets. This is now explained in detail in the Methodology section.
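For illustration, a minimal sketch of the embedding step with the sentence-transformers library follows; the checkpoint name is an assumption, and the exact model is specified in the Methodology.

```python
# Minimal sketch of producing Sentence-BERT embeddings; the checkpoint
# "all-MiniLM-L6-v2" is an assumed example, not necessarily the one used.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
texts = [
    "Patient reports nausea after starting the new medication.",
    "No adverse reaction observed during follow-up.",
]
embeddings = model.encode(texts)  # array of shape (len(texts), embedding_dim)
```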
Comment: Table 4 does not specify the dataset used in the experiment
Response: MIMIC-III was the dataset used in Table 4. We created this dataset ourselves, and it was the most complex among those used due to its structure, nature, and the presence of longer sentences. It was derived from patients' clinical notes and was designed to detect whether a patient was experiencing an adverse drug reaction.
Comment: Table 5 repeats the same exact DBI values for all four clustering algorithms. If this is a copying error, the actual DBI values are missing. Also, the constant data size used in the experiment is missing.
Response: We agree; the duplicated DBI values were a copying error, and the datasets' data sizes were missing. Both issues have been corrected, and the revisions are reflected in Table 5.
Comment: Figure 7 is missing the dataset used to run the experiment
Response: MIMIC-III was the dataset used in Figure 7.
Comment: Figure 8 is missing the constant data size used in the experiment
Response: The constant data size used in Figure 8 is 4000.
Comment: The discussion section refers to SS-DBSCAN having better noise sensitivity than the baselines but does not specify how SS-DBSCAN handles noise differently from the default DBSCAN behavior. Are noise points excluded from the computation of Silhouette when doing the fast grid search, or are you referring to something else?
Response: We have clarified this in the Discussion section.
Comment: No discussion of limitations as typically found in a limitations section
Response: We have not yet identified specific limitations; however, we recommend further research to explore the applicability of our approach to other types of data, such as images, audio, and other modalities.
Comment: The Silhouette metric is used for both MinPts selection and experimental evaluation when comparing to baselines. However, it has been shown that Silhouette is not an appropriate metric for density-based clustering algorithms such as DBSCAN, specifically when dealing with irregularly (non-spherical) shaped clusters. Instead, the DBCV [1] metric should be used.
Response: We used both the Silhouette score and the Davies-Bouldin Index (DBI). Other metrics, such as Normalized Mutual Information (NMI) and the Homogeneity Score, are designed for labeled datasets; since our study involves unlabeled datasets, they were not applicable.
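For reference, a minimal sketch of computing both internal metrics with scikit-learn on synthetic data follows; filtering out DBSCAN noise points before scoring is shown here for illustration only, and our exact handling is described in the Discussion.

```python
# Minimal sketch of the two internal metrics on synthetic data (scikit-learn).
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

# Both metrics assume every point belongs to a cluster, so DBSCAN noise
# points (label -1) are filtered out here for illustration.
mask = labels != -1
print("Silhouette (higher is better):", silhouette_score(X[mask], labels[mask]))
print("Davies-Bouldin Index (lower is better):", davies_bouldin_score(X[mask], labels[mask]))
```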
Comment: No justification is given that dimensionality reduction is needed for the SS-DBSCAN hyperparameter selection process. It is intuitive since dimensionality reduction is commonly used with clustering in general, but this needs to be shown either empirically or theoretically if you are claiming that PCA + t-SNE is a critical piece of your contribution.
Response: We have now justified the necessity of dimensionality reduction in the Methodology section.
Comment: Why PCA + t-SNE and not UMAP [2]? UMAP is a dimension reduction algorithm commonly used with clustering - would SS-DBSCAN work with UMAP as well as it does with t-SNE? If not, this would be a critical limitation that potential users would want to know about.
Response: Our decision to use PCA followed by t-SNE was not meant to suggest that UMAP or other dimensionality reduction methods are inferior; it was simply the approach that yielded the best results for our specific experiment. To verify this, we performed an additional experiment and have attached a document with the results alongside this response. SS-DBSCAN works with UMAP as well, so this is not a limitation.
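For completeness, a minimal sketch of the UMAP substitution with the umap-learn library follows; the parameter values are illustrative library defaults, not the ones used in the attached experiment.

```python
# Minimal sketch of swapping UMAP in for PCA + t-SNE (umap-learn);
# parameter values are illustrative defaults.
import numpy as np
import umap

X = np.random.rand(1000, 384)  # placeholder embedding matrix
X_2d = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1,
                 random_state=42).fit_transform(X)
```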
REVIEWER 2
Comment: It is unclear how the contributions under Section 3 relate to each other. Isn't (1) the same as (4)? Are (2) and (3) part of (1)?
Response: The interconnection between the contributions is now clarified in the manuscript. Contribution (1) primarily focuses on complexity and high-dimensional data, while Contribution (4) emphasizes scalability and adaptability.
Comment: Typo: "Contibution" > "Contribution"
Response: This typo has been corrected.
Comment: Methodology and Evaluation are not sufficiently separated. The Methodology section starts with "4.1 Data Preprocessing" detailing what datasets were used, which should be part of the evaluation (or maybe I am misunderstanding the purpose of this, in which case it should be clarified). First the conceptual approach should be fully described (e.g. in a "Methodology" section), before the details about the scientific evaluation are introduced (e.g. in a "Evaluation" section).
Response: This has been addressed in both sections. We mention the datasets and their preprocessing in both places to ensure clarity from a methodological perspective as well as within the experimental context.
Comment: Therefore it's unclear whether the pre-processing in Section 4.1 is part of the Approach or only done for the evaluation.
Response: Data preprocessing is now clearly stated as a crucial step of SS-DBSCAN and all baseline algorithms, not just of the evaluation. Preprocessing is not uniform across studies; it depends on the data used, and we have added more detail to address this.
Comment: "These datasets comprise sequences with lengths ranging from a minimum of 50 to a maximum of 500, with an average length of 250.": What datasets are these? Why were they selected? What's their role? What are "sequences" here? How are these minimums/maximums relevant and how were they chosen?
Response: We now explicitly state the rationale behind dataset selection. Sequences refer to text segments in the dataset. The limits were chosen based on an initial exploratory analysis, ensuring that a majority of sequences retained essential information without unnecessary complexity. The average length of 250 reflects the natural distribution within the selected datasets.
Comment: Missing motivation why PCA, t-SNE, etc. are part of the approach. What purpose do they serve in the bigger picture?
Response: This motivation is now provided in the Methodology section.
Comment: Overall, the approach isn't well introduced on a general level. Section 4 dives right into the details. It should first give a general and intuitive overview and motivation.
Response: An overview and motivation subsection has been added at the beginning of the Methodology section to provide context before delving into the technical details.
Comment: "A novel stratified sampling technique": This should be better motivated too. What's the intuition behind this? Why could we expect this to work better than the alternatives? A conceptual diagram or something like that could help too.
Response: This has been addressed in the Methodology section.
Comment: Section 5, "across multiple datasets, including...": It's unclear how these datasets were selected.
Response: This has been explained in the Experiment Setup section.
Comment: Section 5: How was the dataset used in "varying sizes"? What kind of sampling was applied? This should be explained better.
Response: There is no predefined sample size for this type of research. Instead, we ran the experiments at increasing data sizes to demonstrate the scalability of our SS-DBSCAN algorithm; the comparative algorithms struggled as the data size grew, which highlights the robustness of our approach.
Comment: In Section 5, the paper relies too much on the visualizations. They are nice, but don't give scientific answers. I would only show a couple of the visualizations to give an impression for the reader. But the real results are in the tables, and they should get more prominence.
Response: We apologize if the extensive visualizations were distracting. They were intended to illustrate how cluster formations evolve as the data size increases for each algorithm, and they are supported by the results presented in the tables.
Comment: The tables should better explain what they show. E.g. "more is better" and "less is better". The best results could be shown in bold. The caption should give us more indication what we are looking at.
Response: The best results are now shown in bold to highlight the differences between algorithms.
Comment: For the subsections like "5.1.2. Clustering Results with DBSCAN", it should be made clearer whether this is used as a competing approach or baseline we are comparing your contribution against, OR whether this is a variant of showing the performance of your approach. Specifically, does 5.1.2 applying your approach under the hood too or not?
Response: Our experimental setup compared four algorithms: SS-DBSCAN, DBSCAN, HDBSCAN, and OPTICS. The latter three serve as competing baselines; Section 5.1.2 therefore reports plain DBSCAN without our approach applied under the hood. The objective was to evaluate the performance of SS-DBSCAN against these DBSCAN variants.
Comment: Overall, Section 5 is lacking structure. The short subsections and many images disturb the text flow.
Response: We acknowledge that the images disrupt the text flow. This is a common challenge with LaTeX float placement, which moves figures to fit the page layout. However, we have ensured that all images and tables are properly referenced in the text for easy navigation.
Comment: "The datasets included Emotion-Sentiment, Coronavirus-Tweets, Cancer-Doc, and Sonar.": Again, why these? This should be justified.
Response: This has been justified in the Experiment Setup section.
Comment: Given that there are still many kinds of datasets out there that this approach was not tested on (which is natural and fine), I think the Discussion section should include a part where the generalization to other kinds of datasets/domains is discussed. Can we expect this to work there too? A statement like "we don't know; needs further work" is perfectly fine here, but I think this should be addressed.
Response: We have explicitly mentioned this in the Discussion section, noting that future work is needed to assess its performance across a broader range of datasets.
Comment: Conclusion could be a bit more elaborate. Maybe picking up some points from the introduction again, and quickly summarizing the discussion points from Section 7.
Response: We have expanded the conclusion as suggested.