Efficient Similarity Measures for Clustering a Huge Dataset: A Critical Review

Tracking #: 527-1507


Responsible editor: 

Olivia Woolley-Meza

Submission Type: 

Survey Paper

Abstract: 

The need to apply similarity measures appropriately in clustering has grown over the years as the volume of data keeps increasing. Deciding which similarity measure is best, and for what kind of dataset, has been a cumbersome task in data mining and data science, and for organizations that depend heavily on knowledge derived from huge datasets to make crucial decisions. Because various datasets share some common features, a clearer understanding of the similarity measures used for clustering different datasets is needed. This paper presents a critical review of various similarity measures applied in text and data clustering. A theoretical comparison has been made to check the suitability of the measures for different kinds of datasets.

Manuscript: 

Supplementary Files (optional): 

Tags: 

  • Reviewed

Data repository URLs: 

http://kdd.ics.uci.edu/databases/reuters21578/reuters21578

Date of Submission: 

Friday, November 3, 2017

Date of Decision: 

Sunday, December 24, 2017

Decision: 

Reject

Solicited Reviews:


1 Comment

Meta-Review by Editor

We have reviewed your submission carefully. The task that you set out to address with the manuscript is an important one. Nonetheless, in its current state the paper is not suitable for publication and requires very significant changes. The reviewer's comments offer useful input to guide improvement of your work. I would emphasize the most important areas: (1) a more in-depth evaluation of the efficiency of algorithms on large-scale datasets, (2) a more comprehensive review of existing methods (or justification of your reduced scope), and (3) differentiating your work from existing reviews. The text also needs numerous corrections to grammar and spelling to improve readability. Finally, a clearer explanation of what data you are using is needed, and should you decide to submit again to this journal, you must ensure the data is FAIR and openly available in established data repositories.

Olivia Woolley-Meza (0000-0003-4517-2765)