Class Imbalance Learning of Defective Prone Modules Using Adaptive Neuro Fuzzy Inference System

Tracking #: 514-1494

Authors:

	Name	ORCID
	Satya Srinivas	https://orcid.org/0000-0001-9608-9481
	Yesubabu Adimulam	https://orcid.org/0000-0001-8072-5252
	Pradeepini G	https://orcid.org/0000-0001-7757-6559

Responsible editor:

Evangelos Pournaras

Submission Type:

Research Paper

Abstract:

Defect Identification is a major challenge in Software Development Process. Identifying a Defect in early stages reduces the cost of Software Development rather than the later stages. This motivates Demand for applying Data mining techniques for Predicting Software Defects. But the datasets available for predicting software defects are imbalance in nature. Due to imbalance nature of data available, the classifier performance will be degraded even though the classifier has low error rate. To improve the performance of classifier, In this paper, we applied Cost Sensitive Adaptive Neuro Fuzzy Inference System(CSANFIS). The performance of the classifier is measured using AuC(Area under ROC curves) values. We observed AuC value for CSANFIS was high compared to existing different over sampling & under sampling methods.

Manuscript:

ds-paper-514.doc

Data repository URLs:

promise.site.uottawa.ca/SERepository/

Date of Submission:

Thursday, August 3, 2017

Date of Decision:

Thursday, September 14, 2017

Nanopublication URLs:

Decision:

Reject

Solicited Reviews:

Review #1 submitted on 16/Aug/2017

By Stefano Bennati ORCID logo

https://orcid.org/0000-0001-7603-8564

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Bad
Suggested Decision: Reject
Technical Quality of the paper: Unable to judge
Presentation: Bad
Reviewer`s confidence: High
Significance: Low significance
Background: Incomplete or inappropriate
Novelty: Unable to judge
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)

Summary of paper in a few sentences:

The paper describes a technique for prediction of software bugs based on fuzzy logic, to be used in situations where the training datasets contains a small number of positive labels.

Although being interesting and relevant work, the paper raises many questions about soundness of methodology and results, thus I recommend rejection.
Specifically, there are too many unjustified statements and modeling choices. Without proper justification, i.e. citing previous work or showing convincing evidence, it is impossible to determine if the claims are true and if results are novel.

Reasons to accept:

None

Reasons to reject:

A large field of literature is neither described nor compared to the proposed method, which weakens the paper significantly (see the survey paper D’Ambros et al. 2012. Evaluating defect prediction approaches: a benchmark and an extensive comparison).
The technique uses a classic evaluation metric: the ROC curve. This choice over a more sophisticated metric, e.g. effort-aware, is not justified.
The authors state that some attributes of the dataset have been removed as they were weakly relevant, yet no proof of this has been presented.
Furthermore, the results of the simulations are not compared to any other existing technique, thus it is impossible to evaluate the quality of the proposed method.

Nanopublication comments:

Further comments:

There are several grammatical errors in the text, so a thorough proof reading of the paper by a native English speaker is recommended.

The introduction talks about how other authors used a specific dataset and lists its attributes.
Although being a useful information, it is not presented appropriately: this information should be moved to either the related work section or to the methods section, the database is not named, specific work relying on this database should be explicitly cited, the attribute names are not informative enough by themselves.

Figure 2 is a screenshot showing several plots, which are not readable and miss labels and captions. This figure should be rethought in order to convey more information with less charts.

Figure1 and the description of the layers can be improved.

Table 2 is not explained.

Figures 3 to 6 need titles, labels, legends and a baseline for comparison.

Overall the paper does not look curated and gives the impression of being put together in a hurry.

Review #2 submitted on 13/Sep/2017

Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Bad
Suggested Decision: Reject
Technical Quality of the paper: Bad
Presentation: Bad
Reviewer`s confidence: High
Significance: Low significance
Background: Incomplete or inappropriate
Novelty: Lack of novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This article describes a machine learning process to increase the accuracy of a binary classifier on data with very uneven distribution of classes, also referred as the unbalanced class classification problem. This topic is being thoroughly researched in the field of machine learning, meaning that there are several related articles in literature.

Reasons to accept:

No evident reason to accept. The revisions that are needed for this article demand a lot of time to implement.

Reasons to reject:

Although the topic is popular, there is no comparison against other machine learning algorithms or benchmarks. The explanation of the reasons behind the addition of a cost weighting process needs to be more precise and clear. The absence of comparison with state of the art algorithms and the lack of a detailed and comprehensive explanation of the modification to the ANFIS model, make the contribution of the paper vague. The usage of the English language in the paper is not appropriate. Several syntactic, spelling and grammar mistakes, make the text difficult to read and understand. A lot of technical terms and abbreviations remain unexplained, leaving their understanding at the discretion of the user. There are unreferenced or unexplained strong arguments, e.g.:
- Large number of inputs unnecessarily over fits the model.
Such arguments need to be referenced with several references, because there are several debates about them in modern data science, and may also be rejected for certain fields i.e. image recognition classification. No equation styling is used. The graphs are of very low quality, most of them are raster images, making it difficult to zoom and analyse. This paper, at its current state is not suitable for a journal publication. The presentation quality is very low and demands a lot of concentration, knowledge and dedication from the reader to follow the contents of the article. Improvements in plots, references and especially writing are necessary. More experiments need to be done for comparison, like the authors did in their previous ANFIS paper. If the increase in performance is significant, then the contribution of this paper will be clearer. Finally, I would like to see the evaluation for train, dev/validation and test set for this algorithm, to be able to determine whether the algorithm overfits/underfits. Other evaluation metrics can be shown as well in a table format or other chars.

Nanopublication comments:

Further comments:

Data Science

Class Imbalance Learning of Defective Prone Modules Using Adaptive Neuro Fuzzy Inference System

Tracking #: 514-1494

Authors:

Responsible editor:

Submission Type:

Abstract:

Manuscript:

Tags:

Data repository URLs:

Date of Submission:

Date of Decision:

Decision: