Technology for reduction feature space for classification immunosignature data

Tracking #: 555-1535

Authors:

	Name	ORCID
	Владимир Андрющенко	https://orcid.org/0000-0002-9757-0733
	Alexander Koshechkin	https://orcid.org/0000-0002-9751-1565
	Olga Romanovich	https://orcid.org/0000-0001-5698-991X
	Daniel Stamate	https://orcid.org/0000-0001-8565-6890
	Alexander Zamyatin	https://orcid.org/0000-0002-1416-7472

Responsible editor:

Núria Queralt Rosinach

Submission Type:

Position Paper

Abstract:

Random sequences of peptides in a microchip make it possible to generate specific immunosignatures that can diagnose various diseases. A large number of features does not allow for the quick and efficient analysis of such data. In this study, we propose technology to reduce feature space using various methods. The proposed technology makes it possible with minimal computational costs to ensure the accuracy and reliability of the classification of immunosignature data. The technology was tested on samples formed from a set of real data with the introduction of noise at various levels. The efficiency of the proposed technology on all test samples with various classifiers used for further data analysis is shown.

Manuscript:

ds-paper-555.docx

Data repository URLs:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52580

Date of Submission:

Tuesday, January 29, 2019

Date of Decision:

Thursday, April 11, 2019

Nanopublication URLs:

Decision:

Reject

Solicited Reviews:

Review #1 submitted on 30/Mar/2019

Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Bad
Suggested Decision: Reject
Technical Quality of the paper: Bad
Presentation: Bad
Reviewer's confidence: High
Significance: Low significance
Background: Incomplete or inappropriate
Novelty: Lack of novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper presents an overview of dimensionality reduction algorithms for biological applications (classification of immunosignature data).

Reasons to accept:

The article is well organised in general, even though no discussion section is included to ascertain the limitations of the methods presented for the problem at hand. There are multiple grammar errors, which must be fixed. It is also important to note that there are multiple words in Russian, which non-Russian speakers cannot understand. It is highly recommended to review the whole manuscript thoroughly.

Reasons to reject:

The presentation of the different filtering methods is not clear at all in the manuscript. The method section starts with a short overview of filtering methods, then moves on to present the wrapper methods and, finally, the embedded methods to go back to filtering methods.
The experimental design is not clear at all. There is no justification about why the authors use Gaussian noise as distortion without ascertaining the underlying distributions of the original dataset, which, quite often, are not Gaussian. Moreover, different standard deviations are used but no attention is given to the mean. Under this situation it is quite possible that the distorted dataset will remain separable.
The experimental design does not justify why multiplicative noise is used. Moreover, no graphics are presented for additive noise. If results are not presented for that, why add it in the first place?
The classification algorithms are standard (RF and SVM) and have not been justified in sufficient detail in the paper. The experimental design for these classifiers is not clear either. Have the authors used a test dataset or is it a cross-validation experiment? It is also necessary to present how the authors have controlled for class imbalance and overfitting, which is not clear in the current manuscript.
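
To make the point about the noise model concrete, here is a minimal sketch of one way additive and multiplicative Gaussian noise could be defined. The manuscript does not state its formulation, so the helper `add_noise`, the zero-mean assumption and the form `x * (1 + e)` for the multiplicative case are all assumptions made for illustration, and the two toy classes are invented values.

```python
import random
import statistics

def add_noise(values, sigma, mode="additive", seed=0):
    """Return a noisy copy of `values` (hypothetical helper, not the authors' code).

    additive:        x' = x + e,        e ~ N(0, sigma^2)
    multiplicative:  x' = x * (1 + e),  e ~ N(0, sigma^2)
    """
    rng = random.Random(seed)
    if mode == "additive":
        return [x + rng.gauss(0.0, sigma) for x in values]
    if mode == "multiplicative":
        return [x * (1.0 + rng.gauss(0.0, sigma)) for x in values]
    raise ValueError(f"unknown mode: {mode}")

# Two toy classes for a single feature, differing only in their mean.
class_a = [10.0] * 2000
class_b = [14.0] * 2000

noisy_a = add_noise(class_a, sigma=1.0, mode="additive", seed=1)
noisy_b = add_noise(class_b, sigma=1.0, mode="additive", seed=2)

# Zero-mean noise widens each class but does not shift the class means,
# so the gap between the two means stays close to its original value.
gap = statistics.mean(noisy_b) - statistics.mean(noisy_a)
print(round(gap, 1))
```

Under these assumptions the class means are preserved in expectation, which is why the distorted dataset may well remain separable unless the standard deviation is large relative to the gap between classes.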

Nanopublication comments:

Further comments:

Review #2 submitted on 30/Mar/2019

By Peter Bloem

https://orcid.org/0000-0002-0189-5817

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Bad
Suggested Decision: Reject
Technical Quality of the paper: Bad
Presentation: Weak
Reviewer's confidence: High
Significance: Moderate significance
Background: Incomplete or inappropriate
Novelty: Unable to judge
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)

Summary of paper in a few sentences:

The paper investigates a handful of existing per-feature feature selection methods for the task of classifying human test subjects into a variety of cancer-based classes, using peptide biomarkers as features. The problem with the dataset is the large number of features compared with the low number of instances.

The authors apply various per-feature metrics to select the potentially strongest features and show that with a small number of these, a reasonable Cohen's kappa score can be achieved.

Reasons to accept:

Any positives the paper has are unfortunately outweighed by the reasons to reject outlined below.

Reasons to reject:

## Primary issues:

The paper is poorly written. So poorly, that it is impossible to deduce exactly what the authors did, and hence to properly evaluate the results. A few examples of sentences that make it impossible to understand what the authors did:
- "Results of evaluating informativeness of features can be represented in the form of a matrix, where the element of the matrix is evaluation of filtering method, columns are various binary comparisons, and rows are numbers of attributes in original data set." This seems to be the basis of the authors' ensembling method, so it's a crucial part of the paper. Unfortunately I cannot come up with a meaning for the phrase "various binary comparisons." This sort of handwaving should be replaced by an entirely unambiguous description of exactly what steps were performed.
- "For formation of final data set, a strategy is used to select the N best features for each binary comparison." Which strategy? This seems like a crucial detail. Does it refer to the ensembling method described in the next section?
- "To assess effectiveness of selection of informative features, it is necessary to form samples with various parameters that vary widely. It is advisable to implement this by distorting informativeness of features by entering noise into the data set." This sentence should explain the motivation of the main experiment of the paper. I'm afraid I cannot make sense of it. I don't understand what is being sampled, which parameters should vary widely, and how adding noise helps.

The evaluation is entirely insufficient and poorly presented. The most I can glean from the images is that a reasonable classification can be made from as few as 10 features, with no discernible difference between any of the methods tested (figures 1 and 2). The authors then add multiplicative and additive random noise in an attempt to distinguish between the methods (figures 3-6). It is not clear why a method should be preferred based on its robustness to random noise. In the domain where the classification performance is still reasonable (up to 1.0 standard deviation), there seems to be very little difference between the performance of the different methods.

The authors do not describe their evaluation protocol. How many instances were withheld as a test set? Were hyperparameters evaluated on a validation set to ensure that the test set was only seen once? Without a precise specification of the protocol, the paper cannot be evaluated.
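
As an illustration of the kind of protocol that would answer these questions, the following is a minimal sketch of a stratified k-fold split in plain Python. It is not the authors' procedure; the function names, the choice of five folds and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

def train_test_splits(labels, k, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)

# Toy, imbalanced labels (invented for illustration).
labels = ["healthy"] * 10 + ["cancer"] * 20

for train, test in train_test_splits(labels, k=5):
    # Feature ranking, the choice of N and any hyperparameters would be
    # fitted on `train` only; `test` is touched once, for the final score.
    print(len(train), len(test))
```

The design point is that every data-dependent choice happens inside the training part of each split, so the held-out part gives an estimate that is not biased by the selection step.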

The dataset contains very few instances. This makes it likely that the test set was no bigger than a few hundred instances, which means that for many of the classes we can expect no more than two or three instances in the test set. The authors should provide a confusion matrix to show how many classes are simply being ignored by the resulting classifier. Moreover, with such small data, it is extremely likely that most of the differences in performance between the different methods are due to random chance. The authors should provide error bars on all performance estimates to make this clear.

There seems to be a very strong overlap with the paper cited as [28], but no comparison between the two (the paper contains no related work section). [28] is cited to substantiate the choice of Cohen's kappa, but in [28] accuracy is used instead, and no mention is made of Cohen's kappa. Given this earlier work I am skeptical that there is any novelty to the paper.

The authors are not the first to create an ensemble of feature filtering methods. A few minutes on Google Scholar turn up the following relevant papers:
- Feature Selection Using Ensemble Based Ranking Against Artificial Contrasts, Tuv et al
- Robust feature selection using ensemble feature selection techniques, Saeys et al
- Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Abeel et al

Moreover, section 4.1 in [17] mentions several ensembling techniques that are specific to feature selection in bioinformatics.

## Secondary issues and comments

- Cohen's kappa is difficult to interpret. Since this is a paper about a specific use case, it should discuss exactly what levels of performance make a satisfactory classifier in the domain of the data. The authors should also show some simple baselines, to indicate which ranges of kappa value are actually relevant.
- The authors should be clearer about the specific claims they make about their method. For instance, in their conclusion they state that the "efficiency" is shown. This suggests that the method is efficient in terms of time or memory, but such a claim is never evaluated. If the claim is that the method is _effective_ (i.e. it results in good classification performance), the resulting performance should be placed in the context of the domain.
- The main classification problem in the domain is likely the binary problem of whether a person has cancer or not. It would be good to see that binary problem (i.e. healthy-vs-others) evaluated separately. This would allow the authors to perform much more insightful evaluations, such as binary confusion matrices or precision/recall curves.
- It doesn't seem necessary to explain wrapper methods and embedding methods if only filtering methods are used.
- "Like the wrapper methods, the built-in methods are specific to a particular learning algorithm" I don't think wrapper methods are specific to a particular learning algorithm. They interact with the learning algorithm, but they can be applied to any algorithm that performs classification.
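
To illustrate the point about baselines for Cohen's kappa, here is a minimal sketch that computes kappa from scratch and applies it to a majority-class predictor. The labels are invented for the example, and the handling of the degenerate case (expected agreement equal to 1) is a convention chosen here, not something taken from the manuscript.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement, p_e the agreement expected by chance
    from the marginal label frequencies.
    """
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # only one class on both sides; convention chosen for this sketch
    return (p_o - p_e) / (1.0 - p_e)

# Imbalanced toy labels (invented): 80% healthy, 20% cancer.
y_true = ["healthy"] * 8 + ["cancer"] * 2
majority = ["healthy"] * 10  # baseline that always predicts the largest class

print(cohen_kappa(y_true, majority))  # accuracy is 0.8, yet kappa shows no skill
print(cohen_kappa(y_true, y_true))    # perfect agreement
```

A baseline of this kind gives a reference point: a reported kappa only means something relative to what trivial predictors achieve on the same class distribution.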

## Minor issues

- The authors often refer to creating a "technology" (including in the title). I would say that a paper like this should present and evaluate a method, the specific implementation of which is the technology.
- The title is not grammatical. Something like "A method to reduce the feature space in the classification of immunosignature data" would be more conventional.
- page 3: filtration -> filtering
- page 4: there seems to be a bracket missing in the definition of the JM distance.
- on page 4, some Cyrillic words appear, apparently in place of the word "where".
- The way formula symbols are explained in subclauses separated by a hyphen is very confusing. I would recommend spelling this out more explicitly: "where r_{i, j} refers to the relative ranking of [...], \text{pos}_{i,j} indicates the feature rating value [...]"

Nanopublication comments:

Further comments:

My background is in machine learning, not bioinformatics. Therefore, this review only relates to the machine learning and data science aspects of the paper, not the bioinformatics aspects.

Review #3 submitted on 05/Apr/2019

By Sergi Picart

https://orcid.org/0000-0002-6426-8204

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Weak
Suggested Decision: Undecided
Technical Quality of the paper: Weak
Presentation: Weak
Reviewer's confidence: Medium
Significance: High significance
Background: Incomplete or inappropriate
Novelty: Unable to judge
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This paper suggests an ensemble feature selection algorithm for high-dimensional immunosignature data with a low computational cost, intended to (1) outperform single approaches, and (2) be more robust to noisy data. The ensemble uses three filtering methods (the gain ratio, the relief-F and the M-statistic) to rank each biomarker and aggregates the ranks as a weighted mean. The weights are based on the accuracy of each method. Their approach is applied to a public immunosignatures study, both with and without artificial Gaussian noise.
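
The aggregation step, as I understand it, can be sketched as follows. This is my reading of the description, not the authors' implementation: the function name, the convention that rank 1 is the most informative, the normalisation by the sum of the weights, and all numbers are assumptions made for illustration.

```python
def aggregate_ranks(rankings, weights):
    """Combine per-method feature ranks into one weighted mean rank per feature.

    rankings: {method_name: {feature: rank}}, rank 1 = most informative (assumed)
    weights:  {method_name: non-negative score, e.g. the accuracy of that method}
    """
    total = sum(weights.values())
    if total <= 0:
        # A non-positive sum (possible if the weights were kappas) would make
        # the weighted mean meaningless, so it is rejected here.
        raise ValueError("weights must sum to a positive value")
    features = next(iter(rankings.values())).keys()
    return {
        f: sum(weights[m] * rankings[m][f] for m in rankings) / total
        for f in features
    }

# Invented ranks for three peptides under the three filters.
rankings = {
    "gain_ratio":  {"p1": 1, "p2": 2, "p3": 3},
    "relief_f":    {"p1": 2, "p2": 1, "p3": 3},
    "m_statistic": {"p1": 1, "p2": 3, "p3": 2},
}
weights = {"gain_ratio": 0.8, "relief_f": 0.6, "m_statistic": 0.6}

scores = aggregate_ranks(rankings, weights)
top = sorted(scores, key=scores.get)[:2]  # lowest weighted mean rank first
print(top)
```

Writing the step out this way also makes explicit which details the manuscript would need to state: the rank direction, the normalisation, and what the weights are computed from.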

Reasons to accept:

1. The ensemble algorithm is sound and simple and eases the filter choice.

Reasons to reject:

1. There is no mention of how the data was arranged (e.g. holdout/cross-validation/leave-one-out) to estimate the performance while avoiding overfitting.
2. The results section lacks elaboration. The findings deserve a deeper discussion, interpretation and a more formal analysis. Specific, quantitative and statistically sound claims are missing.

Nanopublication comments:

Further comments:

Feature selection is an outstanding challenge with high-dimensional data. The proposed method can be useful, but needs a more thorough description and the results require further elaboration to support the authors' claims.

Q1. The state of the art shows that ensemble feature selection has already been used in other applications. The idea of aggregating various rankings by a weighted approach has been already explored:

Saeys, Y., Abeel, T., & Van de Peer, Y. (2008, September).
Robust feature selection using ensemble feature selection techniques.
In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 313-325). Springer, Berlin, Heidelberg.

In a later paper, an ensemble feature selection is weighted according to the performance in bootstrapped data:

Abeel, T., Helleputte, T., Van de Peer, Y., Dupont, P., & Saeys, Y. (2009).
Robust biomarker identification for cancer diagnosis with ensemble feature selection methods.
Bioinformatics, 26(3), 392-398.

The authors should discuss such efforts.

Q2. The dataset entry in the references is GSE52580. Accessing it shows a series of 240 samples with discrepancies with Table 1. The right GEO number seems to be GSE52581, with 1,516 samples and coherent categories.

Q3. The dataset should be better characterised. The reader does not know the magnitude and range of the features (for instance, I cannot tell whether the 5-sd noise is high or low relative to the data). If possible, low-dimensional representations should be provided. Many of the filtering methods estimate mean values and variances, which are sensitive to outliers - I would advocate for checking for their presence.

Q4. The details on filtering/wrapping/embedded methods seem a bit out of place in the materials and methods section. At this point, the reader should be aware of the state of the art (see also Q1) and the motivation behind the article.

Q5. Some claims are not obvious and would benefit from a reference or clarification:
* Page 3: "In connection with the absence of dependencies between features due to the biological specificity peptides, it is advisable to use filtration methods as the most computationally efficient". High collinearity between features is a common problem in biological datasets. Please provide a reference showing that this is not the case for peptide data.
* Page 5: "Filtering algorithms have various disadvantages that do not allow to find optimal set of informative features, as a result of which the efficiency of various classifiers varies considerably". Which disadvantages are those? In what mathematical sense should the set of features be optimal?

Q6. The filtering methods need more details:
* Jeffries-Matusita: needs reference. Also, is it univariate or multivariate? If the covariance matrix was computed, only one value of beta would be available for each binary comparison.
* Fisher score: reference is broken. The terms in the formula should be described (S_i, n_j, mu_ij, mu_i, rho_ij, K). The summations should be over j instead of k.

Q7. Why are only gain ratio, relief-F and M-statistic included in the ensemble? Why not just use the five methods?

Q8. There seems to be some circularity in the choice of N and the ensemble weights. In the absence of an external validation dataset or holdout, choosing the best N value based on overall performance will inevitably improve downstream performance in further tests. The same applies to the weights w_A, w_B, w_C, as their choice is performance-driven, endowing the ensemble with an unfair advantage.

Q9. How are the hyperparameters of the random forest and the SVM chosen?

Q10. The artificial noise addition/multiplication needs more detail for proper reproduction. What are the expected values of such distributions? Please put in mathematical terms how both (especially the multiplicative) noises are incorporated.

Q11. In the description of Cohen's Kappa, the citation [28] (Andryuschenko et al.) might need double checking because it does not mention Kappa.

Q12. The weights for the ensemble filtering are based on accuracies. Are those literally accuracies, or Cohen's Kappas? If accuracies, the class imbalance can distort the metrics. If Cohen's Kappas, those can be actually negative, and so can the denominator.

Q13. N=10 was chosen, but the graph seems flat from N=4 on. Is there a quantitative way to justify N=10 (or any other value)?

Q14. The plots should show a dispersion measure for each feature selection algorithm for a proper comparison.

Q15. Claiming that an algorithm is "the best" needs the support of a statistical test, in order to rule out that the observed differences come from the sampling effect alone.

Q16. What is a reasonable amount of noise? Is the scenario with sigma=5 plausible in a real dataset? To rule out biases, it would also be interesting to show that all the classifiers are random (i.e. Kappa around 0) when the dataset breaks down.

Q17. How does the estimated performance compare to that in the state of the art for immunosignature data?

Q18. The use of English throughout the manuscript would benefit from correction from a native speaker.
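
Regarding Q15, one possible option among several is a paired sign-flip permutation test on per-fold scores. The sketch below is only an illustration: the function name is hypothetical, the kappa values are invented and are not results from the manuscript, and since cross-validation folds are not fully independent the resulting p-value should be read as approximate.

```python
from itertools import product

def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired score differences.

    Under the null hypothesis that both methods perform equally, the sign of
    each paired difference is arbitrary, so all 2^n sign patterns are enumerated.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            count += 1
    return count / total

# Invented per-fold kappa values for two hypothetical feature selectors.
kappa_ensemble = [0.81, 0.79, 0.84, 0.80, 0.83]
kappa_single = [0.80, 0.80, 0.82, 0.81, 0.82]

p_value = paired_sign_flip_test(kappa_ensemble, kappa_single)
print(p_value)
```

With only a handful of folds the smallest attainable p-value is 2 / 2^n, which is itself an argument for reporting more repetitions together with a dispersion measure (Q14).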

Minor issues:

- There are several non-English words scattered around (e.g. энтропия признака, i.e. "feature entropy", page 3)
- Page 3: what does "higher risk of retraining" mean? I was unable to find the word "retrain" in reference [13].
- Do tables 2-4 display real data? If so, why not write the real rows and column names?
- Please specify how the attributes are sorted (by rows/columns, ascending/descending). Are the best rankings represented by low or high values? The final attributes are prioritised in the interval [0,1]; is 1 the least or the most informative?
- Table 5: first row seems out of place. It could also be more informative by including the function name, the package version and its reference, if any.
- "Technology" sounds somewhat confusing, consider using algorithm/approach/method instead.
- Reference [29] is never cited in the text.

Review #4 submitted on 09/Apr/2019

By Jose Gavaldá García

https://orcid.org/0000-0001-6431-3442

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Bad
Suggested Decision: Reject
Technical Quality of the paper: Weak
Presentation: Bad
Reviewer's confidence: High
Significance: Moderate significance
Background: Incomplete or inappropriate
Novelty: Lack of novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)

Summary of paper in a few sentences:

Feature selection is important for the correct performance of classifiers. Apparently this is important for biomarker immunosignature technology and can be done better, but they don't explain why it can be done better. Then they test some methods without really knowing why they are taking these approaches and without a clear structure. There is no real conclusion.

Reasons to accept:

If they convince me that this is relevant and they do something novel then I'd reconsider. I still need to see that.

Reasons to reject:

Poor structure. Feels like a draft of a draft. It feels like it has been translated using Google Translate, even leaving some non-Latin characters. Poor description of the figures. Lack of a clear research question to answer. Methods are spread around the paper, often with no argument for why they are suitable for this kind of data. The data is said to be found in public repositories but they don't indicate where. The section on how a feature selector works is too general. There is no description of why Cohen's kappa is suitable for this data.

Nanopublication comments:

Further comments:

First of all, the structure, grammar and English in general have to be improved greatly. Second, the relevance of this study and why it's an improvement on previous studies (or novel) needs to be stressed. In addition, there is no clear research question and the figures are simply there without any conclusion being drawn from them (because there's no question to answer). There are no units in the information carrying table. The figures are not described.

1 Comment

Meta-Review by Editor

Submitted by Tobias Kuhn on Thu, 04/11/2019 - 00:37

I decided to reject this manuscript as three out of four reviewers (with high reviewer confidence on the topic) had an overall bad impression and suggested rejecting it, while one reviewer (with medium reviewer confidence) is undecided. They all agree that the manuscript is poorly written, which makes it difficult to review, with poor, unclear and insufficient description of fundamental parts of a paper: the research question, method, experimental design, evaluation and discussion.

Núria Queralt Rosinach (https://orcid.org/0000-0003-0169-8159)

Data Science
