Geometry and Machine Learning: A Survey for Data Scientists and Machine Learning Researchers

Tracking #: 536-1516

Authors:

Colleen Farrelly (ORCID: https://orcid.org/0000-0003-0725-6706)

Responsible editor:

Karin Verspoor

Submission Type:

Position Paper

Abstract:

Many machine learning algorithms and statistical models rely heavily upon matrices and linear algebra for computation. Linear algebra is well-suited to modern computing: operations can be computed quickly, offer reasonable accuracy, and rely on only a few assumptions about the data (usually relating to linear independence of columns/rows, sample sizes being larger than the number of predictors, and determinant values of the matrix). However, these assumptions sometimes fail in the real world, and accuracy is not always as good as a machine learning practitioner might need. Data may not even lie within a linear space. Fortunately, a plethora of alternatives to linear-algebra-based algorithms are being actively developed, providing machine learning researchers and data scientists with many useful tools. This new set of tools and algorithms is highlighting problem areas within many popular machine learning frameworks, creating tools that can work on extremely small datasets or datasets with many correlated predictors, and exploring the synergy between disparate fields of mathematics and computer science. Many of these algorithms rely on data and model geometry, and the methods detailed in Part 1 (algebraic geometry) and Part 2 (differential geometry) of this overview are only a small subset of those being actively explored and developed.

Manuscript:

ds-paper-536.docx

Data repository URLs:

None.

Date of Submission:

Monday, June 11, 2018

Date of Decision:

Friday, August 3, 2018

Nanopublication URLs:

Decision:

Reject

Solicited Reviews:

Review #1 submitted on 27/Jul/2018

Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Weak
Suggested Decision: Reject
Technical Quality of the paper: Weak
Presentation: Weak
Reviewer's confidence: High
Significance: Moderate significance
Background: Incomplete or inappropriate
Novelty: Limited novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)

Summary of paper in a few sentences:

The paper is essentially a survey of geometrically motivated methods and their applications to machine learning. The author presents a set of methods and mentions their applications under two categories: algebraic geometry and differential geometry.

Reasons to accept:

I vote for rejection.

Reasons to reject:

The paper attempts to provide a comprehensive survey of geometrically motivated methods for learning. First and foremost, such methods are abundant and deep, and the survey reflects neither their breadth nor their depth. The survey is more of a short list of methods and their applications, selected in what seems to be a random or author-dependent order without any clear motivation for the selection. The survey lacks proper depth and professionalism in describing the basic mathematics, or even the intuition, behind these methods. The reader is left with a list of method names but without any clear understanding of their mathematical basis. Under such circumstances, it is even hard to understand the advantages and disadvantages that each method has in various applications. The level of scientific writing, I'm afraid, is lacking and below standard.
In some places there are attempts to explain the intuition behind existing methodology and its mathematical concepts; however, this is done in a rather vague manner. For example: "Forman-Ricci curvature (Weber, Jost, & Saucan, 2016), which measures the amount of "stuff" weighing down and spreading out each edge in the network, signaling network growth/shrinkage and importance of that edge to the overall shape of the network (with highly-curved edges serving as part of the network's underlying skeleton)." How could anyone unfamiliar with the concept understand that clearly? Some basic formalism could help here.
The illustrations are simplistic and often lack a clear purpose: for example, the tangent space drawing is rather trivial. The illustrations have no numbering or captions. On page 6 there is an illustration of 2D points with coloring related to labels. What is this related to? What point is the author trying to make with this illustration? I am afraid this is below any publication standard.
What is the illustration on page 4 trying to convey? Some of the concepts there are not even discussed in the survey.
The language is vague, using undescribed nouns and concepts as if the author expects the reader to be completely on the same page with him/her. The last paragraph on page 3 is unclear and is missing relevant references that would help the reader understand: what are paired-ranking problems, and what 'items' are you referring to?
Another example: "The image below shows a 3-D brain image with 4 landmarks and the disc created by the Ricci-flow-based map between spaces." What spaces are you referring to? Where are the 4 landmarks in the example? They are not marked on the image; how could anyone understand the illustration or the point you are trying to convey?
The author describes many mathematical terms and algorithms but doesn't provide adequate references: for example, there are no references for the Bertini algorithm or to a book on algebraic geometry. What about references for the MDS algorithm, PCA, etc.? I am afraid this is again below any publication standard.
There are many more instances like the ones I describe above, which render this manuscript unpublishable in my view. I hope the author can use my comments to improve the writing and presentation.

Nanopublication comments:

Further comments:

Review #2 submitted on 03/Aug/2018

By Parikshit Ram

https://orcid.org/0000-0002-9456-029X

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Average
Suggested Decision: Undecided
Technical Quality of the paper: Average
Presentation: Weak
Reviewer's confidence: Medium
Significance: High significance
Background: Incomplete or inappropriate
Novelty: Clear novelty
Data availability: With exceptions that are admissible according to the data availability guidelines, all used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)

Summary of paper in a few sentences:

This manuscript attempts to introduce the ideas of algebraic geometry and differential geometry to a wider audience in machine learning and data science. The manuscript explains these concepts and their applicability to machine learning at a very high level.

Reasons to accept:

The idea of introducing these concepts of algebraic geometry and differential geometry to a wider audience is extremely valuable in my opinion, and I believe that some version of this manuscript should be published and made available to a wider audience. However, I voted "undecided" for the reasons below.

Reasons to reject:

I want to start with the caveat that I would love for this manuscript to get published, and the comments below are just some suggestions to make this manuscript more accessible and useful:
- The manuscript lacks appropriate references to the "plethora" of examples of the application of algebraic geometry or differential geometry to solving ML problems. These references should allow the reader to easily recognize the value of these techniques for general ML tasks.
- Moreover, an exposition of a complete (possibly simple) application of this kind in the manuscript would be very useful to the wider audience.
- Many examples and technical terms in the manuscript need appropriate references. Alternatively, the manuscript could contain more mathematical detail.
- The examples in the (unnumbered) images need context and more elaborate explanation in the text to be useful to the reader.

Nanopublication comments:

Further comments:

2 Comments

Unable to handle

Submitted by Matjaz Perc on Tue, 06/12/2018 - 03:15

Sorry, I am unable to handle this because of traveling.

Meta-Review by Editor

Submitted by Tobias Kuhn on Fri, 08/03/2018 - 08:59

The reviewers have acknowledged that the topic of the paper is relevant, and a paper that provides a comprehensive survey of the relationship between algebraic geometry and machine learning would have significant value. However, both reviewers observe that the survey is currently too shallow and does not adequately reflect the current state of the art in the field. It is a short manuscript that does not introduce the core concepts in sufficient depth or with sufficient mathematical rigor. References to papers in the field are inadequate, and as such the manuscript cannot really be considered a "survey". The author is advised to reconsider the scope, audience, and objectives of the manuscript.

Karin Verspoor (http://orcid.org/0000-0002-8661-1544)

Data Science