Review Details
Reviewer has chosen to be Anonymous
Overall Impression: Bad
Suggested Decision: Reject
Technical Quality of the paper: Unable to judge
Presentation: Average
Reviewer's confidence: High
Significance: High significance
Background: Unable to judge
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right
Summary of paper in a few sentences:
In this paper, the authors present a tool for ML metadata management. The system provides a multi-model database structure and a supplemental API to record the activities and parameters within a machine learning project. In addition, the authors present a series of use cases for the system.
Reasons to accept:
N/A
Reasons to reject:
The authors motivate the need for such a system; however, the submission is very light on technical implementation details. Below are my comments, section by section.
Data Science Workflow – This section presents a graph model used to represent a machine learning project lifecycle, with nodes representing the key components. The authors allude to how this graph model can be used to represent different activities (e.g. “Hyper-parameter tuning experiments”). However, they provide no example of what this would look like. In addition, the authors state that the graph can be extended, but instead of documenting how this process works, they refer the reader to examples in the project’s GitHub repository.
Software Implementation
The authors list the components within the application, starting with the API; however, no examples of API endpoints are provided, so it is hard to assess the functionality of the API.
The authors state that ArangoDB is the database used; however, in the previous section they state “To use Arangopipe in your organization, the graph used to track machine learning and data science activities needs to be provisioned. This graph is called Enterprise ML Tracker Graph and is provisioned using the administrative interface”. Does this mean the ML tracker graph is a “component” within ArangoDB? In addition, no information on AQL is provided: what is its syntax, and what does an AQL query look like?
The authors then state that a web user interface is a component of the system; a screenshot of the web user interface would be appreciated.
The sentence “It is possible to use Arangopipe with Oasis, Arango DB’s managed service offering on the cloud. This would require no installations or downloads. The details of doing this are provided in section” is missing the section number at the end.
The authors state that Arangopipe is available as a series of container images. Are these container images a “component” of the overall system or are they merely a means of distribution and deployment of said system?
The authors state “The administrator would use the administration API” – Is this a separate API from the one mentioned previously?
Illustrative Examples of Arangopipe
This section (which I assume is meant to demonstrate the usage of the system) immediately redirects the reader to GitHub and then spends the remainder of the section describing a notebook hosted there. Including some examples within the paper of how to use the system would be a better approach.
Reusing Archived Steps
This is a key issue in ML; however, the authors again redirect the reader to user guides for the actual information, e.g. “Please see using Arangopipe with TFDV for exploratory data analysis for an example of performing this in an exploratory data analysis task using tensor flow data validation tensorflow data validation. Please see performing hyper-parameter optimization with Arangopipe for an example with a hyperparameter tuning experiment.” Why are these steps not outlined within the text itself?
The authors state “Node data and results from modelling are easily converted into JSON, which is the format that ArangoDB stores documents in.” If this is the format they are stored in, how are they converted? Is it not just a read at that point? If some transformation is necessary, what is that transformation?
Extending the data model
This paragraph serves only to point the reader to the GitHub repository.
Experimenting and documenting facts about models and data
This paragraph points to the GitHub repository and does not contain any technical information.
Checking the validity and effectiveness of machine learning models after deployment
The authors state “Arangopipe provides an extensible API to check for dataset drift”. What is this API, and what are its inputs and outputs?
Storing Features From Model Development
The authors state “Arangopipe can be used to capture features generated from machine development.” However, no information on how this is achieved is presented.
Overall comments – the paper is incredibly light on the technical components, with the authors repeatedly telling the reader to go to their GitHub repository instead. This makes the paper read more like an advertisement for the project than an academic paper. I am concerned about offloading this much of the technical documentation to links on external websites: if the project migrates to a new repository name, or to a different version control system altogether, such crucial documentation would be lost.
While I believe that the project indeed addresses fundamental issues within ML, I cannot recommend this paper due to the severe lack of information provided within the text.
Nanopublication comments:
Further comments:
Meta-Review by Editor
Submitted by Tobias Kuhn on
Your paper is quite interesting and could be a valuable contribution for ML engineers, but the manuscript needs to be significantly revised. If you intend to submit a revision, please ensure that you address each reviewer's comments point by point.
Strengths
Weaknesses
Recommendation 1: Address the technical gaps in the manuscript.
Although well motivated, the manuscript lacks technical details and examples/illustrations in various sections. These gaps must be addressed in a future revision. See Review 2 for minor comments and Review 3 in particular for further corrections.
Recommendation 2: Improve the scientific writing and referencing significantly.
-Clean up the referencing and use proper peer-reviewed citations to correctly support the claims in the manuscript.
-Adapt the content to the appropriate style for a scientific article.
-See Review 2 comments in particular for further corrections.
Brian Davis (https://orcid.org/0000-0002-5759-2655)