Reviewer has chosen to be Anonymous
Overall Impression: Reject
Technical Quality of the paper: Unable to judge
Presentation: Unable to judge
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right
Summary of paper in a few sentences:
In this paper the authors present a tool for ML metadata management. The system provides a multi-model database structure and a supplemental API to record the activities and parameters within a machine learning project. In addition, the authors present a series of use cases for the system.
Reasons to accept:
Reasons to reject:
In this paper the authors present a tool for ML metadata management. The system provides a multi-model database structure and a supplemental API to record the activities and parameters within a machine learning project. The authors motivate the need for such a system; however, the submission is very light on the technical implementation. Below is a series of comments, section by section.
Data Science Workflow – This section presents a graph model used to represent a machine learning project lifecycle, with nodes representing the key components. The authors allude to how this graph model can be used to represent different activities (e.g. “Hyper-parameter tuning experiments”). However, they provide no example of what this would look like. In addition, the authors state that the graph can be extended, but instead of documenting how this process works, they refer the reader to examples in the project's GitHub repo.
The authors list the components within the application, starting with the API. However, no examples of API endpoints are provided, so it is hard to assess the functionality of the API.
The authors state that ArangoDB is the database used. However, in the previous section they state “To use Arangopipe in your organization, the graph used to track machine learning and data science activities needs to be provisioned. This graph is called Enterprise ML Tracker Graph and is provisioned using the administrative interface”. Does this mean the ML tracker graph is a “component” within ArangoDB? In addition, no information on AQL is provided: what is the syntax, and what does a query in AQL look like?
The authors then state that a web user interface is a component of the system. A screenshot of the web user interface would be appreciated.
The sentence “It is possible to use Arangopipe with Oasis, Arango DB’s managed service offering on the cloud. This would require no installations or downloads. The details of doing this are provided in section” is missing the section number at the end.
The authors state that Arangopipe is available as a series of container images. Are these container images a “component” of the overall system or are they merely a means of distribution and deployment of said system?
The authors state “The administrator would use the administration API” – Is this a separate API from the one mentioned previously?
Illustrative Examples of Arangopipe
This section (which I assume is meant to demonstrate the usage of the system) immediately redirects the reader to GitHub, and the rest of the section describes a notebook hosted there. Having some examples within the paper on how to use the system would be a better approach.
Reusing Archived Steps
This is a key issue within ML. However, the authors again redirect the reader to user guides for the actual information, e.g. “Please see using Arangopipe with TFDV for exploratory data analysis for an example of performing this in an exploratory data analysis task using tensor flow data validation tensorflow data validation. Please see performing hyper-parameter optimization with Arangopipe for an example with a hyperparameter tuning experiment.” Why are these steps not outlined within the text itself?
The authors state “Node data and results from modelling are easily converted into JSON, which is the format that ArangoDB stores documents in.” If JSON is the format they are stored in, how are they converted? Is it not just a read at that point? If some transformation is necessary, what is that transformation?
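To illustrate the kind of detail I am asking for, here is a minimal sketch of one plausible transformation, assuming the modelling results are held in a Python dictionary. The field names and the `to_jsonable` helper are hypothetical and are not taken from Arangopipe; the point is only that some values (sets, timestamps) need an explicit conversion before they can be stored as JSON.

```python
import json
from datetime import datetime, timezone

# Hypothetical modelling result; field names are illustrative only
# and are not taken from Arangopipe.
result = {
    "model": "linear_regression",
    "rmse": 0.42,
    "features": {"age", "income"},  # a set is not JSON-serializable
    "trained_at": datetime(2021, 1, 15, tzinfo=timezone.utc),  # nor is a datetime
}


def to_jsonable(value):
    """Fallback that json.dumps calls for types it cannot encode natively."""
    if isinstance(value, (set, frozenset)):
        return sorted(value)
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# The resulting string is what could be stored as a document.
document = json.dumps(result, default=to_jsonable, sort_keys=True)
print(document)
```

A short passage of this kind in the paper, using the actual conversion the system performs, would answer the question directly.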
Extending the data model
This paragraph serves only to point the reader to the GitHub repo.
Experimenting and documenting facts about models and data
This paragraph points to the GitHub repo and does not contain any technical information.
Checking the validity and effectiveness of machine learning models after deployment
The authors state “Arangopipe provides an extensible API to check for dataset drift”. What is this API, and what are its inputs and outputs?
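As an indication of the level of description I would expect, the sketch below shows one possible shape for a drift check: two samples in, a score and a decision out. It uses the two-sample Kolmogorov-Smirnov statistic purely for illustration. I am not claiming this is the method Arangopipe uses, and the function names and the threshold are my own assumptions.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The largest gap between two step functions occurs at a data point.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def has_drifted(reference, current, threshold=0.5):
    """Hypothetical decision rule: flag drift when the statistic exceeds
    an arbitrary, illustrative threshold."""
    return ks_statistic(reference, current) > threshold
```

Documenting the real API at this level (what is passed in, what is returned, and how the result should be interpreted) would make the claim verifiable from the paper alone.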
Storing Features From Model Development
The authors state “Arangopipe can be used to capture features generated from machine development.” However, no information on how this is achieved is presented.
Overall comments – The paper is incredibly light on the technical components, with the authors repeatedly telling the reader to go to their GitHub repository instead. This makes the paper read more like an advertisement for the project than an academic paper. I am concerned by this level of offloading technical documentation to links on external websites: if the project migrates to a new repo name, or to a different version control system altogether, such crucial documentation would be lost.
While I believe that the project indeed addresses fundamental issues within ML, I cannot recommend this paper due to the severe lack of information provided within the text.