Reviewer has chosen not to be Anonymous
Overall Impression: Undecided
Technical Quality of the paper: Limited novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)
Summary of paper in a few sentences:
The paper presents the latest version of OSCAR (the OpenCitations RDF Search Application), a tool developed to enable Semantic Web non-experts to query RDF triplestores. It extends a previous workshop paper by demonstrating how OSCAR can be configured to work with the Wikidata SPARQL endpoint. It describes the functionalities of the latest version (e.g., an advanced query interface for creating multi-field queries, novel preprocessing functions, conversion rules, and a restructuring of the configuration files). It analyses usage statistics retrieved from the OpenCitations website logs to demonstrate the usefulness of OSCAR.
Reasons to accept:
OSCAR makes SPARQL endpoints usable by a broad audience. It provides a helpful tool which might enrich a large number of data portals. It works with any SPARQL endpoint, and once configured it offers a fairly simple interface to end users. The paper provides application examples in the domain of scholarly documents, but as far as I can understand it could be used in any other domain.
The paper is fairly well written and comprehensible.
In the context of data science, it is important to let developers and scientists advertise and get credit for tools like OSCAR that are made available for third-party reuse.
Reasons to reject:
Papers describing tools hardly fit the usual review criteria (e.g., novelty and validation of the contribution), but I think some extra effort can be made to clarify the contribution of OSCAR from a scientific point of view.
In relation to the novelty of the contribution, what does OSCAR provide that the other tools do not? A deeper comparison between OSCAR and the tools mentioned in the related work would substantiate the novelty of the contribution from a scientific perspective.
In relation to OSCAR's usefulness, I have no problem believing it is useful. However, the usage statistics retrieved from the OpenCitations website logs are very weak proof of its usefulness: they do not distinguish between the usefulness of the content served by OpenCitations and the actual usefulness of OSCAR. Of course, this is better than nothing, but actual user satisfaction with OSCAR should be investigated more systematically.
Further suggestions and comments follow:
- The selected keywords (i.e., OpenOffice, ODT to RASH) do not relate to the content of the paper.
- Please state explicitly that OSCAR can be applied beyond scholarly data portals, as it can be configured for any SPARQL endpoint and RDF schema.
- In the related work, I would consider citing YASGUI as an example of an interface for Semantic Web literates: Laurens Rietveld, Rinke Hoekstra: The YASGUI family of SPARQL clients. Semantic Web 8(3): 373-383 (2017).
- In the caption of figure 1, "workshop .The", there is a missing space.
- The configuration examples might not be easy to grasp; I think a reference to the configuration instructions and the inclusion of some comments in the configuration files would help.
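To illustrate this suggestion, a short commented configuration fragment of the kind I have in mind is sketched below. The field names here are invented purely for illustration and do not reflect OSCAR's actual configuration schema:

```python
# Hypothetical sketch of an inline-commented search configuration.
# Field names are illustrative only; they are NOT OSCAR's actual schema.
search_config = {
    # The SPARQL endpoint that the search interface queries.
    "sparql_endpoint": "https://example.org/sparql",
    # Free-text fields offered to the user in the advanced query form.
    "fields": ["title", "author", "year"],
    # Preprocessing applied to user input before it enters the query.
    "preprocess": ["lowercase", "strip_punctuation"],
}

print(search_config["fields"])
```

Even a few comments of this kind would make the intent of each entry much clearer to a reader configuring the tool for the first time.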
- I suggest adding a discussion of the limitations and applicability of OSCAR, for example in a separate section. In such a section, the authors might want to answer the following questions: When is configuring OSCAR less handy than writing a custom user interface? Under what kind of licence is OSCAR made available? Is there any assistance in case one has difficulties when configuring/using OSCAR? etc.
- One thing I noticed while playing with the results from http://opencitations.net/search?text=machine+learning : if one sorts the results by the number of citations and limits the number of results visualized, one gets the first ten results in the result set, not the ten most-cited papers. This is extremely counterintuitive! I suggest fixing it in the next release of OSCAR.
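The behaviour described above amounts to truncating the result set before sorting it, rather than after. A minimal sketch with made-up data shows the difference:

```python
# Made-up (paper_id, citation_count) results, in retrieval order.
results = [("A", 5), ("B", 42), ("C", 17), ("D", 99), ("E", 1)]

# Observed behaviour: take the first N results, then sort only that page.
limit_then_sort = sorted(results[:3], key=lambda r: r[1], reverse=True)
print(limit_then_sort)  # [('B', 42), ('C', 17), ('A', 5)] -- misses D, the most-cited paper

# Expected behaviour: sort the whole result set, then take the first N.
sort_then_limit = sorted(results, key=lambda r: r[1], reverse=True)[:3]
print(sort_then_limit)  # [('D', 99), ('B', 42), ('C', 17)]
```

Applying the sort (e.g., a SPARQL ORDER BY) before the limit would give users the top-cited papers they expect.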
- The Data Science journal requires that all used and produced data are openly available in established data repositories, as mandated by the FAIR principles and the data availability guidelines (https://journals.plos.org/plosone/s/data-availability). As far as I understand, these guidelines have not been followed for the statistics regarding the accesses to OSCAR used in Section 5. Please fix this or make clear how you have met the availability guidelines.