Ten years of Stream Reasoning. Now what?

Tracking #: 430-1410

Authors:

	Name	ORCID
	Daniele Dell'Aglio	https://orcid.org/0000-0003-4904-2511
	Emanuele Della Valle	https://orcid.org/0000-0002-5176-5885
	Frank van Harmelen	https://orcid.org/0000-0002-7913-0048
	Abraham Bernstein	https://orcid.org/0000-0002-0128-4602

Responsible editor:

Tobias Kuhn

Submission Type:

Position Paper

Abstract:

Stream reasoning studies the application of inference techniques to data characterised by being highly dynamic. It can find application in several settings, from Smart Cities to Industry 4.0, from Internet of Things to Social Media analytics. This year stream reasoning turns ten, and in this article we analyse its growth. In the first part, we trace the main results obtained so far, by presenting the most prominent studies. Looking at the past is useful to prepare for the future: in the second part, we present a set of open challenges and issues that stream reasoning will face in the next future.

Manuscript:

ds-paper-430.pdf

Revised Version:

Ten years of Stream Reasoning. Now what?

Tags:

Reviewed

Data repository URLs:

none

Date of Submission:

Thursday, March 2, 2017

Date of Decision:

Monday, March 27, 2017

Nanopublication URLs:

Decision:

Undecided

Solicited Reviews:

Review #1 submitted on 13/Mar/2017

By Fredrik Heintz

https://orcid.org/0000-0002-9595-2471

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Reject
Technical Quality of the paper: Good
Presentation: Good
Reviewer's confidence: High
Significance: Moderate significance
Background: Incomplete or inappropriate
Novelty: Limited novelty
Data availability: All used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The main contributions of the paper are an overview of research within the new and growing area of stream reasoning relative to a proposed reference model, and a set of open research challenges in the area. Stream reasoning is both relevant and interesting for the data science community and beyond.

Overall the paper is a reasonable summary of where we are and where the field is likely to go in the short-term future. My main criticisms are that there is no clear definition of what stream reasoning is, that the requirements and the discussions are very informal, and that it mainly considers work originating from the field of semantic web.

My recommendation is revise and resubmit. I'd be happy to review it again!

Reasons to accept:

The topic is highly relevant and the paper gives a quite good summary of stream reasoning from some of the leading researchers in the field. Having a current summary is valuable to the field. The suggested topics for future research are relevant and could help guide researchers.

Reasons to reject:

My main criticisms are that there is no clear definition of what stream reasoning is, that the requirements and the discussions are very informal and that it mainly considers work originating from the field of semantic web.

Let's start with the definition. In my view, it is relevant to distinguish between stream processing and stream reasoning. Stream processing is about doing simple transformations on streams like selection, projection, joins and mappings. The common characteristic is that the result is always a subset of the input or a straightforward mapping from input to output. Stream reasoning, on the other hand, is about inferring implicit or new information. No such distinction is made in the paper. As a consequence, almost anything is stream reasoning. Is, for example, a UNIX shell supporting pipes a stream reasoner? Is the UNIX tool tr a stream reasoner (it takes a stream as input and replaces characters in it)? The definition that is used refers to "logical reasoning in real-time on [...] noisy data streams", and still the authors claim that only some stream reasoning approaches work in real-time and most do not consider noisy data at all. Does that mean that there are no stream reasoning tools in existence? I beg to differ. The fact that we lack a clear definition of stream reasoning is one of the major open problems. This is not addressed.
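To make the distinction concrete, here is a minimal Python sketch. It is an illustration only: the triple format, the `SUBCLASS_OF` hierarchy and the function names are hypothetical and are not taken from the paper or from any existing system. A selection can only return items that were in the input, whereas the reasoning step emits facts that never appeared on the stream.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]

# Hypothetical background knowledge: a tiny subclass hierarchy.
SUBCLASS_OF = {"Tram": "PublicTransport", "Bus": "PublicTransport"}


def process(stream: Iterable[Triple], wanted_type: str) -> Iterator[Triple]:
    """Stream processing: a selection, so every output item is an input item."""
    for s, p, o in stream:
        if p == "type" and o == wanted_type:
            yield (s, p, o)


def reason(stream: Iterable[Triple]) -> Iterator[Triple]:
    """Stream reasoning: also emit the type facts entailed by the hierarchy."""
    for s, p, o in stream:
        yield (s, p, o)
        # Walk up the hierarchy and emit each implied (new) type fact.
        while p == "type" and o in SUBCLASS_OF:
            o = SUBCLASS_OF[o]
            yield (s, p, o)


stream = [("t1", "type", "Tram"), ("c7", "type", "Car")]
# Selection alone finds nothing: no input item mentions PublicTransport.
selected = list(process(stream, "PublicTransport"))
# After inference, t1 is found although the fact was never in the input.
inferred = list(process(reason(stream), "PublicTransport"))
```

Under this reading, `tr` and a shell pipe fall on the `process` side, since their output is a direct mapping of the input.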

When it comes to the requirements, they are generally good and relevant, but they are also only informal. This makes it very hard to determine if a formalism satisfies the requirement or not. For example, take R8 Complex domains. I would argue that all SQL-based solutions, like most DSMSs, have quite good support for complex domains. Still the authors claim that this is not the case as seen in Table 1. What is the argumentation there? When it comes to R3 Variety, I would argue that CEP systems are very good at handling variety as they usually consider a large set of different types of events and how these are combined into more complex events. To take a third example, R1 Volume. This seems obvious. If a system can handle millions of items then it must be able to handle volume. But what about a system that is scalable in the sense that it can run on an arbitrary number of processors, but is extremely slow? Would it be good at handling high-volume data? It can scale to an infinite amount of data, if you have an infinite amount of computational power. My conclusion is that the requirements make sense, but are quite hard to use in practice since they are too vague. Again, coming up with formal requirements that can actually be tested would be a major research contribution. The requirements could be in the form of specific use-cases.

Lastly, the paper is written by people in the area of semantic web mainly for other people in the semantic web community. For example, RDF, RDF graphs and SPARQL are never explained. It is also the case that many results from classical AI areas such as belief revision, reasoning about action and change and runtime verification are not included.

Nanopublication comments:

Further comments:

Detailed comments section by section.

Section 1
S1a: In the introduction some examples of stream reasoning are provided. It is not clear to me whether all of them can currently be expressed in some existing stream reasoning formalism.

Section 2
S2a: You define a landmark window as a window of infinite length containing the full stream. What does this definition give? Another name for a stream? In my opinion a window should by definition be finite and a window should never be processed before it is complete. If it can be processed before, then no window is necessary.

S2b: I do not understand the following sentence "The graph-level entailment can be viewed as a direct application of SPARQL entailment regimes, since the inference process is taken into account in the context of the evaluation of graph patterns over graphs."

S2c: What do you mean by "applying the inference process context of the fixed windows"? I guess you mean it is computed based on the content of a window, but it is hard to understand.

S2d: In 2.2 it sounds like the forgetting that is a natural part of windows is a problem. I would argue that the whole point of a window is to allow a user to define in what temporal context a statement should be evaluated. The semantics of the query is directly dependent on the window definition. If you want to reason over a longer horizon you either have to create a larger window or use a window-less method, like the logic-based stream reasoning approach [46].
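As a small illustration of how the window definition determines the answer, the following Python sketch evaluates the same counting query over the same stream with two window widths. The function name, the `(timestamp, item)` format and the half-open window `(ts - width, ts]` are assumptions made for this example, and timestamps are assumed to be non-decreasing.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_count(stream: Iterable[Tuple[int, str]], width: int) -> Iterator[Tuple[int, int]]:
    """Yield (timestamp, number of items seen in the last `width` time units)."""
    window: Deque[int] = deque()
    for ts, _item in stream:
        window.append(ts)
        # Forget everything that has left the window (ts - width, ts].
        while window[0] <= ts - width:
            window.popleft()
        yield ts, len(window)


events = [(1, "a"), (2, "b"), (6, "c"), (7, "d")]
narrow = list(sliding_count(events, width=2))
wide = list(sliding_count(events, width=10))
```

Both result lists are correct with respect to their own window; they differ because the two queries ask different questions.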

S2e: The works on logic-based stream reasoning presented in Section 3.2.1 are also examples of stream-level entailment. The work in [46], for example, only processes each stream element once, and there is no need to store any previous elements as the necessary history is implicitly encoded in the formula.

Section 3
S3a: Regarding 3.1.1, there is work on spatio-temporal stream reasoning for example:
Fredrik Heintz and Daniel de Leng. Spatio-Temporal Stream Reasoning with Incomplete Spatial Information. Proc. ECAI 2014.
Daniel de Leng and Fredrik Heintz. Qualitative Spatio-Temporal Stream Reasoning With Unobservable Intertemporal Spatial Relations Using Landmarks. Proc. AAAI 2016.

S3b: I don't understand what separates "analytics-aware stream reasoning" from other forms of stream reasoning. It seems that the main thing is that they are based on a cloud infrastructure. The type of reasoning could be provided in other stream reasoning approaches as well. That said, analytics is definitely one of the main uses of stream reasoning (therefore most approaches ought to be "analytics-aware").

S3c: Is there a difference between an API and a query language? Both are ways of interacting with a system, one using a formal language the other function calls (and I guess in most cases sending a query is also a function call).

S3d: Why do you consider all CEP-based stream reasoners to care about ontology axioms? Most CEP-engines don't. This is another example where the semantic web perspective is clear (which is fine, as long as it is made clear that the focus is on semantic web type reasoning, which it currently isn't).

S3e: There is no "ontology inference process" in the reference architecture. Why is one needed in all stream reasoners?

S3f: How can the separation "safeguard" the complexity? Do you get a different total complexity if you split up the problem in two parts, or are you actually solving a different problem?

S3g: How can a fact still be relevant if it is outside the window? By the semantics of a window only the information inside it matters. If the user had a different intention she should write a different query. The semantics of windows is well-defined, so nothing outside can be relevant (unless the semantics is incorrect).

S3h: How can the semantics of forgetting be more precise? A window is a precise definition of what information is relevant. Yes, there are queries which cannot be expressed with windows, so there is definitely a need for other constructions to capture these. This would extend the formalism, which would allow more expressive queries to be asked, but I'm not sure I consider that "improving the quality of the engine answers" since the initial answer was perfectly correct given its semantics.

S3i: In 3.2.1, why is OWL necessary?

S3j: In 3.2.2, what do you consider the model to be in stream reasoning? It is not clear to me. If you have a query for selecting some elements from a stream, what is the model?

S3k: In 3.3, yes this has been done. There are some papers on semantic streams such as FUSION'13, IROS'13, SCAI'15...
Daniel de Leng and Fredrik Heintz. Ontology-Based Introspection in Support of Stream Reasoning. Proc. SCAI 2015.
Daniel de Leng and Fredrik Heintz. Towards On-Demand Semantic Event Processing for Stream Reasoning. Proc. FUSION 2014.
Fredrik Heintz. Semantically Grounded Stream Reasoning Integrated with ROS. Proc. IROS 2013.
Fredrik Heintz and Daniel de Leng. Semantic Information Integration with Transformations for Stream Reasoning. FUSION 2013.

S3l: In 3.4, you state that stream reasoning will not reach the performance of stream processing. What is the difference between the two in your opinion? There is no definition of either, so it is hard to argue for this claim.

S3m: Last sentence in 3.4.1: of course you can do things faster if you have an incorrect solution. Just return nothing for every query; it can't be faster (and probably not much more incorrect either). A reasoner without a correctness guarantee is not a reasoner, it is a poor hack.

S3n: In 3.4, there is plenty of work on how to guarantee the resource usage of stream reasoning in the runtime verification literature. The main concept is trace-length independence. If a reasoner is trace-length independent then its memory growth is bounded even on infinite traces/streams. A new concept, event-rate independence, has recently been suggested for reasoners that depend on neither the trace length nor the rate of events (number of events per time unit). See for example Online Monitoring of Metric Temporal Logic by Hsi-Ming Ho, Joël Ouaknine and James Worrell from Runtime Verification (RV) 2014.
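A simplified illustration of trace-length independence, as a Python sketch: a monitor for the untimed past-time operator "p since q" can be evaluated with the standard recurrence `holds_i = q_i or (p_i and holds_{i-1})`, so it keeps a single boolean no matter how long the stream is. This is not the algorithm of the cited paper, which deals with metric (timed) operators; the function name and the trace format are assumptions for this example.

```python
from typing import Iterable, Iterator, Tuple


def monitor_since(trace: Iterable[Tuple[bool, bool]]) -> Iterator[bool]:
    """Evaluate "p since q" at every position of a trace of (p, q) pairs.

    Each element is processed once and only one boolean of state is kept,
    so memory use does not depend on the length of the trace.
    """
    holds = False  # before the first element, q has never occurred
    for p, q in trace:
        holds = q or (p and holds)
        yield holds


trace = [
    (True, False),   # q has not occurred yet
    (False, True),   # q occurs now
    (True, False),   # p keeps holding after q
    (True, False),
    (False, False),  # p breaks, so the formula fails
    (True, False),   # and stays false until q occurs again
]
verdicts = list(monitor_since(trace))
```

The necessary history is encoded in `holds`, which is why no previous stream elements need to be stored.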

S3o: In 3.4, windows should not be used to control resource usage but to define the semantics of a query.

S3p: In 3.5.1, semantic matching as described in the references in S3k is another approach.

S3q: In 3.5.1, work on synchronization of streams has been done, as described in Chapter 7.8 in Fredrik Heintz. DyKnow - A Stream-Based Knowledge Processing Middleware Framework. PhD Thesis 2009.

S3r: In 3.5.1, how does RSEP-QL relate to the reference model proposed in this paper?

S3s: In 3.5.2, there is plenty of work done in the areas of reasoning about action and change and belief revision that could be used as a basis for further research.

S3t: In 3.5.2, there is also plenty of research on reasoning with temporal intervals, Allen's Interval Algebra is probably the best known work, that could be used for further research.
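For readers unfamiliar with Allen's Interval Algebra, a minimal Python sketch of its thirteen base relations follows. The representation of an interval as an integer pair `(start, end)` with `start < end` and the function name are assumptions for this example; the algebra itself also defines composition of relations, which is not shown here.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end), assumed to satisfy start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen base relation that holds between intervals a and b."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only proper overlaps remain: all four endpoints are distinct.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

For example, `allen_relation((1, 5), (3, 8))` yields "overlaps", while swapping the arguments yields the inverse relation "overlapped-by".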

The language of the paper can be improved. A detailed list of comments is available as a PDF document.

Review #2 submitted on 25/Mar/2017

By Ruben Verborgh

https://orcid.org/0000-0002-8596-222X

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Excellent
Presentation: Good
Reviewer's confidence: Medium
Significance: High significance
Background: Comprehensive
Novelty: Limited novelty
Data availability: With exceptions that are admissible according to the data availability guidelines, all used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

In this article, the authors present an extensive survey of the state of the art of stream reasoning, followed by an analysis of which topics have been covered so far and which topics need more attention. The article appears to be very complete and the analysis is well done and carried out in depth. The introduced model is a clear and simple one to structure the discussion.

Reasons to accept:

– comprehensive survey
– comprehensive analysis of the domain, with solved and open questions
– technically clear
– well written

Reasons to reject:

– not clearly a vision paper, lacks visionary statements; rather a survey
– insufficient discussion about Big Data, deep learning, benchmarking/evaluation, pull/push

Nanopublication comments:

Further comments:

This article appears to me less as a “vision” article and more as a survey and a critical assessment of the domain’s current status. The vision aspect is a rather careful (but well founded) discussion, which I certainly would call useful but not visionary: there are no remotely wild novel ideas in there. Given the broadness of the discussed work, I think it makes sense to reclassify this as a survey paper, which contains a perhaps atypically long “open questions” section, but this is not unexpected given the relatively young age of stream reasoning. Then again, the importance of such labels for the Data Science journal is perhaps minimal, so maybe it is not that big of an issue. Nonetheless, I would suggest rethinking the title to make it reflect the scope of the article better.

Overall, the article certainly has sufficient quality to be accepted; however, I suggest some minor points of revision below.

GENERAL
– The connection between reasoning and Big Data analytics, and reasoning and deep learning, is only very lightly touched upon (mostly in 3.2.2). Given the importance of these technologies, it seems necessary to compare more intensively. Especially if this is considered to be a vision for the future, the interaction of these domains seems like a subject too big to ignore.

– For a survey paper, a major missing area is evaluation/benchmarking. The point is touched upon in the conclusion, but the authors never elaborate on it.

ABSTRACT
– phrasing: "the next future"

INTRODUCTION
– “In this article, we introduce a reference model for stream reasoning, which we use to summarize and organize the results the research has obtained so far.” => Looks like evidence to call this a survey rather than a vision.

– The introduction starts with motivating use cases, but it is not clear to what extent these scenarios a) have been solved b) have been solved with stream reasoning c) will be solved in the near future d) will be solved with stream reasoning e) need to be tackled with stream reasoning as opposed to other techniques (especially machine learning, for instance regarding traffic jams).

– What is the source of the 9 R’s?
volume/velocity/variety are traditional Big Data V’s, for instance.

– What is the relation/difference between R2 and R6? Coping with velocity seems to be directly related to providing answers in a timely fashion, i.e., any system can handle high velocity data if we slow down that data—but the answers will arrive too late. This difference would need to be made explicit; also note that R2 and R6 appear to be coupled in Table 1.

– The label for R9, “understand what users want”, is inaccurate and vague. It is not about “understanding” or interpretation of users’ intent at all; instead, it is about the flexibility to define certain query types. It is necessary to change this label into something more accurate.

– I found the comparison paragraph below the requirements rather hard to follow. Table 1 does a better job on the comparison; it would perhaps be clearer to start out with the table and discuss the different types of systems in different paragraphs.

– In Table 1: indicating only the “true” values (V) and hiding the “false” values (F) would present the same information in a much more scannable way. Also, try to make the columns narrower so the marks are closer, facilitating visual comparison.

– In Table 1, consider renaming SemWeb into OBDA to align with the caption and the text.

– typo: “a oil-rig”
– typo: “These system where”

A REFERENCE MODEL FOR STREAM REASONING
– “When items are represented through RDF (e.g. an RDF graph)” => the notion of a “graph” seems strange here. What is the graph? The entire stream, or a single element? In the former case, I would say that the RDF definition of “graph” is not aligned with the notion of a stream; in the latter case, to what extent would it be meaningful to do so?

– The two last paragraphs of 2.1.1 are hard to follow; perhaps a comparison table might make sense.

– typo: "a temporal criteria" => "a temporal criterion"

OPEN PROBLEMS AND RESEARCH QUESTIONS
– 3.2.1: Why is it important to investigate other approaches? What is missing from OWL 2 DL that other languages can/should bring to the table?

– 3.5.1 push/pull is a discussion that, in my opinion, needs its own place at an earlier point in the article. They are now only casually mentioned, but have a major impact.

– The statement that “stream processing has always coped with noise” seemingly contradicts Table 1.

CONCLUSION
– This section, unfortunately, disappoints, especially for a vision paper. What does the analysis of the article mean to the audience? The third paragraph summarizes a couple of trends and predictions, but does not really point to concrete action points or emerging fields that require a close follow-up. I expect more advice and inspiration, and perhaps even criticism or suggestions on a meta-level. For instance, the authors write that “it is necessary to develop benchmarking and evaluation activities” (a topic, by the way, that is not covered elsewhere), without pointing at a direction in which authors should think/look to do so.

– Similarly, the last paragraph about “real problems and scenarios” is very limited. Where will stream reasoning make a real difference according to the authors? This is a very important question with a visionary aspect that is absent from the conclusion.

Review #3 submitted on 26/Mar/2017

By Paul Fodor

https://orcid.org/0000-0002-2978-676X

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Excellent
Suggested Decision: Accept
Technical Quality of the paper: Excellent
Presentation: Excellent
Reviewer's confidence: High
Significance: Moderate significance
Background: Reasonable
Novelty: Limited novelty
Data availability: All used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This paper surveys the systems and techniques available in the field of stream reasoning and complex event processing. This field has applications in several settings that require highly dynamic data: Smart Cities, Industry 4.0, Internet of Things, social media analytics and the stock market. Although it is a new field (stream reasoning turns 10 this year), it has been very prolific throughout these years. The best part of this article is identifying the main issues in stream reasoning and starting the process of identifying open issues in the field.

Reasons to accept:

It is the first paper that surveys this prolific field. Although there is a book that surveyed a few of the systems discussed in this paper before (Event Processing in Action, by Opher Etzion and Peter Niblett), this paper covers the subsequent research since 2010 and also identifies open challenges.

Reasons to reject:

As a practitioner in this area, I would have liked some simple use cases that cover the many applications of stream processing, like traffic analysis of transit information, weather updates or the stock market. However, I think that was not the goal of this paper, which has a lot of information about the different stream reasoning systems.

Nanopublication comments:

Further comments: