Reviewer has chosen not to be Anonymous
Overall Impression: Average
Suggested Decision: Undecided
Technical Quality of the paper: Unable to judge
Presentation: Average
Reviewer's confidence: Low
Significance: Moderate significance
Background: Reasonable
Novelty: Unable to judge
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The length of this manuscript is about right
Summary of paper in a few sentences:
The paper motivates and introduces the concept of timed sequential patterns. It then reports on the development of two algorithms and implementations, a plain one and one optimized for multi-core processing. Finally, it presents evaluation results in terms of execution time and the effect of different parameters.
Reasons to accept:
- Interesting and relevant topic
- Overall model and approach seem sound and valuable
- Good comparison of two versions of the algorithm/implementation
Reasons to reject:
- The novelty could be better established
- The formal definitions and arguments are quite hard to follow (at least to me) due to sometimes unconventional (at least to me) notation and omissions
- No "sanity check" with other existing implementations of plain (non-timed) sequences, to see whether outputs are consistent and execution times are reasonable
Nanopublication comments:
Further comments:
- The table in Figure 1 could be presented better: unclear why elements are shown in this order; use of color could make it easier to understand; as an introductory illustration, this could possibly also be made more into an actual figure and less like a strict table
- In the introduction about patterns being "extended": It wasn't clear to me why "extending" a pattern isn't treated like creating a new pattern (so then obviously the result has to be calculated again). Are such "extensions" just things that happen frequently and therefore need specific support? Or did I misunderstand what these "extensions" are? This should be clarified either way.
- At the end of the introduction: "The time can be any descriptive statistic based on the user's preference, such as range, average, etc." could have been better introduced earlier, in the example and motivation.
- "The idea of incorporating transition time ...": I'd use a word that's a bit stronger than "idea", e.g. "concept" or "model".
- It took me a bit to understand the first paragraph of Section 2. I was confused by the notation <{a1},{a2},...>, where {a1} etc. seem to be sets with a single element. But I think this should just be a sequence of sets, where a1, a2, etc. are themselves sets? If so, I believe the curly brackets are superfluous and confusing here. Also, it's not explicitly specified in the text that a1 etc. are itemsets.
- "A timed event is a pair e = (I, t), where I am an item set that ...": I suppose it should be "where I *is*"
- "ei.x subsetof I (1 ≤ i ≤ n)": here it's unclear what x and I stand for.
- Definition of "delta": what do p and j stand for here?
- {a, 10} seems to be a tuple (not a set). I might not be familiar with the conventions of this particular community, but to me curly brackets normally indicate a set, and round brackets are more usual for tuples, so (a, 10). But again, there might be different conventions in this community.
- I cannot really judge the novelty, as I am not an expert in the field. Based on the Related Work section, it seems novel, but it would be good to see in a bit more detail (at some point in the paper, maybe based on examples) how the presented approach differs from some other recent approaches.
- "The drawback of these methods is ..." in Related Work: I was wondering what happens if these methods are used with an arbitrarily high ("infinite") time interval. Does it break or would it still work? If the latter, how does it then compare to your method?
- "A node can have multiple parent nodes": this confused me. Does it mean "multiple direct parents"? Then it's not a tree! Or "multiple indirect parents going up the tree hierarchy"? The latter is kind of obvious, and I would remove this part to avoid confusion.
- "<{a,2},<{a,b,19},{d,25}>": this seems malformed.
- I got lost following the algorithm/method somewhere around Figure 8. It wasn't clear to me what example exactly we are seeing here. Where do all these numbers come from? Are the upper ones (<{a}> and <{b}>) just given as an example, and they then produce the bottom ones? Is TS3 in <{a}> the same as TS3 in <{b}>? If so, why are the trees different? If not, why are they called the same?
- "For air temperature ...": This paragraph (and some other parts) in Section 5 is not very pleasant to read. It would be better to put this into some kind of table or figure.
- "Competing Algorithms": I would have liked to see some rough comparison to other algorithms. For example, I suppose the timed sequences can be used to "simulate" plain sequences, so you could create some such cases, which would allow you to test whether your algorithms give the same results as the established ones do for the plain sequences (as a kind of sanity check). Measuring the execution time in this scenario would also be an interesting thing to do. It's perfectly fine if your algorithm is slower in this scenario, but it would be interesting to see how much slower (like doubled execution time or 100x execution time or ...?).
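To make the sanity check in the last comment more concrete, here is a minimal Python sketch of what I have in mind. It rests on my own assumptions and is not the authors' method: the helper names (strip_times, plain_support, inconsistent_patterns) are hypothetical, the toy database is invented for illustration, and I assume the timed miner's output can be reduced to a mapping from a pattern (a tuple of itemsets) to its reported support. The sketch drops the timestamps, recomputes the plain support of each reported pattern by direct counting, and lists the patterns where the two disagree.

```python
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
TimedEvent = Tuple[Itemset, float]      # (itemset, timestamp); assumed layout
Pattern = Tuple[Itemset, ...]           # plain (non-timed) sequential pattern


def strip_times(timed_seq: List[TimedEvent]) -> List[Itemset]:
    """Project a timed sequence onto a plain sequence of itemsets."""
    return [items for items, _t in timed_seq]


def contains(sequence: List[Itemset], pattern: Pattern) -> bool:
    """True if each pattern itemset is a subset of a distinct sequence
    itemset, in order (leftmost greedy matching)."""
    pos = 0
    for wanted in pattern:
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def plain_support(db: List[List[TimedEvent]], pattern: Pattern) -> int:
    """Number of sequences in db that contain the pattern, ignoring time."""
    return sum(contains(strip_times(seq), pattern) for seq in db)


def inconsistent_patterns(db: List[List[TimedEvent]],
                          timed_result: Dict[Pattern, int],
                          min_support: int) -> List[Pattern]:
    """Patterns whose reported support differs from the plain support,
    or whose plain support is below the threshold."""
    bad = []
    for pattern, reported in timed_result.items():
        actual = plain_support(db, pattern)
        if actual != reported or actual < min_support:
            bad.append(pattern)
    return bad


# Toy database, invented for illustration only.
db = [
    [(frozenset("a"), 0), (frozenset("ab"), 19), (frozenset("d"), 25)],
    [(frozenset("a"), 2), (frozenset("d"), 10)],
    [(frozenset("b"), 1), (frozenset("a"), 5)],
]
a_then_d = (frozenset("a"), frozenset("d"))
print(plain_support(db, a_then_d))
print(inconsistent_patterns(db, {a_then_d: 2}, min_support=2))
```

An empty list from inconsistent_patterns would indicate that, once time is ignored, the timed miner agrees with plain counting on the patterns it reports. A fuller check would also compare against the complete pattern set of an established plain miner, to catch patterns the timed miner misses.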
4 Comments
Review the paper and comment.
Submitted by Malik Jawarneh on
Positive:
• The proposed Minits-AllOcc and MMinits-AllOcc algorithms are capable of mining sequential patterns and the transition time between itemsets based on all occurrences of a pattern in the database.
• The proposed parallel multi-core CPU version of this algorithm, MMinits-AllOcc, is able to efficiently deal with Big Data.
• Extensive experiments on real and synthetic datasets show the advantages of this approach over the brute-force method, with the multi-core CPU version of the algorithm shown to outperform the single-core version on Big Data by 2.5X.
Negative:
• Some parts of the proposed algorithm lack clarity and could be better explained.
• The proposed algorithm could be further improved by incorporating more advanced techniques to optimize the computation time.
Review comment
Submitted by Malik Jawarneh on
10. These are Title related to your area, you may use.
Meta-Review by Editor
Submitted by Tobias Kuhn on
Your manuscript has been reviewed by two reviewers. Both reviewers found the paper to be interesting and potentially significant. However, the reviewers also raised concerns about the novelty of the paper, with both noting that it was hard to evaluate the novelty as this wasn't described clearly enough in the manuscript. I consider this to be the major concern that your revision should address. In particular, you should clearly compare the algorithm proposed in this manuscript to other existing algorithms, and give some idea of their relative performance, as well as delineating precisely where the new algorithm differs from existing approaches. As noted by reviewer 2, it does not necessarily matter if the new algorithm is slower than other methods, but the manuscript should give some idea of the relative performance in this domain.
Please also note that both reviewers highlighted that 'Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this'. Making all data FAIR and openly available will be a condition of eventual acceptance.
The reviewers also provided detailed comments on other aspects of the manuscript, and I invite you to consider these carefully in formulating your response and revision of the manuscript.
Richard Mann (https://orcid.org/0000-0003-0701-1274)
Withdrawn by the authors
Submitted by Tobias Kuhn on
This submission was withdrawn upon request by the authors. It is therefore now marked Rejected instead of Undecided.