Reinforcement learning for personalization: a systematic literature review

Tracking #: 621-1601

Authors:

	Name	ORCID
	Floris den Hengst	https://orcid.org/0000-0002-2092-9904
	Eoin Martino Grua	https://orcid.org/0000-0002-5471-4338
	Ali el Hassouni	https://orcid.org/0000-0003-0919-8861
	Mark Hoogendoorn	https://orcid.org/0000-0003-3356-3574

Responsible editor:

Izabela Moise

Submission Type:

Survey Paper

Abstract:

The major application areas of reinforcement learning (RL) have traditionally been game playing and continuous control. In recent years, however, RL has been increasingly applied in systems that interact with humans. RL can personalize digital systems to make them more relevant to individual users. Challenges in personalization settings may be different from challenges found in traditional application areas of RL. An overview of work that uses RL for personalization, however, is lacking. In this work, we introduce a framework of personalization settings and use it in a systematic literature review. Besides setting, we review solutions and evaluation strategies. Results show that RL has been increasingly applied to personalization problems and realistic evaluations have become more prevalent. RL has become sufficiently robust to apply in contexts that involve humans and the field as a whole is growing. However, it seems not to be maturing: the ratios of studies that include a comparison or a realistic evaluation are not showing upward trends and the vast majority of algorithms are used only once. This review can be used to find related work across domains, provides insights into the state of the field and identifies opportunities for future work.

Manuscript:

ds-paper-621.pdf

Data repository URLs:

https://doi.org/10.5281/zenodo.3627118

Date of Submission:

Sunday, February 2, 2020

Date of Decision:

Tuesday, March 3, 2020

Nanopublication URLs:

Decision:

Solicited Reviews:

Review #1 submitted on 17/Feb/2020

By Shihan Wang

https://orcid.org/0000-0001-5971-7522

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Good
Presentation: Good
Reviewer's confidence: High
Significance: Moderate significance
Background: Comprehensive
Novelty: Clear novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This is a review paper. It provides an overview of applications of reinforcement learning (RL) for personalization across different application domains.
The authors first briefly introduce RL and how to use it in personalization tasks. Then, they report a systematic literature review (SLR) following PRISMA, based on a classification framework of personalization settings. In the SLR, papers are summarized from the perspectives of setting, solution and evaluation.

Reasons to accept:

Overall, this is a well-written paper. It concentrates on an important research topic: how to use reinforcement learning for personalization applications. Following the challenges addressed by the authors, such a systematic literature review makes a contribution in this field.

Strengths:
- The paper is in the scope of journal DS.
- The systematic literature review is well reported following PRISMA, and the authors have made all the SLR data openly available online. This makes the review reliable.
- The review results are structured in a clear manner and analyzed statistically. For instance, Table 2 provides good guidance for summarizing the papers.
- Based on the review results, the authors raise a few good points about the remaining challenges (such as the lack of comparison with RL approaches over time and the limited reuse of RL approaches). These discussions provide excellent opportunities for future research.

Reasons to reject:

Please see the further comments for things that need to be improved.

Nanopublication comments:

Further comments:

Meanwhile, several things need to be improved:

One main problem is in Section 2. The authors introduce the framework of MDPs and claim that 'RL considers problems in the framework of MDPs'. However, they do not mention another important family of RL problems: multi-armed bandit problems. In fact, papers using bandit algorithms were considered in this review (see Table 4, e.g. contextual bandits, UCB, etc., and the sentence in Section 4.1). This inconsistency is confusing for the reader. Therefore, either a definition of bandit problems should be added, or it should be clarified that the MDP is only one mathematical framework for addressing RL personalization tasks.
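
To make the distinction raised in this comment concrete, the following is a minimal sketch of UCB1, a standard bandit algorithm. It is not taken from the manuscript or from any surveyed paper. Unlike in an MDP, there is no state and no transition function: the agent only chooses an arm and observes a reward. The function name `ucb1`, the `pull` callback and the arm success rates are hypothetical and chosen for illustration only.

```python
import math
import random


def ucb1(pull, n_arms, steps):
    """Run UCB1 for `steps` rounds; return pull counts and mean rewards per arm.

    `pull(arm)` is a hypothetical callback returning a reward in [0, 1].
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, steps + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialize the estimates
        else:
            # Pick the arm with the highest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the running mean reward.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Illustrative usage: three Bernoulli arms with assumed success rates.
rng = random.Random(0)
rates = [0.2, 0.5, 0.8]
counts, means = ucb1(lambda a: 1.0 if rng.random() < rates[a] else 0.0, 3, 2000)
```

In an MDP, by contrast, the chosen action would also influence the next state, so the agent would have to reason over sequences of actions rather than treat each round independently.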

Other minor issues:
- In Section 4.2, give more information about why the databases used were chosen.
- In Figure 2, why include 'Additional records identified through other sources (N = 0)' if no paper was actually found? If it is kept, more information should be given on how additional records were sought.
- The format of the figures needs to be improved. For instance, there are misaligned labels in Figures 3, 5, 6 and 8.
- What is the meaning of '.' in Table 6? For instance, in '57 (.548)', which is the total number and which is the ratio? This is very confusing.
- Full names or citations are missing when algorithms are first mentioned (for instance DQN, UCB and CLUB in Section 5.2 and Table 4).

Review #2 submitted on 18/Feb/2020

By Alessandro Rigazzi

https://orcid.org/0000-0003-2132-7726

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Accept
Technical Quality of the paper: Good
Presentation: Weak
Reviewer's confidence: Medium
Significance: Moderate significance
Background: Reasonable
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper constitutes a comprehensive literature review on the topic of Reinforcement Learning used for Personalization, which has been gaining momentum over the last years. The subject is treated in a rigorous manner, with a concise introduction to RL algorithms and personalization. A valuable collection of insights on several statistics about current publications on the topic is offered. Results show how this field is still far from being mature, and seems to exhibit the lack of a strongly connected community.

Reasons to accept:

The work is well written, and succeeds in reviewing state-of-the-art literature in a critical manner. All results are clearly exposed, and the methods used to achieve them are reasonably reproducible.

Reasons to reject:

None.

Nanopublication comments:

Further comments:

On page 4, line 4, the symbol pi* seems to be a typo, as it should probably be \pi*.
The section title "systematic literature review" is not capitalized.
I would warmly encourage the authors to rethink most of the plots they included in the paper, which are often cluttered and/or hard to read. In particular:
Figure 4 is very hard to read in black and white (actually, even in color it is not straightforward).
Figures 5, 6, 8b, and 9b: the x-axis categories are very difficult to read. Possibly the text orientation is not optimal.

Review #3 submitted on 21/Feb/2020

Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Weak
Suggested Decision: Undecided
Technical Quality of the paper: Bad
Presentation: Good
Reviewer's confidence: Medium
Significance: Moderate significance
Background: Reasonable
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)

Summary of paper in a few sentences:

The paper introduces the problem of Reinforcement Learning, provides a categorization of settings in which it can be (or has been) applied, and then provides a bibliographic overview of the related literature. The latter seems to be the main contribution of the paper. The bibliographic overview involves statistics such as how many algorithms are mentioned in the different papers, whether certain keywords (e.g., 'safety' or 'privacy') are mentioned in them, and various other aspects of the studied settings (e.g., whether the papers used data from user studies).

My main comment is that, while I found the content of the survey well-written, as a reader I would expect to obtain more information about the methods involved in the surveyed papers, and the current manuscript contains very little technical information.

My overall recommendation would be for the authors to expand the technical content of the paper.
One way to do this would be to:
(1) provide a more detailed description of the algorithms/methods mentioned in the paper (e.g., the ones mentioned in Table 4 or the methods that are cited but not discussed in Section 3), and:
(2) explain how they fall within the framework of approaches outlined in Section 2.

Reasons to accept:

* The paper provides a good introduction to RL.
* The paper gives a detailed account of the Systematic Literature Review process (Sections 4 and 5), explaining how the surveyed papers were collected.
* In terms of language and structure, the paper is written well.

Reasons to reject:

* The paper does not contain substantial technical content on the methods presented in the surveyed papers. It does provide a bibliographic exploration of the methods, but readers of a survey article would be interested in more details on the methods.

Nanopublication comments:

Further comments:

C1. Notation: in the reviewed manuscript, indices appear as superscripts and could be confused with exponents (e.g., r^{t+1}).

C2. The three categories of approaches described in Section 2 would be easier to understand via a running example. Specific examples of settings would also make it easier to understand the categorization of personalization settings in Section 3. Currently, the discussion is too abstract.

C3. In Section 2, there is mention of "users" and their "experiences", but such terms would fit better in a particular example. For the abstract description of the approaches, it would be better to refer to the terms that are already introduced to describe the setting, e.g., 'agent', 'reward', 'environment'.

C4. The title of Section 4 should start with a capital letter.

C5. The quality of the figures should be improved. For example, in Fig. 4, the legend appears at an awkward location; it would be better placed at the top, middle, or bottom of the right side of the figure. Also, in Fig. 5 the title of the plot overlaps with the plot.

Review #4 submitted on 01/Mar/2020

By Valerio Grossi

https://orcid.org/0000-0002-8735-5394

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Average
Suggested Decision: Accept
Technical Quality of the paper: Average
Presentation: Average
Reviewer's confidence: Low
Significance: Moderate significance
Background: Reasonable
Novelty: Lack of novelty
Data availability: With exceptions that are admissible according to the data availability guidelines, all used and produced data are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This paper provides an overview of reinforcement learning applications for personalization across a variety of application domains. The aim is to aid researchers and practitioners in identifying related work relevant to a specific personalization setting. The work shows how reinforcement learning is used for personalization and identifies challenges across domains. After a brief introduction of the context, main issues and tools, the authors present a framework for classifying personalization settings. The purpose of this framework is to identify relevant related work across domains. This framework has been tested by using it in a systematic literature review.

Reasons to accept:

The paper addresses the real problem of identifying relevant related work across different domains. The authors present a clear, concise exposition of the proposed framework, which is also fully supported by experimental results.

Reasons to reject:

I suggest consolidating the screening process phase, which at this stage does not seem to be clearly designed in terms of either instruments or procedures.

Nanopublication comments:

Further comments:

Please revise the title of Section 4; maybe an "A" is missing.

2 Comments

Meta-Review by Editor

Submitted by Tobias Kuhn on Tue, 03/03/2020 - 14:30

The paper provides an overview of applications of reinforcement learning (RL) for personalization across different application domains. After a brief summary of reinforcement learning in general and its usage in personalization tasks in particular, the literature review is constructed based on two main components: a quantitative one and a qualitative one. The quantitative component mainly reports on statistics about the topic in the literature (methods, keywords, data mentioned in the related literature). The qualitative component, however, is less complete than the quantitative one.
While the actual counts are a measure of how much attention the topic is receiving in the literature, such an overview would benefit from a more focused analysis of methods: a comparison of methods, a summary of their advantages and disadvantages, guidelines on their suitability for specific domains, and a more technical machine learning approach to reviewing methods.
While this paper represents a nice effort at providing a well-written and comprehensive review in a field that is still new, I encourage the authors to expand the review by adding a more technical analysis of machine learning methods for RL in personalization. Therefore, the authors need to address this concern together with the remarks from the reviewers.

Izabela Moise (https://orcid.org/0000-0003-0370-6749)

Dear editors and reviewers,

Submitted by Floris den Hengst on Mon, 03/23/2020 - 19:45

Dear editors and reviewers,

Thank you for your useful comments and consideration to accept this manuscript. We have addressed your comments and uploaded a revised manuscript. We have included a response letter detailing the changes.

Kindest regards,

Floris den Hengst

Data Science
