Modelling and Predicting User Engagement in Mobile Applications

Tracking #: 583-1563

Authors:

	Name	ORCID
	Eduardo Barbaro	https://orcid.org/0000-0001-7878-2514
	Eoin Martino Grua	https://orcid.org/0000-0002-5471-4338
	Ivano Malavolta	https://orcid.org/0000-0001-5773-8346
	Mirjana Stercevic	https://orcid.org/0000-0002-9847-2959
	Esther Weusthof	https://orcid.org/0000-0002-1940-0800
	Jeroen van den Hoven	https://orcid.org/0000-0002-4529-137X

Responsible editor:

Jodi Schneider

Submission Type:

Research Paper

Abstract:

The mobile ecosystem is dramatically growing towards an unprecedented scale, with an extremely crowded market and fierce competition among app developers. Today, keeping users engaged with a mobile app is key for its success since users can remain active consumers of services and/or producers of new contents. However, users may abandon a mobile app at any time due to various reasons, e.g., the success of competing apps, decrease of interest in the provided services, etc. In this context, predicting when a user may get disengaged from an app is an invaluable resource for developers, creating the opportunity to apply intervention strategies aiming at recovering from disengagement (e.g., sending push notifications with new contents). In this study, we aim at providing evidence that predicting when mobile app users get disengaged is possible with a good level of accuracy. Specifically, we propose, apply, and evaluate a framework to model and predict User Engagement (UE) in mobile applications via different numerical models. The proposed framework is composed of an optimized agglomerative hierarchical clustering model coupled to (i) a Cox proportional hazards, (ii) a negative binomial, (iii) a random forest, and (iv) a boosted-tree model. The proposed framework is empirically validated by means of a year-long observational dataset collected from a real deployment of a waste recycling app. Our results show that in this context the optimized clustering model classifies users adequately and improves UE predictability for all numerical models. Also, the highest levels of prediction accuracy and robustness are obtained by applying either the random forest classifier or the boosted-tree algorithm.

Manuscript:

ds-paper-583.pdf

Revised Version:

Modelling and Predicting User Engagement in Mobile Applications

Data repository URLs:

http://s2group.cs.vu.nl/files/DataScienceReplicationPackage.zip

Date of Submission:

Saturday, June 8, 2019

Date of Decision:

Wednesday, July 24, 2019

Nanopublication URLs:

Decision:

Undecided

Solicited Reviews:

Review #1 submitted on 27/Jun/2019

By Viktoria Spaiser

https://orcid.org/0000-0002-5892-245X

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision: Undecided
Technical Quality of the paper: Good
Presentation: Good
Reviewer's confidence: Medium
Significance: Moderate significance
Background: Reasonable
Novelty: Unable to judge
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper tests four models (Cox proportional hazards, negative binomial, random forest, and XGBoost) for predicting user engagement with mobile applications. The goal is to provide evidence that user engagement can be predicted fairly accurately; the wider goal is to facilitate interventions that re-engage users or increase their engagement. The models are tested on data from user engagement with a waste recycling app. The paper shows that the random forest and XGBoost models are best suited to accurately identifying engaged and disengaged users. The negative binomial model is the least accurate in predicting the number of user activities before disengagement.

Reasons to accept:

The paper provides evidence that user engagement can be predicted, though I would like a more explicit discussion of the extent to which this evidence has been missing so far, given that this seems to be the main novelty of the paper.
The paper deploys and compares four different models to make predictions on user engagement, each useful in its own right, and it is indeed very insightful to see how these models can be used with these types of data. The authors do admit that the comparison across these four models is not entirely justified, given their different natures; however, this needs further acknowledgement in the discussion section. In fact, only the RF and XGBoost models are reasonably comparable, since both aim at classifying users into engaged/disengaged. The other two models have very different predictive objectives and setups, and hence comparing them to each other and to the two classification models is highly problematic in my eyes.
The authors show the value of clustering users prior to the modelling to obtain better prediction results. However, I would like to know what variables were used for clustering and what the clusters actually mean.

Reasons to reject:

I find the framing of the paper problematic in terms of the wider goal of this research. Why should users be prompted to use an app that they probably consider (temporarily) irrelevant? Why should we try to make people spend even more time on their phones? I find these explicitly stated goals highly problematic. It is quite striking that when the authors list all kinds of reasons why people choose to disengage from an app (p.2), the most obvious reason, that the app has lost its relevance (at least temporarily), is not even listed. Also, I see how this investigation is of interest to industry, but this is supposed to be an academic paper and I would like to see how it is relevant for science. The authors write on p.2 "...provide a framework for modeling and predicting UE, which can be further extended or used in other scientific studies"; this needs to be significantly expanded.
I am not convinced the four models should be compared at all (see comments above). I think the authors should rather treat the models in their own right, given that they serve different modelling/prediction purposes.
The clustering needs further explanation and interpretation (see comments above).

Nanopublication comments:

Further comments:

Please use gender-neutral (possessive) pronouns, e.g. on page 2 instead of "...better suited for his own mobile app", write "...better suited for their own mobile app" or on page 5 instead of "(2) the time of her last event within the app" (which by the way sounds a bit awkward anyway), write "(2) the time of their last event within the app" (as noted, you may want to rephrase the entire statement).
In Figure 6, what you claim to be blue (predicted events) appears as black (at least on my screen).

Review #2 submitted on 28/Jun/2019

By Lars Lischke

https://orcid.org/0000-0002-9650-4945

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Weak
Suggested Decision: Undecided
Technical Quality of the paper: Weak
Presentation: Weak
Reviewer's confidence: Medium
Significance: High significance
Background: Incomplete or inappropriate
Novelty: Limited novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

This submission presents a framework to model and to predict user engagement with mobile applications. The framework is evaluated by using a data set of app usage of one particular app focusing on waste recycling. With the successful evaluation, the authors aim to provide evidence that it is possible to predict when users of a mobile application will get disengaged.

Reasons to accept:

The topic of modeling and predicting user engagement for mobile applications is timely and relevant for the research communities in Data Science and Human-Computer Interaction. Overall, the presented approach seems to be novel and well-suited. Furthermore, the results of the evaluation are promising.

Reasons to reject:

I have strong doubts regarding the data set and features used. The waste recycling app is described only briefly. The authors do not argue why this is a common mobile application. I would recommend discussing this with consideration of the results presented by Müller et al. [1]. I would question whether it is common for mobile applications that gamification aspects (here, earning points) are directly connected to monetary benefits. Here, it is particularly interesting that the granted points can only be used at local shops. Thereby the user's location becomes an obvious feature for disengagement. Additionally, using both the zip code and the geolocation provides only redundant information. In general, the list of features and calculated variables is fuzzy. The authors claim to use 7 features but present only 6 in a list. Also, it remains unclear how they combined the features into the 122 variables.

The authors do not describe whether the application triggered any notifications. However, Sahami Shirazi et al. describe notifications as an essential element for engaging with mobile applications [2]. Hence, I wonder why the authors did not use the number of notifications or the reaction to notifications as a feature. To be able to understand user engagement or disengagement with the waste recycling app, it would be helpful if the authors also published the application or provided at least a reference to it.

While the authors motivate their work very generally in the introduction, also looking at specific application domains such as health (reference [9] in the submission), they discuss the limitations of the data set only briefly at the end of the paper.

[1] Hendrik Müller, Jennifer Gove, and John Webb. 2012. Understanding tablet use: a multi-method exploration. In Proceedings of the 14th international conference on Human-computer interaction with mobile devices and services (MobileHCI '12). ACM, New York, NY, USA, 1-10. DOI: https://doi.org/10.1145/2371574.2371576

[2] Alireza Sahami Shirazi, Niels Henze, Tilman Dingler, Martin Pielot, Dominik Weber, and Albrecht Schmidt. 2014. Large-scale assessment of mobile notifications. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '14). ACM, New York, NY, USA, 3055-3064. DOI: http://dx.doi.org/10.1145/2556288.2557189

Nanopublication comments:

Further comments:

As described, the features and the data set look more specific than general to me. Hence, the submission would be more substantial if the authors made less general claims and focused particularly on comparable mobile applications. Furthermore, publishing not only the data set but also the application, or providing a reference to the application, would improve the validity.

2 Comments

Undecided

Submitted by Bin Liu on Sun, 07/21/2019 - 10:54

This paper studies user engagement in mobile apps. This is an interesting problem with important practical implications. The paper investigates whether it is possible to predict when mobile app users become disengaged from apps and shows that engagement can be predicted with a good level of accuracy. It applies different prediction models and also shows that clustering further facilitates the prediction. The paper is interesting, but it has some limitations.

First, it only shows the predictability and does not give many details about the prediction itself, namely, which features lead to the prediction. As a result, the practical implications are limited. Also, the features used in the prediction might not provide a useful indication for engagement management without further investigation, such as significance tests.

Second, the prediction just applies some standard models without much technical novelty (and the practical implications may be limited given the current state of the paper, i.e., the lack of detail about the prediction model). It would also be helpful to give more details on the technical barriers of the problem and the solutions.

Third, more features would be helpful, in particular features that can explain user engagement, such as version updates and similar apps in the market. The app usage features might just be related to what is being predicted in this paper.

Meta-Review by Editor

Submitted by Tobias Kuhn on Wed, 07/24/2019 - 02:45

We encourage you to revise the manuscript, based on the 2 reviews and 1 comment made.

Jodi Schneider (http://orcid.org/0000-0002-5098-5667)