Computational Social Science to Gauge Online Extremism

Tracking #: 486-1466

Authors:

	Name	ORCID
	Emilio Ferrara	https://orcid.org/0000-0002-1942-2831

Responsible editor:

Tobias Kuhn

Submission Type:

Research Paper

Abstract:

Recent terrorist attacks carried out on behalf of ISIS on American and European soil by lone wolf attackers or sleeper cells remind us of the importance of understanding the dynamics of radicalization mediated by social media communication channels. In this paper, we shed light on the social media activity of a group of twenty-five thousand users whose association with ISIS online radical propaganda has been manually verified. By using a computational tool known as dynamical activity-connectivity maps, based on network and temporal activity patterns, we investigate the dynamics of social influence within ISIS supporters. We finally quantify the effectiveness of ISIS propaganda by determining the adoption of extremist content in the general population and draw a parallel between radical propaganda and epidemics spreading, highlighting that information broadcasters and influential ISIS supporters generate highly-infectious cascades of information contagion. Our findings will help generate effective countermeasures to combat the group and other forms of online extremism.

Manuscript:

ds-paper-486.pdf

Data repository URLs:

None

Date of Submission:

Monday, January 30, 2017

Date of Decision:

Friday, February 10, 2017

Nanopublication URLs:

Decision:

Reject

Solicited Reviews:

Review #1 submitted on 03/Feb/2017

By Floriana Gargiulo

https://orcid.org/0000-0001-9813-1815

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Average
Suggested Decision:
Technical Quality of the paper: Good
Presentation: Excellent
Reviewer's confidence: High
Significance: Moderate significance
Background: Comprehensive
Novelty: Limited novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper presents a methodological framework for analyzing the spread of extremist content on the web. In particular, it focuses on the case study of ISIS accounts on Twitter.

Reasons to accept:

The method for collecting data operates on two levels: an expert-based selection of the users to follow and a large-scale collection of data on the activity of the selected users. This methodology can be extremely useful for directing attention to the focal points where extremist activity is centered.

The idea to use an indicator inspired by epidemics to assess the “opinion” contagion is interesting and can provide a good framework to analyze the propagation of radical behaviors on social networks.

Reasons to reject:

What I will list in the following are NOT reasons to reject but rather points on which some more attention should be dedicated in the text and in the analyses.

The fact that the analysis excludes the content of the tweets (and the hashtags they include) makes it impossible to tell how strongly the selected users focus on ISIS content. Are these users tweeting only ISIS propaganda? If not, the tweets retweeted by external users could a priori be based on “neutral” content. Pairing more radical content with “viral” content in order to improve visibility is a well-known phenomenon on Twitter.
Likewise, the followers/friends network of the selected users could be based on features external to ISIS propaganda (schoolmates, family, etc.).
Figure 6 should be put in relation with Figure 3: when the activity of a user is higher (Figure 3), it is much more probable that someone will adopt his content. Adding a timeline of R0 = RT/T would probably help to visualize an increase in the contagion.
How is the histogram in Figure 8 normalized?
A baseline comparison of ISIS accounts with a random sample of users could be useful to understand and detect anomalous behaviors. Is there a difference in the class partitioning with respect to a random sample? Is the “contagion” process different from the baseline?
Are the data used available in some repository?
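
The timeline suggested in the Figure 6 comment could be sketched roughly as follows. This is a minimal illustration, not the author's code: it reads RT as the number of retweets received and T as the number of original tweets posted in a period, which is an assumption about the reviewer's notation. The function name `r0_timeline`, the record layout, and the toy values are hypothetical.

```python
from collections import defaultdict


def r0_timeline(tweets):
    """Return {period: RT / T} for an iterable of (period, retweet_count) records.

    Each record stands for one original tweet. `period` is any sortable label
    (for example "2014-03") and `retweet_count` is the number of retweets that
    tweet received. Both fields are assumptions made for this sketch.
    """
    n_tweets = defaultdict(int)    # T per period
    n_retweets = defaultdict(int)  # RT per period
    for period, retweet_count in tweets:
        n_tweets[period] += 1
        n_retweets[period] += retweet_count
    # Periods without any tweet never enter n_tweets, so no division by zero.
    return {p: n_retweets[p] / n_tweets[p] for p in sorted(n_tweets)}


# Hypothetical toy data, not taken from the manuscript.
sample = [("2014-01", 0), ("2014-01", 2), ("2014-02", 3), ("2014-02", 5)]
timeline = r0_timeline(sample)
```

Plotting `timeline` next to the per-period tweet volume would show whether the ratio rises with activity or stays flat.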

Nanopublication comments:

Further comments:

Review #2 submitted on 08/Feb/2017

By Karsten Donnay

https://orcid.org/0000-0002-9080-6539

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Good
Suggested Decision:
Technical Quality of the paper: Good
Presentation: Good
Reviewer's confidence: High
Significance: High significance
Background: Comprehensive
Novelty: Clear novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper aims to shed light on online radicalisation dynamics by studying communication patterns of ISIS supporters on Twitter. The analysis relies on a corpus of human-coded pro-ISIS accounts studying their communication dynamics and their influence on the broader associated communication network. Characterising users based on their activity-connectivity relationship, the analysis reveals the differential influence of different classes of users in the sample studied. This may have implications for strategies designed to counter the spreading of radical content on social media.

Reasons to accept:

- Substantive Contribution:
The paper provides an in-depth analysis of online dynamics both within a pro-ISIS aggregate on Twitter as well as its impact on other users. The application of the activity-connectivity framework to this question and the substantive and differentiated findings derived from this are highly relevant. This confirms the substantial influence of only a core set of users and could lead to better strategies with which the spreading of radical content on social media might be countered.

- Novelty and Methodological Sophistication:
The use of human-coded data on pro-ISIS accounts is novel and contrasts with approaches that rely on content analysis to identify radical online communities. In addition, the classification of users based on their activity-connectivity relationship is both methodologically sophisticated and appropriate. Together with the descriptive analyses of the pro-ISIS accounts, the analysis provides a novel and comprehensive take on the study of the spreading of radical content through social media.

Reasons to reject:

As it stands there are a number of shortcomings that, I believe, can be addressed in a revised manuscript. Pending changes that carefully address these issues detailed under “Further Comments” below, I do not have any objections to the publication of the paper.

Nanopublication comments:

Further comments:

The issues raised here concern the framing of the study, the data used, the presentation of methods and results and the implications of this work. They are structured accordingly.

1) Framing:
- RQ 1:
The intention to identify a solid methodological framework is good and the paper demonstrates a set of best practices. However, simply applying such practices, in my view, is not actually a real research question. A research question would be to explore systematically whether the practices used here - human-coded accounts + retweets + mentions - provide better coverage of such aggregates than alternative approaches. The author suggests that the alternatives are more limited but an explicit comparison is actually lacking.

In my opinion, RQ 1 should therefore be dropped completely. Best practices can and should be mentioned in the method description, in fact, they should even be described in much greater detail. It may go beyond the scope of this contribution but having a human-coded dataset on pro-ISIS accounts would actually allow for the systematic comparison to other, alternative approaches for identifying these aggregates.

- Comparison to related work:
I was very surprised to read only a short mention of the recent work by Johnson et al. in Science under “Related Work”. It is my understanding that the method employed there goes beyond the usual keyword-based identification criticised by the author. Instead, Johnson et al. appear to use a much more elaborate scheme that also involves verification by human coders. This would then be much closer to the kind of data used in the present study in terms of completeness of coverage of such aggregates. A more in-depth discussion would therefore certainly be warranted.

- Scope:
Given potential concerns regarding the completeness of coverage of pro-ISIS activity on Twitter in the sample studied (see also comment on data below), I would advise emphasising more clearly that this study only pertains to this very specific aggregate of pro-ISIS users identified by the Lucky Troll Club. This does not negatively affect the relevance of this work but simply clarifies its scope.

2) Data
- Data availability concerns:
The first concern pertains to the Twitter data used. While the author uses OSoMe at Indiana University, he apparently had access to full tweets and meta information that goes beyond the simple summary statistics provided by the API. Barring similar access, researchers can therefore not actually readily reproduce his findings. In addition, no information was provided that would allow the sample used to be reconstituted exactly. Information given in the text is too generic and I would suggest either providing a much more detailed description (as supplementary material) or the code used to generate the dataset. Since Twitter data may not be shared, it is also customary to provide an index of Tweet IDs such that, given sufficient access, researchers can reconstitute the full sample used.

The second concern is in relation to the list of 25,538 accounts obtained from the Lucky Troll Club. Neither was this list of accounts provided, nor is any information given on how it could be obtained. This list forms the basis of the analysis and should be made available through one of the recommended data repositories. In addition, a more detailed description of how these accounts were identified (if available) is sorely lacking. It is simply not possible to judge the quality of the coding without this information.

- Data quality concerns:
The paper mentions a Guardian report on the number of suspended accounts that is larger than the number coded by the Lucky Troll Club. While this number is given for a longer period than that considered in this study, it still points to questions regarding the quality and completeness of the data. Validation against the suspension list only verifies true positive identification of suspended accounts. But how many accounts did the Lucky Troll Club incorrectly flag (false positives)? And more importantly, how many did they miss (false negatives)? It may be that evaluating this is beyond the scope of this paper but a more in-depth discussion of these issues and the potential problems of bias that might arise would still be necessary.

- Dynamically changing sample properties:
From the description of the data it is not clear when the Lucky Troll Club coded these accounts. Was it a continuous coding or did they stop after some time? Given that these accounts were suspended by Twitter, the underlying population of accounts was constantly changing, something that should be acknowledged. In particular, if accounts were identified only in a given period and then analysed for a longer period until their suspension, this could lead to at least two conceptual issues. First, what if pro-ISIS activity simply shifted to other accounts not identified yet? In this case, we are getting a potentially very biased view on pro-ISIS Twitter activity. Second, what are the implications of comparing an account that was suspended at the beginning of the study to one that was suspended last? Are they fully comparable? A short discussion of the sample as such would help clarify if and why these issues arise or not.

- Sampling:
The Gardenhose API is still a sample, so why not use the Search API (as mentioned elsewhere in the paper) to recover the full set of tweets sent by the 25,538 accounts? Without access to the Powertrack API it is infeasible to get an even larger and more complete sample of retweets and mentions but for the initial “seed” of tweets from the pro-ISIS aggregate there appears to be no immediate reason to limit data to those in the Gardenhose. Rate limits do apply for the Search API but ~25,000 accounts should not be impossible to query.

In addition, the manuscript only refers to mentions and retweets as means of expanding the sample beyond the original corpus of tweets from the 25,538 accounts. Is there any reason why the tweets of followers of these accounts were not also included? Is it a problem of data access?

3) Presentation of methods and results
- activity-connectivity map:
The intuition for this methodology, in my view, is not stated clearly enough up-front. Its purpose only becomes clearer throughout the subsequent discussion. It would help to add a few sentences at the beginning of the respective section that give a clear motivation for using it and a simple intuition of what it represents. This would make the section more accessible.

- followee distributional signature (Figure 2):
The interpretation of the upward trend in the followee distribution is not fully clear. Parallels to similar results are drawn but no clear intuition is given of what this plateau in the pdf means.

- identifying the most influential users:
In Figure 2, is there a way to systematically delineate the tweeting behaviour of the most active and influential users from the “rest”? Is there, for example, a statistical cut-off in the scaling of the pdf? Visually, it appears that while the distribution is heavy-tailed it does not necessarily follow a power law throughout.

- Common users:
Why are there so few common users in Figures 7 and 8? Is it because of the filtering described in section 3.3? This point should be clarified.

- Analogy to disease spreading:
The analogy to disease spreading and the calculation of R_0 is intuitive at a superficial level. But are the assumptions underlying the corresponding models of disease spreading also true on Twitter? Or are there no such limiting assumptions? As is, the discussion is very cursory and would benefit from a more in-depth description. In particular, it should be made clearer why this is a valid framework for studying contagion on Twitter. For example, there is no reference provided that would illustrate the use of these arguments from disease modelling for the study of social media.

- Retweet reproduction of common users:
In Figure 8, there is a noticeable fraction of common users with a very high reproduction rate (>5). Otherwise, only influentials feature these magnitudes of reproduction rates. What is the interpretation of this? The finding is not directly intuitive and should be explained further. Generally, the discussion of the results in Figure 8 is perhaps a bit too short and cursory.

4) Implications
A discussion of the implications of this work appears to be largely lacking. This is a bit surprising since the whole contribution is framed as providing such insights. A number of the substantive findings, as they stand, could already support the formulation of such recommendations. Or, if not, why does the author not want to draw conclusions? The paper ends on a very cautionary note but no real reason is given.

The question is whether we can, based on the analysis presented, make a recommendation for which group should be targeted first to curb activity most significantly. In addition, can one test some of these implications more explicitly and, if not, why not? For example, what effect does it have to remove all broadcasters or influentials from the sample? If the findings of this work are to inform Twitter policy or their strategy for suspending these accounts, the author should provide a much more elaborate interpretation of his results.

There is also an important additional caveat to be made that arises from the fact that the present analysis is intentionally not considering tweet contents: content can decisively matter for the impact on potential supporters. For example, should we not target the accounts first that spread the most inciting messages? Or if not, why? How is the limited screening capacity of accounts best used to target these kinds of accounts? Are there potentially simple mechanisms that rely on social dynamics to enable detection? Maybe something like a permanent Lucky Troll Club of volunteers helping Twitter? There are indications in the current U.S. debate that these grassroots movements are playing an increasingly important role in policing the accuracy of content on social media.

5) Minor points:
- define astroturf campaigns (p.1)
- incomplete reference: P. Suarez-Serrato, M. E. Roberts, C. Davis, and F. Menczer

Review #3 submitted on 09/Feb/2017

By Olivia Woolley-Meza

https://orcid.org/0000-0003-4517-2765

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Average
Suggested Decision:
Technical Quality of the paper: Weak
Presentation: Average
Reviewer's confidence: High
Significance: Moderate significance
Background: Reasonable
Novelty: Reasonable
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The author studies the activity of extremist groups, specifically ISIS, on the Twitter microblogging platform. Accounts are manually identified as associated with extremist ISIS related content. The tweets, retweets and mentions generated by these 25 thousand ISIS supporters are obtained for the period of January 2014 to June 2015, together with their followers and friends. Using this dataset, different analyses are performed to characterize spreading of extremist information and identify different types of information spreading strategies.

Reasons to accept:

The spread of radical (non-true) information is clearly an important and germane topic. The use of manually verified extremist accounts is, as the author highlights, a good and uncommon practice in understanding information spread. There are initial indications of interesting systematic variation in the effectiveness of different user "types" at spreading extremist information.

Reasons to reject:

Unfortunately the analysis is underdeveloped, staying on a purely descriptive level. More importantly, there are technical issues that need to be addressed in order to assess whether the results are correct and meaningful. Lastly, it is not clear to me that this stands as a unified position paper; it is rather half a position paper and half a research paper combined, but not reaching the necessary standard or depth for either. I discuss this below in more detail.

Nanopublication comments:

Further comments:

Undoubtedly this paper addresses an important topic; unfortunately, I see a number of problems that mean it is not yet ready for publication.
The first issue is that I am not sure this is a position paper -- it seems to fall more into the area of a research paper, but for this it lacks sufficient rigor, depth of analysis and interpretation. Perhaps I am missing the point, but I think that the author has to make it clearer how the more novel and interesting part of the approach, namely using a manually verified list of accounts on Twitter that can be linked to ISIS, is adding substantive new insight to our understanding of how extremist information can spread online.
Q1 and Q2 are not specific enough research questions to unify the content of the piece -- currently the introduction, addressing Q1 mainly, reads more like a position piece, and is disconnected from the analysis which addresses Q2 and is more a research piece.

Unfortunately, there are a number of technical issues:

The basic reproduction number R0 is not used correctly. R0 is conceptually simple, but in fact very hard to estimate in a real population, for a real disease, when there is a lack of homogeneity in transmission and contact dynamics and when the population is far from the assumed purely susceptible state. The situation is clearly much murkier in the case of information spread, where the transmission mechanism is more complex. I will skip a full discussion of the issues so as not to be a bore and avoid inaccuracies myself! My main point here is that a different term needs to be used to describe the "transmission" factor in this system. The comparison with specific disease R0 values is not only conceptually misleading, there is also no citation for the R0 numbers used for different diseases. These numbers further seem rather inaccurate and imprecise to my knowledge. This comparison should simply be avoided.

One important issue should be clarified, however: the computation of R0 per person is also hard to evaluate, given that there is no clear statement in this case (there is for the global R0 calculation) of whether only unique retweeting users are counted or whether the same user can contribute to the count (of one user or a number of users) multiple times. In the latter case, the assumption that the retweeter is susceptible (i.e. not infected) becomes suspect.
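
The difference between the two counting conventions can be made concrete with a small sketch. It is an illustration only: the manuscript's actual computation is not known here, and the function name `per_user_ratios`, the input layout, and the toy data are assumptions.

```python
from collections import defaultdict


def per_user_ratios(tweet_counts, retweet_events):
    """Return {author: (all_retweets / tweets, unique_retweeters / tweets)}.

    tweet_counts:   {author: number of original tweets posted}
    retweet_events: iterable of (author, retweeter) pairs, one per retweet
    Both structures are hypothetical stand-ins for the real data.
    """
    total = defaultdict(int)   # every retweet counts, repeats included
    unique = defaultdict(set)  # each retweeter counts once per author
    for author, retweeter in retweet_events:
        total[author] += 1
        unique[author].add(retweeter)
    return {
        author: (total[author] / n, len(unique[author]) / n)
        for author, n in tweet_counts.items()
        if n > 0  # authors without tweets have no defined ratio
    }


# Hypothetical toy data: user "x" retweets author "a" three times.
counts = {"a": 2, "b": 1}
events = [("a", "x"), ("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]
ratios = per_user_ratios(counts, events)
```

In this toy case author "a" gets a ratio of 2.0 when repeats are counted and 1.0 when only unique retweeters are counted, so the convention needs to be stated explicitly. Note that "x" also appears under both authors, which is the cross-user repetition the comment raises.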

There is generally insufficient discussion of the limitations of the methodology, especially regarding the oversimplified calculations for the "virality" of different users through retweeting. For instance, is it possible that some users appear to be retweeted simply because they mimic the behavior of other users that are heavily retweeted? In this case they are not infecting anyone themselves but would appear to. Also, a more careful characterization of the out-of-sample accounts is needed since this can introduce many limitations. For example, it is possible that the original classification misses individuals that are already ardent ISIS supporters. These individuals thus were already "infected" and can give the impression that virality of content is much higher than it really is.

Beyond the R0 and infectivity analysis, there is generally insufficient explanation and interpretation of results and of the limitations. In the conclusion the author says that their work supports the theory that extremists are using complex information strategies. Although they do not spell it out here, I believe this is in reference to the fact that different user types are revealed by the dynamic activity-connectivity maps. It is not clear to me this represents complex strategies. The classes are defined through arbitrary cutoffs and could simply be an expression of statistical variation along two dimensions. Perhaps the differences in how viral the content produced by some users is would be much better explained by types along different dimensions. In my opinion, a more interesting analysis, which would address Q2 more specifically, is learning types through natural clustering (along these two dimensions or others) and quantifying how well they explain the variability in e.g. retweets.

Unless I am missing something, there seems to be a mistake in a number of the pdf plots: there are some curves (representing a specific class of users, for instance) that appear to have strictly more area under them than other curves. Yet the area under all curves should be one. This is most salient in Figs. 7 and 8. How is it possible that the influential (yellow) bars in the top panel of Figure 8 are taller for all R0 values? This would mean that influentials have a higher proportion of users in every R0 class! Perhaps I am missing something? All the pdf plots seem to suffer from incorrect normalization or another mistake.
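
The normalization check described above can be written down in a few lines. This is a sketch under stated assumptions, not a reconstruction of how the figures were produced: the helper names, the bin edges, and the two toy samples are all hypothetical.

```python
def normalized_histogram(values, edges):
    """Return one density per bin so that sum(density * bin_width) equals 1.

    Bins are half-open [left, right), except the last one, which also
    includes its right edge. Values outside the edges are ignored.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    n = sum(counts)
    if n == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


def area(density, edges):
    """Total area under a piecewise-constant density."""
    return sum(d * (edges[i + 1] - edges[i]) for i, d in enumerate(density))


# Hypothetical per-class ratios, with classes of different sizes.
edges = [0, 1, 5, 10]
influentials = [0.5, 1.5, 2.5, 6.0, 7.0, 9.0]
common = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 1.2, 1.4, 5.5]
pdf_influentials = normalized_histogram(influentials, edges)
pdf_common = normalized_histogram(common, edges)
```

When each class is normalized by its own size and both use the same bins, both areas equal one, so one class cannot be taller in every bin. Bars that dominate everywhere would suggest that the counts were divided by a shared total or not normalized at all.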

Furthermore, there are no error margins or statistical significance calculations for any of the numbers computed from the data. This makes it almost impossible to gauge the significance of the results.
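
One simple way to attach error margins to a summary number is a percentile bootstrap, sketched below. The choice of method is an assumption made for illustration; neither the manuscript nor the review names one. The function name `bootstrap_ci` and the toy sample are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `stat` over `values`.

    Resamples `values` with replacement `n_resamples` times, evaluates `stat`
    on each resample, and returns the empirical quantiles that bracket the
    central `level` share of those estimates.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(values)
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    low_index = int((1 - level) / 2 * n_resamples)
    high_index = int((1 + level) / 2 * n_resamples) - 1
    return estimates[low_index], estimates[high_index]


# Hypothetical per-user ratios for one class, not taken from the manuscript.
sample = [0.5, 1.0, 1.5, 2.0, 8.0, 0.2, 0.7, 3.0, 1.1, 0.9]
low, high = bootstrap_ci(sample)
```

Reporting such an interval for each class would show whether the differences between user types exceed sampling noise. For heavy-tailed data the median may be a more stable choice of `stat` than the mean.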

More nit-picky issues that still need to be addressed:

Why is the related work section at the end? It would seem to belong more naturally around the introduction. It also needs to be more clearly connected to the work presented here (e.g. what limitations does the current study address?) so that a clearer assessment can be made of what the novelty is.

The author claims that in Fig. 2 the solid blue and dashed red lines are the "typical power-law shape". I agree that these distributions are heterogeneous and with a heavy tail, but there is no indication that a power law is a particularly good fit. A more rigorous fit or wording less likely to mislead would be appropriate here.
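
A more rigorous fit could follow the lines of the sketch below: a maximum-likelihood estimate of the exponent above a cut-off, plus a Kolmogorov-Smirnov distance to judge the fit, with the cut-off chosen to minimize that distance. This is the general approach of Clauset, Shalizi and Newman, used here as an assumption and not something the manuscript is known to use. It treats the data as continuous, which is only an approximation for count data such as tweets or followers, and every name and value in it is hypothetical.

```python
import math


def fit_tail(data, xmin):
    """Fit a continuous power law to the values >= xmin (data must be positive).

    Returns (alpha, ks_distance), or None when the tail is empty or degenerate.
    """
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        return None
    alpha = 1.0 + n / log_sum  # maximum-likelihood exponent
    ks = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare the model with the empirical CDF just before and at x.
        ks = max(ks, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return alpha, ks


def best_cutoff(data, min_tail=10):
    """Scan observed values as cut-offs; keep the one with the smallest KS distance."""
    best = None
    for xmin in sorted(set(data)):
        if sum(1 for x in data if x >= xmin) < min_tail:
            break  # too few points left for a meaningful fit
        fit = fit_tail(data, xmin)
        if fit is not None and (best is None or fit[1] < best[2]):
            best = (xmin, fit[0], fit[1])
    return best  # (xmin, alpha, ks_distance) or None


# Synthetic sample drawn by inverse transform from a power law with exponent 2.5.
n = 200
synthetic = [(1.0 - (i + 0.5) / n) ** (-1.0 / 1.5) for i in range(n)]
alpha_hat, ks_distance = fit_tail(synthetic, 1.0)
cutoff = best_cutoff(synthetic)
```

On real data, a large KS distance at every cut-off, or a cut-off that leaves only a handful of points in the tail, would support wording such as "heavy-tailed" over "power law". The selected cut-off also offers one candidate answer to where the most active users separate from the rest.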

There are a number of typos, so proofreading is needed (since pages and lines are not numbered it is hard to point each one out here). However, the most confusing ones are:
- In the definition of \delta f, a superscript says "mix" instead of "max".
- In section 3.4, first paragraph, the third sentence, where the notion of adoption is defined, has some missing words or incorrect grammar and therefore I cannot understand it.

Figure 1 seems to have the wrong caption (I see no log axis etc...)

Figure 5 is not very "user friendly". The comparisons that the authors wish to draw attention to and are discussed in the text, cannot be easily made in the figure. Instead of the current presentation where the different user types are represented in different panels, the panels could for example be the three different communication statistics and within each panel the level for each group could be presented side-by-side to enable comparisons. Or everything could be incorporated in one panel, with the comparison measurements side-by-side... Furthermore, the meaning of the different markers and symbols in the box plots should be clearly explained in the caption for those of us who forget. In this manner the reader can actually draw conclusions about to what degree the noted differences are statistically significant.

2 Comments

Paper Withdrawn by Author

Submitted by Tobias Kuhn on Wed, 06/27/2018 - 10:53

This paper was withdrawn by the author on 10 February 2017.

Meta-Review by Editor

Submitted by Tobias Kuhn on Wed, 06/27/2018 - 10:57

The following is the meta-review from 10 February 2017 before the paper was withdrawn by the author:

Thank you for your submission to Data Science. I inform you that the acceptance or rejection of your manuscript is still UNDECIDED (we don't use "major revisions" or "minor revisions"). I ask you to respond to the enclosed comments by the reviewers, and to take them into account for your revised version.

In particular, the following problems identified by the reviewers need to be addressed:

- Data availability: Your work does not currently follow our data availability requirements (see the paragraph on "Data" in the author guidelines: http://datasciencehub.net/guidelines.html)
- The problem of Research Question 1, as noted by Reviewer 2 ("not actually a real research question")
- Lacking discussion of implications, as noted by Reviewer 2
- Problems with the contagion part of the analysis, as noted by Reviewer 3
- Potential problems with the plots, as noted by Reviewer 3

Below I also list some very minor comments and typos from my side.

Please provide point-by-point responses to the issues raised by the reviewers, and prepare a separate text file that contains these responses. The revised version of the paper is expected within 20 days.

Typos and minor comments from my side (I might be wrong about some of the typos though):

- "1.2 million of these tweets was" > "... were"
- "The plot also introduce" > "... introduces"
- "We note how the connectivity growth dimension spans three orders of magnitude [...] This
means that some users’ followerships grows tens of times faster than the rate at which
they follow others": Shouldn't it be (at least) "hundreds of times faster"?
- It would be helpful if Figure 4 had labels for what the x and y axes stand for, instead of just "x" and "y".
- "significantly less tweets" > "... fewer ..."
- "similar amounts of retweets than broadcasters": not sure how to correctly phrase this but "similar ... than" sounds wrong to me
- "does not appear set of" > "does not appear in the set of"

Tobias Kuhn (http://orcid.org/0000-0002-1267-0234)
