Network Analysis of the Twitter Population Exploiting the Homophily Concept, a Machine Learning Approach

Tracking #: 534-1514

Marco Puts
Jade Cock
Piet Daas

Submission Type: 

Research Paper


Twitter, with its constant generation of data, is an ideal platform for producing statistics about a given population. Yet demographic research on Twitter data has only recently become popular. Identifying the nationality of a Twitter user, Dutch in our case, would allow future studies to focus on a particular community or nation, which is particularly advantageous for producing national statistics. This paper exploits followee-follower relations in order to, first, establish a link between the characteristics of a company and the proportion of Dutch users among its followers through a rule-based test and, second, detect the existence of a Dutch network. For the latter, a dataset of 6548 profiles was built to train and test the following machine learning algorithms: support vector machines, Naive Bayes, k-nearest neighbour/centroid, AdaBoost and a voting classifier, whose accuracies reached over 90%. A potential use for these classifiers would be identifying the whole Dutch population present on Twitter.
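As a rough illustration of the voting idea mentioned in the abstract, the sketch below combines a nearest-centroid classifier and two k-nearest-neighbour classifiers by majority vote. It is not the authors' implementation: the feature encoding (one binary column per followed account), the toy profiles, and all function names are assumptions made for this example, and the support vector machine, Naive Bayes and AdaBoost members are omitted for brevity.

```python
from collections import Counter
from math import dist

# Toy data, NOT from the paper: each row says whether a profile follows
# three hypothetical accounts (1 = follows, 0 = does not).
X_train = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y_train = ["dutch", "dutch", "dutch", "other", "other", "other"]


def nearest_centroid_fit(X, y):
    """Return {label: centroid}, each centroid being the per-feature mean."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def nearest_centroid_predict(centroids, row):
    """Predict the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


def knn_predict(X, y, row, k=3):
    """Predict the most common label among the k closest training rows."""
    nearest = sorted(zip(X, y), key=lambda pair: dist(pair[0], row))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def majority_vote(predictions):
    """Hard voting: return the label predicted by most classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def classify(X, y, row):
    """Combine three simple classifiers by majority vote."""
    centroids = nearest_centroid_fit(X, y)
    votes = [
        nearest_centroid_predict(centroids, row),
        knn_predict(X, y, row, k=1),
        knn_predict(X, y, row, k=3),
    ]
    return majority_vote(votes)


print(classify(X_train, y_train, [1, 1, 0]))
print(classify(X_train, y_train, [0, 0, 1]))
```

An odd number of voters is used so that a two-class vote cannot tie. In practice the same structure would wrap whichever trained classifiers are available.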



  • Reviewed

Data repository URLs: 

Date of Submission: 

Monday, June 4, 2018

Date of Decision: 

Thursday, June 7, 2018


Reject (Pre-Screening)

1 Comment

some remarks concerning the authors

There are some remarks regarding the authors of this paper.

Jade Cock is the first author of the paper. Mena Habib left academia and his email address is unknown. However, he contributed to this paper, and it would be good to have him as one of the co-authors.