Twitter, with its constant generation of data, is an ideal platform for producing statistics about a given population. Yet demographic research based on Twitter data has only recently gained popularity. Identifying the nationality of Twitter users (Dutch, in our case) would allow future studies to focus on a particular community or nation, which is especially useful for producing national statistics. This paper exploits followee-follower relations, first, to establish, through a rule-based test, a link between the characteristics of a company and the proportion of Dutch users among its followers, and second, to detect the existence of a Dutch network. For the latter, a dataset of 6548 profiles was built to train and test the following machine learning algorithms: support vector machines, Naive Bayes, k-nearest neighbours, nearest centroid, AdaBoost and a voting classifier, whose accuracies exceeded 90%. A potential application of these classifiers is identifying the entire Dutch population present on Twitter.
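To make the idea of a voting classifier over followee relations concrete, here is a minimal pure-Python sketch. It assumes, hypothetically, that each profile is represented by the set of accounts it follows; the account names, the toy training data and the helper functions below are invented for illustration and are not the paper's features, dataset or implementation. Three of the listed learners (k-nearest neighbours with Jaccard similarity, nearest centroid, and a Bernoulli variant of Naive Bayes, chosen here as an assumption) are combined by hard majority vote.

```python
from collections import Counter
from math import inf, log, sqrt

# Hypothetical toy data: each user is the set of accounts they follow,
# labelled 1 (Dutch) or 0 (non-Dutch). Not data from the paper.
TRAIN = [
    ({"nos", "ns_online", "ajax", "bbcnews"}, 1),
    ({"nos", "rtlnieuws", "ajax"}, 1),
    ({"ns_online", "rtlnieuws", "nytimes"}, 1),
    ({"bbcnews", "nytimes", "cnn"}, 0),
    ({"cnn", "espn", "nytimes"}, 0),
    ({"bbcnews", "espn", "ajax"}, 0),
]


def jaccard(a, b):
    """Jaccard similarity between two followee sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(followees, train, k=3):
    """k-nearest neighbours: majority label of the k most similar users."""
    ranked = sorted(train, key=lambda ex: jaccard(followees, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def nearest_centroid_predict(followees, train):
    """Nearest centroid on binary followee vectors (Euclidean distance)."""
    vocab = sorted(set().union(*(f for f, _ in train)) | followees)
    centroids = {}
    for label in {l for _, l in train}:
        members = [f for f, l in train if l == label]
        # Centroid entry = fraction of class members following that account.
        centroids[label] = [sum(a in f for f in members) / len(members) for a in vocab]
    x = [1.0 if a in followees else 0.0 for a in vocab]

    def dist(c):
        return sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

    return min(centroids, key=lambda l: dist(centroids[l]))


def bernoulli_nb_predict(followees, train, alpha=1.0):
    """Bernoulli Naive Bayes with Laplace smoothing over training accounts."""
    vocab = set().union(*(f for f, _ in train))
    best_label, best_score = None, -inf
    for label in sorted({l for _, l in train}):
        members = [f for f, l in train if l == label]
        score = log(len(members) / len(train))  # class prior
        for a in vocab:
            p = (sum(a in f for f in members) + alpha) / (len(members) + 2 * alpha)
            score += log(p) if a in followees else log(1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def voting_predict(followees, train):
    """Hard voting: the label predicted by most base classifiers wins."""
    votes = [
        knn_predict(followees, train),
        nearest_centroid_predict(followees, train),
        bernoulli_nb_predict(followees, train),
    ]
    return Counter(votes).most_common(1)[0][0]


print(voting_predict({"nos", "ajax", "rtlnieuws"}, TRAIN))
print(voting_predict({"cnn", "espn", "bbcnews"}, TRAIN))
```

With an odd number of voters and two classes, hard voting can never tie, which is one reason to combine three base classifiers rather than two. A real system would instead train on a large labelled set of profiles, such as the 6548 profiles mentioned above, and would likely use an established library implementation of each learner.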