Reviewer has chosen to be Anonymous
Overall Impression:
Accept
Technical Quality of the paper:
Limited novelty
Data availability:
All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript:
The length of this manuscript is about right
Summary of paper in a few sentences (summary of changes and improvements for second round reviews):
The submitted manuscript presents a new tool for anonymizing data download packages (DDPs) released by online service providers (data controllers) in the context of the GDPR. The main motivation behind this tool is to enable researchers to use these DDPs for scientific purposes. The tool was tested with 11 participants who created fake Instagram profiles and actively used them for about a week. This is a revised version of a previously submitted manuscript.
Reasons to accept:
- The work is overall well presented and detailed, with a clear motivation and a sound approach
- The empirical results of the improved script are excellent
- Both the scripts and dataset are made open source
- The proposed tool is applicable to different social media
Reasons to reject:
- The method used to hide faces in images/videos has previously been shown to be prone to re-identification attacks
Except for one point, the authors have addressed all the main weaknesses in their revised manuscript. In particular, they have clearly positioned their work with respect to prior literature and referenced the missing tools. The authors also convinced me that the submitted paper fits within the journal scope. The authors have further shown that their tool could be applied more broadly than just to Instagram by providing a detailed table on the features of five prominent social networks (Instagram, Facebook, Twitter, Snapchat, WhatsApp). This table includes valuable information about the types of PII present in the different social networks, the information in the DDP, the format of the DDP files, etc. This table clearly demonstrates that different social networks share similar features and formats, and thus that the tool's impact goes beyond Instagram.
The authors have not addressed the last weakness that I listed in my review, i.e., that the method used to hide faces in images/videos is prone to re-identification attacks. However, as they noted in their revision letter, the tool is composed of various de-identification modules and could later be complemented with more robust face-hiding methods. I think this goes beyond the scope of this work.
On the negative side, many typos remain, such as:
- feature based -> feature-based
- person identifiable information -> personally identifiable information
- Rephrase the last sentence of Section 7.