Deep Learning based Crime Detection and Resource Creation Approach From Bengali Voice Calls

Tracking #: 735-1715

Authors:

	Name	ORCID
	Nahid Hossain	https://orcid.org/0000-0002-1325-8209
	Khalid Saifullah	https://orcid.org/0000-0003-4662-5033
	Mohammad Masudul Alam	https://orcid.org/0000-0003-2537-9937
	Prosenjit Majumder Joy	https://orcid.org/0009-0009-6430-0532
	Jadir Ibna Hasan	https://orcid.org/0009-0007-2799-5203
	Salekul Islam	https://orcid.org/0000-0002-7262-0060

Responsible editor:

Bharathi Raja Chakravarthi

Submission Type:

Research Paper

Abstract:

Mobile phones have revolutionized the way we communicate. Despite their numerous benefits, they have also become a convenient tool for committing crimes or making threats. Due to the large number of users, it is almost impossible for security forces to take proactive measures against those crimes. In this paper, with the help of machine learning, we focus on building a system that can detect potential threats in phone calls. We develop (to the best of our knowledge) the very first Bengali voice call dataset to train the machine learning system. Our system takes a voice call and uses a Deep 1D Convolutional Neural Network to analyze the call and a Multi-Layer Perceptron to decide whether any threats exist or not. The proposed simple baseline solution, trained on our voice call dataset of roughly 9 hours, achieves 91% precision, recall and F1-score in detecting the crime calls. We believe that in the future such systems will assist in investigations, in evaluating voice conversations, and in predicting and estimating potential threats. All of our recorded calls are freely available to future researchers at: https://tinyurl.com/detecThreats

Manuscript:

ds-paper-735.pdf

Data repository URLs:

https://tinyurl.com/detecThreats

Date of Submission:

Wednesday, November 30, 2022

Date of Decision:

Friday, March 17, 2023

Nanopublication URLs:

Decision:

Reject

Solicited Reviews:

Review #1 submitted on 12/Jan/2023

By Pawan Goyal

https://orcid.org/0000-0002-9414-8166

Review Details

Reviewer has chosen not to be Anonymous

Overall Impression: Average
Suggested Decision: Undecided
Technical Quality of the paper: Average
Presentation: Good
Reviewer's confidence: High
Significance: Moderate significance
Background: Reasonable
Novelty: Lack of novelty
Data availability: All used and produced data (if any) are FAIR and openly available in established data repositories
Length of the manuscript: The length of this manuscript is about right

Summary of paper in a few sentences:

The paper proposes a dataset and a deep neural network approach for crime detection from Bengali voice calls. Synthetic data were created for the task and used alongside available real data.

Reasons to accept:

1. The authors developed the very first Bengali voice call dataset for crime detection with three labels: crime, neutral and sarcastic.

2. The M11 architecture details have been explained well.

3. The graphical summary showed valuable insights.

Reasons to reject:

1. There is no discussion of the limitations of the current study. The model and dataset are specific to Bengali in a monolingual scenario, but in real-life speech data people often talk in code-mixed speech (Bengali-English-Hindi or Bengali-Other Language). How would the current model apply in that scenario?

2. The authors created synthetic data and mixed it with real-life data. It would be useful to separate the test set into synthetic as well as mixed data, and to discuss the performance on each of these separately.

3. The annotation process is not explained in detail: i) What are the annotation guidelines? ii) Who did the annotations? iii) What are the qualifications of the annotators? Are they experts? iv) What is the inter-annotator agreement?

4. The only input to the classification model is speech data. However, if the authors use an ASR to convert it to text, and use a text classifier, what would be the performance?

5. The authors should also provide an error analysis with examples.
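Point 2 above asks for results reported separately per data source. A minimal sketch of that breakdown is given here, assuming each test record carries a source tag; the tags ("synthetic", "real"), the function names and the toy records are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the labels that occur in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def f1_by_source(records):
    """records: iterable of (source, true_label, predicted_label) tuples.

    Returns a dict mapping each source tag to the macro F1 on that subset.
    """
    grouped = defaultdict(lambda: ([], []))
    for source, truth, pred in records:
        grouped[source][0].append(truth)
        grouped[source][1].append(pred)
    return {src: macro_f1(t, p) for src, (t, p) in grouped.items()}


# Toy records for illustration only (not results from the paper).
toy_records = [
    ("synthetic", "crime", "crime"),
    ("synthetic", "neutral", "neutral"),
    ("real", "crime", "neutral"),
    ("real", "neutral", "neutral"),
]
per_source = f1_by_source(toy_records)
```

A large gap between the per-source scores would suggest that the model has learned properties of the synthetic recordings rather than of real calls.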

Nanopublication comments:

Further comments:

Review #2 submitted on 17/Mar/2023

Review Details

Reviewer has chosen to be Anonymous

Overall Impression: Weak
Suggested Decision: Reject
Technical Quality of the paper: Weak
Presentation: Good
Reviewer's confidence: High
Significance: Moderate significance
Background: Reasonable
Novelty: Lack of novelty
Data availability: Not all used and produced data are FAIR and openly available in established data repositories; authors need to fix this
Length of the manuscript: The authors need to elaborate more on certain aspects and the manuscript should therefore be extended (if the general length limit is already reached, I urge the editor to allow for an exception)

Summary of paper in a few sentences:

The authors used a deep 1D Convolutional Neural Network to identify whether any threats exist in mobile phone voice calls.
They collected a voice call dataset with more than 9 hours of audio data, annotated with 3 labels, namely crime, normal and sarcastic.
A CNN with an MLP was employed to classify the audio into one of the 3 labels. Two variations, namely the WeightedRandomSampler and ClassWeight methods, were tried on the dataset.
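The two imbalance-handling variants named in this summary can be illustrated with a small sketch. It assumes inverse-frequency weighting, which is a common choice but is not confirmed by this page as the authors' exact scheme; the label counts are invented for illustration. In PyTorch, per-sample weights of this kind would typically be passed to `torch.utils.data.WeightedRandomSampler`, and per-class weights to the `weight` argument of `torch.nn.CrossEntropyLoss`.

```python
from collections import Counter


def class_weights(labels):
    """Per-class loss weights: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def sample_weights(labels):
    """Per-sample sampling weights: 1 / count of the sample's class.

    Drawing with these weights makes every class equally likely overall.
    """
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


# Hypothetical, imbalanced label list (not the paper's class distribution).
toy_labels = ["normal"] * 6 + ["crime"] * 3 + ["sarcastic"] * 1
loss_weights = class_weights(toy_labels)
draw_weights = sample_weights(toy_labels)
```

The two approaches attack the same problem from different sides: class weights rescale the loss of each example, while a weighted sampler changes how often each example is seen during training.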

Reasons to accept:

Data set contribution.

The paper is well written.

Reasons to reject:

Lack of contributions in the methodology

Lack of analysis

Nanopublication comments:

Further comments:

The major contribution of this paper is the data set. However, more details should be added about the data set collection.
1. How many annotators were involved in labeling?
2. What is the inter-rater agreement?
3. What guidelines were used to annotate calls as crime, normal or sarcastic in Bengali?
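For the inter-rater agreement asked about above, one common statistic for two annotators is Cohen's kappa. The sketch below is a plain-Python illustration under the assumption that exactly two annotators labeled the same calls; the page does not state how many annotators the authors used, and the toy labels are invented.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's label
    frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[l] * freq_b[l] for l in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy annotations for illustration only.
annotator_1 = ["crime", "crime", "normal", "normal"]
annotator_2 = ["crime", "normal", "normal", "normal"]
kappa = cohen_kappa(annotator_1, annotator_2)  # 0.5 for these toy labels
```

With more than two annotators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual alternative.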

The introduction can conclude with the open challenges and the contributions of this research.

The related work section can be summarized with a table highlighting the research gaps.

What is the reason for choosing the M11 architecture for this problem? What would be the impact of using a recurrent neural network, which may be a better choice for voice data because it captures context better than CNN architectures?

The data set used for evaluation is not a gold-standard one. The authors selected a few instances as the test set and claim 91% accuracy, a figure that may depend on the test data they chose. It would be better to perform k-fold cross-validation when evaluating on their own data set.
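The k-fold evaluation suggested here can be sketched as follows. This is a minimal stratified split in plain Python, assuming each recording is an independent sample with a single label; the function name, the fold count and the toy label list are illustrative and not from the paper.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k stratified folds.

    Indices of each class are shuffled and dealt out to the folds in turn,
    so per-class counts differ by at most one between folds.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0  # keeps dealing from where the previous class stopped
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for index in indices:
            folds[position % k].append(index)
            position += 1

    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f in range(k) if f != held_out for i in folds[f])
        yield train, test


# Hypothetical label list: 10 crime calls and 20 normal calls.
toy_labels_cv = ["crime"] * 10 + ["normal"] * 20
splits = list(stratified_kfold(toy_labels_cv, k=5))
```

Reporting the mean and spread of the metric over the k held-out folds addresses the concern that a single hand-picked test set may be unrepresentative. If several clips come from the same call or speaker, they would need to be kept in the same fold to avoid leakage, which this sketch does not handle.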

A detailed empirical analysis can be done with more variations of methodology.

Error analysis and statistical analysis can be included.

2 Comments

Review the paper and comment.

Submitted by Malik Jawarneh on Tue, 03/07/2023 - 05:31

Positive Comments:

-The research paper proposes a unique approach to detect potential threats in phone calls using deep learning.
-The proposed system uses a Deep 1D Convolutional Neural Network to analyze the calls and a Multi-Layer Perceptron to decide whether any threats exist or not.
-The proposed simple baseline solution is able to achieve 91% precision, recall and F1-score in detecting the crime calls.
-The recorded calls are freely available to use by the future researchers.

Negative Comments:

-The research paper does not provide any specifics on the Multi-Layer Perceptron used in the system.
-The research paper does not provide any information on the challenges faced while collecting the dataset.
-The research paper does not provide any information on the potential applications of the system.

Meta-Review by Editor

Submitted by Tobias Kuhn on Fri, 03/17/2023 - 09:25

There are many questions from the reviewers that are not answered in the paper. I encourage the authors to consider the reviewers' suggestions carefully to improve the paper in future work.

Important points to address when revising the paper are:

1) Annotation process, including the guidelines and annotators (reviewer 1 and reviewer 2 question), since the paper's main contribution is the dataset

2) Applicability to code-mixed speech, the real-world scenario (reviewer 1 question)

3) Lack of analysis (reviewer 2 question)

4) Details about the models used

Bharathi Raja Chakravarthi (https://orcid.org/0000-0002-4575-7934)

Data Science