Confusion Matrix and Cyber Crime

Bhavna Surendra Latare
5 min read, Jun 6, 2021

This article gives you an insight into the confusion matrix and how it is used in detecting cyber crimes.

Confusion Matrix

A confusion matrix is one of the most popular tools for evaluating models on classification problems, binary ones in particular. It is an N × N matrix used to evaluate the performance of a classification model, where N is the number of target classes.

In the matrix, the actual target values are compared with the values predicted by the machine learning model. This shows not only how often the classification model is wrong but also which kinds of errors it makes.

Each cell of the confusion matrix holds the count of samples for one combination of actual and predicted value.

Here,

True Positive (TP): The prediction is positive and the actual value is positive.

True Negative (TN): The prediction is negative and the actual value is negative.

False Positive (FP): The prediction is positive but the actual value is negative.

False Negative (FN): The prediction is negative but the actual value is positive.

A model rarely predicts every data point correctly. The points it misclassifies fall into two types of errors:

  • Type 1 Error : When the actual value is negative (0) but it is predicted as positive (1), i.e. a False Positive.
  • Type 2 Error : When the actual value is positive (1) but it is predicted as negative (0), i.e. a False Negative.
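To make the four counts concrete, here is a minimal Python sketch that tallies them from two lists of binary labels. The function name `confusion_counts` and the sample labels are made up for illustration (1 = attack, 0 = normal); they do not come from any dataset mentioned in this article.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP and FN for binary labels (1 = positive, 0 = negative)."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        elif predicted == 1 and actual == 0:
            fp += 1  # Type 1 error
        else:
            fn += 1  # Type 2 error
    return tp, tn, fp, fn


# Hypothetical labels: 1 = attack, 0 = normal traffic
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # prints: 3 3 1 1
```

In this toy example the model raises one false alarm (Type 1) and misses one attack (Type 2).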

Accuracy

Accuracy is the fraction of all predictions that the model gets right. It is a reasonable performance measure when the classes are roughly balanced and false positives and false negatives cost about the same. When the classes are imbalanced, or when one kind of error is far more costly than the other, accuracy alone is not a good way to judge the performance of our classifier.

Formula:

Accuracy = (TP + TN)/(TP + TN + FP + FN)
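The formula is easy to try out in Python. The sketch below also shows why accuracy can mislead on imbalanced data. The counts are hypothetical: 990 normal events, 10 attacks, and a model that labels everything as normal.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts: the model never predicts "attack",
# so all 10 real attacks become false negatives.
print(accuracy(tp=0, tn=990, fp=0, fn=10))  # prints: 0.99
```

The score looks excellent, yet the model has not detected a single attack.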

Precision

It is the ratio of true positives to all predicted positives, that is, the share of samples predicted as positive that are actually positive. A high precision means that when the model flags a sample as positive, it is usually right.

Formula:

Precision = TP/(TP + FP)
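As a small illustration in Python, assume a hypothetical detector that raises 10 alerts, of which 8 are real attacks and 2 are false alarms. The guard against division by zero is a choice made for this sketch; it covers the case where the model predicts no positives at all.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


# Hypothetical counts: 8 correct alerts, 2 false alarms
print(precision(tp=8, fp=2))  # prints: 0.8
```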

Recall

It is also called the true positive rate or the sensitivity of the model. It is the ratio of true positives to all samples that are actually positive, that is, the share of actual positives that the model manages to find.

Formula:

Recall = TP/(TP + FN)
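Here is the same idea as a Python sketch. The counts are hypothetical: 16 real attacks, of which the detector catches 8 and misses 8.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN); returns 0.0 when there are no actual positives."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0


# Hypothetical counts: 8 attacks detected, 8 attacks missed
print(recall(tp=8, fn=8))  # prints: 0.5
```

In security settings a missed attack is often costlier than a false alarm, which is one reason recall deserves attention alongside accuracy.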

F1 score

It is calculated as the harmonic mean of precision and recall. It depends on true positives, false positives and false negatives, but not on true negatives. The F1 score is often preferred over accuracy as a performance measure when the classes are imbalanced.

Formula:

F1 Score = 2 * (Precision * Recall)/(Precision + Recall)
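The Python sketch below compares the F1 score with a plain average for a hypothetical model that has perfect precision but poor recall. The values 1.0 and 0.25 are chosen only for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


p, r = 1.0, 0.25  # hypothetical precision and recall

print((p + r) / 2)     # plain average, prints: 0.625
print(f1_score(p, r))  # F1 score, prints: 0.4
```

The harmonic mean is pulled toward the weaker of the two values, so a model cannot get a high F1 score by doing well on precision alone or on recall alone.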

Cyber Crime

Cybercrime is criminal activity that either targets or uses a computer, a computer network or a networked device. Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. Cybercrime is carried out by individuals or organizations. Some cybercriminals are organized, use advanced techniques and are highly technically skilled. Others are novice hackers. Rarely, cybercrime aims to damage computers for reasons other than profit. These could be political or personal.

According to a 2020 Cybersecurity Ventures report, cybercrime was projected to cost nearly $6 trillion per year by 2021. For illegal activities, cybercriminals use networked computing devices as the primary means of reaching victims' devices, and they gain money, publicity or other benefits by exploiting vulnerabilities in those systems.

Cybercrimes are increasing steadily.

Evaluating cybercrime attacks and providing protective measures manually, using existing technical approaches and investigations, has often failed to control cybercrime attacks. The existing literature on cybercrime offenses lacks computational methods for predicting cybercrime, especially from unstructured data.

Therefore, this study proposes a flexible computational tool that uses machine learning techniques to analyze and classify cybercrime rates state by state within a country.

Security analytics, combined with data analytics approaches, helps to analyze and classify offenses from India-based integrated data, which may be either structured or unstructured.

The main strength of this work is its test results: the reported analysis classifies the offenses with 99 percent accuracy.

At present, no generalized framework is available to categorize cybercrime offenses by extracting features from the cases.

In the present work, data analysis and machine learning are incorporated to build a cybercrime detection and analytics system. The proposed system’s design and implementation utilize classification, clustering and supervised algorithms.

Proposed methodology to analyze cybercrime incidents

In the reconnaissance phase, the integrated data (structured and unstructured) are collected from Kaggle and CERT-In. The next phase of the approach is preprocessing, which removes noisy information from the raw data.

Figure: Sample dataset

During preprocessing, only feature extraction takes place. It converts the high-dimensional data to low-dimensional data. The preprocessed data are helpful for data visualization, because complex data are easier to organize once they are reduced to fewer dimensions.

During prediction analysis, the cybercrime data were analyzed and used to predict which crime occurs most often in a particular year at a particular location. Through this analysis, one can anticipate cybercrime and reduce the occurrence of cybercrime incidents. In this step, therefore, the cybercrime data are classified.
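To give a feel for the question being asked here, the Python sketch below simply counts which crime type appears most often for each (year, state) pair. This is a plain frequency count, not the machine learning model from the study, and the records, state names and crime labels are all invented for illustration.

```python
from collections import Counter, defaultdict


def most_frequent_crime(records):
    """Return the most common crime type for each (year, state) pair."""
    counts = defaultdict(Counter)
    for year, state, crime in records:
        counts[(year, state)][crime] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


# Invented records: (year, state, crime type)
records = [
    (2019, "Maharashtra", "phishing"),
    (2019, "Maharashtra", "phishing"),
    (2019, "Maharashtra", "identity_theft"),
    (2020, "Karnataka", "malware"),
    (2020, "Karnataka", "malware"),
    (2020, "Karnataka", "phishing"),
]

for (year, state), crime in most_frequent_crime(records).items():
    print(year, state, crime)
```

A trained classifier would go further and predict the category of unseen cases from extracted features, but a count like this is a useful baseline to compare against.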

Figure: Clustered cyber crime data plotted

Figure: Confusion matrix for the model (investigating cyber crimes)
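When there are more than two classes, the confusion matrix becomes an N × N table with one row per actual class and one column per predicted class. The Python sketch below builds such a table. The three crime categories and the sample labels are hypothetical and are not taken from the study's dataset or results.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build an N x N matrix: rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Hypothetical categories and labels
labels = ["phishing", "malware", "identity_theft"]
y_true = ["phishing", "malware", "phishing", "identity_theft", "malware", "phishing"]
y_pred = ["phishing", "malware", "malware", "identity_theft", "malware", "phishing"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
# prints:
# phishing [2, 1, 0]
# malware [0, 2, 0]
# identity_theft [0, 0, 1]
```

The diagonal holds the correctly classified cases. Every off-diagonal cell is a misclassification, here one phishing case that was predicted as malware.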

The framework developed in our work is essential to the creation of a model that can support analytics regarding the identification, detection and classification of the integrated cybercrime offenses (structured and unstructured). The aim is to find the attacks that take advantage of security vulnerabilities and to analyze them using machine learning techniques.

Hope investing time in this blog proved fruitful to you!

Keep learning!

Keep sharing!

References :

https://www.researchgate.net/figure/Confusion-matrix-for-the-proposed-model_fig5_341453806

