Clear All Confusion About the Confusion Matrix

How Is the Confusion Matrix Used in Cyber Crime?

Pritee Dharme
6 min read · Jun 3, 2021

Hello All …

This is Pritee, and I am back with another article. In it, we will clear up all the confusion about the Confusion Matrix and see how it is used in cyber crime.

So let’s start with the basics… :)

What Is a Confusion Matrix?

A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.

A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.
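To make the N x N idea concrete, here is a minimal sketch in plain Python that builds the matrix for three classes. The class names and label lists are made up for illustration, and the layout (rows are actual classes, columns are predicted classes) is an assumption of this sketch; some tools order the axes differently.

```python
def confusion_matrix(actual, predicted, classes):
    """Build an N x N matrix: rows are actual classes, columns are predicted."""
    index = {name: i for i, name in enumerate(classes)}
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Hypothetical test data with known true labels.
classes = ["cat", "dog", "bird"]
actual = ["cat", "cat", "dog", "dog", "bird", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat"]

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(f"{name:>5}: {row}")

# Correct predictions sit on the diagonal.
correct = sum(matrix[i][i] for i in range(len(classes)))
print("correct predictions:", correct)
```

Every off-diagonal cell tells you which class was mistaken for which, and that is the "what kinds of errors" part of the definition.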

A confusion matrix is a good and reliable tool for classification problems. It shows how well or how badly the model does on each class, which matters when mistakes on different classes have different impact. For example, if the model needs to catch one particular class more reliably than the other, we can derive a measure for that from the confusion matrix. Let’s understand this with an example of two classes, 0 and 1. Four scenarios are possible for each prediction:

Class is 1 and our model predicted 1 — That’s correct!

Class is 1 and our model predicted 0 — Not good.

Class is 0 and our model predicted 1 — Again not good.

Class is 0 and our model predicted 0 — Correct!

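We can arrange all these scenarios in a matrix like this:

              Predicted 1            Predicted 0
Actual 1      True Positive (TP)     False Negative (FN)
Actual 0      False Positive (FP)    True Negative (TN)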
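As a rough sketch, the four scenarios can be counted from two lists of labels using only the standard library. The label lists are invented for illustration, and the layout used here (rows are actual classes, columns are predicted classes, class 1 first) is an assumption; libraries such as scikit-learn sort the labels, so class 0 comes first there.

```python
from collections import Counter

# Hypothetical actual and predicted labels for eight samples.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

# Count each (actual, predicted) pair.
counts = Counter(zip(actual, predicted))
tp = counts[(1, 1)]  # class is 1, predicted 1
fn = counts[(1, 0)]  # class is 1, predicted 0
fp = counts[(0, 1)]  # class is 0, predicted 1
tn = counts[(0, 0)]  # class is 0, predicted 0

matrix = [[tp, fn],
          [fp, tn]]
for label, row in zip(["Actual 1", "Actual 0"], matrix):
    print(label, row)
```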

Let’s understand TP, FP, FN, and TN using a pregnancy analogy.

◼ True Positive:

Interpretation: You predicted positive and it’s true.

You predicted that a woman is pregnant and she actually is.

◼ True Negative:

Interpretation: You predicted negative and it’s true.

You predicted that a man is not pregnant and he actually is not.

◼ False Positive: (Type 1 Error)

Interpretation: You predicted positive and it’s false.

You predicted that a man is pregnant but he actually is not.

◼ False Negative: (Type 2 Error)

Interpretation: You predicted negative and it’s false.

You predicted that a woman is not pregnant but she actually is.

Just remember: Positive and Negative describe the predicted value, while True and False say whether that prediction matches the actual value.
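The naming of the four outcomes can be sketched as a tiny function. This is only an illustration, and it assumes that 1 stands for the positive class (pregnant) and 0 for the negative class.

```python
def outcome(actual, predicted):
    """Name the outcome of one prediction, assuming 1 = positive, 0 = negative."""
    # The second word comes from what the model predicted.
    kind = "Positive" if predicted == 1 else "Negative"
    # The first word says whether that prediction was right.
    truth = "True" if actual == predicted else "False"
    return f"{truth} {kind}"

print(outcome(actual=1, predicted=1))  # pregnant, predicted pregnant
print(outcome(actual=0, predicted=0))  # not pregnant, predicted not pregnant
print(outcome(actual=0, predicted=1))  # not pregnant, predicted pregnant
print(outcome(actual=1, predicted=0))  # pregnant, predicted not pregnant
```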

What can we learn from this?

A valid question is what we can actually do with this matrix. Several important metrics are derived from it:

Precision :

It is the fraction of the model’s positive predictions that are actually positive. In other words, of everything the model flagged as positive, how much was right? Its formula is TP / (TP + FP).

Recall :

It is the fraction of actual positives that the model correctly identifies as positive. It is also called the True Positive Rate or Sensitivity. Its formula is TP / (TP + FN).

F-1 Score :

It is the harmonic mean of Precision and Recall. Because the harmonic mean is pulled toward the smaller of the two values, a model scores well only when both precision and recall are reasonably high, so this metric accounts for False Positives and False Negatives at the same time. Its formula is 2 * Precision * Recall / (Precision + Recall).

Accuracy :

It is the fraction of all predictions that are correct, whether they are positives or negatives, so it counts both True Positives and True Negatives. The formula is (TP + TN) / (TP + TN + FP + FN).
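The four formulas can be sketched in a few lines of Python. The counts passed in at the bottom are invented for illustration, and returning 0.0 when a denominator is zero is a convention chosen for this sketch, not something the formulas themselves define.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (a sketch convention)."""
    return numerator / denominator if denominator else 0.0

def classification_metrics(tp, fp, fn, tn):
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }

# Hypothetical counts read off a confusion matrix.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts precision is higher than recall, and F-1 lands between them, closer to the lower value.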

Of all these terms, precision and recall are the most widely used, and the tradeoff between them is a useful measure of how well a model predicts. Ideally a model has both high precision and high recall, but that is achievable only when the classes are perfectly separable. In practical use cases the data is usually messy and imbalanced, so improving one often costs some of the other.

Now that we have covered these basic and important concepts, let’s move to the main part: cyber crime and the use of the confusion matrix in it.
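To see why imbalanced data is a problem, consider a small sketch with invented numbers: 1,000 samples of which only 10 are positive, and a lazy model that always predicts the negative class.

```python
# Hypothetical imbalanced data: 10 positives, 990 negatives.
actual = [1] * 10 + [0] * 990
# A model that always predicts the negative class.
predicted = [0] * 1000

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)

print(f"accuracy = {accuracy:.2f}")
print(f"recall   = {recall:.2f}")
```

The accuracy looks excellent even though the model never finds a single positive, which is why precision and recall deserve a look whenever one class is rare.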

Confusion Matrix in Cyber Crime :

Internet usage has grown rapidly, particularly in the last decade. However, as the Internet becomes part of day-to-day activities, cybercrime is also on the rise. According to a 2020 Cybersecurity Ventures report, cybercrime was projected to cost nearly $6 trillion per year by 2021. Cybercriminals use networked computing devices as their primary means of reaching a victim’s devices, and they profit financially, gain publicity, or obtain other benefits by exploiting vulnerabilities in the system. Cybercrime is increasing steadily. Evaluating attacks and providing protective measures manually, using existing technical approaches and investigations, has often failed to keep it under control.

Cyber attacks are becoming a critical issue for organizational information systems. A number of cyber attack detection and classification methods have been introduced, with varying levels of success, as countermeasures to preserve data integrity and system availability. Classifying attacks against computer networks is becoming a harder problem in the field of network security.

For an attack detection model, where “attack” is the positive class, the four cells of the confusion matrix are:

  • True Positive (TP) : The number of events detected as attacks that are actually attacks.
  • True Negative (TN) : The number of events detected as normal that are actually normal.
  • False Positive (FP) : The number of events detected as attacks that are actually normal (False alarm).
  • False Negative (FN) : The number of events detected as normal that are actually attacks.
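These four definitions can be applied to a log of events, as in the following sketch. The event list is invented, and the "attack" and "normal" strings are just labels chosen for this example. Detection rate here is the same thing as recall, and false alarm rate is FP / (FP + TN).

```python
from collections import Counter

# Hypothetical events as (actual, detected) pairs.
events = [
    ("attack", "attack"),
    ("attack", "normal"),
    ("normal", "normal"),
    ("normal", "attack"),
    ("normal", "normal"),
    ("attack", "attack"),
    ("normal", "normal"),
    ("normal", "normal"),
]

# Map each (actual, detected) pair to its confusion matrix cell.
cell_names = {
    ("attack", "attack"): "TP",
    ("normal", "normal"): "TN",
    ("normal", "attack"): "FP",  # false alarm
    ("attack", "normal"): "FN",  # missed attack
}

counts = Counter(cell_names[pair] for pair in events)
detection_rate = counts["TP"] / (counts["TP"] + counts["FN"])
false_alarm_rate = counts["FP"] / (counts["FP"] + counts["TN"])

print(dict(counts))
print(f"detection rate   = {detection_rate:.2f}")
print(f"false alarm rate = {false_alarm_rate:.2f}")
```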

Type I Error :

This is a False Positive: the model predicted an attack, but the system was actually safe. This type of error is not very dangerous. The team gets notified and checks for malicious activity, finds none, and no harm is done beyond the time spent. These cases are termed False Alarms.

Type II Error :

This is a False Negative, and it can prove very dangerous. The system predicted no attack, but an attack actually took place. In that case no notification reaches the security team and nothing can be done to prevent it. One of the aims of the model is therefore to minimize this value.
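Because a missed attack (False Negative) costs more than a false alarm (False Positive), a detector that produces a score is often tuned to be more sensitive. The sketch below is only an illustration: the scores, the labels, and the two thresholds are invented, and it assumes an event is flagged as an attack when its score is at or above the threshold.

```python
# Hypothetical (anomaly_score, is_attack) pairs; 1 = attack, 0 = normal.
scored_events = [
    (0.9, 1), (0.7, 1), (0.4, 1),
    (0.1, 0), (0.3, 0), (0.5, 0), (0.2, 0), (0.6, 0),
]

def count_errors(threshold):
    """Return (false_positives, false_negatives) at the given threshold."""
    fp = sum(1 for score, is_attack in scored_events
             if score >= threshold and is_attack == 0)
    fn = sum(1 for score, is_attack in scored_events
             if score < threshold and is_attack == 1)
    return fp, fn

for threshold in (0.8, 0.35):
    fp, fn = count_errors(threshold)
    print(f"threshold={threshold}: false alarms={fp}, missed attacks={fn}")
```

Lowering the threshold removes the missed attacks in this toy data but raises the number of false alarms, so the choice depends on how much analyst time the team can spend.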

So that’s all… I hope you got something from this article and your confusion is cleared up… :)

If you liked it, then clap and share. And if you want to connect with me, my LinkedIn profile link is below..

Thank You So Much For Reading … See you soon… :)
