Alert Fatigue. Alert Triage. Alert Prioritization. Security teams at many organizations face more alerts than they can effectively handle. Their firewalls are too chatty. Their antivirus solution generates the same alerts all the time. Their threat intel feeds generate too many false positives. Going through them and manually whitelisting things is too much work; what these teams want is a system that automatically identifies the important alerts. In this post, we describe how we leverage anomaly detection to help reduce alert volumes and focus analyst attention on the most important alerts.
The Case Study
We have a client whose network and endpoint monitoring solutions together generate more than 5 billion events and 600 million alerts per month – more than 200 alerts per second. The network alerts account for the vast majority of this volume, but the endpoint alerts are not insignificant. They account for 1.3 million alerts per month, more than 1,700 per hour.

The goal of this study was to automatically prioritize the alerts, to ensure that the organization would not miss the important alerts in a sea of uninteresting ones. This would also reduce the burden on the analysts, enabling the organization to spend less time on alert triage and more time investigating and protecting against real threats, doing proactive threat hunting, and taking other proactive security measures.
Below is a summary of the results for a 30-day period:
| Before | After | Reduction in Alerts |
| --- | --- | --- |
| 600,000,000+ | 6,200 | 100,000 to 1 |
The result was a very manageable average of 200 alerts per day.
The Anomaly Detection Approach
Anyone who has spent a lot of time staring at security alerts has noticed patterns. A user runs an application that generates the same alerts every day. Regularly scheduled updates cause the same alerts to be raised across the company. A particular user has a penchant for downloading web toolbars, which raise a barrage of alerts.

These patterns are incredibly helpful in triaging alerts. But an analyst has limited time and cognitive bandwidth to identify them, encode them, and communicate them to their team. That process is expensive and error prone, and it will never identify all of the patterns. Instead, we have taken an anomaly detection approach to alert triage, in which the anomaly detection algorithms do all of this work for the analyst. The value of this is three-fold.
- It helps to cut through the noise, those pesky alerts you see every day.
- It surfaces the anomalies in groups, so analysts are simultaneously investigating and resolving multiple alerts.
- It presents the analyst with all the information the algorithms used to identify the alerts, so that they have the same background and context the algorithms had when making their decision.
The kinds of anomalies we look for include:
- Entities generating alerts of types that are rare for that entity.
- Entities generating spikes in alerts in total or by type.
- Entities generating abnormal distributions of alerts by type.
- Alert types that are being observed on more entities than usual.
- Alerts on specific indicators that are being observed on more entities than usual.
The anomaly detection techniques applied here are completely unsupervised, and don’t require training data. The goal is for them to be immediately useful, without any feedback or manual tuning. As analysts add feedback, it can be used to refine the approach even further.
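To make this concrete, here is a minimal sketch of one of the entity-centric checks listed above: flagging days on which a host raises a spike in the number of distinct alert types, the same kind of anomaly shown in the results below. It is a hypothetical illustration rather than our production implementation; the function name, column names, and threshold are assumptions. Because the baseline is just each host's own median and MAD, the check is unsupervised and needs no training data.

```python
# Hypothetical sketch, not the production implementation: flag days on which a
# host raises far more distinct alert types than is typical for that host.
# Column names (host, day, alert_type) and the threshold are assumptions.
import pandas as pd

def distinct_type_spikes(alerts: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    """Return (host, day) pairs whose count of distinct alert types is a spike."""
    # Daily count of distinct alert types per host.
    daily = (alerts.groupby(["host", "day"])["alert_type"]
                   .nunique()
                   .rename("n_types")
                   .reset_index())

    # Robust per-host baseline: median and median absolute deviation (MAD).
    baseline = daily.groupby("host")["n_types"].agg(
        med="median",
        mad=lambda s: (s - s.median()).abs().median(),
    )
    daily = daily.join(baseline, on="host")

    # Robust z-score; clip the MAD so hosts with near-constant counts
    # don't divide by zero.
    daily["score"] = (daily["n_types"] - daily["med"]) / daily["mad"].clip(lower=1.0)
    return daily[daily["score"] >= threshold].sort_values("score", ascending=False)
```

Because the only state is each host's own history, the same shape of check generalizes to the other entity-centric anomalies (total alert counts, counts per alert type) by changing the aggregation.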
The Results
This section provides an example of the anomaly lists, providing details of why the highlighted anomalies were significant. Hostnames, dates, and IP addresses have been changed.

We begin with entity-specific anomalies. The following list shows hosts that had spikes in the number of different types of alerts they raised on a particular day. There were 6 spikes observed on 5 hosts.
- April 5th 2017: ray-mbp had 7 different types of alerts, a spike
- April 11th 2017: eddie-win had 17 different types of alerts, a spike
- April 19th 2017: colin-mbp had 5 different types of alerts, a spike
- April 19th 2017: eddie-win had 27 different types of alerts, a spike
- April 29th 2017: eddie-mbp had 8 different types of alerts, a spike
- April 29th 2017: vivek-mbp had 9 different types of alerts, a spike
Investigations of these spikes revealed a few different types of security incidents. Three of the spikes were caused by users installing applications packaged with malware. One involved installation of suspicious software. Two were quickly determined to be benign, caused by normal installation activity. In one case, this activity was followed closely by visits to .ru and .cn websites and execution of multiple files that raised alerts.
The following graph shows the host and some of the processes that generated alerts and their relationships to each other. It also shows that there was a simultaneous alert on some network communication between the host and an external IP address.
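The graph itself is not reproduced here, but the idea behind it, linking each alert's host, processes, and external IP so that related alerts can be investigated as one incident, can be sketched roughly as follows. This is a hypothetical illustration under assumed field names (`host`, `process`, `remote_ip`) and an assumed networkx representation, not a description of our actual pipeline.

```python
# Hypothetical sketch: fold individual alerts into an entity graph so that a
# host, the processes it ran, and the external IPs it talked to show up as one
# connected picture. Field names are illustrative assumptions.
import networkx as nx

def build_alert_graph(alerts: list[dict]) -> nx.Graph:
    g = nx.Graph()
    for a in alerts:
        host = a["host"]
        g.add_node(host, kind="host")
        proc = a.get("process")
        if proc:
            g.add_node(proc, kind="process")
            g.add_edge(host, proc, relation="executed", alert_id=a.get("id"))
        ip = a.get("remote_ip")
        if ip:
            g.add_node(ip, kind="external_ip")
            g.add_edge(host, ip, relation="connected_to", alert_id=a.get("id"))
    return g

# Each connected component groups alerts that share entities, so an analyst can
# review the host, its alerting processes, and the external IP together:
# components = list(nx.connected_components(build_alert_graph(alerts)))
```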
Other lists of entity-centric anomalies yielded other interesting results, including:
- A SIP exploit attempt originating in Germany against a large block of IP addresses, identified by the spike in activity from the attacker IP.
- A host connecting with known Zeus CnC servers using curl and generating multiple simultaneous alerts from their web browser, identified by the strange alert types being raised on that host.
- A host that had been hit with an exploit, identified by unusual parent-child process relationships in the alert.
- An internal user trying to brute force a password to an internal host, identified by the spike in alert activity.
- 33 hosts all infected with the same malware, identified by a spike in the number of machines generating the alert.
- 3 users using remote access software, identified by a spike in the number of users generating the alert.
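Several of these lists are population-level rather than entity-level: the 33 infected hosts and the 3 remote-access users were surfaced because an alert was suddenly observed on many more entities than usual. A minimal sketch of that check, assuming a table of alerts with illustrative `day`, `alert_type`, and `host` columns (again hypothetical, not our implementation), might look like this:

```python
# Hypothetical sketch: flag days where an alert type is seen on far more hosts
# than its own recent history suggests, as in the 33-host malware example
# above. Column names and the window length are illustrative assumptions.
import pandas as pd

def spreading_alert_types(alerts: pd.DataFrame, window: int = 14,
                          min_ratio: float = 5.0) -> pd.DataFrame:
    """Return (alert_type, day) pairs that suddenly touch many more hosts."""
    # Distinct hosts raising each alert type per day.
    spread = (alerts.groupby(["alert_type", "day"])["host"]
                    .nunique()
                    .rename("n_hosts")
                    .reset_index()
                    .sort_values(["alert_type", "day"]))

    # Baseline: median host count for this alert type over the preceding days.
    spread["baseline"] = (spread.groupby("alert_type")["n_hosts"]
                                .transform(lambda s: s.rolling(window, min_periods=1)
                                                      .median()
                                                      .shift(1)))

    # A day is anomalous if the alert type suddenly appears on many more hosts.
    spread["ratio"] = spread["n_hosts"] / spread["baseline"].clip(lower=1.0)
    return spread[spread["ratio"] >= min_ratio].sort_values("ratio", ascending=False)
```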
Conclusion
Anomaly detection is an effective tool for prioritizing the most important alerts and significantly cutting down on the total number of alerts an analyst has to deal with. The results shared in this post show how anomaly detection cut a list of 600 million alerts down to a handful of short, easy-to-understand lists of alert anomalies, enabling rapid identification of real threats.

This approach is targeted at mature security organizations that already have a handle on their alerts and are looking to improve prioritization or streamline processes, and especially at those who feel like they are drowning in alerts, are constantly putting out fires, or can't hire enough good people to deal with the volumes of data they are seeing.
Anomaly detection is just one of many ways that we help to reduce alert volume, prioritize investigations, and identify real security incidents. Keep following us for more information about how we help streamline security investigations using graph analytics and sophisticated alert scoring.