INTERNATIONAL JOURNAL OF NETWORK MANAGEMENT
Int. J. Network Mgmt (2014)
Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/nem.1857

An efficient approach to reduce alerts generated by multiple IDS products

Tu Hoang Nguyen,1,2 Jiawei Luo1,*† and Humphrey Waita Njogu3

1College of Information Science and Engineering, Hunan University, Changsha, China
2Centre for Informatics and Foreign Language, Hanoi University of Industry, Hanoi, Vietnam
3Kenya Institute for Public Policy Research and Analysis (KIPPRA), Nairobi, Kenya

SUMMARY

Intrusion detection systems (IDSs) often trigger a huge number of unnecessary alerts. Managing the overwhelming number of alerts, especially from multiple IDS products, is a concern to every security analyst. Analyzing and evaluating these alerts is a difficult task that frustrates the effort of analysts. In fact, true alerts are usually buried under heaps of false alerts. We have identified several research gaps in the existing alert management approaches that need to be addressed, especially when handling alerts from different IDS products. In this work, we present an efficient alert management approach that reduces the unnecessary alerts produced by different IDS products using two main modules: an enhanced alert verification module that validates alerts with vulnerability assessment data; and an enhanced alert aggregator module that reduces redundant alerts and presents them in the form of meta alerts. Finally, we have carried out experiments in our test bed and recorded impressive results in terms of high accuracy and low false positive rate for multiple IDS products. Copyright © 2014 John Wiley & Sons, Ltd.

Received 19 November 2012; Revised 4 December 2013; Accepted 10 February 2014

1. INTRODUCTION

Increasing cybercrime has seen a great demand for security devices for computer networks. Unfortunately, there is no single device with the ability to tackle all the network security concerns.
As a result, many organizations are increasingly looking for additional security technologies to counter risks and vulnerabilities that other security tools fail to address. Intrusion detection systems (IDSs) are commonly used to complement other security tools in detecting network intrusions [1,2]. Recently, IDSs have gained wide acceptance as a valuable investment in organizations because they offer a layer of defense to computer networks. Generally, IDSs gather and analyze information in a network in order to identify possible security breaches and generate an alert or alarm if an intrusion is detected. There are two classes of IDS [3]:

• Signature-based IDS: this recognizes patterns of attack and works in a similar way to antivirus software. The IDS essentially contains attack descriptions or signatures and matches them against the audit data stream, looking for evidence of known attacks. It employs signature databases of well-known attacks, and a successful match with current input raises an alert. Signatures generally target widely used applications or systems for which security vulnerabilities are widely advertised.
• Anomaly-based IDS: this looks for deviations from normal usage behavior in order to identify abnormal behavior. An anomaly detection technique relies on models of the normal behavior of a network. The IDS may focus on the users, the applications or the network. Behavior profiles are built by performing statistical analysis on historical data, or by using rule-based approaches to specify behavior patterns.

*Correspondence to: Luo Jiawei, College of Information Science and Engineering, Hunan University, Hunan, China
†E-mail: luojiawei@hnu.edu.cn

IDSs are designed to deliver alerts of high quality to the analysts; unfortunately, traditional IDSs have not lived up to this objective because they trigger an overwhelming number of unnecessary alerts [4].
In fact, most of these alerts are irrelevant and result from non-existent intrusions [3–5]. The art of detecting intrusions is still far from perfect [4,5]. Current IDSs appear ineffective because of the large volumes of alerts they produce. False alerts form the largest proportion of the total number of alerts produced by IDSs. The interesting alerts are often missed in the analysis because they are buried under heaps of false alerts [6,7]. In busy and large networks, the analysis of alerts becomes an unmanageable task and hence it is difficult to understand the intrusions behind the alerts. Reduction of alerts and provision of quality alerts are therefore critical aspects of alert management. Our work focuses on reducing the unnecessary alerts that are often generated by multiple signature-based IDSs.

Alert reduction is a well-explored topic in alert management. In particular, alert verification has been cited as a key component of alert management and appears to be a very promising technique for delivering alerts of high quality. Generally, the alert verification process uses vulnerability assessment data to establish whether a given alert represents a real threat to the network. This technique filters out any alert that does not have a corresponding vulnerability in a given network. Although several efforts have been made in the literature regarding use of the alert verification technique to improve the quality of alerts, there are several research gaps that need to be addressed in order to reduce unnecessary alerts significantly, especially in a network with multiple IDS products from different vendors. As seen in Section 2, knowledge-based approaches suffer from incomplete details concerning vulnerabilities, such as the reference i.d.s that uniquely identify different vulnerabilities.
This problem worsens when dealing with alerts generated by different IDS products because each product uses its own set of reference i.d.s, which may differ from those of other IDS products. In fact, the alert verification-based approaches do not have a comprehensive and effective collaborative architecture to improve the quality of alerts. We also noted that alert verification alone may not guarantee final alerts of high quality. As noted in Section 2, little attention is given to alerts after the alert verification process. The validated alerts may contain a massive number of redundant and isolated alerts that need to be reduced. Multi-step intrusions are on the rise, leading to unmanageable numbers of redundant alerts. For example, a single intrusion can generate several alerts with common features. The analysis of single redundant alerts provides only partial information on the attack and does not uncover its real pattern. This calls for a better post-alert verification mechanism that aggregates the individual redundant and isolated alerts representing every step of an attack, in order to obtain the ‘big picture’ of attacks for different IDS products.

This paper proposes an effective alert management approach that handles alerts generated by signature-based IDSs from different vendors. The approach has two main modules: the alert verification module, which validates alerts with vulnerability data; and the alert aggregation module, which reduces the volume of redundant and isolated alerts belonging to the same attack activity within a particular time window for the different IDS products and presents alerts in the form of meta alerts. With the experiments, we demonstrate the effectiveness of the proposed approach: we are able to successfully reduce the unnecessary alerts from different IDS products.
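The two-module idea described above (verify first, then aggregate within a time window) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Alert` and `MetaAlert` fields, the grouping key (signature, source, target, port), the lookup of vulnerable services by `(dst_ip, dst_port)` and the 60-second default window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Alert:
    ids_product: str   # which IDS raised the alert (hypothetical label)
    signature: str     # attack name, assumed already normalized across products
    src_ip: str
    dst_ip: str
    dst_port: int
    timestamp: float   # seconds


@dataclass
class MetaAlert:
    signature: str
    src_ip: str
    dst_ip: str
    dst_port: int
    first_seen: float
    last_seen: float
    count: int = 1
    products: set = field(default_factory=set)


def verify(alerts, vulnerable_services):
    """Keep only alerts whose target (ip, port) has a known vulnerability."""
    return [a for a in alerts if (a.dst_ip, a.dst_port) in vulnerable_services]


def aggregate(alerts, window=60.0):
    """Merge alerts sharing the same key when they arrive within `window`
    seconds of the previous alert in that group; otherwise open a new meta alert."""
    open_groups = {}
    meta_alerts = []
    for a in sorted(alerts, key=lambda x: x.timestamp):
        key = (a.signature, a.src_ip, a.dst_ip, a.dst_port)
        m = open_groups.get(key)
        if m is not None and a.timestamp - m.last_seen <= window:
            m.last_seen = a.timestamp
            m.count += 1
            m.products.add(a.ids_product)
        else:
            m = MetaAlert(a.signature, a.src_ip, a.dst_ip, a.dst_port,
                          a.timestamp, a.timestamp, 1, {a.ids_product})
            open_groups[key] = m
            meta_alerts.append(m)
    return meta_alerts
```

In this sketch, two alerts for the same attack raised by different IDS products a few seconds apart collapse into one meta alert that records both products, while an alert against a service with no known vulnerability never reaches the aggregator. The window here slides with the most recent alert in a group; a fixed window anchored at the first alert would be an equally plausible reading of the text.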
The contributions of this work are summarized as follows:

• Development of an effective alert verification-based collaborative architecture to improve the quality of alerts raised by multiple IDS products. This includes the development of an enhanced alert verification module that is able to successfully validate the alerts produced by multiple IDS products, and development of an enhanced alert aggregation module that aggregates alerts of multiple IDS products in order to reduce the high number of redundant and isolated alerts after the alert verification process. The aggregation module is able to further reduce similar alerts generated by different IDS products.
• Construction of comprehensive vulnerability assessment data in order to improve the accuracy and quality of alerts. The vulnerability assessment data is drawn from four sources: scan reports produced by different vulnerability scanners; popular known vulnerability databases such as CVE and OSVDB; reference details from different IDS products; and the details of network resources of a given network.

The rest of the paper is organized as follows. Section 2 discusses related work. In Section 3 we discuss our proposed approach. Section 4 describes the experimental set-up and analysis of the results. Section 5 concludes the paper.

2. RELATED WORK

IDSs often do not generate alerts of high quality, for several reasons [4]. For example, IDSs use their default set of signatures and hence are prone to trigger alerts for most intrusions regardless of success or failure to exploit vulnerabilities in a given network. There are several existing approaches designed to improve the quality of alerts and they have their merits and demerits.
For example, tuning and reconfiguring of the signature database may eliminate some of the false alerts but may not work well in a large network because it is difficult to have all sensors tuned to an acceptable false positive level. In addition, the process of tuning and reconfiguring databases requires extensive knowledge and experience of IDS signatures [3,7,8]. Moreover, the improper manipulation of signatures could lower detection rates, hence exposing critical network resources to risk, especially if the critical intrusions are not detected. Correcting all the known vulnerabilities before damaging intrusions take place, in order to avoid alerts, may not be feasible because some vulnerabilities are protocol based and thus an immediate patch may not be available [9]. Kruegel et al. [7] observed that most IDSs either do not consider network context or do not fully understand what is being protected, and hence trigger volumes of meaningless alerts.

In the field of alert management, the alert verification technique has been cited as a critical component in improving the quality of alerts. The underlying principle of alert verification is to filter out alerts that do not have corresponding vulnerabilities in a given network, thereby improving the accuracy of alerts. It is argued that many of the false alerts can be reduced by focusing on the vulnerabilities of the protected network [7,8,10]. Valeur et al. [1] note that failure to exclude the alerts that refer to failed attacks may lead to false positive alerts being misinterpreted or given undue attention. The authors note that when any scheme receives false positives as input the quality of the results can be degraded significantly. Al-Mamory and Zhang [11] note that most of the general alert aggregation approaches (e.g. [12–17]) do not make full use of the information that is available on a given network.
This means that the general alert correlation approaches usually rely on the basic information presented on alerts, which may be inadequate and unreliable and may lead to poor correlation results. In fact, correlating alerts that refer to failed attacks can easily result in the detection of whole attack scenarios that are non-existent. It is crucial to integrate the network context information in alert correlation in order to identify the exact level of threat that the protected systems are facing.

In this section, we describe several prominent works aimed at reducing unnecessary alerts. First we focus on the general alert verification-based approaches and later on the alert verification-based approaches that use a collaborative architecture to improve the quality of alerts. We also compare our approach with other similar approaches and point out the key differences.

An interesting approach to verify alerts is proposed by Kruegel et al. [7] in order to improve the false positive rate of IDSs. The approach has an architecture for alert analysis and generation of a prioritized report for security analysts. Basically, the work introduces a plug-in for Snort that integrates the Nessus vulnerability scanner into Snort in order to verify alerts. When an alert is raised, it is not immediately forwarded to the analyst but is passed to the verification engine. The underlying assumption of this approach is that every Snort signature comes with a unique identifier to check the presence of a corresponding vulnerability. The approach looks very promising but lacks useful additional information on host architecture, software and hardware to support better alert verification. Again, the approach does not offer a collaborative architecture to handle alerts from multiple IDS products. In addition, the approach does not address the issue of reducing the redundant and isolated alerts after alert verification.
Some notable efforts have been made on aggregating alerts in order to obtain meta alerts; such works include Julisch [12], Valdes and Skinner [13], Debar and Wespi [14], Al-Mamory and Zhang [15], Sourour et al. [16] and Jan et al. [17]. These works have reported considerable progress in reducing unnecessary alerts. However, our approach employs an alert validation technique before the aggregation of alerts is done. It is important to note that most of the approaches that aggregate alerts usually deal with non-validated alerts (unverified alerts).

Ning et al. [18] propose an alert correlator on the basis of the observation that most attacks consist of several related stages, with the early stages preparing for the later ones. Hyper-alert correlation graphs are used to represent correlated alerts in an intuitive way. The approach is useful, but it is ineffective when the attackers use either a different or spoofed IP source address at each attack step.

A possible correlation approach based on integration of Snort, Nessus and Bugtraq databases is proposed by Massicotte et al. [19]. It is based on the reference numbers (identifiers) found on Snort alerts in order to link signature vulnerabilities contained in Common Vulnerabilities and Exposures (CVE), Bugtraq and Nessus scripts. The approach appears promising in trying to show a possible correlation based on reference numbers, but it is not effective because not all alerts have reference numbers and there is no guarantee that the lists provided by CVE and Bugtraq contain a complete listing of all vulnerabilities. In addition, unlike our approach, the approach does not incorporate a collaborative architecture to handle alerts from multiple IDS products.

Porras et al. [20] propose an alert filtering-based approach known as M-Correlator.
It is based on knowledge of the network architecture and vulnerability requirements of different incident types in order to tag alerts with a relevance metric and then prioritize them accordingly. In this approach, any alerts representing attacks against non-existent vulnerabilities are discarded. The approach correlates security alerts produced by spatially distributed heterogeneous information security (INFOSEC) devices. The basic principle of this approach is that it takes into account the topology and operational objectives of the protected network when alerts are being correlated and finally evaluates the impact of alerts on the overall mission that a network infrastructure supports. There are two main processes in M-Correlator: correlation and aggregation. That work processes alerts from different sources such as IDSs, firewalls and other devices, while our work deals with IDS alerts only. Unlike in our work, the authors do not propose a comprehensive vulnerability data model to fully integrate the vulnerability data into the network context.

A formal treatment of the integration of context information with alert messages, known as M2D2, is proposed by Morin et al. [10]. The underlying principle of the approach is to use reference numbers to identify the vulnerability concerned and then verify its genuineness using other context information. The approach performs alert correlation using different types of information. Based on these formally defined concepts, the approach examines the relationship between them and performs alert correlation. M2D2 mainly focuses on alert aggregation, while concepts such as building the attack scenarios are still to be explored. Like the aforementioned approaches, this approach suffers because not all alerts have reference numbers and there is no guarantee that the list provided by either CVE or Bugtraq contains a complete listing of all vulnerabilities. In a more recent work, Morin et al.
[21] present a data model known as M4D4, which formalizes the concepts and relationships of different network elements (such as network topology, network resources, vulnerabilities and attacks) in order to reason about alerts. The authors observe that many alerts (especially false positives) involve actors that are inside the monitored information system and whose properties are consequently also observable. The two aforementioned approaches do not address the issue of redundant alerts after the alert verification process.

Hubballi et al. [3] propose a false positive alert filter to reduce false alerts. The underlying principle of this approach is the construction of a threat profile of the network which is used for alert correlation. This approach compares the IDS alerts with network-specific threats and finally filters out the unnecessary alerts. Basically, the approach has two major steps: (i) alerts are compared with the threat profile in order to generate correlation binary vectors; and (ii) the correlation binary vectors are classified using neural networks. We wish to note that the idea of comparing the alerts with the threat profile (vulnerabilities) appears promising in improving the accuracy of the final alerts. Nevertheless, the approach does not incorporate a collaborative architecture to handle alerts from multiple IDS products. In addition, the approach does not address the issue of reducing the number of similar and repetitive alerts after verification.

Numerous works have proposed to evaluate alerts using some metrics that are computed based on vulnerability data. For instance, Bakar and Belaton [22] propose an intrusion alert quality framework (IAQF) to improve the quality of final alerts.
The underlying principle of this framework is the use of vulnerability information that forms the basis to compute alert metrics (such as accuracy, reliability, correctness and sensitivity). Such metrics prepare alerts for higher-level reasoning and thus better decision making. The proposed framework has the following modules: alert collection, host/network information gathering, quality criteria scores measurement and normalization. There are also storage databases that store different types of data, i.e. raw alerts, host/network data, quality criteria rules, and enhanced and enriched alert data. We note that the framework improves the alert quality before alert verification but does not take into consideration the issue of reducing the redundant and isolated alerts after alert verification. In addition, the approach mentions the concept of alert correlation but has not implemented it.

Similarly, Njogu and Jiawei [23] propose an approach to reduce unnecessary alerts that uses vulnerability data to compute alert metrics. This approach computes the similarity of alerts based on the alert metrics. All the alerts that show close similarity are grouped together into one cluster. An extension of this work is proposed by the same authors [24] in order to reduce unnecessary alerts based on enhanced vulnerability assessment data. The authors use the vulnerability assessment data to filter out any alert with no corresponding vulnerability, thus improving the accuracy and quality of alerts. The aforementioned approaches have reported tremendous progress in reducing unnecessary alerts; however, the approaches do not incorporate an effective collaborative architecture to handle alerts from multiple IDS products.
Our new work seeks to strengthen our previous work in four ways: first, we improve the accuracy of alerts by incorporating reference details of multiple IDS products in the vulnerability data; second, we use dedicated alert verifiers for every IDS product; third, we use dedicated alert aggregators for every IDS product; and finally, we propose an approach that can aggregate similar alerts generated by multiple IDS products.

We noted that there are several approaches that use a collaborative technique in order to reduce unnecessary alerts. An interesting collaborative architecture known as TRINETR is proposed by Yu et al. [25]. The proposed scheme allows multiple intrusion detection systems to work together to detect real-time network intrusions. The proposed architecture is composed of three core parts: (i) collaborative alert aggregation; (ii) knowledge-based alert evaluation; and (iii) alert correlation to cluster and merge alerts from multiple IDS products to achieve an indirect collaboration among them. We note that the approach has the ability to reduce false positives by integrating network and host system information into the evaluation process, but it has a major drawback: the alert correlation part, which is very important when generating condensed alert views known as meta alerts, has not been implemented.

A collaborative and systematic framework to correlate alerts from multiple IDSs by integrating vulnerability information is proposed by Liu et al. [26]. The basic principle of the approach is to apply contextual information to distinguish between successful and failed intrusion attempts. The approach assigns confidence values to the alerts immediately after alert verification. The confidence values are 0 (for false alert) and 1 (for true alert). The corresponding actions are triggered based on the confidence values.
We note that the approach has some merits but does not provide details of the procedure used to validate the alerts and does not include details on how the alerts are transformed into meta alerts. The proposed scheme is not able to differentiate levels of alert relevance. In addition, the approach only processes alerts with reference numbers (alerts without reference numbers are not considered), thus lowering detection rates.

Similarly, an approach to filter innocuous attacks that takes advantage of the correlation between IDS alerts and the threat profile is proposed by Michele et al. [9]. Among the core units of the proposed approach are filtering and ranking units. A similar but extended work is presented by the same authors. They propose a distributed architecture [27] to provide security analysts with selective and early warnings. According to this approach, alerts are ranked based on the match or mismatch between the alert and the vulnerability assessment data. We regard this type of ranking as restrictive because: first, it relies on software match or mismatch to determine whether an alert is important or unimportant; second, it does not indicate the degree of relevance of alerts and hence does not offer much help to the analyst. Moreover, the two aforementioned approaches do not consider the issue of reducing redundant and isolated alerts after alert verification. In our work, we prioritize alerts into different levels according to their degree of interestingness.

A general correlation framework that includes a set of components such as alert fusion, multi-step correlation and alert prioritization is proposed by Valeur et al. [1]. The raw alerts are processed in these components. In this approach, the alerts are fused before they are verified.
Later, the alerts are forwarded to the alert correlation and prioritization components. The authors observe that reduction of alerts is an important task in alert management and note that when any scheme receives false positives as input the quality of the results can be degraded significantly. The framework proposed in our work is inspired by the logical framework of Valeur et al., but there are several key differences. First, the authors use scanning reports to verify alerts, whereas our work uses up-to-date and comprehensive vulnerability data to validate alerts. Second, we verify alerts prior to the aggregation phase because efforts to improve the quality of alerts should start in the early stages of alert management. Third, the framework proposed by the authors has several components that may degrade its performance, whereas our approach has fewer but robust components. Lastly, our work not only validates alerts reported by multiple IDSs but also aggregates alerts of multiple IDS products.

An architecture for automatic alert verification known as ATLANTIDES is proposed by Bolzoni et al. [28]. ATLANTIDES reduces false positives both in signature- and anomaly-based IDSs. It has an engine to correlate alerts with an output anomaly detector (OAD), which acts as an anomaly detector in the reverse channel. The underlying principle of the approach is that there should be an anomalous behavior seen in the reverse channel in the vicinity of the alert generation time. Therefore, if these two are compared, a good guess about the attack corresponding to the alert can be made. We note that the time window used to look for the alert correlation is critical for the correctness of the approach: a very small time window may result in an increase of false positive alerts, hence affecting performance.
In summary, we observed that most of the alert verification-based approaches described in this section are able to validate alerts successfully, and that most of the vulnerability-based approaches are able to produce alerts that are useful in the context of the network. However, these approaches are still at a preliminary stage and there are some research gaps that need to be addressed in order to produce better results. First, the act of validating alerts may not guarantee alerts of high quality because the validated alerts may contain huge volumes of redundant alerts. Most of the existing approaches hardly process alerts after alert verification; hence the need to consider the issue of reducing the huge number of redundant and isolated alerts after verification. In addition, the vulnerability-based approaches focus on improving the quality of alerts that are generated by one particular IDS product and therefore do not offer a collaborative architecture to handle alerts from multiple IDS products. There is a need to focus on how to reduce the unnecessary alerts generated by multiple IDS products and therefore advance the notion of collaborative architecture for different IDS products. Lastly, we noted that several approaches rely on incomplete data, such as incomplete reference i.d.s and outdated vulnerability assessment data, to validate alerts and hence are likely to give inaccurate results. As noted earlier, our proposed approach has an architecture that employs the following modules: an alert verification module that uses vulnerability assessment data to validate alerts; and an aggregation module to merge the alerts of multiple IDS products in order to reduce the high number of redundant and isolated alerts after the alert verification process.

3. THE PROPOSED APPROACH

In this section we describe the proposed architecture, which addresses the challenges of alert verification-based approaches. Basically, the proposed solution has two major components: the alert verification module and the alert aggregation module. The proposed approach collects raw alerts produced by multiple IDS products using the alert pre-processor unit, as illustrated in Figure 1. The function of the pre-processor is to collect raw alerts and pre-process them. Alert pre-processing involves converting the raw alerts into the Intrusion Detection Message Exchange Format (IDMEF) [29] and extracting important features of alerts such as IP addresses and port numbers. The purpose of IDMEF is to standardize the formats of alerts generated by multiple IDS products, since different types of IDS products have different alert formats. The pre-processed alerts are forwarded to the alert verification module and then to the alert aggregation module. In summary, the main function of alert verification is to validate the alerts in order to filter out those with no corresponding vulnerability, thus improving the accuracy and quality of final alerts. The main function of the alert aggregation module is to reduce the redundant and isolated alerts from different IDS products. Eventually, the final alerts are presented to analysts in the form of meta alerts. Figure 1 illustrates the proposed strategy.

Figure 1. The general architecture of the proposed alert management approach.

3.1. Enhanced alert verification module

As mentioned earlier, the signature-based IDSs are run with their default configurations (signature database). In fact, these IDSs are not fully integrated with network resources and therefore do not check the relevance of an intrusion (reported in the alert) to the local network context.
As a result, the IDSs may generate huge volumes of raw alerts, the majority of which are irrelevant and not useful in the context of the network. The alert verification module receives pre-processed alerts from the alert pre-processor. Unlike existing alert verification approaches [3,6,7,18,25] that depend on incomplete vulnerability assessment data to process all the raw alerts, our work introduces comprehensive vulnerability assessment (CVA) data to improve the quality of alerts before they are forwarded to the security analysts. The enhanced alert verification module uses CVA data to validate the raw alerts produced by multiple IDS products. The verification module validates the raw alerts by measuring their similarity to corresponding vulnerabilities contained in CVA data, as illustrated in Figure 2. Both alerts and vulnerabilities have comparable features and hence it is easy to measure the similarity. The validation process helps to determine the seriousness of alerts with respect to the network under consideration. The uniqueness of our verification module lies in how we have constructed the CVA data, as described in Section 3.1.2. Unlike other approaches, our approach has implemented a simple but robust alert verification process.

Figure 2. Alert verification module.

From the literature it is noted that appropriate referencing of vulnerabilities plays a critical role in ensuring an accurate alert verification process. Different IDS products refer to the vulnerabilities differently, hence the need to standardize the referencing of vulnerabilities. In the next section, we explore possible solutions, such as the open source vulnerability database (OSVDB), to assist in referencing the vulnerabilities.
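One simple way to measure the similarity between an alert and the CVA data is to count the features on which the two agree. The Python sketch below illustrates that idea only; the feature names in `COMPARED_FEATURES`, the dictionary representation and the `REF-1` style identifiers are hypothetical placeholders and are not taken from the paper's tables.

```python
# Hypothetical feature names; the paper's own feature lists are given in its tables.
COMPARED_FEATURES = ("reference_id", "ip", "port", "protocol", "application", "os")


def relevance_score(alert, vulnerability, features=COMPARED_FEATURES):
    """Count features present in the alert that equal the vulnerability's value."""
    return sum(
        1
        for f in features
        if alert.get(f) is not None and alert.get(f) == vulnerability.get(f)
    )


def best_match(alert, cva_entries):
    """Return the highest score of the alert against any CVA entry (0 if none)."""
    return max((relevance_score(alert, v) for v in cva_entries), default=0)
```

A feature that is missing from the alert is treated as a mismatch rather than an error, which keeps the comparison usable for IDS products that do not report every field.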
The details of how a comprehensive list of vulnerabilities is derived from various sources of vulnerabilities are discussed in the next section. In order to improve the performance of alert verification, as mentioned earlier, the alert verification module has dedicated alert sub verifiers for different classes of attack. The types of attack considered in this work are DoS, Telnet, FTP, MySQL and SQL. The alert verifier has six sub verifiers, as follows (see Figure 3):
• DoS alert sub verifier: validates alerts reporting DoS attacks.
• Telnet alert sub verifier: validates alerts reporting Telnet attacks.
• FTP alert sub verifier: validates alerts reporting FTP attacks.
• SQL alert sub verifier: validates alerts reporting SQL attacks.
• MySQL alert sub verifier: validates alerts reporting MySQL attacks.
• Undefined alert sub verifier: validates alerts reporting attacks that were not considered during the design of the alert verification module (i.e. alerts reporting new attacks not included in the above five classes).
The proposed work introduces multiple alert sub verifiers to handle alerts regardless of the IDS product employed in the network. The main reason for using multiple alert sub verifiers in our approach is to improve the alert verification process, as shown in Figure 3. Another key benefit of using multiple sub verifiers is the ease of deployment in relatively large networks, especially where multiple IDS products are deployed. As mentioned earlier, the main goal of the existing alert verification-based approaches, as seen in Section 2, is to validate alerts from one particular IDS product. Unlike the existing approaches, the principal function of our alert verification module is to improve the accuracy of alerts (produced by multiple IDS products) by validating the alerts with vulnerability assessment data.
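The routing of alerts to class-specific sub verifiers can be pictured with a short Python sketch. The only behaviour taken from the text is that five attack classes have dedicated sub verifiers and that every other alert goes to the undefined sub verifier. The dictionary layout of an alert, the `class` key and the helper names `route_alert` and `dispatch` are assumptions made for illustration and are not part of the implementation described here.

```python
import re
from collections import defaultdict

# The five attack classes named in the text; anything else is "Undefined".
KNOWN_CLASSES = ("DoS", "Telnet", "FTP", "SQL", "MySQL")


def route_alert(alert: dict) -> str:
    """Return the name of the sub verifier that should handle the alert.

    Assumption: the pre-processed alert is a dict whose "class" field holds
    the attack identifier, e.g. "DoS" or "U2R (Telnet)".
    """
    # Compare whole tokens so that "MySQL" is not mistaken for "SQL".
    tokens = {t.lower() for t in re.findall(r"[A-Za-z0-9]+", alert.get("class", ""))}
    for known in KNOWN_CLASSES:
        if known.lower() in tokens:
            return known
    return "Undefined"


def dispatch(alerts: list) -> dict:
    """Group alerts into one queue per sub verifier, whatever IDS produced them."""
    queues = defaultdict(list)
    for alert in alerts:
        queues[route_alert(alert)].append(alert)
    return dict(queues)


# Hypothetical alerts from two IDS products.
alerts = [
    {"ids": "Snort", "class": "DoS"},
    {"ids": "Shoki", "class": "U2R (Telnet)"},
    {"ids": "Snort", "class": "MySQL"},
    {"ids": "Snort", "class": "Worm"},
]
queues = dispatch(alerts)
print(sorted(queues))
```

Because the routing key is the attack class rather than the IDS product, adding a further IDS product would not require a new queue in this sketch.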
The alert verification module validates alerts by measuring their similarity with respect to the network in order to determine the importance of alerts, as illustrated in Figure 4. The alert verification process basically involves measuring the match or mismatch of features between alerts (shown in Table 1) and vulnerabilities (vulnerability assessment data, shown in Table 2), since both the alert and the vulnerability assessment data have comparable features (refer to Figure 4).

Figure 3. Sub verifiers of alert verification module.

Figure 4. Comparing the features of alerts and vulnerability assessment data.

The vulnerability assessment data represent potential threats likely to be exploited by attackers and therefore the alert verification module improves the accuracy of alerts. The underlying assumption of the alert verification process is that the higher the number of matches (alert relevance score), the higher the likelihood of a successful intrusion. The alert relevance score used in our approach helps to separate false alerts from true alerts. It is worth noting that we repeatedly regrouped the alerts along the score scale in search of the best thresholds for separating true and false alerts, as shown in Table 3. We categorize the alerts into three groups depending on the alert relevance score. As shown in the same table, the ideal relevant alert represents the most interesting alert, the partial relevant alert represents an alert of average interest, while the non-relevant alert represents an alert that is not worth further investigation. We settled on the thresholds shown in the table, but these values can be adjusted depending on the network environment.

3.1.1. Enhancing the semantics of alerts
Generally, the features of any alert provide low-level information; hence relying solely on this information may increase cases of important alerts being misinterpreted, ignored or delayed when they are being evaluated. In order to improve the semantics of alerts, the alert verification module places alerts into different categories based on the alert relevance scores, as shown in Table 3. In a nutshell, the alert relevance is based on the similarity of alerts and their corresponding vulnerabilities. Figure 4 shows how alerts are compared with vulnerability assessment data.

3.1.2. Construction of comprehensive vulnerability assessment data (CVA data)
Existing alert verification-based approaches [3,7,18,25] build vulnerability data that appear effective in validating the alerts from one IDS product. The existing approaches construct vulnerability assessment data from sources such as known vulnerability data and network resources and do not consider including additional reference information from multiple IDS products. In our work, we have constructed CVA data in order to improve the quality of alerts that are generated by different IDS products. In brief, CVA data are comprehensive and effective data that represent the threat profile of the network. They list all network-specific vulnerabilities by their reference i.d., name, priority, IP address, port, protocol, class, time and applications. We use the threat profile generator to construct CVA data in order to identify the relevant vulnerabilities representing actual current threats to the network. We designed and implemented the CVA data using the threat profile generator, as shown in Figure 5.

Table 1. Pre-processed alert snapshot
IDS product | Reference i.d.* | Name* | IPdest* | Portdest* | IPsrc | Portsrc | Priority (severity)* | Protocol* | Class* | Time* | Application*
Snort | CVE-1999-0001 | Teardrop | 192.168.1.3 | 1238 | 192.168.1.1 | 1238 | High | TCP | DoS | 8:9:2012:14:16:02 | Windows
Shoki | CVE-1999-0527 | Telnet resolve host conf | 192.168.1.2 | 1026 | 192.168.1.1 | 1026 | Medium | TCP | U2R (Telnet) | 8:9:2012:14:16:02 | Telnet, Windows
Snort | CVE-2011-1965 | Buffer overflow | 192.168.1.4 | 80 | 192.168.1.1 | 80 | Medium | TCP | DoS | 8:9:2012:14:16:02 | Windows
Key: *used in alert verification.

Table 2. CVA data snapshot
Reference i.d. | Name | IP address | Port | Priority (severity) | Protocol | Class | Time | Application
CVE-1999-0001 | Teardrop | 192.168.1.3 | 1238 | High | TCP | DoS | 8:9:2012:14:16:02 | Windows
CVE-1999-0527 | Telnet resolve host conf | 192.168.1.2 | 1026 | Medium | TCP | U2R (Telnet) | 8:9:2012:14:16:02 | Telnet, Windows
CVE-2011-1965 | Buffer overflow | 192.168.1.4 | 80 | Medium | TCP | DoS | 8:9:2012:14:16:02 | Windows

Table 3. Alert relevance score scale
Threshold | Alert type
7-9 | Ideal relevant alerts
4-6 | Partial relevant alerts
0-3 | Non-relevant alerts

Figure 5. Construction of comprehensive vulnerability assessment data.

Unlike other approaches described in Section 2, the generator of the proposed approach draws data from four sources: scan reports produced by different vulnerability scanners; popular known vulnerability databases such as CVE [30] and OSVDB [31]; reference details from different IDS products; and the details of network resources of a given network. The generator establishes appropriate relationships in the data drawn from the above sources. As noted earlier, the appropriate referencing of vulnerabilities plays a critical role in ensuring accuracy in the alert verification process.
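The comparison between a pre-processed alert (Table 1) and a CVA entry (Table 2) reduces to counting matching features and mapping the count onto the scale of Table 3. The Python sketch below illustrates this under stated assumptions and is not the authors' implementation: the dictionary keys and the helpers `relevance_score`, `categorize` and `verify` are hypothetical names; the timestamp layout day:month:year:hour:minute:second is a guess based on the table values; and the time feature is counted as a match when the alert is not older than the CVA entry, which is the rule used in the validation steps of this section.

```python
from datetime import datetime

# Features compared for equality: (alert key, CVA key). Key names are assumed.
EQUALITY_FEATURES = [
    ("ip_dest", "ip"),
    ("reference_id", "reference_id"),
    ("port_dest", "port"),
    ("application", "application"),
    ("protocol", "protocol"),
    ("attack_class", "attack_class"),
    ("name", "name"),
    ("priority", "priority"),
]


def parse_time(stamp: str) -> datetime:
    # Assumed layout: day:month:year:hour:minute:second.
    return datetime.strptime(stamp, "%d:%m:%Y:%H:%M:%S")


def relevance_score(alert: dict, vuln: dict) -> int:
    """Count matching features; each match is worth 1, each mismatch 0."""
    score = sum(1 for a_key, v_key in EQUALITY_FEATURES if alert[a_key] == vuln[v_key])
    if parse_time(alert["time"]) >= parse_time(vuln["time"]):
        score += 1
    return score


def categorize(score: int) -> str:
    """Map a score onto the three groups of Table 3."""
    if score >= 7:
        return "ideal relevant"
    if score >= 4:
        return "partial relevant"
    return "non-relevant"


def verify(alert: dict, cva_data: list) -> tuple:
    """Score the alert against every CVA entry for its destination IP; keep the best."""
    candidates = [v for v in cva_data if v["ip"] == alert["ip_dest"]]
    best = max((relevance_score(alert, v) for v in candidates), default=0)
    return best, categorize(best)


# First row of Table 1 and first row of Table 2.
alert = {
    "ip_dest": "192.168.1.3", "reference_id": "CVE-1999-0001", "port_dest": 1238,
    "application": "Windows", "protocol": "TCP", "attack_class": "DoS",
    "name": "Teardrop", "priority": "High", "time": "8:9:2012:14:16:02",
}
cva_data = [{
    "ip": "192.168.1.3", "reference_id": "CVE-1999-0001", "port": 1238,
    "application": "Windows", "protocol": "TCP", "attack_class": "DoS",
    "name": "Teardrop", "priority": "High", "time": "8:9:2012:14:16:02",
}]
print(verify(alert, cva_data))  # -> (9, 'ideal relevant')
```

An alert whose destination IP has no entry in the CVA data scores 0 in this sketch and therefore falls into the non-relevant group.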
In the literature it is reported that different IDS products refer to the vulnerabilities differently; hence the need to standardize the referencing of vulnerabilities. In order to address this issue, researchers have proposed possible solutions such as the OSVDB to assist in referencing the vulnerabilities. Such solutions may address this issue to some extent but may not contain a comprehensive list of reference details of all the vulnerabilities. This makes it difficult to have an independent alert verification based on one vulnerability database. Therefore, in our approach, we complement the OSVDB database with reference details from multiple IDS products; hence the need for a verifier for each product. We have automated the process of picking non-referenced vulnerabilities from different IDSs in order to build comprehensive CVA data. Additional information on building the CVA data is given in Section 4. In the literature it is documented that dealing with attack identifiers may pose a challenge, especially when alerts contain names different from those known by the vulnerability databases. The IDS vendors may use different names for the same attack and so far they have not come to an agreement on standardizing the naming of alerts. In order to address this challenge, we include in the CVA data how different IDS products reference each vulnerability (such as reference i.d.s) in order to have a more accurate and comprehensive list of vulnerabilities. In addition, the alert verifiers are trained adequately using CVA data on how to perform the alert verification with high accuracy. The CVA data are comprehensive enough to cover all the attack identifiers encountered in our test bed.

Steps of alert validation.
One of the functions of the alert verification module is to assign each alert to the attack class that might have produced it. The alert verifier makes this decision based on the attack identifier field (class of attack) in the alert. To facilitate the verification process, there are six sub verifiers that are used to process alerts regardless of the IDS products. The procedure for validating alerts using the IDS Alert-CVA data verifier is illustrated below:

Input: pre-processed alerts, CVA data
Output: validated alerts (with alert relevance score)
Step 1: compare features of the pre-processed alert with the corresponding vulnerabilities in CVA data in this order:
• IP
• ReferenceId
• Port
• Time
• Application
• Protocol
• Class
• Name
• Priority
Step 2: search for corresponding vulnerabilities in CVA data. Use Alert.IPdest to extract the corresponding vulnerabilities in CVA data (those whose CVA data.IP matches Alert.IPdest).
Step 3: compute the alert relevance score for every corresponding vulnerability and choose the one with the highest score.
• If Alert.IPdest matches CVA data.IP Then IP_Score = 1 Else IP_Score = 0
• If Alert.ReferenceId matches CVA data.ReferenceId Then Reference_Score = 1 Else Reference_Score = 0
• If Alert.Portdest matches CVA data.Port Then Port_Score = 1 Else Port_Score = 0
• If Alert.Time is greater than or equal to CVA data.Time Then Time_Score = 1 Else Time_Score = 0
• If Alert.Application matches CVA data.Application Then Application_Score = 1 Else Application_Score = 0
• If Alert.Protocol matches CVA data.Protocol Then Protocol_Score = 1 Else Protocol_Score = 0
• If Alert.Class matches CVA data.Class Then Class_Score = 1 Else Class_Score = 0
• If Alert.Name matches CVA data.Name Then Name_Score = 1 Else Name_Score = 0
• If Alert.Priority matches CVA data.Priority Then Priority_Score = 1 Else Priority_Score = 0
• Alert Relevance Score = IP_Score + Reference_Score + Port_Score + Time_Score + Application_Score + Protocol_Score + Class_Score + Name_Score + Priority_Score
Step 4: categorize the alerts based on the alert relevance score (see Table 3).
Step 5: alerts with scores of 4 and above (both ideal and partial relevant alerts) are forwarded to the alert aggregator.
Step 6: delete alerts with scores of 3 and below (alerts with little or no similarity to CVA data).
Note: For simplicity, any match is awarded a value of 1, while a mismatch is awarded a value of 0.

To illustrate the concept of alert verification and how the alert metrics are computed, consider the following example. Table 1 shows a snapshot of pre-processed alerts. Table 2 shows a section of CVA data. In Table 1 the first alert reports a teardrop attack targeting a host with IPdest 192.168.1.3 on Portdest 1238. Using the alert IPdest 192.168.1.3, the alert verifier extracts all potential vulnerabilities in CVA data matching the IP address. In this case, CVA data have only one vulnerability matching the alert IPdest. The verifier then matches the other features (reference i.d., port, severity, protocol, class, time, name and application) of the alert against the said vulnerability. The alert score is computed as follows: IP_Score = 1 because the IP addresses match, and so do the rest of the features; Reference_Score = 1; Port_Score = 1; Time_Score = 1; Application_Score = 1; Protocol_Score = 1; Class_Score = 1; Name_Score = 1; Priority_Score = 1. Therefore, the alert score (total number of matches) is 9. The verifier enriches the alert by categorizing it as an ideal relevant alert because its alert relevance score is 9.

In summary, unlike other alert verification-based approaches [3,6,18,25], which focus on improving the quality of alerts generated by one IDS product, the proposed alert verification module is designed with the goal of improving the quality of alerts generated by multiple IDS products. The other key difference between our work and related studies is how the vulnerability data are constructed. We include the attack reference details of multiple IDS products in order to build comprehensive data.

3.2. Alert aggregation module
The literature documents that multi-step intrusions are on the rise, contributing to the high number of unmanageable redundant alerts. A single intrusion can generate several alerts with common features. Generally, the analysis of single redundant alerts may provide only partial information on the attack and hence is not valuable in uncovering the real patterns of the attacks. In order to address this issue, there is a need to aggregate the individual redundant and isolated alerts representing every step of the attack to have an overall view of attacks for the different IDS products. The trend of large modern networks is to implement multiple IDS products, especially where the network is viewed to be prone to attacks and has valuable information.
The era of one IDS product monitoring a network segment is fading, especially where the information is viewed as important. We designed the alert aggregation module with the goal of reducing the volumes of redundant alerts belonging to the same attack activity within a particular time window. The alert aggregation module receives validated alerts (unrelated, isolated and redundant alerts) from the alert verification module. As shown in Figure 6, the module has different sub aggregators for different classes of attacks. We used dedicated sub aggregators, one per class of attack, in order to simplify the process of alert aggregation. These sub aggregators are adjusted dynamically and automatically according to each class of attack.

Figure 6. Alert aggregation module.

The scope of the attacks considered in this work includes DoS, Telnet, FTP, MySQL and SQL. The aggregator has six sub aggregators, as shown in Figure 7:
• DoS alert sub aggregator: aggregates alerts reporting DoS attacks.
• Telnet alert sub aggregator: aggregates alerts reporting Telnet attacks.
• FTP alert sub aggregator: aggregates alerts reporting FTP attacks.
• SQL alert sub aggregator: aggregates alerts reporting SQL attacks.
• MySQL alert sub aggregator: aggregates alerts reporting MySQL attacks.
• Undefined alert sub aggregator: aggregates alerts reporting attacks that were not considered during the design of the alert aggregation module (i.e. it processes redundant alerts reporting new attacks not covered in the other classes).
The underlying principle of the alert aggregation module is simple. Each alert sub verifier of a given class of attack forwards alerts to the corresponding sub aggregator of that specific class of attack. For example, the DoS alert sub verifier forwards its alerts to the DoS alert sub aggregator.
We wish to note that the final alerts of the alert aggregation module are in the form of meta alerts (meta alerts contain summarized information of related alerts of a given intrusion). Some definitions used for the variables of each meta alert M are as follows:
• Attack class of meta alert (M.Attack_Class): denotes the class of attack of alerts.
• Relevance of meta alert (M.Rel): denotes the relevance of alerts.
• Create time of meta alert (M.createtime): denotes the create time of the meta alert and is derived from the timestamp of the first alert that created the meta alert.
• Total alerts (M.Nbrealerts): denotes the number of alerts contained in the meta alert. This number is increased each time a new alert is fused to the meta alert.
• Update time (M.updatetime): denotes the time of the last update in the meta alert and represents the time of the most recent alert fused to the meta alert.
• Non-update time (M.non-updatetime): denotes the time elapsed since the last update made on the meta alert. The time is reset to 0 each time a new alert is fused to the meta alert.
• Stop time (M.stoptime): denotes the time when the meta alert is assumed to have completely fused the necessary alerts.
• Time out (M.Timeout): denotes the time that a meta alert should continue waiting for additional alert(s); it varies with the class of attack.
• Exploit cycle time (ECtime): denotes the time that an exploit is believed to take (attack period); it varies with the class of attack.
• Additional information such as source address (M.IPsrc and M.Portsrc) and destination address (M.IPdest and M.Portdest).

Figure 7. Sub aggregators of alert aggregator.

The procedure of aggregating alerts is described as follows. The sub aggregator of a particular class of attack uses the IP address(es) of a given validated alert to identify the potential meta alerts.
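The meta alert variables defined above map naturally onto a small record type. The following Python sketch is one possible representation under stated assumptions: the class name `MetaAlert`, the attribute names, the use of seconds as the time unit and the numeric values in the usage lines are all hypothetical. Only the update rules are taken from the definitions: the alert counter grows with each fused alert, the update time follows the most recent fused alert, and the non-update time restarts from 0 after each fusion.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetaAlert:
    attack_class: str            # M.Attack_Class
    relevance: str               # M.Rel
    create_time: float           # M.createtime, timestamp of the first alert
    ip_src: str                  # M.IPsrc
    port_src: int                # M.Portsrc
    ip_dest: str                 # M.IPdest
    port_dest: int               # M.Portdest
    timeout: float               # M.Timeout, varies with the class of attack
    exploit_cycle_time: float    # ECtime, varies with the class of attack
    nbr_alerts: int = 1          # M.Nbrealerts, starts at 1 for the first alert
    stop_time: Optional[float] = None                     # M.stoptime
    update_time: float = field(init=False, default=0.0)   # M.updatetime

    def __post_init__(self) -> None:
        # Until another alert is fused, the last update is the creation itself.
        self.update_time = self.create_time

    def fuse(self, alert_time: float) -> None:
        """Fuse one more alert: bump the counter and record the update time."""
        self.nbr_alerts += 1
        self.update_time = alert_time

    def non_update_time(self, now: float) -> float:
        """M.non-updatetime: time elapsed since the last fusion (0 right after it)."""
        return now - self.update_time


# Hypothetical values; times are seconds on an arbitrary clock.
m = MetaAlert("DoS", "ideal relevant", 100.0, "192.168.1.1", 1238,
              "192.168.1.3", 1238, timeout=60.0, exploit_cycle_time=300.0)
m.fuse(130.0)
print(m.nbr_alerts, m.non_update_time(145.0))  # -> 2 15.0
```

Deriving the non-update time from the update time, rather than storing a separate counter, keeps the two values consistent in this sketch; a stored counter reset on each fusion would be an equally valid reading of the definitions.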
The sub aggregator measures the similarity of the validated alert and the existing meta alerts in order to identify the best meta alert representing the alert. In this work, we only fuse an alert that fully corresponds (all overlapping features are equal: IP, port, time) to an existing meta alert. This means that if an alert fully corresponds to a given meta alert then the details of the meta alert are updated (M.Nbrealerts is incremented, M.updatetime is updated and M.non-updatetime is reset to 0). We wish to note that only one meta alert can be chosen from the potential meta alerts and an alert can belong to only one meta alert. However, if no existing meta alert fully corresponds to the alert being considered, then a new meta alert is created. The new meta alert draws basic details from that alert, i.e. M.Nbrealerts is set to 1, the timestamp of the alert becomes M.createtime and the class of attack of the alert becomes M.Attack_Class. It is very important to know how long an existing meta alert should remain active in order to deliver timely and accurate meta alerts to the analysts. To determine the activeness of meta alerts, we used the following policy:
• If M.non-updatetime