Cemerlic - Network Intrusion Detection Based On Bayesian Networks

Network Intrusion Detection Based on Bayesian Networks
Alma Cemerlic, Li Yang, Joseph M. Kizza
Department of Computer Science and Engineering
University of Tennessee at Chattanooga
Chattanooga, TN 37403
Alma-Cemerlic@utc.edu, Li-Yang@utc.edu, Joseph-Kizza@utc.edu
Abstract
Intrusion detection has drawn much attention in the past two decades. Signature analysis and statistical anomaly detection are two typical methods to identify network security breaches. Signature analysis requires access to a large database of known intrusion signatures and a way to match current behavior against the signatures to detect intrusions in progress. The limitation of this approach lies in its dependence on frequent updates of the signature database and its inability to generalize and detect novel intrusions. Anomaly detection methods can detect attacks based on statistical probability, which allows for generalization and helps in the detection of novel attacks. However, statistical anomaly detection is not based on an adaptive intelligent model and cannot learn from normal and malicious traffic patterns. We propose an adaptive network intrusion detection system based on a Bayesian network, trained with a mixed dataset containing real-world and DARPA dataset traffic. Our Intrusion Detection System (IDS) model is designed to detect novel attacks. We use features of network connections to parameterize the system. The DARPA dataset and real-world traffic are used to measure the feasibility and effectiveness of our system. Network connections that are confirmed to be novel intrusions are added to the training dataset to re-train our IDS, thus enhancing our system's ability to detect future intrusions.
Keywords: Intrusion detection, Bayesian network
1. INTRODUCTION
Today, a large amount of sensitive information is processed through computer networks, so it is increasingly important to make information systems resistant and tolerant to network intrusions, especially systems used for critical functions in the military and commercial sectors. An intrusion can be defined [9] as an attempt to gain unauthorized access to network resources. As the number of newly discovered vulnerabilities per year increases and as hacker tools become more advanced and automated, intrusion prevention techniques alone are not sufficient. Today, Intrusion Detection Systems (IDSs) are necessary for effective computer system protection.
An intrusion can be detected using either signature-based detection or anomaly-based detection. Signature-based analysis [8] requires a database of signatures of known intrusions in order to detect attacks. The key advantage of signature detection techniques lies in their high degree of accuracy in detecting known attacks and their variations. The main drawbacks are the need to frequently update the database of intrusion signatures and the inability to generalize and detect novel intrusions. In addition, even if a new attack is discovered and its signature determined, there is often a substantial latency in updating the signature databases of IDSs across networks. Anomaly detection techniques based on statistics, such as IDES [7], send an alarm when they detect an event that deviates from the behavior defined as normal. In other words, the observed network traffic is compared to profiles of normal network use. Statistical anomaly detection has no intelligent learning model, which may lead to a high rate of false alarms. This happens primarily because previously unseen (yet legitimate) system behaviors may also be recognized as anomalies and hence flagged as potential intrusions.
All these limitations have led to increasing interest in intrusion detection systems based on data mining. Several researchers have developed IDSs using learning models that generalize. Axelsson [12] employs Bayesian inference steps with transition models between inferences to assess whether a particular burst of traffic contains an attack. Kruegel et al. [10] proposed a model that simulates an intelligent attacker using Bayesian techniques to create a plan of goal-directed actions. That study also proposes an event classification scheme based on Bayesian networks. The advantage of Bayesian networks is that they improve the aggregation of different model outputs and allow one to seamlessly incorporate additional information into an already existing model. Johansen and Lee [11] believe that a Bayesian system provides a solid mathematical foundation for simplifying a seemingly difficult and monstrous problem that today's IDS implementations fail to solve. They add that a Bayesian network IDS should differentiate between attacks and normal network activity by comparing metrics of each network traffic sample.
We propose to develop an adaptive network intrusion detection system using a Bayesian network (BN). The system is trained with a mixed dataset containing real-world and DARPA dataset traffic, and it aims to detect novel intrusions with a low number of false alarms. Our proposed IDS model is able to parse real-world traffic and identify network attacks. These include novel attacks that the system has not previously encountered. A BN is used to build an automatic intrusion detection model and to signal an intrusion when suspicious activity is noticed.
2. FRAMEWORK OF AN ADAPTIVE IDS

The architecture of our proposed intrusion detection system consists of six modules (Fig. 1). The Data gathering (sensors) and parsing module is responsible for collecting data from the monitored network and parsing them into connections. A connection is equivalent to a session between two hosts on a network, and it is composed of all the observed packets that the hosts exchanged. The Bayesian Network Inference module is the analysis engine of the IDS responsible for processing the data collected from the sensors. The Knowledge base contains an intelligent model (Bayesian network) which learns from observed traffic and has the ability to predict whether a network connection is an attack. The System configuration provides information about the current state of the IDS. The Response component initiates actions when an intrusion is detected. The responses can either be automated (active) or involve human interaction (inactive). The Bayesian Network Learning module is used to build up knowledge from the offline training dataset.
3. BAYESIAN NETWORK
A Bayesian network is a graphical representation of the joint probability distribution over a set of variables. The network structure is represented as a Directed Acyclic Graph (DAG). Each node corresponds to a random variable, and each edge indicates a dependency between the connected variables. Each variable (node) in a BN is associated with a Conditional Probability Table (CPT). The CPT enumerates the conditional probabilities of this variable given all combinations of its parents' values [2]. Thus, in a BN, the DAG captures causal relationships among random variables, and the CPTs quantify these relationships. Individual events in an attack can be represented as nodes, and the causal relations between events can be modeled as edges. For this reason, we use a BN as our inference model. A BN model is capable of learning causal relationships from an existing dataset and predicting the consequences of an intervention in the problem domain. A BN is an ideal model for combining prior knowledge with new data and inferring posterior knowledge. To learn the structure of our proposed BN and test it with datasets, we use Netica and GeNIe, two tools for modeling BNs.
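The DAG and CPTs together define the joint distribution as a product, P(x_1, ..., x_n) = ∏_i P(x_i | parents(x_i)). The following is a minimal sketch of this factorization. It uses a hypothetical three-node network with binary variables and made-up CPT values, in which `type` is the parent of both `root_shell` and `num_failed_login`. This is not the learned network of Fig. 2. The `posterior` helper answers a query by brute-force enumeration, which only scales to tiny networks. The system itself uses the junction tree algorithm described in Section 3.2.

```python
from itertools import product

# Hypothetical toy structure (NOT the learned network of Fig. 2):
# type -> root_shell, type -> num_failed_login. All variables are binary (0/1).
PARENTS = {
    "type": [],
    "root_shell": ["type"],
    "num_failed_login": ["type"],
}

# CPT[node][parent_values] = P(node = 1 | parents = parent_values); made-up numbers.
CPT = {
    "type": {(): 0.2},
    "root_shell": {(0,): 0.01, (1,): 0.6},
    "num_failed_login": {(0,): 0.05, (1,): 0.4},
}


def joint(assignment):
    """P(assignment) as the product of each node's CPT entry given its parents."""
    p = 1.0
    for node, parents in PARENTS.items():
        p_one = CPT[node][tuple(assignment[q] for q in parents)]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p


def posterior(query, evidence):
    """P(query = 1 | evidence), summing the joint over all hidden variables."""
    hidden = [n for n in PARENTS if n != query and n not in evidence]
    numerator = denominator = 0.0
    for q in (0, 1):
        for values in product((0, 1), repeat=len(hidden)):
            a = dict(evidence)
            a.update(zip(hidden, values))
            a[query] = q
            p = joint(a)
            denominator += p
            if q == 1:
                numerator += p
    return numerator / denominator
```

With these toy numbers, observing a root shell gives `posterior("type", {"root_shell": 1})`. That value is 0.12 / (0.12 + 0.008) = 0.9375.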
3.1 Learning Algorithm

In our model, we use the K2 learning algorithm, which builds a DAG over a set of variables of interest based on the calculation of a local score [6]. K2 starts from a network with no edges and follows a given ordering of the nodes. It incrementally adds parent connections as long as they increase the score of the network structure. We use the following network connection features, ordered according to the relevance analysis in [4]: protocol_type, service, num_wrong_fragments, land, logged_in, num_failed_login, root_shell, is_guest_login, and type.
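The greedy search and its local score can be sketched as follows. This is a minimal sketch of K2 with the Cooper-Herskovits score. It assumes fully observed discrete data stored as a list of dicts. The names `k2_local_score` and `k2` and the `max_parents` limit are our own illustration. This is not the Netica or GeNIe implementation used in the paper. A node may only take parents that appear earlier in the ordering, so the relevance ordering from [4] directly shapes the learned structure.

```python
from collections import Counter, defaultdict
from math import lgamma


def k2_local_score(data, node, parents, arity):
    """Log Cooper-Herskovits (K2) score of `node` given a parent set.

    log prod_j [(r-1)! / (N_ij + r - 1)!] * prod_k N_ijk!
    j ranges over observed parent configurations, k over the node's values.
    """
    r = arity[node]
    counts = defaultdict(Counter)  # parent configuration -> counts of node values
    for row in data:
        counts[tuple(row[p] for p in parents)][row[node]] += 1
    score = 0.0
    for node_counts in counts.values():
        n_ij = sum(node_counts.values())
        score += lgamma(r) - lgamma(n_ij + r)  # log (r-1)! - log (N_ij + r - 1)!
        score += sum(lgamma(n + 1) for n in node_counts.values())  # log N_ijk!
    return score


def k2(data, order, arity, max_parents=2):
    """Greedy K2 search: each node's parents are chosen among its predecessors."""
    parents = {}
    for i, node in enumerate(order):
        chosen = []
        best = k2_local_score(data, node, chosen, arity)
        while len(chosen) < max_parents:
            candidates = [c for c in order[:i] if c not in chosen]
            if not candidates:
                break
            score, cand = max(
                (k2_local_score(data, node, chosen + [c], arity), c) for c in candidates
            )
            if score <= best:
                break  # no single additional parent improves the score
            best = score
            chosen.append(cand)
        parents[node] = chosen
    return parents
```

As a usage example, consider a toy dataset in which `b` always copies `a` and `c` is independent of both. Running `k2(data, ["a", "b", "c"], {"a": 2, "b": 2, "c": 2})` learns the single edge a → b.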
Fig. 1 Bayesian Network-based IDS architecture
Overall, our proposed framework consists of a training component and a detection component. We use a training dataset to parameterize the IDS. The DARPA dataset and real-world traffic are used to measure the feasibility and effectiveness of our system. Given the training dataset, the training component estimates the parameters of the Bayesian model in Step 0 in Fig. 1. The network traffic is gathered and parsed into application layer network connections in Step 1. The Bayesian model then considers both the network connections and the system configuration to infer the probability that the network is under attack. The Bayesian model is built by a learning algorithm that uses the training data as the knowledge base, as shown in Step 2. A network connection recognized as an intrusion triggers an IDS response. A never-before-seen network connection is marked as suspicious if our IDS classifies it as intrusive. If a network administrator confirms that the suspicious connection is an intrusion, it is added to the training dataset to re-train our adaptive IDS, as shown in Step 4. This process enhances our system's ability to detect future intrusions.
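The training, detection, and re-training cycle of Fig. 1 can be sketched as a small loop. This is a minimal sketch under stated assumptions. `train` is a hypothetical function that turns labeled records into a function returning P(intrusion) for a connection; it stands in for K2 learning plus junction tree inference. `confirm` stands in for the network administrator. `toy_train` is only a per-service frequency count, not a Bayesian network. For simplicity, every alerted connection is sent for confirmation, not only never-before-seen ones.

```python
from collections import defaultdict


class AdaptiveIDS:
    """Train (Step 0), detect, and re-train on confirmed intrusions (Step 4)."""

    def __init__(self, train, training_set, cutoff=0.5):
        self.train = train  # hypothetical: labeled records -> belief function
        self.training_set = list(training_set)
        self.cutoff = cutoff
        self.model = train(self.training_set)

    def process(self, connections, confirm):
        """Return the alerts raised for parsed connections; retrain on confirmed ones."""
        alerts, confirmed = [], []
        for conn in connections:
            if self.model(conn) > self.cutoff:
                alerts.append(conn)  # this is where an IDS response would be triggered
                if confirm(conn):  # the administrator's decision
                    confirmed.append(dict(conn, type="intrusion"))
        if confirmed:
            self.training_set.extend(confirmed)
            self.model = self.train(self.training_set)
        return alerts


def toy_train(records):
    """Hypothetical stand-in for BN learning: per-service intrusion frequency."""
    by_service = defaultdict(list)
    for r in records:
        by_service[r["service"]].append(r["type"] == "intrusion")

    def belief(conn):
        seen = by_service.get(conn["service"])
        if not seen:
            return 0.6  # never-before-seen service: treated as suspicious
        return sum(seen) / len(seen)

    return belief
```

Consider a system trained only on normal `http` traffic. It alerts on an unseen `tftp` connection, and once that connection is confirmed, the re-trained model assigns `tftp` a belief of 1.0.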
3.2 Inference Algorithm

For inference in our model, we use the Junction Tree Algorithm [5]. The idea behind this procedure is to construct a data structure called a junction tree, which can answer any query through message passing on the tree. To build a junction tree, we first choose an ordering of the nodes and use node elimination to obtain a set of elimination cliques. A complete cluster graph is then built over the maximal elimination cliques. Each edge {B, C} is weighted by |B ∩ C|, the number of variables the two cliques share. A maximum-weight spanning tree of this graph is then computed, and this spanning tree is a junction tree.
4. INTRUSION DETECTION TESTING DATASET
We use two different datasets to test our proposed IDS model: the DARPA dataset and the real network traffic collected in our security lab.
4.1 DARPA Dataset

The DARPA intrusion detection evaluation dataset [1] from MIT Lincoln Lab is used to train and test our IDS. The dataset was collected from a simulation of a fictitious military network over a period of seven weeks.
Before the data is fed to the Bayesian network, for either learning or testing, raw network traffic has to be preprocessed and summarized into connections or high-level events. Each connection is described by a set of features. The DARPA KDD 99 dataset summarized the DARPA 98 Lincoln Lab network traffic into connections with 41 features per connection. We define a connection as a sequence of TCP packets starting and ending at some well-defined points in time. Between these points, data flows from a source IP address to a target IP address under some well-defined protocol. In our model, we use 9 of the 41 features: protocol_type, service, num_of_wrong_fragments, num_of_failed_logins, land, login_success, is_guest_login, root_shell_obtained, and type (intrusion or normal connection).
Network intrusions are classified into four categories [1]: user-to-root (u2r), remote-to-local (r2l), denial-of-service (DoS), and probe. A u2r attack occurs when attackers who have local access to the victim machine try to gain superuser privileges. An r2l attack happens when attackers who have no account on the victim machine try to gain access. A DoS attack occurs when attackers try to prevent legitimate users from using a service available on the network. The goal of a probe attack is to gain information about the target host.
For signature-based IDSs, the recency of the data in the signature database is crucial. In our case, the recency of the dataset is not significant, because our model is an anomaly detector that needs no specific knowledge about attacks. Thus, the DARPA dataset is still viable for testing our model [13]. We used the labeled training dataset to train our Bayesian model and the testing dataset to test for the correct discovery of intrusions. The Bayesian network used in our IDS model is shown in Fig. 2.
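Reducing a KDD 99 connection record to the nine variables listed above is a simple projection. The following is a minimal sketch. It assumes each record has already been parsed into a dict keyed by the KDD 99 feature names. The `label` key, which holds the raw label such as `smurf.`, is a hypothetical name of our own. `ATTACK_CATEGORY` lists only a few example attack names with their categories; it is not the full KDD 99 list.

```python
# KDD 99 feature name -> variable name used in this paper.
KDD_TO_MODEL = {
    "protocol_type": "protocol_type",
    "service": "service",
    "wrong_fragment": "num_of_wrong_fragments",
    "num_failed_logins": "num_of_failed_logins",
    "land": "land",
    "logged_in": "login_success",
    "is_guest_login": "is_guest_login",
    "root_shell": "root_shell_obtained",
}

# A few example KDD 99 attack labels and their categories (not the complete list).
ATTACK_CATEGORY = {
    "smurf": "dos", "neptune": "dos",
    "portsweep": "probe", "satan": "probe",
    "guess_passwd": "r2l", "ftp_write": "r2l",
    "buffer_overflow": "u2r", "rootkit": "u2r",
}


def clean_label(raw):
    """KDD 99 labels end with a period, e.g. 'normal.' or 'smurf.'."""
    return str(raw).strip().rstrip(".")


def project(record):
    """Keep the 8 observed features and add the binary `type` variable."""
    row = {ours: record[kdd] for kdd, ours in KDD_TO_MODEL.items()}
    row["type"] = "normal" if clean_label(record["label"]) == "normal" else "intrusion"
    return row


def category(raw_label):
    """Map a raw label to normal, one of the four attack categories, or unknown."""
    label = clean_label(raw_label)
    if label == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(label, "unknown")
```

The remaining KDD 99 features, such as `duration`, are dropped by `project`. The four-way `category` is kept only for analysis, since the network itself is trained on the binary `type`.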
Fig. 2 Learned Bayesian Network for IDS
4.2 Customized Dataset with Novel Intrusions

After learning the BN model and testing it on the DARPA dataset, we created a custom dataset to measure the capability of the BN model to detect never-before-seen attacks. The custom dataset contains both attacks and normal traffic. The attacks are collected by running the vulnerability exploits available in the Metasploit 3 framework [3]. The traffic containing these attacks is recorded as tcpdump files. We chose the exploits so that they represent all four general categories of attacks: DoS, r2l, u2r, and probe [1].
Buffer overflow (BoF) attacks can be used to gain root access on the victim system. Depending on the targeted platform, a buffer overflow attack can be executed as u2r or r2l. An example is the Microsoft Plug and Play Service Overflow, which exploits the Plug and Play service that the operating system uses to detect new hardware. On certain platforms this buffer overflow attack requires local access to succeed, while on others it can be executed remotely. Additionally, under certain circumstances, buffer overflow attacks can result in a DoS attack. One case is the NetpwPathCanonicalize Overflow in the Microsoft Server Service, which exploits a stack overflow in the NetApi32 CanonicalizePathName function through a Remote Procedure Call (RPC) to the Server Service. On certain Windows platforms, even an unsuccessful attempt can terminate all SMB-related services or reboot the system, so this attack is classified as DoS. To extend the set of probe attacks, we also collected the footprint of a UDP service sweeper, a tool designed to detect common UDP services available on the target host.
5. EXPERIMENT SETUP
Our experiment consists of two phases. In the first phase, we use the DARPA training and testing datasets to train and test our Bayesian model, respectively. In the second phase, we capture real network traffic to further test the system. The traffic features relevant to our IDS are associated with each network connection rather than with each individual packet. This speeds up both Bayesian network training and the classification of incoming connections as normal or intrusion. We monitor nine features for each connection: protocol_type, service, num_wrong_fragments, land, logged_in, num_failed_login, root_shell, is_guest_login, and type. The last feature labels a connection as either normal or an attack for the purpose of training and testing. The initial learning and testing datasets are composed of labeled DARPA 98 records. The real-world network traffic serves to test the system's ability to recognize never-before-seen attacks. The traffic, collected in tcpdump format, is preprocessed by a custom parser. The parser first groups the packets into connections and then extracts the connection-specific features we use in our model. Eight of the features are extracted from packet headers and payload, while the ninth feature, type, is added by hand.
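The custom parser's first step, grouping packets into connections, can be sketched as follows. This is a minimal sketch under stated assumptions. Packets are taken to be already decoded into a hypothetical `Packet` record with a precomputed `bad_fragment` flag. Connections are keyed on the protocol plus the unordered pair of endpoints, so both directions of a session fall into one connection. The sketch ignores timeouts and TCP teardown, which a real parser would need, and it treats TCP and UDP alike. The `SERVICES` port table is illustrative. Only header-derived features are shown; content features such as logged_in need protocol-specific payload parsing.

```python
from collections import namedtuple

# Hypothetical decoded packet (field names are our own, not from the paper).
Packet = namedtuple("Packet", "ts src sport dst dport proto bad_fragment")

# Hypothetical port-to-service table; a real parser would use a fuller mapping.
SERVICES = {21: "ftp", 23: "telnet", 80: "http"}


def group_connections(packets):
    """Group packets into bidirectional connections, in order of first appearance."""
    conns = {}
    for p in sorted(packets, key=lambda p: p.ts):
        a, b = (p.src, p.sport), (p.dst, p.dport)
        key = (p.proto,) + tuple(sorted([a, b]))
        if key not in conns:
            # The first packet seen defines the originator and the responder.
            conns[key] = {"orig": a, "resp": b, "proto": p.proto, "packets": []}
        conns[key]["packets"].append(p)
    return list(conns.values())


def header_features(conn):
    """Extract a few header-level features for one connection."""
    (src, sport), (dst, dport) = conn["orig"], conn["resp"]
    return {
        "protocol_type": conn["proto"],
        "service": SERVICES.get(dport, "other"),
        # land = 1 when source and destination host and port are identical
        "land": int(src == dst and sport == dport),
        "num_wrong_fragments": sum(1 for p in conn["packets"] if p.bad_fragment),
    }
```

In use, three packets exchanged between 10.0.0.1:1234 and 10.0.0.2:80 form one `http` connection. A packet sent from a host and port to itself forms a separate connection with `land` = 1.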
Once the structure is learned and our Bayesian network is trained, the network can examine any input given in the correct format and label each connection as either normal or an intrusion. Connections that do not fall into either class are labeled as intrusions, since they may be novel attacks. They are included in the learning dataset and used to retrain and improve the network. However, before the potential novel intrusions are added to the learning dataset, their classification must be inspected for correctness, so that only verified labels can affect the BN structure.
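Labeling a connection amounts to thresholding the posterior belief of the type node. The following is a minimal sketch, assuming the beliefs have already been produced by BN inference. It applies the 50% cutoff and computes the TP rate, FP rate, and error rate reported in Section 6. The names `classify` and `evaluate` are illustrative.

```python
def classify(belief, cutoff=0.5):
    """Label a connection an intrusion only if P(type = intrusion) exceeds the cutoff."""
    return "intrusion" if belief > cutoff else "normal"


def evaluate(beliefs, truths, cutoff=0.5):
    """TP rate, FP rate, and error rate of thresholded beliefs against true labels."""
    tp = fp = tn = fn = 0
    for belief, truth in zip(beliefs, truths):
        predicted = classify(belief, cutoff)
        if truth == "intrusion":
            if predicted == "intrusion":
                tp += 1
            else:
                fn += 1
        elif predicted == "intrusion":
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "tp_rate": tp / (tp + fn) if tp + fn else 0.0,  # detected attacks / all attacks
        "fp_rate": fp / (fp + tn) if fp + tn else 0.0,  # false alarms / all normal
        "error_rate": (fp + fn) / total if total else 0.0,  # wrong predictions / all
    }
```

For beliefs [0.9, 0.4, 0.7, 0.2] on two attacks followed by two normal connections, the 0.5 cutoff gives a TP rate of 0.5. Lowering the cutoff to 0.3 raises the TP rate to 1.0, while the FP rate stays at 0.5 in this tiny example.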
6. EXPERIMENT EVALUATION
To achieve good prediction performance, an IDS should correctly differentiate between intrusions and legitimate actions in a system environment. Typical metrics for evaluating the predictive performance of IDSs include the true positive (TP) rate (detection rate) and the false positive (FP) rate, shown in Table 1. The TP rate is the ratio of the number of correctly detected attacks to the total number of attacks. The FP rate is the ratio of the number of normal connections incorrectly classified as attacks to the total number of normal connections. The performance analysis of our IDS in Table 1 is reported at a 50% cutoff, meaning that the Bayesian network classified an event as an intrusion only if its belief was higher than 50%. Lowering this cutoff increases the TP rate, but it also increases the FP rate. Our experimental results for the DARPA datasets are as follows. The true negative rate (normal connections correctly recognized) is 93.89%, and the true positive rate (intrusions correctly detected) is 97.88%. The error rate is 2.881%, meaning that in 2.881% of cases the network predicted a wrong value, where the predicted value is the one with the higher belief.
Table 1. Evaluation of Intrusion Detection
                       Actual Normal Connection   Actual Intrusions (Attacks)
Predicted Normal       True Negative: 93.89%      False Negative: 1.45%
Predicted Intrusions   False Positive: 6.11%      True Positive: 97.88%
When trained only on the DARPA training dataset, our IDS model indicated the presence of new intrusions in the real-world testing dataset through conflicting evidence. Only the CAN-2003-0003 exploit and the UDP sweep were correctly detected. The reason is that the DARPA training dataset is much simpler than real-world traffic. As a result, a model trained on the DARPA dataset has limited capability to recognize real-world normal traffic and attacks. Our solution is to mix the real-world training dataset with the DARPA dataset and train our IDS model again. After re-training, the model correctly recognizes the malicious connections that it previously missed.
7. CONCLUSION AND FUTURE WORK

We developed an adaptive anomaly-based IDS to detect unknown attacks. This IDS has been tested with both the DARPA dataset and real network traffic containing novel attacks. The detection rate increased after the IDS model was retrained on a dataset that included the correctly labeled real-world traffic. Finding the optimal node ordering for the Bayesian network topology is NP-hard, and the Bayesian network trained in the standard way does not perform to a satisfactory level. For these reasons, we plan to locally optimize the Bayesian network to improve the effectiveness of our IDS. We will also add events from the system architecture level (such as CPU utilization) to the application-level connection features that we currently use.
Acknowledgements: This work was supported in part by the Tennessee Higher Education Commission's Center of Excellence in Applied Computational Science and Engineering under R04-1330-023.
8. REFERENCES
[1] DARPA. Knowledge Discovery in Databases, 1999. DARPA archive. http://www.kdd.ics.uci.edu/databases/kddcup99/task.htm
[2] F. Jensen. Bayesian Networks and Decision Graphs. Springer, New York, USA, 2001.
[3] The Metasploit framework. http://www.metasploit.com
[4] H. G. Kayacik, A. N. Zincir-Heywood, M. I. Heywood. Selecting Features for Intrusion Detection: A Feature Relevance Analysis on KDD 99. Proceedings of the Third Annual Conference on Privacy, Security and Trust (PST), New Brunswick, Canada, Oct. 2005.
[5] F. Jemili, M. Zaghdoud, M. Ben Ahmed. A Framework for an Adaptive Intrusion Detection System using Bayesian Network. ISI IEEE, 2007.
[6] D. Barber. Machine Learning: A Probabilistic Approach, p. 107, 2007.
[7] T. Lunt, A. Tamaru, F. Gilhan, R. Jaganathan, P. Neumann, H. Javitz, A. Valdes, T. Garvey. A real-time intrusion detection expert system (IDES). Technical report, Computer Science Laboratory, SRI International, Menlo Park, California, Feb. 1992.
[8] K. Ilgun, R. A. Kemmerer, P. A. Porras. State transition analysis: A rule-based intrusion detection approach. IEEE Transactions on Software Engineering, 21(3), pp. 181-199, Mar. 1995.
[9] R. Heady, G. Luger, A. Maccabe, M. Servilla. The architecture of a network level intrusion detection system. Technical report, Computer Science Department, University of New Mexico, Aug. 1990.
[10] C. Kruegel, D. Mutz, W. Robertson, F. Valeur. Bayesian event classification for intrusion detection. Proceedings of the 19th Annual Computer Security Applications Conference, Las Vegas, NV, Dec. 2003.
[11] K. Johansen, S. Lee. Network Security: Bayesian Network Intrusion Detection (BINDS), May 2003.
[12] S. Axelsson. The base-rate fallacy and the difficulty of intrusion detection. ACM Transactions on Information and System Security, 3(3), pp. 186-205, Aug. 2000.
[13] M. Mahoney, P. K. Chan. An analysis of the 1999 DARPA/Lincoln Laboratory evaluation data for network anomaly detection. Recent Advances in Intrusion Detection: 6th International Symposium, RAID, pp. 220-237.
