Boosted Modified Probabilistic Neural Network (BMPNN) for Network Intrusion Detection
Tich Phuoc Tran, Tony Jan, and Lee Coulson
The Internet provides convenient benefits such as shortening effective geographical distances, enabling wide communication, and efficiently sharing information. However, the Internet poses many possible problems including intruders compromising communication systems. Different defence techniques have been devised and implemented to protect communication computer networks.
An emerging technology in this field is an Intrusion Detection System (IDS). An IDS is defined as a protective system that monitors network traffic and computing usage behaviour to detect any suspicious activity that may compromise the networks . Because cyber attacks inevitably manifest themselves in the host audit data and/or in the network traffic data , an IDS can be used to monitor and analyse data from those sources to detect potential attacks. Current IDS suffer several limitations such as low accuracy and high false alarm rates for rare and complicated attacks. Moreover, existing AI or decision algorithms used in Intrusion Detection are far from providing acceptable solution for large and complex communication networks. In reality, a sufficient load of data would be needed to obtain high accuracy and low variance for a learning model. However, this requirement is normally associated with high computational cost.
This paper proposes an algorithm called Boosted Modified Probabilistic Neural Network (BMPNN) that provides an acceptable trade off for learning bias, generalization variance and computation requirement. The effectiveness of this proposal is proven in network intrusion detection test. In order to compare the performance of the model against other existing models in fair manner, a benchmark dataset from The Knowledge Discovery and Data Mining competition has been applied.
A. Intrusion Detection System (IDS)
There is a large number of IDS available on the market to complement traditional firewalls and other defence techniques. There are several types of IDS.
Firstly, considering the analysis techniques, IDS could be categorized as either a misuse-detection or an anomaly-detection IDS. Popular IDS in the current market are mainly signature-based misuse detection systems in which the detection process involves monitoring the network, or system resources, and then comparing the observed data from the sensors to a database of known signatures of adversarial behaviours .
The misuse IDS needs to keep this database up-to-date to cope with the explosive growth of new intrusion techniques. Additional drawback of such systems is that they can only detect the known attacks which have had their characteristics recorded in their signature database, i.e. any novel attacks that are different from the known cases will not be detected. Unlike misuse-based detection approach, anomaly-based detection relies on detecting the activities deviating from “normal” operations that are profiled by the IDS. The normal usage patterns profiling process profiles initial users’ normal behaviours, and then updating this database with further usage patterns . Though this type of IDS can detect unknown novel attacks, it still has some weakness including difficulty in defining normal behaviour metrics, suffering from high false alarms, which can lead to ‘lack of trust’ in the security software .
Timeliness characteristic of detection is one of the most important issues in this field. In practice, there are two possible approaches: audit-trial detection or real-time detection. An audit-trial, or offline IDS, refers to a system which analyses the system resources such as log files in order to detect any intrusive activities. This takes a long time and a large computational cost to maintain this large data. More importantly, it is post-mortem because of the nature of audit records. Regardless of these limitations, one of the most efficient ID techniques for host-based IDS is still the audit trail analysis . Unlike the audit trail approach, real-time, or on-the-fly processing, is a mechanism of sensing network packets constantly in a real-time environment. It analyses the current system operations and events (packet header, payloads and pathologies) instead of the known security breaches in the log file .
Consequently, potential intrusions can be detected prior to their malicious activities. The drawbacks of this method are the large consumption of random access memory (RAM) to buffer the retrieved data and a large amount of network traffic flows through the protection layer. Detection efficiency may also be affected if the packets are encrypted.
B. Artificial intelligence for Network intrusion detection
As current networking technology becomes more powerful and affordable, existing IDS face significant challenges, such as detection speed, accuracy and system adaptability. In order to overcome these limitations, there is a trend in Network Security to utilize Artificial intelligence (AI) techniques to develop a generalization capability from limited training data, and to be able to correctly classify future data as normal or abnormal. Eskin et al  proposed unsupervised anomaly detection algorithms with unlabeled data, assuming the number of normal instances is significantly larger than the number of anomalies, and anomalies appear as outliers in the data. However, this assumption is not reliable since in some circumstances, such as KDD-99 dataset, there exist relatively large attack clusters and small normal clusters. Another novelty detection approach using non-parametric density estimation, based on Parzen-window estimators with Gaussian kernels, is proposed by Yeung and Chow  to build an intrusion detection system using normal data only.
This model detects whether a record is intrusive or non intrusive, and not if the attack records belongs to a specific attack category. One of the emerging algorithms used in intrusion detection is Neural Networks (NNs) which can be implemented to recognize normal user behaviours in anomaly detection systems. Data sources such as network usage behaviours, or command list executed by different users, are good training examples for this purpose.
The resulting classifier then identifies malicious behaviours that do not match its training experience. In the paper , an intrusion dataset was used to train a model which is a combination of Self-Organizing Map (SOM) with Resilient Propagation Neural Network (RPROP) for intrusion detection. SOM was used for clustering and visualization while RPROP was used to classify normal patterns and intrusions. This dataset was divided into eight subsets and a SOM was trained with each subset. Neuron weights from the eight resulting SOM were then fed to the three-layer PRPROP neural network. The classification takes place at the third level of PRPROP. Another neural network based approach involves a Multi Level Perception architecture that consisted of four layers .
Besides NNs, Support Vector Machines (SVMs) are also a good candidate for intrusion detection that plot the training vectors in high dimensional feature space, labelling each vector by its class. The data is then classified by determining a set of support vectors, which are members of the set of training inputs which outline a hyper plane in the feature space. SVMs are scalable as they are relatively insensitive to the number of data points . Several other AI paradigms including linear genetic programming , Bayesian networks, multivariate adaptive regression splines , fuzzy inference systems etc. have been investigated for the design of IDS.
C. Benchmark KDD-99
There are several different approaches developed in the field of network intrusion detection. To facilitate the comparison of advanced research in this area, the Lincoln Laboratory at Massachusetts Institute of Technology (MIT)conducted the 1998 and 1999 evaluation of intrusion detection to provide a basis for making comparisons of existing systems under a common set of circumstances and assumptions. These programs were funded by The Defence Advanced Research Projects Agency (DARPA), an agency of the United States Department of Defence, responsible for the development of new technology for use by the military. Based on binary TCP dump data provided by DARPA evaluation, network connections are summarized and abstracted. These were used as the benchmark training and test data sets for “Classifier Learning Contest” organized in conjunction with the 5th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining in 1999 (KDD-99). In this paper, the KDD-99 dataset will be used as a benchmark which contains approximately 5 million network connection records and 41 independent features and a label indicating the connection is normal, or a specific attack type . These attacks fall into 4 categories: probe, denial of service (DoS), remote to local (R2L) and user to root (U2R) attacks.
In particular, probe attacks refer to automatic scanning computer networks to gather useful information for intruders, while DoS attacks are more serious by obstructing the legitimate requests to system resources. The R2L attacks occur when an intruder who does not have account on a machine can gain local access to the system by sending packets over networks. In contrast, U2R attacks assume that the intruder has normal user account and they can exploit the system vulnerabilities to gain root user privileges.
There are 41 features to summarize the connection information. These features are classified as basic features which are derived from packet headers without inspecting the packets content, content features which assess the payload of TCP packets by applying the domain knowledge and traffic-based features which are captured within a 2 second temporal window or over the number of connections. In this competition, all the top three winning solutions out of 24 entries submitted are some variants of decision trees.
The winning entry  is composed from 50×10 C5 decision trees fused by cost-sensitive bagged boosting. The second placed winning approach  applied a data mining tool called Kernel Miner (KM). KM constructs the set of locally optimal decision trees which were trained specifically for different attack categories and attack types. In total, KM generated a decision forest containing 755 trees. The third placed entry  used two layers of voting decision trees augmented with human security expertise. These entries are reported to be far from an acceptable solution to the intrusion detection problem, with low detection rates of U2R and R2Land high false alarm rates.
III. Boosted Modified Probabilistic Neural Network (BMPNN)
MPNN algorithm in Equation 4 can reduce the system sensitivity as well as to improve the computational efficiency. However, the original MPNN’s accuracy is not always superior compared with its precedent GRNN. In fact, MPNN is sometimes less accurate than GRNN. The main purpose of this research is to improve accuracy of MPNN by some meta-learning techniques such as bagging and boosting. As mentioned earlier, both bagging and boosting can significantly improve the stability of a weak learning algorithm i.e. reduce the generalization variance of the model. However, boosting also reduces the learning bias by changing the distribution over the training examples as a function of the errors made by previously component classifier. Because of this advantage of boosting, Ada-Boost method by Freund and Schapire (1996)  will be used to complement the original MPNN to form a new algorithm called BMPNN. The effectiveness of BMPNN will then be evaluated in the context of network intrusion detection. BMPNN will be compared with the several other algorithms.
In BMPNN, ADA-Boost algorithm is used to generate a number of MPNN classifiers which are trained by different sample sets drawn from the original training set. Each of these classifiers produces a hypothesis from which a learning error can be calculated.
When this error exceeds a certain level, the process of creating these hypotheses will be terminated. A final composite hypothesis is then created by combining individual hypotheses.
Bias of a classifier is defined as the error associated with that model, while variance is the stability of the model in making predictions on unseen input data. It is agreed that some trade off between bias and variance must be taken into account to achieve a compromised solution. This is known as the bias-variance dilemma . The trade off between bias and variance is possible by appropriately choosing the optimised level of model complexity. This suggests that between bias, variance and model complexity, there exists a certain relationship, which can be useful in assessing model performance. This may be conceptually formulated as a general function of these three values or just simply a linear expression as follow:
Cost = A*Bias + B*Variance + C*Complexity
Considering this cost can extend the bias-variance dilemma to a more complete picture in selecting the best model. Depending on the nature of the problems at hand, this cost function can vary from a simple function to a more complicated relationship.
Often in practice, model complexity can be seen as closely related to computational requirements of the model.
The proposed BMPNN is proven to successfully obtain a compromised solution for the trade off between learning bias, generalisation variance and computational requirements, as shown in the following experimental section. In particular, it implements Boosting algorithm to further reduce pure MPNN’s error as well as network sensitivity by training the classifiers with different distributions of a dataset. More importantly, this selection of distribution relies on the previous classifiers’ performance and therefore focuses on improving the overall performance 
IV. Experimental Analysis
The purpose of experiment is to show the effectiveness of the proposed algorithm (BMPNN) in comparison to other comparable models. In this paper the KDD-99 data set is used because it one of few in the domain of intrusion detection and it attracts significant attention from the researchers due to its well-defined and readily accessible nature. In particular, BMPNN is implemented to detect 5 class labels from this dataset including Normal, probe, DoS, U2R and R2L. It is then compared against other learning techniques as well as theKDD-99 winning methods. Specifically, the newly proposed Boosted MPNN method is compared with J48 Decision Tree, Boosted J48, MLP, GRNN and pure MPNN in terms of detection rate (accuracy), training time (model complexity) and model sensitivity (variance). The implemented boosting algorithm for both J48 and MPNN is AdaBoostM1 which is available from Weka library . For the RBFNN models, similar model size was chosen with one hidden layer containing 15 hidden nodes. The MLP has a structure of 3layers with the number of input neurons equal to the number of input features, 5 hidden neurons and 5 output neurons (1 for each class). The above numbers of hidden nodes are selected from multiple experiments (number of hidden nodes varying from 5 to 55 with steps of 10) by choosing the setting with lowest bias and variance. The learning models are trained with the full set of attributes. The resulting model is tested for 5times, each time by splitting the entire KDD-99 (around 5million records) into a training set of 100,000 records and a testing set of 50,000 records. Some evaluation metrics are selected such as detection rates, false positive rate, training time (indicates the computational requirement during the training process) and misclassification cost (according to the cost matrix given by KDD-99).
To protect copyright and proprietary data numerous technical details are not published in this documents, and are otherwise unavailable for public viewing.
Please contact About Us - Next Generation Security for further information.
This technical report is based on the authors’ previous publication presented at the 2006 International Joint Conference on Neural Networks, Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada, July 16-21, 2006.