
International Journal of Computational Intelligence and Applications Vol. 6, No. 3 (2006) 299-313
© Imperial College Press

A CEREBELLAR MODEL CLASSIFIER FOR DATA MINING WITH LINEAR TIME COMPLEXITY

DAVID CORNFORTH
School of Information Technology and Electrical Engineering, University of New South Wales, Australian Defence Force Academy, Northcott Drive, Canberra, ACT 2600, Australia

Received 14 October 2003
Revised 24 July 2006
Accepted 7 August 2006

Techniques for automated classification need to be efficient when applied to large datasets. Machine learning techniques such as neural networks have been successfully applied to this class of problem, but training times can grow rapidly as the size of the database increases. Some of the desirable features of classification algorithms for large databases are linear time complexity, training with only a single pass of the data, and accountability for class assignment decisions. A new training algorithm for classifiers based on the Cerebellar Model Articulation Controller (CMAC) possesses these features. An empirical investigation of this algorithm has found it to be superior to the traditional CMAC training algorithm, both in accuracy and in the time required to learn mappings between input vectors and class labels.

Keywords: Cerebellar model articulation controller; classification; training.

1. Introduction

A well-studied class of machine learning problems is that of categorization, or classification. Here, the key is to determine some relationship between a set of input vectors that represent stimuli, and a corresponding set of values on a nominal scale that represent category or class. The relationship is obtained by applying an algorithm to training samples that are 2-tuples (u, c) consisting of an input vector u and a class label c. The learned relationship can then be applied to instances of u not included in the training set, in order to discover the corresponding class label c.1 A number of machine learning techniques, including genetic algorithms2 and neural networks,3 have been shown to be very effective in solving such problems. There are many large databases in existence that could yield valuable information if efficient and scalable methods of automated classification could be found.4 Some of the desirable features of algorithms for automated classification of large databases are: low-order time complexity, training with only a single pass of the data, and accountability for class assignment decisions.

Many algorithms for automated classification have an inherently non-linear relationship between the time taken by the algorithm to run and the number of training examples. Analysis methods that work well for small data sets are completely impractical when applied to larger data sets. For example, training of a neural network using back-propagation is known to be NP-complete.5 Some studies suggest that evolutionary algorithms have polynomial time complexity.6 The work presented here investigates classification algorithms based on the Cerebellar Model Articulation Controller (CMAC),7 which have linear time complexity.

Global error minimization techniques, such as back-propagation, require multiple traversals of the data set during training. If the training set is very large, it cannot fit inside the memory of the machine. This results in multiple disk read/write operations, which are relatively costly in time and can contribute greatly to data processing time. Current approaches include compression or summary of the data set before processing, and redesign of analysis tools so that analysis can be completed with only one pass of the data. This paper shows how the original CMAC training algorithm, which normally uses an iterative global error minimization technique, may be adapted so that the training set only needs to be accessed once.

The usefulness of a classification algorithm may be enhanced by providing an explanation for each class assignment decision. This could take the form of a set of rules that contribute to the assignment, or a probability for each class, given the input. Black-box methods such as neural networks do not naturally lend themselves to this form of analysis. The new algorithm described here provides accountability for class assignment decisions in the form of class probabilities.

In this paper, I propose the Kernel Addition Training Algorithm (KATA) as a more effective learning algorithm for the CMAC when used as a classifier. The proposed method requires only a single pass of the data and provides a probability model for class assignment decisions. The organization of the remainder of this paper is as follows. Section 2 briefly reviews the architecture of the CMAC, and introduces the proposed modifications. Section 3 provides an empirical investigation of the new fast learning algorithm and the traditional error minimization methods. Section 4 provides a discussion of the results and implications arising from them.

2. Cerebellar Model Articulation Controller

The CMAC, or Albus perceptron, is a sparse coarse-coded associative memory algorithm that mimics the functionality of the mammalian cerebellum.8 Originally, the CMAC was proposed as a function modeler for robotic controllers,7 but it has been extensively used in reinforcement learning9,10 and also as a classifier.11-14 The training method proposed by Albus was an iterative algorithm based on global error minimization. Empirical evidence, presented in this paper, suggests that this algorithm has linear time complexity. While training the CMAC has been shown to be faster than training a neural network using back-propagation,15 the method still requires multiple passes of the training data.


The method proposed in this paper requires only a single pass of the data. Furthermore, it provides a probability model for class assignment decisions.

The CMAC is able to accept real-valued inputs. An input vector u with d components may be visualized as a point in d-dimensional space. The input space is quantized using a set of q overlapping tilings, as shown in Fig. 1(a), where q = 2. For input spaces of high dimensionality, the tiles form hyper-rectangular regions.
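To make the quantization concrete, the following minimal Python sketch maps an input point to one active tile per tiling layer using evenly offset layers. The offsets, tile width, and data structures are illustrative assumptions, not the implementation used in the experiments (which bases tile spacing on Parks and Militzer27).

```python
import numpy as np

def active_tiles(u, q=2, tile_width=1.0):
    """Map a real-valued input vector u to one tile coordinate per tiling layer.

    Each of the q layers covers the input space with tiles of the same width,
    but is shifted by a fraction of the tile width, so a query point activates
    exactly one tile in every layer (illustrative choice of offsets).
    """
    u = np.asarray(u, dtype=float)
    tiles = []
    for layer in range(q):
        offset = layer * tile_width / q                      # shift each layer
        coords = np.floor((u + offset) / tile_width).astype(int)
        tiles.append((layer, tuple(coords)))                 # layer + tile coords
    return tiles

# Example: a 2-D query point activates one tile in each of the two layers.
print(active_tiles([0.7, 1.3], q=2, tile_width=1.0))
```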

Fig. 1. (a) The CMAC tile configuration, with a query point activating a tile in both the tile sets. (b) The two active tiles activate two memory locations. These contain values that are summed to produce the output.


A query is performed by first activating all the tiles that contain the query point. The activated tiles address memory cells, which contain stored values. These are the weights of the system, as shown in Fig. 1(b). Summing these values produces the overall output. The CMAC output is therefore stored in a distributed fashion, such that the output corresponding to any point in input space is derived from the values stored in a number of memory cells. A change of the input vector results in a change in the set of activated tiles, and therefore a change in the set of memory cells participating in the CMAC output.

The memory size required by the CMAC depends on the number of tilings and the size of the tiles. If the tiles are large, such that each tile covers a large proportion of the input space, a coarse division of input space is achieved, but local phenomena have a wide area of influence. If the tiles are small, a fine division of input space is achieved and local phenomena have a small area of influence. The number of tiles in the input space, and therefore the number of memory cells, is usually sufficiently large to become prohibitive due to memory constraints. Many of these tiles are never used, due to the sparse coverage of the input space. One solution is to employ a consistent random hash function to collapse the large tiling space into a smaller memory cell space.16 This reduces memory use, but still leaves a classifier with relatively large memory requirements. An alternative and more comprehensive solution is the hierarchical CMAC.17 Here, several low-dimensional CMACs are connected to form a multi-layer tree structure. Training is accomplished by minimizing the output error and back-propagating errors to hidden layers. The tree structure can also be pruned to remove redundant nodes.18 This method cannot be employed here because it is not compatible with the training rule presented.

The CMAC learns a mapping from input space U ⊆ R^d to output space Z ⊆ R, where d is the number of dimensions, or the size of the input vector. Following existing convention, this can be broken into three mappings12:

- The input space to multi-layer tiling system mapping E : u → x.
- The multi-layer tiling system to memory table mapping H : x → y.
- The memory table to output mapping (weighted summation) W : y → z.

The mapping E can be implemented using simple integer division in each dimension. The integer values for each dimension are combined to form one address for each tiling layer; addresses for the other tiling layers are calculated in a similar way. The mapping H receives q addresses that must be mapped to memory cells. This mapping is usually implemented by a hashing function. The mapping W is a weighted summation of the contents of the memory cells. These values are set during training.

An improvement over the Albus CMAC is the widely adopted practice of embedding kernel functions into the quantizing regions.19-21 This modifies the output mapping to a weighted summation:

$$e = \frac{\sum_{i=1}^{q} k(\mathrm{dist}_i)\, y[a_i]}{\sum_{i=1}^{q} k(\mathrm{dist}_i)} \qquad (1)$$
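As a sketch of the output mapping W with embedded kernels (Eq. (1)), the fragment below assumes the active tile addresses and their distances from the query point have already been computed by the mappings E and H; a Python dictionary stands in for the hashed memory table, and the addresses shown are hypothetical.

```python
from collections import defaultdict

def cmac_output(active, weights, kernel):
    """Kernel-weighted output of Eq. (1).

    active  : list of (address, distance) pairs, one per tiling layer, giving
              the memory address of the activated tile and the distance of the
              query point from that tile's centre.
    weights : mapping from address to stored weight (the memory table).
    kernel  : kernel function k applied to the distance.
    """
    num = sum(kernel(d) * weights[a] for a, d in active)
    den = sum(kernel(d) for _, d in active)
    return num / den if den else 0.0

# Example with a step kernel and two tiling layers (hypothetical addresses).
weights = defaultdict(float, {(0, (0, 1)): 0.4, (1, (1, 1)): 0.8})
active = [((0, (0, 1)), 0.2), ((1, (1, 1)), 0.5)]
print(cmac_output(active, weights, kernel=lambda d: 1.0))   # 0.6
```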


Fig. 2. How kernel functions may be embedded into a 2-dimensional tiling grid. (a) Step kernel function. (b) Linear kernel function.

Each weight y is indexed by address a, and the kernel function k is applied to some distance measure of the query point from the centre of the tile. The number of tiling layers is q. Some common kernel functions are illustrated in Fig. 2.
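The step and linear kernels of Fig. 2 can be written directly as functions of the distance from the tile centre; how the distance is normalized against the tile size is an assumption here.

```python
def step_kernel(dist, radius=1.0):
    """Step kernel: full weight inside the tile, zero outside (Fig. 2(a))."""
    return 1.0 if dist <= radius else 0.0

def linear_kernel(dist, radius=1.0):
    """Linear kernel: weight falls off linearly with distance from the tile
    centre, reaching zero at the assumed tile radius (Fig. 2(b))."""
    return max(0.0, 1.0 - dist / radius)

# The experiments in Sec. 3.3 use a Euclidean distance with a linear kernel.
print(step_kernel(0.4), linear_kernel(0.4))   # 1.0 0.6
```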

2.1. Output mapping

The CMAC may be used as a classifier by adopting a suitable mapping between the real-valued output variable z and the nominal class label c. One possible mapping8 interprets positive values of z as one class, and negative values of z as another. This is sufficient for two-class problems, and is the most often cited in the literature.12,13,22,23 For problems with more than two classes, one could define threshold values that divide the scalar range of z into the number of classes to be represented:

$$c = v : t_v^{\mathrm{low}} < z < t_v^{\mathrm{high}} \qquad (2)$$

where the thresholds satisfy $t_v^{\mathrm{low}} > t_{v-1}^{\mathrm{high}}$. Equation (2) represents a scalar mapping. Using this mapping, the CMAC can be used as a classifier if, during the training phase, the weights are adjusted to make the output z approach a suitable target value. For example, the target for a given class could be a value equidistant from the thresholds corresponding to that class.
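A small illustration of the scalar mapping of Eq. (2), with hypothetical, equally spaced thresholds dividing the range of z into classes:

```python
import bisect

def scalar_class(z, thresholds):
    """Map a scalar CMAC output z to a class index via thresholds (Eq. (2)).

    thresholds holds the boundaries between adjacent classes in ascending
    order; the values below are chosen only for illustration.
    """
    return bisect.bisect_left(thresholds, z)

# Three classes separated at z = 1.0 and z = 2.0; training targets could be
# the mid-points 0.5, 1.5 and 2.5, equidistant from the surrounding thresholds.
print(scalar_class(0.3, [1.0, 2.0]), scalar_class(1.7, [1.0, 2.0]))   # 0 1
```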

2.2. Albus training algorithm

The Albus CMAC is trained by evaluating the error as the difference between the desired output z_d and the actual output z, and updating the active weights at each time step t:

$$w_i^{(t+1)} = w_i^{(t)} + \beta\,(z_d - z)\,\frac{k(d_i)}{\sum_{i=1}^{q} k(d_i)} \qquad (3)$$
Error minimization algorithms of the form of Eq. (3) have been proven to converge.24 The gain term β is introduced to control the convergence rate.
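A minimal sketch of one Albus-style update per training sample, following Eq. (3); the gain β, the target value and the active addresses are placeholders supplied by the caller.

```python
def albus_update(weights, active, target, beta=1.0):
    """One Albus error-correction update of the active weights (Eq. (3)).

    active : list of (address, kernel_value) pairs for the tiles activated by
             the current training input; target is the desired scalar output.
    """
    k_sum = sum(k for _, k in active)
    # Current output under the normalized kernel summation of Eq. (1).
    z = sum(k * weights.get(a, 0.0) for a, k in active) / k_sum
    error = target - z
    for a, k in active:
        weights[a] = weights.get(a, 0.0) + beta * error * k / k_sum
    return weights

# Example: two active tiles are pulled towards a target output of 1.0.
w = {}
albus_update(w, active=[("t0", 1.0), ("t1", 0.5)], target=1.0)
print(w)
```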


2.3. Kernel addition training algorithm

The scalar mapping above is not ideal, as it represents a nominal variable using a continuous scale, and there is no information about the degree of membership of a class. Consider an alternative output mapping, using one CMAC for each class:

$$c = v : z_v = \max(z_1, z_2, \ldots, z_m) \qquad (4)$$

where m is the number of classes. Equation (4) represents a vector mapping. This may be used to assess the decision of the classifier to assign any particular input to a class. For example, it is possible to discover whether two classes have high activation, or whether one class is the clear winner. A desirable property is that the output activations are proportional to the probability of the class given the input, so that z_v represents a relative probability of selecting class c. Then, it is possible to take account of a priori probability using Bayes' law:

$$P(c_i \mid x) = \frac{P(c_i)\,P(x \mid c_i)}{\sum_i P(c_i)\,P(x \mid c_i)} \qquad (5)$$

where P represents probability.3 The frequency of samples occurring in each class may be used to estimate P(c_i). The goal of training is then to provide an output z_i that can be used to estimate P(x|c_i). There is no need to calculate the denominator, as assignment to the highest-probability class requires only comparison.

The new training algorithm, the KATA,25 uses a vector class mapping. As each training vector is presented, a kernel function value for each activated tile is added to the value of the corresponding memory cell. Assuming n training points distributed uniformly over a tile, the expected value of the corresponding cell after training will be n·k_e, where k_e is the expected value of the kernel function. If the kernel function is the step function, the value of each memory cell after training is a count of the number of times the corresponding tile was accessed during training. If the kernel function is not the step function, then training amounts to estimation of a histogram, using as a weight some function of the distance of the input from the centre of the histogram bin. From the well-known properties of histograms, one concludes that:

- The value of any tile after training is proportional to the probability of inputs activating that tile.
- A histogram improves its estimate of the underlying distribution as the number of training samples increases, so the algorithm will converge.
- It is only necessary to present the training data once; there is no value in repeated presentation of the same training data.

After training, the output z for each class will be proportional to the numerator of Eq. (5). This can be seen by considering a classification problem where the number of samples in each class is the same. In this case, the CMAC output for each class after training is proportional to the probability distribution of inputs in that class, P(x|c_i). If the number of samples in one class is now doubled, the CMAC output for this class is also doubled.


Assume that the number of samples in a class is a good estimator of P(c_i). Then the CMAC output after training is proportional to P(c_i)·P(x|c_i). After training, each CMAC forms a piecewise model of the probability density function for the corresponding class. There is no need to normalize the output as in Eq. (1), so the output is given by:

$$z = \sum_{i=1}^{q} k(\mathrm{dist}_i)\, w[\mathrm{addr}_i] \qquad (6)$$

The KATA CMAC is trained using the value of the kernel:

$$w_i^{(t+1)} = w_i^{(t)} + k(\mathrm{dist}_i) \qquad (7)$$

In contrast to the Albus training algorithm, the KATA is not an iterative algorithm. The weights are updated during a single presentation of the training data at the inputs. From this, it follows that the KATA is not sensitive to the order in which input samples are presented. Also, the KATA is robust to outliers, as outliers occur with low frequency, and so will have minimal effect on the CMAC output.
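Putting the pieces together, the sketch below keeps one CMAC per class, trains by kernel addition in a single pass (Eq. (7)), and classifies by the vector mapping of Eq. (4) using the unnormalized output of Eq. (6). The tiling scheme, kernel, and hash-table substitute are illustrative assumptions rather than the exact implementation used in the experiments.

```python
import numpy as np
from collections import defaultdict

class KataCMAC:
    """One CMAC per class, trained with the Kernel Addition Training Algorithm."""

    def __init__(self, n_classes, q=4, tile_width=0.25):
        self.n_classes, self.q, self.tile_width = n_classes, q, tile_width
        # One memory table per class; a dict stands in for the hashed table.
        self.tables = [defaultdict(float) for _ in range(n_classes)]

    def _active(self, u):
        """Mappings E and H: one (address, distance) pair per tiling layer."""
        u = np.asarray(u, dtype=float)
        out = []
        for layer in range(self.q):
            offset = layer * self.tile_width / self.q
            coords = np.floor((u + offset) / self.tile_width)
            centre = (coords + 0.5) * self.tile_width - offset
            dist = float(np.linalg.norm(u - centre))
            out.append(((layer, tuple(coords.astype(int))), dist))
        return out

    def _kernel(self, dist):
        """Linear kernel, as used in the experiments (Sec. 3.3)."""
        return max(0.0, 1.0 - dist / self.tile_width)

    def train(self, X, y):
        """Single pass: add the kernel value to each active cell (Eq. (7))."""
        for u, c in zip(X, y):
            for addr, dist in self._active(u):
                self.tables[c][addr] += self._kernel(dist)

    def predict(self, u):
        """Unnormalized output per class (Eq. (6)), then argmax (Eq. (4))."""
        z = [sum(self._kernel(d) * table[a] for a, d in self._active(u))
             for table in self.tables]
        return int(np.argmax(z)), z

# Tiny usage example on synthetic 2-D data (illustration only).
rng = np.random.default_rng(0)
X = rng.random((1000, 2))
y = (X.sum(axis=1) > 1.0).astype(int)      # two classes split by a diagonal
clf = KataCMAC(n_classes=2)
clf.train(X, y)
print(clf.predict([0.9, 0.8]))             # expected: class 1 with the larger z
```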

3. Experiments and Results

Comparing Eq. (3) with Eq. (7), it can be seen that the KATA can be completed in less time than one iteration of the Albus algorithm. The speed advantage will not be as great as suggested simply by comparing these equations, due to different software overheads. However, one would expect the KATA to be faster than the Albus training algorithm. This conclusion was tested using computer models of the two algorithms for comparison purposes. The experiments were designed to demonstrate the linear relationship between the number of training samples and training time.

3.1. Artificial test problem

The two CMAC learning algorithms were tested using the parity problem. This problem was chosen because of its low spatial frequency, ensuring that there will be enough samples to discriminate classes in tests with a high number of dimensions or a high number of classes. In this problem, the input space is partitioned into m regions in each dimension, where m is the number of classes and d is the number of dimensions. Given an input vector x = {x_1, ..., x_d}, 0 < x_i < r, the class label is given by:

$$o = \left( \sum_{i=1}^{d} \left\lfloor \frac{m x_i}{r} \right\rfloor \right) \bmod m \qquad (8)$$

If there are just two input variables the problem is known as Exclusive-OR (or XOR). The parity problem for three inputs and two classes is shown in Fig. 3.
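A parity data set of this kind can be generated directly from Eq. (8); the sample count below is illustrative (the experiments use one million samples per data set).

```python
import numpy as np

def make_parity(n_samples, d=3, m=2, r=1.0, seed=0):
    """Generate a parity data set: uniform inputs labelled according to Eq. (8)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, r, size=(n_samples, d))
    labels = np.floor(m * X / r).astype(int).sum(axis=1) % m
    return X, labels

# The 3-dimensional, 2-class case of Fig. 3; the classes are roughly balanced.
X, y = make_parity(10_000, d=3, m=2)
print(np.bincount(y))
```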


Fig. 3. The parity problem for a three-dimensional input space. White represents class 0, black represents class 1.

Data sets were generated using randomly generated x values, assigning a class label to each record according to Eq. (8). Seven data sets were generated, containing from 2 to 5 dimensions and from 2 to 5 classes. Each database consisted of one million samples, with input variables drawn from a uniform distribution. The classifiers were tested using different numbers of samples.

3.2. Natural test problem

The two CMAC algorithms were tested using a natural data set, derived from the 1998 DARPA Intrusion Detection Evaluation Program.26 The dataset was originally collected to establish the efficacy of intrusion detection, and includes a variety of simulated intrusions of a military computer network. A version of this was used for the KDD Cup 99 contest. There are 24 classes representing different types of attack, and 41 measurements, or features, used as inputs to the classifier. Some of these are discrete and some are numeric. The datasets used in these tests contained 494,020 records. The dataset was adapted for testing the time complexity of the CMAC algorithms as follows. Features with a small number of integer values were removed, as the CMAC uses continuous inputs only. Also, features that are zero most of the time were removed. Thus, 12 features were left. The algorithms were tested on different numbers of records by extracting subsets of records at random.

3.3. Test methodology

Both versions of the CMAC used the same parameters. Input space was uniformly quantized in all dimensions. Tile spacing was based on the work of Parks and Militzer.27 A hashing function with chaining was used to achieve zero collisions. The distance measure used for kernels was Euclidean, and the kernel function used was linear. Both versions were implemented in C++, using similar data structures and components in order to make the resulting code as similar as possible, and thereby enable meaningful comparisons of running time.


The Albus CMAC used a scalar output mapping, while the KATA CMAC used a vector output mapping. The gain term for the Albus training algorithm, β, was set to 1.0 at the start of training and reduced during training, as this guarantees quick convergence.28,29 This was implemented by setting β to the value of the normalized training error. The number of epochs used must be sufficient to allow convergence, but not so many as to cause overfitting. After each epoch, the accuracy was compared to that from the previous epoch; if the accuracy had increased by less than 0.1%, training was terminated.

The performance of the two algorithms was tested using three-fold cross validation, so that accuracy was always tested only on unseen data. The data sets used were each divided into three parts at random. Training was performed using two parts of the data, and the trained model was tested on the remaining part. This was done three times, using a different part for testing each time. In this manner, the model was tested on all data, and the number of correctly classified samples was divided by the size of the data set to obtain percentage accuracy. The choice of the fraction one third is a compromise between using all data to train, which may result in overfitting the model, and using less data to be computationally efficient.30 For each test, the time taken to train and the resulting accuracy were measured.

3.4. Performance comparison

In order to put these results in context, some other classifier algorithms were compared to the CMAC. For this purpose, the Weka toolbox was used.31 As this toolbox consists of programs written in Java using a common framework, it is possible to make direct comparisons of running time. The KATA CMAC algorithm was therefore also coded in Java using the same libraries, in order to provide the most realistic comparison. Initially, 12 algorithms were considered from the wide range provided by Weka. Some of these were discarded during tests because their long running time did not provide a fair comparison with the CMAC. Of the available algorithms, the three fastest were selected: functions.RBFNetwork (placement of Gaussian kernels using clustering), functions.SMO (the Sequential Minimal Optimization version of Support Vector Classifiers), and trees.J48 (an implementation of the C4.5 decision tree algorithm). These three, as well as the KATA CMAC, were tested on the same data sets from the Parity problem described earlier, using 10-fold cross validation.

3.5. Results

Figures 4 and 5 show the results for the Parity problem, where the number of dimensions and classes used is indicated in the caption. These results suggest a linear relationship between the number of samples and training time for both versions of the CMAC algorithm.
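For reference, the three-fold protocol and timing measurements of Section 3.3 could be organized roughly as below; the classifier interface and timing granularity are assumptions, and the commented usage reuses the illustrative KataCMAC and make_parity sketches from earlier sections.

```python
import time
import numpy as np

def three_fold_evaluation(make_classifier, X, y, seed=0):
    """Split the data into three random parts, train on two and test on the
    third, rotating the test part; report total training time and accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), 3)
    correct, train_time = 0, 0.0
    for i in range(3):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(3) if j != i])
        clf = make_classifier()
        t0 = time.perf_counter()
        clf.train(X[train], y[train])
        train_time += time.perf_counter() - t0
        correct += sum(clf.predict(u)[0] == c for u, c in zip(X[test], y[test]))
    return train_time, 100.0 * correct / len(X)

# Usage with the earlier sketches (hypothetical, for illustration):
# X, y = make_parity(30_000, d=2, m=2)
# print(three_fold_evaluation(lambda: KataCMAC(n_classes=2), X, y))
```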


Fig. 4. Results for the Parity problem, showing training time in seconds against samples in thousands. (a) 2 dimensions and 2 classes, (b) 3 dimensions and 2 classes, (c) 4 dimensions and 2 classes, (d) 5 dimensions and 2 classes, (e) 2 dimensions and 3 classes, (f) 2 dimensions and 4 classes, (g) 2 dimensions and 5 classes, (h) legend.


Fig. 5. Results for the Parity problem, showing accuracy in percent against samples in thousands. (a) 2 dimensions and 2 classes, (b) 3 dimensions and 2 classes, (c) 4 dimensions and 2 classes, (d) 5 dimensions and 2 classes, (e) 2 dimensions and 3 classes, (f) 2 dimensions and 4 classes, (g) 2 dimensions and 5 classes, (h) legend.


Fig. 6. Comparisons for the Parity problem using 2 dimensions and 5 classes. (a) showing training time in seconds against number of samples (in thousands), (b) showing accuracy against number of samples (in thousands).

Figure 6 shows the results for the Intrusion Detection problem. These support the belief that CMAC classifiers have linear time complexity for training. In all the tests, the KATA was about 2.5 to 3 times as fast as the Albus algorithm. Note that in all these tests, the Albus training algorithm used a variable number of training epochs, which explains in part the occasional outliers; the relative speed advantage of the KATA therefore depends on the number of iterations of the Albus algorithm.

The accuracy obtained by training with the KATA is consistently superior to that obtained using the Albus technique. There are two possible explanations for this. First, when the problem becomes more difficult, using more classes or dimensions, the performance of the classifier is bound to deteriorate, because the number of samples available for each homogeneous block of the input space decreases. Since the Albus technique uses error minimization, it is an inherently biased model, whereas the KATA uses an unbiased model of input space. Therefore, the accuracy of the classifier trained using the KATA degrades more slowly. Second, the Albus method suffers from the difficulty of correctly setting the gain parameter β, which is not necessary for the KATA.

Figure 7 shows results for the comparison between four classifier methods. Here, all the methods chosen show evidence of linear time complexity. The algorithm taking the longest time to build the model was J48, taking up to 10,000 seconds to train on datasets near one million samples. The next slowest was SMO, taking up to 4,000 seconds to train. CMAC was the algorithm with the fastest training time, and RBF was very close. The slowest classifier for testing was RBF, taking 70 seconds to classify. The next slowest was CMAC, taking up to 20 seconds to classify. The other two methods, SMO and J48, were much quicker, classifying unknown cases in less than 10 seconds. Surprisingly, two of the methods, SMO and J48, have very poor accuracy on this problem [Fig. 7(c)]. This is probably due to the number of vectors that would be required to define the class boundaries in SMO, and the close superposition of the boundaries in J48.



Fig. 7. Comparisons for the Parity problem using 2 dimensions and 5 classes. (a) training time in seconds against number of samples (in thousands), (b) testing time in seconds against number of samples (in thousands), (c) accuracy in percent correct against number of samples (in thousands), (d) legend.

The accuracy of the RBF classifier is similar to that of the CMAC. This is expected, since the similarity of the CMAC to RBF networks is well known. It is clear that the CMAC compares well with the other methods examined. It should be noted that only the fastest classifier methods were examined, so it is possible that the CMAC would outperform other classifier methods on both speed and accuracy.

4. Conclusions

There are three main results from this work. First, the training of CMAC-based classifiers has linear time complexity. This is a highly desirable property of machine learning techniques, as it makes the processing of large databases more computationally feasible. Second, a new training algorithm allows the CMAC to be trained in a single pass of the data, in contrast to error minimization algorithms. This avoids the need for training data to be held in memory during training, contributing to computational efficiency. Third, the output encoding of the KATA offers accountability for class assignment decisions and allows a priori probability to be taken into account.


This has potential in applications that require estimation of the risk of incorrect classification. These three main results are supported by the empirical evidence presented here. Other results may be inferred from the nature of the algorithm, namely that the KATA is not sensitive to the order of the training data, and is robust to outliers. Comparative trials suggest that different classifiers have advantages in different areas, but the CMAC with the KATA has the characteristic of fast training. This new training algorithm has great potential for application in data mining and automated knowledge discovery.

Acknowledgments

The author wishes to thank the New South Wales Centre for Parallel Computing (NSWCPC) for the use of their SGI Power Challenge machine, upon which the calculations for this paper were performed. Part of this work was supported by a Faculty Seed Grant from Charles Sturt University, and part was supported by a Rector's Start-up Grant from the University of New South Wales.

References
1. T. G. Dietterich and G. Bakiri, Solving multiclass learning problems via error-correcting output codes, J. Artif. Intell. Res. 2 (1995) 263-286.
2. J. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence (MIT Press, 1992).
3. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis (John Wiley and Sons, New York, 1973).
4. J. Han and M. Kamber, Data Mining: Concepts and Techniques (Morgan Kaufmann, 2001).
5. A. Roy, S. Govil and R. Miranda, A neural-network learning theory and a polynomial time RBF algorithm, IEEE Trans. Neural Networks 8(6) (1997) 1301-1313.
6. J. He and X. Yao, Drift analysis and average time complexity of evolutionary algorithms, Artif. Intell. 127(1) (2001) 57-85.
7. J. S. Albus, A new approach to manipulator control: The Cerebellar Model Articulation Controller (CMAC), J. Dynam. Syst. Measurement Contr. 97 (1975) 220-233.
8. J. S. Albus, Mechanisms of planning and problem solving in the brain, Math. Biosci. 45 (1979) 247-293.
9. J. C. Santamaria, R. S. Sutton and A. Ram, Experiments with reinforcement learning in problems with continuous state and action spaces, Technical Report UM-CS-1996088, Department of Computer Science, University of Massachusetts, Amherst, MA (1996).
10. M. Wiering, R. Salustowicz and J. Schmidhuber, Reinforcement learning soccer teams with incomplete world models, Autonomous Robots 7 (1999) 77-88.
11. D. Cornforth and D. Elliman, Modelling probability density functions for classifying using a CMAC, in Techniques and Applications of Neural Networks, eds. M. Taylor and P. Lisboa (Ellis Horwood, 1993).


12. Z. J. Geng and W. Shen, Fingerprint classification using fuzzy cerebellar model arithmetic computer neural networks, J. Electron. Imag. 6(3) (1997) 311-318.
13. H. Fashandi and M. Moin, Face detection using CMAC neural network, in Proc. 7th Int. Conf. Artif. Intell. Soft Comput. (ICAISC), eds. L. Rutkowski, J. Siekmann, R. Tadeusiewicz and L. Zadeh, Lecture Notes in Computer Science, Vol. 3070 (Springer, 2004), pp. 724-729.
14. W. Xu, S. Xia and H. Xie, Application of CMAC-based networks on medical image classification, in Proc. Int. Symp. Neural Networks, eds. F. Yin, J. Wang and C. Guo, Lecture Notes in Computer Science, Vol. 3173 (Springer, 2004), pp. 953-958.
15. D. Cornforth, Classifiers for machine intelligence, PhD thesis, Nottingham University, UK (1994).
16. T. H. Cormen, C. E. Leiserson and R. L. Rivest, Introduction to Algorithms (McGraw-Hill, 1986).
17. H. Lee, C. Chen and Y. Lu, A self-organizing HCMAC neural-network classifier, IEEE Trans. Neural Networks 14(1) (2003) 15-27.
18. C. Chen, C. Hong and Y. Lu, A pruning structure of self-organising HCMAC neural network classifier, in Proc. 2004 IEEE Int. Joint Conf. Neural Networks 2 (2004) 861-866.
19. P. C. E. An, W. T. Miller and P. C. Parks, Design improvements in associative memories for cerebellar model articulation controllers, Proc. ICANN (1991), pp. 1207-1210.
20. S. H. Lane, D. A. Handelman and J. J. Gelfand, Theory and development of higher-order CMAC neural networks, IEEE Contr. Syst. (1992) 23-30.
21. F. J. Gonzalez-Serrano, A. R. Figueiras-Vidal and A. Artes-Rodriguez, Generalizing CMAC architecture and training, IEEE Trans. Neural Networks 9(6) (1998) 1509-1514.
22. H. Xu, C. Kwan, L. Haynes and J. Pryor, Real-time adaptive on-line traffic incident detection, Proc. IEEE Int. Symp. Intell. Contr. (1996), pp. 200-205.
23. J. Geng and T. Lee, Freeway traffic incident detection using fuzzy CMAC neural networks, Proc. IEEE World Congress Comput. Intell. 2 (1998) 1164-1169.
24. Y. Wong, CMAC learning is governed by a single parameter, in Proc. IEEE Int. Conf. Neural Networks, San Francisco (1993), pp. 1439-1443.
25. D. Cornforth and D. Newth, The kernel addition training algorithm: Faster training for CMAC based neural networks, in Proc. Conf. Artif. Neural Networks Expert Syst. (University of Otago, 2001), pp. 34-39.
26. S. Hettich and S. D. Bay, The UCI KDD Archive, University of California, Department of Information and Computer Science, Irvine, CA (1999), http://kdd.ics.uci.edu.
27. P. C. Parks and J. Militzer, Improved allocation of weights for associative memory storage in learning control systems, in Proc. IFAC Design Meth. Contr. Syst., Zurich, Switzerland (1991), pp. 507-512.
28. C. Lin and C. Chiang, Learning convergence of CMAC technique, IEEE Trans. Neural Networks 8(6) (1997) 1282-1292.
29. S. Yao and B. Zhang, The learning convergence of CMAC in cyclic learning, Proc. Int. Joint Conf. Neural Networks 3 (1993) 2583-2586.
30. S. Weiss and C. A. Kulikowski (eds.), Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems (Morgan Kaufmann, San Mateo, CA, 1991).
31. I. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (Morgan Kaufmann, 1999).
