
Estimate the Call Duration Distribution Parameters in GSM System Based on K-L Divergence Method

JUNQIANG GUO, FASHENG LIU, ZHIQIANG ZHU
College of Information and Electrical Engineering, Shandong University of Science and Technology, Qingdao, 266510, China
Guojunqiang2008@163.com

Abstract
This article analyses the call duration probability density function and its parameters in a GSM system. The sample data are obtained from an operational system, and the observed call durations range up to 1800 seconds. The K-L divergence method is introduced to compare the goodness of fit of candidate call duration probability density functions and to estimate the parameters of three candidate probability density functions. Using these parameters, the performance of the GSM system during a daytime hour is predicted. The comparison shows that the lognormal distribution of call duration is more precise than the exponential or Erlang distribution for performance prediction in the GSM system.

Keywords: entropy, K-L Divergence, GSM, call duration, lognormal distribution

1. Introduction
Call duration is a very important parameter in a mobile telephony system. The call duration probability density function and its parameters are often required in system design, simulation and analysis. In system management and network optimization, and especially in charging policy, they are key inputs for ensuring the correctness of prediction and analysis. In a mobile telephony system, call duration plays a very important role in determining cell size, handover properties and the channel allocation strategy. The charging strategy depends on call duration, too.

Estimating the call duration probability density function and its parameters is often not a trivial problem. In most situations, we expect a simple, analytic and relatively precise function. Call duration in mobile telephony systems is traditionally assumed to be exponential, just as in fixed telephony networks. However, many empirical studies have examined the probability distribution of call duration in both mobile and fixed telephony systems, and they find that the lognormal distribution fits the empirical distribution better than the exponential or Erlang distribution does [1][2][4][5]. In [3], it is reported that the length of sentences fits the log-normal distribution, which may be the reason that call duration in GSM telephony systems fits the lognormal distribution better.

In [2], the authors tested the following distributions: lognormal, shifted exponential, Erlang-k and Erlang-jk. They found that the lognormal distribution fitted the empirical distribution better. In their studies [1][2], the authors mainly used statistical goodness-of-fit tests, and the data samples they used are shorter than 6 minutes. In [4], the authors argue that it is critical to select a sample of calls that is not truncated, i.e., to look at all calls that started in a given time period, not only those that both started and ended in a given time interval.

The Kullback-Leibler (K-L) divergence was introduced by Kullback. Minimizing it is known as the principle of minimum cross entropy (MinxEnt), which aims to find a probability distribution that is as close as possible to another distribution [6]. The MinxEnt method has been applied to a wide range of problems in signal processing, pattern recognition, economics, statistics and communication systems [7-9]. Estimating the call duration distribution parameters in a GSM system with the K-L divergence method is a meaningful engineering application of this methodology, and it shows the strengths of the method: it is simple and precise.

This paper presents a field study of the call duration distribution in a GSM mobile telephony system. The K-L divergence is used as a measure to evaluate three candidate distributions: exponential, Erlang and lognormal. The conclusions are that the lognormal distribution has the smallest K-L divergence from the empirical probability distribution, and that the minimum K-L divergence method is simple and precise in estimating the call duration distribution parameters in a GSM system.

2. Data obtaining
The data used in this paper are from a mobile switching centre in the GSM system of Qingdao, China. The call start time and call duration are extracted from the signaling commands. Over one whole hour, all calls to and from the mobile switching centre are recorded, and all calls starting within this hour are used to construct a data sample. The traffic distributions of the mobile switching centre are

1-4244-1312-5/07/$25.00 2007 IEEE


shown in Figure 1 and Figure 2. Figure 1 shows the traffic distribution over one day; Figure 2 shows the weekday traffic distribution averaged over two months. Two data samples are constructed. Data sample 1 was recorded from 13:17:00 to 14:16:59 on Wednesday, Jan. 24, 2007. Data sample 2 was recorded from 10:22:00 to 11:21:59 on Thursday, Jan. 25, 2007.

Some values are eliminated from the original samples. Values under 3 seconds are considered to be caused by noise or interference rather than normal calls. Values over 1800 seconds are all recorded as 1800 seconds, so they are also eliminated from our analysis. Tables 1 and 2 show the statistics of the two data samples. The tables show that the eliminated data have a negligible effect on our analysis. The call duration samples used in this article therefore range from 3 s to 1799 s. Data sample 1 has a mean of 84.44 seconds and a standard deviation of 123.32 seconds; data sample 2 has a mean of 81.95 seconds and a standard deviation of 115.86 seconds. The mean call duration is much shorter than the value in [10], which studied a fixed telephony system and reported a mean call duration of 200-300 seconds. This may be one of the main differences between mobile and fixed telephone systems.

Table 1. Call duration statistical values of data set 1
| | Total | 0-2 s | 3-1799 s | 1800 s |
|---|---|---|---|---|
| Number of calls | 87415 | 1093 | 86237 | 85 |
| Percentage of total calls (%) | | 1.25 | 98.65 | 0.097 |
| Traffic (Erlang) | 2065.46 | 0.305 | 2022.65 | 42.5 |
| Percentage of total traffic (%) | | 0.015 | 97.93 | 2.06 |

Table 2. Call duration statistical values of data set 2

| | Total | 0-2 s | 3-1799 s | 1800 s |
|---|---|---|---|---|
| Number of calls | 104326 | 1256 | 103020 | 50 |
| Percentage of total calls (%) | | 1.20 | 98.75 | 0.048 |
| Traffic (Erlang) | 2370.37 | 0.32 | 2345.05 | 25 |
| Percentage of total traffic (%) | | 0.0135 | 98.932 | 1.055 |

3. Calculating Based on K-L Divergence Method

According to E. T. Jaynes [11], for a continuous variable x, the entropy of a probability distribution function p(x) is defined as:

$$H(p) = -k \int p(x) \ln p(x)\,dx \tag{1}$$

For distributions over a continuous variable x, the K-L divergence (Kullback-Leibler divergence) between two probability distribution functions p(x) and q(x) is defined by Kullback and Leibler [12] as:

$$K(p\|q) = \int p(x) \ln \frac{p(x)}{q(x)}\,dx \tag{2}$$

The K-L divergence is a natural measure of the distance from a probability distribution p(x) to a probability distribution q(x). It is always nonnegative, and it is zero only if p(x) = q(x) [13]. The larger the difference between p and q, the higher the value of K; the more similar p and q are, the lower K is; in the limiting case p = q, K = 0. In our problem, the goal is to find an analytical function q(x) with suitable parameters. This can be written as an optimization problem:

$$\min_{q} K(p\|q) = \int p(x) \ln \frac{p(x)}{q(x)}\,dx \quad \text{s.t.} \quad \int q(x)\,dx = 1,\ \int p(x)\,dx = 1,\ q(x) \ge 0,\ p(x) \ge 0 \tag{3}$$

To solve problem (3) by computer, the continuous functions p(x) and q(x) must be transformed into discrete forms, and the continuous integral must be approximated by a discrete summation, as in equation (4). The numerical approximation is based on sample values of f(x). The range of x, [a, b], is divided into n equal intervals. Let $(x_1, x_2, \ldots, x_n)$ be representative points of these intervals, and let $\{f(x_1), f(x_2), \ldots, f(x_n)\}$ be the corresponding function values. The continuous integral of f(x) is approximated by

$$\int_a^b f(x)\,dx \approx \frac{b-a}{n} \sum_{i=1}^{n} f(x_i) = \sum_{i=1}^{n} p_i \tag{4}$$

where we set

$$p_i = \frac{b-a}{n} f(x_i) \tag{5}$$
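The discretization in equations (4) and (5) and the divergence in equation (2) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the representative point of each interval is assumed to be its midpoint, and the two exponential densities in the example are stand-ins rather than data from the paper.

```python
import math

def discretize(f, a, b, n):
    """Bin probabilities p_i = (b - a) / n * f(x_i), as in equation (5).

    x_i is taken as the midpoint of the i-th interval (an assumption).
    """
    width = (b - a) / n
    return [width * f(a + (i + 0.5) * width) for i in range(n)]

def kl_divergence(p, q):
    """Discrete K-L divergence: sum over i of p_i * ln(p_i / q_i).

    Bins with p_i == 0 contribute nothing; a bin with p_i > 0 and
    q_i == 0 makes the divergence infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

# Stand-in example: exponential densities with rates 1 and 0.5 on [0, 20].
p = discretize(lambda x: math.exp(-x), 0.0, 20.0, 2000)
q = discretize(lambda x: 0.5 * math.exp(-0.5 * x), 0.0, 20.0, 2000)

print(kl_divergence(p, p))  # identical distributions
print(kl_divergence(p, q))  # analytic value for this pair is ln 2 - 1/2
```

For this pair of exponentials the continuous K-L divergence is ln 2 - 1/2, so the second value gives a quick check that the discretization is fine enough.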

The probability distribution p(x) is calculated from the frequency distribution of the data sample. Let the sample data be $\{x_1, x_2, \ldots, x_N\}$, and let $x_{min}$ and $x_{max}$ be the minimum and maximum values of x. The interval $[x_{min}, x_{max}]$ is partitioned into n intervals of equal length. Based on this partitioning, the frequency distribution of the sample data is obtained as $\{f_1, f_2, \ldots, f_n\}$, where $f_i$ is the number of samples falling in the i-th interval. The probability distribution p(x) is then given as $\{f_1/N, f_2/N, \ldots, f_n/N\}$.
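A minimal sketch of this step, assuming call durations are recorded in whole seconds so that each integer value is its own interval. The function name and the short list of durations are hypothetical and serve only as an illustration.

```python
from collections import Counter

def empirical_pmf(durations, lo=3, hi=1799):
    """Relative frequency f_i / N of each integer duration in [lo, hi] seconds.

    Durations outside [lo, hi] are discarded before counting, so N is the
    number of retained calls.
    """
    kept = [d for d in durations if lo <= d <= hi]
    if not kept:
        raise ValueError("no durations inside the retained range")
    counts = Counter(kept)
    n_kept = len(kept)
    return {x: counts.get(x, 0) / n_kept for x in range(lo, hi + 1)}

# Hypothetical durations in seconds; 2 and 1800 fall outside the retained range.
durations = [2, 5, 12, 12, 40, 95, 300, 1800]
pmf = empirical_pmf(durations)

print(len(pmf), pmf[12], sum(pmf.values()))
```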

In our problem, the call duration is the variable x, with $x \in [3, 1799]$ seconds, and the partitioning interval is 1 second. The sample points of q(x) are the integers in [3, 1799]. Three candidate functions q(x) are used: lognormal, equation (6); exponential, equation (7); and Erlang-2, equation (8).
$$q(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\left(-\frac{(\ln x - \ln m)^2}{2\sigma^2}\right) \tag{6}$$


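$$q(x) = \frac{1}{\lambda} \exp\left(-\frac{x}{\lambda}\right) \tag{7}$$

$$q(x) = \frac{x}{\lambda^{2}} \exp\left(-\frac{x}{\lambda}\right) \tag{8}$$

Here $\lambda$ denotes the scale parameter of the exponential and Erlang-2 densities, in seconds.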
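As a minimal sketch, the three candidate densities of equations (6) to (8) can be written as plain Python functions. The function names and the parameter name `scale` are illustrative assumptions. The exponential and Erlang-2 forms assume a scale parameterization, which is consistent with the fitted values in Table 3 being close to the sample mean (84.270 s) and to half of it (42.135 s). The helper `total_mass` simply sums each density over the integer sample points of [3, 1799] to show how much probability the retained range covers.

```python
import math

def lognormal_pdf(x, m, sigma):
    """Lognormal density with median m and shape sigma, as in equation (6)."""
    z = (math.log(x) - math.log(m)) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * x)

def exponential_pdf(x, scale):
    """Exponential density with mean `scale` (assumed form of equation (7))."""
    return math.exp(-x / scale) / scale

def erlang2_pdf(x, scale):
    """Erlang density with k = 2 stages of mean `scale` each (assumed form of equation (8))."""
    return x * math.exp(-x / scale) / (scale * scale)

def total_mass(pdf, lo=3, hi=1799):
    """Sum of the density at the integer sample points lo..hi (1-second bins)."""
    return sum(pdf(x) for x in range(lo, hi + 1))

# Parameter values as reported in Table 3.
masses = {
    "lognormal": total_mass(lambda x: lognormal_pdf(x, 49.133, 1.0041)),
    "exponential": total_mass(lambda x: exponential_pdf(x, 84.270)),
    "erlang-2": total_mass(lambda x: erlang2_pdf(x, 42.135)),
}
for name, mass in masses.items():
    print(f"{name}: probability mass on [3, 1799] s = {mass:.4f}")
```

Because the sample points cover only [3, 1799], each discretized q sums to slightly less than 1; whether the original calculation renormalized q over this range is not stated in the paper.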

6. Conclusions
From the K-L divergence calculations in section 4 and the prediction results in section 5, the following conclusions can be drawn. First, the K-L divergence is a simple but effective measure for testing the goodness of fit of a pdf. Second, using the K-L divergence method, the parameters of a candidate probability density function can be estimated precisely. Third, the lognormal pdf is more precise than the exponential or Erlang pdf in predicting the performance of the GSM system.

Problem (3) is solved by searching, with an optimization algorithm, for the parameters of q(x) that minimize $K(p\|q)$. The results are described in section 4.
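The paper does not name the optimization algorithm, so the following is only a sketch of one possible search: a coarse-to-fine grid search over the lognormal parameters (m, sigma). The function names, search ranges and synthetic data are assumptions made for illustration, and q is evaluated at the integer sample points without renormalizing it over [3, 1799].

```python
import math
import random

def lognormal_pdf(x, m, sigma):
    """Lognormal density with median m and shape sigma."""
    z = (math.log(x) - math.log(m)) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * x)

def kl_to_lognormal(pmf, m, sigma):
    """K(p||q) with q_i = lognormal_pdf(x_i) * 1 s; pmf maps x_i to p_i."""
    return sum(p * math.log(p / lognormal_pdf(x, m, sigma))
               for x, p in pmf.items() if p > 0.0)

def fit_lognormal(pmf, m_range=(10.0, 200.0), s_range=(0.3, 2.0),
                  steps=12, rounds=4):
    """Grid search for the (m, sigma) pair with the smallest K-L divergence."""
    best = (math.inf, None, None)
    for _ in range(rounds):
        dm = (m_range[1] - m_range[0]) / steps
        ds = (s_range[1] - s_range[0]) / steps
        for i in range(steps + 1):
            for j in range(steps + 1):
                m = m_range[0] + i * dm
                s = s_range[0] + j * ds
                k = kl_to_lognormal(pmf, m, s)
                if k < best[0]:
                    best = (k, m, s)
        # Shrink the search box to one grid cell around the current best point.
        m_range = (max(best[1] - dm, 1.0), best[1] + dm)
        s_range = (max(best[2] - ds, 0.05), best[2] + ds)
    return best

# Synthetic durations for illustration only (true median 50 s, sigma 1.0).
random.seed(0)
raw = [round(random.lognormvariate(math.log(50.0), 1.0)) for _ in range(10000)]
kept = [d for d in raw if 3 <= d <= 1799]
pmf = {}
for d in kept:
    pmf[d] = pmf.get(d, 0.0) + 1.0 / len(kept)

k_min, m_hat, sigma_hat = fit_lognormal(pmf)
print(f"K = {k_min:.4f}, m = {m_hat:.2f}, sigma = {sigma_hat:.3f}")
```

On the synthetic sample the search should recover values close to the generating parameters. A grid search is used here only because it needs no external library; any general-purpose minimizer could take its place.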

4. Results of calculation and parameters estimation


Table 3 lists the K-L divergence between p(x) and each of the three candidate q(x), together with the estimated parameters of q(x). Figure 3 plots the probability distributions against call duration. Four distributions are compared: the actual frequency distribution of data sample 1 and the three candidate probability density functions (pdfs), namely lognormal, exponential and Erlang-2. Table 3 shows that the K-L divergence between p(x) and the lognormal pdf is the smallest of the three values: it is the closest to 0 and more than an order of magnitude below the other two. Figure 3 likewise shows that the lognormal pdf fits the actual frequency distribution of data sample 1 better than the exponential or Erlang pdf.

Table 3. Results of K-L divergence calculation and estimated parameters

| Density function | K-L divergence | Parameters |
|---|---|---|
| Erlang | 0.277 | scale $\lambda$ = 42.135, k = 2 |
| Exponential | 0.124 | scale $\lambda$ = 84.270 |
| Lognormal | 0.00787 | m = 49.133, $\sigma$ = 1.0041 |

Figure 1. Traffic density distribution on Jan. 25, 2007

From these results, the lognormal pdf with $m = 49.133$ and $\sigma = 1.0041$ can be taken as the call duration pdf of this GSM system.
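As a quick plausibility check, the mean and standard deviation implied by the fitted lognormal parameters can be computed from the standard lognormal moment formulas. The function name is hypothetical, and the formulas apply to the untruncated distribution, so they ignore the restriction of the data to [3, 1799] s.

```python
import math

def lognormal_mean_std(m, sigma):
    """Mean and standard deviation of a lognormal with median m and shape sigma."""
    mean = m * math.exp(0.5 * sigma ** 2)
    std = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, std

mean, std = lognormal_mean_std(49.133, 1.0041)
print(f"mean = {mean:.1f} s, std = {std:.1f} s")
```

With the fitted values this gives a mean of roughly 81 s and a standard deviation of roughly 107 s, compared with the sample values of 84.44 s and 123.32 s for data sample 1 and 81.95 s and 115.86 s for data sample 2. Exact agreement is not expected, since the fit minimizes the K-L divergence rather than matching moments.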

5. Results of Prediction
Using the parameters obtained in section 4, prediction results are shown in Figures 4 to 8. The total number of calls in data set 2 was used as a known parameter for the three candidate pdfs, and each prediction was compared with the actual values of data sample 2. Figures 4 to 7 show that the lognormal pdf with $m = 49.133$ and $\sigma = 1.0041$ predicts the call number and traffic distributions very well, and more precisely than the exponential or Erlang pdf. Figure 8 depicts the relationships between call number, traffic and call duration in this GSM system. A large proportion of the calls have short durations but carry only a small proportion of the traffic. For example, Figure 8 shows that 78% of all calls have a call duration of less than 100 s, and they account for only 40% of the traffic.
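The relationship between call share and traffic share can also be sketched analytically from the fitted lognormal. The code below uses the standard lognormal cumulative distribution and partial-expectation formulas; the function names are hypothetical, and the untruncated distribution is assumed, so the values are only approximations of what Figure 8 shows for the measured data.

```python
import math

def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def share_of_calls_below(t, m, sigma):
    """Fraction of calls shorter than t seconds under a lognormal(m, sigma) model."""
    return std_normal_cdf((math.log(t) - math.log(m)) / sigma)

def share_of_traffic_below(t, m, sigma):
    """Fraction of total call time contributed by calls shorter than t seconds."""
    return std_normal_cdf((math.log(t) - math.log(m)) / sigma - sigma)

calls = share_of_calls_below(100.0, 49.133, 1.0041)
traffic = share_of_traffic_below(100.0, 49.133, 1.0041)
print(f"calls under 100 s: {calls:.0%}, traffic from those calls: {traffic:.0%}")
```

Under these assumptions the model gives about 76% of calls and about 38% of traffic below 100 s, close to the 78% and 40% read from Figure 8.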

Figure 2. Average weekday traffic density distribution from Dec. 1, 2006 to Jan. 31, 2007

Figure 3. Call duration probability distributions comparison


Figure 4. Predictions of call number distribution vs. call duration

Figure 8. The relationships between call number, traffic and call duration

References
[1] G. Boggia, P. Camarda, A. D'Alconzo, A. De Biasi and M. Siviero, "Drop Call Probability in Established Cellular Networks: from Data Analysis to Modelling", Proc. of IEEE VTC 2005 Spring, May 2005.
[2] J. Jordan and F. Barcelo, "Statistical modeling of channel occupancy in trunked PAMR systems", Proc. 15th Int. Teletraffic Conf., V. Ramaswami and P. E. Wirth, Eds., Elsevier Science B.V., 1997, pp. 1169-1178.
[3] E. Limpert, W. A. Stahel, M. Abbt, "Log-normal Distributions across the Sciences: Keys and Clues", BioScience, Vol. 51, No. 5, May 2001, pp. 341-351.
[4] D. E. Duffy, A. A. McIntosh, M. Rosenstein, W. Willinger, "Statistical Analysis of CCSN/SS7 Traffic Data from Working CCS Subnetworks", IEEE Journal on Selected Areas in Comm., Vol. 12, April 1994, pp. 544-551.
[5] V. Bolotin, "Modeling call holding time distributions for CCS network design and performance analysis", IEEE J. Sel. Areas in Commun., Vol. 12, No. 3, Apr. 1994, pp. 433-438.
[6] S. Kullback, Information Theory and Statistics, Wiley, New York, 1959, pp. 37.
[7] I. Csiszár, "Information theoretic methods in probability and statistics", IEEE Information Theory Society Newsletter 48, 1998, pp. 21-30.
[8] S. Aladdin, G. Çigdem, U. Ilhan, M. K. Yeliz, "Determining Probability Distribution by Minimum Cross Entropy", Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization, Lisbon, Portugal, September 22-24, 2006, pp. 644-647.
[9] M. Srikanth, H. K. Kesavan, H. R. Peter, "Probability Density Function Estimation Using the MinMax Measure", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 30, No. 1, February 2000, pp. 1-7.
[10] V. Bolotin, "Telephone Circuit Holding Time Distributions", Proc. of 14th ITC, Elsevier Science B.V., 1994, pp. 125-134.
[11] E. T. Jaynes, "Information Theory and Statistical Mechanics", reprinted from the Physical Review, Vol. 106, May 15, 1957, pp. 620-630.
[12] S. Kullback and R. A. Leibler, "On Information and Sufficiency", Annals of Mathematical Statistics, Vol. 22, No. 1, Mar. 1951, pp. 79-86.
[13] T. Cover and J. Thomas, Elements of Information Theory, Wiley, New York, 1991.

Figure 5. Predictions of cumulated call number vs. call duration

Figure 6. Predictions of traffic distribution vs. call duration

Figure 7. Predictions of cumulated traffic vs. call duration
