ABSTRACT
Due to strong competition and economic hardship, sales forecasting is a challenging problem, as demand fluctuation is influenced by many factors. A good forecasting model improves customer satisfaction, reduces the waste of fresh food, increases sales revenue, and makes production planning more efficient. In this study, the GELM forecasting model integrates Grey Relation Analysis (GRA) and the extreme learning machine (ELM) to support purchasing decisions in the retail industry. GRA sieves out the more influential factors from the raw data and transforms them into the input data of ELM, a novel neural network that avoids slow gradient-based learning and iterative parameter tuning. The proposed system was evaluated on real sales data of fresh food in the retail industry. The experimental results indicate that the GELM model outperforms other time series forecasting models, such as the GARCH, GBPN and GMFLN models, in predicting accuracy and training speed. Moreover, in our experiments the different activation functions of the GELM model showed significant differences in training time and performance.
Keywords: Sales Forecasting, Grey Relation Analysis, Extreme Learning Machine, Retail Industry, Activation Functions
intuition instead of model-based approaches. In this paper, we present a relatively novel neural network methodology, Grey relation analysis integrated with the extreme learning machine (GELM), to construct a forecasting model for the fresh food sector of the retail industry.

Sales in the retail sector exhibit strong seasonal variations. Historically, modeling and forecasting seasonal data has been a major research effort, and many theoretical and heuristic methods have been developed over the last several decades. The available traditional quantitative approaches include heuristic methods such as time series decomposition and exponential smoothing, as well as time series regression and autoregressive integrated moving average (ARIMA) models that have formal statistical foundations [7]. Nevertheless, their forecasting ability is limited by their assumption of linear behavior and is thus not always satisfactory [37]. Recently, artificial neural networks (ANN) have been applied comprehensively in sales forecasting [17,31], pattern recognition [26], aggregate retail [7], and the PCB industry [11]. Most studies indicate that ANNs perform better than conventional methodologies [23,24]. This flexible data-driven modeling property has made the ANN model an attractive tool for many forecasting tasks. However, most ANNs and their variants use gradient-based learning algorithms, such as the back-propagation network (BPN), and face many difficulties with stopping criteria, learning rate, learning epochs, over-tuning, local minima, and long computing time. A new learning algorithm for single-hidden-layer feed-forward neural networks (SLFN), called the extreme learning machine (ELM), has been proposed recently and overcomes the aforementioned disadvantages [18,19,30,32,34].

The rest of this study illustrates the GELM model for improving the accuracy of forecasting fresh foods in the retail industry. Section 2 reviews the related sales forecasting literature, including the traditional statistical models and the ANN model. Section 3 presents the methodology of this study for solving real forecasting problems. Section 4 describes the development of the various forecasting models and discusses the comparison results. The conclusion is provided in Section 5.

2. LITERATURE REVIEW

The available traditional time series forecasting approaches are divided into two groups, i.e., the univariate time series model and the multivariate time series model. One of the major limitations of traditional statistical methods is that they are essentially linear. The sales status of fresh food is often influenced by uncertain factors such as weather, promotions, the competitive market, etc. Therefore, traditional methodologies require some improvements to provide better forecasting suggestions.

Next, we briefly introduce the traditional statistical forecasting models and the ANN model in sales forecasting applications.

2.1 Traditional Statistical Model for Time Series Data Forecasting
In the past several decades, researchers have used many kinds of forecasting methods to study time series events. Univariate time series models include the moving average model, the exponential smoothing model, and the auto-regressive integrated moving average (ARIMA) model. Box and Jenkins [9] developed ARIMA; a basic principle of this model is the assumption of linearity among the variables. However, many time series events do not satisfy the linearity assumption. Clearly, ARIMA models cannot effectively capture and explain non-linear relationships, especially when handling actual sales forecasting problems. When applied to non-linear processes, forecasting errors often increase greatly as the forecasting horizon becomes longer. To improve the forecasting of non-linear time series events, many researchers have developed alternative modeling approaches, including non-linear regression models, the bilinear model, the threshold auto-regressive model, the auto-regressive conditional heteroskedasticity (ARCH) model [16], and the generalized auto-regressive conditional heteroskedasticity (GARCH) model [4].

Although the traditional methods have proved somewhat effective, they still have certain shortcomings. Zhang [36] indicated that although these methods displayed some improvement over linear models in specific cases, they tended to be applied to special events, lacked generality, and were poorly implemented.

2.2 ANN Model in Time Series Data Forecasting
The ANN model is a model-free approach that has recently been applied in forecasting due to its competent performance in forecasting and pattern recognition. In general, it consists of a collection of simple non-linear computing elements whose inputs and outputs are tied together to form a network. Many studies have attempted to apply the ANN model to time series forecasting. Weigend et al. [35] introduced the "weight-elimination" back-propagation learning procedure and applied it to sunspot and exchange-rate time series. Tang and Han [33] compared the ANN model with the ARIMA model using international airline passenger traffic, domestic car sales, and foreign car sales in the USA. Chakraborty et al. [10] presented an ANN approach based on multivariate time-series analysis, which can
F. L. Chen and T. Y. Ou: Constructing a Sales Forecasting Model by Integrating GRA and ELM 109
accurately predict the flour prices in three cities in the USA. Lachtermacher et al. [20] developed a calibrated ANN model, in which the Box-Jenkins methods are used to determine the lag components of the input data; moreover, it employed a heuristic method to choose the number of hidden units.

Ansuj et al. [5] presented a comparison of the time series model with interventions and the ANN model for analyzing the sales behavior of a medium-size enterprise. The results showed that the ANN model was more accurate. Furthermore, Bigus [7] used promotion, time of year, end-of-month age, and weekly sales as inputs for the ANN model to forecast the weekly demand, with promising results. Kuo and Chen [22] believed that the traditional statistical approaches performed well on data with seasonality and trends, but that they are inappropriate for unexpected situations.

In the ELM method, the input weights (linking the input layer to the hidden layer) and the hidden biases are randomly chosen, and the output weights (linking the hidden layer to the output layer) are analytically determined by using the Moore-Penrose (MP) generalized inverse. As this new learning algorithm can be easily implemented, it tends to reach the smallest training error, obtains the smallest norm of weights and thus good generalization performance, and runs extremely fast.

2.3 Demand Forecasting of the Retail Industry
Chu and Zhang [13] and Alon et al. [4] developed artificial neural networks for forecasting aggregate retail sales. Alon et al. [4] compared them with traditional methods including Winters exponential smoothing, the Box-Jenkins ARIMA model, and multivariate regression. The derivative analysis shows that the nonlinear neural network model is able to capture the dynamic nonlinear trend and seasonal patterns, as well as the interactions between them. Chu and Zhang [13] found that the non-linear models are able to outperform their linear counterparts in out-of-sample forecasting, and that prior seasonal adjustment of the data can significantly improve the performance of the neural network model; the overall best model is the neural network built on deseasonalized time series data. Doganis et al. [15] also presented an evolutionary sales forecasting model combining two artificial intelligence technologies, namely the radial basis function network and the genetic algorithm. The methodology was applied successfully to sales data of fresh milk provided by a major dairy products manufacturer. Aburto and Weber [1] presented a hybrid intelligent system combining an ARIMA model and MLP neural networks for demand forecasting. It showed improvements in forecasting accuracy and a replenishment system for a Chilean supermarket, which leads simultaneously to fewer sales losses and lower inventory levels. Au et al. [6] and Sun et al. [37] developed different sales forecasting models in fashion retailing. Au et al. [6] illustrated evolutionary neural networks for sales forecasting and showed that, when guided by the BIC and the pre-search approach, a non-fully connected neural network can converge faster and forecast time series more accurately than the fully connected neural network and the traditional SARIMA model. Forecasting is often time-crucial, so the improvement in convergence speed makes it widely applicable to decision-making problems. Sun et al. [37] applied the ELM neural network model to investigate the relationship between the sales amount and some significant factors which affect demand. The experimental results demonstrate that the proposed methods outperform the back-propagation neural network model. Ali et al. [3] explored the tradeoff between forecasting accuracy and data/model complexity in the grocery retailing sales forecasting problem, considering a wide spectrum of data and technique complexity. The experimental results indicated that simple time series techniques perform very well for periods without promotions. However, for periods with promotions, regression trees with explicit features improve accuracy substantially; more sophisticated input is only beneficial when advanced techniques are used. Chen et al. [12] developed the GMFLN forecasting model by integrating GRA and MFLN neural networks. GRA sieves out the more influential factors from the raw data and then transforms them into the input data of the MFLN model. The experimental results indicated that the proposed forecasting model outperforms the MA, ARIMA and GARCH forecasting models for retail goods.

According to the above literature review, retail forecasting problems are usually time- and accuracy-crucial. This paper aims to construct a more efficient sales forecasting model that performs more accurately and faster than the univariate and multivariate time series models for retail goods. As we know, sales are affected by many dynamic factors. GRA and expert knowledge will sieve out the more influential factors as the input variables of the ELM model. Providing an improved forecasting method that can help managers decide on the appropriate order amounts is the focal point of this research.

3. METHODOLOGY

The following section presents the proposed sales forecasting model, which integrates GRA and ELM. The GRA computes the Grey Relation Grades (GRG), which measure the influential degree of a compared series by relative distance. Subsequently, the data composed of these input and output pairs are divided into training, testing and predicting data.
110 International Journal of Electronic Business Management, Vol. 9, No. 2 (2011)
All the data sets should be normalized into a specific range [-1,1]. The ELM then offers predicting results and processes the unnormalization step to convert the data back into unnormalized outcomes.

3.1 Grey Relation Analysis (GRA)
Deng [14] proposed the Grey Relation Analysis (GRA) mathematics. It has been successfully applied in many fields such as management, economics, and engineering. The Grey Relation Grade (GRG) is the degree of influence of a compared series on the reference series, represented by the relative distance: the smaller the distance, the greater the influence. The degree of influence describes the relative variations between two factors, indicating the magnitude and gradient in a given system. The GRG between two series, the compared series and the reference series, is built from the relational coefficient r(x0(k), xi(k)). Before calculating the Grey relational coefficients, each data series must be normalized by dividing the respective data of the original series by its average.

After performing Grey data processing, the transformed reference sequence is x0 = {x0(1), x0(2), ..., x0(n)}. The compared sequences are denoted by xi = {xi(1), xi(2), ..., xi(n)}, i = 1 to m. The relational coefficient r(x0(k), xi(k)) between the reference series x0 and the compared series xi at time t = k can be calculated using the following equation [20]:

r(x0(k), xi(k)) = [min_i min_k |x0(k) - xi(k)| + max_i max_k |x0(k) - xi(k)|] / [|x0(k) - xi(k)| + max_i max_k |x0(k) - xi(k)|],
k = 1, 2, ..., n; i = 1, 2, ..., m (1)

3.2 Normalization and Unnormalization
The normalization method for the input and output data sets is described as follows:

X_normalize = [(Xij - Max{Xij}) + (Xij - Min{Xij})] / (Max{Xij} - Min{Xij}),
i = 1, 2, ..., n; j = 1, 2, ..., N (3)

The unnormalization method for the predicting result is described as follows:

P_unnormalize = [Pij (Max{Xij} - Min{Xij}) + Max{Xij} + Min{Xij}] / 2,
i = 1, 2, ..., n; j = 1, 2, ..., N (4)

3.3 Extreme Learning Machine (ELM)
ELM is a single-hidden-layer feed-forward neural network (SLFN). It randomly chooses the input weight matrix W and analytically determines the output weight matrix of the SLFN. Suppose that we are training an SLFN with K hidden neurons and an activation function vector g(x) = [g1(x), g2(x), ..., gK(x)] to learn N distinct samples (xi, ti), where xi = [xi1, xi2, ..., xin]^T ∈ R^n and ti = [ti1, ti2, ..., tim]^T ∈ R^m. If the SLFN can approximate the N samples with zero error, then we have

Σ_{j=1}^{N} ||yj - tj|| = 0 (5)

where yj is the actual output value of the SLFN. There also exist parameters βi, wi and bi such that the SLFN reproduces the N targets exactly, i.e. Σ_{i=1}^{K} βi g(wi · xj + bi) = tj for j = 1, 2, ..., N.
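The GRA computation of Section 3.1 can be sketched as follows, a minimal NumPy version: each series is normalized by its average, the coefficients of Eq. (1) are formed from the absolute differences, and the coefficients are averaged over time into one GRG per factor. The averaging step is an assumption of this sketch (a common aggregation choice).

```python
import numpy as np

def grey_relational_grades(reference, compared):
    """Sketch of Section 3.1: mean-normalize the series, compute the
    relational coefficients of Eq. (1), and average them over time into
    one Grey Relational Grade (GRG) per compared series."""
    x0 = np.asarray(reference, dtype=float)
    x0 = x0 / x0.mean()                          # normalize by the average
    xs = np.array([np.asarray(x, dtype=float) / np.mean(x) for x in compared])

    diffs = np.abs(x0 - xs)                      # |x0(k) - xi(k)|
    d_min, d_max = diffs.min(), diffs.max()      # min-min / max-max terms

    r = (d_min + d_max) / (diffs + d_max)        # relational coefficients, Eq. (1)
    return r.mean(axis=1)                        # one GRG per compared series
```

The factors with the highest GRG would then be kept as the ELM inputs, as the model description above suggests.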
MSE = (1/N) Σ_{t=1}^{N} (At - Ft)^2 (11)

MAD = (1/N) Σ_{t=1}^{N} |At - Ft| (12)

where At is the actual amount and Ft is the forecast amount, respectively.

Step 9: Repeat steps 6~8 for the same data.
The GELM model will offer the best predicted results, and the accuracy of those results is then measured. We perform statistical tests (paired t-tests) on the results obtained with the sigmoidal, sine and hardlim activation functions.

Figure 1: Outline of present study

Figure 2: The framework and non-linear transformation of the GELM network

Figure 2 shows the framework and non-linear transformation of the GELM network, which incorporates the input, hidden and output layers.

3.5 The Properties of the GELM Forecasting Model of the Retail Industry
The GELM forecasting model combines the Grey relation analysis (GRA) and extreme learning machine (ELM) methodologies. The GRA in the grey system is an important problem-solving method used when dealing with the similarity measures of complex relations. The main purpose of GRA in the proposed hybrid forecasting model is to realize the relationship between two sets of time series data in a relational space [25]. The Grey relational grade (GRG) is a globalized measure adopted for GRA. It is used to describe and explain the relation between two sets. If the data of the two sets were the same at all individual time points, then all the relational coefficients would equal one. The greater the GRG between two sets, the closer the relationship between the sets. The candidate data sets with the higher GRG are delegated as the input data sets of the GELM model to enhance its predictive ability.

The learning speed of feedforward neural networks has long been far slower than required, and it has been a major bottleneck in practical applications for the past decades. This study applied ELM, a scheme for single-hidden-layer feed-forward neural networks that randomly chooses the hidden nodes and analytically determines the output weights of the network. The major property of the ELM is that it abandons the slow, gradient-based, iteratively tuned learning algorithms that are extensively used to train neural networks, and provides good generalization forecasting performance at extremely fast learning speed.

The limitation of the proposed GELM forecasting model is that it does not consider the influence of factors such as the financial crisis and free trade agreements.
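The ELM procedure of Section 3.3 — random input weights and hidden biases, with the output weights solved analytically through the Moore-Penrose generalized inverse — can be sketched as follows. The sigmoidal activation and the sizes used are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def train_elm(X, T, n_hidden=200, seed=0):
    """ELM sketch: random input weights W and biases b, sigmoidal hidden
    layer, output weights beta from the Moore-Penrose pseudoinverse."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # analytic output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Forward pass of the trained SLFN."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because no iterative tuning is involved, training reduces to one matrix pseudoinversion, which is what gives ELM its speed advantage over BPN-style learning.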
4. EXPERIMENT RESULTS AND DISCUSSIONS

In operational management in the retail industry, it is indispensable to forecast the future demand and place orders at various times of the day. A system offering more accurate prediction functions can assist managers in catering to customer demand and reducing the scrapped quantities of fresh food. Using the GELM model to predict sales amounts can increase the accuracy of the proposed system. The procedures of the experiments and the results are described sequentially in the following subsections.

This study compared the GELM forecasting model with multivariate statistical forecasting methods such as the GARCH model and the Back-Propagation Network (BPN), as well as with the GBPN and GMFLN models, by forecasting 120 days of sales. The GBPN model integrates GRA and Back-Propagation Networks, and the GMFLN model integrates GRA and Multilayer Functional Link Networks. The GARCH model is built with EViews, and the simulations of the related BPN models are conducted in MATLAB running on an ordinary notebook with a 1.4 GHz CPU and 760 MB RAM.

4.1 Data Collection and Analysis
Well-known retailers and a government organization in Taiwan provided the initial data, which can be separated into three different groups. Firstly, the target store collected the daily sales data and price of 960 ml containers of milk; the total number of data points was 334, as shown in Figure 3. We also collected the sales amounts and prices of two other brands. Ordinarily, the sales price is not a fixed number, as it is adjusted for many reasons such as promotions, the hot/cold season, or specific activities. Secondly, sales data were also obtained from two neighboring stores. Those neighboring stores are in the same distribution area; they were close to each other and serviced the same customers. We also collected the sales amounts and price data from these other stores. Thirdly, the Central Weather Bureau provided the local weather records.

Figure 3: The sale quantities of the target brand

As we know, many factors affect consumer behavior in the actual retail industry. Among those factors, we describe below how the most influential indices were selected by the analytical methodology to be the input data of the ELM model. After normalizing the raw data and calculating the GRG of each index, the expert selected the three factors with the highest GRG to be the input data of the multivariate time series model and the ELM model. The GRG of each factor is shown in Table 1. The selected factors represent the more influential ones for the sales amounts of fresh food. The three selected factors are W, TAs and TBS.

4.2 Experiment Results
The experimental algorithms of the GARCH, GBPN, GMFLN and GELM models import the same data sets, including the three indices (W, TAs and TBS) selected by GRA and the last 7 days of lagged data.

4.2.1 GARCH Forecasting Model
Bollerslev [8] proposed the GARCH (Generalized ARCH) conditional variance specification, which allows for a parsimonious parameterization of the lag structure. In analyzing the time series model, several suitable models could explain the input data. We adopt two statistics as the criteria for choosing the best statistical forecasting model.

1. AIC (Akaike's Information Criterion)
Akaike [2] provided the following criterion to evaluate the fitness of the proposed statistical models (for a data set fitted by P parameters of the statistical model):

AIC(P) = n Ln(σa^2) + 2P (13)

2. SBC (Schwartz's Bayesian Criterion)
Schwartz [28] provided a similar criterion to evaluate the fitness of the statistical models.

The GARCH model uses the same data (W, TAs and TBS) to predict the 120 days demand. After examining the AIC (-0.75645) and SBC (-0.60712), the best adapted model is described below:

yt = 0.83341yt-1 - 0.84852yt-2 + 0.79288yt-3 + 0.37308Wt + 0.06524TAs + 0.06403TBS - 0.81732εt-1 + 0.93919εt-2 - 0.69544εt-3 + 0.14682εt-5 + εt,
εt ~ N(0, σt^2), σt^2 = 0.02877 - 0.09753εt-2^2 (15)
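Equation (13) can be computed directly from a fitted model's residuals; a minimal sketch follows. The `sbc` helper assumes the usual Schwarz form n·Ln(σa²) + P·Ln(n), which is a standard formulation rather than a formula taken from this paper.

```python
import math

def aic(residuals, p):
    """Eq. (13): AIC(P) = n * Ln(sigma_a^2) + 2P, with sigma_a^2 the
    residual variance of the fitted model and p its parameter count."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    return n * math.log(sigma2) + 2 * p

def sbc(residuals, p):
    """Schwartz's Bayesian criterion in its usual assumed form:
    n * Ln(sigma_a^2) + P * Ln(n)."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    return n * math.log(sigma2) + p * math.log(n)
```

Among candidate specifications, the model with the smallest criterion values is retained, which is how the fitted GARCH model above was selected.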
4.2.2 GBPN Forecasting Model
Generally, the BPN is a typical type of artificial neural network model: a class of generalized non-linear, non-parametric models inspired by studies of the brain and nervous system. A BPN is composed of several layers of input, hidden and output nodes. It is a challenge to develop an appropriately sized BPN model that fits both the training data and the testing data. The structure size of the model depends on the number of input nodes and the number of hidden nodes. There are no systematic reports on how to decide the input and hidden nodes, yet different choices have a significant impact on the learning and prediction ability of the network.

As mentioned before, the purpose of GRA is to realize the relationship between two sets of time series data in a relational space. In the GBPN model, the input nodes of the neural network are usually the past, lagged observations and the more influential factors that affect the sales amounts, and the output node is the real sales data. We expect to obtain an applicable GBPN forecasting model that has generalization and good forecasting capability.

4.2.3 GMFLN Forecasting Model
The GMFLN model integrates GRA and multilayer functional link networks [12]. It is composed of one or two hidden layers that can competently approximate a continuous function in a theoretical time series. In such models, the hidden nodes are used to capture the non-linear structures. Deciding how many hidden nodes should be used is another difficult issue in the neural network forecasting model construction process. In practice, the numbers of hidden nodes were chosen through experiments or by trial-and-error, without any theoretical basis to guide the decision.

Some theories suggest that more hidden nodes can increase the accuracy in approximating a functional relationship, but this may cause the over-fitting problem. This problem is more likely to happen in the GMFLN model than in other statistical models. One solution to the over-fitting problem is to find a parsimonious model that fits the data well. Another way to tackle it is to divide the time series into three sets: training, testing and validation [21]. The first two sets are used for model building, and the last is used for model validation or evaluation. The best GMFLN model is the one that gives the best results on the predicting set.
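The three-way chronological split described above can be sketched as follows. The 60/20/20 proportions are illustrative assumptions, not the paper's settings.

```python
def split_series(series, train=0.6, test=0.2):
    """Chronological three-way split into training / testing / validation
    sets, as suggested for guarding against over-fitting [21]."""
    n = len(series)
    a = int(n * train)            # end of the training block
    b = int(n * (train + test))   # end of the testing block
    return series[:a], series[a:b], series[b:]
```

Keeping the split chronological matters for time series: shuffling before splitting would leak future information into model building.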
To compare the forecasting accuracy, we designed experiments with different activation functions and numbers of hidden nodes. In the GELM forecasting model, we compare the accuracy of the sigmoidal, sine and hardlim activation functions for different numbers of hidden nodes, selected from 20, 50, 100 and 200. The GELM forecasting model uses the same time series data and three indices (W, TAs and TBS) to predict the 120 days demand.

Table 2 shows the training times of GELM, GBPN and GMFLN. The GELM learning algorithm spent 0.3705 s of CPU time with the sigmoidal activation function and 200 hidden nodes. Traditional gradient-based learning algorithms such as GBPN and GMFLN cost much more training time compared with GELM.
Table 3 shows the performance of the GELM forecasting model with different activation functions and hidden nodes. More hidden nodes give a better ability to predict the sales amounts. The best forecasting results have a MAD of 0.07039 and an MSE of 0.00907, with the sigmoidal activation function and 200 hidden nodes.

In the GELM model, the input weights and hidden biases are randomly chosen, and the output weights are analytically determined by using the Moore-Penrose generalized inverse. In order to compare training time and performance across the different activation functions, we ran each setting 30 times and performed statistical tests (paired t-tests) on the obtained results to examine statistically significant differences. The paired t-test is a widely used method to examine whether the average difference in performance between two methods over various data sets is significantly different from zero. If the p-value generated by a paired t-test is lower than the significance level (0.05), it indicates a significant difference between the two methods.
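The paired t-test just described can be sketched as follows (statistic only; the two-sided p-value would be read from the t distribution with n − 1 degrees of freedom, e.g. via scipy.stats, which is not used here to keep the sketch dependency-free):

```python
import math

def paired_t(a, b):
    """Paired t statistic over matched per-run performance pairs."""
    d = [x - y for x, y in zip(a, b)]          # per-run differences
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

Each of the 30 runs contributes one matched pair per activation-function comparison, exactly the setting the t, Mean and StDev columns of Tables 5 and 6 summarize.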
Table 4 shows the training time of the GELM with different activation functions and different numbers of hidden nodes. There is no significant difference when the number of hidden nodes is 20 or 50. With 100 hidden nodes, the training time of the hardlim activation function is significantly different from the sigmoidal and sine activation functions, while the sigmoidal and sine activation functions show no difference. With 200 hidden nodes, all three activation functions differ significantly: the hardlim activation function is better than the sigmoidal activation function, and the sigmoidal activation function is better than the sine activation function.

Table 5 shows the MAD of GELM with different activation functions and different numbers of hidden nodes. The p-value for sigmoidal vs. sine is always lower than 0.05, which means there is a significant difference between these two activation functions, and the performance of the sigmoidal activation function is always better than that of the sine activation function. When the hidden nodes are 20, 50 and 100, the p-values for sigmoidal vs. hardlim are lower than 0.05, and the sigmoidal activation function is significantly better than the hardlim activation function. When the hidden nodes are 20, 100 and 200, the p-values for sine vs. hardlim are lower than 0.05, and the hardlim activation function is significantly better than the sine activation function.
Table 5: Paired t-test results of predictions between different activation functions in MAD

Hidden  Paired                        95% CI of the difference
nodes   methods    Mean      StDev    Lower      Upper        t      p-value
20      Sig.-Sin.  -0.00735  0.00756  -0.01017   -0.00452   -5.32   0.000*
20      Sig.-Har.  -0.00272  0.00435  -0.00434   -0.00110   -3.43   0.002*
20      Sin.-Har.   0.00463  0.00832   0.00152    0.00773    3.05   0.005*
50      Sig.-Sin.  -0.00538  0.00971  -0.00901   -0.00176   -3.04   0.005*
50      Sig.-Har.  -0.00413  0.00925  -0.00758   -0.00068   -2.45   0.021*
50      Sin.-Har.   0.00125  0.01081  -0.00278    0.00529    0.64   0.530
100     Sig.-Sin.  -0.00837  0.00620  -0.01068   -0.00605   -7.40   0.000*
100     Sig.-Har.  -0.00378  0.00466  -0.00552   -0.00203   -4.43   0.000*
100     Sin.-Har.   0.00459  0.00737   0.00184    0.00733    3.42   0.002*
200     Sig.-Sin.  -0.00485  0.00532  -0.00684   -0.00286   -4.99   0.000*
200     Sig.-Har.   0.01204  0.04376  -0.00430    0.02838    1.51   0.143
200     Sin.-Har.   0.01689  0.04478   0.00017    0.03361    2.07   0.048*
Table 6 shows the MSE of GELM with different activation functions and different numbers of hidden nodes. The p-value for sigmoidal vs. sine is always lower than 0.05, which means there is a significant difference between these two activation functions, and the performance of the sigmoidal activation function is always better than that of the sine activation function. When the hidden nodes are 50 and 100, the p-values for sigmoidal vs. hardlim are lower than 0.05, and the sigmoidal activation function is significantly better than the hardlim activation function. When the hidden nodes are 20 and 200, the p-values for sine vs. hardlim are lower than 0.05, and the hardlim activation function is significantly better than the sine activation function. From the above results, the sigmoidal activation function shows significant differences from the sine activation function, but there is no consistent significant difference for sigmoidal vs. hardlim or sine vs. hardlim.

4.3 Discussion
Table 7 presents the results of the different forecasting models. The best GARCH model has a MAD of 0.13876 and an MSE of 0.03191. The best forecasting result of the GBPN model has a MAD of 0.09837 and an MSE of 0.01979. The best forecasting result of the GMFLN model has a MAD of 0.08911 and an MSE of 0.01883. The best GELM model has a MAD of 0.07039 and an MSE of 0.00907. The GELM forecasting model we proposed has the smallest predicting errors, and its learning speed is extremely fast compared with the others.
Table 6: Paired t-test results of predictions between different activation functions in MSE

Hidden  Paired                        95% CI of the difference
nodes   methods    Mean      StDev    Lower      Upper        t      p-value
20      Sig.-Sin.  -0.00321  0.00316  -0.00439   -0.00203   -5.56   0.000*
20      Sig.-Har.  -0.00051  0.00174  -0.00116    0.00014   -1.62   0.116
20      Sin.-Har.   0.00269  0.00072   0.00122    0.00417    3.74   0.001*
50      Sig.-Sin.  -0.00184  0.00329  -0.00307   -0.00061   -3.07   0.005*
50      Sig.-Har.  -0.00176  0.00280  -0.00280   -0.00071   -3.43   0.002*
50      Sin.-Har.   0.00009  0.00375  -0.00131    0.00149    0.13   0.900
100     Sig.-Sin.  -0.00213  0.00250  -0.00306   -0.00120   -4.67   0.000*
100     Sig.-Har.  -0.00138  0.00167  -0.00020   -0.00075   -4.52   0.000*
100     Sin.-Har.   0.00075  0.00246  -0.00016    0.00167    1.68   0.104
200     Sig.-Sin.  -0.00083  0.00106  -0.00123    0.00044   -4.29   0.000*
200     Sig.-Har.   0.00168  0.00578  -0.00047    0.00384    1.60   0.121
200     Sin.-Har.   0.00252  0.00597   0.00029    0.04750    2.31   0.028*
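The MAD and MSE criteria of Eqs. (11) and (12), which underlie Tables 5 through 7, are straightforward to compute; a minimal sketch:

```python
def mse(actual, forecast):
    """Mean squared error, Eq. (11)."""
    n = len(actual)
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n

def mad(actual, forecast):
    """Mean absolute deviation, Eq. (12)."""
    n = len(actual)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / n
```

MSE penalizes large single-day misses more heavily than MAD, which is why the two criteria can rank similar models differently.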
Therefore, GELM is a valid and effective forecasting tool that can be further applied to similar fields and applications.

Examining the performance of the different activation functions by paired t-tests, the sigmoidal activation function shows significant differences from the sine activation function in both the MAD and MSE criteria.

In this paper, our experiments have successfully demonstrated that GELM can be well employed in sales forecasting for the retail industry. It not only provides smaller predicting errors but also improves the training speed over the other forecasting models. Future research will focus on the different temperature levels of fresh food in the retail industry and on improving the stability and learning speed of the GELM model.

REFERENCES

1. Aburto, L. and Weber, R., 2007, Improved supply chain management based on hybrid demand forecasts, Applied Soft Computing, Vol. 7, No. 1, pp. 126-144.
2. Akaike, H., 1974, A new look at the statistical model identification, IEEE Transactions on Automatic Control, Vol. 19, No. 6, pp. 716-723.
3. Ali, Ö. G., Sayin, S., Van Woensel, T. and Fransoo, J., 2009, SKU demand forecasting in the presence of promotions, Expert Systems with Applications, Vol. 36, No. 10, pp. 12340-12348.
4. Alon, I., Qi, M. and Sadowski, R. J., 2001, Forecasting aggregate retail sales: A comparison of artificial neural networks and traditional methods, Journal of Retailing and Consumer Services, Vol. 8, No. 3, pp. 147-156.
5. Ansuj, A. P., Camargo, M. E., Radharamanan, R. and Petry, D. G., 1996, Sales forecasting using time series and neural networks, Computers and Industrial Engineering, Vol. 31, No. 1-2, pp. 421-424.
6. Au, K. F., Choi, T. M. and Yu, Y., 2008, Fashion retail forecasting by evolutionary neural networks, International Journal of Production Economics, Vol. 114, No. 2, pp. 615-630.
7. Bigus, J. P., 1996, Data Mining with Neural Networks: Solving Business Problems - From Application Development to Decision Support, McGraw-Hill, New York.
8. Bollerslev, T., 1986, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics, Vol. 31, No. 3, pp. 307-327.
9. Box, G. E. P. and Jenkins, G. M., 1976, Time series analysis forecasting and control, Management Science, Vol. 17, No. 4, pp. 141-164.
10. Chakraborty, K., Mehrotra, K. and Mohan, C. K., 1992, Forecasting the behavior of multivariate time series using neural networks, Neural Networks, Vol. 5, No. 6, pp. 961-970.
11. Chang, P. C. and Wang, Y. W., 2006, Fuzzy Delphi and back-propagation model for sales forecasting in PCB industry, Expert Systems with Applications, Vol. 30, No. 4, pp. 715-726.
12. Chen, F. L. and Ou, T. Y., 2009, Grey relation analysis and multilayer function link network sales forecasting model for perishable food in convenience store, Expert Systems with Applications, Vol. 36, No. 3, pp. 7054-7063.
13. Chu, C. W. and Zhang, G. P., 2003, A comparative study of linear and nonlinear models for aggregate retail sales forecasting, International Journal of Production Economics, Vol. 86, No. 3, pp. 217-231.
14. Deng, J. L., 1982, Control problems of Grey systems, System Control Letter, Vol. 1, No. 4, pp. 288-294.
15. Doganis, P., Alexandridis, A., Patrinos, P. and Sarimveis, H., 2006, Time series sales forecasting for short shelf-life food products based on artificial neural networks and evolutionary computing, Journal of Food Engineering, Vol. 75, No. 2, pp. 196-204.
16. Engle, R. F., 1982, Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation, Econometrica, Vol. 50, No. 4, pp. 987-1008.
17. Frank, C., Garg, A., Sztandera, L. and Raheja, A., 2003, Forecasting women's apparel sales using mathematical modeling, International Journal of Clothing Science and Technology, Vol. 15, No. 2, pp. 107-125.
18. Huang, G. B., 2003, Learning capability and storage capacity of two-hidden-layer feedforward networks, IEEE Transactions on Neural Networks, Vol. 14, No. 2, pp. 274-281.
19. Huang, G. B., Zhu, Q. Y. and Siew, C. K., 2006, Extreme learning machine: Theory and applications, Neurocomputing, Vol. 70, No. 1-3, pp. 489-501.
20. Huang, S. T., Chiu, N. H. and Chen, L. W., 2008, Integration of grey relational analysis with genetic algorithm for software effort estimation, European Journal of Operational Research, Vol. 188, No. 3, pp. 898-909.
21. Kaastra, I. and Boyd, M., 1996, Designing a neural network for forecasting financial and economic time series, Neurocomputing, Vol. 10, No. 3, pp. 215-236.
22. Kuo, R. J. and Chen, J. A., 2004, A decision support system for order selection in electronic commerce based on fuzzy neural network supported by real-coded genetic algorithm, Expert Systems with Applications, Vol. 26, No. 2, pp. 141-154.
23. Kuo, R. J., 2001, A sales forecasting system
APPENDIX

Appendix 1: Moore-Penrose Generalized Inverse
The resolution of a general linear system Ax = y, where A may be singular and may even not be square, can be made very simple by the use of the Moore-Penrose generalized inverse [29].
Definition 1: A matrix G of order n × m is the Moore-Penrose generalized inverse of a matrix A of order m × n if AGA = A, GAG = G, (AG)^T = AG and (GA)^T = GA.

Appendix 2: Minimum Norm Least-Square Solutions of a General Linear System
For the general linear system Ax = y, we say that x̂ is a least-square solution if

||Ax̂ - y|| = min_x ||Ax - y||, where ||·|| is a norm in Euclidean space. (15)
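In practice, the two appendix definitions correspond to `numpy.linalg.pinv`: A⁺y is the minimum-norm least-square solution of Ax = y even when A is singular or non-square. A small illustrative sketch:

```python
import numpy as np

# Underdetermined system x1 + x2 = 2: infinitely many exact solutions.
# The Moore-Penrose generalized inverse (Appendix 1) selects the
# minimum-norm least-square solution of Appendix 2.
A = np.array([[1.0, 1.0]])
y = np.array([2.0])

x_hat = np.linalg.pinv(A) @ y  # minimum-norm least-square solution
```

Among all exact solutions (t, 2 − t), the one with minimum Euclidean norm is (1, 1), which is what `x_hat` returns; this is the same mechanism ELM uses to obtain its output weights.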