You are on page 1of 5

Towards a Better Integration of Data Mining and Decision Support

via Computational Intelligence

Sylvain Delisle
Département de mathématiques et d’informatique
Université du Québec à Trois-Rivières
Québec, Canada, G9A 5H7
Sylvain_Delisle@uqtr.ca www.uqtr.ca/~delisle

Abstract utterly-specialized (computer science- or mathematics-


related) techniques, whereas research on strategic,
Despite a high level of activity and a large number methodological, and even epistemological aspects of
of researchers involved, current research in data DM are rare—too few people seem to be working at
mining is plagued with several serious problems that the level of the forest and too many at the level of the
should be regarded as top-priority challenges. First, individual tree! Yet another challenge, and one on
large-scale, visible results and benefits seem rare. which we will focus in this paper, is the fact that DM
Second, the field has become utterly specialized and a remains essentially very poorly integrated with the
lot of work remains to be done at the “big picture” decision support level (and its corresponding
level. Finally, data mining is very poorly integrated computerized systems) it is supposed to serve, thus not
with the decision support level. This paper fully realizing its very own raison d'être.
concentrates on the latter challenge and presents a In this paper, we present the core of our new
research proposal in that direction. research project currently in its initial stage: integrating
DM and decision support (DS) through computational
1. Introduction intelligence. Our main goal is to make DM more
accessible and effective for decision makers to support
With the relentless growth of information available application-oriented DS. We believe there is currently
on the Internet, combined with the storage of gigantic a lack of research in this area; progress in that direction
quantities of data on personal and corporate computers, could foster significant advancement in applied DM.
individuals and businesses often find themselves in
situations where they feel overwhelmed by the sheer 2. Research Goals
amount of data they must process to make time-
constrained but informed decisions. The long term goal of this research project is to
To add to this already challenging state of affairs, the discover and evaluate new solutions to the crucial and
last couple of years have brought along a new twist: very challenging problem of poor integration between
valuable information could be "hidden" in your DM and DS. These solutions will be based mainly on
personal/corporate/Web data, and data mining (DM) computational intelligence concepts and tools (e.g.
may allow you to uncover it. And so DM, and various ontologies and reasoning systems), but also on new
related methodologies, techniques and tools are now development methodologies that take DM aspects early
becoming more and more widely used in order to into consideration in the development cycle of
uncover this implicit and potentially useful information information and DS systems. The combination of these
or knowledge. two domains, knowledge and methodology, constitutes
Unfortunately, despite the huge interest and a most promising avenue relative to the current state of
sometimes unrealistic hopes it has generated, DM still the art in computer science. Our short term objective is
faces several serious challenges. One of these is the to find solutions to the following specific problems
fact that large-scale, systematic, and predictable which constitute serious limitations in typical
applications of DM to the benefit of numerous situations of DM and DS integration:
organizations and businesses seem rather uncommon. i) Current DM processes make very little use of
Another challenge is the fact that the majority of already existing corporate knowledge, even when
current progress in DM research seems to be based on such knowledge has been (partially) formalized

Proceedings of the 16th International Workshop on Database and Expert Systems Applications (DEXA’05)
1529-4188/05 $20.00 © 2005 IEEE
and computerized. Consequently, DM is more distributed enterprises information systems and can
tedious than needed and its results tend to produce surely be considered promising as it strives to integrate
“already known information”. the so-called data and computational grids with the
ii) Most DM projects occur as a posteriori knowledge grid—see also [8].
development for which specific databases and pro- One of the rare references dealing in some depth
cessing capabilities must be (re-)developed, and with DM and DS is that of [19] with chapters dedicated
for which special (non-interactive and non-robust) to various aspects of the integration of DM and DS—
means are used to make DM results usable by the these researchers have proposed the equation “data
DS level. Thus, the lack of methodological and mining + decision tools = better business”. Work in
technical integration with standard information meta-learning is also quite relevant as it attempts to
systems causes DM projects to appear as “extra support DM: see METAL (www.metal-kdd.org), [6],
work” for which the return on investment can and [24]. Other recent work in the field of KDD
hardly be measured. (Knowledge Discovery in Databases) is that of [13] in
iii) There is a lack of standardisation in DM that which an ontology is used to model domain knowledge
generates a lot of redundant work and complicates to support the KDD process, that of [20] where an
in a major way the evaluation and comparison of environment for the rapid development of pre-DM
tools, methods, and results. Steps must be made processing chains is introduced, and that of [25] in
towards the adoption of effective DS-oriented DM which expert system and machine learning
standards, at least at the local (e.g. enterprise) technologies are combined to support DM. Work
level. dealing with conceptual queries and online/Web
Because we live in an information-based world, the (interactive) DM is also of interest since it must take
problem and research objectives identified above are of into account some elements of the DS dimension: see
the greatest importance, both at the scientific and [9], for instance.
economic levels. They are directly related to the all- In the domain of DS systems, the Journal of
pervading problem of information overload for which Decision Support Systems published in 2002 a special
DM offers valuable (though partial) solutions, even issue on “directions for the next decade” containing
more so if adequately coupled with modern businesses’ several key papers: [7] propose to integrate DS and
DS systems. We think our project offers great potential knowledge management processes using DM; [21]
for advancing the state of the art in DM applied to propose the concept of a knowledge warehouse to
information and DS systems. manage the knowledge of the firm; and [22] sum-
marize the evolution of DS systems and emphasize the
3. Related Work importance of data warehouses, OLAP, and DM in
those systems. Also of interest is case-based reasoning
The problem of poor DM and DS integration has (CBR) which offers valuable tools for DS, as
only recently been clearly acknowledged in the exemplified in the works of [2], [18], and [23].
scientific literature and little work has been done along Because decision makers tend to use analogy-based
the lines presented here. Some recent conference calls reasoning processes when solving problems and
for papers confirm the relevance of our viewpoint, such because they usually do so by considering only a very
as [1]: “Recently emerged the idea that database limited number of cases, especially in the field of
management systems should provide support and business administration, we hypothesize that CBR
technology for data mining, as it occurred for OLAP constitutes a promising approach for our research
and data warehouses in the last decade”. [15], as project—more on this in the following section.
several others, mentions “integration of data
warehousing, OLAP and data mining” in its list of 4. For a Better Integration of DM and DS
preferred topics of interest. Some conferences are
aimed more specifically at decision making and less so Holsapple & Whinston ([14]) consider that DS
at the technological infrastructure supporting it, as in systems’ raison d’être is to increase the productivity of
the work of [3], while others are associated with the decision makers through implementing their abilities to
field of multiagent systems ([17]): “Integrated manipulate knowledge, by facilitating problem-
knowledge intensive systems emerge in all domains of solving, and by providing an aid for non-structured
business and engineering, when intelligent decision problems. Nowadays, this requires the constant
support … requires knowing how this knowledge is processing of phenomenal quantities of data and
produced, measured, communicated, and information and always involves the use of
interpreted”—see also [27] for an agent-based computerized systems. DM methods and tools, hold
approach to DM. The recent domain of grid the promise of facilitating the decision maker’s life by
intelligence, see e.g. [16], is concerned with large-scale extracting hidden information from a sea of data, and

Proceedings of the 16th International Workshop on Database and Expert Systems Applications (DEXA’05)
1529-4188/05 $20.00 © 2005 IEEE
compacting it into a form easily amenable to decision What we propose, in order to support a fruitful
making. But as we argued before, DM is poorly integration of DM and DS, is to better exploit existing
integrated with DS and, consequently, cannot knowledge—be it background or specialized, or,
effectively support decision makers. relative to the enterprise, internal or external—via what
We start from the work of Bolloju et al. ([7]) in we will be referring to as KIVs, i.e. Knowledge (DM-
order to depict the typical situation addressed by our DS) Integration Vectors. We define a KIV as a source
proposal, from a DS perspective. Indeed, as shown in of data, information or knowledge that plays a role in
Figure 1 below, DM and DS systems are two distinct DM and DS, and in their integration. Examples of
components that do not usually interact through a KIVs are data models, decision models, general and
computerized tool. Thus, DM and DS are not domain-specific ontologies, and general and domain-
integrated at all—Bolloju et al. do not consider this specific rule bases. All KIVs will be exploitable and
question in their work which focuses on knowledge managed by KIVE, i.e. the KIV Environment we
management. From a DM perspective and in relation to propose. KIVE could be added to the architecture
our objectives, we must remark that all too often in depicted in Figure 1 as a new system interfacing with
DM or machine learning, the zero-knowledge (tabula decision makers and interconnected with existing
rasa) hypothesis is used and, not very surprisingly, resources. KIVE will contain a case-based system that
leads to results that can be qualified as “already known will index (see [5]) all relevant information to facilitate
information or knowledge” (e.g. “rediscovering” DM, and ultimately decision making, from the decision
database functional dependencies). maker’s viewpoint.

Figure 1 (Source: Bolloju et al. 2002 [7]).


EIS=Enterprise Information System; DSS=Decision Support System(s); model bases= models used by decision makers

DS is now an important application domain of case- systems; a feature/attribute selection and engineering
based reasoning ([4], [12]) as it embodies well the tool; a catalogue of DM algorithms; a catalogue of
processing performed by actual decision makers, (positive and negative) commented cases that will be
including their potential usefulness with regard to indexed on multiple criteria, including information on
decision explanation ([10]). The case-based approach the decision problem related to it; and a case learner
we suggest here is somewhat similar to that of [20], and knowledge acquisition module.
although more much elaborated. KIVE will not only The development and implementation of the
consider pre-DM processing, but all aspects of DM proposed KIVE environment is a challenging project
related to DS. Also, contrary to [20], we do not that will allow us to study fundamental issues related
exclude unsuccessful cases as they can also be quite to the problem of DM-DS integration. Ultimately, this
informative for DM purposes. Thus, KIVE will will support significant progress in applied DM. We
constitute a DS-oriented DM environment that contains now consider many key questions (in no particular
the following main components: a user interface order) that will motivate our thinking during this
suitable for decision makers that includes a DS- research project:
oriented DM wizard; a catalogue of KIVs which are • KIVs include a wide variety of data, information,
linked to other relevant resources, information and DS and knowledge. What is the best way to represent

Proceedings of the 16th International Workshop on Database and Expert Systems Applications (DEXA’05)
1529-4188/05 $20.00 © 2005 IEEE
KIVs, especially for DM purposes and in a format questions than definite answers. We believe finding
(and language) amenable to DS? How to ensure answers to some of these questions could trigger
that existing corporate knowledge is properly significant advancement in applied DM, especially in
formalized and accessible in due time to supply the context of decision support.
reliable KIVs? Our goal is to develop an intelligent environment
• What types of metadata and ontologies should be (named KIVE) to support decision makers in their
developed to successfully guide the DM process? appropriate application of DM so that they can
How can recent work on meta-learning be put to efficiently make better decisions in a given domain. So
use here? Should metadata be related to a specific the “first order” goal is DM for DS, i.e. DM for
DM methodology and/or to a specific theory of application-domain-related DS. However, intelligently
knowledge? supporting DM involves DM-related DS (e.g. how to
• How to define cases, or derive them from select the most appropriate DM technique during the
databases ([20], [26]) in order to serve both the DM process). Thus, the “second order” goal is DS for
objectives of DM and DS? Meta-learning seems DM. Both first and second order goals will be tackled
promising, as well as knowledge extraction from with DM, DS, and computational intelligence concepts
DM experts. And what is the best way to process and techniques.
(i.e. index, retrieve, assess similarity, adapt) them For instance, KIVE should be able to allow a
within KIVE’s case system? manager to perform an in-depth analysis, involving
• How should the knowledge base of KIVE’s wizard DM, of the last two years’ production data in order to
be developed in the first place, and what type of derive decision trees, classification rules, or regression
rule should it use? Should it put more emphasis on functions that would help him/her better understand
DM or on DS aspects? Should it be data- or how to control production costs and thus be in a
model-oriented, or both? Should the wizard’s rule- position to make better decisions in that area. Another
based system be replaced altogether by a case example could be the use of DM on enterprise data in
system instead (with learning capacities)? order to derive a parameterized model of its overall
• To what extend can the trial and error strategy, performance (see our related work in [11]).
which is often used in DM, be minimized during This work is oriented towards the elaboration of new
the interaction with the KIVE interactive wizard conceptual models and computer-oriented techniques,
when no pre-existing case is available? Here and their implementation and testing in software and
again, meta-learning could be useful. How to related methodologies—but all this while keeping “the
define a productive concept of interactive DS forest” in mind. We hypothesize that the current state
through DM when many DM processes may of the art in DM is limited by hyper-specialized sub-
require a lot more time than what is usually disciplines not sufficiently taking into consideration
understood by “online, interactive”? other DM-related disciplines and challenges from a
broader perspective.
• How can standard development methodologies for
DM is a multidisciplinary field. It obviously involves
information systems and database systems be
statistical and computer-science related techniques, but
revised or adjusted to include right from the start
it also involves decision theory and epistemological
relevant requirements and specifications related to
aspects at an equally important level. We think that
DM and DS? Or should entirely new
much more research is needed on the synergistic
methodologies be devised?
combination of the various sub-fields of DM,
• How DM and DS integration should be balanced especially, as we have argued here, with regard to
(i.e. with more emphasis on DM or on DS) to decision support and computational intelligence.
better serve the needs of typical decision makers?
Should DM-specific and DS-specific functions be
accessible separately (through the wizard) when 6. Acknowledgments
needed? Can we define precisely the notion of We thank the anonymous reviewers who provided
DM-DS integration and propose a theoretical judicious comments that helped us improve our paper.
model supporting it? Can this integration be We acknowledge the financial support of the National
measured or quantified? Sciences and Engineering Research Council of Canada
(NSERC).
5. Conclusion
7. References
We have briefly presented the core of our new [1] ACM/SAC (2005), The 2005 ACM Symposium on
research project currently in its initial stage. Applied Computing, Special Track on Data Mining, CFP,
Consequently, this paper has put forward more Santa Fe (New Mexico, USA), March 2005.

Proceedings of the 16th International Workshop on Database and Expert Systems Applications (DEXA’05)
1529-4188/05 $20.00 © 2005 IEEE
[2] Angehrn, A.A. & S. Dutta (1998), “Case-Based Decision [17] KIMAS (2005), The 2005 IEEE International
Support”, Communications of the ACM, 41(5). Conference on Integration of Knowledge Intensive Multi-
[3] Arnott, D., G. Pervan, P. O’Donnell & G. Dodson (2004), Agent Systems, Waltham (Massachusetts, USA), April 2005.
“An Analysis of Decision Support Systems Research: [18] Lee, J.K. & J.K. Kim (2002), “A Case-Based Reasoning
Preliminary Results”, The 2004 IFIP International Approach for Building a Decision Model”, Expert Systems,
Conference on Decision Support Systems, Prato (Italy), July 19(3).
2004. [19] Mladenic, D., N. Lavrac, M. Bohanec & S. Moyle
[4] Avesani, P., S. Ferrari & A. Susi (2003), “Case-Based (2003), Data Mining and Decision Support (integration and
Ranking for Decision Support Systems”, The ICCBR 2003 collaboration), Kluwer.
Conference, Lecture Notes in Artificial Intelligence 2689 [20] Morik, K. & M. Scholz (2004), “The MiningMart
(Spinger), 35-49, Trondheim (Norway), June 2003. Approach to Knowledge Discovery in Databases”, in
[5] Azuaje, F., W. Dubitzky, N. Black & K. Adamson Intelligent Technologies for Information Analysis, N. Zhong
(2000), “Retrieval Strategies for Case-Based Reasoning: A & J. Liu (eds.), Springer.
Categorized Bibliography”, The Knowledge Engineering [21] Nemati, H.R., D.M. Steiger, L.S. Iyer & R.T.Herschel
Review, 15(4), 371-379. (2002), “Knowledge Warehouse: An Architectural
[6] Blanzieri, B., P. Giorgini, P.Massa & S. Recla (2001), Integration of Knowledge Management, Decision Support,
“Data Mining, Decision Support and Meta-Learning: Artificial Intelligence and Data Warehousing”, Decision
Towards an Implicit Culture Architecture for KDD”, Support Systems, 33, 143-161.
Workshop on Positions, Developments and Future [22] Shim, J.P., M. Warkentin, J.F. Courteny, D.J. Power, R.
Directions, 2001 ECML/PKDD Conference, Freiburg Sharda & C. Carlsson (2002), “Past, Present, and Future of
(Germany), September 2001. Decision Support Technology”, Decision Support Systems,
[7] Bolloju, N., M. Khalifa & E. Turban (2002), “Integrating 33, 111-126.
Knowledge Management into Enterprise Environments for [23] Sun, B., L.D. Xu, X. Pei & H. Li (2003), “Scenario-
the Next Generation Decision Support”, Decision Support Based Knowledge Representation in Case-Based Reasoning
Systems, 33, 163-176. Systems”, Expert Systems, 20(2).
[8] Cannataro, M. & D. Talia (2003), “The Knowledge [24] Vilalta, R., C. Giraud-Carrier, P. Brazdil & C. Soares
Grid”, Communications of the ACM, 46(1). (2004), “Using Meta-Learning to Support Data Mining”,
[9] Chen, Q., X. Wu & X. Zhu (2004), “OIDM: Online International Journal of Computer Science and Applications,
Interactive Data Mining”, The 2004 IEA/AIE Conference, 1(1), 31-45.
Lecture Notes in Artificial Intelligence 3029 (Springer- [25] Weiss, S.M., S.J. Buckley, S. Kapoor & S. Damgaard
Verlag), 66-76, Ottawa (Canada), May 2004. (2003), “Knowledge-Based Data Mining”, The 2003
[10] Cunningham, P., D. Doyle & J. Loughrey (2003), “An SIGKDD Conference, Washington (D.C., USA), August
Evaluation of the Usefulness of Case-Based Explanation”, 2003.
The ICCBR 2003 Conference, Lecture Notes in Artificial [26] Yang, Q. & H. Cheng (2003), “Case Mining from Large
Intelligence 2689 (Spinger), 122-130, Trondheim (Norway), Databases”, The ICCBR 2003 Conference, Lecture Notes in
June 2003. Artificial Intelligence 2689 (Spinger), 691-702, Trondheim
[11] Delisle, S., J. St-Pierre & T. Copeck (to appear), “A (Norway), June 2003.
Hybrid Diagnostic-Advisory System for Small and Medium- [27] Zhang, Z., C. Zhang & S. Zhang (2003), “An Agent-
Sized Enterprises: A Successful AI Application", Applied Based Hybrid Framework for Database Mining”, Applied
Intelligence (The International Journal of Artificial Artificial Intelligence, 17, 383-398.
Intelligence, Neural Networks, and Complex Problem-
Solving Technologies), Springer.
th
[12] ECCBR (2004), The 7 European Conference in Case-
Based Reasoning, CFP, Madrid (Spain), August 2004.
[13] Euler, T. & M. Scholz (2004), “Using Ontologies in a
KDD Workbench”, Workshop on Knowledge Discovery and
Ontologies, 2004 ECML/PKDD Conference, Pisa (Italy),
September 2004.
[14] Holsapple, C.W. & A.B. Whinston (1996), Decision
Support Systems: A Knowledge-Based Approach, West.
[15] ICDM (2004), The Fourth IEEE International
Conference on Data Mining, CFP, Brighton (U.K.),
November 2004.
[16] KGGI (2003), Workshop on Knowledge Grid and Grid
Intelligence, The 2003 IEEE/WIC International Conference
on Web Intelligence / Intelligent Agent Technology, Halifax
(Nova-Scotia, Canada), October 2003.

Proceedings of the 16th International Workshop on Database and Expert Systems Applications (DEXA’05)
1529-4188/05 $20.00 © 2005 IEEE

You might also like