DBMS Data Mining

Data Mining - Beer and Nappies
On Thursday nights people who buy diapers also tend to buy beer
Introduction
Data is growing at a phenomenal rate
Users expect more sophisticated information
How?
UNCOVER HIDDEN INFORMATION: DATA MINING
Data Mining
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
Data mining
Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms, and machine learning methods (algorithms that improve their performance automatically through experience, such as neural networks or decision trees). Consequently, data mining consists of more than collecting and managing data; it also includes analysis and prediction.
Data Mining Algorithm

Objective: Fit data to a model
  Descriptive
  Predictive
Preference: Technique to choose the best model
Search: Technique to search the data ("query")
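To make the three components concrete, here is a minimal sketch on a hypothetical toy task: the model family is a single income threshold rule, the preference is the number of misclassified records, and the search is an exhaustive scan over candidate thresholds. The data and the names `records`, `errors`, and `fit_threshold` are illustrative assumptions, not part of the slides.

```python
# Hypothetical toy records: (annual income in thousands, responded_to_offer)
records = [(20, False), (35, False), (40, False), (55, True), (60, True), (80, True), (45, True)]

def predict(threshold, income):
    """Model: predict a response when income is at or above the threshold."""
    return income >= threshold

def errors(threshold, data):
    """Preference: count misclassified records (lower is better)."""
    return sum(predict(threshold, x) != y for x, y in data)

def fit_threshold(data):
    """Search: try every observed income as a threshold and keep the best one."""
    candidates = sorted({x for x, _ in data})
    return min(candidates, key=lambda t: errors(t, data))

best = fit_threshold(records)
print(best, errors(best, records))
```

Exhaustive search is only feasible because the model family is tiny; realistic algorithms replace it with greedy or heuristic search.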
Data Mining
Descriptive
Identify and describe groups of customers with common buying behavior
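Grouping customers by buying behavior is usually done with a clustering algorithm. The sketch below uses k-means, one common choice that the slides do not prescribe, on hypothetical two-feature customer profiles (visits per month, average basket value).

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means for a fixed number of iterations: assign each point to its
    nearest centroid, then move each centroid to the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per month, average basket value)
customers = [(1, 20), (2, 25), (1, 22), (8, 90), (9, 95), (10, 85)]
centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The resulting centroids describe each group ("occasional small-basket shoppers" vs. "frequent large-basket shoppers"), which is what makes this a descriptive rather than a predictive use.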
Data Mining
Predictive
Given a customer's characteristics, a model predicts how much the customer will spend on the next catalog order.
Predicting the likelihood (probability) that a customer will respond to an offer
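A predictive model can be as simple as a least-squares line. This sketch assumes a single hypothetical characteristic (number of past orders) and hypothetical spend figures; a real model would use many characteristics and typically a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical history: past orders per customer and amount spent on the next catalog order
past_orders = [1, 2, 3, 4, 5]
next_spend = [30.0, 50.0, 70.0, 90.0, 110.0]

a, b = fit_line(past_orders, next_spend)
new_customer_orders = 6
predicted = a + b * new_customer_orders
print(f"predicted spend: {predicted:.2f}")
```

For the second task (probability of responding to an offer), a classifier that outputs probabilities, such as logistic regression, is a common choice in place of a straight line.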

Data Mining Models and Tasks
Data Mining
Association (purchasing a pen and purchasing paper)
Sequence or path analysis (birth of a child and purchasing diapers)
Classification (duct tape purchases and plastic sheeting purchases)
Clustering (finding and visually documenting groups of previously unknown facts, e.g., geographic location and brand preferences)
Forecasting (discovering patterns from which one can make reasonable predictions regarding future activities, such as …)
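Association rules such as "pen -> paper" are commonly scored by support (the fraction of transactions containing both items) and confidence (the fraction of pen-buying transactions that also contain paper). A minimal sketch over hypothetical transactions:

```python
# Hypothetical market baskets
transactions = [
    {"pen", "paper", "ink"},
    {"pen", "paper"},
    {"pen", "stapler"},
    {"paper", "folder"},
    {"pen", "paper", "folder"},
]

def support(itemset, data):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in data) / len(data)

def confidence(antecedent, consequent, data):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    return support(antecedent | consequent, data) / support(antecedent, data)

rule_support = support({"pen", "paper"}, transactions)
rule_confidence = confidence({"pen"}, {"paper"}, transactions)
print(rule_support, rule_confidence)
```

Here 3 of 5 baskets contain both items and 3 of the 4 pen baskets contain paper, so the rule has support 0.6 and confidence 0.75 (up to floating-point rounding).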
Data Mining
Data mining is knowledge discovery using a sophisticated blend of techniques from traditional statistics, artificial intelligence, and computer graphics. Data mining is the process of semi-automatically analyzing large databases to find interesting and useful patterns. Data mining overlaps with machine learning, statistics, artificial intelligence, and databases.
Goals of Data Mining

Explanatory: To explain some observed event or condition (e.g., why sales of the Maruti Swift have increased in Chennai).
Confirmatory: To confirm a hypothesis (e.g., whether two-income families are more likely to buy family medical coverage than single-income families).
Exploratory: To analyze data for new or unexpected relationships (e.g., what spending patterns are likely to accompany credit card fraud).

Issues in data mining

Data quality: the accuracy and completeness of the data being analyzed.
Interoperability of the data mining software and databases being used by different agencies.
Mission creep: the use of data for purposes other than those for which the data were originally collected.
Privacy.
Advanced forms of Data Mining

Web mining
Spatial mining
Temporal mining
Web mining
Crawlers
  Robot (spider)
  Focused crawler
PageRank (backlinks)
Personalization
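PageRank scores a page by the ranks of the pages that link to it (its backlinks). Below is a minimal power-iteration sketch on a hypothetical four-page link graph, assuming the commonly used damping factor of 0.85 and treating a page with no out-links as linking to every page.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration: each page shares its rank equally among the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A dangling page (no out-links) spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical web graph: page -> pages it links to
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C has the most backlinks and ends up ranked highest, while D, which nothing links to, keeps only the baseline share (1 - 0.85) / 4.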
Spatial mining
Goal: data mining on spatial data
Spatial selection may involve specialized selection comparison operations:
  Near
  North, South, East, West
  Contained in
  Overlap/intersect
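The comparison operations listed above can be sketched as simple predicates on points and axis-aligned rectangles. This is a simplification assumed for illustration: real spatial databases use richer geometry types and index structures such as R-trees, and the `max_distance` cutoff for "near" is a hypothetical parameter.

```python
import math
from typing import NamedTuple

class Point(NamedTuple):
    x: float
    y: float

class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def near(a: Point, b: Point, max_distance: float) -> bool:
    return math.dist(a, b) <= max_distance

def north_of(a: Point, b: Point) -> bool:
    """True if a lies north of b (larger y); south, east, and west are analogous."""
    return a.y > b.y

def contained_in(p: Point, r: Rect) -> bool:
    return r.xmin <= p.x <= r.xmax and r.ymin <= p.y <= r.ymax

def overlaps(r: Rect, s: Rect) -> bool:
    """Rectangles intersect unless one lies entirely to one side of the other."""
    return not (r.xmax < s.xmin or s.xmax < r.xmin or r.ymax < s.ymin or s.ymax < r.ymin)

store = Point(2.0, 3.0)
warehouse = Point(2.5, 1.0)
district = Rect(0.0, 0.0, 4.0, 4.0)
zone = Rect(3.0, 3.0, 6.0, 6.0)
print(near(store, warehouse, 2.5), north_of(store, warehouse),
      contained_in(store, district), overlaps(district, zone))
```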
Temporal mining
Goal: data mining for temporal data
  Time series
  Pattern detection
  Sequences
  Temporal association rules
HR database
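One of the listed tasks, sequence mining, counts how many customers' time-ordered event histories contain a given pattern (for example, a child's birth followed later by a diaper purchase). A minimal sketch with hypothetical, already time-sorted histories; the function names are illustrative.

```python
def contains_subsequence(history, pattern):
    """True if the pattern's events occur in history in order (not necessarily adjacent)."""
    events = iter(history)
    # Each `any` consumes the shared iterator up to the next match, preserving order.
    return all(any(event == wanted for event in events) for wanted in pattern)

def sequence_support(histories, pattern):
    """Fraction of histories that contain the pattern as an ordered subsequence."""
    return sum(contains_subsequence(h, pattern) for h in histories) / len(histories)

# Hypothetical per-customer event histories, each sorted by time
histories = [
    ["birth_of_child", "buy_crib", "buy_diapers"],
    ["buy_diapers", "birth_of_child"],
    ["birth_of_child", "buy_diapers", "buy_beer"],
    ["buy_beer"],
]
print(sequence_support(histories, ["birth_of_child", "buy_diapers"]))
```

The second history contains both events but in the wrong order, so it does not count; order is what distinguishes a sequence pattern from a plain association.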
Temporal Database
Snapshot: Traditional database
Temporal: Multiple time points
Types of Database (Temporal)

Snapshot: No temporal support
Transaction time: Supports the time when a transaction inserted the data (timestamp)
Valid time: Supports the time range when data values are valid (range)
Bitemporal: Supports both transaction time and valid time
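A bitemporal table stores, for each row, both a valid-time range and a transaction-time range. The sketch below uses a hypothetical HR salary record and half-open [start, end) ranges, which is one common convention, to answer "what did the database believe on date T about the salary valid on date V?". The names `SalaryRow`, `as_of`, and `FOREVER` are illustrative.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # marker for an open-ended range

@dataclass(frozen=True)
class SalaryRow:
    employee: str
    salary: int
    valid_from: date   # valid time: when the salary applies in the real world
    valid_to: date
    tx_from: date      # transaction time: when the database recorded the row
    tx_to: date

def as_of(rows, employee, valid_on, known_on):
    """Return the salary valid on `valid_on`, as the database recorded it on `known_on`."""
    for r in rows:
        if (r.employee == employee
                and r.valid_from <= valid_on < r.valid_to
                and r.tx_from <= known_on < r.tx_to):
            return r.salary
    return None

rows = [
    # Recorded on 2023-01-01: salary 50000 from 2023-01-01 onward.
    SalaryRow("Asha", 50000, date(2023, 1, 1), FOREVER, date(2023, 1, 1), date(2023, 7, 1)),
    # Entered on 2023-07-01: a retroactive raise to 55000 effective 2023-06-01.
    SalaryRow("Asha", 50000, date(2023, 1, 1), date(2023, 6, 1), date(2023, 7, 1), FOREVER),
    SalaryRow("Asha", 55000, date(2023, 6, 1), FOREVER, date(2023, 7, 1), FOREVER),
]

print(as_of(rows, "Asha", valid_on=date(2023, 6, 15), known_on=date(2023, 6, 20)))  # belief before the correction
print(as_of(rows, "Asha", valid_on=date(2023, 6, 15), known_on=date(2023, 8, 1)))   # belief after the correction
```

Dropping the transaction-time columns would leave a valid-time table, dropping the valid-time columns a transaction-time table, and dropping both a snapshot table.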
Database Searching vs. Data Mining

         Database searching             Data mining
Query    Well defined; SQL              Poorly defined; no precise query language
Data     Operational data               Not operational data
Output   Precise; subset of database    Fuzzy; not a subset of database
Query Examples
Database
  Find all credit applicants with the last name Smith.
  Identify customers who have purchased more than $10,000 in the last month.
  Find all customers who have purchased milk.
Data mining
  Find all credit applicants who are poor credit risks. (classification)
  Identify customers with similar buying habits. (clustering)
  Find all items which are frequently purchased with milk. (association rules)
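The contrast between the two kinds of query can be shown on one hypothetical purchases table using Python's built-in `sqlite3` module. The first query returns a precise subset of the database; the second counts which items share a basket with milk, a first step toward association rules rather than a complete rule-mining algorithm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (basket_id INTEGER, customer TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "Smith", "milk"), (1, "Smith", "bread"),
        (2, "Jones", "milk"), (2, "Jones", "bread"), (2, "Jones", "eggs"),
        (3, "Lee", "beer"), (3, "Lee", "chips"),
        (4, "Patel", "milk"), (4, "Patel", "eggs"),
    ],
)

# Database search: a well-defined question whose answer is a precise subset of the data.
milk_buyers = [row[0] for row in conn.execute(
    "SELECT DISTINCT customer FROM purchases WHERE item = 'milk' ORDER BY customer")]

# Data-mining flavour: which items appear in the same basket as milk, and how often?
co_occurring = conn.execute("""
    SELECT other.item, COUNT(*) AS baskets
    FROM purchases AS milk
    JOIN purchases AS other
      ON other.basket_id = milk.basket_id AND other.item <> 'milk'
    WHERE milk.item = 'milk'
    GROUP BY other.item
    ORDER BY baskets DESC, other.item
""").fetchall()

print(milk_buyers)
print(co_occurring)
```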
Data Mining vs. KDD

Knowledge Discovery in Databases (KDD): The process of finding useful information and patterns in data.
Data Mining: The use of algorithms to extract the information and patterns derived by the KDD process.
KDD Process
Selection: Obtain data from various sources.
Preprocessing: Cleanse data.
Transformation: Convert to a common format; transform to a new format.
Data mining: Obtain the desired results.
Interpretation/evaluation: Present results to the user in a meaningful manner.
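The five KDD steps can be read as a pipeline of functions, each feeding the next. Everything in this sketch is hypothetical: two in-memory "sources" with inconsistent formats, a cleaning step that drops incomplete records, a conversion to a common format, a trivial stand-in for the mining step (average amount per region), and a readable report.

```python
def select(sources):
    """Selection: gather raw records from several sources."""
    return [record for source in sources for record in source]

def preprocess(records):
    """Preprocessing: drop records that have a None or empty-string value."""
    return [r for r in records if all(v not in (None, "") for v in r.values())]

def transform(records):
    """Transformation: convert to a common format (trimmed lower-case region, float amount)."""
    return [{"region": r["region"].strip().lower(), "amount": float(r["amount"])} for r in records]

def mine(records):
    """Data mining (trivial stand-in): average amount per region."""
    totals, counts = {}, {}
    for r in records:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
        counts[r["region"]] = counts.get(r["region"], 0) + 1
    return {region: totals[region] / counts[region] for region in totals}

def interpret(result):
    """Interpretation/evaluation: present results in a readable form."""
    return [f"{region}: {avg:.2f}" for region, avg in sorted(result.items())]

store_db = [{"region": "South ", "amount": "120.0"}, {"region": "North", "amount": "80"}]
web_db = [{"region": "south", "amount": 100}, {"region": "north", "amount": None}]

report = interpret(mine(transform(preprocess(select([store_db, web_db])))))
print("\n".join(report))
```

Without the transformation step, "South " and "south" would be counted as different regions, which is why conversion to a common format precedes mining.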
Data Warehousing
A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data.
Subject-oriented: Contains information regarding objects of interest for decision support, e.g., sales by region, by product, etc.
Integrated: Data are typically extracted from multiple, heterogeneous data sources (e.g., sales, inventory, and billing databases).
Time-variant: Contains historical data, covering a longer horizon than an operational system.
Nonvolatile: Data are not (or only rarely) updated directly.
Why build a data warehouse

Access to data from multiple sources, giving a comprehensive data collection
Separation of transactional and analysis systems: improved query response time (without slowing down transaction processing)
Easy formulation of complex queries
Access to historical data (not kept in operational systems)
Improved data quality (fewer errors and missing values)
Data Warehouse Back-End Tools

Data extraction: Extract data from multiple, heterogeneous, and external sources
Data cleaning (scrubbing): Detect errors in the data and rectify them when possible
Data converting: Convert data from legacy or host format to warehouse format
Transforming: Sort, summarize, compute views, check integrity, and build indices
Refresh: Propagate the updates from the data sources to the warehouse
Database                                   Data warehouse
Application oriented (OLTP)                Subject oriented (OLAP)
Used to run business                       Used to analyze business
Clerical user                              Manager/analyst
Detailed data                              Summarized and refined data
Current, up-to-date data                   Historical data
Operational data                           Integrated data
Repetitive access by small transactions    Ad-hoc access using large queries
Fast response time (seconds)               Slow response time (minutes)
Read/update access                         Mostly read access (batch update)
Relational schema                          Star/snowflake schema
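A star schema places numeric measures in a central fact table surrounded by dimension tables. A minimal sketch with Python's `sqlite3`, assuming hypothetical `sales`, `product`, and `store` tables and made-up amounts, followed by the kind of summarizing query a manager or analyst would run:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the facts.
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
    -- Fact table: one row per sale, with a key into each dimension and a numeric measure.
    CREATE TABLE sales (
        product_id INTEGER REFERENCES product(product_id),
        store_id INTEGER REFERENCES store(store_id),
        sale_year INTEGER,
        amount REAL
    );
    INSERT INTO product VALUES (1, 'Swift', 'Car'), (2, 'Helmet', 'Accessory');
    INSERT INTO store VALUES (1, 'Chennai', 'South'), (2, 'Delhi', 'North');
    INSERT INTO sales VALUES (1, 1, 2023, 600.0), (1, 1, 2024, 750.0),
                             (2, 1, 2024, 20.0), (1, 2, 2024, 500.0);
""")

# Analytical query: total sales by region and category, joining the fact table to its dimensions.
rows = conn.execute("""
    SELECT s.region, p.category, SUM(f.amount) AS total
    FROM sales AS f
    JOIN store AS s ON s.store_id = f.store_id
    JOIN product AS p ON p.product_id = f.product_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()
print(rows)
```

In a snowflake schema the dimension tables would be normalized further, for example by moving `region` out of `store` into its own table.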
On-Line Analytical Processing OLAP

Front end to the data warehouse, allowing easy data manipulation
Allows conducting inquiries over the data at various levels of abstraction
Fast and easy because some aggregations are computed in advance
No need to formulate the entire query
Uses data in a multidimensional format (e.g., data cubes) to simplify querying and improve response time
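Aggregations "computed in advance" can be illustrated by precomputing the total for every group-by combination of the dimensions (the cells of a data cube), so that a roll-up or slice becomes a dictionary lookup. A minimal sketch over hypothetical sales facts, using `"ALL"` as an assumed marker for a dimension that has been aggregated away:

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "product", "year")

# Hypothetical sales facts
facts = [
    {"region": "South", "product": "Swift", "year": 2023, "amount": 600.0},
    {"region": "South", "product": "Swift", "year": 2024, "amount": 750.0},
    {"region": "North", "product": "Swift", "year": 2024, "amount": 500.0},
    {"region": "South", "product": "Helmet", "year": 2024, "amount": 20.0},
]

def build_cube(rows, dimensions):
    """Precompute SUM(amount) for every subset of dimensions (2**d group-bys)."""
    cube = defaultdict(float)
    for r in range(len(dimensions) + 1):
        for kept in combinations(dimensions, r):
            for row in rows:
                key = tuple(row[d] if d in kept else "ALL" for d in dimensions)
                cube[key] += row["amount"]
    return dict(cube)

cube = build_cube(facts, DIMENSIONS)
print(cube[("South", "ALL", "ALL")])   # roll-up to region
print(cube[("ALL", "Swift", 2024)])    # slice by product and year
print(cube[("ALL", "ALL", "ALL")])     # grand total
```

The number of group-bys doubles with each added dimension, which illustrates how precomputing all relationships can make the stored data grow quickly.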
Data Mining Vs. Data Warehouse

Data mining: Application of methods (algorithms) to discover patterns in data; includes some OLAP operations
OLAP: A deductive process that tests whether hypothesized patterns exist in the data
  Good for exploring the data and testing hypotheses
Data mining mostly refers to modeling the underlying data
  Uncovering patterns in data
  Potentially surprising patterns may arise
Data mining methods may use data from a data warehouse (when available)
Data Mining + Data Warehouse

Data Warehousing provides the Enterprise with a memory
Data Mining provides the Enterprise with intelligence
MDDBMS
The multidimensional data model emerged over the past 10-15 years.
MDDBMS is the Rubik's Cube of database management systems.
Focuses on analyzing the data, not recording transactions.
Data is categorized either as facts with numerical measures or as dimensions that characterize the facts.
MDDBMS
Takes data from many sources, such as RDBMSs, legacy systems, etc.
Data is physically stored on disk in a data structure that is highly optimized for multidimensional processing and fast retrieval.
Storage is between 2 and 10 times more efficient than in an RDBMS due to better indexing, compression, and representation of sparse data.
Benefits
Queries are simply a request to see pre-existing data organized in a specific fashion.
Data is already highly organized, so the requested data is removed and reorganized.
Stores information in the same way that it is viewed (less data management and maintenance).
The drawbacks
Not the best solution for every problem
Works only on information with interrelations
Database explosion with large amounts of sparse data (calculating all relationships can increase the database size dramatically)
Example
MDDBMS are an important tool in KM,
Thank You