Techniques
Chapter 11
Additional Theme: RFID Data Warehousing
and Mining and High-Performance Computing
Tag Reader
Source: www.belgravium.com
shorter paths?
What locations are common to the paths of a set of
defective auto-parts?
Identify containers at a port that have deviated
from their historic paths
Data mining
Find trends, outliers, frequent, sequential, flow
patterns,
03/16/17 Data Mining: Concept 11
Example: A Supply Chain Store
A retailer with 3,000 stores, selling 10,000 items a day
per store
Each item moves 10 times on average before being sold
Movement recorded as (EPC, location, second)
Data volume: 300 million tuples per day (after
redundancy removal)
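The daily volume above is the product of the three figures on this slide. A minimal sketch of that arithmetic (the variable names are illustrative and do not come from the source):

```python
# Figures taken from the slide; variable names are illustrative.
stores = 3_000
items_sold_per_store_per_day = 10_000
moves_per_item = 10  # average number of movements before an item is sold

# Each movement yields one (EPC, location, second) tuple after redundancy removal.
tuples_per_day = stores * items_sold_per_store_per_day * moves_per_item
print(f"{tuples_per_day:,} tuples per day")  # prints "300,000,000 tuples per day"
```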
OLAP Query
Avg time for outerwear items to move from
warehouse to checkout counter in March 2006?
Costly to answer if scanning 1 billion tuples for
March
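To make the cost concrete, here is a rough sketch of how such a query could be answered directly on raw (EPC, location, second) tuples. The function name, the location labels "warehouse" and "checkout", and the sample readings are all assumptions for illustration. Note that it makes a full pass over every tuple, which is the expensive step the slide warns about.

```python
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

Reading = Tuple[str, str, int]  # (EPC, location, second), as on the slide


def avg_transit_seconds(readings: Iterable[Reading],
                        src: str = "warehouse",
                        dst: str = "checkout") -> Optional[float]:
    """Average time from the last reading at `src` to the first reading at `dst`."""
    last_src: Dict[str, int] = {}
    first_dst: Dict[str, int] = {}
    for epc, loc, sec in readings:  # full scan: one pass over every tuple
        if loc == src:
            last_src[epc] = max(sec, last_src.get(epc, sec))
        elif loc == dst:
            first_dst[epc] = min(sec, first_dst.get(epc, sec))
    durations = [first_dst[e] - last_src[e]
                 for e in first_dst
                 if e in last_src and first_dst[e] >= last_src[e]]
    return mean(durations) if durations else None


# Hypothetical readings: e3 never reaches the checkout and is ignored.
sample_readings = [
    ("e1", "warehouse", 0), ("e1", "warehouse", 10), ("e1", "checkout", 110),
    ("e2", "warehouse", 5), ("e2", "checkout", 55),
    ("e3", "warehouse", 7),
]
avg = avg_transit_seconds(sample_readings)
```

A real query would also filter by item category and month; those filters are omitted to keep the sketch short.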
[Figure: bulk movement of goods through the supply chain. Nodes: Factory, Dist. Center 1, Dist. Center 2, store 1, store 2, shelf 2. Shipment sizes shown: 10 pallets (1000 cases); 20 cases (1000 packs); 10 packs (12 sodas).]
Store View:
Transportation View:
a series of locations
e.g., what is the average time that milk stays at
location
If using record transitions: difficult to answer queries, lots of
intersections needed
Map Table: (GID, <GID1,..,GIDn>)
Links together stages that belong to the same path. Provides
additional compression and query processing efficiency
High level GID points to lower level GIDs
manufacturer, price
Cleansed RFID Database
epc   loc  t_in  t_out
r1    l1   t1    t10
r2    l1   t1    t10
r2    l3   t20   t30
r3    l1   t1    t10

Stay Table
gid   loc  t_in  t_out  items
g1    l1   t1    t10    r1,r2,r3
g1.2  l4   t15   t20    r3

Map Table
gid   gids
g1.1  r1,r2
g1.2  r3
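The map table above can be read as a small tree: a high-level GID points to lower-level GIDs, which eventually point to individual items. A minimal sketch of resolving a GID to its items, assuming a dictionary representation. The entry for `g1` is an assumption inferred from the GID naming, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# gid -> lower-level gids or item EPCs.
MAP_TABLE: Dict[str, List[str]] = {
    "g1": ["g1.1", "g1.2"],  # assumed: not shown on the slide
    "g1.1": ["r1", "r2"],    # from the slide
    "g1.2": ["r3"],          # from the slide
}

# Stay records (gid, location, time in, time out), as shown on the slide.
STAY_TABLE: List[Tuple[str, str, str, str]] = [
    ("g1", "l1", "t1", "t10"),
    ("g1.2", "l4", "t15", "t20"),
]


def items_of(gid: str, map_table: Dict[str, List[str]]) -> List[str]:
    """Expand a GID into item EPCs by following map-table pointers."""
    children = map_table.get(gid)
    if children is None:  # not a key in the map table, so treat it as an EPC
        return [gid]
    items: List[str] = []
    for child in children:
        items.extend(items_of(child, map_table))
    return items


def items_at(location: str) -> List[str]:
    """Items that stayed at `location`, found via one stay record per group."""
    found: List[str] = []
    for gid, loc, _t_in, _t_out in STAY_TABLE:
        if loc == location:
            found.extend(items_of(gid, MAP_TABLE))
    return found
```

With this layout a single stay record for `g1` stands in for three item-level records at `l1`, which is where the compression comes from.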
Item-level records, e.g. (r2,l2,t3,t4), (rk,l1,t1,t2), (rk,l2,t3,t4)
IO Cost:
One IO per item in locations l1 or l7 or l13
Observation:
Very costly, we retrieve

GID-level records
IO Cost:
One IO per GID in locations l1, l7, and l13
Observation:
Retrieve records
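The IO cost comparison above can be illustrated by counting the records a location query has to touch, charging one IO per record as the slide does. The toy records below are hypothetical and only mimic the shape of item-level and GID-level data.

```python
from typing import Iterable, Set, Tuple

Record = Tuple[str, str, int, int]  # (id, location, time_in, time_out)

# Hypothetical item-level records: one row per item per location.
ITEM_RECORDS = [
    ("r1", "l1", 1, 10), ("r2", "l1", 1, 10), ("r3", "l1", 1, 10),
    ("r1", "l7", 20, 30), ("r2", "l7", 20, 30), ("r3", "l13", 15, 20),
]

# Hypothetical GID-level records: one row per group per location.
GROUP_RECORDS = [
    ("g1", "l1", 1, 10), ("g1.1", "l7", 20, 30), ("g1.2", "l13", 15, 20),
]


def io_count(records: Iterable[Record], locations: Set[str]) -> int:
    """Number of records at the queried locations, at one IO each."""
    return sum(1 for _id, loc, _t_in, _t_out in records if loc in locations)


query_locations = {"l1", "l7", "l13"}
item_ios = io_count(ITEM_RECORDS, query_locations)    # 6 records touched
group_ios = io_count(GROUP_RECORDS, query_locations)  # 3 records touched
```

The gap grows with group size: the item-level count scales with the number of items, the GID-level count with the number of groups.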
[Figure: location hierarchy with levels of n = 1, 3, and 6 locations: l1; l2 l3 l4; l5 l6 l7 l8 l9 l10.]
unique identifier that
[Figure: path tree whose nodes carry prefix-encoded GIDs (0.0, 0.1, 0.0.0, 0.1.0, 0.1.1, 0.0.0.0, 0.1.0.0, 0.1.0.1). Each node shows a location, a stay interval, and an item count, e.g. l3 with t20,t30: 3. Leaves hold the item sets {r1,r2,r3}, {r5,r6}, {r7}, {r8,r9}.]
gid      loc  t_in  t_out  count
0.0.0    l3   t20   t30    3
0.1      l2   t1    t8     3
0.1.0.1  l6   t35   t50    1
0.1.1    l4   t10   t20    2
Compression vs.
Cleansed data size
P=1000, B=(500,150,40,8,1), k = 5
Lossless compression, cuboid is at
the same level of abstraction as
cleansed RFID database
Construction Time
P=1000, B=(500,150,40,8,1), k = 5, N=1,000,000
Savings by constructing from a lower-level cuboid: 50% to 80%
[Figure: two side-by-side copies of the itemset lattice over items A to E]
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
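The lattice above lists every non-empty itemset over the five items A to E, one level per itemset size. A small sketch that regenerates those levels (the function name `itemset_lattice` is an assumption for illustration):

```python
from itertools import combinations
from typing import Iterable, List


def itemset_lattice(items: Iterable[str]) -> List[List[str]]:
    """Non-empty itemsets over `items`, grouped by size (level 1 first)."""
    ordered = sorted(items)
    return [["".join(combo) for combo in combinations(ordered, size)]
            for size in range(1, len(ordered) + 1)]


levels = itemset_lattice("ABCDE")
for level in levels:
    print(" ".join(level))
```

The level sizes are the binomial coefficients 5, 10, 10, 5, 1, for 31 itemsets in total.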
Number of items, number of records, width of
records,
Study the correlation with the characteristics of the
projection
Depth, bushiness, tree size, number of leaves,
fan-out/in,
Result: no rule was found that predicts the projection mining
time from the above parameters
Dynamic Estimation
Runtime sampling
Use the relative mining time of projections on a sample to estimate
their relative mining time on the whole dataset.
e.g. Dataset pumsb
1% random sampling
Becomes accurate when
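The runtime sampling idea above can be sketched as follows: draw a random sample, time a mining routine on each projection of the sample, and normalize the timings into relative shares. This is only an illustration under stated assumptions. Projecting on single items, the function names, and the pluggable `measure` parameter are all hypothetical and not taken from the source.

```python
import random
import time
from typing import Callable, Dict, List, Sequence, Set

Transaction = Set[str]
Miner = Callable[[List[Transaction]], object]


def project(db: Sequence[Transaction], item: str) -> List[Transaction]:
    """Transactions of `db` that contain `item` (a simple item projection)."""
    return [t for t in db if item in t]


def wall_clock(mine: Miner, data: List[Transaction]) -> float:
    """Seconds spent running `mine` on `data`."""
    start = time.perf_counter()
    mine(data)
    return time.perf_counter() - start


def relative_mining_time(db: Sequence[Transaction],
                         items: Sequence[str],
                         mine: Miner,
                         sample_rate: float = 0.01,  # 1% random sampling
                         seed: int = 0,
                         measure: Callable[[Miner, List[Transaction]], float] = wall_clock,
                         ) -> Dict[str, float]:
    """Estimate each projection's share of total mining time from a sample."""
    rng = random.Random(seed)
    sample = [t for t in db if rng.random() < sample_rate]
    raw = {item: measure(mine, project(sample, item)) for item in items}
    total = sum(raw.values())
    if total == 0:  # nothing measurable: fall back to equal shares
        return {item: 1.0 / len(items) for item in items}
    return {item: cost / total for item, cost in raw.items()}


# Deterministic usage: "cost" is the projection size and the sample is the whole database.
toy_db = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}]
shares = relative_mining_time(toy_db, ["a", "b", "c"], mine=len, sample_rate=1.0,
                              measure=lambda mine, data: float(mine(data)))
```

In practice `mine` would be the actual mining routine and `measure` the default wall-clock timer; the estimated shares could then guide how projections are divided among processors.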
[Figures: plots against Processor# (x-axis ticks 1, 2, 4, 8, 16, 32, 64; y-axis ticks 1, 10, 100).
Series, in order of appearance: Par-FP vs. optimal; one-level vs. multi-level vs. optimal; Par-Span vs. optimal; Par-CSP vs. optimal.
Panel titles: T30I0.2D1K, T40I10D100K, T50I5D500K; C100N20T2.5S10I1.25, C200N10T2.5S10I1.25; C200S25N9, Gazelle.]