Luc Bouganim,
INRIA, France
Ioannis Koltsidas
IBM Research, Switzerland
Stratis D. Viglas
University of Edinburgh, United Kingdom
Why Bother?
Disk is disk
~650 million units shipped in 2010
PCM is coming
100x faster
10 million write cycles
[Papandreou et al., IMW 2011]
                2000
HDD Capacity    200 GB          x10     2 TB
HDD GB/$        0,05            x600    30
HDD IOPS        200             x1      200
SSD Capacity    14 GB (2001)    x20     256 GB
SSD GB/$        3x10E-4         x1000   0,5
SSD IOPS        10E3 (SCSI)     x1000   10E6+ (PCIe), 5x10E3+ (SATA)
PCM Capacity
PCM IOPS
and a Fact
SSD-based Systems
SSD-based blades
Scaled up
Netezza TwinFin
Oracle Exadata
Block Device
SSDs and HDDs provide the same memory abstraction: a block device interface
ERASE (address)
Strong Modularity
SSDs and HDDs provide the same memory abstraction: a block device interface
application
Design Assumptions
=> In fact, DBMS design is very much based on disk characteristics:
(1) locality in the logical space preserved in the physical space,
(2) sequential access is faster than random access.
[Figure: hard disk anatomy: tracks, platter, spindle, read/write head, actuator, disk arm, controller, disk interface]
Random accesses are avoided
Sequential accesses are favored: extent-based allocation, clustering
Page-based IO quantization; identical representation in memory and on disk
Write-ahead logging; physiological logging
Tutorial Outline
1. Introduction (Philippe)
2. Flash devices characteristics (Luc)
3. Data management for flash devices (Stratis)
4. Two outlooks (Stratis & Philippe)
- They use different strategies but start from the same IO traces of that algorithm and own an MTRON and 2 identical INTEL X25-M SSDs.
[Figure: IO traces of Algorithm X; the two INTEL X25-M SSDs are the same model with the same firmware, one never used and one used]
Configuration File
IOS   1      2      4      8
SR    70     81     104    150
RR    87     98     122    167
SW    51     64     85     129
RW    9023   8723   8686   8682
[Figure: Configuration File, IO Traces, Simulator, Results]
runs long tests on the same SSDs and obtains his own basic performance numbers. Then, he proceeds as Bob.
- Dave does not like simulation and runs the traces directly on the SSDs.
[Figure: results on the MTRON and on the INTEL X25 (used and never used) for Alice (datasheet), Bob (simple calibration), Charlie (long calibration) and Dave (run on the MTRON, on the used X25, on the new X25)]
- We hit a wall with the black box approach, so we open the box, i.e., the FTL, and look at FTL techniques.
The Good
The Bad
Pages must be programmed sequentially within the block (256 pages)
Flash chips
A bit of electronics to understand flash chip constraints and trends
Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011
Flash cells
- Flash cell: resembles a semiconductor transistor
  - 2 gates instead of 1
  - Floating gate insulated all around by an oxide layer
[Figure: flash cell cross-section: control gate, floating gate, two N+ regions, P substrate]
NAND
[Figure: voltages applied to NAND cells for programming and erasing (20 V and 0 V)]
MLC
[Figure: flash organization: 1 flash cell (control gate, floating gate), 1 page, 256 pages/block; labels 4 KB, 1 MB, 16 GB and 150, 1000, 3000]
Program Disturb
- Some cells not being programmed receive elevated voltage stress (near the cells being programmed)
- Program Disturb
- NAND process migration: faster than Moore's Law (today 20 nm)
- More bits/cell:
  - SLC (1), MLC (2), TLC (3)
  - Lifetime decreases
The Good
The hardware!
- A flash device contains many (e.g., 32, 64) flash chips and provides inter-chip parallelism
The Bad
Pages must be programmed sequentially within the block (256 pages)
And The FTL
[Figure: the FTL inside the SSD. On the block interface side (read sector, write sector): no constraint! On the flash chip side (read page, program page): constraints. The FTL provides mapping, garbage collection and wear leveling]
[Figure: the DBMS issues read sector and write sector requests (no constraint!) to the SSD; the FTL (mapping, garbage collection, wear leveling) sits in front of the flash chips and their constraints (read page, program page, erase block)]
- No safe assumption can be made on the device behavior (black box)
  - e.g., Random writes are expensive
- No safe assumption on the benchmark usage!
- IO cost is highly variable and depends on the whole device history!
[Figure: response time (s) over a run with phases of sequential reads, random writes and a pause]
[Figure: response time (s) versus IO size (KB)]
IO Size = 4KB
[Figure: bar chart for the patterns SR, RR, SW and RW; annotation: Fully written]
Is it a problem?
(FMS 2011)
Opening the black box!
- Problem: the table is too large! (1 GB for 1 TB flash) (4KB pages)
[Figure: page mapping of Block 1, Block 2 and Block 3. SRAM holds the Global Translation Directory and the Cached Mapping Table; flash holds translation blocks and data blocks]
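The 1 GB figure for the mapping table can be checked with a short calculation. The sketch below assumes one mapping entry per 4 KB page and 4 bytes per entry; the entry size is an assumption, since the slide only states the totals.

```python
# Back-of-the-envelope size of a page-level mapping table.
# entry_bytes = 4 is an assumption; the slide only gives the totals.
KIB, GIB, TIB = 2**10, 2**30, 2**40


def page_map_bytes(flash_bytes: int, page_bytes: int, entry_bytes: int = 4) -> int:
    """Return the table size when every flash page needs one mapping entry."""
    return (flash_bytes // page_bytes) * entry_bytes


if __name__ == "__main__":
    table = page_map_bytes(1 * TIB, 4 * KIB)
    print(table / GIB, "GiB")  # 1.0 GiB
```

A table of that size does not fit in device SRAM, which is why only part of it is cached there.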
- Hybrid Mapping
[Figure: logical and physical blocks, Block 0 to Block 3]
[Figure: merge operations on Block 0 and its log block Log(Block0): switch merge, partial merge and full merge; the old blocks are erased and a new Block 0 results]
FTL-Wear leveling
- Goal: ensure that all blocks of the flash have about the same erase count (i.e., number of program/erase cycles).
- Difficulties:
FTL: Trends
[Figure: trends around the three FTL components: mapping, garbage collection, wear leveling]
- Hybrid mapping
- Detect sequential or semi-random writes
- Temporal/spatial locality?
- Caching
- Compression / deduplication
- Adaptivity
- Background / on demand
- TRIM management
- Security / encryption
- Consider hot/cold data
- Dynamic / static WL
- Pros
- Cons
[Figure: DBMS on top of the flash chips]
- Inter-blocks: Random
- Intra-block: Sequential
- Example with 3 blocks of 10 pages:
[Figure: IO address (0 to 40) over time]
IO addresses in time order: 0, 10, 11, 1, 20, 21, 22, 2, 23, 24, 12, 3, 13, 14, 4, 25, 26, 15, 5, 16, 27, 7, 17, 18, 19, 28, 8, 29
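The pattern above (random across blocks, sequential inside a block) can be checked mechanically. This is a minimal sketch, assuming "sequential" means that page addresses inside one block only ever increase; the function name and the sample trace are hypothetical.

```python
from typing import Dict, Iterable


def is_semi_random(trace: Iterable[int], pages_per_block: int) -> bool:
    """True if, inside every block, addresses appear in strictly increasing order.

    Jumps between blocks are allowed at any time.
    """
    last_seen: Dict[int, int] = {}
    for addr in trace:
        block = addr // pages_per_block
        if block in last_seen and addr <= last_seen[block]:
            return False  # went backwards (or rewrote a page) inside a block
        last_seen[block] = addr
    return True


if __name__ == "__main__":
    # Hypothetical trace over 3 blocks of 10 pages.
    print(is_semi_random([0, 10, 11, 1, 20, 21, 2, 22, 12, 3], 10))  # True
    print(is_semi_random([0, 2, 1], 10))                             # False
```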
- Provide a tunnel for those IOs that respect constraints C1-C3 ensuring maximal performance
- Manage other unconstrained IOs in best effort
- Minimize interferences between these two modes of operation
- Pros
  - Flexible
  - Maximal performance and control for the DBMS for constrained IOs
- Cons
  - No behavior guarantees for unconstrained IOs.
[Figure: the DBMS sends constrained patterns (C1, C2, C3) and unconstrained patterns towards the flash chips]
[Figure: two blocks with CurPos=6. One holds Page 0 to Page 5 in order; the other holds Page 0, Page 1, Page 1, Page 1, Page 0, Page 2 and has Flag = Non-Optimal]
- No interferences!
- No change to the block device interface:
  - Need to expose two constants: block size and page size
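[Figure: block state diagram. Free (CurPos = 0); a write at @ = CurPos keeps the block Optimal and increments CurPos; a write at @ ≠ CurPos makes the block Non-optimal, where every write still increments CurPos; TRIM returns the block to Free. Garbage collector actions turn a block with Flag = Non-Optimal (Page 0, Page 1, Page 1, Page 1, Page 0, Page 2; CurPos=6) into a block with Flag = Optimal (Page 0, Page 1, Page 2; CurPos=3)]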
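The block states in the diagram can be simulated in a few lines. This is only a sketch of one possible reading of the figure: the class name, the `log` list and the compaction rule in `garbage_collect` are assumptions made for illustration.

```python
from typing import List


class Block:
    """Tracks CurPos and the Optimal / Non-Optimal flag of one flash block (sketch)."""

    def __init__(self, pages_per_block: int) -> None:
        self.pages_per_block = pages_per_block
        self.log: List[int] = []  # logical page stored at each physical position
        self.optimal = True

    @property
    def cur_pos(self) -> int:
        return len(self.log)

    def write(self, page: int) -> None:
        if self.cur_pos >= self.pages_per_block:
            raise RuntimeError("block is full")
        if page != self.cur_pos:
            self.optimal = False   # write at @ != CurPos
        self.log.append(page)      # CurPos++ in both modes

    def trim(self) -> None:
        """TRIM returns the block to the Free state (CurPos = 0)."""
        self.log.clear()
        self.optimal = True

    def garbage_collect(self) -> None:
        """Assumed compaction: keep one version per page if they form 0..k-1."""
        live = sorted(set(self.log))
        if live == list(range(len(live))):
            self.log = live
            self.optimal = True


if __name__ == "__main__":
    b = Block(pages_per_block=8)
    for p in [0, 1, 1, 1, 0, 2]:
        b.write(p)
    print(b.optimal, b.cur_pos)  # False 6
    b.garbage_collect()
    print(b.optimal, b.cur_pos)  # True 3
```

The usage example reproduces the two blocks shown in the figure: six appended writes leave CurPos at 6, and compaction brings it back to 3.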
Summary
- Flash chips
- Hardware constraints!
[Figure: HW constraints with complex FTLs give unpredictable performance and no stable design; HW constraints with simple or bimodal FTLs give a stable design]
Flash: a disruptive technology
- Orders of magnitude better performance than HDD
- Low power consumption
- Dropping prices
Idea: Throw away HDDs and replace everything with Flash SSDs
- Not enough capacity
- Not enough money to buy the not-enough-capacity
However, Flash fits very well between DRAM and HDD
- DRAM/Flash/HDD price ratio: ~100:10:1 per GB
- DRAM/Flash/HDD latency ratio: ~1:10:100
Integrate Flash into the storage hierarchy
- Complement existing DRAM memory and HDDs
Outline
- Flash-based device design
  - Solid state drives
  - Making SSDs database-friendly
- System-level challenges
  - Hybrid systems
  - Storage, buffering and caching
  - Indexing on flash
  - Query and transaction processing
Flash-based Solid State Drives
- Common I/O interface
  - Block-addressable interface
- No mechanical latency
  - Access latency independent of the access pattern
  - 30 to 50 times more efficient in IOPS/$ per GB than HDDs
- Read / Write asymmetry
  - Reads are faster than writes
  - Erase-before-write limitation
- Limited endurance / wear leveling
  - 5 year warranty for enterprise SSDs (assuming 10 complete re-writes per day)
- Energy efficiency
  - 100-200 times more efficient than HDDs in IOPS / Watt
- Physical properties
  - Resistance to extreme shock, vibration, temperature, altitude
  - Near-instant start-up time
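The warranty assumption (10 complete re-writes per day for 5 years) translates into a number of full-device overwrites. A small worked example follows; the helper names and the 100 GB capacity are illustrative assumptions, not figures from the slide.

```python
def full_device_writes(years: int, rewrites_per_day: int, days_per_year: int = 365) -> int:
    """Number of complete device overwrites implied by the warranty terms."""
    return years * days_per_year * rewrites_per_day


def total_bytes_written(capacity_bytes: int, years: int, rewrites_per_day: int) -> int:
    """Host bytes written over the warranty period for a given capacity."""
    return capacity_bytes * full_device_writes(years, rewrites_per_day)


if __name__ == "__main__":
    print(full_device_writes(5, 10))                         # 18250
    # Hypothetical 100 GB drive: total volume in petabytes.
    print(total_bytes_written(100 * 10**9, 5, 10) / 10**15)  # 1.825
```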
SSD architecture
- Various form factors
  - PCI-e
  - SAS (1.8" - 3.5")
  - SATA (1.8" - 3.5")
- Number of channels
  - 4 to 16 or more
- RAM buffers
  - 1MB up to more than 256MB
- Over-provisioning
  - 10% up to 40%
- Command Parallelism
  - Intra-command
  - Inter-command
  - Reordering / merging (NCQ)
[Figure: SSD Architecture: host interface, request handler, micro-processor, RAM, data buffer, channel controllers with ECC attached to flash chips; firmware modules: LBA-to-PBA mapper, write page allocator, garbage collector, wear leveling; metadata cache with LBA-to-PBA map, bad block list and free block queue]
!"#$%&#'()!*&+,(-#&.+*&/0.(1&2'#(3(!-14(
KVP'FXH'AGA';<<&'J$"!R%!!'?6XA'
ia_P'FXH'AG@G';<<&'JS!'?6XA'
Off-the-shelf SSDs
                         A            B            C            D            E
Form Factor              PATA Drive   SATA Drive   SATA Drive   SAS Drive    PCI-e card
                         Consumer     Consumer     Consumer     Enterprise   Enterprise
Flash Chips              MLC          MLC          MLC          SLC          SLC
Capacity                 32 GB        100 GB       160 GB       140 GB       450 GB
Read Bandwidth           53 MB/s      285 MB/s     250 MB/s     220 MB/s     700 MB/s
Write Bandwidth          28 MB/s      250 MB/s     100 MB/s     115 MB/s     500 MB/s
Random 4kB Read IOPS     3.5k         30k          35k          45k          140k         (~1 order of magnitude)
Random 4kB Write IOPS    0.01k        10k          0.6k         16k          70k          (> 2 orders of magnitude)
Street Price             ~15 $/GB     ~4 $/GB      ~2.5 $/GB    ~18 $/GB     ~38 $/GB
                         (2007)       (2010)       (2010)       (2011)       (2009)
Read latency
4 KB Random Reads uniformly distributed over the whole medium
50% random data
[Figure: latency (ms), 0 to 1, versus IOPS, 0 to 160000]
Write latency
4 KB Random Writes uniformly distributed over the whole medium
50% random data
[Figure: latency (ms), 0 to 3, versus IOPS, 0 to 45000]
Mixed workload - Read latency
4 KB I/O operations uniformly distributed over the whole medium
50% random data, Queue depth = 32
[Figure: Average Read Latency: response time (ms), 0 to 4, versus % of writes, 0% to 100%, for devices B, C, D and E]
Mixed workload - Write latency
4 KB I/O operations uniformly distributed over the whole medium
50% random data, Queue depth = 32
[Figure: Average Write Latency, two panels: response time (ms) versus % of writes, 0% to 100%, for devices B, C, D and E]
[Lee & Moon, SIGMOD 2007]
In-page logging
- Page updates are logged
  - Log sector for each DB page
    - Allocated when the page becomes dirty
  - Log region in each flash block
  - Page write-backs only involve log-sector writes
    - Until a merge is required
- Upon read:
  - Fetch log records from flash
  - Apply them to the in-memory page
- Same or greater number of writes
  - But, significant reduction of erasures
- However:
  - The DBMS needs to control physical placement
  - Partial flash page writes are involved
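The in-page logging steps can be illustrated with a toy model of one erase unit. This is a simplified sketch, not the published design: pages are modelled as dictionaries, a log record is a (page id, key, value) tuple, and the names `IPLBlock`, `flush_log_sector` and `log_capacity` are invented for the example.

```python
from typing import Dict, List, Tuple


class IPLBlock:
    """One flash erase unit: data pages plus a small log region (toy model)."""

    def __init__(self, pages: Dict[int, dict], log_capacity: int) -> None:
        self.pages = {pid: dict(rec) for pid, rec in pages.items()}  # page images on flash
        self.log: List[Tuple[int, str, object]] = []                 # (page id, key, value)
        self.log_capacity = log_capacity
        self.erases = 0

    def flush_log_sector(self, page_id: int, updates: List[Tuple[str, object]]) -> None:
        """Write-back of a dirty page: only its log records are written."""
        if len(updates) > self.log_capacity:
            raise ValueError("log sector larger than the log region")
        if len(self.log) + len(updates) > self.log_capacity:
            self.merge()  # log region full: a merge is required
        self.log.extend((page_id, k, v) for k, v in updates)

    def merge(self) -> None:
        """Apply all log records to the data pages; costs one erase of the unit."""
        for pid, key, value in self.log:
            self.pages[pid][key] = value
        self.log.clear()
        self.erases += 1

    def read(self, page_id: int) -> dict:
        """Upon read: fetch the page, then apply its log records in order."""
        page = dict(self.pages[page_id])
        for pid, key, value in self.log:
            if pid == page_id:
                page[key] = value
        return page


if __name__ == "__main__":
    blk = IPLBlock({1: {"a": 1}, 2: {"b": 2}}, log_capacity=2)
    blk.flush_log_sector(1, [("a", 10)])
    print(blk.read(1), blk.erases)  # {'a': 10} 0
```

In this model three page write-backs cause a single erase, which is the effect the slide describes as a reduction of erasures.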
Parallelism?
- The internal device geometry is critical to achieve maximum parallelism
  - The DBMS needs to be aware of the geometry to some degree
- Logical block size vs. flash block size
  - Logical block size relevant to number of channels, pipelining capabilities on each channel, etc.
- Still, some operations are more efficient on hardware
  - Mapping of the address space to flash planes, dies and channels
  - ECC, encryption etc.
  - Wear-leveling still needs to be done by the device firmware
[Figure: a logical block (0 1 2 3) striped over channel controllers with ECC, each attached to flash chips]
SSDs - summary
- Flash memory has the potential to remove the I/O bottleneck
  - Especially for read-dominated workloads
- "SSD": multiple classes of devices
  - Excellent random read latency is universal
  - Read and write bandwidth varies widely
  - Dramatic difference across random write latencies
  - Dramatic differences in terms of economics: $/GB cost, power consumption, expected lifetime, reliability
- A lot of research to be done towards defining DBMS-specific interfaces
Storage, buffering and caching
[Figure: buffer pool (demand paging, eviction), SSD cache, SSD persistent storage, HDD persistent storage; the same figure is repeated for each configuration below]
- Flash memory for persistent storage
- Hybrid storage layer
- Flash memory as cache
Hybrid systems
- Problem setup
  - SSDs are becoming cost-effective, but still not ready to replace HDDs in the enterprise
  - Certainly conceivable to have both SSDs and HDDs at the same level of the storage hierarchy
- Research questions
  - How can we take advantage of the SSD characteristics when designing a database?
  - How can we optimally place data across both types of medium?
- Methodologies
  - Workload detection for data placement
  - Load balancing to minimize response time
  - Caching data between disks
[K & V, VLDB 2008]
Workload-driven page placement
- Flash memory and HDD at the same level of the storage hierarchy
- Monitor page use by keeping track of reads and writes
  - Logical operations (i.e., references only)
  - Physical operations (actually touching the disk)
  - Hybrid model (logical operations manifested as physical ones)
- Identify the workload of a page and appropriately place it
  - Read-intensive pages on flash
  - Write-intensive pages on HDD
  - Migrate pages when they have expensed their cost if erroneously placed
[Figure: user-level page I/O reaches the buffer manager (replacement policy) as logical I/O operations; requests for physical I/O reach the storage manager, which issues physical I/O operations and page migrations to the SSD and the HDD in the storage layer]
[Figure: 2-state task system with the states flash and HDD, driven by read/write operations]
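The migration rule ("migrate pages when they have expensed their cost if erroneously placed") can be sketched as a per-page accumulator. This is one possible reading of the 2-state idea, not the published algorithm; the cost values in `COST` are hypothetical and chosen so that flash reads are cheap and flash writes are expensive.

```python
# Hypothetical per-operation costs (arbitrary units); not taken from the source.
COST = {
    "flash": {"r": 1.0, "w": 20.0},
    "hdd": {"r": 10.0, "w": 12.0},
}


class PagePlacement:
    """Tracks how much extra cost a page has paid on its current medium."""

    def __init__(self, medium: str = "hdd") -> None:
        self.medium = medium
        self.regret = 0.0  # accumulated extra cost versus the other medium

    def access(self, op: str) -> str:
        """Record one physical operation ('r' or 'w'); return the medium afterwards."""
        other = "flash" if self.medium == "hdd" else "hdd"
        extra = COST[self.medium][op] - COST[other][op]
        self.regret = max(0.0, self.regret + extra)
        # Migrating means reading the page here and writing it over there.
        migration_cost = COST[self.medium]["r"] + COST[other]["w"]
        if self.regret >= migration_cost:
            self.medium, self.regret = other, 0.0
        return self.medium


if __name__ == "__main__":
    page = PagePlacement("hdd")
    print([page.access("r") for _ in range(4)])  # ['hdd', 'hdd', 'hdd', 'flash']
    print([page.access("w") for _ in range(2)])  # ['flash', 'hdd']
```

With these numbers a read-intensive page drifts to flash and a write-intensive page drifts back to the HDD, matching the placement goals listed above.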
[Canim, Mihaila, Bhattacharjee, Ross & Lang, VLDB 2009]
Object placement
- Hybrid disk setup
  - Offline tool
  - Optimal object allocation across the two types of disk
- Two phases
  - Profiling: start with all objects on the HDD and monitor system use
  - Decision: based on profiling statistics estimate performance gained from moving each object from the HDD to the SSD
  - Reduce the decision to a knapsack problem and apply greedy heuristics
- Implemented in DB2
[Figure: the database engine with a buffer pool monitor reports reads/writes for a workload; the object placement advisor combines them with device parameters and places objects on the SSD or the HDD of the storage system. Plot: performance gain (s) versus SSD budget ($), with a cut-off point]
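The decision phase can be illustrated with a standard greedy knapsack heuristic: rank objects by estimated gain per unit of SSD space and take them while they fit. The function, the object names and the numbers below are hypothetical; the source does not specify which greedy heuristic the tool uses.

```python
from typing import List, Tuple


def place_on_ssd(objects: List[Tuple[str, float, float]], ssd_budget_gb: float) -> List[str]:
    """Pick objects to move to the SSD.

    objects: (name, size in GB, estimated gain in seconds) from the profiling phase.
    Greedy rule (an assumption): highest gain per GB first, skip what does not fit.
    """
    chosen: List[str] = []
    used = 0.0
    ranked = sorted(objects, key=lambda o: o[2] / o[1], reverse=True)
    for name, size, gain in ranked:
        if gain > 0 and used + size <= ssd_budget_gb:
            chosen.append(name)
            used += size
    return chosen


if __name__ == "__main__":
    profile = [
        ("orders_idx", 10.0, 500.0),
        ("lineitem", 100.0, 900.0),
        ("customer", 20.0, 300.0),
        ("archive", 5.0, 0.0),
    ]
    print(place_on_ssd(profile, ssd_budget_gb=40.0))  # ['orders_idx', 'customer']
```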
[Soundararajan, Prabhakaran, Balakrishnan & Wobber, FAST 2010]
[Holloway PhD Thesis, UW-Madison, 2009]
Write caching
- SSD for primary storage, auxiliary HDD
- Take advantage of better HDD write performance to extend SSD lifetime and improve write throughput
- Writes are pushed to the HDD
  - Log structure ensures sequential writes
  - Fixed log size
  - Once log is full merge writes back to the SSD
[Figure: writes go to an HDD-resident log, reads go to the SSD, and the log is merged into the SSD]
[Wu & Reddy, MASCOTS 2010]
Load balancing to maximize throughput
- Setting consists of a transaction processing system with both types of disk
- Objective is to balance the load across media
  - Achieved when the response times across media are equal, i.e., a Wardrop equilibrium
- Algorithms to achieve this equilibrium
  - Page classification (hot or cold)
  - Page allocation and migration
[Figure: storage management layer: hot/cold data classifier, cache, indirect mapping table (logical addr., device, physical addr.), operation redirector, data (re)locator, device performance monitor and policy configuration, on top of HDD and SSD devices]
Buffering in main memory
- Problem setup
  - Flash memory is used for persistent storage
  - Typical on-demand paging
- Research questions
  - Which pages do we buffer?
  - Which pages do we evict and when?
- Methodologies
  - Flash memory size alignment
  - Cost-based replacement
  - Write scheduling
[Kim & Ahn, FAST 2008]
Block padding LRU (BPLRU)
- Manages the on-disk RAM buffer
- Data blocks are organized at erase-unit granularity
- LRU queue is on data blocks
  - On reference, move the entire block to the head of the queue
  - On eviction, sequentially write the entire block
[Figure: block-level LRU queue from MRU block to LRU block, before and after logical sector 11 is referenced; victim block: logical sectors 4, 5 written]
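The block-level LRU queue can be sketched with an ordered dictionary. The sketch covers only the two rules listed above (promote the whole block on reference, write the whole block on eviction); padding and LRU compensation are left out, and the class and attribute names are invented for the example.

```python
from collections import OrderedDict
from typing import List, Set, Tuple


class BlockLRU:
    """Write buffer managed at erase-unit (block) granularity."""

    def __init__(self, capacity_blocks: int, sectors_per_block: int) -> None:
        self.capacity = capacity_blocks
        self.spb = sectors_per_block
        self.blocks: "OrderedDict[int, Set[int]]" = OrderedDict()  # last entry = MRU block
        self.flushed: List[Tuple[int, List[int]]] = []             # blocks written to flash

    def write(self, sector: int) -> None:
        blk = sector // self.spb
        if blk not in self.blocks and len(self.blocks) >= self.capacity:
            victim, sectors = self.blocks.popitem(last=False)  # evict the LRU block
            self.flushed.append((victim, sorted(sectors)))     # written out as one unit
        self.blocks.setdefault(blk, set()).add(sector)
        self.blocks.move_to_end(blk)                           # whole block becomes MRU


if __name__ == "__main__":
    buf = BlockLRU(capacity_blocks=2, sectors_per_block=4)
    for s in [0, 4, 1, 9]:
        buf.write(s)
    print(buf.flushed)       # [(1, [4])]
    print(list(buf.blocks))  # [0, 2]
```

Note that sector 1 promotes all of block 0, so block 1 becomes the victim even though sector 4 was written after sector 0.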
[Kim & Ahn, FAST 2008]
BPLRU: Further optimizations
- Use of padding
  - If a data block to be written has not been fully read, read what's missing and write sequentially
- LRU compensation
  - Sequentially written blocks are moved to the end of the LRU queue
  - Least likely to be written in the future
[Figure: padding: the FTL reads missing sectors and replaces the data block in one sequential write]
[Figure: LRU compensation: queue of blocks Blk 0 to Blk 3, from MRU block to LRU block, before and after a sequentially written block is moved to the LRU end]
Cost-based replacement
- Choice of victim depends on probability of reference (as usual)
- But the eviction cost is not uniform
  - Clean pages bear no write cost, dirty pages result in a write
  - I/O asymmetry: writes more expensive than reads
- It doesn't hurt if we misestimate the heat of a page
  - So long as we save (expensive) writes
- Key idea: combine LRU-based replacement with cost-based algorithms
  - Applicable both in SSD-only as well as hybrid systems
[Park, Jung, Kang, Kim & Lee, CASES 2006]
Clean first LRU (CFLRU)
- Buffer pool divided into two regions
  - Working region: business as usual
  - Clean-first region: candidates for eviction
  - Number of candidates is called the window size W
- Always evict from clean-first region
  - Evict clean pages before dirty ones to save write cost
- Improvement: Clean-First Dirty-Clustered [Ou, Harder & Jin, DAMON 2009]
  - Cluster dirty pages of the clean-first region based on spatial proximity
[Figure: pages P1 (MRU) to P8 (LRU), clean and dirty; working region: P1 to P4; clean-first region (window size W): P5 to P8]
LRU order: P8, P7, P6, P5
CFLRU order: P7, P5, P8, P6
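The eviction order in the example can be reproduced with a few lines. The sketch assumes that P6 and P8 are the dirty pages, which is inferred from the CFLRU order given on the slide rather than stated there; the function name is hypothetical.

```python
from typing import List, Set


def cflru_eviction_order(lru_to_mru: List[str], dirty: Set[str], window: int) -> List[str]:
    """Order in which pages of the clean-first region are chosen as victims.

    lru_to_mru lists page ids from least to most recently used; only the first
    `window` pages (the clean-first region) are candidates.
    """
    region = lru_to_mru[:window]
    clean = [p for p in region if p not in dirty]
    dirty_pages = [p for p in region if p in dirty]
    return clean + dirty_pages  # clean pages first, each group in LRU order


if __name__ == "__main__":
    pages = ["P8", "P7", "P6", "P5", "P4", "P3", "P2", "P1"]  # LRU ... MRU
    print(cflru_eviction_order(pages, dirty={"P8", "P6"}, window=4))
    # ['P7', 'P5', 'P8', 'P6']
```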
[K & V, VLDB 2008]
Cost-based replacement in hybrid systems
- Similar to the previous idea, but for hybrid setups
  - SSD and HDD for persistent storage
- Divide the buffer pool into two regions
  - Time region: typical LRU
  - Cost region: four LRU queues, one per cost class
    - Clean flash
    - Clean magnetic
    - Dirty flash
    - Dirty magnetic
  - Order queues based on cost
- Evict from time region to cost region
- Final victim is always from the cost region
[Figure: buffer pool split into a time region and a cost region ordered by cost]
[Stoica, Athanassoulis, Johnson & Ailamaki, DAMON 2009]
Append and pack
- Convert random writes to sequential ones
- Shim layer between storage manager and SSD
- On eviction, group dirty pages, in blocks that are multiples of the erase unit
- Do not overwrite old versions, instead write block sequentially
- Invalidate old versions
- Pay the price of a few extra reads but save the cost of random writes
[Figure: random writes from the storage manager enter the shim layer, which groups them, writes them sequentially to SSD persistent storage and invalidates old versions]
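The append part of the shim layer can be sketched as follows. This is a simplified model under stated assumptions: the device is an append-only Python list, one "erase unit" is a fixed number of pages, and the pack step (reclaiming invalidated space) is not modelled. All names are invented for the example.

```python
from typing import Dict, List, Set, Tuple


class AppendShim:
    """Groups evicted dirty pages and appends them in erase-unit sized batches."""

    def __init__(self, pages_per_erase_unit: int) -> None:
        self.unit = pages_per_erase_unit
        self.pending: Dict[int, str] = {}      # dirty pages waiting for a full batch
        self.log: List[Tuple[int, str]] = []   # append-only device contents
        self.where: Dict[int, int] = {}        # page id -> position of the valid version
        self.invalid: Set[int] = set()         # positions that hold old versions

    def evict(self, page_id: int, data: str) -> None:
        self.pending[page_id] = data
        if len(self.pending) == self.unit:
            for pid, payload in self.pending.items():  # one sequential write
                if pid in self.where:
                    self.invalid.add(self.where[pid])  # invalidate, never overwrite
                self.where[pid] = len(self.log)
                self.log.append((pid, payload))
            self.pending.clear()

    def read(self, page_id: int) -> str:
        if page_id in self.pending:
            return self.pending[page_id]
        return self.log[self.where[page_id]][1]


if __name__ == "__main__":
    shim = AppendShim(pages_per_erase_unit=2)
    for pid, data in [(7, "a"), (3, "b"), (7, "c"), (9, "d")]:
        shim.evict(pid, data)
    print(shim.read(7), sorted(shim.invalid), len(shim.log))  # c [0] 4
```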
Caching in flash memory
- Problem setup
  - SSD and HDD at different levels of the storage hierarchy
  - Flash memory used as a cache for HDD pages
- Research questions
  - When and how to use the SSD as a cache?
  - Which pages to bring into the cache?
  - How to choose victim pages?
- Methodologies
  - Optimal choice of hardware configuration
  - SSD as a read cache
  - Flash-resident extended buffer pools
[Narayanan, Thereska, Donnelly, Elnikety & Rowstron, EuroSys 2009]
Incorporating SSDs into the enterprise
- Disciplined way of introducing SSD storage
  - Migration from HDDs to SSDs
  - Requirements and models to analytically solve the configuration problem
- Simply replacing HDDs with SSDs is not cost-effective
- SSDs are best used as a cache
  - 2-tiered architecture
  - Log and read cache on the SSDs, data on the HDDs
  - But even then the benefits are limited
[Figure: traces, workload requirements, specs, benchmarks, device models and objectives feed a solver that outputs a configuration; writes go to a write-ahead log and reads to a read cache on the SSD tier, above the HDD tier]
[Leventhal, Commun. ACM, 51(7) 2008]
The ZFS perspective
- Multi-tiered architecture
  - Combination of logging and read caching
- Flash memory is good for large sequential writes in an append-only fashion (no updates)
- Also good as a read cache for HDD data
- Evict-ahead policy
  - Aggregate changes from memory and predictively push them to flash to amortize high write cost
[Figure: the buffer pool sends log operations to a log and aggregates changes that it predictively pushes to a read cache; log and read cache form the SSD tier, above the HDD tier]
[Canim, Mihaila, Bhattacharjee, Ross & Lang, PVLDB 2010]
[Holloway PhD Thesis, UW-Madison, 2009]
DBMS buffer pool extensions
- Again a multi-tiered approach
  - SSD as secondary buffer pool
- Policies and algorithms for caching in flash-resident buffer pools
  - Temperature-based replacement
  - Temperature-based eviction policy from memory to SSD
  - Pages are cached only if hot enough
- Algorithms for syncing data across the caches
[Figure: CPU/cache, main memory buffer pool, SSD buffer pool and HDD with logical regions. Labelled steps: (1) read record, (2) read page, (3) read page not on SSD, (4) write page, (5) write dirty page, (6) update SSD copy, (6) write record]
[K & V, ADBIS 2011]
Putting it all together
- Extensive study of using flash memory as cache
- Page flow schemes dictate how data migrates across the tiers
  - Inclusive: data in memory is also on flash
  - Exclusive: no page is both in memory and on flash
  - Lazy: an in-memory page may or may not be on flash depending on external criteria
- Cost model predicts how a combination of workload and scheme will behave on a configuration
- No magic combination; different schemes for different workloads and different HDDs and SSDs
[Figure: buffer pool, SSD cache, HDD persistent storage]
Indexing
- Problem setup
  - While presenting the same I/O interface as HDDs, flash memory has radically different characteristics
  - I/O asymmetry, erase-before-write limitation
- Research questions
  - How should we adapt existing indexing approaches?
  - How can we design efficient secondary storage indexes, potentially for more than one metric?
- Methodologies
  - Avoid expensive operations when updating the index
  - Self-tuning indexing, catering for flash-resident data
  - Combine SSDs and HDDs for increased throughput
[Nath & Gibbons, VLDB 2008]
Semi-random writes
- Starting point is studying typical write access patterns in the context of sampling
- Fact: random writes hurt performance
- But careful analysis of a typical workload shows that writes are rarely completely random
- Rather, they are semi-random
  - Randomly dispatched across blocks, sequentially written within a block
  - Similar to the locality principles of memory access
- Take advantage of this at the structure design level and when issuing writes
  - Bulk writes to amortize write cost
[Figure: Block 1, Block 2, Block 3, ..., Block n over time: writes seemingly randomly dispatched in time... but actually written sequentially within a block]
[Nath & Kansal, IPSN 2007]
FlashDB: self-tuning B+-tree
- Reads are cheap, writes are potentially expensive if random
- Two modes for B+-tree nodes
  - Disk mode: node is primarily read
  - Log mode: node is primarily updated; instead of overwriting, maintain log entries for the node and reconstruct on demand
- Translation layer presents uniform interface for both modes
- System switches between modes by monitoring use
- Similar logging approach in [Wu, Kuo & Chang, ACM Trans. On Embedded Systems, 6(3), 2007]
  - Difference is in when and how writes are applied
  - Buffered first, then batched and applied by the B+-tree FTL
[Figure: log mode: node data merged with log entries]
[Figure: 2-state task system for a B+-tree node: read/write operations in disk mode or log mode, with a migration cost from disk to log and a migration cost from log to disk]
[Li, He, Luo & Yi, ICDE 2009]
Making B+-trees flash-friendly (the FD-tree)
- So long as we are not updating in place, and also performing large sequential writes to amortize the cost, it's all good
- Multiple levels in the index
  - Each progressive level of double size
- FD-tree levels are sorted runs
  - Head-tree (first levels) in main memory
  - Once a lower level exceeds its capacity it is merged with the next one
  - Special entries (fences) are used to maintain the structure and deal with potential skew
[Figure: each level a sorted run; when a level is full, merge with the lower level and write sequentially]
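The merge cascade over sorted runs can be illustrated with a small in-memory model. This is a sketch of the general idea only: fences are not modelled, the "levels" are plain Python lists, and the class name, `head_capacity` and the write counter are assumptions made for the example.

```python
import bisect
import heapq
from typing import List


class LeveledIndex:
    """Sorted runs of doubling capacity; a full level is merged into the next one."""

    def __init__(self, head_capacity: int = 2) -> None:
        self.head_capacity = head_capacity
        self.levels: List[List[int]] = [[]]  # levels[0] plays the role of the in-memory head
        self.sequential_writes = 0           # one per merged run written out

    def _capacity(self, level: int) -> int:
        return self.head_capacity * (2 ** level)

    def insert(self, key: int) -> None:
        bisect.insort(self.levels[0], key)
        i = 0
        while len(self.levels[i]) > self._capacity(i):
            if i + 1 == len(self.levels):
                self.levels.append([])
            # Merge two sorted runs and write the result as one new run.
            self.levels[i + 1] = list(heapq.merge(self.levels[i], self.levels[i + 1]))
            self.levels[i] = []
            self.sequential_writes += 1
            i += 1

    def contains(self, key: int) -> bool:
        for run in self.levels:
            j = bisect.bisect_left(run, key)
            if j < len(run) and run[j] == key:
                return True
        return False


if __name__ == "__main__":
    idx = LeveledIndex(head_capacity=2)
    for k in [5, 1, 9, 3, 7, 2]:
        idx.insert(k)
    print(idx.levels)             # [[], [], [1, 2, 3, 5, 7, 9]]
    print(idx.sequential_writes)  # 3
```

No run is ever updated in place: every change reaches lower levels through a merge that produces a fresh sorted run.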
Spatial indexing
- Similar observations as with B+-trees can be made on R-trees
  - They're tree indexes after all
  - Lesson is largely the same: one needs to carefully craft the structure and its algorithms for the new medium
- Batch updates to amortize write costs [Wu, Chang & Kuo, GIS 2003]
- Trade cheap reads for expensive writes by introducing imbalance [K & V, SSTD 2011]
- Systematic study on performance of R-trees on SSDs in [Emrich, Graf, Kriegel, Schubert & Thoma, DAMON 2010]
  - SSDs not as sensitive as HDDs to page size
  - Capable of addressing higher dimensional data in less time
[Zeinalipour-Yazti, Lin, Kalogeraki, Gunopulos & Najjar, FAST 2005]
Hash indexing based on MicroHash
- Setup is sensor nodes with limited memory and processing capabilities
- Similar to extendible hashing techniques
  - Directory keeps track of bucket boundaries
  - For each bucket maintain the last time it was used S and the number of times it has been split C
- Progressive expansion based on equi-width splitting
  - Expansion triggered when the number of splits exceeds some threshold
  - Infrequently used buckets are flushed to SSD
  - Deletion through garbage collection and reorganization
- Batch updates are helpful
  - Generalization of linear hashing by lazy splitting in [Xiang, Zhou & Meng, WAIM 2008]
  - Takes advantage of batch updates evicted to flash

directory
Bucket     S   C
[0-10)     2   1
[10-20)    1   3
[20-30)    0   0
[30-40)    0   0

Reorganization (repartitioning) if split threshold is 2

directory
Bucket     S   C
[0-10)     2   1
[10-15)    1   0
[15-20)    3   1
[20-30)    0   0
[30-40)    0   0
[Debnath, Sengupta & Li, VLDB 2010]
A design for hybrid systems (FlashStore)
- RAM memory: write and read buffers and metadata
- Flash memory: recycled append log organized as a cyclic list of pages, destaged to HDD based on recency
- Keeps track of destaged entries
[Figure: RAM holds the hash table index, write buffer, read cache, recency bit vector and disk-presence Bloom filter; flash holds key-value pairs between the first valid page and the last valid page; destaging moves key-value pairs to the HDD]
[Bernstein, Reid & Das, CIDR 2011]
Transaction management on flash: Hyder
- A different architecture for transaction management
  - The target is data centers
  - Need for scale-out
  - No partitioning
  - Shared data, multi-core nodes
- Log-structured multi-versioned database
  - "The log is the database"
  - Raw flash chips used for storage
  - Though SSDs or even HDDs may be used as well
- Three-layered architecture
  - Storage layer maintains shared log
  - Index layer supports lookup and versioning
  - Transaction layer provides isolation and continuously refreshes the database cache by running the "meld" algorithm
[Figure: Server 1 and Server 2 on top of a scalable reliable distributed log. Protocol: transaction input on a snapshot, transaction intention (RW sets), forward intention, append to log, broadcast to servers, assemble local log copy, roll log forward]
Query and transaction processing
- Problem setup
  - Same types of query and transactional workload
  - Different medium; not what existing approaches have been optimized for
- Research questions
  - Are there problems that best fit to SSDs?
  - Does one need radically different approaches, or slight adaptations?
  - Where in the storage hierarchy should we use SSDs and how?
- Methodologies
  - Flash-aware algorithms either by design or through adaptation
  - Offload parts of the computation to flash memory
  - Economies of scale
Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011
Old stories, new toys
- Impact of selectivity on predicate evaluation [Myers, MSc Thesis, MIT, 2007]
  - Overall, as the selectivity factor increases, performance degrades (needle-in-haystack queries)
  - At times HDDs might outperform SSDs
- Join processing on SSDs using algorithms designed for HDDs [Do & Patel, DaMoN 2009]
  - SSD joins may well become CPU-bound, so ways to improve the CPU performance become salient
  - Trading random reads for random writes pays off
  - Random writes result in varying I/O and unpredictable performance
  - Blocked I/O still improves performance
  - Block size should be a multiple of the page size
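The last point, that the I/O block size should be a multiple of the page size, amounts to a rounding step. A minimal Python sketch follows; the helper name `aligned_block_size` and the 4096-byte default page size are assumptions for illustration, not values from the cited study.

```python
def aligned_block_size(requested_bytes, page_bytes=4096):
    """Round a requested I/O block size up to a whole number of pages."""
    if requested_bytes <= 0 or page_bytes <= 0:
        raise ValueError("sizes must be positive")
    pages = -(-requested_bytes // page_bytes)   # ceiling division
    return pages * page_bytes


block = aligned_block_size(10_000)              # rounds up to three 4096-byte pages
```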
[Tsirogiannis, Harizopoulos, Shah, Wiener & Graefe, SIGMOD 2009]
Impact of storage layout: the FlashJoin
- Storage layout based on PAX
  - No need to retrieve what's not necessary
- Use a specialized operator (FlashScan) for on-the-fly projections over PAX
- Delegate join computation to two steps
  - Fetch data by projecting what is relevant (the fetch kernel)
  - Evaluate the join predicate (through the join kernel) and materialize the result in a join index
Example query:
select R1.B, R1.C, R2.E, R2.H, R3.F
from R1, R2, R3
where R1.A = R2.D and R2.G = R3.K
[Figure: FlashJoin plan for the query. FlashScan operators over R1(A, B, C), R2(D, E, G, H) and R3(F, K, L) feed two join kernels (A ⋈ D, then G ⋈ K), each paired with a fetch kernel; the final output is B, C, E, H, F.]
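The two-step idea (a join kernel that produces a join index, then a fetch kernel that retrieves only the projected columns) can be sketched in Python for the first join of the example query, R1.A = R2.D. This is an illustrative in-memory sketch, not the operator from the cited paper: tables are plain dictionaries of column lists, and `flash_scan`, `join_kernel` and `fetch_kernel` are hypothetical names.

```python
from collections import defaultdict


def flash_scan(table, columns):
    """Read only the requested columns (stand-in for a scan over a column-grouped layout)."""
    return {c: table[c] for c in columns}


def join_kernel(left_keys, right_keys):
    """Hash join over the key columns only; returns a join index of (left_rid, right_rid)."""
    buckets = defaultdict(list)
    for rid, key in enumerate(left_keys):
        buckets[key].append(rid)
    return [(l, r) for r, key in enumerate(right_keys) for l in buckets.get(key, [])]


def fetch_kernel(join_index, left, left_cols, right, right_cols):
    """Materialize result rows by fetching only the projected columns."""
    rows = []
    for l, r in join_index:
        row = {c: left[c][l] for c in left_cols}
        row.update({c: right[c][r] for c in right_cols})
        rows.append(row)
    return rows


R1 = {"A": [1, 2, 3], "B": [10, 20, 30], "C": [100, 200, 300]}
R2 = {"D": [2, 3, 3], "E": ["x", "y", "z"], "G": [7, 8, 9], "H": ["p", "q", "r"]}

# Step 1: the join kernel touches only the join columns A and D.
keys1 = flash_scan(R1, ["A"])
keys2 = flash_scan(R2, ["D"])
join_index = join_kernel(keys1["A"], keys2["D"])

# Step 2: the fetch kernel retrieves B, C, E, H for matching rows only.
rows = fetch_kernel(join_index, R1, ["B", "C"], R2, ["E", "H"])
```

The join index holds row identifiers rather than tuples, so columns that the query never projects (here G) are never read for this join.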
[Canim, Mihaila, Bhattacharjee, Lang & Ross, ADMS 2010]
Membership queries
- Motivation: poor locality of Bloom filters
  - Hurts cache performance in CPU-intensive applications
  - Good candidate for offloading to flash memory
  - Good random read performance compared to HDDs, but we still need to cross the memory-disk boundary
- Solution: being lazy pays off
  - Defer reads and writes through buffering
  - Introduce hierarchical structure to account for disk-level paging
[Figure: a buffer layer in memory, whose buffer blocks keep track of deferred reads and writes, above a filter layer on SSD, whose disk pages act as sub-Bloom filters.]
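A minimal Python sketch of the buffering idea follows, assuming each key is hashed to exactly one page (sub-Bloom filter) and that insertions are held in an in-memory buffer until enough have accumulated for that page. The class name `BufferedBloomFilter`, the parameter values and the SHA-256 hashing are assumptions for illustration. As a simplification, lookups here are answered immediately rather than deferred.

```python
import hashlib


class BufferedBloomFilter:
    def __init__(self, num_pages=4, bits_per_page=1024, num_hashes=3, buffer_limit=8):
        if not 1 <= num_hashes <= 7:
            raise ValueError("num_hashes must be between 1 and 7 for a 32-byte digest")
        self.bits_per_page = bits_per_page
        self.num_hashes = num_hashes
        self.buffer_limit = buffer_limit
        # Each bytearray stands in for one flash page acting as a sub-filter.
        self.pages = [bytearray((bits_per_page + 7) // 8) for _ in range(num_pages)]
        # In-memory buffer blocks holding deferred insertions, one per page.
        self.buffers = [[] for _ in range(num_pages)]
        self.page_writes = 0

    def _locate(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        page = int.from_bytes(digest[:4], "big") % len(self.pages)
        bits = [int.from_bytes(digest[4 + 4 * i:8 + 4 * i], "big") % self.bits_per_page
                for i in range(self.num_hashes)]
        return page, bits

    def add(self, key):
        page, _ = self._locate(key)
        self.buffers[page].append(key)          # defer the write
        if len(self.buffers[page]) >= self.buffer_limit:
            self.flush(page)

    def flush(self, page):
        """Apply all deferred insertions for one page with a single page write."""
        if not self.buffers[page]:
            return
        for key in self.buffers[page]:
            _, bits = self._locate(key)
            for b in bits:
                self.pages[page][b >> 3] |= 1 << (b & 7)
        self.buffers[page].clear()
        self.page_writes += 1

    def might_contain(self, key):
        page, bits = self._locate(key)
        if key in self.buffers[page]:           # still pending in memory
            return True
        return all(self.pages[page][b >> 3] & (1 << (b & 7)) for b in bits)


bf = BufferedBloomFilter(num_pages=2, buffer_limit=4)
keys = [f"key-{i}" for i in range(20)]
for k in keys:
    bf.add(k)
```

Since all bits for a key live in one page, a lookup or a flush touches a single page, and the buffer turns many small updates into one write per page.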
o=//'D'H332D'X#*PD'l)8'p'l)8D'A?MH6<'_LLbq'
<#0#9#$/'3,/*#-32$'$+)0/('73*'AA<$'
!!
!!
!!
M)./2'0*#2$#1-32#"'#2('g+/*5',*31/$$)24'>3*P"3#($D'
>%)1%'3,/*#-32$'#*/'AA<$'9/:/*'73*r'
A0+(5'37'0%/'?I6',#:/*2$'
?(/2-75'$/*./*'$03*#4/'$,#1/$'0%#0'/O%)9)0'AA<C7*)/2("5'?I
6',#:/*2$'
!!
!!
@#9"/$D')2(/O/$D'0/8,3*#*5'$03*#4/D'"34D'*3""9#1P'$/48/20$'
A/132(#*5'$0*+10+*/$'#*/'9/:/*'$+)0/('73*'AA<$'
!!
!!
!!
113
=324'$/g+/2-#"'>*)0/$'\23'+,(#0/$]'#2('*#2(38'*/#($'
X/*73*8#21/')8,*3./8/20'83*/'0%#2'32/'3*(/*'37'8#42)0+(/'
>%/2'"344)24'#2('*3""9#1P'$/48/20$'#*/'(/"/4#0/('03'AA<$'
!#103*'37'0>3')8,*3./8/20'>%/2'0/8,3*#*5'$03*#4/')$'32'AA<$'
[Chen, SIGMOD 2009]
Inexpensive logging in flash memory
- As mentioned, logging is one of the best fits for SSDs
  - Typically, writes are appended to the log
  - The online version of the log is usually small
  - USB flash memory is cheap and USB ports are abundant
- Intuition: spread the log across multiple cheap USB flash disks
  - Provide simple logging interface
[Figure: logging interface (write, flush, checkpoint; recovery), in-memory log buffer, worker threads and an archiver.]
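The intuition of spreading the log across several cheap devices behind a simple interface can be sketched in Python as follows. This is an assumption-laden illustration, not the design from the cited paper: each device is modelled as an in-memory list, records are striped round-robin by log sequence number, and the name `StripedLog` is hypothetical.

```python
import heapq


class StripedLog:
    def __init__(self, num_devices=3):
        # Each list stands in for one USB flash device.
        self.devices = [[] for _ in range(num_devices)]
        self.buffer = []        # in-memory log buffer
        self.next_lsn = 0

    def write(self, payload):
        """Append a record to the in-memory buffer and return its LSN."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.buffer.append((lsn, payload))
        return lsn

    def flush(self):
        """Stripe buffered records across the devices, round-robin by LSN."""
        for lsn, payload in self.buffer:
            self.devices[lsn % len(self.devices)].append((lsn, payload))
        self.buffer.clear()

    def recover(self):
        """Merge the per-device streams back into LSN order."""
        return [payload for _, payload in heapq.merge(*self.devices)]


log = StripedLog(num_devices=3)
for i in range(7):
    log.write(f"update-{i}")
log.flush()
log.write("not-yet-durable")    # stays in the buffer until the next flush
recovered = log.recover()
```

Each device only ever receives appends, which matches the first bullet above, and recovery needs nothing more than a merge of already-sorted streams.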
Tutorial Outline
1. Introduction (Philippe)
2. Flash devices characteristics (Luc)
3. Data management for flash devices (Stratis)
4. Two outlooks (Stratis & Philippe)
Outlook
- SSD is a diverse class of devices
  - The only common characteristic of all members of the class is the excellent random read performance
  - Underlying technology affects performance in other operations
- SSDs do not completely dominate HDDs, not yet
  - Some types of SSD may be an order of magnitude slower than HDDs in random writes
- Where do they fit in the database stack?
  - Persistent storage, maybe in combination with HDDs
  - Read cache of HDD data
  - Transactional logging
  - Using the HDD as a log-structured write-cache for the SSD
  - Temporary storage and staging area
  - Any of the above
- More research necessary at the SSD/DB interfaces
Design Space
Query
Processor
Storage
Manager
OS
RAID Controller
FTL
FD HW
Cross-layer issues:
- Avoid duplicating work
- Split work most effectively
- Schedule work most effectively
- Avoid arbitrary limitations
Query Processor
Performance Contract
Flash Devices Characteristics:
Storage Manager
OS
RAID controller
FTL
FD HW
Query Processor
Storage Manager
OS
[Schlosser et al., CMU tech report 2003; Schlosser et al., FAST 2004; Prabhakaran et al., OSDI 2008]
RAID controller
FTL
FD HW
ERASE (address)
Command
Interpreter
ERASE (address)
TRIM command
Beyond TRIM
[Nellans et al. FusionIO 2010, Arpaci-Dusseau et al, HotStorage 2010]
Atomic Writes
[Prabhakaran et al., OSDI 2008; Ouyang et al., HPCA'11]
Take-away Point # 1
Take-away Point # 2
Take-away Point # 3