Professional Documents
Culture Documents
n n n n n n n
Introduction Background Distributed DBMS Architecture Distributed Database Design Semantic Data Control Distributed Query Processing Distributed Transaction Management
Data server approach Parallel architectures Parallel DBMS techniques Parallel execution models
o o o o
Parallel Database Systems Distributed Object DBMS Database Interoperability Concluding Remarks
1998 M. Tamer zsu & Patrick Valduriez Page 13.1
Distributed DBMS
Large volume of data use disk and large main memory I/O bottleneck (or memory access bottleneck)
Speed(disk) << speed(RAM) << speed(microprocessor)
Predictions
(Micro-) processor speed growth : 50 % per year DRAM capacity growth : 4 every three years Disk throughput : 2 in the last ten years
Distributed DBMS
Page 13.2
Page 1
The Solution
n
1990's: same solution but using standard hardware components integrated in a multiprocessor
Software-oriented Standard essential to exploit continuing technology
improvements
Distributed DBMS
Page 13.3
Multiprocessor Objectives
n n
High-performance with better cost-performance than mainframe or vector supercomputer Use many nodes, each with good costperformance, communicating through network
Good cost via high-volume components Good performance via bandwidth
Trends
Microprocessor and memory (DRAM): off-the-shelf Network (multiprocessor edge): custom
The real chalenge is to parallelize applications to run with good load balancing
Distributed DBMS
Page 13.4
Page 2
client interface
Application server
database
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.5
Distributed DBMS
Page 13.6
Page 3
Advantages
Integrated data control by the server (black box) Increased performance by dedicated system Can better exploit parallelism Fits well in distributed environments
Potential problems
Communication overhead between application and data
server
u
High-level interface
Distributed DBMS
Page 13.7
(e.g., C*, Fortran90) Offer a new language in which parallelism can be expressed or automatically inferred n
Critique
Hard to develop parallelizing compilers, limited resulting
speed-up
Distributed DBMS
Page 13.8
Page 4
Data-based Parallelism
n Inter-operation p operations of the same query in parallel
op. R
Distributed DBMS
op. R1
op. R2
op. R2
op. R4
Page 13.9
Parallel DBMS
n
vendor edge)
manufacturer edge)
Distributed DBMS
Page 13.10
Page 5
Much better cost / performance than mainframe solution High-performance through parallelism
High throughput with inter-query parallelism Low response time with intra-operation parallelism
High availability and reliability by exploiting data replication Extensibility with the ideal goals
Linear speed-up Linear scale-up
Distributed DBMS
Page 13.11
Linear Speed-up
Linear increase in performance for a constant DB size and proportional increase of the system components (processor, memory, disk)
components
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.12
Page 6
Linear Scale-up
Sustained performance for a linear increase of database size and proportional increase of the system components.
ideal
Barriers to Parallelism
n
Startup
The time needed to start a parallel operation may
Interference
When accessing shared resources, each new process slows
Skew
The response time of a set of parallel processes is the time
Distributed DBMS
Page 13.14
Page 7
DM task 11
DM task 12
DM task n1
Distributed DBMS
Page 13.15
Session manager
Host interface Transaction monitoring for OLTP
Request manager
Compilation and optimization Data directory management Semantic data control Execution control
Data manager
Execution of DB operations Transaction management support Data management
Distributed DBMS
Page 13.16
Page 8
Hybrid architectures
Hierarchical (cluster) Non-Uniform Memory Architecture (NUMA)
Distributed DBMS
Page 13.17
Shared-Memory Architecture
P1 Pn
interconnect
Global Memory
Distributed DBMS
Page 9
Shared-Disk Architecture
P1 M1 Pn Mn
interconnect
network cost, extensibility, migration from uniprocessor complexity, potential performance problem for copy coherency
1998 M. Tamer zsu & Patrick Valduriez Page 13.19
Distributed DBMS
Shared-Nothing Architecture
interconnect
P1 D1 M1
Pn Dn Mn
Examples : Teradata (NCR), NonStopSQL (TandemCompaq), Gamma (U. of Wisconsin), Bubba (MCC)
Extensibility, availability Complexity, difficult load balancing
1998 M. Tamer zsu & Patrick Valduriez
Distributed DBMS
Page 13.20
Page 10
Hierarchical Architecture
P1 Pn P1 Pn
interconnect
interconnect
Global Memory
Global Memory
n n
nodes High number of small nodes, e.g., 16 x 4 processor nodes, has much better cost-performance (can be a cluster of workstations)
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.21
Single address space : Sequent, Encore, KSR Multiple address spaces : Intel, Ncube Physical memory u Central : Sequent, Encore u Distributed : Intel, Ncube, KSR
u u
Distributed DBMS
Page 13.22
Page 11
NUMA Architectures
n n
Distributed DBMS
Page 13.23
COMA Architecture
Disk Disk Disk
P1
P2
Pn
Cache Memory
Cache Memory
Cache Memory
Distributed DBMS
Page 13.24
Page 12
Data placement
Physical placement of the DB onto multiple nodes Static vs. Dynamic
Transaction management
Similar to distributed transaction management
Distributed DBMS
Page 13.25
Data Partitioning
n
Each relation is divided in n partitions (subrelations), where n is a function of relation size and access frequency Implementation
Round-robin
u u
Maps i-th element to node i mod n Simple but only exact-match queries Supports range queries but large index Only exact-match queries but small index
B-tree index
u
Hash function
u
Distributed DBMS
Page 13.26
Page 13
Partitioning Schemes
Hashing
Round-Robin
a-g
h-m
u-z
Interval
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.27
hurts load balancing when one node fails interleaved partitioning (Teradata) chained partitioning (Gamma)
Distributed DBMS
Page 13.28
Page 14
Interleaved Partitioning
Node Primary copy Backup copy r 2.3 r 3.2 r 3.2 1 R1 2 R2 r 1.1 3 R3 r 1.2 r 2.1 4 R4 r 1.3 r 2.2 r 3.1
Distributed DBMS
Page 13.29
Chained Partitioning
1 R1 r4
2 R2 r1
3 R3 r2
4 R4 r3
Distributed DBMS
Page 13.30
Page 15
Placement Directory
n
In either case, the data structure for f1 and f2 should be available when needed at each node
Distributed DBMS
Page 13.31
Join Processing
n
They also apply to other complex operators such as duplicate elimination, union, intersection, etc. with minor adaptation
Distributed DBMS
Page 13.32
Page 16
node 2 R2:
S1 node 3 node 4
S2
R
Distributed DBMS
S i=1,n(R
Si)
Page 13.33
S1 node 3 node 4
S2
R
Distributed DBMS
S i=1,n(Ri
Si)
Page 13.34
Page 17
node 1
node 2
R
Distributed DBMS
S i=1,P(Ri
Si)
Page 13.35
Search strategy
Dynamic programming for small search space Randomized for large search space
Distributed DBMS
Page 13.36
Page 18
Left-deep
j1 R1
Right-deep
Zig-zag
R4
Bushy
Distributed DBMS
Page 13.37
R1
Distributed DBMS
R2
1998 M. Tamer zsu & Patrick Valduriez
R1
R2
Page 13.38
Page 19
Load Balancing
n
Solutions
sophisticated parallel algorithms that deal with skew dynamic processor allocation (at execution time)
Distributed DBMS
Page 13.39
JPS
Res2
AVS/TPS
S2
Join1
Join2 S1
AVS/TPS
RS/SS AVS/TPS
R2 Scan1
RS/SS
Scan2 R1
AVS/TPS
Distributed DBMS
Page 13.40
Page 20
Prototypes
EDS and DBS3 (ESPRIT) Gamma (U. of Wisconsin) Bubba (MCC, Austin, Texas) XPRS (U. of Berkeley) GRACE (U. of Tokyo)
Products
Teradata (NCR) NonStopSQL (Tandem-Compac) DB2 (IBM), Oracle, Informix, Ingres, Navigator
(Sybase) ...
Distributed DBMS
Page 13.41
Hybrid architectures OS support:using micro-kernels Benchmarks to stress speedup and scaleup under mixed workloads Data placement to deal with skewed data distributions and data replication Parallel data languages to specify independent and pipelined parallelism Parallel query optimization to deal with mix of precompiled queries and complex ad-hoc queries Support of higher functionality such as rules and objects
1998 M. Tamer zsu & Patrick Valduriez Page 13.42
Distributed DBMS
Page 21