
Data Center Disaster Recovery

KwaiSeng
Consulting Systems Engineer


2006 Cisco Systems, Inc. All rights reserved.

Cisco Confidential

Agenda
Data Center: The Evolution

Data Center Disaster Recovery
Objectives
Failure Scenarios
Design Options

Components of Disaster Recovery
Site Selection: Front End GSLB
Server High Availability: Clustering
Data Replication and Synchronization: SAN Extension

Data Center Technology Trends
Summary



The Evolution of Data Centers


Data Center Evolution
[Figure: timeline from 1960 through 1980 and 2000 to 2010 showing compute evolution (mainframes, client/server, Internet computing) alongside network evolution (terminal, TCP/IP, thin client: HTTP, content networking, data center networking).]
Networked data center phases: 1. Consolidation 2. Integration 3. Virtualization 4. High Availability
Figure labels: Data Center Consolidation, Network Optimization, Data Center Virtualization, Continuous Data Center Availability, Business Agility

Today's Data Center
Integration of Many Systems and Services
[Figure: a data center built from a front-end network (WAN/Internet, resilient IP, firewall, IDS, cache, content switch, web servers), N-tier applications (app servers, DB servers, mainframe, IP communications, operations) and a storage network (FC switches, VSANs, FC SAN, NAS, RAID, tape), linked over a MAN/Internet connection and a metro network (DWDM/SONET/Ethernet) to a secondary DR data center.]
Figure callouts: Scalable Infrastructure, Application and Server Optimization, Data Center Security, DC Storage Networks, Distributed Data Centers

What Is a Distributed Data Center?
[Figure: a primary data center and a secondary data center hosting applications (App A, App B, App A, App C), with data replication between the FC storage at each site.]

Distributed Data Centers


Required by disaster recovery and business continuance
Avoid a single, concentrated data repository
High availability of applications and data access
Load balancing together with performance scalability
Better response and optimal content routing: proximity to clients

Front-End IP Access Layer
Content Routing
Site Selection
[Figure: the primary and secondary data centers (App A, App B, App A, App C; FC storage at each site), with the front-end IP access layer highlighted.]

Application and Database Layer
Content Switching
Load Balancing
Server Clustering
High Availability
[Figure: the primary and secondary data centers (App A, App B, App A, App C; FC storage at each site), with the application and database layer highlighted.]

Backend SAN Extension
Storage and Optical
Data Replication and Transporting
[Figure: the primary and secondary data centers (App A, App B, App A, App C; FC storage at each site), with the back-end SAN extension highlighted.]

Data Center Disaster Recovery


Disaster Recovery
Recovery of data and resumption of service
Ensuring business can recover and continue after failure or disaster
Ability of a business to adapt, change and continue when confronted with various outside impacts
Mitigating the impact of a disaster

Disaster Recovery: What It Means for Business


Business Resilience
Continued Operation of Business During a Failure

Business Continuance
Restoration of Business After a Failure

Disaster Recovery
Protecting Data Through Offsite Data Replication and Backup

Zero Down Time Is the Ultimate Goal



Disaster Recovery Planning


Business Impact Analysis (BIA)
Determines the impacts of various disasters to specific business functions and company assets

Risk analysis
Identifies important functions and assets that are critical to the company's operations

Disaster Recovery Plan (DRP)


Restores operability of the target systems, applications, or computing facility at the secondary data center after the disaster


Disaster Recovery Objectives


Recovery Point Objective (RPO)
The point in time (prior to the outage) to which systems and data must be restored
Tolerable loss of data in the event of a disaster or failure
The impact of data loss and the cost associated with the loss

Recovery Time Objective (RTO)
The period of time after an outage within which the systems and data must be restored to the predetermined RPO
The maximum tolerable outage time
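
As a rough illustration of the two objectives, the sketch below derives the achieved recovery point and recovery time from three timestamps and compares them with targets. The function name, the timestamps and the target values are illustrative assumptions, not figures from this presentation.

```python
from datetime import datetime, timedelta

def meets_objectives(last_good_copy, outage_start, service_restored,
                     rpo_target, rto_target):
    """Compare achieved recovery point/time with the stated objectives.

    All names and values are hypothetical, for illustration only.
    """
    # Data written after the last recoverable copy is lost.
    achieved_rpo = outage_start - last_good_copy
    # Time the business waits for systems and data to come back.
    achieved_rto = service_restored - outage_start
    return {
        "achieved_rpo": achieved_rpo,
        "achieved_rto": achieved_rto,
        "rpo_met": achieved_rpo <= rpo_target,
        "rto_met": achieved_rto <= rto_target,
    }

result = meets_objectives(
    last_good_copy=datetime(2006, 1, 1, 2, 0),     # assumed nightly backup
    outage_start=datetime(2006, 1, 1, 14, 0),      # assumed time of disaster
    service_restored=datetime(2006, 1, 1, 18, 0),  # assumed time back online
    rpo_target=timedelta(hours=24),
    rto_target=timedelta(hours=2),
)
print(result["rpo_met"], result["rto_met"])
```

In this made-up case the nightly backup satisfies a 24-hour RPO, but a four-hour restore misses a two-hour RTO, which is the kind of gap that pushes a design from tape restore toward replication and standby sites.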


Recovery Point/Time vs. Cost
[Figure: timeline with three events: critical data is recovered (time t0), disaster strikes (time t1), systems recovered and operational (time t2). The recovery point is measured back from the disaster and the recovery time forward from it.]

Recovery point (days, hours, mins, secs)
Tape backup
Periodic replication
Asynchronous replication
Synchronous replication

Recovery time (secs, mins, hours, days, weeks)
Extended cluster
Manual migration
Tape restore

$$$ Increasing cost as the recovery point and the recovery time shrink

Smaller RPO/RTO
Higher $$$, replication, hot standby

Larger RPO/RTO
Lower $$$, tape backup/restore, cold standby


Failure Scenarios
Disaster Could Mean Many Types of Failure
Network failure
Device failure
Storage failure
Site failure

Network Failures
ISP failure
Dual ISP connections
Multiple ISP

Connection failure within the network
EtherChannel
Multiple route paths

[Figure: Internet access through Service Provider A and Service Provider B.]

Device Failures
Routers, switches, FWs
HSRP
VRRP

Hosts
HA cluster
LB server farm
NIC teaming

[Figure: Internet access through Service Provider A and Service Provider B.]

Storage Failures
Disk arrays
RAID

Disk controllers

Storage Replication
Site to Site Mirroring
Optimization

[Figure: Internet access through Service Provider A and Service Provider B.]

Site Failures
Partial site failure
Application maintenance
Application migration
Application scheduled DR exercise

Complete site failure
Disaster

[Figure: Internet access through Service Provider A and Service Provider B.]


Warm Standby
A data center that is equipped with hardware and communications interfaces capable of providing backup operating support
Latest backups from the production data center must be delivered
Network access needs to be activated
Application needs to be manually started

Disaster Recovery: Active/Standby
[Figure: a primary data center and a secondary data center (warm standby) hosting applications (App A, App B, App A, App C), each with FC storage, connected over an IP/optical network.]

Hot Standby
A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little down time
Hot backup offers disaster recovery with little or no human intervention
Application data is replicated from the primary site
A hot backup site provides better RTO/RPO than warm standby but costs more to implement
Business continuance

Disaster Recovery: Active/Standby
[Figure: a primary data center and a secondary data center hosting applications (App A, App B, App A, App C), each with FC storage, connected over an IP/optical network.]

Active/Active DR Design: Multiple Tiers of Application
[Figure: two sites reached from the Internet through Service Provider A and Service Provider B, each with a presentation tier, an application tier and a storage tier.]

Active/Active Data Centers
[Figure: two data centers, each with an internal network, reached from the Internet through Service Provider A and Service Provider B.]
Active/Active Web Hosting
Active/Active Application Processing
Active/Standby Database Processing or Active/Active for Different Application

Components of Disaster Recovery


Site Selection Mechanisms


Site selection mechanisms depend on the technology or mix of technologies adopted for request routing:
1. HTTP redirect
2. DNS-based
3. L3 Routing with Route Health Injection (RHI)

Health of servers and/or applications needs to be taken into account
Optionally, other metrics (like load) can be measured and utilized for a better selection
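
The selection rule described here (health first, then an optional metric such as load) can be sketched in a few lines. This is a minimal illustration, not how any particular GSLB product works; the `select_site` function and the `healthy` and `load` keys are hypothetical names.

```python
def select_site(sites):
    """Pick the healthy site with the lowest load, or None if none is healthy.

    `sites` maps a site name to a dict with the hypothetical keys
    'healthy' (bool) and 'load' (0.0 to 1.0).
    """
    healthy = {name: info for name, info in sites.items() if info["healthy"]}
    if not healthy:
        return None
    # Among healthy sites, prefer the least loaded one.
    return min(healthy, key=lambda name: healthy[name]["load"])

sites = {
    "primary": {"healthy": True, "load": 0.80},
    "secondary": {"healthy": True, "load": 0.35},
}
print(select_site(sites))
```

Health acts as a hard filter and load only as a tie-breaker, so a lightly loaded but failed site is never chosen.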


HTTP Redirection: Traffic Flow
[Figure: a client requesting http://www.cisco.com/ is redirected to one of two sites, http://www1.cisco.com/ or http://www2.cisco.com/, which are monitored with keepalives.]
1. GET / HTTP/1.1, Host: www.cisco.com
2. HTTP/1.1 302 Moved, Location: www2.cisco.com
3. GET / HTTP/1.1, Host: www2.cisco.com
HTTP/1.1 200 OK (from http://www2.cisco.com/)
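
The redirect step in this flow can be sketched as a function that answers with a 302 pointing at the first site that passes its keepalive. It is a simplified illustration: `redirect_response` and the `is_alive` callback are hypothetical, and the code uses the standard "302 Found" reason phrase.

```python
def redirect_response(site_hosts, is_alive, path="/"):
    """Build a raw HTTP response redirecting to the first live site.

    `site_hosts` is an ordered list of candidate host names and
    `is_alive` is a hypothetical keepalive check taking a host name.
    """
    for host in site_hosts:
        if is_alive(host):
            return ("HTTP/1.1 302 Found\r\n"
                    f"Location: http://{host}{path}\r\n"
                    "\r\n")
    # No site answered its keepalive, so there is nowhere to redirect to.
    return "HTTP/1.1 503 Service Unavailable\r\n\r\n"

# Assumed keepalive state: www1 is down, www2 is up.
alive = {"www1.cisco.com": False, "www2.cisco.com": True}
response = redirect_response(["www1.cisco.com", "www2.cisco.com"],
                             lambda host: alive[host])
print(response)
```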

DNS-Based Site Selection: Traffic Flow
[Figure: a client resolving http://www.cisco.com/ through its DNS proxy, which queries the root name server, the authoritative name server for .com, the authoritative name server for cisco.com and the authoritative name server for www.cisco.com (steps 1 to 10). The authoritative name server for www.cisco.com sends keepalives to Data Center 1 and Data Center 2. DNS resolution uses UDP:53; the client then connects over TCP:80.]

Route Health Injection: Implementation
[Figure: Client A and Client B reach VIP x.y.w.z through Router 10, Router 11, Router 12 and Router 13. Location B is the preferred location for the VIP (low cost); Location A is the backup location (very high cost).]
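
The behaviour in this figure can be modelled as follows: each location advertises the VIP only while its health check passes, and routing follows the lowest-cost advertisement. This is a toy model rather than router configuration; the cost values and the names `advertised_routes` and `best_location` are assumptions.

```python
def advertised_routes(locations):
    """Return {location: cost} for locations whose VIP health check passes."""
    return {name: loc["cost"]
            for name, loc in locations.items() if loc["vip_healthy"]}

def best_location(locations):
    """Pick the lowest-cost advertised route, or None if none is advertised."""
    routes = advertised_routes(locations)
    return min(routes, key=routes.get) if routes else None

locations = {
    "A": {"cost": 1000, "vip_healthy": True},  # backup: very high cost (assumed value)
    "B": {"cost": 10, "vip_healthy": True},    # preferred: low cost (assumed value)
}
before_failure = best_location(locations)

# The health check at Location B fails, so its route is withdrawn.
locations["B"]["vip_healthy"] = False
after_failure = best_location(locations)
print(before_failure, after_failure)
```

Because failover happens through ordinary route convergence, clients keep using the same VIP address and need no DNS change.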


Site Selection Summary

                 Redundancy Mode   Convergence   App Health Visibility   Site Persistence
HTTP Re-Direct   Active/Active     No            No                      Yes
DNS              Active/Active     DNS Cache     Yes                     No
RHI              Active/Standby    Within Secs   Yes                     No


Cluster Overview
Load balancing cluster: multiple copies of the same application running against the same data set, usually read only
High availability cluster: multiple copies of an application that require access to a common data repository, usually read and write
Clustering provides benefits for availability, reliability, scalability, and manageability

Web Servers

Application Servers

Database Servers


High Availability Cluster Design


Public network: client/application requests
Private network: interconnection between nodes
Storage disk: shared storage array, NAS or SAN
[Figure: software stack on each node: APP, cluster software, cluster enabler, OS.]

HA Cluster Application View


Active/standby
Standby takes over when active fails
Two-node or multi-node

Active/active
Database requests load balanced across all nodes
Lock mechanism ensures data integrity

Shared everything
Each node mounts all storage resources
Provides a single layout reference system for all nodes

Shared nothing
Each node mounts only its semi-private storage
Data stored on the peer system's storage is accessed via peer-to-peer communication

[Figure: Node1 and Node2.]

Geo-Cluster Considerations
Geo-cluster: a cluster that spans multiple data centers
[Figure: Node1 in the local data center and Node2 in the remote data center, connected over the WAN, with synchronous or asynchronous disk replication between the sites (2 x RTT).]

Challenges:
Split brain
L2 heartbeats
Storage

HA Cluster Challenges: Split-Brain
Split-brain: active nodes concurrently accessing the same disk, which leads to data corruption
Resolution: use a quorum, a tie breaker for gaining access to the disk
[Figure: Node1 and Node2 both accessing the shared disk, resulting in data corruption.]
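
One common way to implement such a tie breaker is a majority vote, where each node holds one vote and a quorum device holds an extra one. The sketch below assumes that scheme; the vote counts and the `has_quorum` name are illustrative and not tied to any specific cluster product.

```python
def has_quorum(votes_held, total_votes):
    """A partition may access the shared disk only with a strict majority."""
    return votes_held > total_votes // 2

# Assumed setup: two nodes with one vote each, plus a quorum disk with one vote.
TOTAL_VOTES = 3

# The inter-site link fails. Node1 reserves the quorum disk first; Node2 cannot.
node1_votes = 1 + 1  # own vote plus the quorum disk
node2_votes = 1      # own vote only

print("Node1 may write:", has_quorum(node1_votes, TOTAL_VOTES))
print("Node2 may write:", has_quorum(node2_votes, TOTAL_VOTES))
```

With an odd number of votes, two isolated partitions cannot both hold a majority, so at most one side keeps writing to the disk.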


Layer 2 Heartbeats
Extended L2 network: L2 adjacency is required for the node heartbeat. Extending a VLAN across sites is hazardous.
Resolution: L3 capability for the cluster heartbeat. EoMPLS to carry L2 heartbeats across DR sites.
[Figure: Node1 in the local data center and Node2 in the remote data center, joined across the WAN by a public Layer 2 network and a private Layer 2 network, with synchronous or asynchronous disk replication between the sites.]

Storage Disk Zoning
Storage zoning: taking over the storage disk array when the active node fails.
Resolution: the cluster software communicates with the Cluster Enabler, instructing the disk array to perform a failover when a failure is detected.
[Figure: Node1 (active) and Node2 (standby) attached to an extended SAN with disk arrays sym1320 (RW) and sym1291 (WD).]


Storage for Applications


Presentation tier
Unrelated small data files commonly stored on internal disks
Manual distribution

Application processing tier
Transitional, unrelated data
Small files residing on file systems
May use RAID to spread data over multiple disks

Storage tier
Large, permanent data files or raw data
Large batch updates, most likely real time
Log and data on separate volumes

Replication: Modes of Operation


Synchronous
All data written to local and remote arrays before I/O is complete and acknowledged to host
Speed of light = 3 x 10^8 m/s (vacuum) ≈ 3.3 µs/km
Speed through fiber ≈ 2/3 c ≈ 5 µs/km
2 RTT per write I/O = 20 µs/km

Asynchronous
Write acknowledged and I/O is complete after write to local array; changes (writes) are replicated to remote array asynchronously
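
The propagation figures quoted for synchronous replication (about 5 µs/km through fiber, two round trips per write) give a per-write latency penalty that grows linearly with distance. The sketch below works through that arithmetic; the function name and the sample distances are assumptions, and switch, array and protocol overheads are ignored.

```python
FIBER_DELAY_US_PER_KM = 5.0  # one-way propagation delay through fiber
ROUND_TRIPS_PER_WRITE = 2    # round trips per synchronous write I/O

def sync_write_penalty_us(distance_km):
    """Propagation delay added to each synchronous write, in microseconds."""
    one_way_trips = 2 * ROUND_TRIPS_PER_WRITE  # each round trip is two one-way trips
    return distance_km * FIBER_DELAY_US_PER_KM * one_way_trips

for km in (10, 50, 100):
    print(f"{km} km: {sync_write_penalty_us(km) / 1000:.1f} ms added per write")
```

Asynchronous replication avoids this penalty because the host is acknowledged after the local write alone.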


Synchronous vs. Asynchronous Trade-Off
Enterprises Must Evaluate the Trade-Offs

Synchronous
Impact to application performance
Distance limited (are both sites within the same threat radius?)
No data loss

Asynchronous
No application performance impact
Unlimited distance (second site outside threat radius)
Exposure to possible data loss

Maximum tolerable distance ascertained by assessing each application
Cost of data loss
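
The data-loss side of this trade-off can be seen in a toy model of the two modes. The `ReplicatedVolume` class below is a hypothetical illustration, not a storage API: a synchronous write reaches the remote array before it is acknowledged, while an asynchronous write is acknowledged at once and queued for later shipment.

```python
class ReplicatedVolume:
    """Toy model of a volume replicated to a remote array."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.local = []
        self.remote = []
        self.pending = []  # acknowledged writes not yet at the remote array

    def write(self, block):
        self.local.append(block)
        if self.synchronous:
            self.remote.append(block)   # remote copy completes before the ack
        else:
            self.pending.append(block)  # ack now, replicate later
        return "ack"

    def drain(self):
        """Ship queued writes to the remote array (asynchronous mode)."""
        self.remote.extend(self.pending)
        self.pending.clear()

    def lost_if_primary_fails(self):
        """Writes that exist only at the primary site right now."""
        return list(self.pending)

sync_vol = ReplicatedVolume(synchronous=True)
async_vol = ReplicatedVolume(synchronous=False)
for block in ("w1", "w2"):
    sync_vol.write(block)
    async_vol.write(block)
async_vol.drain()      # the replication link catches up
sync_vol.write("w3")
async_vol.write("w3")  # acknowledged, but still in flight

print(sync_vol.lost_if_primary_fails(), async_vol.lost_if_primary_fails())
```

If the primary site fails at this point, the synchronous volume has lost nothing, while the asynchronous volume has lost the write still in flight.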


Data Replication with DB Example
Control files identify the other files making up the database and record the content and state of the DB
Datafiles are only updated periodically
Redo logs record DB changes resulting from transactions
Used to play back changes that may not have been written to the datafiles when the failure occurred
Typically archived as they fill, to local and DR site destinations
[Figure: control files (DB name, creation date, backup performed, redo log time period, datafile state) identify the datafiles (table spaces, indexes, data dictionary); redo log files (database changes) record changes to the datafiles.]

Data Replication with DB Example (Cont.)
[Figure: timeline from t0 to t1 covered by archived redo logs followed by online redo logs.]
Hot backup of datafiles and control files taken at time t0

Failure or disaster occurs at time t1
Media failure (e.g., disk)
Human error (datafile deletion)
Database corruption

Database restored to state at time of failure (time t1) by:
1. Restoring control files and datafiles from last hot backup (time t0)
2. Sequentially replaying changes from subsequent redo logs (archived and online): changes made between time t0 and t1
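
The two recovery steps can be sketched with a dictionary standing in for the datafiles and a list of timestamped change records standing in for the redo logs. This is a simplified model rather than any database's actual recovery procedure; the `recover` function and the record layout are assumptions.

```python
def recover(hot_backup, redo_log, backup_time, failure_time):
    """Rebuild the state at failure_time from a backup taken at backup_time.

    `redo_log` is a list of (timestamp, key, value) change records,
    oldest first, covering both archived and online logs.
    """
    state = dict(hot_backup)          # step 1: restore the last hot backup (t0)
    for ts, key, value in redo_log:   # step 2: replay later changes in order
        if backup_time < ts <= failure_time:
            state[key] = value
    return state

backup_at_t0 = {"balance": 100}
redo_log = [
    (1, "balance", 90),    # already reflected in the backup taken at t0 = 3
    (5, "balance", 120),
    (7, "owner", "kim"),
    (12, "balance", 0),    # logged after the failure point t1 = 10
]
print(recover(backup_at_t0, redo_log, backup_time=3, failure_time=10))
```

Only the changes logged between t0 and t1 are replayed, which is why losing the most recent redo logs determines how much data is lost.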


Data Replication with DB Example (Cont.)
[Figure: a primary site and a secondary site joined by a SAN extension transport. The redo logs (cyclic, a copy of every committed transaction) are synchronously replicated for zero loss. The database copy at time t0 (a point-in-time copy taken when the DB is quiescent) and the archive logs are replicated/copied to the secondary site, which also holds earlier DB backups.]

Mixture of Sync and Async Replication Technologies Commonly Used
Usually only redo logs are sync replicated to the remote site
Archive logs are created from the redo log and copied when the redo log switches
Point in Time (PiT) copies of datafiles and control files are copied periodically (e.g., nightly)

Data Center Interconnection Options
[Figure: two data centers, each with Internet access, stateful firewalls, content caching, server load balancing, intrusion detection, front-end application servers, a high density multilayer LAN switch, back-end application servers, a high density multilayer SAN director and enterprise-class storage arrays. Interconnection options between the sites: SONET/SDH, DWDM/CWDM, IP/Metro E.]

Data Center Transport Options
Increasing distance: data center, campus, metro, regional, national

Optical
Dark fiber: sync; limited by optics (power budget)
CWDM: sync (2Gbps); limited by optics (power budget)
DWDM: sync (2Gbps lambda); limited by BB_Credits

SONET/SDH
Sync (1Gbps+ subrate), async

IP
MDS9000 FCIP: sync (Metro Eth), async (1Gbps+)

DATA CENTER ARCHITECTURE TRENDS


Cisco Data Center Vision

AUTOMATION
Dynamic provisioning and autonomic Information Lifecycle Management (ILM) to enable business agility

VIRTUALIZATION
Management of resources independent of underlying physical infrastructure to increase utilization, efficiency and flexibility

CONSOLIDATION
Centralization and standardization to lower costs, improve efficiency and uptime

[Figure: an Intelligent Information Network joining compute, network and storage. Labels: enterprise applications; server fabric network, data network, storage network; LAN, WAN, MAN; SAN; HPC, cluster, GRID; business policies, on-demand, service oriented.]

Summary


What Have We Talked About So Far?

DR and its Business Objectives
Define budget, technical solution
Management buy-in
DR is a process

Components of a Data Center
Multi-tier architecture
Front-end, application, back-end database

Techniques in Data Center Disaster Recovery
HTTP Re-Direction/GSS/RHI
Clustering
SAN extension

Trends in Data Center Technology

Today's Data Centers
Require an Architectural Approach to

Protect with Business Resilience
Tighten security
Improve business continuance

Optimize with Consolidation
Improve operational efficiency and resource utilization
Lower complexity and cost of ownership

Grow towards Services-oriented Infrastructure
Align virtualized resources with business demands
Automate infrastructure to respond dynamically

The Big Picture: The Cisco Data Center
The Emerging Data Center Architecture

MDS 9000 Family: Enterprise SAN Switching
Embedded Intelligent Storage Services
Virtual Fabrics (VSANs)
Storage Virtualization
Data Replication Svcs
Fabric Routing Svcs
Multiprotocol Gateway Services

Catalyst 6500 Family: Server Farm Switching
Embedded Intelligent Network Services
Server Balancing
VPN Termination
SSL Termination
Firewall Services
Intrusion Detection

Topspin Family: Server Fabric Switching
Embedded Intelligent Virtualization Services
Server Virtualization (VFrame)
Virtual I/O
Grid/Utility Computing
Low Latency RDMA Services
Clustering

[Figure: the three switching families connecting mainframe connectivity, enterprise tape storage, enterprise disk storage, enterprise NAS storage, UNIX/Windows servers and an enterprise grid of blade servers arranged in virtual private server fabrics #1, #2 and #3.]

What's Next?
A Security Strategy to Protect the Data Center
Understand the vulnerabilities and apply the relevant mitigations

Leverage Cisco's Technology to

Optimize the Server Resources
Reducing TCO for DRs
Virtualization to maximize resource invested

Grow DC Infrastructure, Enabling Business Agility
Automating computing resources provisioning
Speed of deploying new services

Q and A
