
HP Virtual Connect Enterprise Manager

Profile Failover and Profile Moves




Technical white paper

Table of contents
Executive summary
Getting started with VCEM profile failover
  Key components
  Profile failover example
  Initiating profile failover
  Profile failover internal operations
  Capabilities and limitations of profile failover
  Recommendations and best practices
Important notes concerning hardware compatibility
Installation and configuration checklist
  All installations
  Pre-failover installation and configuration checklist
Configuring and managing spare bays and servers
Initiating failover manually
Initiating Failover using HP SIM Automatic Event Handling
  Concepts
  Choosing HP Systems Insight Manager Events for failover
  Using HP Systems Insight Manager action on events with failover
  Configuring HP SIM Automatic Event Handling script to initiate failover
  Using customized scripts to initiate failover
Using a remote system to initiate failover
  Remote monitoring and VCEM profile failover
  Initiate Failover using enclosure and bay
  Limitations of using a remote system to monitor systems
Summary






Executive summary
Virtual Connect Enterprise Manager (VCEM) profile failover provides a way to quickly move
a Virtual Connect (VC) profile from one system to another. Profile failover can be triggered
using the VCEM GUI or via the VCEM failover command line. The failover command line
may also be used in conjunction with Systems Insight Manager event monitoring to
automatically trigger a failover if a particular event is received for a specific system. VCEM
profile failover is not disaster recovery but rather a fast-recovery tool that can help you
minimize unplanned downtime.

The basic steps of VCEM failover are:

1. Spares are designated via the VCEM GUI.
2. VCEM profile failover is invoked via the CLI or the GUI.
3. VCEM profile failover moves the VC profile from the source system to a designated spare
system that is the same blade model type. The following occurs:
a) The source system is powered down.
b) An appropriate spare is chosen from the designated spares within the same
Virtual Connect Domain Group as the source system.
c) The spare system is assigned the VC profile.
d) The spare system is powered up.

The outcome is that the failed system is powered off and the OS and application stack is
now running on the spare blade.

VCEM failover can be invoked by:

CLI: The CLI may be invoked using:
o The failed system's hostname.
o The failed system's IP address.
o The failed system's enclosure and bay number.
GUI: The GUI has failover buttons on:
o The Profiles tab.
o The Bays tab.

This white paper describes how you can use VCEM failover. It explains the following:

Usage: Covers preparation, initiation, and expected results.
Limitations: Includes constraints and dependencies. The most important limitation is that
the boot LUN associated with the VC profile must be portable enough to function on the
original system as well as potential spare systems. Pre-testing a boot LUN across possible
spares is the best way to address the risk associated with this limitation.


This failover functionality is meant to help you in your day-to-day administration tasks and,
in particular, with the fast recovery of problematic systems. It is built to be flexible and
robust so that it can adapt to different needs and environments.

Getting started with VCEM profile failover
This document provides detailed information on how to use the Virtual Connect Enterprise
Manager (VCEM) profile failover feature and how to initiate failover with HP SIM Automatic
Event Handling.
VCEM profile failover provides a fast method to recover a SAN-boot server that has
sustained a critical hardware failure. The process of profile failover moves the failed server's
SAN and IP network resources to a spare server. The spare server is then started with the
failed server's boot LUN. VCEM allows failover to be initiated manually or automatically
through events.
VCEM profile failover is dependent on the ability of an image to run on different systems.
VCEM profile failover does some basic checks for compatibility but it is important that you
understand the limitations of profile mobility.
Key components
A brief description of the key components of failover and their respective roles:

Virtual Connect (VC) virtualizes server-attached Ethernet and fibre channel networks from
an individual server blade using a server profile. When a server profile is assigned or
moved to an HP BladeSystem c-Class server bay, any resident server blade will present the
MAC and WWN addresses contained by the profile to the external Ethernet and fibre
channel networks. The profile also contains the virtual serial number and virtual UUID
which the server will use.
VCEM allows you to control failover and perform profile failover operations. VCEM is part
of the Central Management System (CMS). It is installed as a standalone component or as
a plug-in to HP Systems Insight Manager. VCEM aggregates multiple VC Domains into one
or more Virtual Connect Domain Groups, allowing single ranges of MAC and WWN
addresses to be shared across all VC Domain Groups. Further, all server blades in any VC
Domain Group share the same set of Ethernet and fibre channel connections.
For more information, see the HP Virtual Connect Enterprise Manager User Guide (1).

(1) See the latest version of the HP Virtual Connect Enterprise Manager User Guide at
http://h20000.www2.hp.com/bizsupport/TechSupport/DocumentIndex.jsp?lang=en&cc=us&taskId=101&prodClassId=10008&contentType=SupportManual&docIndexId=64255&prodTypeId=18964&prodSeriesId=3601866

Profile failover example
An example of profile failover follows. The infrastructure depicted in Figure 1 includes:
One VC Domain Group with two HP BladeSystem enclosures.
Virtual Connect with Ethernet and SAN interconnects.
Red box represents a production server blade (left enclosure).
Blue box represents a spare server blade (right enclosure).
Server profile that contains the MAC and WWN assignments. You can assign a
profile to a bay or leave it unassigned.


Figure 1: VCEM profile failover example


The following describes the profile failover operation shown in Figure 1.

A workload (a SAN-boot system drive containing an operating system, or an application
and the application's data) is running on the production server (red box). There is a need
to move the workload to another server, for example, because of a hardware problem or
scheduled server maintenance.
A profile failover is initiated on the production server (red box). As a result, VCEM executes
the following automated actions:
1. The production server (red box) is powered off.
2. A spare server blade (blue box) is selected.
3. The production server's profile is moved from the production bay (red box) to the spare bay
(blue box). After the move, the spare server has the Ethernet and SAN assignments previously held
by the production server.
4. The spare server is powered up. It boots the system drive image that had been running on the
production server, and the production workload is restarted on the spare server.
Initiating profile failover
VCEM provides both GUI and CLI interfaces for initiating failover. The GUI is
accessible only to VCEM users. The CLI is accessible to users and also serves as an API for
other software, such as user-developed scripts and HP Systems Insight Manager Automatic Event
Handling.
To perform a failover, you must have at least VCEM Group Limited Operator
permission for the group where the failover will occur.
Using the GUI, the source server is always identified by its enclosure and bay location.
Using the CLI, the source server can be identified by its enclosure name and bay location,
by its hostname, or by any of its IP addresses. Full Systems Insight Manager server discovery
must be running to identify the source server by either hostname or IP address.
There are three basic operational approaches to initiating failover:
Initiate failover from the GUI or CLI.
Receive an alert about a possible problem from HP SIM Automatic Event Handling. After
verifying the problem, you initiate failover using the GUI or CLI.
HP SIM Automatic Event Handling automatically initiates failover via the CLI and sends you
an alert that this action has occurred.
Initiation using HP SIM Automatic Event Handling is discussed below. See Initiating Failover
using HP SIM Automatic Event Handling for more information.
Profile failover internal operations
The same internal actions are performed by VCEM whether failover is initiated by the GUI or
the CLI. This section explains what profile failover does when invoked.
The VCEM Job Manager
The Job Manager is a VCEM internal resource that executes a series of configuration actions
as a unified process. Only one failover job per VC Domain Group can execute at a
time.
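The one-job-per-group rule can be pictured with a small sketch. This is not VCEM code: the class and method names are hypothetical, and because this paper does not say whether VCEM queues or rejects a second request for a busy group, the sketch simply refuses it.

```python
import threading


class JobManagerSketch:
    """Hypothetical stand-in for the one-failover-job-per-group rule."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the bookkeeping below
        self._active = {}               # VC Domain Group name -> job id
        self._next_id = 1

    def start_failover_job(self, group):
        """Return a new job id, or None if the group already has a failover job."""
        with self._guard:
            if group in self._active:
                return None
            job_id = self._next_id
            self._next_id += 1
            self._active[group] = job_id
            return job_id

    def finish_job(self, group):
        """Release the group so that another failover job can run in it."""
        with self._guard:
            self._active.pop(group, None)


manager = JobManagerSketch()
first = manager.start_failover_job("GroupA")
second = manager.start_failover_job("GroupA")  # same group, still busy
other = manager.start_failover_job("GroupB")   # different group, allowed
manager.finish_job("GroupA")
retry = manager.start_failover_job("GroupA")   # allowed again after release
```

Jobs in different VC Domain Groups do not block each other in this sketch, which matches the per-group wording of the rule.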
Most of the work of failover is done within the context of the failover job. The progress of a
failover job can be monitored in the GUI or by using the CLI as described in Failover
initiation and Actions performed by the failover job below.
Failover initiation
When failover is initiated through the CLI, the command line is validated and, if used to
identify the source server, the hostname or IP address is translated into its unique enclosure-
bay location.

The failover job is then scheduled and its job number is returned so you can check the job
status.
Actions performed by the failover job
The failover job performs the following actions:
1. Evaluates VC Domain status for any abnormal conditions, such as an unmanaged or
unlicensed domain or a domain that has a status of incompatible. The failover job then
locks the VC Domain Group of the source server to other operations in VCEM for the
remainder of the job. This prevents possible conflicting actions by other users or processes.
2. Checks the source server product name (ProLiant and Integrity products are treated
differently). If the product is not identified, the job exits.
3. Powers off the source server using press and hold (hard power off).
4. Picks a qualified spare server from the source server's VC Domain Group. If no spare
server is found, the job exits. A qualified spare server is defined as the first spare
found that meets the following criteria:
o The spare bay has no assigned profile.
o The spare bay server is physically present.
o The model of the spare server is the same as the model of the source server.
Note: The server model generation is not a differentiator; for example, a BL465-c G1
and a BL465-c G5 are considered to be the same model.
o The spare server is powered off.
5. Moves the profile from the source bay to the spare bay. If errors are encountered during
the move operation, the job exits. As part of the move operation, VCEM:
o Removes the source server's host name and IP address information from Systems Insight
Manager.
o Removes the spare designation from the spare bay.
6. Powers on the spare server, thereby booting the SAN-attached system drive image. As
part of this operation, HP Systems Insight Manager discovery is scheduled to run on the
spare server in ten minutes. This associates the host name and IP addresses of the spare
server with HP Systems Insight Manager.
To be effective, the operating system booted on the spare server must have established its
network communications by the time HP Systems Insight Manager discovery runs. If your
system takes longer than 10 minutes to boot due to large disk or memory configurations, you
may need to rerun discovery once the system has fully completed its boot process.
7. Completes the job by unlocking the resources of the VC Domain Group.
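The spare-selection criteria listed above can be expressed as a short sketch. The `Bay` record, the `base_model` helper, and the assumption that a generation appears as a trailing " G<number>" in the model string are all illustrative; the paper does not describe how VCEM stores or compares model names. "OTHER-MODEL" is placeholder data.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bay:
    enclosure: str
    number: int
    is_spare: bool = False
    has_profile: bool = False
    server_present: bool = False
    powered_on: bool = False
    model: str = ""


def base_model(model: str) -> str:
    """Drop a trailing generation suffix such as ' G1' or ' G5' (assumed format)."""
    return re.sub(r"\s+G\d+$", "", model.strip())


def pick_spare(source: Bay, group_bays: List[Bay]) -> Optional[Bay]:
    """Return the first designated spare that meets all criteria, else None."""
    for bay in group_bays:
        if (bay.is_spare
                and not bay.has_profile
                and bay.server_present
                and not bay.powered_on
                and base_model(bay.model) == base_model(source.model)):
            return bay
    return None


source = Bay("enc1", 3, has_profile=True, server_present=True,
             powered_on=True, model="BL465-c G1")
bays = [
    # Right model, but powered on: not qualified.
    Bay("enc2", 1, is_spare=True, server_present=True, powered_on=True,
        model="BL465-c G5"),
    # Powered off, but a different model: not qualified.
    Bay("enc2", 2, is_spare=True, server_present=True, model="OTHER-MODEL G1"),
    # Same model (generation ignored), present, powered off, no profile.
    Bay("enc2", 3, is_spare=True, server_present=True, model="BL465-c G5"),
]
chosen = pick_spare(source, bays)
```

Because the first match wins, the order in which spare bays are examined decides which spare is used when several qualify; that order is not specified in this paper.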
Capabilities and limitations of profile failover
An understanding of the capabilities and limitations of profile failover is helpful to make the
best use of this feature.
When profile failover is complete, the workload of the source (production) server has been
restarted on a different (spare) server blade and the memory resident software and data
have been refreshed from disk. Profile failover can correct the following types of problems:
Server hardware malfunctions which can be fixed by replacing server hardware.

Aberrant software or data states which can be corrected by refreshing software and data
from disk.
Ensure that you are aware of the following points during failover:
User sessions established at the time of the failover might be interrupted.
In HP Systems Insight Manager, host name and IP address associations are removed from
the source server and reestablished with the spare about ten minutes after failover
completes.
Problems that profile failover does not address include:
o Loss of connectivity with dependent resources residing on an attached SAN or
Ethernet.
o Data or software corruption or related loss of state integrity.
o Operating errors and system configuration errors.
Profile failover depends on the following:
Servers must be pre-configured to boot from SAN on power-up.
Servers must have no dependency on a local drive.
The source server's system drive image must be able to run on the hardware and firmware
configuration of any spare server in the VC Domain Group that is the same model as the
source server. This includes server models of different generations, for example, the
BL465-c G1 and BL465-c G5.
Recommendations and best practices
It is possible that a small difference in the hardware or firmware configuration of two
different servers can create an incompatibility, such that a system drive image that works on
one server does not work on the other, or performs at a degraded level. Therefore to ensure
the best failover results:
Standardize on one hardware configuration for all servers of the same model for each VC
Domain Group. For each model, implement procedures to keep all firmware at the same
revision levels across the server population. It is best to upgrade the OA, iLO, and server
firmware to the latest revisions.
Avoid placing different generations of the same model server in the same VC Domain
Group. The internal composition of the hardware and firmware items across different
generations of the same model is often significantly different.
Test to ensure boot images reliably start and run on all the server configurations that could
possibly be used, especially if you need to use different hardware and firmware
configurations for the same server model within a VC Domain Group.
Take a step-by-step approach to direct initiation of failover using HP Systems Insight
Manager Automatic Event Handling. If you choose to implement event-initiated failover, be
certain that your selected collection of failover events has a level of problem detection
accuracy that is consistent with your operating procedures and service level objectives. For
more information see Initiating Failover using HP Systems Insight Manager Automatic Event
Handling.

Important notes concerning hardware compatibility
VCEM profile failover is built on the concept of VC Domain Groups. VC Domain Groups
ensure that a VC server profile running within them has access to the same IP and SAN
networks, regardless of which VC Domain it currently resides in. However, within any
given VC Domain Group, the HP BladeSystem servers may not be interchangeable. A VC
Domain Group can include servers which have different processor types and memory
configurations. Servers may also be equipped with different mezzanine cards, potentially
providing different LAN and SAN connectivity.
Other differences between source and target servers may impact the operation of the
operating system or applications after a move. When moving an image-based Matrix OE
logical server on to a blade server, HP recommends the following:
Run the latest versions of firmware on each source and target system component,
particularly system ROM, iLO, and HBA firmware.
Include drivers in each system image for all mezzanine cards on each system. Mezzanine
cards on the target server which are not on the source server can interfere with a boot
process by causing the OS to prompt for the installation of different drivers. Mezzanine
cards should be in the same slot for both the source and target systems.
Standardize hardware configurations as much as possible. Subtle differences in system
hardware which are not reflected in the model or version number, such as changes within
network interface cards or HBAs, can interfere with a boot process by requiring different
drivers or configuration changes.
For maximum portability, some datacenter architects choose to configure each server blade
in the enterprise according to a single specification chosen to accommodate the demands of
most application workloads. The architect can then test likely profile moves before putting
the systems into production. HP recommends testing your workload on all target hardware
platforms prior to production deployment.
Installation and configuration checklist
This section covers what is installed and configured in advance of running VCEM.
All installations
All items must be installed and configured as described in the HP Virtual Connect Enterprise
Manager User Guide. See Chapter 2, Installing and Configuring VCEM, and Chapter 3, Managing
VC Domains, for specific information.
Pre-failover installation and configuration checklist
Installation and configuration requirements for profile failover depend on which methods of
initiation you wish to use. If you wish to use all methods of initiating failover, configuration
items 2 through 7 must be implemented.
Table 1. Profile failover configuration support and requirements by initiation method

                                                                    | Method of failover initiation
Configuration item                                                  | GUI; CLI: VCEM failover bay | CLI: VCEM failover host; CLI: VCEM failover ip | HP SIM Automatic Event Handling
--------------------------------------------------------------------+-----------------------------+------------------------------------------------+--------------------------------
1. VCEM is installed standalone                                     | Supported                   | Supported                                      | Not supported
2. VCEM is installed as a part of HP Insight Control                | Supported                   | Supported                                      | Required
3. Full HP Systems Insight Manager server discovery is in operation | N/A                         | Required                                       | Required
4. DNS is properly configured in the environment                    | N/A                         | Required                                       | Required
5. CLI user has HP Systems Insight Manager administrative privilege | Required (CLI only)         | Required                                       | N/A
6. Windows Administrator account is usable on the VCEM system       | N/A                         | N/A                                            | Required
7. HP Systems Insight Manager agents running on managed systems     | N/A                         | N/A                                            | Required
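As a cross-check, Table 1 can be encoded as a small lookup. This is only a sketch: the column keys (`gui_or_bay`, `host_or_ip`, `event`) are shorthand invented here for the three method columns, and the function names are hypothetical.

```python
# Table 1 keyed by configuration item number. Column keys are shorthand
# chosen for this sketch, not VCEM terminology.
TABLE_1 = {
    1: {"gui_or_bay": "Supported", "host_or_ip": "Supported", "event": "Not supported"},
    2: {"gui_or_bay": "Supported", "host_or_ip": "Supported", "event": "Required"},
    3: {"gui_or_bay": "N/A", "host_or_ip": "Required", "event": "Required"},
    4: {"gui_or_bay": "N/A", "host_or_ip": "Required", "event": "Required"},
    5: {"gui_or_bay": "Required (CLI only)", "host_or_ip": "Required", "event": "N/A"},
    6: {"gui_or_bay": "N/A", "host_or_ip": "N/A", "event": "Required"},
    7: {"gui_or_bay": "N/A", "host_or_ip": "N/A", "event": "Required"},
}

METHODS = ("gui_or_bay", "host_or_ip", "event")


def required_items(method):
    """Return the sorted item numbers marked Required for one initiation method."""
    return sorted(item for item, row in TABLE_1.items()
                  if row[method].startswith("Required"))


def required_for_all_methods():
    """Return the union of required items across every initiation method."""
    return sorted({item for m in METHODS for item in required_items(m)})


print(required_items("host_or_ip"))
print(required_for_all_methods())
```

The union across all three columns comes out as items 2 through 7, which agrees with the statement that those items must be implemented to use every method of initiating failover.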

Configuring and managing spare bays and servers
Designating spare bays and populating them with servers is the final step before
user-initiated failover is ready to use. Profile failover always selects a qualified spare from
the source server's VC Domain Group. A qualified spare is a bay designated as a spare that:
Has no assigned profile.
Has a server blade that is physically present and powered off.
Is the same model as the source server.
If failover finds no qualified spare, the failover operation fails.
For more granular control of spares selection you can create multiple VC Domain Groups,
each having an identical VC configuration, and populated with the desired combinations of
servers and spares. For more information, see Recommendations and best practices above.
To designate spares use the VCEM GUI Bays tab:
1. Set the filter for the VC Domain to which spares are to be applied.
2. Click the Show more details check box in the lower left of the screen to see the blade
model and power status of installed servers. The display refreshes.
3. Click the Spare check box for each bay to be designated as a spare. A bay with a profile
cannot be designated a spare. A bay with a non-server blade, such as a storage or
expansion blade, cannot be designated as a spare. HP recommends that a server be
present in the bay when the bay is designated a spare, but this is not mandatory.
Note: Removing the check from the Spare check box removes the bay's designation as a
spare. Assigning or moving a server profile to a bay also removes its spare designation.
4. Click Apply Spares.
Note: Clicking twice on the Spare column heading brings the spare bays to the top of the
list.


Figure 2: Designating spares


Initiating failover manually
Profile failover may be initiated from the Server Profiles tab. Select a profile and click
Failover in the bottom right-hand corner. You must have at least VCEM Group Limited
Operator permissions to see the Failover button.

Figure 3: Press Failover


Profile failover may also be initiated from the Bays tab. Check a specific bay with an
assigned profile and then select Failover.


Figure 4: Pressing Failover from the Bays tab



VCEM profile failover may also be initiated using the CLI. Use one of the following
command forms:
Using the failed system's host name:
    vcem failover host hostname
Using the failed system's IP address:
    vcem failover ip ip_address
Using the failed system's enclosure and bay number:
    vcem failover bay enclosure_name:bay_number
Once the command has been executed, a job ID number is displayed. This number can be used
as input to another VCEM CLI command to monitor the status of the failover. An example
command showing how to display job details follows:
    vcem list details job job_id_number
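For scripted use, the command forms above can be assembled programmatically. The following sketch only builds and runs the four command lines shown in this section. The helper names are hypothetical, it assumes `vcem` is on the PATH of the VCEM system, and it does not try to parse the job ID because the output format is not described here.

```python
import subprocess


def failover_command(kind, target):
    """Build the argument list for one of the three `vcem failover` forms."""
    if kind not in ("host", "ip", "bay"):
        raise ValueError("kind must be 'host', 'ip', or 'bay'")
    if kind == "bay" and ":" not in target:
        raise ValueError("bay target must look like enclosure_name:bay_number")
    return ["vcem", "failover", kind, target]


def job_details_command(job_id):
    """Build the argument list for `vcem list details job <job_id_number>`."""
    return ["vcem", "list", "details", "job", str(job_id)]


def run(argv):
    """Run a command and return its raw text output.

    Assumption: `vcem` is on the PATH. The output is returned unparsed
    because its format is not documented in this paper.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Example: build (but do not run) a failover request by enclosure and bay.
print(failover_command("bay", "Enclosure1:4"))
```

Passing an argument list rather than a single shell string avoids quoting problems when a host name or enclosure name contains unusual characters.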
Initiating Failover using HP SIM Automatic Event Handling
This section describes how to configure and use HP SIM Automatic Event Handling to
automatically initiate failover and to notify users.
Concepts
To initiate failover, configure the following items for Automatic Event Handling:
Select events that you want to use to trigger failover. HP recommends a set of events for
you to consider below. See The set of failover events installed with VCEM for more
information. You may use these events or select your own. The selected events are
automatically placed into an HP Systems Insight Manager event collection.
Create an action-on-event task in HP Systems Insight Manager. In creating this task, the
failover event collection is applied to a collection of managed systems. One or more
actions are configured to execute whenever any event in the collection occurs. These
actions include e-mail notification and execution of a custom tool. VCEM provides two
custom tools to initiate failover.
o VCEM Profile Failover by Hostname
o Profile Failover by IP address
Each custom tool passes its argument (host name or IP address) to the VCEM failover CLI
command. One of these custom tools is selected and configured with action-on-event.
This procedure for configuring Automatic Event Handling is described in detail below.
The following figure depicts how the automatic failover initiation works once configured.


Figure 5: Automatic initiation of profile failover using HP SIM Automatic Event Handling


In Figure 5 above:
HP Systems Insight Manager events originate from the managed systems and are posted to
HP Systems Insight Manager via Ethernet.
HP Systems Insight Manager processes the events and acts on the selected failover events.
On the first occurrence of any failover event, HP Systems Insight Manager runs the custom
tool.
Note: It is possible to modify the first-occurrence behavior through user-provided
scripting, as described below. For more information, see Configuring HP Systems Insight
Manager Action on Events script to initiate failover.
The custom tool invokes the VCEM failover command and passes either the host name or
IP address of the system that posted the failover event.
The host name or IP address is mapped to its enclosure and bay location.
A failover job is started for that enclosure and bay.
The failover job runs as described above. Communicating via Ethernet with the VC
interconnects for the enclosures containing the source and spare bays, the job moves the
server profile from the source to the spare bay and then powers on the spare server.
Choosing HP Systems Insight Manager Events for failover
You must select the HP Systems Insight Manager events you wish to use to initiate failover.
HP has recommended a collection of events for your consideration. These recommendations
are listed after several practical topics on using events to automatically initiate Failover. See
How HP selected the set of installed failover events for more information.

HP Systems Insight Management monitoring
HP Insight Management (IM) agents monitor the health of ProLiant systems. These agents
require a host operating system, such as Windows or Linux. IM agents monitor server
hardware, largely at the component level, and send events (2) (SNMP traps or WBEM events)
to HP Systems Insight Manager to report changes in the server's health status. For this
communication, HP Systems Insight Manager requires a working IP network connection to the
server. HP Systems Insight Manager groups events by category, such as server, storage, NIC,
and so forth. HP Systems Insight Manager also assigns a severity level to each event.
In addition to receiving events from servers, HP Systems Insight Manager polls each server
for its health status every few minutes. This server health reflects a combination of all the
subsystems monitored by the IM agent. For each server it monitors, HP Systems Insight
Manager displays an icon indicating its health status, as well as reflecting the category of
the most severe event received.
The component level health information reported does not consider the importance of the
individual components to the overall ability of the system to meet its service level objectives.
System administrators must review the events to determine appropriate actions.
Determining a critical server hardware failure
When a server hardware component fails:
the server may fail, thereby ceasing to deliver services; or
the server may continue to operate and meet its service level objectives.
For purposes of failover, when the failure of a hardware component materially impacts a
server's ability to meet its service level objective, the component is critical and its failure
becomes a critical server hardware failure. Failover should provide effective remediation.
On the other hand, if the server continues to operate acceptably, then the component is not
critical and immediate failover is most often unnecessary. For example, failure of a
redundant power supply does not indicate a critical failure since the server continues to
operate and, with HP BladeSystem, the power supply can be replaced without impacting the
server's operation.
Since HP Systems Insight Manager does not know which components might be redundant or
unused, it rates all component failures as critical events, leaving you to determine the most
effective remediation.
When a component is operating in a degraded state that threatens the server or threatens
the integrity of its retained data, there is cause to fail over the server. Examples of this
degraded state include certain CPU and memory error conditions. HP Systems
Insight Manager rates conditions that indicate impending failures as major events;
however, for failover, these events can also be considered critical.
Also, the server configuration and workload can further qualify what a critical component is
for any individual server.

(2) See the latest version of the HP Virtual Connect Enterprise Manager User Guide at
http://h20000.www2.hp.com/bizsupport/TechSupport/DocumentIndex.jsp?lang=en&cc=us&taskId=101&prodClassId=10008&contentType=SupportManual&docIndexId=64255&prodTypeId=18964&prodSeriesId=3601866. Also see the HP Systems Insight Manager document, Part
Number: 347870-003: The Microsoft Windows Event ID and SNMP Traps Reference Guide,
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00293064/c00293064.pdf.

For example, a local array controller fails but the server is SAN-boot configured and the
workload does not access local drives, so the array controller is not a critical component.
When components are configured redundantly, a partner component can assume a failed
partner's load.
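The reasoning in this section can be summarized as a decision function. This is a sketch of the logic only: the `ComponentEvent` fields are hypothetical inputs that an administrator or script would have to supply, since HP Systems Insight Manager itself does not know which components are redundant or unused.

```python
from dataclasses import dataclass


@dataclass
class ComponentEvent:
    severity: str                    # "critical" or "major", as rated by HP SIM
    has_working_partner: bool        # a redundant partner can assume the load
    needed_by_workload: bool         # the workload depends on this component
    impending_failure: bool = False  # degraded state threatening server or data


def warrants_failover(event: ComponentEvent) -> bool:
    """Redundancy and relevance to the workload override the raw severity."""
    if not event.needed_by_workload or event.has_working_partner:
        return False
    if event.severity == "critical":
        return True
    # Major events that signal impending failure may be treated as critical.
    return event.severity == "major" and event.impending_failure


# The examples from the text, expressed as inputs:
redundant_power_supply = ComponentEvent("critical", True, True)
local_array_controller = ComponentEvent("critical", False, False)  # SAN-boot server
degraded_memory = ComponentEvent("major", False, True, impending_failure=True)

for ev in (redundant_power_supply, local_array_controller, degraded_memory):
    print(warrants_failover(ev))
```

Only the degraded-memory case leads to failover here; the redundant power supply and the unused local array controller do not, even though both are rated critical.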
Reporting critical events from a server
When a non-redundant component, such as a single CPU, fails instantly, there is no
opportunity for the HP Systems Insight Manager agents on that server to report the failure.
There are many examples of component failures that are not usually reported by HP Systems
Insight Manager agents.
Reporting component status also requires that the server's operating system is running and
that network communications between the system and HP Systems Insight Manager are working.
A server can fail to deliver services because its hardware or its operating system (hang or
crash) failed critically. But if the root cause of the failure is unknown, you cannot be certain
that replacing the server will remedy the problem. For example, if the boot image has
become unusable, replacing the server will not remedy the problem. In other cases,
replacement may not be necessary at all, for example, where a reboot would restore
functionality. In any case, a system experiencing this type of issue is not able to send an
event to HP Systems Insight Manager.
The HP Systems Insight Manager System Unreachable event
System Unreachable is an optional but commonly used system status change event. When
configured in HP Systems Insight Manager, a System Unreachable event is raised
whenever a system does not respond to a ping issued by the HP Systems Insight Manager
Hardware Status Polling task.
The System Unreachable event can be configured to cause the system to perform failover.
The drawback is that System Unreachable is not a root cause event. It occurs whether the
root cause is a critical hardware failure, workload software failure, or failure of the
intervening communications network. (In the case of a network failure, many servers could
be triggered to perform failover at the same time.)
Failover events and service level ratings
The ideal goal for failover events is 100% accuracy, so that failover is triggered for:
All situations in which a critical server hardware failure occurs.
Zero conditions in which the server remains capable of meeting its service level objectives.
If event detection overlooks a critical server hardware failure, then services will not be
restored until the failure is discovered by other means such as a call to the help desk.
If the need to fail over is falsely detected, services are interrupted during the failover and
system restart, adversely impacting the service level and perhaps making the service
unavailable during a critical period.
Before relying on event-initiated failover, ensure that the accuracy of the selected failure
detection events will result in acceptable service level ratings, given your local installation,
configurations, workloads, service level objectives, and operations policies.
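The trade-off between missed failures and false triggers can be made concrete with a little arithmetic. The following Python sketch is illustrative only: the function name, the input figures, and the simple additive downtime model are assumptions made for this example, not HP data and not part of VCEM.

```python
def expected_annual_downtime_minutes(
    real_failures_per_year: float,
    detection_rate: float,
    false_triggers_per_year: float,
    failover_minutes: float,
    manual_discovery_minutes: float,
) -> float:
    """Estimate yearly downtime for one server under event-initiated failover.

    Assumed model (hypothetical, for illustration only):
      - a detected failure costs one failover (failover_minutes);
      - a missed failure costs the manual discovery time plus one failover;
      - a false trigger costs one unnecessary failover.
    """
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    detected = real_failures_per_year * detection_rate
    missed = real_failures_per_year * (1.0 - detection_rate)
    return (
        detected * failover_minutes
        + missed * (manual_discovery_minutes + failover_minutes)
        + false_triggers_per_year * failover_minutes
    )


# Hypothetical inputs: 2 real failures per year, half of them detected,
# 1 false trigger per year, 10-minute failover, 60 minutes to notice a miss.
estimate = expected_annual_downtime_minutes(2, 0.5, 1, 10, 60)
```

Comparing the estimate for different candidate event sets against your service level objective is one way to judge whether a given set is accurate enough for your installation.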

How HP selected the set of installed failover events
The collection of recommended events includes those events HP believes will be consistently
useful to customers. You may customize this collection or add additional collections for your
local installation.
HP used the following factors and rationale to select the recommended events:
Exclude events that do not indicate a server hardware failure.
Exclude events related to components that are usually configured redundantly. (For
example, the HP BladeSystem c7000 Enclosure accommodates redundant power supplies
and fans. Enclosure events for power and fan failures are reported to HP Systems Insight
Manager and immediate action should be taken to hot-fix failed components. In most
data centers timely replacement is completed without impacting service delivery.)
Exclude events related to system components that are not needed to deliver services. It is
assumed that system configurations do not depend upon local storage for failover.
Therefore, related events will not impact service delivery capabilities.
Exclude events that are not likely to be received by HP Systems Insight Manager because
the reporting system is not healthy enough to transmit them. These events include HBA
failures and most CPU and NIC failures.
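The four exclusion rules above can be read as a simple filter over candidate events. The Python sketch below is only an illustration of that reasoning: the `CandidateEvent` type, its field names, and the sample classifications are assumptions made for this example and do not correspond to any HP Systems Insight Manager data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateEvent:
    """Hypothetical description of an event being considered as a failover trigger."""
    name: str
    indicates_hardware_failure: bool
    component_usually_redundant: bool
    component_needed_for_service: bool
    likely_to_be_received: bool


def is_recommended(event: CandidateEvent) -> bool:
    """Apply the four exclusion rules; an event must pass all of them."""
    return (
        event.indicates_hardware_failure
        and not event.component_usually_redundant
        and event.component_needed_for_service
        and event.likely_to_be_received
    )


# Sample classifications, chosen to mirror the examples given in the text.
CANDIDATES = [
    CandidateEvent("Excessive correctable memory errors", True, False, True, True),
    CandidateEvent("Enclosure fan failure", True, True, True, True),
    CandidateEvent("Local array controller failure", True, False, False, True),
    CandidateEvent("HBA failure", True, False, True, False),
]

recommended = [e.name for e in CANDIDATES if is_recommended(e)]
```

With these sample classifications, only the memory pre-failure event passes every rule, which is consistent with the kind of events HP chose to install.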
The set of failover events installed with VCEM
The following recommended events are pre-configured in the VCEM Failover Events
collection. All of these events can be characterized as pre-failure events, meaning that a
failure has not yet occurred but should be expected at any time.
Table 2: List of recommended, consistently useful failover events
Event number   Event type                                                    Event category
1001           (SNMP) CPU error threshold passed                             ProLiant System and Environmental Events
2005           (SNMP) Excessive Correctable Memory Errors                    ProLiant System and Environmental Events
6015           (SNMP) Correctable Memory Error Occurred                      ProLiant System and Environmental Events
6029           (SNMP) Corr Mem Errors Require a Replacement Memory Module    ProLiant System and Environmental Events
6056           Corrected Memory Errors Replace Memory Module                 ProLiant System and Environmental Events


For more information, see Configuring HP SIM Automatic Event Handling script to initiate
failover.

Using HP Systems Insight Manager action on events with failover
HP Systems Insight Manager has the ability to monitor SNMP traps and WBEM events.
These events can be used to notify you or to do more complex operations such as triggering
executables on the Central Management System or on remote systems. This section describes
how you can take advantage of this functionality to trigger VCEM profile failover.
VCEM provides ease-of-use enhancements for administrators to take advantage of this
functionality. Included are links to wizards, custom tool enabled CLIs, and collections of
events that you can customize. All of these combine to create automated failover in a VCEM
environment.
In summary, there are three major ways to use HP SIM Automatic Event Handling with the
VCEM failover feature:
Event notifies the administrator but does not initiate failover.
Event initiates failover but does not notify the administrator.
Event both notifies the administrator and initiates failover.
Configuring HP SIM Automatic Event Handling script to initiate failover
This section explains how to create a sample HP SIM Automatic Event Handling script to
invoke a VCEM CLI. While this is an example, these steps will be the same regardless of
which of the aforementioned three Event Handling approaches you select.
The VCEM home page contains a link under Failover to HP SIM Automatic Event Handling
Manage Tasks. This is a convenient link to the HP SIM Automatic Event Handling script.

Figure 6: Using failover with HP SIM Automatic Event Handling


The link displays the following page. From this GUI you can configure Automatic Event
Handling. To create a new action on event:
1. Click New to initiate the wizard.

Figure 7: Creating a new action


2. Provide a meaningful name for the task.
The second step allows you to select the events which will trigger the task. This means that
if these events are received for any of the monitored systems, the task will be triggered.
The example shown in figure 7 will use the VCEM provided collection of events entitled
Server Profile Failover Trigger Events. However, you can create a collection of events or
choose individual events to use for failover initiation.

Figure 8: Selecting events


3. Use View Definition to display what is contained in an event collection. This collection
contains CPU and Memory error SNMP traps.

Figure 9: View definition



4. Choose the second radio button labeled use event attributes that I will specify to
select individual events. The wizard will advance to a new page that allows you to choose
events by severity, event category, or even the event type. Selecting the event type
displays all the available SNMP traps which can be used for failover purposes. A common
category is ProLiant System and Environmental Events which contains many of the system
specific warnings and failures. The following example uses the Server Profile Failover
Trigger Events collection.

Figure 10: Selecting individual events


5. Select systems or collections of systems that this particular task applies to. The following
example uses the second radio button in order to designate a specific system.

Figure 11: Selecting events



6. Click Next to display the system attributes that you can use to select specific systems.
The following example uses the system name, which in this case is the IP address
170.50.0.69.


Figure 12: Using system attributes


7. Check any actions you want to initiate. There are a number of actions available. When
one of the previously selected events occurs for the selected systems, all of the actions
checked here will be initiated. You can choose to send a page or email if you want to be
notified of the event occurrence. These actions can be combined with a custom tool or they
may be used on their own as monitoring tools.

Figure 13: Selecting actions




The following example shows the two custom tools installed with VCEM. These custom tools
execute the VCEM CLI using the parameters available when a selected event is received on
the Systems Insight Manager CMS. When an event is received, the originating system is
indicated and can be used to execute custom tools. This example is using the VCEM Profile
Failover by IP Address custom tool. Choose either IP Address or Hostname and click
Next.

Figure 14: VCEM profile failover by IP address


8. Enable or disable Automatic Event Handling during certain time periods. For example, you
may want to enable VCEM profile failover during weekends, when you are not in the
office.


Figure 15: Enabling actions during certain time periods



9. Review the Automatic Event Handling task summary. The automatic event handler will be
invoked whenever one of the trigger events occurs for the specified system. Click Finish to
return to the Manage Tasks screen.


Figure 16: Reviewing the summary


Using customized scripts to initiate failover
You may choose to create customized scripts or batch files to initiate failover. These scripts
can be logically constructed to perform tasks before and after a VCEM profile failover. The
basic construction of a custom script might be:
Pre-Failover commands. Some examples include:
o Evacuate Virtual Machines from an ESX Host
o Gracefully shut down an application
o Write volatile data to disk
VCEM failover command. Use one of the following:
o vcem failover host hostname
o vcem failover ip ip_address
o vcem failover bay enclosure_name:bay_number


Post-Failover commands. Some examples include:
o Importing Virtual Machines to an ESX Host
o Creating a trouble ticket for source system maintenance
o Sending email to administrators notifying them that a failover has occurred

These custom scripts can be used to create a Systems Insight Manager Custom Task and
combined with the aforementioned Action-on-Events so that they can be initiated in response
to system health events. See the HP Systems Insight Manager User Guide for more detailed
information concerning custom tasks.
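The pre-failover, failover, and post-failover structure described above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an HP-supplied script: the function names are hypothetical, the `vcem` executable is assumed to be on the PATH of the system running the script, and an exit code of 0 is assumed to indicate success. Only the three `vcem failover` command forms are taken from this paper.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# The three target forms listed above: host, ip, and bay.
VALID_TARGET_KINDS = ("host", "ip", "bay")


def build_failover_command(kind: str, target: str) -> List[str]:
    """Return the argument list for 'vcem failover <kind> <target>'."""
    if kind not in VALID_TARGET_KINDS:
        raise ValueError("kind must be one of %s" % (VALID_TARGET_KINDS,))
    if not target.strip():
        raise ValueError("target must not be empty")
    return ["vcem", "failover", kind, target]


def run_failover_workflow(
    kind: str,
    target: str,
    pre_steps: Sequence[Callable[[], None]] = (),
    post_steps: Sequence[Callable[[], None]] = (),
    run_command: Optional[Callable[[List[str]], int]] = None,
) -> int:
    """Run pre-failover steps, the VCEM failover command, then post-failover steps.

    Post-failover steps run only if the command reports exit code 0
    (assumed here to mean success). Returns the command's exit code.
    """
    command = build_failover_command(kind, target)
    if run_command is None:
        # Default runner: execute the CLI and report its exit code.
        def run_command(argv: List[str]) -> int:
            return subprocess.run(argv).returncode
    for step in pre_steps:
        step()  # e.g. evacuate virtual machines, flush volatile data
    exit_code = run_command(command)
    if exit_code == 0:
        for step in post_steps:
            step()  # e.g. open a trouble ticket, notify administrators
    return exit_code
```

Passing `run_command` explicitly makes it possible to rehearse the workflow with a stand-in runner before allowing the script to invoke the real CLI.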
Using a remote system to initiate failover
Remote monitoring and VCEM profile failover
If you are using a remote system to monitor servers, VCEM profile failover is still possible. An
example could be where one system is being used to monitor server health and another
system is running the VCEM implementation. Monitoring systems remotely implies that
Systems Insight Manager's discovery is not set to run automatically at periodic intervals on
the VCEM system. This means that VCEM will not have access to reliable information about
servers such as host names or IP addresses. For this example, you can still use the VCEM
CLI, specifically the enclosure:bay nomenclature.
Initiate Failover using enclosure and bay
The enclosure:bay CLI command can be applied if another system is used to monitor
and inventory systems information. You will need to devise some way to access the CLI
running on the VCEM system. SSH would be a logical choice. Once the remote system has
logged into the VCEM system, it can invoke the VCEM CLI with the enclosure and bay of the
defective server.
The following example uses an enclosure named Databases with a defective server in bay 8:
vcem failover bay Databases:8
This would initiate a VCEM profile failover on the profile that is assigned to bay 8.
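A remote monitoring system could assemble and send the enclosure and bay form of the command over SSH along the following lines. This Python sketch rests on several assumptions: an `ssh` client is installed on the monitoring system, an SSH service on the VCEM system exposes the `vcem` command, and the account and host name shown are placeholders. The function names and the conservative restriction on enclosure name characters are also assumptions made for this example, not VCEM rules.

```python
import re
import subprocess
from typing import List

# Conservative assumption: accept only simple enclosure names so that the
# remote command line needs no shell quoting.
_SAFE_ENCLOSURE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def build_remote_failover_argv(ssh_target: str, enclosure: str, bay: int) -> List[str]:
    """Return the local argv that asks ssh to run 'vcem failover bay <enclosure>:<bay>'."""
    if not _SAFE_ENCLOSURE_NAME.fullmatch(enclosure):
        raise ValueError("unsupported enclosure name: %r" % enclosure)
    if not isinstance(bay, int) or bay < 1:
        raise ValueError("bay must be a positive integer")
    remote_command = "vcem failover bay %s:%d" % (enclosure, bay)
    return ["ssh", ssh_target, remote_command]


def initiate_remote_failover(ssh_target: str, enclosure: str, bay: int) -> int:
    """Run the remote command and return the ssh exit code."""
    argv = build_remote_failover_argv(ssh_target, enclosure, bay)
    return subprocess.run(argv).returncode


# Placeholder account and host name; substitute values for your VCEM system.
example_argv = build_remote_failover_argv("admin@vcem-cms.example.com", "Databases", 8)
```

Validating the enclosure name and bay number before connecting reflects the point that the remote system must hold accurate location data for the affected server.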

Limitations of using a remote system to monitor systems:
Accurate data is crucial to a successful execution of VCEM failover from a remote system.
The system monitoring and data collection system needs to be able to:
Identify that a problem has occurred.
Have the ability to invoke the VCEM CLI remotely.
Have accurate data concerning the enclosure name and the appropriate bay number
of the affected server.
If the remote system has difficulties with the above tasks, HP recommends that Systems Insight
Manager be used for system monitoring as VCEM is designed to closely integrate with
Systems Insight Manager monitoring and data collection capabilities.

Summary
VCEM profile failover is a capable tool to help you maintain system uptime and reliability. It
has the ability to manually and automatically fail over VC profiles to designated spares.
Furthermore, it can be incorporated into customized scripts so that you can build even more
powerful solutions using VCEM as a trusted component. The key limitation to the failover
functionality is that the associated VC profile must be able to operate on different blades. HP
highly recommends that you test this portability prior to relying on VCEM profile failover. If
the OS image associated with the profile is portable, the VCEM profile failover proves to be
an excellent addition to your server management toolbox.
For more information, see the HP VCEM website: http://www.hp.com/GO/VCEM
To help us improve our documents, please provide feedback at www.hp.com/solutions/feedback

Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to
change without notice. The only warranties for HP products and services are set forth in the express warranty
statements accompanying such products and services. Nothing herein should be construed as constituting an
additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Trademark acknowledgments, if needed.
460924-003, February 2012
