You might have already read my earlier post, "7 social engineering tips that new system admins should know in a new team", where I discussed the social engineering points that help you sync quickly with a new team. This post continues the same topic: here I will discuss the technical aspects every system admin should know during the initial phase of a new job. These rules are the same for everyone, irrespective of the experience he/she gained in the old organization.
Before going to the actual topic, I would like to highlight one important point: when we work for an organization for a long time, we feel comfortable with the job, and most of the time we assume it is our technical skill that makes us comfortable. In fact, it is not technical skill alone; it is our historical knowledge of the environment, added to that skill, that makes us comfortable in the current job.
In simple terms, if you treat your technical skill as a 1, then every piece of information you know about the history of the environment adds a 0 (zero) next to that 1, and having more 0s next to the 1 improves your value in the job. When you join a new job, you carry only the 1 with you from the old job; the 0s you have to regain in the new one. So, during the initial stage of a new job, focus on understanding the historical information about the environment from the existing team, whenever you get a chance to discuss it.
1. Know your team scope
Team scope is very important to know immediately after you join a new job, because it gives you an idea of how to prioritize your learning for the new job.
For example, if you join a team in a large organization whose scope is to support a set of servers that run only databases and nothing else, then your priority immediately shifts to understanding how the database works on Unix and the basics of DB terminology. If, at the same time, your team does not support any DNS, NIS, or DHCP servers because those are under the control of a different team, you need not worry about those servers in your initial learning.
2. Know about Technical architecture of environment
e. What storage is in use right now, and what sort of console systems are we using to connect to the servers remotely? e.g. EMC, NetApp, Cyclades consoles, etc.
f. What storage management software is in use on which operating systems? e.g. LVM, VxVM, ZFS, etc.
3. Know the support operations and the rules around them
Ideally, any system administrator deals with three types of operations:
a. Break/fix activities (widely known as incidents)
This mainly involves fixing issues encountered in a properly working environment, e.g. a disk failure on a server, a Unix server that crashed due to overload, or a network outage caused by a bad network port.
b. Changes and service requests
Change operations mainly involve introducing a configuration, hardware, or application change into the currently running environment, either for improved stability or for improved security.
Service requests involve performing operations on specific user requests, such as creating user accounts, changing permissions, installing a new server (called server commissioning), or removing a server (called server decommissioning).
c. Auditing the server environment to identify the quality of service (QoS)
This mainly involves periodically checking all servers to identify any configuration or security vulnerabilities that compromise the stability of the server environment, and remediating such vulnerabilities by requesting configuration changes.
To perform the above three kinds of operations, every organization has internal rules that define how to act, when to act, and what to act on. These rules vary from job to job; during the initial stage of your job you should understand them and perform your duties accordingly.
Note: ITIL (Information Technology Infrastructure Library) provides guidelines for defining the above rules in a standard way in any IT organization. Nowadays, major companies are streamlining their procedures to meet these ITIL guidelines, so that the environment remains easy to manage even after the people who created it leave the organization. Learning ITIL is always beneficial to system admins (or any infrastructure support person).
4. Supporting tools/applications and your access to them
To perform the support operations discussed above, organizations need proper tools/applications that allow their employees and support staff to request and respond in an automated way, as per the procedures defined in the organization, e.g. the Remedy ticketing tool, HP Service Manager, etc.
Once you join a new team, make sure you have requested access to all the related tools in time and have tested that access.
5. Contact details of other support teams
As a system admin, a major part of your day job involves communication with other support teams: the database team, network team, application team, hardware vendors, data center support team, etc.
For successful service delivery, it is important for system administrators to have all of their contact details (phone, email, and internal chat IDs) handy. Gather this information and build a good document you can use in your job. Writing this information down and keeping it safe is very important, because minor issues often turn into major problems when we don't know whom to contact the moment we notice an issue.
6. Team documentation
Every team has some kind of documentation explaining the operations it performs, and this documentation gives you more information than any individual can share with you. Unfortunately, reading all these documents does not really help you understand what is actually going on during your initial stage in the team, but the same documents might save your life once you start working actively.
During the initial stage, just find out where the documentation is stored and get access to it. Then go through the entire documentation quickly (you don't need to remember everything you read), so that later you will know where to look when you need a specific piece of information about a specific issue.
7. Infrastructure servers
Ideally, system administrators classify their servers into two groups: first, the servers used by users (e.g. database servers and application servers), and second, the infrastructure servers used to manage the first set effectively (e.g. Jumpstart remote installation servers, and DHCP, DNS, NIS, and LDAP servers).
As I explained in point 1, you may or may not manage these infrastructure servers depending on the scope of your team, but you must know their details, because every other server in your environment depends on them.
Below are important questions you should try to answer during the initial stage of the job:
a. What name servers (DNS / NIS / LDAP) are we using, and what are the names, aliases, and IPs of those servers?
b. What remote installation (Jumpstart/Kickstart) servers are we using, and do we have access to them?
c. Is there a DHCP server in the environment, or is it managed by customized tools, e.g. QIP?
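On the host itself, question (a) can often be answered from the resolver configuration. Below is a minimal sketch; the sample file and its addresses are hypothetical, and on a real host you would read /etc/resolv.conf directly:

```shell
# List the configured DNS name servers by parsing a resolv.conf-style file.
# Sample data keeps the sketch self-contained; on a real host use /etc/resolv.conf.
cat > /tmp/resolv.sample <<'EOF'
domain example.com
nameserver 10.16.8.1
nameserver 10.16.8.2
EOF

# Print the IP of every configured name server.
awk '/^nameserver/ { print $2 }' /tmp/resolv.sample
```

On Solaris hosts, the same question for NIS and LDAP can usually be answered with ypwhich and ldapclient list, respectively.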
8. Access to basic facilities
Every Unix administrator starts his work by requesting access to a Windows product (desktop access / Outlook). Request access to your desktop PC login, VoIP phone (with international dialing if your job requires calling overseas), email account, internal chat messenger, data center access (if your job requires physical access to the DC), and smart cards / security tokens, etc.
The moment you get email access, you may have to manage the flood of messages coming to your team every day. You might have to create appropriate Outlook rules to filter out emails you don't have to respond to during the first one or two months of the new job. Later, you can slowly start reading and responding to them once you are actually ready to work on the floor.
9. Automation scripts
A system administrator cannot survive his job if he doesn't know how to automate repetitive work (using scripting). Whenever you join a new team, specifically ask for information about any automated scripts that are in place and used to perform the day-to-day job.
Most of the time, system admins write scripts to perform daily/weekly system health checks, and these may run regularly from specific servers via the cron scheduler. It is better to know about them beforehand; this will help if you later want to introduce your own scripts for the team's benefit.
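As an illustration of the kind of health check such cron jobs run, here is a minimal sketch; the 90% threshold is an arbitrary choice, and a real team script would mail its findings rather than just print them:

```shell
# Minimal disk-space health check: warn when any filesystem crosses
# the usage threshold. Schedule from cron, e.g. once daily.
THRESHOLD=90

df -P | awk -v limit="$THRESHOLD" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                     # strip the trailing % sign
    if (use + 0 >= limit)
        printf "WARNING: %s is %s%% full (mounted on %s)\n", $1, use, $6
}'
```

Using df -P (POSIX output format) keeps the column positions stable across Solaris and Linux, which is why the awk fields above are reliable.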
10. Monitoring alerts
As I explained in point 8, you will receive tons of mail the moment your email ID is added to the team DL (email distribution list), and a major part of it may come from automated monitoring systems that check the health of your server environment and inform the system admin team the moment they notice an issue. If you start receiving such mails, don't just ignore them because you don't know what to do with them. Instead, note these alerts and keep raising questions with your team to learn how to respond to them.
Also keep automatic reminders in your Outlook for the important alerts that are critical and urgent in nature, so that you won't miss them.
Network connectivity checks for a server without an OS (just racked hardware, powered up)
Example :
ok> watch-net-all
/pci@7c0/pci@0/network@4,1
1000 Mbps full duplex
Link up
. is a Good Packet.
X is a Bad Packet.
# ethtool eth0
Supported link modes:
10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes:
10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: d
Current message level: 0x000000ff (255)
Link detected: yes
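When you have to run this check across many servers, the interesting fields can be pulled out of saved ethtool output programmatically. A small sketch follows; the sample file below stands in for real `ethtool eth0` output (real output indents the fields, which the unanchored patterns still match):

```shell
# Pull link state, speed and duplex out of saved ethtool output.
cat > /tmp/ethtool.sample <<'EOF'
Speed: 100Mb/s
Duplex: Full
Link detected: yes
EOF

awk -F': ' '
    /Speed:/         { speed  = $2 }
    /Duplex:/        { duplex = $2 }
    /Link detected:/ { link   = $2 }
    END { printf "link=%s speed=%s duplex=%s\n", link, speed, duplex }
' /tmp/ethtool.sample
```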
Ping the system from another host on the same network in one session, and watch the IP traffic on the x64 system with tcpdump in another session. A successful ping has a request and a reply. The host option is used to filter network data for the specific host; otherwise tcpdump will flood the console.
# tcpdump -i eth0 host 10.16.8.21
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
16:34:53.133085 IP 10.16.8.21 > 10.16.8.63: ICMP echo request, id 13641, seq
0, length 64
16:34:53.133250 IP 10.16.8.63 > 10.16.8.21: ICMP echo reply, id 13641, seq 0,
length 64
Alternatively, on Solaris you can capture the traffic on the interface to a file in the current directory with snoop (e.g. # snoop -o snoop.out -d e1000g0). You have to terminate the command manually with ^C, otherwise it keeps appending to the file. Later you can view the capture with the command below:
# /usr/sbin/snoop -i snoop.out -D | grep -v "drops: 0"
The -D option displays the number of packet drops. With the above command you can easily figure out whether packets on the interface are being dropped; if all of them are, there is likely a cabling/patching issue.
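The grep -v trick above (hide the zero counters so that only real drops stand out) can be sketched against saved capture summary lines; the sample lines below are illustrative, not real snoop output:

```shell
# Show only capture summary lines that report a non-zero drop count.
cat > /tmp/capture.sample <<'EOF'
interface e1000g0 received 120 packets, drops: 0
interface e1000g0 received 98 packets, drops: 12
interface e1000g0 received 110 packets, drops: 0
EOF

# grep -v excludes every "drops: 0" line, leaving only real drops.
grep -v 'drops: 0$' /tmp/capture.sample
```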
Once physical connectivity has been established, the interface can be observed for errors. The Solaris Operating System provides the kstat interface for this type of monitoring. For example, to watch the statistics of an hme interface, instance 0, every 5 seconds, one would use:
# kstat -m hme -i 0 5
and monitor statistics such as collisions and alignment errors. If these error counters are increasing on a switched network, further investigation is warranted. The most likely cause of such issues is a bad cable or incorrect switch settings: replace the cable, ensure the switch is set to auto-negotiate, and re-test.
Cable tester
A cable tester is one of the quickest and easiest ways to check a cable and its connections through patch boards to the target switch. If one is available, connect the tester to each end of the cable and verify that the cable has connectivity through all 8 pins. Normally the data center operations team will have these devices.
If you don't have one, you should take the traditional approach and verify from the server end.
http://www.gurkulindia.com/main/2014/10/ten-technical-tips-that-every-system-admin-should-know-when-joining-into-a-new-team/
http://www.gurkulindia.com/main/2012/05/network-physical-connectivity-check-for-solaris-and-linux/
This article describes how to configure link-based IPMP interfaces in Solaris 10. IPMP eliminates single-network-card failure and ensures the system is always accessible via the network. You can configure the failure detection time in the /etc/default/mpathd file; the default value is 10 seconds. This file also has an option called FAILBACK to specify the IP behavior when the primary interface recovers from a fault. in.mpathd is the daemon that handles IPMP (Internet Protocol Multi-Pathing) operations. There are two types of IPMP configuration available in Solaris 10: link-based and probe-based.
Scenario: configure IP address 192.168.2.50 on e1000g1 and e1000g2 using link-based IPMP.
Step 1:
Find out the installed NICs on the system and their status. Verify the ifconfig output as well, and make sure the NICs are up and not in use.
Arena-Node1#dladm show-dev
e1000g0    link: up    duplex: full
e1000g1    link: up    duplex: full
e1000g2    link: up    duplex: full
Arena-Node1#ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843 mtu 1500 index 2
inet 192.168.2.5 netmask ffffff00 broadcast 192.168.2.255
ether 0:c:29:ec:b3:af
Arena-Node1#
Step 2:
Add the IP address in /etc/hosts and specify the netmask value in /etc/netmasks as shown below.
Arena-Node1#cat /etc/hosts |grep 192.168.2.50
192.168.2.50    arenagroupIP
Arena-Node1#cat /etc/netmasks |grep 192.168.2.0
192.168.2.0     255.255.255.0
Arena-Node1#eeprom "local-mac-address?=true"
Step 3:
Plumb the interfaces you are going to use for the new IP address and check their status in the ifconfig output.
Arena-Node1#ifconfig e1000g1 plumb
Arena-Node1#ifconfig e1000g2 plumb
Arena-Node1#ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843 mtu 1500 index 2
inet 192.168.2.5 netmask ffffff00 broadcast 192.168.2.255
ether 0:c:29:ec:b3:af
e1000g1: flags=1000842 mtu 1500 index 3
inet 0.0.0.0 netmask 0
ether 0:c:29:ec:b3:b9
e1000g2: flags=1000842 mtu 1500 index 4
inet 0.0.0.0 netmask 0
ether 0:c:29:ec:b3:c3
Step 4:
Configure the IP on the primary interface and add both interfaces to the IPMP group with your own group name, for example:
Arena-Node1#ifconfig e1000g1 192.168.2.50 netmask 255.255.255.0 broadcast + group arenagroup-1 up
Arena-Node1#ifconfig e1000g2 group arenagroup-1 up
Step 5:
Now we have to ensure IPMP is working fine. This can be done in two ways.
Test 1: Remove the primary LAN cable and check. Here I have removed the LAN cable from e1000g1; let's see what happens.
Arena-Node1#ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843 mtu 1500 index 2
inet 192.168.2.5 netmask ffffff00 broadcast 192.168.2.255
ether 0:c:29:ec:b3:af
e1000g1: flags=19000802 mtu 0 index 3
inet 0.0.0.0 netmask 0
groupname arenagroup-1
ether 0:c:29:ec:b3:b9
e1000g2: flags=1000842 mtu 1500 index 4
inet 0.0.0.0 netmask 0
groupname arenagroup-1
ether 0:c:29:ec:b3:c3
e1000g2:1: flags=1000843 mtu 1500 index 4
inet 192.168.2.50 netmask ffffff00 broadcast 192.168.2.255
Now reconnect the cable to e1000g1 and check the link status again.
Arena-Node1#dladm show-dev
e1000g0    link: up    duplex: full
e1000g1    link: up    duplex: full
e1000g2    link: up    duplex: full
Arena-Node1#ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843 mtu 1500 index 2
inet 192.168.2.5 netmask ffffff00 broadcast 192.168.2.255
ether 0:c:29:ec:b3:af
e1000g1: flags=1000843 mtu 1500 index 3
inet 192.168.2.50 netmask ffffff00 broadcast 192.168.2.255
groupname arenagroup-1
ether 0:c:29:ec:b3:b9
e1000g2: flags=1000842 mtu 1500 index 4
inet 0.0.0.0 netmask 0
groupname arenagroup-1
ether 0:c:29:ec:b3:c3
Here the configured IP moves back to the original interface where it was running before. I had specified FAILBACK=yes; that is why the IP moves back to the original interface. In the same way, you can also specify the failure detection time for mpathd using the FAILURE_DETECTION_TIME parameter (in ms).
Arena-Node1#cat /etc/default/mpathd |grep -v "#"
FAILURE_DETECTION_TIME=10000
FAILBACK=yes
TRACK_INTERFACES_ONLY_WITH_GROUPS=yes
Arena-Node1#
Test 2: Normally, most Unix admins sit at a remote site, so you will not be able to perform the above test. In this case, you can use the if_mpadm command to disable the interface at the OS level. First I am going to disable e1000g1; let's see what happens.
Arena-Node1#if_mpadm -d e1000g1
Arena-Node1#ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843 mtu 1500 index 2
inet 192.168.2.5 netmask ffffff00 broadcast 192.168.2.255
ether 0:c:29:ec:b3:af
e1000g1: flags=89000842 mtu 0 index 3
inet 0.0.0.0 netmask 0
groupname arenagroup-1
ether 0:c:29:ec:b3:b9
e1000g2: flags=1000842 mtu 1500 index 4
inet 0.0.0.0 netmask 0
groupname arenagroup-1
ether 0:c:29:ec:b3:c3
e1000g2:1: flags=1000843 mtu 1500 index 4
inet 192.168.2.50 netmask ffffff00 broadcast 192.168.2.255
In the same way, you can manually fail the IP back from one interface to the other. In both tests we can clearly see the IP moving from e1000g1 to e1000g2 automatically, without any issues. So we have successfully configured link-based IPMP on Solaris.
These failover events are logged in /var/adm/messages, as shown below.
Jun 26 20:57:24 node1 in.mpathd[3800]: [ID 215189 daemon.error] The link has gone down on e1000g1
Jun 26 20:57:24 node1 in.mpathd[3800]: [ID 594170 daemon.error] NIC failure detected on e1000g1 of group arenagroup-1
Jun 26 20:57:24 node1 in.mpathd[3800]: [ID 832587 daemon.error] Successfully failed over from NIC e1000g1 to NIC e1000g2
Jun 26 20:57:57 node1 in.mpathd[3800]: [ID 820239 daemon.error] The link has come up on e1000g1
Jun 26 20:57:57 node1 in.mpathd[3800]: [ID 299542 daemon.error] NIC repair detected on e1000g1 of group arenagroup-1
Jun 26 20:57:57 node1 in.mpathd[3800]: [ID 620804 daemon.error] Successfully failed back to NIC e1000g1
Jun 26 21:03:59 node1 in.mpathd[3800]: [ID 832587 daemon.error] Successfully failed over from NIC e1000g1 to NIC e1000g2
Jun 26 21:04:07 node1 in.mpathd[3800]: [ID 620804 daemon.error] Successfully failed back to NIC e1000g1
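When troubleshooting, it is handy to summarize these events instead of reading them one by one. Here is a sketch run against a saved copy of the messages file; the sample heredoc reproduces log lines of the kind shown above:

```shell
# Count IPMP failover and failback events in a saved messages file.
cat > /tmp/messages.sample <<'EOF'
Jun 26 20:57:24 node1 in.mpathd[3800]: [ID 832587 daemon.error] Successfully failed over from NIC e1000g1 to NIC e1000g2
Jun 26 20:57:57 node1 in.mpathd[3800]: [ID 620804 daemon.error] Successfully failed back to NIC e1000g1
Jun 26 21:03:59 node1 in.mpathd[3800]: [ID 832587 daemon.error] Successfully failed over from NIC e1000g1 to NIC e1000g2
EOF

# Keep only in.mpathd lines, then tally the two event types.
grep 'in.mpathd' /tmp/messages.sample |
awk '/failed over/ { over++ } /failed back/ { back++ }
     END { printf "failovers=%d failbacks=%d\n", over, back }'
```

On a real system, point the grep at /var/adm/messages instead of the sample file. A failover count that keeps climbing without matching failbacks usually means a flapping link worth investigating.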
To make the above configuration persistent across reboots, create configuration files for both network interfaces.
Arena-Node1#cat /etc/hostname.e1000g1
arenagroupIP netmask + broadcast + group arenagroup-1 up
Arena-Node1#cat /etc/hostname.e1000g2
group arenagroup-1 up
Thank you for reading this article. Please leave a comment if it is useful for you.
http://www.unixarena.com/2013/06/how-to-configure-solaris-10-ipmp.html
The failure detection and repair method used by the in.mpathd daemon differentiates IPMP as probe-based or link-based. In the case of link-based IPMP:
- The in.mpathd daemon uses the interface's kernel driver to check the status of the interface.
- in.mpathd observes changes to the IFF_RUNNING flag on the interface to determine failure.
- No test addresses are required for failure detection.
- It is enabled by default (if supported by the interface).
One of the advantages of link-based IPMP is that it does not depend on external sources sending ICMP replies to determine link status; it also saves IP addresses, as it does not require any test addresses for failure detection.
mpathd Configuration file
# cat /etc/default/mpathd
#
#pragma ident   "@(#)mpathd.dfl 1.2     00/07/17 SMI"
#
# Time taken by mpathd to detect a NIC failure in ms. The minimum time
# that can be specified is 100 ms.
#
FAILURE_DETECTION_TIME=10000
#
# Failback is enabled by default. To disable failback turn off this option
#
FAILBACK=yes
#
# By default only interfaces configured as part of multipathing groups
# are tracked. Turn off this option to track all network interfaces
# on the system
#
TRACK_INTERFACES_ONLY_WITH_GROUPS=yes
Meanings of FLAGs
You will see flags such as NOFAILOVER, DEPRECATED, STANDBY, etc. in the output of the ifconfig -a command. The meanings of these flags, and the parameters that enable them, are:
deprecated -> the address can only be used as a test address for IPMP, not for actual data transfer by applications.
-failover -> the address does not fail over when the interface fails.
standby -> marks the interface to be used as a standby.
Testing IPMP failover
We can check the failure and repair of an interface very easily using if_mpadm
command. -d detaches the interface whereas -r reattaches it.
# if_mpadm -d ce0
# if_mpadm -r ce0
1. Single interface
This configuration does not give increased availability; it can only be used to get notified when an interface fails.
Command line :
/etc/hostname.e1000g0
192.168.1.2 netmask + broadcast + group IPMPgroup up
Before failure :
# ifconfig -a
lo0: flags=2001000849[UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL] mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843[UP,BROADCAST,RUNNING,MULTICAST,IPv4] mtu 1500 index 13
inet 192.168.1.2 netmask ffffff00 broadcast 192.168.1.255
groupname IPMPgroup
ether 0:c:29:f6:ef:67
After failure :
# ifconfig -a
lo0: flags=2001000849[UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL] mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=11000803[UP,BROADCAST,MULTICAST,IPv4,FAILED] mtu 1500 index 13
inet 192.168.1.2 netmask ffffff00 broadcast 192.168.1.255
groupname IPMPgroup
ether 0:c:29:f6:ef:67
2. Active-Active
/etc/hostname.e1000g0
192.168.1.2 netmask + broadcast + group IPMPgroup up
/etc/hostname.e1000g1
group IPMPgroup up
Before Failure :
# ifconfig -a
lo0: flags=2001000849[UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL] mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843[UP,BROADCAST,RUNNING,MULTICAST,IPv4] mtu 1500 index 14
inet 192.168.1.2 netmask ffffff00 broadcast 192.168.1.255
groupname IPMPgroup
ether 0:c:29:f6:ef:67
After Failure
# ifconfig -a
lo0: flags=2001000849[UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL] mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=19000802[BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED] mtu 0 index 14
inet 0.0.0.0 netmask 0
groupname IPMPgroup
ether 0:c:29:f6:ef:67
e1000g1: flags=1000843[UP,BROADCAST,RUNNING,MULTICAST,IPv4] mtu 1500 index 15
inet 0.0.0.0 netmask ff000000
groupname IPMPgroup
ether 0:c:29:f6:ef:71
e1000g1:1: flags=1000843[UP,BROADCAST,RUNNING,MULTICAST,IPv4] mtu 1500 index 15
inet 192.168.1.2 netmask ffffff00 broadcast 192.168.1.255
3. Active-Standby
/etc/hostname.e1000g0
192.168.1.2 netmask + broadcast + group IPMPgroup up
/etc/hostname.e1000g1
group IPMPgroup standby up
Before failure
# ifconfig -a
lo0: flags=2001000849[UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL] mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843[UP,BROADCAST,RUNNING,MULTICAST,IPv4] mtu 1500 index 20
inet 192.168.1.2 netmask ffffff00 broadcast 192.168.1.255
groupname IPMPgroup
ether 0:c:29:f6:ef:67
e1000g0:1: flags=1000842[BROADCAST,RUNNING,MULTICAST,IPv4] mtu 1500 index 20
inet 0.0.0.0 netmask 0
e1000g1: flags=69000842[BROADCAST,RUNNING,MULTICAST,IPv4,NOFAILOVER,STANDBY,INACTIVE] mtu 0 index 21
inet 0.0.0.0 netmask 0
groupname IPMPgroup
ether 0:c:29:f6:ef:71
After failure
# ifconfig -a
lo0: flags=2001000849[UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL] mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=19000802[BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED] mtu 0 index 20
inet 0.0.0.0 netmask 0
groupname IPMPgroup
ether 0:c:29:f6:ef:67
e1000g1: flags=21000842[BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY] mtu 1500 index 21
inet 0.0.0.0 netmask 0
groupname IPMPgroup
ether 0:c:29:f6:ef:71
e1000g1:1: flags=21000843[UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY] mtu 1500 index 21
inet 192.168.1.2 netmask ffffff00 broadcast 192.168.1.255
The failure detection method used by the in.mpathd daemon differentiates IPMP as probe-based or link-based. Probe-based IPMP uses two types of addresses in its configuration:
1. Test address - used by the in.mpathd daemon for detecting failure (also called the probe address).
2. Data address - used by applications for actual data transfer.
In the case of probe-based IPMP:
- The in.mpathd daemon sends out ICMP probe messages on the test address to one or more target systems on the same subnet.
- in.mpathd determines the target systems to probe dynamically. It uses the all-hosts multicast address (224.0.0.1) to determine the target systems to probe.
- Examples of target systems: all default routes on the same subnet, and all host routes on the same subnet (configured with the route -p add command, e.g. # route -p add -host 192.168.1.1 192.168.1.1 -static).
- All test addresses should be in the same subnet.
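The same-subnet rule can be verified mechanically. Below is a small sketch; the addresses and the /24 mask are taken from the examples in this article, and the first-three-octets shortcut only holds for a 255.255.255.0 netmask:

```shell
# Check that all IPMP test addresses fall in the same /24 network.
# For a 255.255.255.0 mask the network is simply the first three octets.
for ip in 192.168.1.3 192.168.1.4; do
    echo "$ip" | awk -F. '{ printf "%s.%s.%s.0\n", $1, $2, $3 }'
done | sort -u | awk 'END {
    if (NR == 1) print "OK: all test addresses share one subnet"
    else         print "WARNING: test addresses span " NR " subnets"
}'
```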
Most commonly used Probe-Based IPMP configurations
1. Active-Active
Groupname:            ipmp0
Active interface(s):  e1000g0, e1000g1
Standby interface(s): (none)
Data IP address(es):  192.168.1.2
Test IP address(es):  192.168.1.3, 192.168.1.4
Command line :
/etc/hostname.e1000g0:
192.168.1.2 netmask + broadcast + group ipmp0 up \
addif 192.168.1.3 netmask + broadcast + deprecated -failover up
/etc/hostname.e1000g1:
192.168.1.4 netmask + broadcast + deprecated -failover group ipmp0 up
Before failure :
# ifconfig -a
lo0: flags=2001000849[UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL] mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843[UP,BROADCAST,RUNNING,MULTICAST,IPv4] mtu 1500 index 9
inet 192.168.1.2 netmask ffffff00 broadcast 192.168.1.255
groupname ipmp0
ether 0:c:29:f6:ef:67
e1000g0:1: flags=9040843[UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER] mtu 1500 index 9
inet 192.168.1.3 netmask ffffff00 broadcast 192.168.1.255
e1000g1: flags=9040843[UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER] mtu 1500 index 10
inet 192.168.1.4 netmask ffffff00 broadcast 192.168.1.255
groupname ipmp0
ether 0:c:29:f6:ef:71
After failure :
# ifconfig -a
lo0: flags=2001000849[UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL] mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=19000802[BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED] mtu 0 index 9
inet 0.0.0.0 netmask 0
groupname ipmp0
ether 0:c:29:f6:ef:67
e1000g0:1: flags=19040803[UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED] mtu 1500 index 9
inet 192.168.1.3 netmask ffffff00 broadcast 192.168.1.255
2. Active-Standby
The only difference in an active-standby configuration is that the interface configured as standby is not used to send any outbound traffic, which disables the load-balancing feature of an active-active configuration.
Groupname:            ipmp0
Active interface(s):  e1000g0
Standby interface(s): e1000g1
Data IP address(es):  192.168.1.2
Test IP address(es):  192.168.1.3, 192.168.1.4
Command line :
/etc/hostname.e1000g0:
192.168.1.2 netmask + broadcast + group ipmp0 up \
addif 192.168.1.3 netmask + broadcast + deprecated -failover up
/etc/hostname.e1000g1:
192.168.1.4 netmask + broadcast + deprecated -failover group ipmp0 standby up
Before failure :
# ifconfig -a
lo0: flags=2001000849[UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL] mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=1000843[UP,BROADCAST,RUNNING,MULTICAST,IPv4] mtu 1500 index 11
inet 192.168.1.2 netmask ffffff00 broadcast 192.168.1.255
groupname ipmp0
ether 0:c:29:f6:ef:67
e1000g0:1: flags=9040843[UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER] mtu 1500 index 11
inet 192.168.1.3 netmask ffffff00 broadcast 192.168.1.255
e1000g1: flags=69040843[UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY,INACTIVE] mtu 1500 index 12
inet 192.168.1.4 netmask ffffff00 broadcast 192.168.1.255
groupname ipmp0
ether 0:c:29:f6:ef:71
After failure :
# ifconfig -a
lo0: flags=2001000849[UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL] mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
e1000g0: flags=19000802[BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED] mtu 0 index 11
inet 0.0.0.0 netmask 0
groupname ipmp0
ether 0:c:29:f6:ef:67
e1000g0:1: flags=19040803[UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED] mtu 1500 index 11
inet 192.168.1.3 netmask ffffff00 broadcast 192.168.1.255
e1000g1: flags=29040843[UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,STANDBY] mtu 1500 index 12
inet 192.168.1.4 netmask ffffff00 broadcast 192.168.1.255
groupname ipmp0
ether 0:c:29:f6:ef:71
e1000g1:1: flags=21000843[UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY] mtu 1500 index 12
inet 192.168.1.2 netmask ffffff00 broadcast 192.168.1.255
The in.mpathd daemon is responsible for detecting and repairing IPMP failures. Check whether the process is running on the system:
# ps -ef | grep in.mpathd
    2222   0 20:41:10 ?     0:06 /usr/lib/inet/in.mpathd
If it is not running, simply run the command below to start it:
# /usr/lib/inet/in.mpathd
The first and foremost thing to do is to check the /var/adm/messages file and look for mpathd-related errors. You may find different errors (as well as messages) related to IPMP; the errors in the messages file can easily tell you the problem in the IPMP configuration.
The ifconfig -a command output displays the various flags related to IPMP and interface configuration. Typical symptoms:
1. Interfaces configured for IPMP are missing the "UP" and/or "RUNNING" flag in the ifconfig -a output.
2. Interfaces configured for IPMP show as "FAILED" in the ifconfig -a output.
If an interface is not showing the RUNNING flag, check the output of commands such as dladm show-dev, kstat, or ndd to ensure that you have a working link between the server and the switch port. Ensure that the switch port is set to auto-negotiate, and disconnect and reconnect the Ethernet cable on the server side to renegotiate the link speed with the switch port.
If an interface is not showing the UP flag, bring it up with ifconfig <interface> up.
Probe-based IPMP will use any on-link routers as targets to send ICMP probes to and listen for responses. We can monitor the snoop command output to ensure that the on-link router is responding to the pings. The in.mpathd daemon uses test addresses to exchange ICMP probes, also called probe traffic, with other targets on the IP link. Probe traffic helps determine the status of the interface and its NIC, including whether the interface has failed; the probes verify that the send and receive paths to the interface are working correctly.
In the first window :
Here 192.168.1.1 is the default router. You can check the default router in the netstat
-nrv output.
Now in the first window you should be able to see the traffic :
Here the first line is the outgoing ICMP request (the ping) and the second line is
the ICMP reply.
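The two-window check described above can be sketched as follows, with ce0 as a placeholder test interface and 192.168.1.1 as the on-link router from the example:

```shell
# Window 1: watch ICMP between the test interface and the router
snoop -d ce0 icmp and host 192.168.1.1

# Window 2: generate traffic toward the router
ping 192.168.1.1
```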
If you are using probe-based IPMP (an interface marked with -failover), then use
pkill to make in.mpathd write a debug snapshot, and check for "probes lost"
messages in /var/adm/messages:
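The snapshot-and-grep step can be sketched end to end. The log lines below are a made-up sample standing in for /var/adm/messages (hostname and PID are illustrative); only the filtering step is reproducible outside Solaris:

```shell
# On a live system the snapshot is triggered with:  pkill -USR1 mpathd
# in.mpathd then logs its probe statistics to /var/adm/messages.
# Here we filter a made-up sample the same way we would filter the real file.
cat > /tmp/messages.sample <<'EOF'
Mar  5 15:06:23 host27 in.mpathd[6338]: Number of probes sent 419987
Mar  5 15:06:23 host27 in.mpathd[6338]: Number of probes/acks lost 296324
EOF
grep -i 'lost' /tmp/messages.sample
```

A non-zero "lost" count here is what points to a failing probe path.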
If the netstat -gn outputs show interfaces that cannot respond to ALL-SYSTEMS
multicast (224.0.0.1), then add the host route using the route -p command.
Is VCS Multi-NIC In use with IPMP?
VCS uses a resource type called MultiNIC to configure IPMP using the Solaris
mpathd daemon. Check whether VCS is in use by looking for VCS-related errors in the
/var/adm/messages file:
# ps -ef|grep -i multi
# grep -i LLT /var/adm/messages
# grep -i GAB /var/adm/messages
If you are using VCS, check the main.cf file for the configuration details, and use the
hastatus command to check whether the MultiNIC resource is configured properly and
running fine.
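A quick way to do both checks (MultiNICA is the standard bundled resource type name; adjust the pattern to your configuration):

```shell
# grep -i MultiNIC /etc/VRTSvcs/conf/config/main.cf
# hastatus -sum
```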
Contact support with data
The last option, if everything else fails, is to contact Oracle support. Provide the below
data to Oracle support for troubleshooting.
1. snoop
2. Explorer
Sun Explorer output :
# explorer
3. dladm
Symptoms:
* mpathd error messages in /var/adm/messages:
Test address address is not unique; disabling probe based failure detection on
<interface_name>
* interfaces configured for IPMP missing an UP and/or RUNNING flag in the ifconfig -a output
* interfaces configured for IPMP showing as FAILED in ifconfig -a output
STEP 1: Check and validate the IPMP configuration.
The ifconfig -a output for the interfaces in the IPMP group MUST indicate UP *AND*
RUNNING.
If UP is missing from the output:
# ndd -get /dev/<interface> adv_autoneg_cap
# kstat -p | grep e1000g:0 | grep auto
# dladm show-dev
The proper setting for adv_autoneg_cap is 1, meaning that the Sun interface is advertising its
autonegotiation capability to the link partner (switch).
If adv_autoneg_cap is set to 0, correct with ndd for an immediate change:
Note: ce and hme devices require the instance to be set before any commands. Other devices
identify the instance in the /dev/ argument e.g. to retrieve information on the first instance of
bge: ndd -get /dev/bge0 adv_autoneg_cap.
# ndd -set /dev/ce instance (device instance)
# ndd -set /dev/ce adv_autoneg_cap 1
to check:
# ndd -set /dev/ce instance (device instance)
# ndd -get /dev/ce adv_autoneg_cap
1
If the setting shows 1 after running the ndd command, but the link is not restored:
- ensure the switchport is set to autonegotiate.
- disconnect and reconnect the cable from the interface to the switch to allow the link partners to re-negotiate.
Use OBP watch-net-all to test Sun interfaces on SPARC hardware:
If you need further assistance to verify your network or switch connections, please consult your
local network administrator.
# pkill -USR1 mpathd
Mar 5 15:06:23 solarishost27 in.mpathd[6338]: [ID 942985 daemon.error] Missed sending total of 0 probes spread over 0 occurrences
Mar 5 15:06:23 solarishost27 in.mpathd[6338]: Probe stats on (inet aggr1)
Mar 5 15:06:23 solarishost27 in.mpathd[6338]: Number of probes sent 419987
Mar 5 15:06:23 solarishost27 in.mpathd[6338]: Number of probe acks received 419987
Mar 5 15:06:23 solarishost27 in.mpathd[6338]: Probe stats on (inet aggr6)
Mar 5 15:06:23 solarishost27 in.mpathd[6338]: Number of probes sent 419923
Mar 5 15:06:23 solarishost27 in.mpathd[6338]: Number of valid probes/acks received 373034
Mar 5 15:06:23 solarishost27 in.mpathd[6338]: Number of probe acks received 123490
Mar 5 15:06:23 solarishost27 in.mpathd[6338]: Number of probes/acks lost 296324  <<- unacknowledged probes
If the stats show lost/unacknowledged probes, proceed with step #6.
solarishost# netstat -g | grep ALL-SYSTEMS.MCAST.NET
lo0    ALL-SYSTEMS.MCAST.NET    1
hme0   ALL-SYSTEMS.MCAST.NET    1
solarishost# netstat -gn | grep 224.0.0.1
lo0    224.0.0.1    1
hme0   224.0.0.1    1
If the netstat -gn outputs show interfaces that cannot respond to ALL-SYSTEMS multicast, the
configuration must be corrected by adding the host route with route -p, as noted earlier.
Check whether VCS is in use:
# ps -ef | grep -i multi
# grep -i LLT /var/adm/messages
STEP 6: Gather the troubleshooting and configuration data specified below and contact Sun Support.
At this point, if you have validated that each troubleshooting step above is true for your
environment, and the issue still exists, further troubleshooting is required:
I. collect snoop output for each network interface in the IPMP group.
note: explorer should be run with the -w localzones option to collect information on any
configured local zones.
II. collect the following outputs to a file using these commands:
# dladm show-dev > show-dev.out
# dladm show-link > show-link.out
The following command outputs will be collected for machines up to Solaris 10 update 4:
1.dladm_show-link.out
2.dladm_show-dev.out
3.dladm_show-aggr_-L.out
And the following command outputs will be collected for machines from Solaris 10 update 4 onwards:
1.dladm_show-link.out
2.dladm_show-dev.out
3.dladm_show-aggr_-L.out
4.dladm_show-linkprop.out
http://www.gurkulindia.com/main/2011/06/solaris-ipmp-diagnosis-andtroubleshooting/
BLOG FOR UNIX ADMIN, VCS FUNDAMENTALS, VERITAS CLUSTER SERVICES
The purpose of this post is to make the cluster concept easy for those youngsters
who have just started their career as System Administrators. While writing this post
I had only one thing in mind, i.e. explain the entire cluster concept with minimum
usage of technical jargon and make it as simple as possible. That's all about the
introduction; let us go to the actual lesson.
In any organisation, every server in the network will have a specific purpose in
terms of its usage, and most of the time these servers are used to provide a
stable environment to run the software applications that are required for the
organisation's business. Usually, these applications are very critical for the
business, and organisations cannot afford to have them down even for minutes.
For Example: A bank having an application which takes care of its internet
banking.
From the below figure you can see an application running on a standalone
server which is configured with a Unix Operating System and a database (oracle /
sybase / db2 / mssql etc). The organisation chose to run it as a
standalone application because it was not critical in terms of business;
in other words, whenever the application is down it won't impact the
actual business.
Usually, the application clients for these applications will connect to the
application server using the server name, server IP or a specific application IP.
Let us assume the organisation has an application which is very critical for its business,
and any impact to the application will cause huge loss to the organisation. In that case, the
organisation has one option to reduce the impact of application failure due to
Operating System or Hardware failure: purchasing a secondary server with the same hardware
configuration, installing the same kind of OS & Database, and configuring it with the same
application in passive mode, then failing the application over from the primary server to this
secondary server whenever there is an issue with the underlying hardware/operating system of the primary server.
What is failover?
Whenever there is an issue related to the primary server which makes the application unavailable to
the client machines, the application should be moved to another available server in the network,
either by manual or automatic intervention. Transferring the application from the primary server to the
secondary server, and making the secondary server active for the application, is called a failover
operation. The reverse operation (i.e. restoring the application on the primary server) is called
failback.
Now we can call this configuration an application HA (Highly Available) setup, compared to the
earlier standalone setup. Do you agree with me?
Now the question is, how does this manual failover work when there is an application issue due to
Hardware/Operating System failure?
Manual failover basically involves the below steps:
1. Stop the application (and its dependent resources) on the primary server.
2. Move the required resources, such as storage and the application IP, to the secondary server.
3. Start the application on the secondary server.
Manual failover has its drawbacks:
1. Requires manual intervention, so recovery waits for an administrator.
2. Time consuming.
3. Technically complex when it involves more dependent components for the application.
Oracle RAC - an application-level cluster for the Oracle database that works on different
Operating Systems.
Veritas Cluster Services - third-party cluster software that works on different Operating
Systems like Solaris / Linux / AIX / HP-UX.
And In this post, we are actually discussing about VCS and its Operations. This post is not
going to cover the actual implementation part or any command syntax of VCS, but will cover the
concept how VCS makes application Highly Available(HA).
Note: So far, I managed to explain the concept without using much complex terminology, but
now it's time to introduce some new VCS terminology, which we use in everyday
operations of VCS. Just keep a little more focus on each new term.
VCS Components
VCS has two types of components: 1. Physical Components 2. Logical Components
Physical Components:
1. Nodes
VCS nodes host the service groups (managed applications). Each system is connected to
networking hardware, and usually also to storage hardware. The systems contain components
to provide resilient management of the applications, and start and stop agents.
Nodes can be individual systems, or they can be created with domains or partitions on
enterprise-class systems. Individual cluster nodes each run their own operating system and
possess their own boot device. Each node must run the same operating system within a single
VCS cluster.
Clusters can have from 1 to 32 nodes. Applications can be configured to run on specific nodes
within the cluster.
2. Shared storage
Storage is a key resource of most applications services, and therefore most service groups. A
managed application can only be started on a system that has access to its associated data
files. Therefore, a service group can only run on all systems in the cluster if the storage is
shared across all systems. In many configurations, a storage area network (SAN) provides this
requirement.
You can use I/O fencing technology for data protection. I/O fencing blocks access to shared
storage from any system that is not a current and verified member of the cluster.
3. Networking Components
Networking in the cluster is used for the following purposes:
Communications between the cluster nodes and the Application Clients and external
systems.
Logical Components
1. Resources
Resources are hardware or software entities that make up the application. Resources include
disk groups and file systems, network interface cards (NIC), IP addresses, and applications.
1.1. Resource dependencies
Resource dependencies indicate resources that depend on each other because of application or
operating system requirements. Resource dependencies are graphically depicted in a hierarchy,
also called a tree, where the resources higher up (parent) depend on the resources lower down
(child).
A failover service group runs on one system in the cluster at a time. Failover groups are used for
most applications that do not support multiple systems to simultaneously access the
applications data.
A parallel service group runs simultaneously on more than one system in the cluster. A parallel
service group is more complex than a failover group. Parallel service groups are appropriate for
applications that manage multiple application instances running simultaneously without data
corruption.
A hybrid service group is for replicated data clusters and is a combination of the failover and
parallel service groups. It behaves as a failover group within a system zone and a parallel group
across system zones.
3. VCS Agents
Agents are multi-threaded processes that provide the logic to manage resources. VCS has one
agent per resource type. The agent monitors all resources of that type; for example, a single IP
agent manages all IP resources.
When the agent is started, it obtains the necessary configuration information from VCS. It then
periodically monitors the resources, and updates VCS with the resource status.
4. Cluster Communications and VCS Daemons
Cluster communications ensure that VCS is continuously aware of the status of each systems
service groups and resources. They also enable VCS to recognize which systems are active
members of the cluster, which have joined or left the cluster, and which have failed.
4.1. High availability daemon (HAD)
The VCS high availability daemon (HAD) runs on each system. Also known as the VCS engine,
HAD is responsible for building the running cluster configuration from the configuration
files, distributing the information when new nodes join the cluster, responding to operator
input, and taking corrective action when something fails.
The engine uses agents to monitor and manage resources. It collects information about
resource states from the agents on the local system and forwards it to all cluster members. The
local engine also receives information from the other cluster members to update its view of the
cluster.
The hashadow process monitors HAD and restarts it when required.
4.2. HostMonitor daemon
VCS also starts HostMonitor daemon when the VCS engine comes up. The VCS engine creates
a VCS resource VCShm of type HostMonitor and a VCShmg service group. The VCS engine
does not add these objects to the main.cf file. Do not modify or delete these components of
VCS. VCS uses the HostMonitor daemon to monitor the resource utilization of CPU and Swap.
VCS reports to the engine log if the resources cross the threshold limits that are defined for the
resources.
4.3. Group Membership Services/Atomic Broadcast (GAB)
The Group Membership Services/Atomic Broadcast protocol (GAB) is responsible for cluster
membership and cluster communications.
Cluster Membership
GAB maintains cluster membership by receiving input on the status of the heartbeat from each
node by LLT. When a system no longer receives heartbeats from a peer, it marks the peer as
DOWN and excludes the peer from the cluster. In VCS, memberships are sets of systems
participating in the cluster.
Cluster Communications
GABs second function is reliable cluster communications. GAB provides guaranteed delivery of
point-to-point and broadcast messages to all nodes. The VCS engine uses a private IOCTL
(provided by GAB) to tell GAB that it is alive.
4.4. Low Latency Transport (LLT)
VCS uses private network communications between cluster nodes for cluster maintenance.
Symantec recommends two independent networks between all cluster nodes. These networks
provide the required redundancy in the communication path and enable VCS to discriminate
between a network failure and a system failure. LLT has two major functions.
Traffic Distribution
LLT distributes (load balances) internode communication across all available private network
links. This distribution means that all cluster communications are evenly distributed across all
private network links (maximum eight) for performance and fault resilience. If a link fails, traffic is
redirected to the remaining links.
Heartbeat
LLT is responsible for sending and receiving heartbeat traffic over network links. The Group
Membership Services function of GAB uses this heartbeat to determine cluster membership.
4.5. I/O fencing module
The I/O fencing module implements a quorum-type functionality to ensure that only one cluster
survives a split of the private network. I/O fencing also provides the ability to perform SCSI-3
persistent reservations on failover. The shared disk groups offer complete protection against
data corruption by nodes that are assumed to be excluded from cluster membership.
5. VCS Configuration files.
5.1. main.cf
/etc/VRTSvcs/conf/config/main.cf is the key file in terms of VCS configuration. The main.cf file
basically provides the below information to the VCS agents/VCS daemons:
What are the resources available in each Service Group, the types of resources and
its attributes?
What are the dependencies each service group having on other Service Groups?
5.2. types.cf
The file types.cf, which is listed in the include statement in the main.cf file, defines the VCS
bundled types for VCS resources. The file types.cf is also located in the folder
/etc/VRTSvcs/conf/config.
5.3. Other Important files
/etc/llttab - describes the local system's private network links to the other nodes in the
cluster.
Why do the cluster nodes need shared storage?
All nodes can see the storage devices from their local operating systems, but at a time only one
node (the active node) can make write operations to the storage.
Why does each server need two storage paths (connected to two HBAs)?
To provide redundancy for the server's storage connection and to avoid a single point of failure in
the storage connection. Whenever you notice multiple storage paths connected to a server, you
can safely assume that there is some storage multipathing software running on the Operating
System, e.g. multipathd, EMC PowerPath, HDLM, MPIO etc.
Why does each server need two network connections to the physical network?
This is, again, to provide redundancy for the server's network connection and to avoid a single
point of failure in the server's physical network connectivity. Whenever you see dual physical network
connections, you can assume that the server is using some kind of IP multipathing software to manage
the dual paths, e.g. IPMP in Solaris, NIC bonding in Linux, etc.
Why do we need a minimum of two heartbeat connections between the cluster nodes?
When VCS has lost all its heartbeat connections except the last one, the condition is
called cluster jeopardy. When the cluster is in the jeopardy state, either of the below things could
happen:
1) The loss of the last available interconnect link
In this case, the cluster cannot reliably identify and discriminate if the last interconnect link is
lost or the system itself is lost and hence the cluster will form a network partition causing two or
more mini clusters to be formed depending on the actual network partition. At this time, every
Service Group that is not online on its own mini cluster, but may be online on the other mini
cluster will be marked to be in an autodisabled state for that mini cluster until such time that
the interconnect links start communicating normally.
2) The loss of an existing system which is currently in jeopardy state due to a problem
In this case, the situation is exactly the same as explained in step 1 forming two or more mini
clusters.
In the case where both the LLT interconnect links disconnect at the same time and we do not
have any low-pri links configured, the cluster cannot reliably identify whether it is the interconnects
that have disconnected, and will assume that the other system is down and now unavailable.
Hence in this scenario, the cluster treats this like a system fault, and the service groups
will be attempted to be onlined on each mini cluster, depending upon the AutoStartList
defined on each Service Group. This may lead to possible data corruption due to applications
writing to the same underlying data on storage from different systems at the same time. This
scenario is well known as the Split Brain condition.
This is all about the introduction to VCS. Please stay tuned for the next posts, where I am
going to discuss the actual administration of VCS.
http://www.gurkulindia.com/main/2011/07/beginners-lesson-veritas-cluster-servicesfor-solaris/
Normally after creating a filesystem, we add it in vfstab to mount automatically
across server reboots. But this will be different if your system is part of a VCS
cluster. Normally all the application filesystems will be managed by VCS, which mounts the
filesystems whenever the cluster starts, and we shouldn't add them in vfstab. Here we are
going to see how to add a new filesystem to an existing VCS cluster on the fly.
It's a tricky job, because if the filesystem resource is set as critical and it's not mounted on the
system, it will bring down the entire service group once you enable the resource. So
before enabling the resource, we need to make sure the resource attribute is set as non-critical.
This is an example of how to add a new filesystem on a two-node VCS cluster without any
downtime.
Environment:
Cluster Nodes: Node1,Node2
Diskgroup name:ORAdg
Volume Name:oradata01
Mount Point:/ORA/data01
Service Group:ORAsg
Volume Resource Name:oradata01_Vol
Mount Resource Name:oradata01_Mount
Diskgroup Resource Name:ORADG
Creating the new volume:
#vxassist -g ORAdg make oradata01 100g ORA_12 ORA_12 layout=mirror
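A hedged sketch of the VCS steps that follow volume creation, using the resource and group names from the environment list above (attribute names follow standard VCS Volume and Mount agent usage; verify against your VCS version):

```shell
# Open the configuration read-write
haconf -makerw

# Add the volume resource as non-critical so a fault cannot pull down ORAsg
hares -add oradata01_Vol Volume ORAsg
hares -modify oradata01_Vol Critical 0
hares -modify oradata01_Vol DiskGroup ORAdg
hares -modify oradata01_Vol Volume oradata01

# Add the mount resource, also non-critical
hares -add oradata01_Mount Mount ORAsg
hares -modify oradata01_Mount Critical 0
hares -modify oradata01_Mount MountPoint /ORA/data01
hares -modify oradata01_Mount BlockDevice /dev/vx/dsk/ORAdg/oradata01
hares -modify oradata01_Mount FSType vxfs
hares -modify oradata01_Mount FsckOpt %-y

# Wire the dependencies: Mount -> Volume -> DiskGroup
hares -link oradata01_Mount oradata01_Vol
hares -link oradata01_Vol ORADG

# Enable, bring online on the active node, then save the configuration
hares -modify oradata01_Vol Enabled 1
hares -modify oradata01_Mount Enabled 1
hares -online oradata01_Mount -sys Node1
haconf -dump -makero
```

Once the resources prove stable, the Critical attribute can be set back to 1 so a fault fails the group over as intended.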
Thank you for reading this article. Please leave a comment if you have any doubt; I will
get back to you as soon as possible.
http://www.unixarena.com/2012/07/how-to-add-new-filesystem-in-vcs-cluster.html
How do you start the VCS cluster if it's not started automatically after a server reboot?
Have you ever faced such issues? If not, just see how we can fix these kinds of issues
on a Veritas cluster. I have been asking this question in Solaris interviews, but most
candidates fail to impress me, saying things unrelated to VCS. If you
know the basics of Veritas cluster, it will be easy to troubleshoot in real time and
easy to explain in interviews too.
VCS troubleshooting
Scenario:
Two nodes are clustered with Veritas cluster, and you have rebooted one of the servers.
The rebooted node has come up, but the VCS cluster was not started (the HAD daemon). You are
trying to start the cluster using the hastart command, but it's not working. How do you
troubleshoot?
Here we go.
1.Check the cluster status after the server reboot using hastatus command.
# hastatus -sum |head
Cannot connect to VCS engine
2.Try to start the cluster using hastart. No luck? Still getting the same message as
above? Proceed with Step 3.
3.Check the LLT and GAB services. If they are in a disabled state, just enable them.
[root@UA~]# svcs -a |egrep "llt|gab"
online
Jun_27 svc:/system/llt:default
online
Jun_27 svc:/system/gab:default
[root@UA~]#
4.Check the heartbeat links status using the lltstat -nvv command.
[root@UA ~]# lltstat -nvv | head
LLT node information:
Node          State    Link   Status   Address
* 1 UA        OPEN
                       HB1    UP       00:91:28:99:74:89
                       HB2    UP       00:91:28:99:74:BF
  0 UA2       OPEN
                       HB1    UP       00:71:28:9C:2E:OF
                       HB2    UP       00:71:28:9C:2F:9F
[root@UA ~]#
5.If the LLT is down, then try to configure the private links using the lltconfig -c
command. If you still have any issue with the LLT links, you need to check with the network
team to fix the heartbeat links.
6.Check the GAB status using the gabconfig -a command.
7.As per the above command output, memberships are not seeded. We have to seed
the membership manually using the gabconfig command.
[root@UA ~]# gabconfig -cx
[root@UA ~]#
8.Check the GAB status again using gabconfig -a. The output should now show Port a with
membership on both the nodes (0, 1). To know which node is 0 and which node is 1, refer to
the /etc/llthosts file.
9.Try to start the cluster using the hastart command. It should work now.
10.Check the Membership status using gabconfig.
[root@UA ~]# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen 6d0607 membership 01
Port h gen 6d0607 membership 01
Above output indicates that HAD (Port h) is online on both the nodes (0, 1).
11.Check the cluster status using hastatus command. System should be back to
business.
[root@UA ~]# hastatus -sum |head
-- SYSTEM STATE
-- System          State      Frozen
A  UA              RUNNING    0
A  UA2             RUNNING    0
-- GROUP STATE
-- Group            System   Probed   AutoDisabled   State
B  ClusterService   UA       Y        N              ONLINE
B  ClusterService   UA2      Y        N              OFFLINE
[root@UA ~]#
This is a very small thing, but many VCS beginners fail to fix these start-up issues.
In interviews too, they are not able to say: "If HAD is not starting with the hastart
command, I will check the LLT & GAB services, fix any issues with them, and then
start the cluster using hastart." As an interviewer, that is the answer everybody
expects.
Hope this article is informative to you .
http://www.unixarena.com/2014/07/troubleshoot-vcs-cluster-starting.html
cfg2html is a very useful script to take a backup of all the system configuration in text format
and html format. This script is available for Solaris, various Linux flavors and HP-UX.
For more information about the script, please visit
http://groups.yahoo.com/group/cfg2html.
Once you run the script, by default it will generate three files.
1. System configuration in text format
2. System configuration in html format
3. Script Error log
These configuration backup files are very useful to rebuild the server from scratch. But we
have to make sure we have the latest configuration backup, by running cfg2html periodically
and keeping the output in another location or on a web portal for future reference.
Here is the script, which you can download and use for Solaris 10.
Download cfg2html
From the google drive, Click on File tab- > Select Download
bash-3.00# ./cfg2html_solaris_10v1.0
------------------------------------------------Starting
2012-07-18 14:46:51
bash-3.00# ls -lrt
total 337
-rwx------1 root root 24796 Jul 18 14:46 cfg2html_solaris_10v1.0
drwx------2 root root
-rw-r--r-- 1 root
-rw-r--r-- 1 root
-rw-r--r-- 1 root
bash-3.00# uname -a
SunOS sfos 5.10 Generic_142910-17 i86pc i386 i86pc
# crontab -e
Add the below lines in the end of the file.
00 23 15 * * /var/tmp/cfg2html_solaris10_v1.0/cfg2html_solaris_10v1.0 > /dev/null 2> /dev/null
00 23 01 * * /var/tmp/cfg2html_solaris10_v1.0/cfg2html_solaris_10v1.0 > /dev/null 2> /dev/null
Save the file & exit. The above jobs will run cfg2html on the 1st and 15th of the month at 11 PM.
Thank you for reading this article. Please leave a comment if you have any doubt; I will
get back to you as soon as possible.
http://www.unixarena.com/2012/07/cfg2html-on-solaris-os-configuration.html
Is your Solaris environment secure enough? How can we tighten the system security?
Here we will see some basic hardening steps for Solaris OS. Every organization should
maintain hardening checklists for each operating system they use. Before a
server is brought into operation/production, the hardening checklist needs to be verified by the
support team who supports the server.
Actually, the OS hardening part begins before the system is built, because you need to choose a
customized OS image according to your environment. By reducing the OS image
size, the possibility of risk (security and reliability) is much less, and a smaller OS image
speeds up the boot process and consumes less disk space.
1.Apply the Recommended Patch Cluster bundle regularly. It has very important bug fixes
and security patches. Visit https://support.oracle.com to check the latest
additional security patches and install them if applicable to your environment.
2.Disable all the services which are not being used anymore. There are many services
which will put your system at high risk. Disable services like RPC-based services, NFS, NIS,
Sendmail, Apache, SNMP, printer services and internet-based services if no longer used on
the server.
3.Disable inetd services and use ssh for remote login and file transfer.
It's better not to use the telnet, ftp and rlogin services.
4.There are many parameters in the Solaris kernel which can be tuned to increase
system security. Network parameters can be tuned using the ndd
command; other kernel parameters can be modified using the /etc/system file.
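As an illustration of /etc/system tuning, two widely used hardening entries (verify applicability to your Solaris release before deploying):

```
* /etc/system entries: make user stacks non-executable and log attempts
set noexec_user_stack=1
set noexec_user_stack_log=1
```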
Network tweaks:
Disable IP forwarding on OS
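For example, IP forwarding can be disabled immediately with ndd, and persistently with routeadm on Solaris 10:

```shell
# ndd -set /dev/ip ip_forwarding 0     # immediate, lost at reboot
# routeadm -d ipv4-forwarding -u       # persistent on Solaris 10
```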
This article will help you to understand some of the basic troubleshooting instructions for NFS
problems
1. Check the NFS versions and transports supported:
OS Version                    NFSv2      NFSv3      NFSv4
SunOS                         UDP        -          -
Solaris[TM] 2.5,2.6,7,8,9     UDP/TCP    UDP/TCP    -
Solaris[TM] 10                UDP/TCP    UDP/TCP    TCP*
*The UDP transport is not supported in NFSv4, as it does not contain the required congestion
control methods
2. Check the Connectivity for NFS Server from NFS client:
1. Check that the NFS server is reachable from the client by running:
#/usr/sbin/ping
2. If the server is not reachable from the client, make sure that the local name service is
running. For NIS+ clients:
#/usr/lib/nis/nisping -u
3. If the name service is running, make sure that the client has received the correct host
information:
# /usr/bin/getent hosts
4. If the host information is correct, but the server is not reachable from the client, run the ping
command from another client.
5. If the server is reachable from the second client, use ping to check connectivity of the first
client to other systems on the local network. If this fails, check the networking configuration on
the client. Check the following files:
/etc/hosts, /etc/netmasks, /etc/nsswitch.conf,
/etc/nodename, /etc/net/*/hosts etc.
6. If the software is correct, check the networking hardware.
Additionally, you can refer to the comparison of NFS hard mounts vs soft mounts.
To display statistics for each NFS-mounted file system, use the command 'nfsstat -m'. This
command will also tell you which options were used when the file system was mounted. You can
also check the contents of /etc/mnttab; it should show what is currently mounted. Lastly,
check the dates on the server and the client. An incorrect date may show a file as created
in the future, causing confusion.
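The /etc/mnttab check mentioned above can be scripted. The sketch below filters a made-up sample file standing in for the real /etc/mnttab (server and mount-point names are illustrative); on a live system, point awk at /etc/mnttab itself:

```shell
# List NFS mounts the way you would from /etc/mnttab.
# /tmp/mnttab.sample stands in for the real /etc/mnttab here.
cat > /tmp/mnttab.sample <<'EOF'
/dev/dsk/c0t0d0s0 / ufs rw 1342600000
nfsserver:/export/data /mnt/data nfs rw,soft 1342600100
EOF
awk '$3 == "nfs" { print $1, "mounted on", $2 }' /tmp/mnttab.sample
```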
http://www.gurkulindia.com/main/2011/05/nfs-troubleshooting/
A common problem which we sometimes see in VxVM is DGs in a disabled state. In this
post I will try to provide a solution to this problem.
1.) Check out the outputs of df, vxdisk and vxdg to identify the state of DGs and filesystems.
yogesh-test# df -h
Filesystem              size   used  avail  capacity  Mounted on
/dev/md/dsk/d10          ...    ...    ...       52%  /
swap                     14G   120K    14G        1%  /var/run
dmpfs                     7G     0K     7G        0%  /dev/vx/dmp
dmpfs                     7G     0K     7G        0%  /dev/vx/rdmp
yogesh-test# vxdisk list
DEVICE       TYPE          DISK      GROUP    STATUS
c1t0d0s2     auto:sliced   disk01    rootdg   online
c1t1d0s2     auto:sliced   disk02    rootdg   online
c3t1d0s2     auto:sliced   mydg02    mydg     online dgdisabled
c3t1d1s2     auto:sliced   mydg01    mydg     online dgdisabled
c3t1d2s2     auto:sliced   yogdg01   yogdg    online dgdisabled
yogesh-test# vxdg list
NAME         STATE         ID
rootdg       enabled       1090964640.15.yogesh-test
mydg         disabled      1090904042.16.yogesh-test
yogdg        disabled      1197441805.17.yogesh-test
Note: The DGs are showing in a disabled state, but the volumes are still mounted. We need to
umount the filesystems, which are in a stale state. Also, you can check the volume state with
vxinfo -pg <DG>; I missed taking the output of this command to present here.
yogesh-test# fuser -cu /myvol1
yogesh-test# fuser -cu /yogvol
yogesh-test# fuser -ck /myvol1
yogesh-test# fuser -ck /yogvol
yogesh-test# umount /myvol1
yogesh-test# umount /yogvol
2.) Now to get rid of the DGs from disabled state, we need to deport and import the DGs as
shown below:
yogesh-test# vxdg deport mydg
yogesh-test# vxdg import mydg
yogesh-test# vxdg deport yogdg
yogesh-test# vxdg import yogdg
yogesh-test# vxdg list
NAME         STATE     ID
rootdg       enabled   1090964640.15.yogesh-test
mydg         enabled   1090904042.16.yogesh-test
yogdg        enabled   1197441805.17.yogesh-test
yogesh-test# vxdisk list
DEVICE       TYPE          DISK      GROUP    STATUS
c1t0d0s2     auto:sliced   disk01    rootdg   online
c1t1d0s2     auto:sliced   disk02    rootdg   online
c3t1d0s2     auto:sliced   mydg02    mydg     online
c3t1d1s2     auto:sliced   mydg01    mydg     online
c3t1d2s2     auto:sliced   yogdg01   yogdg    online
Note: Sometime we have to use force option for importing & deporting DGs i.e vxdg -f import
<dg> & vxdg -f deport <dg>.
3.) The next step is to proceed with starting the volumes and mounting them using the vxvol & mount
commands.
yogesh-test# vxvol -g yogdg startall
yogesh-test# vxvol -g mydg startall
yogesh-test# mount /yogvol
yogesh-test# mount /myvol1
yogesh-test# df -h | grep vol
/dev/vx/dsk/yogdg/yogvol    134G   975M    44G     3%    /yogvol
/dev/vx/dsk/mydg/myvol1     124G    83G    41G    68%    /myvol1
Note: Sometimes you may encounter problems during the mounts; in that case, kindly proceed with
fsck to clean the bad blocks in the FS and then try to mount the FS again.
http://www.gurkulindia.com/main/2012/02/disk-groups-in-vxvm-are-in-disabledstate/
http://www.gurkulindia.com/main/category/unix-administration/veritas/veritasvolume-manager/veritas-volume-manager-troubleshooting/
Volumes can be striped, mirrored or RAID-5'ed. Mirrored volumes are made up of equally-sized collections of subdisks known as plexes. Each plex is a mirror copy of the data in the
volume. The Veritas File System (VxFS) is an extent-based file system with advanced
logging, snapshotting, and performance features.
VxVM provides dynamic multipathing (DMP) support, which means that it takes care of path
redundancy where it is available. If new paths or disk devices are added, one of the steps to
be taken is to run vxdctl enable to scan the devices, update the VxVM device list, and
update the DMP database. In cases where we need to override DMP support (usually in
favor of an alternate multipathing software like EMC Powerpath), we can run vxddladm
addforeign.
Here are some procedures to carry out several common VxVM operations. VxVM has a
Java-based GUI interface as well, but I always find it easiest to use the command line.
Task                         Procedure
Create a volume              vxassist -g <dg> make <vol-name> <size>
Remove a volume              vxassist -g <dg> remove volume <vol-name>, or vxedit -rf rm <vol-name>
List disks                   vxdisk list
View the configuration       vxprint -ht
Add a disk                   vxdiskadm or vxdiskadd
Scan for new devices         drvconfig; disks, then vxdiskconfig, then vxdctl enable
                             (or vxdctl enable followed by vxdisk scandisks)
Rename disks                 vxedit rename old-disk-name new-disk-name
Rename subdisks              vxsd mv old-subdisk-name new-subdisk-name
Show volume statistics       vxstat
Resize a volume              vxassist growto|growby|shrinkto|shrinkby <volume-name> <size>
Change a volume's layout     vxassist relayout <volume-name> layout=<layout>
The progress of many VxVM tasks can be tracked by setting the -t flag at the time the
command is run: utility -t tasktag. If the task tag is set, we can use vxtask to list, monitor,
pause, resume, abort or set the task labeled by the tasktag.
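A sketch of task tracking (the disk group, volume, and tag names here are illustrative):

```shell
# Tag a long-running operation with a task label
vxassist -g mydg -t myresize growby myvol1 2g

# In another session: list tagged tasks and watch the tagged one
vxtask list
vxtask monitor myresize

# Pause or abort the tagged task if needed
vxtask pause myresize
vxtask resume myresize
```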
Physical disks which are added to VxVM control can either be initialized (made into a native
VxVM disk) or encapsulated (disk slice/partition structure is preserved). In general, disks
should only be encapsulated if there is data on the slices that needs to be preserved, or if it
is the boot disk. (Boot disks must be encapsulated.) Even if there is data currently on a non-boot disk, it is best to back up the data, initialize the disk, create the file systems, and
restore the data.
When a disk is initialized, the VxVM-specific information is placed in a reserved location on
the disk known as a private region. The public region is the portion of the disk where the
data will reside.
VxVM disks can be added as one of several different categories of disks:
sliced: Public and private regions are on separate physical partitions. (Usually s3 is
the private region and s4 is the public region, but encapsulated boot disks are the reverse.)
simple: Public and private regions are on the same disk area.
cdsdisk: (Cross-Platform Data Sharing) This is the default, and allows disks to be
shared across OS platforms. This type is not suitable for boot, swap or root disks.
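Initializing a disk in a particular format can be sketched as follows (the device names are illustrative):

```shell
# Initialize a disk with the sliced format (separate public/private partitions)
/etc/vx/bin/vxdisksetup -i c1t2d0 format=sliced

# The default cdsdisk format needs no format= option
/etc/vx/bin/vxdisksetup -i c1t3d0
```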
If there is a VxFS license for the system, as many file systems as possible should be
created as VxFS file systems to take advantage of VxFS's logging, performance and
reliability features.
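Creating and mounting a VxFS file system on a volume can be sketched as (disk group, volume, and mount-point names are illustrative):

```shell
# Make a VxFS file system on the raw volume device
mkfs -F vxfs /dev/vx/rdsk/mydg/myvol1

# Mount it via the block device
mount -F vxfs /dev/vx/dsk/mydg/myvol1 /myvol1
```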
At the time of this writing, ZFS is not an appropriate file system for use on top of VxVM
volumes. Sun warns that running ZFS on VxVM volumes can cause severe performance
penalties, and that it is possible that ZFS mirrors and RAID sets would be laid out in a way
that compromises reliability.
VxVM Maintenance
The first step in any VxVM maintenance session is to run vxprint -ht to check the state of
the devices and configurations for all VxVM objects. (A specific volume can be specified
with vxprint -ht volume-name.) This section includes a list of procedures for dealing with
some of the most common problems. (Depending on the naming scheme of a VxVM
installation, many of the below commands may require a -g dg-name option to specify the
disk group.)
Volumes which are not starting up properly will be listed as DISABLED or DETACHED. A
volume recovery can be attempted with the vxrecover -s volume-name command.
If all plexes of a mirror volume are listed as STALE, place the volume in maintenance
mode, view the plexes and decide which plex to use for the recovery:
vxvol maint volume-name (The volume state will be DETACHED.)
vxprint -ht volume-name
vxinfo volume-name (Display additional information about unstartable plexes.)
vxmend off plex-name (Offline bad plexes.)
vxmend on plex-name (Online a plex as STALE rather than DISABLED.)
vxvol start volume-name (Revive stale plexes.)
vxplex att volume-name plex-name (Recover a stale plex.)
If, after the above procedure, the volume still is not started, we can force a plex to a
clean state. If the plex is in a RECOVER state and the volume will not start, use a -f option
on the vxvol command:
vxmend fix clean plex-name
vxvol start volume-name
vxplex att volume-name plex-name
If a subdisk status is listed as NDEV even when the disk is shown as available by
vxdisk list, the problem can sometimes be resolved by running
vxdg deport dgname; vxdg import dgname
to re-initialize the disk group.
To remove a disk:
Copy the data elsewhere if possible.
Unmount file systems from the disk or unmirror plexes that use the disk.
vxvol stop volume-name (Stop volumes on the disk.)
vxdg -g dg-name rmdisk disk-name (Remove disk from its disk group.)
vxdisk offline disk-name (Offline the disk.)
vxdiskunsetup c#t#d# (Remove the disk from VxVM control.)
Physically remove the disk, then run drvconfig; disks or perform a reconfiguration reboot.
To replace a failed or removed disk:
In vxdiskadm, choose option 5: Replace a failed or removed disk. Follow the prompts and
replace the disk with the appropriate disk.
To replace a failed boot disk:
Use the eeprom command at the root prompt or the printenv command at the ok> prompt
to make sure that the nvramrc devalias entries and the boot-device parameter are set to allow a boot
from the mirror of the boot disk. If the boot paths are not set up properly for both mirrors of
the boot disk, it may be necessary to move the mirror disk physically to the boot disk's
location. Alternatively, the devalias command at the ok> prompt can set the mirror disk path
correctly; then use nvstore to write the change to the NVRAM. (It is sometimes necessary
to run nvunalias aliasname to remove a stale alias from the nvramrc, then
nvalias aliasname devicepath to re-create it.)
VxVM Mirroring
Most volume manager availability configuration is centered around mirroring. While RAID-5
is a possible option, it is infrequently used due to the parity calculation overhead and the
relatively low cost of hardware-based RAID-5 devices.
In particular, the boot device must be mirrored; it cannot be part of a RAID-5 configuration.
To mirror the boot disk:
eeprom use-nvramrc?=true
Before mirroring the boot disk, set use-nvramrc? to true in the EEPROM settings. If you
forget, you will have to go in and manually set up the boot path for your boot mirror disk.
(See To replace a failed boot disk in the VxVM Maintenance section for the procedure.) It
is much easier if you set the parameter properly before mirroring the disk!
The boot disk must be encapsulated, preferably in the bootdg disk group. (The
bootdg disk group membership used to be required for the boot disk. It is still a standard,
and there is no real reason to violate it.)
If possible, the boot mirror should be cylinder-aligned with the boot disk. (This means
that the partition layout should be the same as that for the boot disk.) It is preferred that 1-2MB of unpartitioned space be left at either the very beginning or the very end of the
cylinder list for the VxVM private region. Ideally, slices 3 and 4 should be left unconfigured
for VxVM's use as its public and private regions. (If the cylinders are aligned, it will make OS
and VxVM upgrades easier in the future.)
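Cylinder alignment can be checked, and the boot disk's label copied to the mirror, with standard Solaris tools; a sketch with illustrative device names:

```shell
# Compare the partition layouts of the boot disk and the intended mirror
prtvtoc /dev/rdsk/c0t0d0s2
prtvtoc /dev/rdsk/c0t1d0s2

# Copy the boot disk's VTOC to the mirror so the layouts match
prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t1d0s2
```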
(Before bringing the boot mirror into the bootdg disk group, I usually run an
installboot command on that disk to install the boot block in slice 0. This should no longer be
necessary; vxrootmir should take care of this for us. I have run into circumstances in the
past where vxrootmir has not set up the boot block properly; Veritas reports that those bugs
have long since been fixed.)
Mirrors of the root disk must be configured with "sliced" format and should live in the
bootdg disk group. They cannot be configured with cdsdisk format. If necessary, remove the
disk and re-add it in vxdiskadm.
In vxdiskadm, choose option 6: Mirror Volumes on a Disk. Follow the prompts from
the utility. It will call vxrootmir under the covers to take care of the boot disk setup portion
of the operation.
When the process is done, attempt to boot from the boot mirror. (Check the
EEPROM devalias settings to see which device alias has been assigned to the boot mirror,
and run boot device-alias from the ok> prompt.)
Procedure to create a Mirrored-Stripe Volume: (A mirrored-stripe volume mirrors several
striped plexes; for most purposes it is better to set up a Striped-Mirror Volume instead.)
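Both layouts can be requested directly from vxassist; a sketch with illustrative disk group, volume, and size values:

```shell
# Mirrored-stripe: stripe first, then mirror the striped plexes
vxassist -g mydg make stripevol 2g layout=mirror-stripe ncol=3

# Striped-mirror: mirror below the striping layer (better recovery behavior)
vxassist -g mydg make smvol 2g layout=stripe-mirror ncol=3
```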
vxvol stop new-volume-name (To re-associate this plex with the old volume.)
vxedit rm new-volume-name
To unencapsulate the boot disk and return it to OS control, run /etc/vx/bin/vxunroot
http://solaristroubleshooting.blogspot.in/2013/06/veritas-volume-managernotes.html
Create Device Tree: The hardware device tree will be built. This device tree can be
explored using PROM monitor commands at the ok> prompt, or by using prtconf once the
system has been booted.
Extended Diagnostics: If diag-switch? and diag-level are set, additional diagnostics will
appear on the system console.
auto-boot?: If the auto-boot? PROM parameter is set, the boot process will begin.
Otherwise, the system will drop to the ok> PROM monitor prompt, or (if sunmon-compat?
and security-mode are set) the > security prompt.
The boot process will use the boot-device and boot-file PROM parameters unless diag-switch? is set. In this case, the boot process will use the diag-device and diag-file.
bootblk: The OBP (Open Boot PROM) program loads the bootblk primary boot program
from the boot-device (or diag-device, if diag-switch? is set). If the bootblk is not present
or needs to be regenerated, it can be installed by running the installboot command after
booting from a CDROM or the network. A copy of the bootblk is available
at /usr/platform/`arch -k`/lib/fs/ufs/bootblk
ufsboot: The secondary boot program, /platform/`arch -k`/ufsboot, is run. This
program loads the kernel core image files. If this file is corrupted or missing, a "bootblk:
can't find the boot program" or similar error message will be returned.
kernel: The kernel is loaded and run. For 32-bit Solaris systems, the relevant files are:
/platform/`arch -k`/kernel/unix
/kernel/genunix
For 64-bit Solaris systems, they are:
/platform/`arch -k`/kernel/sparcv9/unix
/kernel/genunix
As part of the kernel loading process, the kernel banner is displayed to the screen. This
includes the kernel version number (including patch level, if appropriate) and the copyright
notice.
The kernel initializes itself and begins loading modules, reading the files with
the ufsboot program until it has loaded enough modules to mount the root filesystem itself.
At that point, ufsboot is unmapped and the kernel uses its own drivers. If the system
complains about not being able to write to the root filesystem, it is stuck in this part of the
boot process.
The boot -a command single-steps through this portion of the boot process. This can be a
useful diagnostic procedure if the kernel is not loading properly.
/etc/system: The /etc/system file is read by the kernel, and the system parameters are
set.
The following types of customization are available in the /etc/system file:
rootfs: Specify the file system type for the root file system. (ufs is the default.)
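As a sketch, a small /etc/system fragment using this directive alongside two other common ones (the values shown are illustrative, not recommendations):

```
* /etc/system comments begin with an asterisk
* File system type for the root file system (ufs is the default)
rootfs:ufs

* Force a driver module to load at boot
forceload: drv/vxio

* Set a tunable kernel parameter
set maxusers=512
```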
http://solaristroubleshooting.blogspot.in/2013/03/solaris-sparc-boot-sequence.html