Version 0.22.0
Table of Contents
1 Introduction
2 Architecture
3 Key Concepts
3.1 Path Navigation
3.2 Classification
3.3 zProperties
4 Quick Start
4.1 Testing SNMP
4.2 Adding Devices with SNMP
4.2.1 Adding remote Windows boxes
4.2.2 Adding remote Linux boxes
4.2.3 Adding Cisco devices
4.3 No SNMP
4.3.1 With ssh
4.3.2 With portscan
5 Device Management
5.1 Adding a Device
5.2 Editing a Device
5.3 Searching for a Device
5.4 Process Monitoring
5.5 File System Monitoring
5.6 Windows Monitoring
5.7 Zenmodeler
5.8 Production State
5.9 Maintenance Windows
5.10 Devices Tabs
5.10.1 Status
5.10.2 OS
5.10.3 Hardware
5.10.4 Software
5.10.5 Events
5.10.6 History
5.10.7 Perf
5.10.8 PerfConf
5.10.9 Edit
5.10.10 Manage
5.10.11 Custom
5.10.12 zProperties
5.10.13 Changes
5.11 Custom Schema
6 Event Management
6.1 Dashboard
1 Introduction
This document gives an overview of the architecture of the Zenoss system and
describes how to perform common management tasks. At the core of the Zenoss system is
the Zope web application development environment. Administrators may at times find it
useful to consult the Zope Book, which describes how to use Zope. The book can be
found at the main Zope web site, http://www.zope.org.
2 Architecture
Zenoss is made up of twelve different daemons. They are described below:
1. zeo – the backend object database that stores the configuration model.
2. zope – the web application development environment used to develop the console.
3. zenping – high performance asynchronous testing of ICMP status.
4. zenperfsnmp – high performance asynchronous SNMP performance collection.
5. zenmodeler – high performance automated model population using SNMP, SSH,
and Telnet to collect its information. Zenmodeler works against devices that have
been loaded into the DMD.
6. zendisc – a subclass of zenmodeler that walks the routing table to discover the
network topology and then pings all discovered networks to find active IPs and
devices.
7. zenagios – runs Nagios plug-ins on the local box or on remote boxes through SSH.
8. zensyslog – collection and classification of syslog events.
9. zenstatus – performs active TCP connection testing of remote daemons.
10. zenprocess – process monitoring using the SNMP Host Resources MIB.
11. zentrap – receives traps and turns them into events.
12. zenxevent – receives events through XML-RPC.
Zenoss also has a set of programs that run under Windows to perform event log collection
and Windows service monitoring using the native Windows Management Instrumentation
(WMI) interface. These services must be run under Windows:
1. zenwinmodeler – auto-discovery of services running on a Windows box.
2. zenwin – monitoring up/down availability of Windows services.
3. zeneventlog – collection of event log events.
3.2 Classification
Many of Zenoss’s hierarchies are used to classify IT entities such as Devices
(computers, routers, switches, etc.) or Events (status information sent out by devices).
Once an item is properly classified, the system understands more about it, which
makes proper classification an important activity. Classification often happens
automatically, with the ability to override it manually later. As Zenoss matures,
auto-classification will become more common.
3.3 zProperties
Zenoss allows configuration to be specified using its hierarchical organization system.
zProperties are properties that control different modules of Zenoss. They can be set at
any level of a hierarchy and values set at lower levels of the hierarchy override those
above. zProperties are described further in Managing zProperties.
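The override rule can be illustrated with a minimal Python sketch. It is not Zenoss
code: the dictionary layout, the function name resolve_zproperty and the path
/App/Mail are assumptions made for illustration. It only shows the idea that the value
set closest to an object wins.

```python
# Illustrative only: a toy model of zProperty inheritance, not Zenoss code.
# Values set locally on each node of a hierarchy, keyed by path.
# The paths and values here are hypothetical.
LOCAL_VALUES = {
    "/": {"zEventSeverity": -1, "zEventAction": "status"},
    "/App": {"zEventSeverity": 3},
    "/App/Mail": {"zEventAction": "history"},
}


def resolve_zproperty(path, name, local_values=LOCAL_VALUES):
    """Return the value of `name` set closest to `path`, walking up to the root."""
    parts = [p for p in path.split("/") if p]
    # Check the most specific path first, then each ancestor, ending at "/".
    while True:
        current = "/" + "/".join(parts)
        values = local_values.get(current, {})
        if name in values:
            return values[name]
        if not parts:
            raise KeyError(name)
        parts.pop()


print(resolve_zproperty("/App/Mail", "zEventSeverity"))  # inherited from /App
print(resolve_zproperty("/App/Mail", "zEventAction"))    # set locally
```

In this sketch /App/Mail has no zEventSeverity of its own, so the value set on /App is
used, while its locally set zEventAction overrides the one at the root.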
4 Quick Start
Once you have followed INSTALL.txt and Zenoss is up and running, the device
database needs to be populated. There are three ways to do this: Zenoss can perform
auto-discovery, devices can be loaded one at a time through the web UI, or devices can
be batch loaded from an XML file. To model a device, Zenoss needs a valid SNMP, SSH,
or Telnet connection to it. Add the Zenoss server to your device database by clicking
“Add Device” and adding it with its IP.
If the command does not time out, your SNMP is working correctly.
To collect Windows event logs or log files from a Windows box using syslog, you can use
SyslogAgent from http://syslogserver.com/syslogagent.html. The Windows event log can
also be monitored using zenwin’s native WMI connection, but this requires a second
Windows box that runs zenwin.
4.3 No SNMP
By default, no Windows services are monitored. To monitor a specific service, navigate
to the Windows device and click the OS tab. Click on the service you wish to monitor
and change the value of monitor to “True”.
5.7 Zenmodeler
Zenmodeler goes through the list of devices known to Zenoss and performs auto-discovery
of each device’s sub-components (such as interfaces, file systems, processes,
IP services, etc.). By default the system is set up to perform a complete remodeling
every 6 hours. This may be too often for large deployments, where the process is often
run only once per day from cron as described above.
5.10.1 Status
The status tab is the default tab that is shown when you click on a device. It shows much
of the data of the device and its status (SNMP, ping, uptime, etc.). The status page also
shows how the device is organized such as which groups it is under, what kind of system
it is and where the data is being stored.
5.10.3 Hardware
The hardware tab gives you information on the device’s available/used memory,
available/used swap and CPU(s).
5.10.4 Software
The software tab gives a list of installed software and can be sorted by Manufacturer,
Name or Install Date.
5.10.5 Events
The events tab is very similar to other events menus. Events may be sorted by several
fields, including component, eventClass and count. They may also be filtered by
severity, by state and by a regular expression applied over all the events.
Events may be moved, mapped and acknowledged here as well.
5.10.6 History
The history tab shows events that have completed the event life cycle and have been
archived in the history.
5.10.7 Perf
The Perf tab has performance graphs of the device you are looking at. It displays Load
Average 5 Min, CPU Utilization, CPU Idle, Free Memory and Free Swap. This can be
changed to update hourly, daily, weekly, monthly or yearly.
5.10.8 PerfConf
PerfConf gives more detailed data on where the performance data is coming from, allows
you to set thresholds and tells you the specifics of the graphs.
5.10.9 Edit
See the Editing a Device section.
5.10.10 Manage
Using the Manage tab will allow you to change the location of the device, remodel the
device, reset the manage IP, reset the SNMP community, rename the device, clear
heartbeats and delete it.
5.10.11 Custom
The Custom tab allows you to set Custom values in the custom fields defined in Custom
Schema. For more on custom schema, see the Custom Schema section.
5.10.13 Changes
The Changes tab logs user changes via the Zope interface.
6 Event Management
The Zenoss Event Management system can collect events from syslog, Windows event
log, SNMP traps, and XML-RPC. Processing is performed on raw events to integrate
them tightly into the Zenoss model. Specifically, an event is run through a set of rules to
determine its class which can then provide additional information such as event severity
or up-down correlation.
6.1 Dashboard
The Dashboard is the initial menu of Zenoss. The Dashboard shows Systems-level event
summaries, devices that currently have events with severity of at least Error magnitude,
and infrastructure issues along with a navigational bar. There is a search function in
which all or part of a machine’s name, or an IP address, may be typed to search for it.
User settings may be accessed through the “Settings” link in the top right of the
screen. Below the settings link is the time and date that Zenoss was last updated. Every
60 seconds an AJAX call polls for new data and refreshes the data fields. If
the poll fails, the Dashboard displays “Lost Connection to Zenoss”.
6.3.2 De-duplication
If a single event is submitted multiple times for some reason, a counter is
incremented instead of the event log being clogged with hundreds or perhaps even
thousands of copies of the event.
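The counter behaviour can be sketched in a few lines of Python. This is not the Zenoss
implementation. In particular, the set of fields used to decide that two events are
"the same" (FINGERPRINT_FIELDS below) is an assumption made for the example, as are the
function name and the sample device and component names.

```python
# Illustrative sketch of event de-duplication; not the Zenoss implementation.
# ASSUMPTION: two events count as duplicates when all of these fields match.
FINGERPRINT_FIELDS = ("device", "component", "eventClass", "summary")


def add_event(table, event):
    """Insert `event` into `table`, or bump the counter of an existing duplicate."""
    key = tuple(event.get(field) for field in FINGERPRINT_FIELDS)
    if key in table:
        table[key]["count"] += 1
    else:
        table[key] = dict(event, count=1)
    return table[key]


events = {}
for _ in range(3):
    add_event(events, {"device": "web01", "component": "httpd",
                       "eventClass": "/App", "summary": "service down"})

# One stored event with a count of 3, rather than three separate rows.
print(len(events), list(events.values())[0]["count"])
```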
6.3.4 Classification
When an event is found, it is automatically identified and placed in the correct location
and is correctly labeled with the corresponding class. Classes can cause actions to fire
and/or add more information to an event (such as correlations, severity modification, or
custom actions).
If the EventClassKey lookup returns no results, a second lookup will be performed using
the key “defaultmapping”. Default mappings can be used to match large ranges of
events by regular expression.
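The two-step lookup can be sketched as follows. This is a simplified illustration, not
Zenoss code: the layout of the mapping table, the function name classify, the sample
keys and the class names other than /App are assumptions. The sketch also assumes that
"no results" means no mappings are registered under the key.

```python
import re

# Illustrative sketch; the mapping table layout is an assumption, not Zenoss code.
# Each key maps to a list of (regular expression, event class) pairs.
MAPPINGS = {
    "mailfilter": [(r"virus", "/App")],                      # hypothetical key
    "defaultmapping": [(r"link (up|down)", "/Net/Link")],    # hypothetical class
}


def classify(event_class_key, message, mappings=MAPPINGS):
    """Look up by EventClassKey first, then fall back to "defaultmapping"."""
    candidates = mappings.get(event_class_key)
    if not candidates:
        # Second lookup: nothing is registered under the event's own key.
        candidates = mappings.get("defaultmapping", [])
    for pattern, event_class in candidates:
        if re.search(pattern, message):
            return event_class
    return None  # the event stays unclassified in this sketch


print(classify("mailfilter", "virus found in attachment"))
print(classify("unknownkey", "eth0 link down"))
```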
After the event context has been applied, the device context is applied. During this
process the productionState, location, DeviceClass, DeviceGroups, and Systems are
added to the event. Once this is done, a zEventProperties update is attempted as
described above, but using the device class path instead of the event class path. This
allows a particular device or class of devices to override the default values for any
given event.
To map this event, first select the checkbox next to the event, then select a category
next to the Map button at the bottom of the page. Each category does different things
to the event: changing its severity, moving it to the history table, etc. For now,
select "/App" and press the Map button.
This will take you to the edit screen for a new "Mapper". These are the rules used to map
this event to the "/App" category. This rule, since it matches the Trap by a very specific
OID, is all you need.
In the "transform" section of the mapper, you can put some code to modify the summary.
For example, let's say you want to set the summary string to "spam Filter Detects Virus".
You would put this in the transform edit area:
A trap has a header with some standard (and mostly useless) information, followed by
a sequence of attribute/value pairs. You can see these values as event details if you
click on the last column of the event.
You have indicated you want the value for the OID ".1.3.6.1.4.1.9789.1500.2.5" as the
summary. If you had the MIB loaded, you could do this:
evt.summary = evt.spamFilterDetectsVirus
The "device" object for the event has been made available, too:
By default a rule for new production events of severity 4 and greater will be created.
If the action is email, the event will be emailed. If it is page, you will need to have
an SNPP paging server set up (sendpage, in the externallibs directory of the install
directory, does this). Many wireless phone systems have SMTP to SMS gateways, so you
might just use email.
By default email alerts will be sent to the email address defined in the main user settings
tab and pager alerts will go to the pager address. You can override this by filling in the
optional address field.
The delay field is the number of seconds to wait before sending the alert. If an event
clears before the delay has elapsed, no alert is sent. Rules can be modified using the
GUI provided. You can see a list of valid fields in the popup called “Add Filter”.
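The effect of the delay field can be sketched in Python. The event layout (first_seen,
cleared) and the function name alerts_due are hypothetical and chosen only for this
example. The point is that an event which clears inside the delay window never
produces an alert.

```python
# Illustrative sketch of the delay rule; names and data layout are assumptions.
def alerts_due(events, delay, now):
    """Return the events that are still active once `delay` seconds have passed."""
    due = []
    for event in events:
        waited = now - event["first_seen"]
        if not event["cleared"] and waited >= delay:
            due.append(event)
    return due


events = [
    {"summary": "ping down", "first_seen": 0, "cleared": False},
    {"summary": "brief glitch", "first_seen": 0, "cleared": True},
    {"summary": "new problem", "first_seen": 50, "cleared": False},
]

# With a 60 second delay, checked at t=60: only the first event is alerted on.
# The second cleared in time and the third has not yet waited long enough.
print([e["summary"] for e in alerts_due(events, delay=60, now=60)])
```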
Email messages can have a user-specified subject and body. Two messages are defined:
the first is the alert to be sent when a failure is detected, and the second is the
clear alert to be sent once the failure is closed. These fields are Python format
strings. At the bottom of the page is a list of the fields available for an alert. A
clear event’s fields are accessed by prefixing the field name with “clear”. For
instance the field prodState becomes clearProdState. There is also a special field,
clearOrEventSummary, which prints the clear summary or, if it does not exist, the
original alert summary. This is useful for the subject of a clear alert: when an alert
has no clear event (for instance because it was deleted from the UI), a meaningful
subject will still be created.
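As a rough illustration of how such format strings behave, the sketch below renders a
subject line from a dictionary of fields. It assumes the dictionary-based "%(name)s"
style of Python format string. The helper build_fields, the device and summary fields,
and the sample values are made up for the example; prodState, clearProdState and
clearOrEventSummary are the names given above.

```python
# Illustrative sketch: rendering alert text from event fields with Python
# format strings. ASSUMPTION: the dictionary-based "%(name)s" style is used.
def build_fields(event, clear=None):
    """Merge event fields with "clear"-prefixed fields from the clear event."""
    fields = dict(event)
    for name, value in (clear or {}).items():
        # prodState becomes clearProdState, summary becomes clearSummary, ...
        fields["clear" + name[0].upper() + name[1:]] = value
    # Use the clear summary when there is one, else the original summary.
    fields["clearOrEventSummary"] = (clear or {}).get("summary", event["summary"])
    return fields


event = {"device": "web01", "summary": "httpd down", "prodState": 1000}
clear = {"device": "web01", "summary": "httpd up", "prodState": 1000}
subject = "%(device)s: %(clearOrEventSummary)s"

print(subject % build_fields(event, clear))  # uses the clear summary
print(subject % build_fields(event))         # falls back to the alert summary
```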
7.1 RRDTemplates
The top-level performance configuration object is an RRDTemplate. RRDTemplates
define the data sources to collect, any thresholds and how the data sources should be
graphed. RRDTemplates are defined in the PerfConf tab of any device tree object or on
the collected object itself. Binding of a template to an object is based on its name.
By default, binding is done by looking for a template with the same name as the
object’s meta_type. For instance, all devices in the system have the meta_type
“Device”, so their RRDTemplate is called Device. Templates are inherited in the same
way that zProperties are, so the template closest to the object is its definition. In
the PerfConf tab on an actual object there is a “Local Copy” button, which creates a
copy of the current template on the local device that can then be customized. If the
custom copy is no longer necessary, it can be deleted by clicking the “Remove Local”
button.
RRDGraphs designate global graph options such as which data sources should be shown
together on a graph, what the y-axis units are, the size of the image created, the width of
lines in the graph, if the graph should use a log based y-axis scale, if the data should be
stacked or not, if a summary should be generated, and what the min and max y-axis
values should be.
Copyright © 2002 Zenoss, Inc. All Rights Reserved
8 Availability Monitoring
The availability monitoring system within Zenoss provides active testing of the IT
infrastructure. The system currently consists of zenping, Zenoss’ layer 3 aware
topology monitoring daemon, and zenstatus, a TCP status tester.
8.1 Zenping
There isn’t much configuration work to be done setting up zenping. The most important
requirement for this daemon is that Zenoss has built a complete model of your routing
system. If there are gaps in Zenoss’ routing model, the power of zenping’s topology
monitoring will not be available. This issue can be seen in the zenping.log file.
8.2 Zenstatus
Zenstatus performs monitoring of TCP services. It is configured by turning on
monitoring of a service under the “Services” root on the Navigation Toolbar. Service
monitoring can be turned on for a service class, but this can be overridden on any
service instance. For instance, “SMTP” will be monitored by default, but it may not be
a critical service on all boxes. If this is the case, it may be removed from
monitoring on specific devices. Also, if the service is configured to listen only on
localhost (127.0.0.1), the service will not be monitored.
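The sketch below illustrates the two ideas in this section: an active TCP connection
test, and the decision of whether a service instance should be monitored at all. It is
not the zenstatus implementation. The function names and the dictionary keys
(listen_ip, monitor, class_monitor) are assumptions for the example. Only standard
Python socket calls are used.

```python
import socket


# Illustrative sketch of an active TCP test; not the zenstatus implementation.
def tcp_service_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable and timed-out connections all count as down.
        return False


def should_monitor(service):
    """Apply the rules from the text to one service instance (hypothetical layout)."""
    if service["listen_ip"] == "127.0.0.1":
        return False                      # localhost-only services are skipped
    if service.get("monitor") is not None:
        return service["monitor"]         # per-instance override
    return service["class_monitor"]       # default from the service class


smtp_on_mailhost = {"listen_ip": "0.0.0.0", "class_monitor": True}
smtp_on_desktop = {"listen_ip": "0.0.0.0", "class_monitor": True, "monitor": False}
print(should_monitor(smtp_on_mailhost), should_monitor(smtp_on_desktop))
```

A caller would run tcp_service_up only for services where should_monitor returns True,
for example tcp_service_up(host, 25) for SMTP.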
Nagios plug-ins are configured using a Nagios Template that is much like the
RRDTemplates used for performance monitoring, so a template named “Device” will
bind to all devices below the template definition. Within each template is a list of
commands that will run. The commands can be any program that follows the Nagios
plug-in standard (inputs are command line arguments; output is the first line of stdout
plus a return code) as defined in http://nagiosplug.sourceforge.net/developer-guidelines.html
Run an HTTP check against all devices using the URI /zport/dmd:
check_http -H ${devname} -u /zport/dmd
In a template named FileSystem, the following command will run against all FileSystems
on a device:
check_disk -w 10% -c 5% -p ${compname}
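To make the plug-in contract concrete, the following Python sketch fills in the
${...} variables of a command, runs it, and returns the return code together with the
first line of stdout. It is a simplified illustration rather than the zenagios code.
The function name run_check is made up, and using Python's string.Template for the
substitution is an assumption that happens to match the ${name} syntax shown above.
The status names follow the Nagios plug-in guidelines (0 OK, 1 WARNING, 2 CRITICAL,
3 UNKNOWN).

```python
import shlex
import subprocess
from string import Template

# Return codes defined by the Nagios plug-in guidelines.
STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


# Illustrative sketch of running a Nagios-style plug-in; not zenagios itself.
def run_check(command_template, **context):
    """Fill in ${...} variables, run the command, return (code, first stdout line)."""
    command = Template(command_template).substitute(context)
    result = subprocess.run(shlex.split(command), capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return result.returncode, (lines[0] if lines else "")


def describe(code, first_line):
    """Format a result the way a status display might show it."""
    return "%s: %s" % (STATUS.get(code, "UNKNOWN"), first_line)
```

With the Nagios plug-ins installed and on the PATH, a call such as
run_check("check_http -H ${devname} -u /zport/dmd", devname="localhost") would run the
first command above against one device.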
9 Modeling Maps
Zenoss uses plugin maps to map real world information into the standard Zenoss model.
Input to the plugins can come from SNMP, SSH or Telnet. Selection of which plugins to
run against a device is done by matching the plugin name against two zProperties:
zCollectorCollectPlugins and zCollectorIgnorePlugins. Plugins that match
zCollectorCollectPlugins are collected; those that match zCollectorIgnorePlugins are
ignored.
• DeviceMap – collect basic information about a device such as its OS type and
hardware model.
• InterfaceMap – collect the list of interfaces on a device.
• RouteMap – collect the routing table from the device.
• IpServicesMap – collect the ip services running on the device.
• FileSystemMap – collect the list of filesystems on a device.
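The selection rule can be sketched in Python using the plugin names listed above. Two
things here are assumptions rather than documented behaviour: that both zProperties
hold regular expressions, and that a plugin matching the ignore pattern is skipped
even if it also matches the collect pattern. The function name select_plugins is
hypothetical.

```python
import re


# Illustrative sketch of plugin selection; not the zenmodeler implementation.
# ASSUMPTIONS: both zProperties hold regular expressions, and the ignore
# pattern takes precedence when a plugin name matches both.
def select_plugins(plugin_names, collect_pattern, ignore_pattern=""):
    """Return the plugin names that should run against a device."""
    selected = []
    for name in plugin_names:
        if ignore_pattern and re.search(ignore_pattern, name):
            continue
        if re.search(collect_pattern, name):
            selected.append(name)
    return selected


PLUGINS = ["DeviceMap", "InterfaceMap", "RouteMap", "IpServicesMap", "FileSystemMap"]

# Collect every map except the routing table.
print(select_plugins(PLUGINS, r"Map$", r"^Route"))
```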
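#!/bin/bash
# Daily cron job: remodel all devices, then back up the Zeo and events databases.
export ZENHOME=/usr/local/zenoss
export PYTHONPATH=$ZENHOME/lib/python
rm -rf /tmp/renderserver/
cd $ZENHOME
# Remodel devices, appending all output to the error log.
bin/zenmodeler run --logpath $ZENHOME/log -v30 >> log/zenmodeler.err.log 2>&1
# Back up the Zeo database with repozo.
bin/repozo.py -B -f var/Data.fs -r var/backup >> log/repozo.log 2>&1
# Dump the MySQL events database.
/usr/local/mysql/bin/mysqldump -u root --password=\!mercy\! events > var/backup/events.sql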
To back up or restore the Zeo database, the repozo command is used. See its help page
for more details.
The Zeo database needs to be “packed” periodically to reclaim space. To do this, set
up a cron job that runs the following script once a day.
#!/bin/bash
export ZENHOME=/usr/local/zenoss
export PYTHONPATH=$ZENHOME/lib/python
cd $ZENHOME
# Pack the Zeo database (server on port 8100) to reclaim space.
bin/zeopack.py -p 8100
# Delete backup files older than 14 days.
find $ZENHOME/var/backup -name \*fsz -mtime +14 -exec rm {} \;
find $ZENHOME/var/backup -name \*.dat -mtime +14 -exec rm {} \;
/usr/local/zenoss/log/*.log {
weekly
rotate 2
copytruncate
}
This command will first model the monitoring machine and then walk through the
routing tables of all routers it can find. Auto-discovery will go as far as valid SNMP
access is found or until a network is discovered in the DMD that has its zAutoDiscover
property set to False.
Routers discovered through this process will be placed in the device path
/Network/Router.
To perform full discovery of all devices on the network, run zendisc without the
--routers flag. This must be done as root so that a raw socket for ICMP pinging can be
created.
bin/zendisc run
When devices are discovered they are placed into the /Discovered device path. They
should then be moved to a more specific part of the tree. Servers are normally organized
by OS, so Windows machines might go to /Server/Windows. Other information, like its
Business System or its Location, can be added to a device using the Edit tab on the
device.
This command will ping all devices on the 10.2.1.0 network and then attempt to perform
SNMP discovery on them. 10.2.1.0 must exist in the Networks root of the DMD. To
add a network to the system, enter its IP in CIDR format (e.g. 10.2.1.0/24 for the
class C network) in the add field at the bottom of the networks screen.
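For readers unfamiliar with CIDR notation, the short Python sketch below uses the
standard ipaddress module to show what 10.2.1.0/24 expands to. It only illustrates the
notation. The function name ping_targets is hypothetical and this is not how zendisc
itself enumerates addresses.

```python
import ipaddress


def ping_targets(cidr):
    """Return the usable host addresses in a network given in CIDR format."""
    network = ipaddress.ip_network(cidr)
    # hosts() leaves out the network and broadcast addresses.
    return [str(address) for address in network.hosts()]


network = ipaddress.ip_network("10.2.1.0/24")
targets = ping_targets("10.2.1.0/24")

print(network.network_address, network.netmask)  # 10.2.1.0 255.255.255.0
print(len(targets), targets[0], targets[-1])     # 254 10.2.1.1 10.2.1.254
```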
13.1 Events
Property Name        Property Type  Description
zEventAction         string         Location in which an event will be stored. Possible
                                    values are: status, history and drop. Default is
                                    status, meaning the event will be an “active” event.
                                    History sends the event directly to the history
                                    table. Drop tells the system to discard the event.
zEventClearClass     lines          A list of classes that a clear event should clear in
                                    addition to its own class.
zEventSeverity       int            Allows you to override the severity value of an
                                    event. If this is -1 it is ignored. Possible values
                                    are 0 to 5.
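How zEventAction and zEventSeverity act on an event can be sketched as below. This is
an illustration of the table above, not Zenoss code: the event dictionary, the "table"
field and the function name apply_event_zproperties are hypothetical.

```python
# Illustrative sketch only; the event layout and the "table" field are assumptions.
def apply_event_zproperties(event, zprops):
    """Apply zEventSeverity and zEventAction to an event dictionary."""
    severity = zprops.get("zEventSeverity", -1)
    if severity != -1:                  # -1 means "do not override"
        event["severity"] = severity
    action = zprops.get("zEventAction", "status")
    if action == "drop":
        return None                     # the event is discarded
    event["table"] = action             # "status" (active) or "history"
    return event


event = {"summary": "disk almost full", "severity": 3}
print(apply_event_zproperties(event, {"zEventSeverity": 5, "zEventAction": "history"}))
```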
13.2 Devices
Property Name        Property Type  Description
13.3 Services
Property Name        Property Type  Description
zFailSeverity        int            Determines what severity to send for the specified
                                    service.
zHideFieldsFromList  lines          Fields to hide from Services instance list
zMonitor             boolean        Tells whether or not to monitor a service.
13.4 Networks
Property Name        Property Type  Description
zAutoDiscover        boolean        Should zendisc perform auto-discovery on this
                                    network
zDefaultNetworkTree  lines          List of netmask numbers to use when creating network
                                    containers. Default is 24, 32 which will make /24
                                    networks at the top level of the networks tree if a
                                    network is smaller than /24.
13.5 Manufacturers
Property Name        Property Type  Description
zDeviceClass         string         FUTURE USE
zDeviceGroup         string         FUTURE USE
zSystem              string         FUTURE USE