SNMP Trap Handling with Nagios
Francois Meehan
My company has been very successful in providing network-monitoring solutions based on Nagios/Netsaint. For environments that required SNMP trap handling, however, the technique suggested in Nagios's documentation was too cumbersome. That technique requires coding for each individual trap message that needs to be monitored and, for those clients, we could only suggest making use of commercial solutions, all of which come with high price tags and complicated implementations.
Recently, however, we discovered Alex Burger's "SNMP Trap Translator" project that extends Net-SNMP. Coupled with Risto Vaarandi's event correlation tool called "SEC", a small Python script, and, of course, Nagios, we have put together a very scalable, efficient alternative. The whole process is pictured in Figure 1.
Please note that we used Red Hat Advanced Server for this particular installation, but this solution should be adaptable to other modern Linux distributions.
Prerequisites:
1. Net-SNMP with snmptrapd configured.
2. SNMPTT -- SNMP Trap Translator.
3. SEC -- Simple Event Correlator.
4. Nagios.
5. MIB definition files for the equipment or software you need to monitor.
One of the beauties of this solution is that we can use the event severity set by the MIB designer. Nagios will always report the event status based on this information.
Net-SNMP
Net-SNMP, formerly known as UCD-SNMP, is installed by default on most Linux distributions. Here we are specifically interested in configuring the trap receiver portion of the installation. The trap receiver is a daemon that reads its startup options from /etc/rc.d/init.d/snmptrapd. We modified the following line:
    OPTIONS="-s -u /var/run/snmptrapd.pid"
To this:
    OPTIONS="-On -u /var/run/snmptrapd.pid"
As quoted from the SNMP Trap Translator documentation: "The -On is recommended. This will make snmptrapd pass OIDs in numeric form and prevent SNMPTT from having to translate the symbolic name to numerical form." We then modified the file /usr/share/snmp/snmptrapd.conf by adding the following line:
    traphandle default /usr/sbin/snmptt
When the modification is done, the daemon must be restarted for the changes to take effect.
SNMP Trap Translator
Install the trap translator by following the supplied instructions, and then configure the file /etc/snmp/snmptt.ini by altering some of the parameters as follows:
    mode = standalone
    dns_enable = 1
    strip_domain = 1
    net_snmp_perl_enable = 1
    translate_value_oids = 1
    translate_enterprise_oid_format = 1
    translate_trap_oid_format = 1
    translate_varname_oid_format = 1
    log_enable = 1
    syslog_enable = 1
    syslog_level = info
    snmptt_conf_files = /etc/snmp/snmptt.conf
Once the trap translator is successfully installed, you need to feed some MIB trap definitions to it. For example, assume that you have the MIB for the APC UPS, "powernet361.mib":
    ./snmpttconvertmib --in=/usr/share/snmp/mibs/powernet361.mib --out=/etc/snmp/snmptt.conf
The next time you add MIB definitions using this command, the translated MIB will be appended to the snmptt.conf file. You can use a different output file, but if you do, remember to list it in snmptt.ini. For example:
    snmptt_conf_files = <<END
    /etc/snmp/snmptt.conf.generic
    /etc/snmp/snmptt.conf.compaq
    /etc/snmp/snmptt.conf.cisco
    /etc/snmp/snmptt.conf.hp
    /etc/snmp/snmptt.conf.3com
    END
When a trap occurs, the trap translator will output the information received from the trap to syslog and, at this point, SEC will take over.
SEC
SEC was installed before we discovered SNMP Trap Translator. We had used it as an event filter-correlator in a centralized syslog configuration to interface some events with Nagios. It may be possible to have the translator interface directly with Nagios, but we thought we would lose a lot in terms of flexibility if we did this. By using SEC, we can alter the flow of messages and do things such as:
- For a non-stable machine or non-production equipment, you can delay alarms that occur at night until morning.
- If you are suddenly bombarded by traps from a device, SEC can be used to regulate the flow.
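The flow regulation mentioned in the last item can be pictured with a small sketch. This is a conceptual illustration in Python, not SEC syntax; it assumes a behaviour similar to SEC's SingleWithSuppress rule type, where the first matching event triggers the action and further matches are ignored for a fixed window. The class name TrapSuppressor and the 60-second window are hypothetical.

```python
class TrapSuppressor:
    """Let the first event per key through, then ignore repeats for `window` seconds."""

    def __init__(self, window):
        self.window = window
        self._last_passed = {}  # key -> time at which an event was last let through

    def allow(self, key, now):
        """Return True if an event for `key` at time `now` should be forwarded."""
        last = self._last_passed.get(key)
        if last is not None and now - last < self.window:
            return False  # still inside the suppression window
        self._last_passed[key] = now
        return True


# Four traps from the same (hypothetical) host at t = 0, 5, 30 and 61 seconds.
suppressor = TrapSuppressor(window=60)
decisions = [suppressor.allow("ups01", t) for t in (0, 5, 30, 61)]
```

Only the first and last traps are forwarded here; the two that arrive inside the 60-second window are dropped. Suppressed events do not extend the window in this sketch.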
We start SEC with the following command:
    /opt/sec/sec.pl -input=/var/log/messages -conf=/opt/sec/sec.conf \
    -detach -log=/var/log/sec.log
Our sec.conf file contains the rule that handles SNMP traps:
    # Snmptrap event translated by snmptraptt
    type=Single
    ptype=RegExp
    pattern=snmptt.*(Normal|INFORMATIONAL|MINOR|WARNING|SEVERE|MAJOR|CRITICAL) \
    \"Status Events\" (\w+) \- (.*)
    desc=snmptrap received from $2
    action=shellcmd /opt/nagios/libexec/eventhandlers/snmptraphandling.py \
    $2 $1 "$3"
snmptraphandling.py
The snmptraphandling.py Python script performs the following actions for us:
- Formats the output of SEC
- Obtains the current time in epoch format
- Translates the severity code
- Posts the data into the Nagios command file
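The four steps above can be sketched as follows. This is a minimal illustration, not the script we actually run: the command file path, the service description, and the exact mapping of severity words to Nagios return codes are all assumptions that you would adjust to your own installation. The only fixed element is the format of the Nagios external command PROCESS_SERVICE_CHECK_RESULT.

```python
import time

# Assumed location of the Nagios external command file (a named pipe).
COMMAND_FILE = "/opt/nagios/var/rw/nagios.cmd"
# Assumed service description; it must match a service defined for the host.
SERVICE = "snmp_trap_handling"

# Severity words matched by the SEC rule, mapped to Nagios return codes
# (0 = OK, 1 = WARNING, 2 = CRITICAL). The grouping is an assumption.
RETURN_CODES = {
    "NORMAL": 0, "INFORMATIONAL": 0,
    "MINOR": 1, "WARNING": 1,
    "SEVERE": 2, "MAJOR": 2, "CRITICAL": 2,
}


def severity_to_return_code(severity):
    """Translate a trap severity word; unrecognized words become 3 (UNKNOWN)."""
    return RETURN_CODES.get(severity.upper(), 3)


def format_command(host, severity, message, now=None, service=SERVICE):
    """Build one passive check result line for the Nagios command file."""
    timestamp = int(time.time() if now is None else now)  # epoch seconds
    text = " ".join(message.split())  # collapse newlines and repeated spaces
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, severity_to_return_code(severity), text)


def handle_trap(host, severity, message, command_file=COMMAND_FILE):
    """Post the result for one trap; SEC supplies host, severity and message."""
    with open(command_file, "a") as fh:
        fh.write(format_command(host, severity, message))
```

With the SEC rule shown earlier, the three arguments arrive in the order host, severity, message, so a command-line wrapper would simply call handle_trap with sys.argv[1], sys.argv[2], and sys.argv[3].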
On the Nagios side, we defined the following service template:
    define service {
        use                     passive-check-template
        name                    generic-snmptrap
        service_description     snmp_trap_handling
        is_volatile             1
        check_period            none
        notification_interval   120
        notification_options    w,u,c,r
        notification_period     24x7
        check_command           passive_check_missing
        max_check_attempts      1
        check_freshness         0
        stalking_options        o,w,u,c
        register                0
    }
Note the "stalking_options" parameter. Without it, if two or more traps are received in quick succession, the Nagios administrator will receive an alert only for the first trap and none for those that follow. For each host to be monitored, we created the following services, which use the service template shown above:
    define service {
        use                     generic-snmptrap
        host_name               my_example_host
        contact_groups          my_example_group
        service_description     snmp_trap_handling_ok
    }
    define service {
        use                     generic-snmptrap
        host_name               my_example_host
        contact_groups          my_example_group
        service_description     snmp_trap_handling_warning
    }
    define service {
        use                     generic-snmptrap
        host_name               my_example_host
        contact_groups          my_example_group
        service_description     snmp_trap_handling_critical
    }
To test the whole solution, we used an APC UPS with an on-board management card. After we configured the trap destination and community parameters, we began our tests. We simply disconnected the UPS from the mains, and Nagios immediately started to receive events and generate alarms. We then fed MIB definitions for most of our major equipment vendors and, since then, have witnessed many different alarms for disks, cooling fans, and a host of other events.
Because we are using Nagios to monitor processes, it only makes sense to use it to monitor the presence of SEC and snmptrapd processes. This gives Nagios a brand new range of opportunities and makes our customer really happy.
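As a rough illustration of the process check mentioned above, here is a minimal sketch of a Nagios-style check written in Python. It is not the check we use (the standard check_procs plugin covers this case); it assumes a Linux /proc filesystem, and the function names are hypothetical. Matching on the base name of any command-line argument is a simple heuristic so that a Perl script such as sec.pl is found even though its process is started through the perl interpreter.

```python
import os


def running_command_lines(proc_root="/proc"):
    """Return the argument lists of all running processes (Linux /proc layout)."""
    command_lines = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                raw = fh.read()
        except OSError:
            continue  # process exited meanwhile, or is not readable
        args = [a.decode("utf-8", "replace") for a in raw.split(b"\0") if a]
        if args:
            command_lines.append(args)
    return command_lines


def check_processes(required, command_lines):
    """Return (nagios_return_code, message) for the required program names."""
    seen = {os.path.basename(arg) for args in command_lines for arg in args}
    missing = [name for name in required if name not in seen]
    if missing:
        return 2, "CRITICAL - not running: " + ", ".join(missing)
    return 0, "OK - running: " + ", ".join(required)
```

A plugin wrapper would call check_processes(["snmptrapd", "sec.pl"], running_command_lines()), print the message, and exit with the returned code.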
This solution is only possible because smart individuals take the time to write high-quality software, such as Nagios, SNMP Trap Translator, Net-SNMP, and SEC.
References
Nagios -- http://nagios.org/
Net-SNMP -- http://www.net-snmp.org/
SEC -- http://kodu.neti.ee/~risto/sec/
SNMP Trap Translator -- http://www.snmptt.org/
Francois Meehan has more than 20 years of experience in computer operations, covering midrange computers to PCs, from AIX to OpenBSD, Linux, and Windows operating systems. He founded his own consulting company, CEDVAL Info Inc., specializing in network monitoring and open source solutions integration. CEDVAL Info customers are mostly in the pharmaceutical and manufacturing industries in the Montreal region and the United States. Francois, his wife, Lise, and their two children live in Notre-Dame-de-l'Ile-Perrot, an island on the St. Lawrence River west of Montreal. Francois can be contacted at: [email protected].