Using AWS SES for Zabbix email notifications

I’ve been using Amazon Simple Email Service (SES) as the SMTP server for my home lab notifications.  I don’t have to worry about setting up my own SMTP server and the cost is minimal.  My SES account is set for ‘sandbox’ access, which limits sending to 200 emails per day.  I sent an email to Amazon Support and they upped the quota to 1000.  The pricing schedule says that it’s $0.10 per 1000 emails.  I send about 20 per month, so I see a $.01 charge every couple of months.  Good deal to me.

To get this going you’ll have to set up your SES account on AWS.  I won’t go into detail, but would do a post if someone wants more detail.

Once your account is set up, you’ll need to note the SMTP credentials.  From the SES homepage, look under Email Sending and click on SMTP Settings.

You’ll need the following to set up a media type in Zabbix:

  • SMTP Username
  • SMTP Password
  • Servername
  • port

Once you have these credentials, jump over to the Zabbix dashboard and click Administration, then Media types.  Click Create New Media Type and give your new type a name (e.g. Email – AWS SES).  Here’s what I entered in the media type to get this working:

Type - Email
SMTP server - servername from SES SMTP Settings page
SMTP server port - 587
SMTP helo - servername from SES SMTP Settings page
SMTP email - an email that's been verified by SES
Connection Security - STARTTLS
Authentication - Normal Password
Username - SES SMTP Username
Password - SES SMTP Password
Enabled - checked
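
If you want to rule out the credentials before involving Zabbix, a minimal sketch along these lines can send one test message using the same settings (port 587, STARTTLS, username and password).  It assumes Python 3 and only the standard library; the function names and the message text are made up for illustration, and the host, username, password and addresses are whatever your own SMTP Settings page and verified identities say.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a small plain-text message for a one-off delivery test."""
    msg = EmailMessage()
    msg["From"] = sender      # should be an address verified in SES
    msg["To"] = recipient
    msg["Subject"] = "SES SMTP test"
    msg.set_content("If you can read this, the SES SMTP settings work.")
    return msg


def send_via_ses(host, username, password, msg, port=587):
    """Send msg over STARTTLS, mirroring the media type settings above."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()                 # upgrade the connection before logging in
        smtp.login(username, password)  # the SMTP username/password from SES
        smtp.send_message(msg)
```

Calling send_via_ses(servername, smtp_username, smtp_password, build_test_message(verified_sender, recipient)) should either deliver the mail or raise an smtplib exception that points at the failing step.  While the account is still in the sandbox, the recipient address has to be verified too.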

Click add to save your config and let’s move on to configuring the new media type for a user.

Under Administration then Users, select the user who will be receiving trigger notifications and click the Media tab for that user.  Add a new Media type of Email – AWS SES then enter an email address in Send to.  Modify When active and Use if severity if desired.  Make sure it’s Enabled, then click Add.

Click Update and move on to Configuration, Actions.

Make sure the Event Source is set to Triggers.  Click Create Action if you don’t already have one.  Enter a descriptive name and add Conditions to the Action.  See below for some good starting conditions.  These will send an email if a trigger is at least a severity of High.  It will also not send email when a host is in maintenance status.  Change these to your needs.

Once you finish the Action tab, move to the Operations tab.  Here you decide how the email message will be formatted and how the email will be sent.

I configure my actions to send an email at some interval (1hr in this example) until the problem is resolved or acknowledged.  This way no one can say they didn’t see the alert…

Click the Recovery Operation tab and add an entry under Operations.  Same as you did on the Operations tab, click New, then enter:

Operations type - Send Message
Send to Users - <select a user>
Send only to  - All or Email - AWS SES

Click add to save your operation details entries.

Once you’ve added the Operations and Recovery Operations steps, save your changes by clicking Add.

Now, it’s time to test out your new email alerts.

An easy way to test this is to stop a zabbix-agent service or otherwise activate a trigger that has a severity of at least High.  If this is a lab test box, one trick I use regularly is to create an item that watches a file then set a trigger to alert when the file is missing.

Item
Name - Trigger Test
Type - Zabbix Agent (active)
Key - vfs.file.exists[/tmp/zabbix-trigger-test.txt]
Type of Information - Numeric (unsigned)
Data Type - Decimal

Trigger
Name - Trigger Test
Severity - Disaster
Expression - {hostname001:vfs.file.exists[/tmp/zabbix-trigger-test.txt].last()}<>1

Now just create a file called /tmp/zabbix-trigger-test.txt on your host.  When you want to test the trigger, simply rename/delete the file and the trigger will activate.  Add the file back and the trigger goes to RESOLVED state.
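
If you flip the trigger often, a tiny helper can do the create/delete step for you.  This is only a sketch, assuming Python 3 is available on the monitored host; the function name is made up, and the path you pass in is whatever file your item is watching.

```python
import os


def toggle_test_file(path):
    """Remove the file if it exists, otherwise create it empty.

    Returns True when the file exists after the call (trigger should
    resolve) and False when it was removed (trigger should fire).
    """
    if os.path.exists(path):
        os.remove(path)
        return False
    with open(path, "w"):
        pass  # creating an empty file is enough for vfs.file.exists
    return True
```

Run it once to fire the trigger and again to resolve it, allowing for the item's update interval before the state changes in Zabbix.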

Troubleshooting the email send happens in Monitoring, Problems.  Show Recent Problems and in the Actions column you will either see Done or Failures.  Done is good and everything is working as expected.  Failures will show an error message to help track down the issue.  Click on Failures and hover over the ‘i’ to see what the error message is.  You are on your own for figuring out what the error means.  It’s most likely a wrong port or connection security on your media type, but you’ll have to track it down.

This write-up, in general, will work with any SMTP server you’d like to connect to.  I hope this helps someone get their email alerts working in Zabbix.

 


Export hostgroups to XML via API with Python

According to the Zabbix docs, the only way to export hostgroups is through the API.  My exposure to the Zabbix API is limited, but I knew there were coding giants out there whose shoulders I could stand on.

I would like to give credit to someone directly, but the code I found had no author listed.  Here’s the link to the original on the zabbix.org wiki site for reference.

https://www.zabbix.org/wiki/Python_script_to_export_all_Templates_to_individual_XML_files

The code, as is, works great for exporting templates, but I needed to make some changes to get it to export hostgroups.  Luckily, the API reference pages on the Zabbix website are very helpful.

I’ll leave it up to you to diff the 2 versions to see exactly what changed, but the basic summary is this: modify a couple of parameters and a couple of object properties and the script can be used to export many other things.

See the API reference pages for the hostgroup method details.  https://www.zabbix.com/documentation/2.4/manual/api/reference/hostgroup/get

Here’s what I ended up with and it works great!  This will export all the hostgroups into separate xml files and put them into the ./hostgroups directory.

 

#!/usr/bin/python
#
# pip install py-zabbix
#
# source: https://www.zabbix.org/wiki/Python_script_to_export_all_Templates_to_individual_XML_files
#
# usage: python zabbix_export_hostgroups_bulk.py --url https://<zabbix server name>/zabbix --user <api user> --password <user passwd>
#

import argparse
import logging
import os
import xml.dom.minidom
from sys import exit

from zabbix.api import ZabbixAPI

parser = argparse.ArgumentParser(description='This is a simple tool to export zabbix hostgroups')
parser.add_argument('--hostgroups', help='Name of specific hostgroup to export', default='All')
parser.add_argument('--out-dir', help='Directory to output hostgroups to.', default='./hostgroups')
parser.add_argument('--debug', help='Enable debug mode, this will show you all the json-rpc calls and responses', action="store_true")
parser.add_argument('--url', help='URL to the zabbix server (example: https://monitor.example.com/zabbix)', required=True)
parser.add_argument('--user', help='The zabbix api user', required=True)
parser.add_argument('--password', help='The zabbix api password', required=True)
args = parser.parse_args()

if args.debug:
    logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
    logger = logging.getLogger(__name__)


def main():
    # argparse already enforces the required options; these checks are a safety net
    if args.url is None:
        print("Error: Missing --url\n\n")
        exit(2)

    if args.user is None:
        print("Error: Missing --user\n\n")
        exit(3)

    if args.password is None:
        print("Error: Missing --password\n\n")
        exit(4)

    # Create the output directory on first run
    if not os.path.isdir(args.out_dir):
        os.mkdir(args.out_dir)

    zm = ZabbixHostgroups(args.url, args.user, args.password)

    zm.exportHostgroups(args)


class ZabbixHostgroups:

    def __init__(self, _url, _user, _password):
        self.zapi = ZabbixAPI(url=_url, user=_user, password=_password)

    def exportHostgroups(self, args):
        request_args = {
            "output": "extend"
        }

        # Narrow the lookup to a single hostgroup when one was named
        if args.hostgroups != 'All':
            request_args["filter"] = {
                "name": [args.hostgroups]
            }

        result = self.zapi.do_request('hostgroup.get', request_args)
        if not result['result']:
            print("No matching name found for '{}'".format(args.hostgroups))
            exit(-3)

        # One XML file per hostgroup, named after the group
        for t in result['result']:
            dest = args.out_dir + '/' + t['name'] + '.xml'
            self.exportTemplate(t['groupid'], dest)

    def exportTemplate(self, tid, oput):
        # Name kept from the original template script; exports one hostgroup
        print("groupid: {} output: {}".format(tid, oput))
        # NOTE: the configuration.export reference lists this option as
        # "groups"; try that key if the export comes back without group data.
        args = {
            "options": {
                "hostgroups": [tid]
            },
            "format": "xml"
        }

        result = self.zapi.do_request('configuration.export', args)
        hostgroup = xml.dom.minidom.parseString(result['result'].encode('utf-8'))
        date = hostgroup.getElementsByTagName("date")[0]
        # We are backing these up to git, sterilize date so it doesn't appear to change
        # each time we export the hostgroups
        date.firstChild.replaceWholeText('2016-01-01T01:01:01Z')
        # toprettyxml(encoding=...) returns bytes, so write in binary mode
        with open(oput, 'wb') as f:
            f.write(hostgroup.toprettyxml(encoding='utf-8'))


if __name__ == '__main__':
    main()
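
The date-pinning step is the part that makes the exports git-friendly, and it can be tried on its own without a Zabbix server.  The sketch below assumes Python 3; the function name and the sample XML are invented for illustration (the sample is hand-written, not real Zabbix output), and it sets the text node's data attribute instead of calling replaceWholeText, which should have the same effect here.

```python
import xml.dom.minidom

FIXED_DATE = "2016-01-01T01:01:01Z"  # same placeholder date the script writes


def normalise_export_date(xml_text, fixed_date=FIXED_DATE):
    """Return the export pretty-printed, with <date> pinned to fixed_date."""
    doc = xml.dom.minidom.parseString(xml_text.encode("utf-8"))
    date = doc.getElementsByTagName("date")[0]
    date.firstChild.data = fixed_date  # overwrite the export timestamp
    return doc.toprettyxml()


# Hand-written stand-in for an export, not real Zabbix output.
SAMPLE = (
    "<zabbix_export><version>2.0</version>"
    "<date>2016-05-04T12:00:00Z</date>"
    "<groups><group><name>Linux servers</name></group></groups>"
    "</zabbix_export>"
)
```

Two exports that differ only in their timestamp come out identical after this step, so git only records a change when the hostgroup itself changed.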

 


Using ntpstat to check NTPD status with Zabbix

The standard way of checking a service in Zabbix checks that the service is running, but I wanted to know not only that the NTPD service was running but that the time was synchronized.  ntpstat is a great utility that does both: it checks that the ntpd service is running and then tells you whether the server is synchronized.  It reports the synchronization state of the NTP daemon running on the local machine through its exit status.  ntpstat returns 0 if the clock is synchronized, 1 if the clock is not synchronized, and 2 if the clock state is unknown, for example if ntpd can’t be contacted.
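
To see the three exit codes for yourself before wiring anything into Zabbix, a small sketch like this can run ntpstat and translate the result.  It assumes Python 3 and that ntpstat is on the PATH; the function names and the description strings are placeholders, not anything Zabbix or ntpstat defines.

```python
import subprocess

# Exit statuses as described above
NTPSTAT_STATES = {
    0: "synchronized",
    1: "not synchronized",
    2: "unknown (ntpd may be unreachable)",
}


def describe_ntpstat(code):
    """Translate an ntpstat exit status into a short description."""
    return NTPSTAT_STATES.get(code, "unexpected exit status {}".format(code))


def check_ntpstat():
    """Run ntpstat, discard its output, return (exit_status, description).

    Raises FileNotFoundError if ntpstat is not installed.
    """
    code = subprocess.call(
        ["ntpstat"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return code, describe_ntpstat(code)
```

The exit status is the same number the Zabbix item collects, so anything other than 0 here is what the trigger would alert on.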

I created a Zabbix item to use ntpstat.  Here are the 2 ways I have used this new check:

The first way to use ntpstat with Zabbix is to simply create an item using the system.run function.

Name - ntpstat status
Type - Zabbix agent (active)
Key  - system.run[ntpstat &> /dev/null ; echo $?]
Type of Information - Text

Ensure EnableRemoteCommands=1 is set in your zabbix_agentd.conf file for this to work.

The second way to create the item is to use custom user parameters.  This requires a file modification on the monitored instance, so if you have a lot of instances to monitor or do not have a good way to automate this file modification, you may want to stick with option 1.

I like creating new userparameter files for custom parameters.

/etc/zabbix/zabbix_agentd.d/userparameter_custom_linux.conf
UserParameter=custom.net.ntpstat,ntpstat &> /dev/null ; echo $?

Then create an item similar to the one above, but with a different key and type of information:

Name - ntpstat status
Type - Zabbix agent (active)
Key  - custom.net.ntpstat
Type of Information - Numeric (unsigned)
Data Type - Decimal

Once your custom userparameter file is placed you’ll need to restart the zabbix agent. The last step with either item creation option is to create a trigger that alerts when the returned value is not 0.

I like this check much better than my original one that just alerted when the ntpd service was down.  Now I get alerted before time synchronization issues become a problem for the applications.

This was tested on both CentOS 6.7 and CentOS 7.1, but this should work on your Linux distro of choice as long as you have ntpstat installed.

Hope this helps

 

 


Upgrade Zabbix from 1.8 to 2.0

If you are not running the new version of Zabbix I highly recommend you upgrade.  I was holding out on the upgrade so I could get a handle on the new features of 2.0 such as low level discovery.

I came across some folks who were having trouble, specifically, with the database patches and I think this made me hesitant to complete the upgrade on the production server that has become integral to our monitoring environment.

I’m happy to say that with some planning the upgrade from Zabbix 1.8 to Zabbix 2.0 is as smooth as upgrading to any one of their minor releases.

The biggest hurdle with this upgrade depends on whether or not you have large database tables.  The database upgrade scripts add new columns and change data types on a bunch of the largest Zabbix database tables.  This process involves creating a temporary table with the data from your source table and then copying it back to the modified schema.  If you have a 10G history table, that’s 10G of data being copied twice.  Taking steps before the upgrade will minimize the pain of these changes.

In my opinion, the easiest way to handle these large tables is to not have them.  What I mean by that is to partition the tables so that you can drop old data to keep your table sizes manageable.  If you haven’t already seen my post about partitioning your Zabbix MySQL tables, check it out.

With the old history and history_uint partitions dropped, the database upgrade script still took about 1.5hrs to complete, YMMV.  This is completely related to the size of your existing Zabbix database.  I have a 2G history table and a 3G history_uint table.

You did make a backup of your database before starting this upgrade, right?…

I make it a point to enable maintenance mode on the Zabbix UI before I start any upgrades/DB maintenance.  It makes sure no users are messing around in the system.

Shut down the server process and you are ready to begin.

The documented upgrade procedure from Zabbix provides a good step by step process.

Again, the biggest issue I had was the time it took to upgrade the DB.  Once that was complete the rest of the upgrade was a piece of cake.

 


Find Zabbix Agent version easily across all hosts

I recently completed the server upgrade from Zabbix 1.8 to 2.0.  To take full advantage of 2.0 I needed to also upgrade all of my hosts’ Zabbix agents to 2.0.

I have 450+ hosts in my Zabbix environment so manually upgrading all of them is not an option.  I wrote up a few scripts and found some good posts to help me automate this process, which I will write about soon, but one thing that helped a lot was to be able to quickly see which hosts were on what version of the Zabbix agent.

I couldn’t easily find a way to do this in the Zabbix dashboard so I figured I would just grab this info out of the database.  Turns out it is a pretty simple MySQL query to list all of the hostnames and what version of the software they were running.  I modified this query a bit so I could then feed it into my scripts and target exactly the hosts that needed to be upgraded.

Today it’s yours… Have fun

select hosts.host, items.lastvalue from hosts join items on hosts.hostid=items.hostid where items.name like '%zabbix_agent%' and items.lastvalue like '1.8%';
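
For feeding the result into other scripts, one option is to save the query output in the mysql client's batch mode (--batch --skip-column-names gives one tab-separated row per line) and pull the hostnames out with a few lines of Python.  This is a sketch under that assumption; the function name is made up, and it filters on the version prefix again so the same helper also works if you drop the lastvalue condition from the query.

```python
def hosts_on_version(mysql_output, prefix="1.8"):
    """Return hostnames whose agent version starts with prefix.

    mysql_output is expected to hold "host<TAB>version" rows, one per line.
    """
    hosts = []
    for line in mysql_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        host, _, version = line.partition("\t")
        if version.strip().startswith(prefix):
            hosts.append(host.strip())
    return hosts
```

The returned list can then be handed to whatever does the actual agent upgrade on each host.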

Zabbix large host delete or modify timeout

When modifying a host with many items or a template with many linked hosts, I was experiencing what I would call a timeout.  I selected the item or trigger I wished to modify or delete and then selected delete.  The result, after about 10 seconds of waiting, was to return a blank screen.  When I clicked on that host or template that I was modifying, the item I tried to delete was still there.

This became a huge problem when I was dealing with this issue.  I had a runaway LLD trigger that continued to duplicate itself on all of my 150+ Windows hosts.  The underlying problem was that a disk space item that I created was using a lower case ‘c’ and the LLD rule to find drives was creating a trigger for an upper case ‘C’.  Fire and brimstone rained and I had 150+ Windows hosts with 120 ‘low disk space on c: drive’ triggers.

The duplication fix was easy, as explained in the above link, but I was still stuck with thousands of extra triggers that existed across my Windows hosts.

My attempt to delete the problem trigger so that it would be removed from all the hosts upon the next discovery period resulted in this ‘timeout’ issue where nothing seemed to happen.

I could ‘unlink and clear’ each host individually from the problem template but this would wipe out any history on that host, not to mention it would have taken forever to do this on 150+ hosts.  I was stuck.

Searches through the Zabbix bug list returned few possibilities.  I was starting to write up my own bug report when I checked a log that I should have checked from the beginning.  The Apache httpd log showed promise:

 PHP Fatal error:  Allowed memory size of 268435456 bytes exhausted

This was it!  Zabbix requires a minimum PHP memory_limit of 128M.  I set mine to 256M because… 256M is bigger than 128M…  I don’t know, seemed like it may be beneficial to increase this when I started.
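
The number in that log line is the limit in bytes, so it is easy to check which memory_limit was actually in effect when the request died.  A minimal sketch, assuming Python 3 and a log line in the format shown above; the function name is a placeholder.

```python
import re


def exhausted_limit_mib(log_line):
    """Return the exhausted PHP memory limit in MiB, or None if not found."""
    match = re.search(r"Allowed memory size of (\d+) bytes exhausted", log_line)
    if match is None:
        return None
    return int(match.group(1)) / (1024 * 1024)
```

For the line above, 268435456 bytes divided by 1024 twice comes to 256, which matches the 256M setting.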

Well, I guess my environment has grown to a point where 256M isn’t enough.  In fact, I had to up this to 2G before the error went away and I was finally able to delete that trigger.

The change is dynamic for PHP but you will need to restart httpd before Zabbix picks it up.

So, short story, long…  Zabbix was starved for memory to complete this delete task and upping the PHP memory_limit in php.ini fixed the problem.

I have since dropped the memory_limit setting to 512M but I think keeping it at 2G would have been safe.  If PHP didn’t need to use that amount of memory it wouldn’t have hoarded it, I don’t think.

Let me know if this has happened to you.

Later.


Zabbix configure unable to find iconv.h

While trying to configure the Zabbix agent on CentOS 6.2 I was presented with this error: checking for ICONV support… configure: error: Unable to find iconv.h “no”

This only showed up when I tried the --enable-static configure option. I needed this to be able to use the compiled binaries on multiple 6.2 machines.

After a lot of searching I found that I simply needed to install the glibc-static RPM. Problem solved.

Hope this helps