Export hostgroups to XML via the API with Python

According to the Zabbix docs, the only way to export hostgroups is through the API.  My exposure to the Zabbix API is limited, but I knew there were coding giants out there whose shoulders I could stand on.

I would like to give credit to someone directly, but the code I found had no author listed.  Here’s the link to the original on the zabbix.org wiki site for reference.

https://www.zabbix.org/wiki/Python_script_to_export_all_Templates_to_individual_XML_files

The code, as is, works great for exporting templates, but I needed to make some changes to get it to export hostgroups.  Luckily, the API reference pages on the Zabbix website are very helpful.

I’ll leave it up to you to diff the 2 versions to see exactly what changed, but the basic summary is that by modifying a couple of parameters and a couple of object properties the script can be used to export many other things.

See the API reference pages for the hostgroup method details.  https://www.zabbix.com/documentation/2.4/manual/api/reference/hostgroup/get
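To give you an idea, here is roughly what the same pattern would look like for exporting full host definitions instead of hostgroups.  This is just an untested sketch based on the host.get and configuration.export pages of the API reference; the URL, credentials, and output directory are placeholders.

import os
from zabbix.api import ZabbixAPI

zapi = ZabbixAPI(url='https://monitor.example.com/zabbix', user='apiuser', password='secret')

if not os.path.isdir('./hosts'):
    os.mkdir('./hosts')

# 'host.get' instead of 'hostgroup.get', and the export option becomes 'hosts'
for host in zapi.do_request('host.get', {"output": "extend"})['result']:
    export = zapi.do_request('configuration.export', {
        "options": {"hosts": [host['hostid']]},
        "format": "xml"
    })
    f = open('./hosts/' + host['host'] + '.xml', 'w')
    f.write(export['result'].encode('utf-8'))
    f.close()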

Here’s what I ended up with and it works great!  This will export all the hostgroups into separate xml files and put them into the ./hostgroups directory.

 

#!/usr/bin/python
#
# pip install py-zabbix
#
# source: https://www.zabbix.org/wiki/Python_script_to_export_all_Templates_to_individual_XML_files
#
# usage: python zabbix_export_hostgroups_bulk.py --url https://<zabbix server name>/zabbix --user <api user> --password <user passwd>
#

import argparse
import logging
import os
import xml.dom.minidom
from sys import exit

from zabbix.api import ZabbixAPI

parser = argparse.ArgumentParser(description='This is a simple tool to export zabbix hostgroups')
parser.add_argument('--hostgroups', help='Name of specific hostgroup to export', default='All')
parser.add_argument('--out-dir', help='Directory to output hostgroups to.', default='./hostgroups')
parser.add_argument('--debug', help='Enable debug mode, this will show you all the json-rpc calls and responses', action="store_true")
parser.add_argument('--url', help='URL to the zabbix server (example: https://monitor.example.com/zabbix)', required=True)
parser.add_argument('--user', help='The zabbix api user', required=True)
parser.add_argument('--password', help='The zabbix api password', required=True)
args = parser.parse_args()

if args.debug:
    logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logger = logging.getLogger(__name__)


def main():
    global args
    global parser

    if args.url is None:
        print "Error: Missing --url\n\n"
        exit(2)

    if args.user is None:
        print "Error: Missing --user\n\n"
        exit(3)

    if args.password is None:
        print "Error: Missing --password\n\n"
        exit(4)

    if not os.path.isdir(args.out_dir):
        os.mkdir(args.out_dir)

    zm = ZabbixHostgroups(args.url, args.user, args.password)
    zm.exportHostgroups(args)


class ZabbixHostgroups:

    def __init__(self, _url, _user, _password):
        self.zapi = ZabbixAPI(url=_url, user=_user, password=_password)

    def exportHostgroups(self, args):
        request_args = {
            "output": "extend"
        }

        # Only filter when a specific hostgroup name was requested
        if args.hostgroups != 'All':
            request_args["filter"] = {
                "name": [args.hostgroups]
            }

        result = self.zapi.do_request('hostgroup.get', request_args)
        if not result['result']:
            print "No matching name found for '{}'".format(args.hostgroups)
            exit(-3)

        # Write each hostgroup to its own xml file, named after the group
        for t in result['result']:
            dest = args.out_dir + '/' + t['name'] + '.xml'
            self.exportTemplate(t['groupid'], dest)

    def exportTemplate(self, tid, oput):
        print "groupid:", tid, " output:", oput
        export_args = {
            "options": {
                "hostgroups": [tid]
            },
            "format": "xml"
        }

        result = self.zapi.do_request('configuration.export', export_args)
        hostgroup = xml.dom.minidom.parseString(result['result'].encode('utf-8'))
        date = hostgroup.getElementsByTagName("date")[0]
        # We are backing these up to git, sterilize the date so it doesn't appear to change
        # each time we export the hostgroups
        date.firstChild.replaceWholeText('2016-01-01T01:01:01Z')
        f = open(oput, 'w+')
        f.write(hostgroup.toprettyxml().encode('utf-8'))
        f.close()


if __name__ == '__main__':
    main()

 


Using ntpstat to check NTPD status with Zabbix

The standard way of checking a service in Zabbix only confirms that the service is running, but I wanted to know not only that the ntpd service was running but also that the time was synchronized.  ntpstat is a great utility that does both: it checks that ntpd is running and reports the synchronization state of the NTP daemon on the local machine.  ntpstat returns 0 if the clock is synchronized, 1 if the clock is not synchronized, and 2 if the clock state is unknown, for example if ntpd can’t be contacted.

I created a Zabbix item to use ntpstat.  Here are the 2 ways I have used this new check:

The first way to use ntpstat with Zabbix is to simply create an item using the system.run function.

Name - ntpstat status
Type - Zabbix agent (active)
Key  - system.run[ntpstat &> /dev/null ; echo $?]
Type of Information - Text

Ensure EnableRemoteCommands=1 is set in your zabbix_agentd.conf file for this to work.

The second way to create the item is to use custom user parameters.  This requires a file modification on the monitored instance, so if you have a lot of instances to monitor, or no good way to automate that file change, you may want to stick with option 1.

I like creating new userparameter files for custom parameters.

/etc/zabbix/zabbix_agentd.d/userparameter_custom_linux.conf
UserParameter=custom.net.ntpstat,ntpstat &> /dev/null ; echo $?

Then create an item similar to above but with a change to the key

Name - ntpstat status
Type - Zabbix agent (active)
Key  - custom.net.ntpstat
Type of Information - Numeric (unsigned)
Data Type - Decimal

Once your custom userparameter file is placed you’ll need to restart the zabbix agent. The last step with either item creation option is to create a trigger that alerts when the returned value is not 0.
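As a rough example, for the numeric version of the item a trigger expression along these lines would do it; the template name here is just a placeholder for wherever your item actually lives:

Expression - {Template App NTP:custom.net.ntpstat.last(0)}>0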

I like this check much better than my original one that just alerted when the ntpd service was down.  Now I get alerted before time synchronization problems start affecting the applications.

This was tested on both CentOS 6.7 and CentOS 7.1, but this should work on your Linux distro of choice as long as you have ntpstat installed.

Hope this helps

 

 


Upgrade Zabbix from 1.8 to 2.0

If you are not running the new version of Zabbix I highly recommend you upgrade.  I was holding out on the upgrade so I could get a handle on the new features of 2.0 such as low level discovery.

I came across some folks who were having trouble, specifically with the database patches, and I think this made me hesitant to complete the upgrade on the production server that has become integral to our monitoring environment.

I’m happy to say that with some planning the upgrade from Zabbix 1.8 to Zabbix 2.0 is as smooth as upgrading to any one of their minor releases.

The biggest hurdle with this upgrade is whether or not you have large database tables.  The database upgrade script adds new columns and changes data types on a bunch of the largest Zabbix database tables.  This process involves creating a temporary table with the data from your source table and then copying it back into the modified schema.  If you have a 10G history table, that’s 10G of data being copied twice.  Taking steps before the upgrade will minimize the pain of these changes.

In my opinion, the easiest way to handle these large tables is to not have them.  What I mean by that is to partition the tables so that you can drop old data to keep your table sizes manageable.  If you haven’t already seen my post about partitioning your Zabbix MySQL tables, check it out.

With the old history and history_uint partitions dropped, the database upgrade script still took about 1.5hrs to complete, YMMV.  This is completely related to the size of your existing Zabbix database.  I have a 2G history table and a 3G history_uint table.

You did make a backup of your database before starting this upgrade, right?…

I make it a point to enable maintenance mode on the Zabbix UI before I start any upgrades/DB maintenance.  It makes sure no users are messing around in the system.

Shut down the server process and you are ready to begin.
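In rough terms, that pre-upgrade checklist boils down to something like the following on my kind of setup; the database name, credentials, and service name are assumptions, so adjust them to match your install.

mysqldump -u zabbix -p zabbix > zabbix_pre_upgrade.sql
service zabbix-server stop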

The documented upgrade procedure from Zabbix provides a good step-by-step process.

Again, the biggest issue I had was the time it took to upgrade the DB.  Once that was complete the rest of the upgrade was a piece of cake.

 


Find Zabbix Agent version easily across all hosts

I recently completed the server upgrade from Zabbix 1.8 to 2.0.  To take full advantage of 2.0 I needed to also upgrade all of my hosts’ Zabbix agents to 2.0.

I have 450+ hosts in my Zabbix environment so manually upgrading all of them is not an option.  I wrote up a few scripts and found some good posts to help me automate this process, which I will write about soon, but one thing that helped a lot was to be able to quickly see which hosts were on what version of the Zabbix agent.

I couldn’t easily find a way to do this in the Zabbix dashboard so I figured I would just grab this info out of the database.  Turns out it is a pretty simple MySQL query to list all of the hostnames and what version of the software they were running.  I modified this query a bit so I could then feed it into my scripts and target exactly the hosts that needed to be upgraded.

Today it’s yours… Have fun

select hosts.host, items.lastvalue from hosts join items on hosts.hostid=items.hostid where items.name like '%zabbix_agent%' and items.lastvalue like '1.8%';

Zabbix large host delete or modify timeout

When modifying a host with many items, or a template with many linked hosts, I was experiencing what I would call a timeout.  I would select the item or trigger I wished to modify or delete, click delete, and after about 10 seconds of waiting be returned a blank screen.  When I clicked on the host or template I was modifying, the item I had tried to delete was still there.

This became a huge problem when I was dealing with this issue.  I had a runaway LLD trigger that continued to duplicate itself on all of my 150+ Windows hosts.  The underlying problem was that a disk space item I had created was using a lower case ‘c’ while the LLD rule to find drives was creating a trigger for an upper case ‘C’.  Fire and brimstone rained, and I had 150+ Windows hosts with 120 ‘low disk space on c: drive’ triggers.

The duplication fix was easy, as explained in the above link, but I was still stuck with thousands of extra triggers that existed across my Windows hosts.

My attempt to delete the problem trigger so that it would be removed from all the hosts upon the next discovery period resulted in this ‘timeout’ issue where nothing seemed to happen.

I could ‘unlink and clear’ each host individually from the problem template but this would wipe out any history on that host, not to mention it would have taken forever to do this on 150+ hosts.  I was stuck.

Searches through the Zabbix bug list returned few possibilities.  I was starting to write up my own bug report when I checked a log that I should have checked from the beginning.  The Apache httpd log showed promise:

 PHP Fatal error:  Allowed memory size of 268435456 bytes exhausted

This was it!  Zabbix requires a minimum PHP memory_limit of 128M.  I set mine to 256M because… 256M is bigger than 128M…  I don’t know, seemed like it may be beneficial to increase this when I started.

Well, I guess my environment has grown to a point where 256M isn’t enough.  In fact, I had to up this to 2G before the error went away and I was finally able to delete that trigger.

The change is dynamic for PHP but you will need to restart httpd before Zabbix picks it up.

So, short story, long…  Zabbix was starved for memory to complete this delete task and upping the PHP memory_limit in php.ini fixed the problem.

I have since dropped the memory_limit setting back to 512M, but I think keeping it at 2G would have been safe.  If PHP didn’t need to use that amount of memory it wouldn’t have hoarded it, I don’t think.
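If you want to check yours, the directive in question looks like this (on CentOS it typically lives in /etc/php.ini):

memory_limit = 512M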

Let me know if this has happened to you.

Later.


Zabbix configure unable to find iconv.h

While trying to configure the Zabbix agent on CentOS 6.2 I was presented with this error:

checking for ICONV support... configure: error: Unable to find iconv.h "no"

This only showed up when I tried the --enable-static configure option.  I needed this to be able to use the compiled binaries on multiple 6.2 machines.

After a lot of searching I found that I simply needed to install the glibc-static RPM. Problem solved.
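On CentOS that boils down to a one-liner before re-running configure:

yum install glibc-static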

Hope this helps


Partitioning Zabbix tables in MySQL

My Zabbix server was in a world of hurt and Ricardo over at zabbixzone.com saved the day.  At the time, I was monitoring about 200 devices with a 4GB, 4-CPU virtual machine.  This poor box was running everything: Zabbix server, Zabbix UI, and MySQL.  Needless to say, it was struggling and needed some performance tuning.

I came across Ricardo’s post and immediately implemented a slightly altered version of the plan he laid out.  Check out his post, it’s fantastic.

The basic premise is that there are certain tables in Zabbix that collect a ton of data.  Not only do the tables get big, they slow down the UI due to long query times.  If you utilize MySQL’s ability to partition these tables you not only make your queries quicker, you can easily keep your disk space under control too.  I’m sure there are more benefits, but those 2 were enough to sell me on it.

MySQL has an optimization trick known as partition pruning, where it will only look in the partitions that can contain the requested data.  This saves a lot of disk IO in the process.

When you have partitioned tables in MySQL you can drop old partitions and only the data from those dropped partitions is deleted.  In the case of the history tables in Zabbix, I have item history set to 7 days.  I have created weekly partitions and delete any partitions older than 2 weeks.
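Dropping an old partition is a single statement; the partition name here just follows the weekly naming scheme you can see in the explain example further down:

ALTER TABLE history_uint DROP PARTITION p201207wk01;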

My environment monitors over 400 servers now, on the same modest VM, and my weekly history partitions have about 75m rows each.  Each week I drop a single partition and get back almost 5G of disk space.  Not only does this save disk space, but Zabbix no longer has to look through, potentially, 75m extra rows to get what it needs and the indexes on the tables are a lot smaller too.

Here’s a good example of the pruning by MySQL.

mysql> explain partitions select value from history_uint where itemid=18435 and clock<=1340923260 order by itemid,clock desc limit 4;
+----+-------------+--------------+----------------+-------+----------------+----------------+---------+------+-------+-------------+
| id | select_type | table        | partitions     | type  | possible_keys  | key            | key_len | ref  | rows  | Extra       |
+----+-------------+--------------+----------------+-------+----------------+----------------+---------+------+-------+-------------+
|  1 | SIMPLE      | history_uint | p201207wk01    | range | history_uint_1 | history_uint_1 | 12      | NULL | 21525 | Using where |
+----+-------------+--------------+----------------+-------+----------------+----------------+---------+------+-------+-------------+

This explain plan shows that MySQL only looked in 1 partition, only about 75m rows instead of a possible 225m+.  The index knocks that number down to about 21525. That’s a win as far as I’m concerned.

Now, I’m going to stop babbling and you should go check out Ricardo’s post at Zabbix Zone.