I recently completed the server upgrade from Zabbix 1.8 to 2.0. To take full advantage of 2.0 I also needed to upgrade the Zabbix agents on all of my hosts to 2.0.
I have 450+ hosts in my Zabbix environment so manually upgrading all of them is not an option. I wrote up a few scripts and found some good posts to help me automate this process, which I will write about soon, but one thing that helped a lot was to be able to quickly see which hosts were on what version of the Zabbix agent.
I couldn’t easily find a way to do this in the Zabbix dashboard so I figured I would just grab this info out of the database. Turns out it is a pretty simple MySQL query to list all of the hostnames and what version of the software they were running. I modified this query a bit so I could then feed it into my scripts and target exactly the hosts that needed to be upgraded.
Today it’s yours… Have fun
select hosts.host, items.lastvalue from hosts join items on hosts.hostid=items.hostid where items.name like '%zabbix_agent%' and items.lastvalue like '1.8%';
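If you want to turn that result into a plain host list for a script, something like the following Python sketch may help. It is only an illustration: it runs the same join against a throwaway in-memory SQLite database instead of the real Zabbix MySQL database, and the table contents, hostnames and the hosts_on_version helper are all made up for the example. Against a real server you would swap in a MySQL connection and its placeholder style.

```python
import sqlite3

# Toy stand-in for the Zabbix database: only the columns the query touches.
# All rows below are invented sample data, not real hosts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, host TEXT);
    CREATE TABLE items (itemid INTEGER PRIMARY KEY, hostid INTEGER,
                        name TEXT, lastvalue TEXT);
    INSERT INTO hosts VALUES (1, 'web01'), (2, 'db01'), (3, 'win01');
    INSERT INTO items VALUES
        (10, 1, 'Version of zabbix_agent(d) running', '1.8.5'),
        (11, 2, 'Version of zabbix_agent(d) running', '2.0.3'),
        (12, 3, 'Version of zabbix_agent(d) running', '1.8.10'),
        (13, 1, 'Free disk space on /', '1.8');
""")


def hosts_on_version(conn, version_prefix):
    """Return hostnames whose agent version item starts with version_prefix."""
    rows = conn.execute(
        "SELECT hosts.host FROM hosts "
        "JOIN items ON hosts.hostid = items.hostid "
        "WHERE items.name LIKE '%zabbix_agent%' "
        "AND items.lastvalue LIKE ? "
        "ORDER BY hosts.host",
        (version_prefix + "%",),
    )
    return [row[0] for row in rows]


# One hostname per line is easy to feed into an upgrade script.
old_hosts = hosts_on_version(conn, "1.8")
print("\n".join(old_hosts))
```

Passing the version prefix as a parameter keeps the query reusable, so the same helper can later confirm which hosts have made it to 2.0.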
When modifying a host with many items or a template with many linked hosts, I was experiencing what I would call a timeout. I would select the item or trigger I wished to modify or delete and then click delete. After about 10 seconds of waiting, I got a blank screen. When I went back to the host or template I was modifying, the item I tried to delete was still there.
This became a huge problem when I was dealing with this issue. I had a runaway LLD trigger that continued to duplicate itself on all of my 150+ Windows hosts. The underlying problem was that a disk space item that I created was using a lowercase ‘c’ and the LLD rule to find drives was creating a trigger for an uppercase ‘C’. Fire and brimstone rained and I had 150+ Windows hosts with 120 ‘low disk space on c: drive’ triggers.
The duplication fix was easy, as explained in the above link, but I was still stuck with thousands of extra triggers that existed across my Windows hosts.
My attempt to delete the problem trigger so that it would be removed from all the hosts upon the next discovery period resulted in this ‘timeout’ issue where nothing seemed to happen.
I could ‘unlink and clear’ each host individually from the problem template but this would wipe out any history on that host, not to mention it would have taken forever to do this on 150+ hosts. I was stuck.
Searches through the Zabbix bug list returned few possibilities. I was starting to write up my own bug report when I checked a log that I should have checked from the beginning. The Apache httpd log showed promise:
PHP Fatal error: Allowed memory size of 268435456 bytes exhausted
This was it! Zabbix requires a minimum PHP memory_limit of 128M. I set mine to 256M because… 256M is bigger than 128M… I don’t know, seemed like it may be beneficial to increase this when I started.
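The number in that error is just the 256M limit written out in bytes. Here is a small Python sketch that pulls the limit out of a log line and compares it with a php.ini style value. The function names are my own for illustration, and the parser only handles the plain K, M and G suffixes.

```python
import re

# Multipliers for php.ini shorthand byte values (K, M, G are powers of 1024).
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def shorthand_to_bytes(value):
    """Convert a php.ini style value such as '256M' to a byte count."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value)


def limit_from_error(line):
    """Return the byte limit from an 'Allowed memory size' line, or None."""
    match = re.search(r"Allowed memory size of (\d+) bytes exhausted", line)
    return int(match.group(1)) if match else None


line = "PHP Fatal error: Allowed memory size of 268435456 bytes exhausted"
limit = limit_from_error(line)

# True when the limit reported in the log matches the 256M setting.
print(limit == shorthand_to_bytes("256M"))
```

Running the same check over the httpd error log is a quick way to see which memory_limit was actually in effect when a request died.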
Well, I guess my environment has grown to a point where 256M isn’t enough. In fact, I had to up this to 2G before the error went away and I was finally able to delete that trigger.
You only need to edit php.ini to make the change, but you will need to restart httpd before Zabbix picks it up.
So, short story, long… Zabbix was starved for memory to complete this delete task and upping the PHP memory_limit in php.ini fixed the problem.
I have since dropped the memory_limit setting to 512M but I think keeping it at 2G would have been safe. If PHP didn’t need to use that amount of memory it wouldn’t have hoarded it, I don’t think.
Let me know if this has happened to you.