Yesterday I set up Munin, on the recommendation of a good friend of mine. It's like an EKG, only for servers, and produces charts like the following:
It can produce charts for literally any resource you can think of (provided you have some way to get to the data via scripting). It'll produce charts for the last 24 hours, week, month, and year.
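To give a sense of what "getting to the data via scripting" involves: a Munin plugin is an executable that prints graph metadata when called with the argument config, and prints current values otherwise. Here's a minimal sketch of that shape in shell, wrapped in a function so it's easy to try out. The name dir_plugin, the WATCH_DIR variable, and the metric itself (number of entries in a directory) are made up for illustration; they aren't part of Munin.

```shell
#!/bin/sh
# Hypothetical plugin sketch: graph how many entries a directory holds.
# WATCH_DIR is an assumed variable name; it defaults to /tmp.

dir_plugin() {
    dir="${WATCH_DIR:-/tmp}"

    if [ "$1" = "config" ]; then
        # Graph metadata, printed when Munin asks how to draw the chart.
        echo "graph_title Entries in $dir"
        echo "graph_vlabel entries"
        echo "files.label entries"
        return 0
    fi

    # Current value, printed on every polling run.
    # tr strips the padding some wc implementations add.
    count=$(ls -A "$dir" | wc -l | tr -d ' ')
    echo "files.value $count"
}

dir_plugin config
dir_plugin
```

A real plugin would be a standalone executable rather than a function, but the config/value split is the same.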
Installation Hurdles:
No packages for RHEL5!
I installed Munin on a server running RHEL5, which, for me, meant no packages! The instructions here worked well for me.
Perl library went missing: "Can't locate RRDs.pm in @INC"
On the host, I had to download rrdtool. Even after installation, the munin-cron process still failed to find RRDs.pm (even after I modified the PERL5LIB variable to include rrdtool's isolated include directory).
Cron jobs have a very bare environment. Setting the PERL5LIB environment variable in the cron file itself to include the rrdtool library did the job (hint: to edit the munin user's cron jobs, run crontab -e as the user munin).
PERL5LIB = /usr/local/rrdtool-1.2.27/lib/perl/5.8.8/i386-linux-thread-multi/
*/5 * * * * /opt/munin/bin/munin-cron # Update munin every 5 minutes
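If the "bare environment" part seems abstract, it can be reproduced without waiting for cron. The sketch below uses env -i to start a command with no inherited variables, which is a rough stand-in for cron (cron does set a handful of variables, but not PERL5LIB). The variable names RRD_LIB, inherited, clean, and explicit are just for this illustration.

```shell
# Rough simulation of cron's sparse environment using `env -i`.
RRD_LIB=/usr/local/rrdtool-1.2.27/lib/perl/5.8.8/i386-linux-thread-multi/

# Export the variable the way one might in a login shell.
export PERL5LIB="$RRD_LIB"

# Inherited environment: the child process sees the variable.
inherited=$(sh -c 'echo "${PERL5LIB:-unset}"')

# Clean environment: the export above never reaches the child.
clean=$(env -i /bin/sh -c 'echo "${PERL5LIB:-unset}"')

# Passing it explicitly mirrors what the PERL5LIB line in the crontab does.
explicit=$(env -i PERL5LIB="$RRD_LIB" /bin/sh -c 'echo "${PERL5LIB:-unset}"')

echo "inherited: $inherited"
echo "clean:     $clean"
echo "explicit:  $explicit"
```

For a one-off check that Perl can actually load the module, running perl -MRRDs -e 1 as the munin user should exit silently when RRDs.pm is found on the include path.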
Warning spam: "Possible attempt to put comments in qw() list at /opt/munin/lib/munin-graph line 169"
Now munin was working. However, I started getting this email message every 5 minutes (since all cron system errors get directed to me):
Possible attempt to put comments in qw() list at /opt/munin/lib/munin-graph line 169
It turned out that the munin-graph Perl script includes a -w flag in its shebang line. This flag tells perl to be extra-whiny (perl -h claims that the warnings are extra-useful). Removing the flag suppressed the warning.
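For anyone who'd rather script the change than open an editor, here's a sketch of the same edit with sed. It runs against a scratch file, not the real munin-graph, and the shebang shown (#!/usr/bin/perl -w) is an assumption; the actual interpreter path on your install may differ.

```shell
# Work on a scratch file, not the real /opt/munin/lib/munin-graph.
# The shebang below is an assumption; the real one may name a different path.
scratch=$(mktemp)
printf '%s\n' '#!/usr/bin/perl -w' 'print "graphing\n";' > "$scratch"

# Strip a trailing " -w" from line 1 only; the rest of the file is untouched.
sed '1s/^\(#!.*\) -w$/\1/' "$scratch" > "$scratch.new"

before=$(head -n 1 "$scratch")
after=$(head -n 1 "$scratch.new")
echo "before: $before"
echo "after:  $after"

rm -f "$scratch" "$scratch.new"
```

Keep in mind that dropping -w silences every warning from that script, not just this one. The warning itself comes from a # character inside a qw() list, which Perl suspects was meant as a comment.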
Conclusion
After I installed the nodes on both machines I wanted to monitor, installed the master process, and hooked it into the cron job, everything was working peachy keen. I can watch every resource on all of our machines, and see potential problems before they happen.
But the journey's not over! I'm going to get some plugins, either via discovery or via blazing my own trail, so we can monitor all of our Rails processes (what's running, CPU usage of each, memory usage, etc.) to help us detect any problems with memory leaks or runaway processes.
1 comment:
For those who stumble upon this, I've since switched to using NagiosGraph. It's not quite as powerful as munin, but it rides on top of nagios so doesn't require an additional set of sensors, which was appealing enough to me to make the switch.