
Monitoring TCP/IP daemons and system resources on Linux systems via SNMP
Hi there folks, I have been given the following mini-brief by my boss:
"X needs to develop a set of proactive checks with their related proactive
actions for the Internet cluster to ensure the highest possible availability
and quality of service to the customers. <blah blah> checks must be
documented and a monthly report is generated for both the system
availability and the quality of service"
I have worked as a Linux sysadmin for around two years now, so I'm fairly
familiar with the standard monitoring tools (free / vmstat / iostat /
netstat / ps / mrtg / etc). However, I was wondering if anyone has experience
of delivering on a brief such as this?
Our systems (Intel x86 Linux, 2.2 or 2.4 series kernels) run snmpd as
standard, so I think SNMP would be a good way to go, as it is already pretty
much ready to roll. What I'm wondering is how I can get SNMP data from 20 to
30 machines into some graphical format and analyse it collectively.
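
To make the "collect from 20 to 30 machines" part concrete, here is a rough
sketch in Python that polls one OID on each host by calling the net-snmp
snmpget command. It is only an illustration under assumptions: the host names
and the "public" community string are hypothetical, the OID is laLoad.1
(1-minute load average) from UCD-SNMP-MIB, and snmpget is assumed to be on the
PATH. Graphing and storage are left out.

```python
import subprocess

# Hypothetical values, not taken from the original post.
HOSTS = ["host1.example.com", "host2.example.com"]
COMMUNITY = "public"
# UCD-SNMP-MIB laLoad.1: the 1-minute load average, returned as a string.
LOAD_OID = "1.3.6.1.4.1.2021.10.1.3.1"


def build_snmpget(host, oid, community=COMMUNITY):
    """Build the snmpget command line; -Oqv prints the value only."""
    return ["snmpget", "-v", "1", "-c", community, "-Oqv", host, oid]


def parse_value(output):
    """Strip whitespace and the quotes snmpget puts around string values."""
    return output.strip().strip('"')


def poll(hosts, oid, run=subprocess.run):
    """Return {host: value or None}; None means the host could not be polled."""
    results = {}
    for host in hosts:
        try:
            proc = run(build_snmpget(host, oid),
                       capture_output=True, text=True, timeout=5)
        except (OSError, subprocess.TimeoutExpired):
            # snmpget is missing or the agent did not answer in time.
            results[host] = None
            continue
        if proc.returncode == 0:
            results[host] = parse_value(proc.stdout)
        else:
            results[host] = None
    return results
```

Calling poll(HOSTS, LOAD_OID) from cron and appending the result to a file
per host would give a simple data set to graph or summarise later.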
You can see my brief above. It is fairly wide, but I know I need to monitor
both the availability of customer-facing services (POP / SMTP / web / etc) and
their speed of response. I am also hoping to monitor general system stats
(disk I/O, memory paging, etc).
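
As an illustration of the availability and speed-of-response check, the
following Python sketch times a plain TCP connect to a service port. It is a
minimal example under assumptions: the function name check_service is made up
for this sketch, and a successful connect only shows that the daemon is
accepting connections, not that it answers protocol requests correctly.

```python
import socket
import time


def check_service(host, port, timeout=5.0):
    """Try a TCP connect; return (is_up, seconds_taken)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False, time.monotonic() - start
```

For example, check_service("mail.example.com", 110) would test a POP3 daemon
on a hypothetical host. Logging each result with a timestamp would provide raw
data for a monthly availability and response-time report.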
If anyone has any ideas for a Linux (non-X) application or application suite
that can do all of this, I would be very happy to hear them.
Big thanks in advance + apologies if this is a little off-topic.
Kevin Donnelly