How to fix non-working RRD graphs after an uncontrolled OP5 shutdown
Check whether the data files are being updated
Go to the RRD data directory
# cd /opt/monitor/op5/pnp/perfdata
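Go to the host's RRD data directory and check whether the files are outdated. In the listing below, the .rrd files were last written on Oct 30, while the .xml files are current (Jan 16), so the RRD data is no longer being updated.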
# cd /opt/monitor/op5/pnp/perfdata/gepacsdb.health.root.loc
# ll
total 5680
-rw-rw-r-- 1 monitor apache 384952 Oct 30 20:50 DB_Connection_Sessions.rrd
-rw-rw-r-- 1 monitor apache 2074 Jan 16 15:35 DB_Connection.xml
-rw-rw-r-- 1 monitor apache 384952 Oct 30 20:50 DB_Freespace_Freespace.rrd
-rw-rw-r-- 1 monitor apache 2093 Jan 16 15:35 DB_Freespace.xml
-rw-rw-r-- 1 monitor apache 384952 Oct 30 20:50 DB_Queue_Archive_A.rrd
-rw-rw-r-- 1 monitor apache 2072 Jan 16 15:37 DB_Queue_Archive.xml
-rw-rw-r-- 1 monitor apache 384952 Oct 30 20:50 DB_Queue_Retrieve_R.rrd
-rw-rw-r-- 1 monitor apache 2074 Jan 16 15:35 DB_Queue_Retrieve.xml
-rw-rw-r-- 1 monitor apache 384952 Oct 30 21:50 DB_Unspecified_Unspecified.rrd
-rw-rw-r-- 1 monitor apache 2171 Jan 16 15:35 DB_Unspecified.xml
-rw-rw-r-- 1 monitor apache 384952 Oct 30 20:50 _HOST__pkt.rrd
-rw-rw-r-- 1 monitor apache 384952 Oct 30 20:22 _HOST__pl.rrd
-rw-rw-r-- 1 monitor apache 384952 Oct 30 20:22 _HOST__rta.rrd
-rw-rw-r-- 1 monitor apache 3223 Jan 16 15:28 _HOST_.xml
-rw-rw-r-- 1 monitor apache 384952 Oct 30 20:50 HTTP_Server_size.rrd
-rw-rw-r-- 1 monitor apache 384952 Oct 30 20:50 HTTP_Server_time.rrd
-rw-rw-r-- 1 monitor apache 2719 Jan 16 15:27 HTTP_Server.xml
-rw-rw-r-- 1 monitor apache 384952 Oct 30 20:50 HTTPS_Server_size.rrd
-rw-rw-r-- 1 monitor apache 384952 Oct 30 21:50 HTTPS_Server_time.rrd
-rw-rw-r-- 1 monitor apache 2731 Jan 16 15:36 HTTPS_Server.xml
-rw-rw-r-- 1 monitor apache 384952 Oct 30 21:50 PING_pl.rrd
-rw-rw-r-- 1 monitor apache 384952 Oct 30 20:22 PING_rta.rrd
-rw-rw-r-- 1 monitor apache 2665 Jan 16 15:33 PING.xml
-rw-rw-r-- 1 monitor apache 384952 Oct 30 21:50 SSH_Server_time.rrd
-rw-rw-r-- 1 monitor apache 2026 Jan 16 15:29 SSH_Server.xml
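Rather than comparing timestamps by eye, the outdated files can be listed directly. The following is a minimal sketch, assuming GNU find is available; the function name `stale_rrds` and the 60-minute default threshold are illustrative assumptions, not part of OP5.

```shell
# Hypothetical helper (not part of OP5): list .rrd files in a directory
# that have not been modified within the last N minutes.
# The 60-minute default is an assumption; adjust it to your check interval.
stale_rrds() {
    dir="$1"
    minutes="${2:-60}"
    # -mmin +N matches files last modified more than N minutes ago.
    find "$dir" -type f -name '*.rrd' -mmin +"$minutes"
}

# Example call (path taken from the listing above):
# stale_rrds /opt/monitor/op5/pnp/perfdata/gepacsdb.health.root.loc
```

An empty result suggests that all RRD files in that directory were written recently.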
Check whether Nagios data can be streamed to the RRD daemon
# tail -f /opt/monitor/op5/pnp/perfdata.log
2017-01-16 15:41:34 [6096] [0] RRDs::update /opt/monitor/op5/pnp/perfdata/ksaspx71.health.root.loc/PING_pl.rrd 1484577692:0
2017-01-16 15:41:34 [6096] [0] RRDs::update ERROR Unable to connect to rrdcached: Connection refused
2017-01-16 15:41:34 [6096] [0] RRDs::update /opt/monitor/op5/pnp/perfdata/ksaspx18.health.root.loc/System_Load_load1.rrd 1484577693:1.160
2017-01-16 15:41:34 [6096] [0] RRDs::update ERROR Unable to connect to rrdcached: Connection refused
2017-01-16 15:41:34 [6096] [0] RRDs::update /opt/monitor/op5/pnp/perfdata/ksaspx18.health.root.loc/System_Load_load5.rrd 1484577693:1.250
2017-01-16 15:41:34 [6096] [0] RRDs::update ERROR Unable to connect to rrdcached: Connection refused
2017-01-16 15:41:34 [6096] [0] RRDs::update /opt/monitor/op5/pnp/perfdata/ksaspx18.health.root.loc/System_Load_load15.rrd 1484577693:1.210
2017-01-16 15:41:34 [6096] [0] RRDs::update ERROR Unable to connect to rrdcached: Connection refused
2017-01-16 15:41:34 [6096] [0] RRDs::update /opt/monitor/op5/pnp/perfdata/ksaspx08.health.root.loc/System_Processes_Total_procs.rrd 1484577693:199
2017-01-16 15:41:34 [6096] [0] RRDs::update ERROR Unable to connect to rrdcached: Connection refused
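To see how widespread the problem is without watching the log scroll by, the connection errors can be counted. This is only a sketch: the function name `count_rrdcached_errors` is a hypothetical helper, and the search string is copied from the log output above.

```shell
# Hypothetical helper (not part of OP5): count the lines in a PNP log
# that report a failed connection to rrdcached.
count_rrdcached_errors() {
    logfile="$1"
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the helper usable in scripts running with "set -e".
    grep -c 'Unable to connect to rrdcached' "$logfile" || true
}

# Example call (log path taken from the tail command above):
# count_rrdcached_errors /opt/monitor/op5/pnp/perfdata.log
```

A count that keeps growing between two calls indicates that updates are still being rejected.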
Check whether the rrdcached service is running
Check the status of the RRD daemon and try to start it
# service rrdcached status
rrdcached is stopped
[root@thuop5 pnp]# service rrdcached start
Starting rrdcached: rrdcached: can't create pid file '/opt/monitor/var/rrdtool/rrdcached/rrdcached.pid' (File exists)
rrdcached: daemonize failed, exiting.
Starting rrdcached: [FAILED]
The rrdcached.pid file was still present. It had not been deleted because the service did not shut down properly. We renamed the file and tried to start the service again.
# mv /opt/monitor/var/rrdtool/rrdcached/rrdcached.pid /opt/monitor/var/rrdtool/rrdcached/rrdcached.pid.2delete
[root@thuop5 pnp]# service rrdcached status
rrdcached is stopped
[root@thuop5 pnp]# service rrdcached start
Starting rrdcached: [ OK ]
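Before moving a PID file aside, it may be worth confirming that the PID it records does not belong to a running process. The sketch below assumes it is run as root, as in the session above; the function name `pidfile_is_stale` is a hypothetical helper, not an OP5 or rrdcached tool.

```shell
# Hypothetical helper (not part of OP5 or rrdcached): exit 0 only if the
# PID file exists and the PID it records is not a live process.
# Assumes root, because kill -0 also fails for another user's process
# when the caller lacks permission to signal it.
pidfile_is_stale() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 1      # no PID file, nothing to clean up
    pid=$(tr -d '[:space:]' < "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;       # empty or non-numeric content: treat as stale
    esac
    # kill -0 sends no signal; it only tests whether the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        return 1                       # process is alive, leave the file alone
    fi
    return 0
}

# Example call (PID file path taken from the error message above):
# pidfile_is_stale /opt/monitor/var/rrdtool/rrdcached/rrdcached.pid && echo "stale"
```

Renaming the file rather than deleting it, as done above, keeps the original available in case the check was wrong.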
Afterwards, the RRD graphs update again. By default, OP5 itself monitors the rrdcached service; if your graphs are still not updating, check your OP5 host in the web GUI for additional problems with the installation. For details, see the OP5 KB article "How is performance data transformed into graph data?"