[icinga-devel] icinga-core: proposal: stop checking service once it's "OK"

Halpaap, Mark Mark.Halpaap at partner.commerzbank.com
Fri Mar 12 16:58:29 CET 2010


Hello list,

I've been using Icinga for two months now and found that I needed to
implement a small change to make it fit my needs.

Besides the usual machine/network monitoring, my installation does
application monitoring, and there are quite a lot of service checks that
may very well go to sleep (for the current check/time period) once the
corresponding state changes to "OK".
Just imagine a check for the existence of a specific file that is due on
a daily basis.
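
To make the daily file example concrete, here is a minimal sketch of such a check written as a plugin function. It assumes the usual Nagios/Icinga plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN); the function name check_file_exists and the enum names are hypothetical and not part of Icinga.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Conventional plugin exit codes (hypothetical enum names). */
enum plugin_status {
    PLUGIN_OK = 0,
    PLUGIN_WARNING = 1,
    PLUGIN_CRITICAL = 2,
    PLUGIN_UNKNOWN = 3
};

/* Hypothetical helper: OK if `path` exists, CRITICAL otherwise.
 * Prints the one-line status text a plugin is expected to emit. */
int check_file_exists(const char *path)
{
    assert(path != NULL);
    if (access(path, F_OK) == 0) {
        printf("FILE OK - %s exists\n", path);
        return PLUGIN_OK;
    }
    printf("FILE CRITICAL - %s not found\n", path);
    return PLUGIN_CRITICAL;
}
```

A real plugin would call it from main() with the path taken from argv and return its result as the exit code. Once such a check has returned OK for the day, every further run repeats the same answer, which is the load this proposal wants to avoid.
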
I have not found a way to tell Icinga to stop issuing further (active)
checks once this (service) check is "OK", so I took a shot at
implementing it.
Basically, I added a service attribute called "stop_checking_on_success";
setting it to 1 in the service config does just that.
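
The condition that the patch below adds to check_service_check_viability() can be restated as a small standalone predicate. This is a sketch only: the name should_skip_check and the constant SKETCH_STATE_OK are hypothetical, STATE_OK is assumed to be 0 as in the Nagios/Icinga headers, and check_interval is assumed to already be in seconds (check_interval multiplied by interval_length).

```c
#include <assert.h>
#include <time.h>

/* Hypothetical constant; assumes STATE_OK is 0 as in the Icinga headers. */
#define SKETCH_STATE_OK 0

/* Hypothetical restatement of the patch's condition: skip the active check
 * if the service is OK, has stop_checking_on_success set, and was last OK
 * no more than check_interval seconds ago. Returns 1 to skip, 0 to check. */
int should_skip_check(int current_state, int stop_checking_on_success,
                      time_t last_time_ok, time_t current_time,
                      time_t check_interval)
{
    assert(check_interval >= 0);
    return current_state == SKETCH_STATE_OK
        && stop_checking_on_success
        && last_time_ok >= current_time - check_interval;
}
```

Because the patch also sets last_time_ok to the current time whenever it skips a check, this predicate keeps returning true. Checking only resumes after an earlier viability test (for example the service's check_period) has set perform_check to FALSE for longer than one interval.
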

Of course I'd love Icinga to be able to do something like this out of
the box, and maybe it already does and I simply have not found the right
switches in the Icinga/Nagios docs?

In any case, below is the patch against 1.0.1. It is not very beautiful,
but it does the trick here, where a single machine would otherwise be
needlessly pounded with ca. 2000 NRPE calls every 10 minutes.
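
For scale, a back-of-the-envelope sketch of that load, assuming the checks run around the clock (24x7) every 600 seconds: 2000 * (86400 / 600) = 2000 * 144 = 288,000 NRPE calls per day. The helper name checks_per_day is hypothetical.

```c
#include <assert.h>

/* Seconds in one day. */
#define SECONDS_PER_DAY 86400L

/* Hypothetical helper: active checks issued per day when `services` checks
 * each run every `interval_seconds` seconds around the clock. Integer
 * division is exact for 600 s; other intervals are rounded down. */
long checks_per_day(long services, long interval_seconds)
{
    assert(services >= 0 && interval_seconds > 0);
    return services * (SECONDS_PER_DAY / interval_seconds);
}
```

checks_per_day(2000, 600) evaluates to 288000. In the best case, where every daily check succeeds on its first run, stop_checking_on_success would bring this down to roughly one call per service per day, i.e. about 2000.
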

Any thoughts on the idea itself?
Maybe there are solutions more elegant than this?

One more thing I have not found a solution for yet: when such a service
check (having been "OK" in a given period) leaves its checking period,
I'd very much like its (CGI/web) state to change to "PENDING" at the
beginning of the next checking period. Currently it stays at "OK" in
that case.
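
One possible direction, sketched under loud assumptions: if the CGI knew when the current checking period started, it could report PENDING whenever the last check result predates that start. The enum and function below are hypothetical and not part of Icinga's CGI code, and computing period_start from the timeperiod definition is deliberately left out.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical display states, for this sketch only. */
enum display_state { DISPLAY_OK, DISPLAY_PENDING, DISPLAY_NOT_OK };

/* Report PENDING if the last check ran before the current checking period
 * began. `period_start` is assumed to be computed elsewhere. */
enum display_state state_for_display(int is_ok, time_t last_check,
                                     time_t period_start)
{
    if (last_check < period_start)
        return DISPLAY_PENDING;
    return is_ok ? DISPLAY_OK : DISPLAY_NOT_OK;
}
```

With this rule, a service that went OK yesterday would show PENDING from the start of today's period until its first check of the day.
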

Here's the patch (hopefully not too big for this list; it may well be
mangled by Outlook):

These files have been patched:

   * base/checks.c
   * cgi/config.c
   * common/objects.c
   * include/objects.h
   * xdata/xodtemplate.c
   * xdata/xodtemplate.h


--------------------------------------------------- snip -------------------------------------------------------------
diff -rupN icinga-1.0.1/base/checks.c icinga-1.0.1.MHA/base/checks.c
--- icinga-1.0.1/base/checks.c	2010-03-03 10:29:37.000000000 +0100
+++ icinga-1.0.1.MHA/base/checks.c	2010-03-12 11:34:03.000000000 +0100
@@ -1834,6 +1834,26 @@ int check_service_check_viability(servic
 		        }
 	        }
 
+		/* MHA check whether these checks should be continued (inside the current valid time period) if there was a positive result before */
+		if(perform_check==TRUE){
+			/* MHA Is this a STOP_CHECKING_ON_SUCCESS service? */
+			/* MHA And: Has this service been checked successfully (OK) inside the current checking period? */
+			/* MHA And: Have the other checks above OK'd this check? */
+			if(svc->current_state==STATE_OK && svc->stop_checking_on_success==TRUE && svc->last_time_ok >= (current_time - check_interval)){
+			      /* MHA Then do not continue checking until a new checking period begins */
+			      preferred_time=current_time+check_interval;
+			      perform_check=FALSE;
+			      
+			      /* MHA This service is _still_ considered OK */
+			      /* MHA Keeps the condition above TRUE as long as the elapsed time is NOT > check_interval */
+			      svc->last_time_ok = current_time;
+			      
+			      /* MHA To avoid having this dummy OK update performed forever, provide check periods or dependencies that make the conditions above come true */
+
+			      log_debug_info(DEBUGL_FUNCTIONS,0,"check_service_check_viability(): %s: This STOP_CHECKING_ON_SUCCESS service has already been checked OK in this checking period.\n", svc->description);
+			      }
+	        }
+
 	/* pass back the next viable check time */
 	if(new_time)
 		*new_time=preferred_time;
diff -rupN icinga-1.0.1/cgi/config.c icinga-1.0.1.MHA/cgi/config.c
--- icinga-1.0.1/cgi/config.c	2010-03-03 10:29:37.000000000 +0100
+++ icinga-1.0.1.MHA/cgi/config.c	2010-03-12 11:42:12.000000000 +0100
@@ -1173,6 +1173,7 @@ void display_services(void){
 	printf("<TH CLASS='data'>Check Period</TH>\n");
 	printf("<TH CLASS='data'>Parallelize</TH>\n");
 	printf("<TH CLASS='data'>Volatile</TH>\n");
+	printf("<TH CLASS='data'>Stop On Success</TH>\n"); /* MHA */
 	printf("<TH CLASS='data'>Obsess Over</TH>\n");
 	printf("<TH CLASS='data'>Enable Active Checks</TH>\n");
 	printf("<TH CLASS='data'>Enable Passive Checks</TH>\n");
@@ -1248,6 +1249,8 @@ void display_services(void){
 
 		printf("<TD CLASS='%s'>%s</TD>\n",bg_class,(temp_service->is_volatile==TRUE)?"Yes":"No");
 
+		printf("<TD CLASS='%s'>%s</TD>\n",bg_class,(temp_service->stop_checking_on_success==TRUE)?"Yes":"No"); /* MHA */
+
 		printf("<TD CLASS='%s'>%s</TD>\n",bg_class,(temp_service->obsess_over_service==TRUE)?"Yes":"No");
 
 		printf("<TD CLASS='%s'>%s</TD>\n",bg_class,(temp_service->checks_enabled==TRUE)?"Yes":"No");
diff -rupN icinga-1.0.1/common/objects.c icinga-1.0.1.MHA/common/objects.c
--- icinga-1.0.1/common/objects.c	2010-03-03 10:29:37.000000000 +0100
+++ icinga-1.0.1.MHA/common/objects.c	2010-03-12 12:07:28.000000000 +0100
@@ -1762,7 +1762,7 @@ contactsmember *add_contact_to_contactgr
 
 
 /* add a new service to the list in memory */
-service *add_service(char *host_name, char *description, char *display_name, char *check_period, int initial_state, int max_attempts, int parallelize, int accept_passive_checks, double check_interval, double retry_interval, double notification_interval, double first_notification_delay, char *notification_period, int notify_recovery, int notify_unknown, int notify_warning, int notify_critical, int notify_flapping, int notify_downtime, int notifications_enabled, int is_volatile, char *event_handler, int event_handler_enabled, char *check_command, int checks_enabled, int flap_detection_enabled, double low_flap_threshold, double high_flap_threshold, int flap_detection_on_ok, int flap_detection_on_warning, int flap_detection_on_unknown, int flap_detection_on_critical, int stalk_on_ok, int stalk_on_warning, int stalk_on_unknown, int stalk_on_critical, int process_perfdata, int failure_prediction_enabled, char *failure_prediction_options, int check_freshness, int freshness_threshold, char *notes, char *notes_url, char *action_url, char *icon_image, char *icon_image_alt, int retain_status_information, int retain_nonstatus_information, int obsess_over_service){
+service *add_service(char *host_name, char *description, char *display_name, char *check_period, int initial_state, int max_attempts, int parallelize, int accept_passive_checks, double check_interval, double retry_interval, double notification_interval, double first_notification_delay, char *notification_period, int notify_recovery, int notify_unknown, int notify_warning, int notify_critical, int notify_flapping, int notify_downtime, int notifications_enabled, int is_volatile, int stop_checking_on_success, char *event_handler, int event_handler_enabled, char *check_command, int checks_enabled, int flap_detection_enabled, double low_flap_threshold, double high_flap_threshold, int flap_detection_on_ok, int flap_detection_on_warning, int flap_detection_on_unknown, int flap_detection_on_critical, int stalk_on_ok, int stalk_on_warning, int stalk_on_unknown, int stalk_on_critical, int process_perfdata, int failure_prediction_enabled, char *failure_prediction_options, int check_freshness, int freshness_threshold, char *notes, char *notes_url, char *action_url, char *icon_image, char *icon_image_alt, int retain_status_information, int retain_nonstatus_information, int obsess_over_service){ /* MHA arg 22 added */
 	service *new_service=NULL;
 	int result=OK;
 #ifdef NSCORE
@@ -1888,6 +1888,7 @@ service *add_service(char *host_name, ch
 	new_service->notify_on_flapping=(notify_flapping>0)?TRUE:FALSE;
 	new_service->notify_on_downtime=(notify_downtime>0)?TRUE:FALSE;
 	new_service->is_volatile=(is_volatile>0)?TRUE:FALSE;
+	new_service->stop_checking_on_success=(stop_checking_on_success>0)?TRUE:FALSE; /* MHA */
 	new_service->flap_detection_enabled=(flap_detection_enabled>0)?TRUE:FALSE;
 	new_service->low_flap_threshold=low_flap_threshold;
 	new_service->high_flap_threshold=high_flap_threshold;
diff -rupN icinga-1.0.1/include/objects.h icinga-1.0.1.MHA/include/objects.h
--- icinga-1.0.1/include/objects.h	2010-03-03 10:29:37.000000000 +0100
+++ icinga-1.0.1.MHA/include/objects.h	2010-03-12 12:11:50.000000000 +0100
@@ -433,6 +433,7 @@ struct service_struct{
 	int     stalk_on_unknown;
 	int     stalk_on_critical;
 	int     is_volatile;
+	int     stop_checking_on_success; /* MHA */
 	char	*notification_period;
 	char	*check_period;
 	int     flap_detection_enabled;
@@ -691,7 +692,7 @@ servicesmember *add_service_to_servicegr
 contactgroup *add_contactgroup(char *,char *);	/* adds a contactgroup definition */
 contactsmember *add_contact_to_contactgroup(contactgroup *,char *);	/* adds a contact to a contact group definition */
 command *add_command(char *,char *);	/* adds a command definition */
-service *add_service(char *,char *,char *,char *,int,int,int,int,double,double,double,double,char *,int,int,int,int,int,int,int,int,char *,int,char *,int,int,double,double,int,int,int,int,int,int,int,int,int,int,char *,int,int,char *,char *,char *,char *,char *,int,int,int);	/* adds a service definition */
+service *add_service(char *,char *,char *,char *,int,int,int,int,double,double,double,double,char *,int,int,int,int,int,int,int,int,int,char *,int,char *,int,int,double,double,int,int,int,int,int,int,int,int,int,int,char *,int,int,char *,char *,char *,char *,char *,int,int,int);	/* adds a service definition */ /* MHA argument 22 added */
 contactgroupsmember *add_contactgroup_to_service(service *,char *);	/* adds a contact group to a service definition */
 contactsmember *add_contact_to_service(service *,char *);	/* adds a contact to a host definition */
 serviceescalation *add_serviceescalation(char *,char *,int,int,double,char *,int,int,int,int);          /* adds a service escalation definition */
diff -rupN icinga-1.0.1/xdata/xodtemplate.c icinga-1.0.1.MHA/xdata/xodtemplate.c
--- icinga-1.0.1/xdata/xodtemplate.c	2010-03-03 10:29:37.000000000 +0100
+++ icinga-1.0.1.MHA/xdata/xodtemplate.c	2010-03-12 12:23:55.000000000 +0100
@@ -4004,6 +4004,11 @@ int xodtemplate_add_object_property(char
 			temp_service->is_volatile=(atoi(value)>0)?TRUE:FALSE;
 			temp_service->have_is_volatile=TRUE;
 		        }
+		/* MHA new option stop_checking_on_success */
+		else if(!strcmp(variable,"stop_checking_on_success")){
+			temp_service->stop_checking_on_success=(atoi(value)>0)?TRUE:FALSE;
+			temp_service->have_stop_checking_on_success=TRUE;
+		        }
 		else if(!strcmp(variable,"obsess_over_service")){
 			temp_service->obsess_over_service=(atoi(value)>0)?TRUE:FALSE;
 			temp_service->have_obsess_over_service=TRUE;
@@ -6334,6 +6339,8 @@ int xodtemplate_duplicate_service(xodtem
 	new_service->have_parallelize_check=temp_service->have_parallelize_check;
 	new_service->is_volatile=temp_service->is_volatile;
 	new_service->have_is_volatile=temp_service->have_is_volatile;
+	new_service->stop_checking_on_success=temp_service->stop_checking_on_success; /* MHA */
+	new_service->have_stop_checking_on_success=temp_service->have_stop_checking_on_success; /* MHA */
 	new_service->obsess_over_service=temp_service->obsess_over_service;
 	new_service->have_obsess_over_service=temp_service->have_obsess_over_service;
 	new_service->event_handler_enabled=temp_service->event_handler_enabled;
@@ -8348,6 +8355,11 @@ int xodtemplate_resolve_service(xodtempl
 			this_service->is_volatile=template_service->is_volatile;
 			this_service->have_is_volatile=TRUE;
 	                }
+		/* MHA */
+		if(this_service->have_stop_checking_on_success==FALSE && template_service->have_stop_checking_on_success==TRUE){
+			this_service->stop_checking_on_success=template_service->stop_checking_on_success;
+			this_service->have_stop_checking_on_success=TRUE;
+	                }
 		if(this_service->have_obsess_over_service==FALSE && template_service->have_obsess_over_service==TRUE){
 			this_service->obsess_over_service=template_service->obsess_over_service;
 			this_service->have_obsess_over_service=TRUE;
@@ -10715,7 +10727,7 @@ int xodtemplate_register_service(xodtemp
 		return OK;
 
 	/* add the service */
-	new_service=add_service(this_service->host_name,this_service->service_description,this_service->display_name,this_service->check_period,this_service->initial_state,this_service->max_check_attempts,this_service->parallelize_check,this_service->passive_checks_enabled,this_service->check_interval,this_service->retry_interval,this_service->notification_interval,this_service->first_notification_delay,this_service->notification_period,this_service->notify_on_recovery,this_service->notify_on_unknown,this_service->notify_on_warning,this_service->notify_on_critical,this_service->notify_on_flapping,this_service->notify_on_downtime,this_service->notifications_enabled,this_service->is_volatile,this_service->event_handler,this_service->event_handler_enabled,this_service->check_command,this_service->active_checks_enabled,this_service->flap_detection_enabled,this_service->low_flap_threshold,this_service->high_flap_threshold,this_service->flap_detection_on_ok,this_service->flap_detection_on_warning,this_service->flap_detection_on_unknown,this_service->flap_detection_on_critical,this_service->stalk_on_ok,this_service->stalk_on_warning,this_service->stalk_on_unknown,this_service->stalk_on_critical,this_service->process_perf_data,this_service->failure_prediction_enabled,this_service->failure_prediction_options,this_service->check_freshness,this_service->freshness_threshold,this_service->notes,this_service->notes_url,this_service->action_url,this_service->icon_image,this_service->icon_image_alt,this_service->retain_status_information,this_service->retain_nonstatus_information,this_service->obsess_over_service);
+	new_service=add_service(this_service->host_name,this_service->service_description,this_service->display_name,this_service->check_period,this_service->initial_state,this_service->max_check_attempts,this_service->parallelize_check,this_service->passive_checks_enabled,this_service->check_interval,this_service->retry_interval,this_service->notification_interval,this_service->first_notification_delay,this_service->notification_period,this_service->notify_on_recovery,this_service->notify_on_unknown,this_service->notify_on_warning,this_service->notify_on_critical,this_service->notify_on_flapping,this_service->notify_on_downtime,this_service->notifications_enabled,this_service->is_volatile,this_service->stop_checking_on_success,this_service->event_handler,this_service->event_handler_enabled,this_service->check_command,this_service->active_checks_enabled,this_service->flap_detection_enabled,this_service->low_flap_threshold,this_service->high_flap_threshold,this_service->flap_detection_on_ok,this_service->flap_detection_on_warning,this_service->flap_detection_on_unknown,this_service->flap_detection_on_critical,this_service->stalk_on_ok,this_service->stalk_on_warning,this_service->stalk_on_unknown,this_service->stalk_on_critical,this_service->process_perf_data,this_service->failure_prediction_enabled,this_service->failure_prediction_options,this_service->check_freshness,this_service->freshness_threshold,this_service->notes,this_service->notes_url,this_service->action_url,this_service->icon_image,this_service->icon_image_alt,this_service->retain_status_information,this_service->retain_nonstatus_information,this_service->obsess_over_service); /* MHA: added stop_checking_on_success */
 
 	/* return with an error if we couldn't add the service */
 	if(new_service==NULL){
diff -rupN icinga-1.0.1/xdata/xodtemplate.h icinga-1.0.1.MHA/xdata/xodtemplate.h
--- icinga-1.0.1/xdata/xodtemplate.h	2010-03-03 10:29:37.000000000 +0100
+++ icinga-1.0.1.MHA/xdata/xodtemplate.h	2010-03-12 12:26:48.000000000 +0100
@@ -382,6 +382,7 @@ typedef struct xodtemplate_service_struc
         int        passive_checks_enabled;
         int        parallelize_check;
         int        is_volatile;
+        int        stop_checking_on_success; /* MHA */
         int        obsess_over_service;
         char       *event_handler;
         int        event_handler_enabled;
@@ -449,6 +450,7 @@ typedef struct xodtemplate_service_struc
         int        have_passive_checks_enabled;
         int        have_parallelize_check;
         int        have_is_volatile;
+        int        have_stop_checking_on_success; /* MHA */
         int        have_obsess_over_service;
         int        have_event_handler_enabled;
         int        have_check_freshness;
--------------------------------------------------- snip -------------------------------------------------------------






Cheers,


Mark Halpaap
