[icinga-checkins] icinga.org: icinga-core/mfriedrich/core: core: avoid duplicate events when scheduling forced host|service check ( Imri Zvik) #2993

git at icinga.org git at icinga.org
Sun Aug 19 19:24:53 CEST 2012


Module: icinga-core
Branch: mfriedrich/core
Commit: ec9c5e327eebd3bcaf3442470484c46a46ae2efe
URL:    https://git.icinga.org/?p=icinga-core.git;a=commit;h=ec9c5e327eebd3bcaf3442470484c46a46ae2efe

Author: Michael Friedrich <michael.friedrich at gmail.com>
Date:   Sun Aug 19 19:09:21 2012 +0200

core: avoid duplicate events when scheduling forced host|service check (Imri Zvik) #2993

previously, we had introduced a hash-like implementation of
host|service->next_check_event directly pointing to the next
scheduled event. this algorithm is being used within
schedule_host|service_check, determing wether to use the
already assigned event, or scheduling a new event. Since we
did not populate the event_data (host or service object) when
adding a new event to the scheduler, this became insame, always
rescheduling a new event, but not discarding the old one.

This has been partly fixed in #2676 with refactoring that detection
and saving the next_check_event accordingly. But on already scheduled
events which were forced (overridden), another bug was unveiled.

Now that we add the reverse pointer from the host|service event_data
back to the newly created event when forcing a check, we will make sure
that those events are checked correctly, and executed/discarded in the
first place, and not always creating a new event, seperated from the rest.

basically, using the previous implementation (with and without the fix
from #2676) we've created an event bomb under various circumstances,
especially when future events would have been overridden (forced checks).
as events usually result in checks, which can result into perfdata, this
could possibly the root cause for #2924 as well, as other users reported
on the portal as well.

http://www.monitoring-portal.org/wbb/index.php?page=Thread&threadID=26352

With the kind patch provided by Imri Zvik, this now works like expected.
Adapted the debug output a bit myself, so with debug_level=272 it will be
easy to see what happens on events scheduling - and not the non-telling
mess before.

kudos to Imri Zvik for the patch.

refs #2993
refs #2676
refs #2182
refs #2924

---

 Changelog     |    1 +
 THANKS        |    1 +
 base/checks.c |   12 +++++++-----
 base/events.c |   16 ++++++++++++++++
 4 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/Changelog b/Changelog
index c1d936a..56eae25 100644
--- a/Changelog
+++ b/Changelog
@@ -39,6 +39,7 @@ FIXES
 * core: fsync() files before fclose() (Andreas Ericsson) #2949 - MF
 * ocre: remove weird switch() statement when scanning checkresult queue (Andreas Ericsson) #2950 - MF
 * core: fix deleting too old check result files (Andreas Ericsson) #2951 - MF
+* core: avoid duplicate events when scheduling forced host|service check (Imri Zvik) #2993
 
 CHANGES
 * core: new command
diff --git a/THANKS b/THANKS
index d11030f..6a8a243 100644
--- a/THANKS
+++ b/THANKS
@@ -343,3 +343,4 @@ in various ways.  If we missed your name, let us know.
 * Michal Zimen
 * Dennis van Zuijlekom
 * Pawel Zuzelski
+* Imri Zvik
diff --git a/base/checks.c b/base/checks.c
index 3c48607..c6f1e63 100644
--- a/base/checks.c
+++ b/base/checks.c
@@ -1895,7 +1895,7 @@ void schedule_service_check(service *svc, time_t check_time, int options) {
 	 */
 	if (temp_event != NULL) {
 
-		log_debug_info(DEBUGL_CHECKS, 2, "Found another service check event for this service @ %s", ctime(&temp_event->run_time));
+		log_debug_info(DEBUGL_CHECKS, 2, "Found another service check event for service '%s' on host '%s' @ %s", svc->description, svc->host_name, ctime(&temp_event->run_time));
 
 		/* use the originally scheduled check unless we decide otherwise */
 		use_original_event = TRUE;
@@ -1939,6 +1939,8 @@ void schedule_service_check(service *svc, time_t check_time, int options) {
 	 */
 	if (use_original_event == FALSE) {
 
+		log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new service check event for '%s' on host '%s' @ %s", svc->description, svc->host_name, ctime(&check_time));
+
 		/* allocate memory for a new event item */
 		new_event = (timed_event *)malloc(sizeof(timed_event));
 
@@ -1949,12 +1951,11 @@ void schedule_service_check(service *svc, time_t check_time, int options) {
 
 		/* make sure we kill off the old event */
 		if (temp_event) {
+			log_debug_info(DEBUGL_CHECKS, 2, "Removing service check event for service '%s' on host '%s' @ %s", svc->description, svc->host_name, ctime(&temp_event->run_time));
 			remove_event(temp_event, &event_list_low, &event_list_low_tail);
 			my_free(temp_event);
 		}
 
-		log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new service check event.\n");
-
 		/* set the next service check event and time */
 		svc->next_check_event = new_event;
 		svc->next_check = check_time;
@@ -2371,7 +2372,7 @@ void schedule_host_check(host *hst, time_t check_time, int options) {
 	 */
 	if (temp_event != NULL) {
 
-		log_debug_info(DEBUGL_CHECKS, 2, "Found another host check event for this host @ %s", ctime(&temp_event->run_time));
+		log_debug_info(DEBUGL_CHECKS, 2, "Found another host check event for host '%s' @ %s", hst->name, ctime(&temp_event->run_time));
 
 		/* use the originally scheduled check unless we decide otherwise */
 		use_original_event = TRUE;
@@ -2414,7 +2415,7 @@ void schedule_host_check(host *hst, time_t check_time, int options) {
 	 */
 	if (use_original_event == FALSE) {
 
-		log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new host check event.\n");
+		log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new host check event for '%s' @ %s", hst->name, ctime(&check_time));
 
 		/* allocate memory for a new event item */
 		if((new_event = (timed_event *)malloc(sizeof(timed_event))) == NULL) {
@@ -2423,6 +2424,7 @@ void schedule_host_check(host *hst, time_t check_time, int options) {
 		}
 
 		if (temp_event) {
+			log_debug_info(DEBUGL_CHECKS, 2, "Removing host check event for host '%s' @ %s", hst->name, ctime(&temp_event->run_time));
 			remove_event(temp_event, &event_list_low, &event_list_low_tail);
 			my_free(temp_event);
 		}
diff --git a/base/events.c b/base/events.c
index 7af5e96..4e4b1eb 100644
--- a/base/events.c
+++ b/base/events.c
@@ -935,6 +935,22 @@ int schedule_new_event(int event_type, int high_priority, time_t run_time, int r
 		new_event->event_interval = event_interval;
 		new_event->timing_func = timing_func;
 		new_event->compensate_for_time_change = compensate_for_time_change;
+		/*
+		 * we need to keep the reverse link from the (service|host *)event_data->next_check_event
+		 * to the new_event in order to stay sane on schedule_host|service_check() checks
+		 * later on, already having a new event assigned to host/service, not rescheduling a new event.
+		 * see #2993 for deeper analysis
+		 */
+		if (event_type == EVENT_SERVICE_CHECK) {
+			service *temp_service = (service *)event_data;
+			temp_service->next_check_event = new_event;
+			log_debug_info(DEBUGL_CHECKS, 2, "Service '%s' on Host '%s' next_check_event populated\n", temp_service->description, temp_service->host_name);
+		}
+		if (event_type == EVENT_HOST_CHECK) {
+			host *temp_host = (host *)event_data;
+			temp_host->next_check_event = new_event;
+			log_debug_info(DEBUGL_CHECKS, 2, "Host '%s' next_check_event populated\n", temp_host->name);
+		}
 	} else
 		return ERROR;
 





More information about the icinga-checkins mailing list