[icinga-users] Host retry interval not adhered to

Michael Friedrich michael.friedrich at gmail.com
Wed Jan 30 11:52:31 CET 2013


On 30.01.2013 11:41, Mike Noordermeer wrote:
> Hi,
>
> interval_length is 60 (and all these settings are confirmed by the web
> interface, which shows a retry interval of '1m 0s').
>
> It's running version 1.7.0-4~bpo60+1 (Debian Backport on Debian
> Squeeze).

AFAIK there's a newer version available in backports:

http://packages.debian.org/squeeze-backports/icinga

That 1.7.1 package is a special revision with several patches from 1.7.2 
and later applied.

https://wiki.icinga.org/display/Dev/Icinga+Core+Changelog#IcingaCoreChangelog-172-27082012

The first two entries of the 1.7.2 changelog will likely fix the problem 
you describe:

core: fix duplicated events on check scheduling logic for new events 
(Andreas Ericsson) #2676 #2993 - MF
core: avoid duplicate events when scheduling forced host|service check 
(Imri Zvik) #2993 - MF

I guess you should upgrade to the latest stable release then.
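
To compare the behaviour before and after an upgrade, the gap between 
consecutive host alerts can be checked against the configured retry 
interval. The following is only a minimal Python sketch, not part of 
Icinga: the log lines are the HOST ALERT entries quoted further down 
(wrapped lines joined, oldest first), the interval values are the ones 
reported in this thread, and the helper names alert_time and 
observed_gaps are made up for illustration.

```python
from datetime import datetime

# HOST ALERT entries copied from the event log quoted in this thread
# (wrapped lines joined, sorted oldest first).
LOG_LINES = [
    "[2013-01-30 02:04:21] HOST ALERT: host.nl;DOWN;SOFT;1;CRITICAL - host: rta nan, lost 100%",
    "[2013-01-30 02:04:31] HOST ALERT: host.nl;DOWN;SOFT;2;CRITICAL - host: rta nan, lost 100%",
    "[2013-01-30 02:04:41] HOST ALERT: host.nl;DOWN;HARD;3;CRITICAL - host: rta nan, lost 100%",
]

INTERVAL_LENGTH = 60  # seconds per interval unit, as reported in the thread
RETRY_INTERVAL = 1    # host retry interval in units, as reported in the thread


def alert_time(line):
    """Parse the leading "[YYYY-MM-DD HH:MM:SS]" timestamp of a log line."""
    stamp = line[1:line.index("]")]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")


def observed_gaps(lines):
    """Return the seconds between consecutive HOST ALERT entries."""
    times = [alert_time(line) for line in lines if "HOST ALERT:" in line]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


expected = INTERVAL_LENGTH * RETRY_INTERVAL
gaps = observed_gaps(LOG_LINES)
print("expected retry gap:", expected, "s")
print("observed gaps:", gaps, "s")
```

For the three entries above this reports gaps of 10 seconds each against 
an expected 60 seconds, which matches the symptom described. Running the 
same comparison on fresh log entries after the upgrade would show 
whether the retry interval is adhered to again.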

> I've looked through the changelog, but couldn't find anything
> host retry/check interval related since that version. Sometimes the 1
> min retry interval is adhered to, but I couldn't find any logic in this.
>
> The config is quite extensive (using check_mk etc.), but I believe the
> base settings did not change from the default settings (only the check
> timeouts were increased a bit as far as I can see).
>
> Regards,
>
> Mike
>
> On 30/01/2013 11:34, Michael Friedrich wrote:
>> On 30.01.2013 10:56, Mike Noordermeer wrote:
>>> Hi,
>>>
>>> My host has a host check interval of 1 min, retry interval of 1 min and
>>> max check attempts of 3. Now I'm seeing the following in the event log
>>> for this host:
>>
>> Please provide the following information up front:
>>
>> - os and version
>> - icinga version
>> - icinga install method (src, pkg, etc)
>> - configuration modifications
>>
>>>
>>> [2013-01-30 02:04:41] HOST NOTIFICATION:
>>> noc-email;host.nl;DOWN;notify-host-by-email;CRITICAL - host: rta nan,
>>> lost 100%
>>> [2013-01-30 02:04:41] HOST ALERT: host.nl;DOWN;HARD;3;CRITICAL - host:
>>> rta nan, lost 100%
>>> [2013-01-30 02:04:31] SERVICE ALERT: host.nl;MSSQL lock
>>> timeouts;CRITICAL;HARD;1;CRITICAL - cannot connect to host. DBI
>>> connect(':server=host','monitor',...) failed: OpenClient message: LAYER
>>> = (0) ORIGIN = (0) SEVERITY = (78) NUMBER = (41)
>>> [2013-01-30 02:04:31] HOST ALERT: host.nl;DOWN;SOFT;2;CRITICAL - host:
>>> rta nan, lost 100%
>>> [2013-01-30 02:04:21] HOST ALERT: host.nl;DOWN;SOFT;1;CRITICAL - host:
>>> rta nan, lost 100%
>>>
>>> Does anyone have an idea why it seems to perform the host check every 10
>>> seconds, instead of every minute (and is thus generating a notification
>>> after 30 seconds instead of 3 minutes)?
>>>
>>
>>
>
>


-- 
DI (FH) Michael Friedrich

mail:     michael.friedrich at gmail.com
twitter:  https://twitter.com/dnsmichi
jabber:   dnsmichi at jabber.ccc.de
irc:      irc.freenode.net/icinga dnsmichi

icinga open source monitoring
position: lead core developer
url:      https://www.icinga.org



