[Icinga-devel] Icinga 2 Development Week 19/14

Michael Friedrich michael.friedrich at gmail.com
Mon May 12 10:54:12 CEST 2014


Summary for Icinga 2 Development Week 19/14:

Target version: 0.0.11

https://dev.icinga.org/projects/i2/issues?query_id=21

3.+4.5.: "weekend"
5.-9.5.: work on cluster changes & open issues
10.+11.5.: "weekend"

-----------------------------------------------------------
Work done:
-----------------------------------------------------------

* rename Dependency 'state_filter' to 'states' #6113
* Documentation: Apply new structure. #6115
* Don't allow "managed" downtimes to be deleted by users #5980
* Livestatus: test host comments with joins #5937
* Dependencies: Service states changes to critical instead of 
unknown/unreachable #5872
* StatusDataWriter only supports host->host, service->service 
dependencies #6131
* Decrease default check intervals. #6107
* Remove the ZlibStream class and the stream_bio functionality #6119
* Apply: Inherit zone from parent object. #6107
* Reimplement load-balancing for checks. #6107
* Implement HA for IDO connections. #6107
* Fix an issue where expired Timer pointers caused other timers to be 
delayed. #6179
* Config validator: Make sure that objects are not abstract. #6148
* Implement support for arrays for the indexer operator #6182
* enable/disable commands do not update status tables #6151
* Check if livestatus log functionality still works #6161
* ITL: Move monitoring plugin commands into a separate config file #6130
* Documentation: add developers section #6184
* DB IDO/Livestatus/Status files: add 'is_reachable' to host and service 
state tables #6094
* Livestatus: add check_source to host table #6185
* Documentation & Feature: command argument conditionals #5933
* rename host.total_* runtime macros to host.num_* #6189
* Documentation: migration: runtime macros renamed #6149
* Documentation: explain how macro resolving works #6010
* Non sticky acknowledgements won't be removed once the host/service 
recovers #5363
* Remove unnecessary includes #6189


-----------------------------------------------------------
Ongoing
-----------------------------------------------------------

* Cluster changes #6192
** Implement shared API primitives for the cluster #6107
** Reimplement load-balancing for checks. #6107
** Implement HA for IDO connections. #6107 #4739
** Zone configuration sync #6191
* migration script #5821
* Everything else: https://dev.icinga.org/projects/i2/issues?query_id=21


-----------------------------------------------------------
Changes
-----------------------------------------------------------

* Adding 'is_reachable' to the host and service state tables requires 
Classic UI 1.11.3


* Cluster version 3: ClusterListener and Domains are gone. New: 
ApiListener and Zones. #6192

There's a generic ApiListener object defining the SSL certificates 
required for this instance. The new default bind port is '5665'. This 
feature is called 'api' and can be enabled with

# icinga2-enable-feature api

object ApiListener "api" {
   cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
   key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
   ca_path = SysconfDir + "/icinga2/pki/ca.crt"
}

For now, this is used for the cluster functionality only, but it 
serves as a solid base for future implementations ("agent", "api", etc.) 
in 2.1+ (Icinga 2 development doesn't stop with 2.0!).

The Endpoint objects still exist, but they no longer control the 
configuration sync. We've played around with the view-based 
configuration and it just did not feel right. Therefore the endpoint 
definitions are the same on all involved nodes, and must be kept in 
sync and available wherever they are required.
The port now defaults to '5665', so specifying it is optional.

object Endpoint "icinga2a" {
   host = "icinga2a.localdomain"
}

Additionally, the keep_alive for connections can be configured, as well 
as the log_duration for keeping relay logs on connection loss. They 
default to 5m and 1d, respectively.
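
As an illustration, an Endpoint that sets these values explicitly might 
look like the sketch below. It assumes that port, keep_alive and 
log_duration are all attributes of the Endpoint object, which the text 
above implies but does not state outright. The values shown are simply 
the defaults mentioned here, so leaving them out should behave the same.

object Endpoint "icinga2a" {
   host = "icinga2a.localdomain"
   port = 5665          // assumed attribute name; the default, may be omitted
   keep_alive = 5m      // assumed to live on the Endpoint; the default
   log_duration = 1d    // assumed to live on the Endpoint; the default
}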

That way, the connection stuff for the cluster is "basically" like it 
has been before, but with one difference: Zones.

A Zone in Icinga 2 declares a trust relationship among multiple nodes. 
All nodes in a zone are considered to be running in a high availability 
active/active setup, which means they elect one active master at runtime.
If the active master dies, the ongoing heartbeat messages will guarantee 
failover detection and make the remaining instances elect a new active 
master.

object Zone "ha-master" {
   endpoints = [ "icinga2a", "icinga2b" ]
}

object Zone "check-satellite" {
   endpoints = [ "icinga2c", "icinga2d", "icinga2e" ]
   parent = "ha-master"
}
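
The zones above refer to their endpoints by name, so presumably each of 
those names needs a matching Endpoint object next to the "icinga2a" one 
shown earlier. A minimal sketch, assuming the other nodes follow the 
same "<name>.localdomain" naming (these host names are made up for 
illustration and are not taken from a real setup):

object Endpoint "icinga2b" {
   host = "icinga2b.localdomain"   // hypothetical host name
}

object Endpoint "icinga2c" {
   host = "icinga2c.localdomain"   // hypothetical host name
}

object Endpoint "icinga2d" {
   host = "icinga2d.localdomain"   // hypothetical host name
}

object Endpoint "icinga2e" {
   host = "icinga2e.localdomain"   // hypothetical host name
}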


Communication between zones may happen between all involved nodes, but 
if a "passive" node gets a checkresult, it will forward it to the active 
master which then processes the result (replication to other nodes or 
zones, notifications, backend features like ido).

High availability also means that features like IDO should only run on 
the active node, while the passive ones remain in standby (the feature 
is paused). If there's a split-brain situation, both (or all) instances 
will attempt to write to IDO, for example, or fire notifications.

Load distribution for checks and notifications works like before: all 
active checkers will share the check load. If you're planning to check a 
specific zone on only one satellite, just assign it like that.
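
One reading of this is a zone that lists a single endpoint, so there is 
no other checker to share the load with. The following is only a sketch 
under that assumption; the zone name "single-satellite" and the endpoint 
"icinga2f" are hypothetical and do not appear elsewhere in this setup.

object Zone "single-satellite" {
   endpoints = [ "icinga2f" ]   // hypothetical endpoint, the only checker
   parent = "ha-master"
}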

Multiple zones can be stacked into a parent-child tree, and the 
configuration sync is meant to follow that tree. Zone configuration 
will be placed in

   /etc/icinga2/zones.d/<zonename>

(to be implemented in #6191). Additional permissions will be required to
receive the configuration.

Currently, the following already works:

* ApiListener, Endpoints, Zones.
* HA for IDO connections
* Load Balancing for Checks
* active master election in zones

To-do:

* configuration sync in/between zones
* additional permissions


-----------------------------------------------------------
Feedback/Tests required
-----------------------------------------------------------

* Documentation: Read through it and try getting started. Is everything 
clear? What's missing, unclear, or could be phrased better?

This is clearly a call to native English speakers (which I am not). Help 
us shape the documentation! Git patches or GitHub pull requests 
preferred :)

* Backends: Install IDO, Livestatus, Status files. Use your favourite 
GUI/addon and test the functionality.

* Command arguments: Try them out. How does it feel to optionally add 
arguments to check_http and such?

* Apply rules: Use them. Tell us what you think.

* Cluster: It's still work in progress, but keep an eye on that, and try 
the current state of the art.




-- 
DI (FH) Michael Friedrich

michael.friedrich at gmail.com  || icinga open source monitoring
https://twitter.com/dnsmichi || lead core developer
dnsmichi at jabber.ccc.de       || https://www.icinga.org/team
irc.freenode.net/icinga      || dnsmichi

