A well-known quote from Bruce Schneier is “Security is a process, not a product”. Monitoring your infrastructure is an integral part of your security policy. Deploying security blocks (applications, servers, appliances, …) to build your security perimeter(s) is not enough; you also need to take care of them with monitoring tools. Monitoring is also a process. Here is a “nice” example (even if nice is not the right word here).
A Belgian incinerator not far from Charleroi operated without a filter for 20 (!) hours on 11 June, spitting ash, dioxins and furans into the air. The residents living close to the site were only notified last Tuesday.
The statement given to the press was: “After a power outage, an oven operated without a filter and the incident was not detected. There are visual alarms on the monitors but the staff did not detect the problem”. [Original article in French here – Google translation here]
First of all, this incident could have a negative impact on the people living near the incinerator: they may have inhaled toxic substances. But it is also a good (or rather bad) example of a monitoring process that failed!
Such a story may also happen in IT and have negative impacts on human health (maybe less so), financial markets or service providers (Internet, electricity, etc.). Spending huge amounts of resources (financial as well as human) on monitoring tools without defining strong processes to handle alarms will turn the project into a failure.
Can we blame the operators who did not see the alarms in the above story? It’s not possible to answer this question without an analysis of the whole process. Let’s imagine that the monitoring tool was too sensitive and had generated alarms five times a day for three months. Chances are that the operators would stop taking the alarms into account. Their reaction would logically be “Don’t care, this f*cking filter is stuck again!” Until this time…
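One common way to make a tool less over-sensitive is to require a problem to persist before raising an alarm. The following is a minimal sketch, assuming a check that returns OK or not OK at a fixed interval. The alarm only fires after `threshold` consecutive failed checks, so isolated glitches are absorbed instead of paging the operators. `DebouncedAlarm` and the threshold value are hypothetical and not taken from any specific product. The idea is similar in spirit to the soft/hard state distinction found in tools such as Nagios.

```python
from dataclasses import dataclass


@dataclass
class DebouncedAlarm:
    """Hypothetical alarm that fires only after `threshold` consecutive failures."""
    threshold: int = 3
    consecutive_failures: int = 0
    active: bool = False

    def update(self, check_ok: bool) -> str:
        """Feed one check result and return the resulting alarm state."""
        if check_ok:
            was_active = self.active
            # A successful check resets the counter and clears the alarm.
            self.consecutive_failures = 0
            self.active = False
            return "recovered" if was_active else "ok"

        self.consecutive_failures += 1
        if self.active:
            # Already alarming: the problem is still there and must not be ignored.
            return "still-failing"
        if self.consecutive_failures >= self.threshold:
            self.active = True
            return "alarm"
        # Failing, but not for long enough to bother the operators yet.
        return "pending"


if __name__ == "__main__":
    alarm = DebouncedAlarm(threshold=3)
    # One isolated glitch, then a persistent failure, then a recovery.
    checks = [False, True, False, False, False, False, True]
    print([alarm.update(ok) for ok in checks])
    # → ['pending', 'ok', 'pending', 'pending', 'alarm', 'still-failing', 'recovered']
```

A higher threshold means fewer false positives but a slower reaction to real failures. Picking it is exactly the kind of fine-tuning that needs operator feedback, and no threshold replaces a process for handling the alarms that do fire.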
That’s why a monitoring project must follow a cycle similar to the risk management one:
There are recurrent steps to perform:
1. Inventory of assets to monitor
2. Select the best monitoring methodology/tools
3. Implement basic checks
4. Perform fine-tuning
5. Write procedures (incident management)
6. Train the operators
7. Deploy the solution
8. Listen to the operators’ feedback
9. Perform fine-tuning
10. Goto 1.
By following those steps, you will reduce the number of false-positive alerts and increase the interaction with operators (and give them the feeling of being involved). But your IT environment changes all the time: new servers are installed, others are decommissioned, and servers become overloaded during specific events. All those scenarios must be taken into account to build an effective solution. That’s why, yes, monitoring is also a process!
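Restarting at step 1 matters because the inventory drifts. The following is a minimal sketch, assuming you can export two lists of hostnames: one from your asset inventory and one from your monitoring configuration. The function `inventory_drift` and the hostnames are hypothetical. The check simply compares the two sets to find servers running without monitoring and checks still pointing at decommissioned servers.

```python
def inventory_drift(deployed: set[str], monitored: set[str]) -> tuple[list[str], list[str]]:
    """Compare deployed assets with monitored ones (hypothetical helper).

    Returns (unmonitored, stale):
    - unmonitored: hosts that exist but have no monitoring configured
    - stale: monitoring entries for hosts that no longer exist
    """
    unmonitored = sorted(deployed - monitored)
    stale = sorted(monitored - deployed)
    return unmonitored, stale


if __name__ == "__main__":
    deployed = {"web01", "web02", "db01"}    # web02 was recently installed
    monitored = {"web01", "db01", "mail01"}  # mail01 was decommissioned
    unmonitored, stale = inventory_drift(deployed, monitored)
    print("Not monitored:", unmonitored)     # → Not monitored: ['web02']
    print("Stale checks:", stale)            # → Stale checks: ['mail01']
```

Stale checks are not harmless. A check on a server that no longer exists fails forever and trains operators to ignore red lights, which is exactly the trap described above.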