I would like to tell you about a situation I experienced this afternoon. The goal of a log management solution is to collect and store events from multiple devices and applications in a central, safe place. Using search and reporting tools, useful information can then be extracted from those events to investigate incidents or suspicious behavior. During a live implementation, I started to collect Syslog messages from a bunch of Cisco switches and routers. While checking whether the events were correctly normalized and processed, I discovered a lot of “traceback” messages like the following one:
-Process= "xxx", level= 0, pid= 172 -Traceback= 1A32 1FB4 5478 B172 1054 1860 ...
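To make such messages easy to search and report on, a normalization rule can extract their fields. The snippet below is a minimal Python sketch, assuming the message body looks exactly like the fragment above. Real Cisco messages carry a syslog header and mnemonic before this part, and the layout may vary between IOS releases, so the regular expression and the `parse_traceback` helper are hypothetical and not taken from any product.

```python
import re

# Hypothetical pattern for the traceback fragment shown above. Real messages
# have a syslog header in front, and the layout may differ between releases.
TRACEBACK_RE = re.compile(
    r'-Process=\s*"(?P<process>[^"]*)",\s*level=\s*(?P<level>\d+),\s*'
    r'pid=\s*(?P<pid>\d+)\s*-Traceback=\s*(?P<frames>[0-9A-Fa-f ]+)'
)


def parse_traceback(message):
    """Return the fields of a traceback message as a dict, or None."""
    match = TRACEBACK_RE.search(message)
    if match is None:
        return None
    return {
        "process": match.group("process"),
        "level": int(match.group("level")),
        "pid": int(match.group("pid")),
        # Hex addresses kept as strings: they are what Cisco support needs
        "frames": match.group("frames").split(),
    }


if __name__ == "__main__":
    sample = '-Process= "xxx", level= 0, pid= 172 -Traceback= 1A32 1FB4 5478 B172 1054 1860'
    print(parse_traceback(sample))
```

Once the fields are extracted, the traceback addresses can be stored with the event and handed over as-is when the problem is reported.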
For the Cisco administrators amongst you, this indicates a problem: the device generated debug information, which is always useful when reporting the problem to Cisco. I’m not a Cisco admin, but it looked suspicious to me, so I reported it to the local network admin:
Me: “FYI, I detected suspicious behavior on the router xxx: IOS is regularly generating tracebacks.”
Admin: (He checks) “Hmmm… My device is working as expected.”
Me: “Maybe, but there is an issue on this device!”
Admin: “Could you implement a filter on the log management platform to get rid of those events? They are not important to me.”
Me: “Technically, I could. But they’re generated for a good reason! You should investigate…”
A few minutes later…
Admin: “OK, I reduced the log level. You shouldn’t see them anymore.”
Indeed, no more traceback events were collected by the log management platform. I suppose he applied the following configuration:
Router# conf t
Router(config)# no service log backtrace
Router(config)# end
From a technical point of view, this guy was right: it’s always possible to filter out some “unwanted” events and prevent them from being processed and indexed. However, how do you define an “unwanted” event? The admin was wrong to reduce the log level. Again, the goal of a log management solution is to distinguish critical or suspicious events from the continuous flow of events generated by your infrastructure. If you implement overly strict filters, you risk missing interesting events.
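A rule on the log management side does not have to mean deletion. The following minimal Python sketch assumes that events arrive as `(host, message)` pairs. Instead of dropping anything, it tags each event and counts suspicious ones per host, so recurring tracebacks stay visible. The `classify` function, the marker list and the labels are hypothetical illustrations, not features of a particular product.

```python
from collections import Counter

# Hypothetical markers: "-Traceback=" comes from the message shown earlier;
# extend the tuple with whatever your own devices emit.
SUSPICIOUS_MARKERS = ("-Traceback=",)


def classify(events):
    """Tag every (host, message) pair instead of dropping any of them.

    Returns the tagged events and a per-host count of suspicious ones,
    so recurring problems show up on a dashboard rather than vanishing.
    """
    tagged = []
    per_host = Counter()
    for host, message in events:
        suspicious = any(marker in message for marker in SUSPICIOUS_MARKERS)
        if suspicious:
            per_host[host] += 1
        tagged.append((host, message, "investigate" if suspicious else "normal"))
    return tagged, per_host


if __name__ == "__main__":
    sample = [
        ("router-a", '-Process= "xxx", level= 0, pid= 172 -Traceback= 1A32 1FB4'),
        ("router-a", "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to up"),
    ]
    events, counts = classify(sample)
    for event in events:
        print(event)
    print(counts)
```

Every event is still indexed. The tag only decides how loudly it is reported, which leaves room to investigate the root cause later.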
Don’t be an ostrich! A log management platform that never raises alerts, or a dashboard always full of green indicators, is not reliable: it gives a false sense of security. Instead of getting rid of those events with filters, search for the root causes!