If there is one golden principle in IT, it's “KISS”: “Keep It Simple, Stupid”. It says that systems work best when they are kept simple rather than complex; simplicity must be a key goal during the design phase. This sounds logical: keep in mind that information systems must be maintained, patched, debugged and monitored. When a problem occurs and your boss puts pressure on you to solve it asap, it will be much easier if the architecture is simple (and documented!). But I admit that it's not always easy to stick to this principle. Here is a good example with logs…
Today, one of my projects was to integrate DNS logs with Splunk. On paper, it was easy:
- Splunk running ✔
- Bind running with queries logged ✔
- Splunk for Bind app installed ✔
Let's see the dashboard and… nothing! The classic message displayed by Splunk: “Waiting for data”. Hmmm… Let's take a deep breath and jump into the app code and configuration files. After a few checks, a first problem was identified: the app was designed to process Bind logs in Syslog format, but my Bind queries were logged to a “queries.log” flat file… and the formats are not the same (oh joy!):
Via Syslog:

```
Feb 5 21:42:22 shiva named[27229]: client 192.168.254.206#6993: \
  query: safebrowsing.google.com IN A + (192.168.254.8)
```

In the flat file:

```
05-Feb-2014 21:42:22.868 queries: info: client 192.168.254.206#6993: \
  query: safebrowsing.google.com IN A + (192.168.254.8)
```
The “queries.log” file being read by an OSSEC server for HIDS purposes, it was not possible to replace it with a single Syslog feed. I reconfigured Bind to generate two entries for each query: one sent to the flat file and one sent via Syslog.
```
logging {
    channel local_syslog {
        syslog local0;
        severity info;
    };
    channel queries_log {
        file "/var/log/named/queries.log" versions 10 size 10m;
        severity info;
        print-category yes;
        print-severity yes;
        print-time yes;
    };
    channel stat_file {
        file "/var/log/named/stats.log" versions 3 size 1k;
    };
    category queries { queries_log; local_syslog; };
};
```
Bind & Rsyslog reconfigured, I had two destinations for my queries (and twice the storage used on disk!). But it was still not working. Jumping again into the configuration files (more precisely “transforms.conf”), I found a second problem related to the regex used to parse queries. As seen in the Syslog event above, the date, the time and the view details were not present in my events but were expected by the regex:
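On the Rsyslog side, forwarding the local0 facility is enough. A minimal sketch (the file path, destination host and port are assumptions, adapt them to your Splunk network input):

```
# /etc/rsyslog.d/bind.conf (hypothetical path)
# Forward Bind query events (facility local0) to the Splunk
# server listening for Syslog on UDP 514
local0.info    @splunk.example.com:514
```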
```
[named_query]
REGEX=\s(named)\[\d+\]\:\s(\S+)\s(\d+\:\d+\:\d+\.\d+)\sclient\s(\d+\.\d+\.\d+\.\d+)\#\d+\:\sview\s(\S+)\s(query)\:\s(\S+)\s(\S+\s\S+)\s(\S+)
FORMAT=process::$1 named_query_date::$2 named_query_time::$3 src_ip::$4 named_view::$5 named_message_type::$6 named_lookup::$7 named_query_type::$8 named_query_code::$9
```
I fixed the regex to match my Syslog events and it worked, but I still had the same events indexed twice! Getting rid of the flat file was not possible (it is used by OSSEC), so I rewrote my transforms.conf once more and adapted the Bind app to parse the same file as OSSEC.
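For reference, the adapted stanza can be sketched as follows. This is only an illustration of the idea, not the exact stanza I deployed: the field names mirror the original app (minus the view, which is absent from my flat file), and the regex depends on your Bind print-time format:

```
[named_query]
REGEX=(\S+)\s(\d+:\d+:\d+\.\d+)\squeries:\sinfo:\sclient\s(\d+\.\d+\.\d+\.\d+)\#\d+:\s(query):\s(\S+)\s(\S+\s\S+)\s(\S+)
FORMAT=named_query_date::$1 named_query_time::$2 src_ip::$3 named_message_type::$4 named_lookup::$5 named_query_type::$6 named_query_code::$7
```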
Conclusions: managing logs is not an easy task and may require time, even more if your infrastructure changes often. There are multiple ways to collect events. The most common are flat files, Syslog (UNIX) or the Event Viewer (Windows) and, sometimes, databases. Keep your logging configuration clean and simple. The goals are:
- Avoid redundant storage
- Avoid lost events (worst case!)
- Have a clear view of the event flows to debug any issue (for sure, you will have some!)
Also be sure to cover all event types by testing your tools with exhaustive sets of data, and document how events are processed (from the source to the log management solution). My Splunk environment has 17 data sources of multiple types (files, UDP, TCP & scripts); it's not easy to remember how all that data is processed!
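A cheap way to test your parsing before pushing it to production is to replay sample lines against the regex offline. A small Python sketch (not part of the Splunk app; the pattern mirrors the flat-file format shown above):

```python
import re

# Regex mirroring the flat-file "queries.log" format:
# date, time, "queries: info:", client IP, query, name, class+type, flags
PATTERN = re.compile(
    r"(\S+)\s(\d+:\d+:\d+\.\d+)\squeries:\sinfo:\s"
    r"client\s(\d+\.\d+\.\d+\.\d+)#\d+:\s"
    r"query:\s(\S+)\s(\S+\s\S+)\s(\S+)"
)

SAMPLES = [
    "05-Feb-2014 21:42:22.868 queries: info: client 192.168.254.206#6993: "
    "query: safebrowsing.google.com IN A + (192.168.254.8)",
]

def check(lines):
    """Return the lines that do NOT match the parsing regex."""
    return [line for line in lines if not PATTERN.search(line)]

print("unmatched lines:", check(SAMPLES))
```

Feed it a day's worth of real logs: any line it reports as unmatched is an event type your regex silently drops.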