Events centralization: the normalization problem

In a previous article, I talked about SIEM, which is not for small organizations. But if you really need to analyze logs, the first step is to collect them in one central place.

The syslog standard is available on almost all devices with IP connectivity (routers, switches, servers, appliances) and is very easy to deploy. Just configure all your devices to send their messages to the same host (e.g. logger.yourdomain.com) and run a syslog concentrator there. syslog-ng is one of the well-known solutions available. With a simple configuration directive, you can split the received messages and save them to files based on the hostname:

destination host_log {
    file("/var/log/syslog-ng/$HOST.log");
};

You can also pass the message to an external script and insert rows in a MySQL DB:

destination mysql_log {
    pipe("/var/log/mysql.pipe"
         template("INSERT INTO syslog_incoming (facility, priority, \
                 date, time, host, message) VALUES ( '$FACILITY', '$PRIORITY', \
                 '$YEAR-$MONTH-$DAY', '$HOUR:$MIN:$SEC', '$HOST', '$MSG' );\n")
         template-escape(yes));
};

/var/log/mysql.pipe is a named pipe read by a mysql client:

mysql -u syslog --password=secretpwd syslog < /var/log/mysql.pipe

Once you start sending logs from many different sources, problems appear. Look at the following syslog messages:

Jul  5 14:52:52 bigbrother vmpsd: [ID 102129 local6.info] ALLOW: \
      00809f535052 -> EXISTING_LAN, switch 10.10.20.21 port 2/26
Jul  5 13:51:31 2007 [10.40.30.10] authmgr[405]:  station \
      up <00:12:f0:2b:b0:7b> bssid=00:0b:86:ac:e0:62
Jun 28 10:50:04 bfs-a 138: Jun 28 08:50:03: %SYS-5-CONFIG_I: \
      Configured from console by admin onvty3 (10.20.0.41)
Jul  5 14:22:09 gw vmpsd: ALLOW: 000ffe08d8b7 -> WKSMARKETING, \
      switch 10.10.20.13 port Fa0/19

Check the line formats:

  • line 1: vmpsd running on a Solaris 10 box
  • line 2: Aruba Wireless appliance
  • line 3: Cisco Catalyst running CatOS
  • line 4: vmpsd running on a CentOS box
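
A quick way to see how far these formats diverge is to split each line on whitespace, awk-style, and look at the fourth token. The snippet below is only a small sketch: the sample lines are the four messages above with the wrapped lines joined back together, and the names SAMPLES and fourth_field are made up for this illustration.

```python
# The four sample messages from this article, with wrapped lines rejoined.
SAMPLES = [
    "Jul  5 14:52:52 bigbrother vmpsd: [ID 102129 local6.info] ALLOW: "
    "00809f535052 -> EXISTING_LAN, switch 10.10.20.21 port 2/26",
    "Jul  5 13:51:31 2007 [10.40.30.10] authmgr[405]:  station "
    "up <00:12:f0:2b:b0:7b> bssid=00:0b:86:ac:e0:62",
    "Jun 28 10:50:04 bfs-a 138: Jun 28 08:50:03: %SYS-5-CONFIG_I: "
    "Configured from console by admin onvty3 (10.20.0.41)",
    "Jul  5 14:22:09 gw vmpsd: ALLOW: 000ffe08d8b7 -> WKSMARKETING, "
    "switch 10.10.20.13 port Fa0/19",
]


def fourth_field(line: str) -> str:
    """Return what awk would call $4: the fourth whitespace-separated token."""
    return line.split()[3]


fields = [fourth_field(line) for line in SAMPLES]
for number, value in enumerate(fields, start=1):
    print(f"line {number}: $4 = {value}")
```

Three of the lines give a host name in that position, but the Aruba line gives "2007", so any processing that relies on a fixed field position silently mislabels that event.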

You can see that after this first step (called "Collection"), a second, very important operation is needed: "Normalization". It is easy to understand why this step matters: if we assume from line 1 that $4 is the host name, then on line 2, $4 will contain the year! (*) To be sure not to miss critical events, we must parse the messages, extract all the required information, then store it in a DB. Important fields to take care of are:

  • Timestamps
  • Source or Destination IP addresses
  • Source or Destination ports
  • Credentials (user names, realms, domains)
  • Application/Server names
  • Raw messages

(This list is not exhaustive!)
Once extracted, the data can be inserted into the DB and indexed. A copy of the raw message will also be stored. Raw messages are important for forensic searches or in case of official requests from authorities (only untouched device logs can be used as evidence).
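
To make the idea more concrete, here is a minimal normalization sketch in Python. It is not a complete parser: the regular expressions are guesses derived only from the four sample lines in this article (not from vendor documentation), and the names PATTERNS, normalize and the field names are hypothetical. Each message is matched against one pattern per known format, the extracted fields are returned as a dictionary, and the untouched raw message is always kept alongside them.

```python
import re

# One pattern per known source format. These regexes are illustrative and were
# written against the sample lines only; real devices will need more patterns.
TS = r"\w{3}\s+\d+ [\d:]{8}"  # e.g. "Jul  5 14:52:52"
PATTERNS = [
    ("vmpsd", re.compile(
        rf"^(?P<timestamp>{TS}) (?P<host>\S+) vmpsd: (?:\[ID \d+ \S+\] )?"
        r"(?P<action>\w+): (?P<mac>[0-9a-f]{12}) -> (?P<vlan>\w+), "
        r"switch (?P<switch_ip>[\d.]+) port (?P<port>\S+)$")),
    ("aruba_authmgr", re.compile(
        rf"^(?P<timestamp>{TS} \d{{4}}) \[(?P<host>[\d.]+)\] authmgr\[\d+\]:\s+"
        r"station\s+(?P<action>\w+) <(?P<mac>[0-9a-f:]{17})> "
        r"bssid=(?P<bssid>[0-9a-f:]{17})$")),
    ("cisco_config", re.compile(
        rf"^(?P<timestamp>{TS}) (?P<host>\S+) \d+: (?P<device_time>{TS}): "
        r"%(?P<event>[\w-]+): Configured from console by (?P<user>\S+) "
        r".*\((?P<source_ip>[\d.]+)\)$")),
]


def normalize(raw: str) -> dict:
    """Return the extracted fields plus the untouched raw message."""
    for source_type, pattern in PATTERNS:
        match = pattern.match(raw)
        if match:
            event = {"source_type": source_type, "raw": raw}
            event.update(match.groupdict())
            return event
    # Unknown formats are kept, not dropped, so no event is silently lost.
    return {"source_type": "unknown", "raw": raw}


# The four sample messages from this article, with wrapped lines rejoined.
SAMPLES = [
    "Jul  5 14:52:52 bigbrother vmpsd: [ID 102129 local6.info] ALLOW: "
    "00809f535052 -> EXISTING_LAN, switch 10.10.20.21 port 2/26",
    "Jul  5 13:51:31 2007 [10.40.30.10] authmgr[405]:  station "
    "up <00:12:f0:2b:b0:7b> bssid=00:0b:86:ac:e0:62",
    "Jun 28 10:50:04 bfs-a 138: Jun 28 08:50:03: %SYS-5-CONFIG_I: "
    "Configured from console by admin onvty3 (10.20.0.41)",
    "Jul  5 14:22:09 gw vmpsd: ALLOW: 000ffe08d8b7 -> WKSMARKETING, "
    "switch 10.10.20.13 port Fa0/19",
]

for line in SAMPLES:
    event = normalize(line)
    print(event["source_type"], event.get("host"), event.get("timestamp"))
```

With this approach the host is found by name in every format, wherever it sits in the line, and messages that match no pattern still reach the DB as "unknown" with their raw text, ready for a new pattern to be written.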

(*) Check the following topic for more info about syslog message processing: Adaptive syslog message parsing.
