Tag Archives: Log Management

Howto: Distributed Splunk Architecture

Distributed Architecture

Implementing a good log management solution is not an easy task! If your organisation decides (should I add “finally”?) to deploy “tools” to manage your huge amount of logs, it’s a very good step forward but it must be properly addressed. Devices and applications have plenty of ways to generate logs. They could send SNMP traps or Syslog messages, write to a flat file or a SQL database, or even send smoke signals (thanks to our best friends the developers). It’s definitely not an out-of-the-box solution that must be deployed. Please, do NOT trust $VENDORS who argue that their killer top-notch solution will be installed in a few days and collect everything for you!

Before trying to extract the gold from your logs, you must correctly collect events. This means first of all: do not lose any of them. It’s a good opportunity to recall Murphy’s law here: the lost event will always be the one which contained the most critical piece of information!

In most cases, a log management solution will be installed on top of an existing architecture. This involves several constraints:

  • From a security point of view, firewalls will for sure block flows used by the tools. Their policy must be adapted. The same applies to the applications or devices.
  • From a performance point of view, the tools can’t have a negative impact on the “business” traffic.
  • From a compliance point of view, the events must be properly handled with respect to confidentiality, integrity and availability (you know the well-known CIA principle).
  • From a human point of view (maybe the most important), you will have to fight with other teams and ask them to change the way they work. Be social! ;-)

To meet those requirements, or at least to try to, your tools must be deployed in a distributed architecture. By “distributed”, I mean using multiple software components deployed in multiple places in your infrastructure. The primary reason for this is to collect the events as close as possible to their original source. If you do this, you will be able to respect the CIA principle and to:

  • Control the resources used to process and centralise them
  • Get rid of multiple proprietary or open protocols
  • Control their proper processing from A to Z.

For those who are regular readers of my blog, you know that I’m a big fan of OSSEC. This solution implements a distributed architecture with agents installed on multiple collection points to grab and centralise the logs:

OSSEC Schema

OSSEC is great but lacks a good web interface to search for events and generate reports. Lots of people interconnect their OSSEC server with a Splunk instance. There is a very good integration of both products using a dedicated Splunk app. Usually, Splunk is deployed on the OSSEC server itself. The classic way to let Splunk collect OSSEC events is to configure a new Syslog destination for alerts like this (in your ossec.conf file):

<syslog_output>
<server>10.10.10.10</server>
<port>10001</port>
</syslog_output>

This configuration block will send alerts (only!) to Splunk via Syslog messages sent to 10.10.10.10:10001 (where Splunk will listen for them). Note that the latest OSSEC version (2.7) can write native Splunk events over UDP. Personally, I don’t like this way of forwarding events because UDP remains unreliable and only OSSEC alerts are forwarded. I prefer to process the OSSEC files using the file monitor feature of Splunk:

[monitor:///data/ossec/logs]
whitelist=\.log$

But what if you have multiple OSSEC servers across multiple locations? Splunk also has a solution for this, called the “Universal Forwarder”. Basically, this is a light Splunk instance which is installed without any console. Its goal is just to collect events in their native format and forward them to a central Splunk instance (the “Indexer”):

Splunk Schema

If you have experience with ArcSight products, you can compare the Splunk Indexer with the ArcSight Logger and the Universal Forwarder with the SmartConnector. The configuration is pretty straightforward. Let’s assume that you already have a Splunk server running. In your $SPLUNK_HOME/etc/system/local/inputs.conf, create a new input:

[splunktcp-ssl:10002]
 disabled = false
 sourcetype = tcp-10002
 queue = indexQueue

[SSL]
 password = xxxxxxxx
 rootCA = $SPLUNK_HOME/etc/auth/cacert.pem
 serverCert = $SPLUNK_HOME/etc/auth/server.pem

Restart Splunk and it will now bind to port 10002 and wait for incoming traffic. Note that you can use the provided certificate or use your own. It’s of course recommended to encrypt the traffic over SSL! Now install a Universal Forwarder. Like the regular Splunk, packages are available for most modern operating systems. Let’s play with Ubuntu:

# dpkg -i splunkforwarder-5.0.1-143156-linux-2.6-intel.deb

Configuration can be achieved via the command line but it’s very easy to do it directly by editing the *.conf files. Configure your Indexer in the $SPLUNK_HOME/etc/system/local/outputs.conf:

[tcpout]
 defaultGroup = splunkssl

[tcpout:splunkssl]
 server = splunk.index.tld:10002
 sslVerifyServerCert = false
 sslCertPath = $SPLUNK_HOME/etc/auth/server.pem
 sslPassword = xxxxxxxx
 sslRootCAPath = $SPLUNK_HOME/etc/auth/cacert.pem

The Universal Forwarder inputs.conf file is a normal one. Just define all your sources there and start the process. It will start forwarding all the collected events to the Indexer. This is a quick example which demonstrates how to improve your log collection process. The Universal Forwarder will take care of the collected events, send them safely to your central Splunk instance (compressed, encrypted) and queue them in case of outage.
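A forwarder that targets a port the Indexer is not listening on will just sit there queueing events, so it can be worth checking both files against each other. Here is a minimal sketch in Python, assuming the .conf files are simple enough to be read as INI files (true for the stanzas above, not guaranteed for every Splunk configuration). The helper names `read_conf`, `indexer_ports` and `forwarder_targets` are mine, they are not part of any Splunk tooling.

```python
import configparser


def read_conf(text):
    """Parse Splunk .conf text, assuming it is close enough to INI syntax."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names as written (defaultGroup, ...)
    # Strip indentation so indented keys are not read as continuation lines.
    parser.read_string("\n".join(line.strip() for line in text.splitlines()))
    return parser


def indexer_ports(inputs_text):
    """Ports opened by [splunktcp...:PORT] stanzas that are not disabled."""
    conf = read_conf(inputs_text)
    ports = set()
    for stanza in conf.sections():
        kind, _, port = stanza.rpartition(":")
        if kind.startswith("splunktcp") and port.isdigit():
            if conf[stanza].get("disabled", "false").lower() != "true":
                ports.add(int(port))
    return ports


def forwarder_targets(outputs_text):
    """(host, port) pairs found in the 'server' key of [tcpout:...] stanzas."""
    conf = read_conf(outputs_text)
    targets = []
    for stanza in conf.sections():
        if stanza.startswith("tcpout:") and "server" in conf[stanza]:
            for entry in conf[stanza]["server"].split(","):
                host, _, port = entry.strip().rpartition(":")
                if port.isdigit():
                    targets.append((host, int(port)))
    return targets


if __name__ == "__main__":
    # Illustrative content only; read your real files instead.
    inputs_conf = "[splunktcp-ssl:10002]\n disabled = false\n"
    outputs_conf = "[tcpout:splunkssl]\n server = splunk.index.tld:10002\n"
    listening = indexer_ports(inputs_conf)
    for host, port in forwarder_targets(outputs_conf):
        state = "ok" if port in listening else "NOT opened on the Indexer"
        print(f"{host}:{port} -> {state}")
```

This only compares port numbers. It does not prove that a firewall lets the traffic through or that the SSL handshake succeeds.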

A final note, don’t ask me to compare Splunk, OSSEC or ArcSight. I’m not promoting a tool. I just gave you an example of how to deploy a tool, whatever your choice is ;-)

The value of HTTP 404 Errors

404 Error

The HTTP protocol has a list of response status codes to help communication between the server and the browser. Every time a server responds to a browser request, a status code is sent. The most common ones are “200”, which means “Everything is ok, here is some food!”, and “404”, which means “Not found”. The latter may be caused by the client (example: an error in the URL typed in the browser) or by the developer/administrator who forgot to copy files or made typos in the code. That’s why the amount of 404 errors is directly related to the type of environment. During development and test phases, it’s common to have more errors. On the other hand, in a production environment, the amount of 404 errors should be limited and the main source of errors will be the client/browser.

Sometimes, “404” errors are considered useless by webmasters and are simply ignored in their reports. After all, their goal is to know how many visitors browsed their websites. From a security perspective, those errors can be very helpful to detect unusual traffic targeting a website.

I analyzed one year of my blog logs (yes, I’ve a long retention policy!). Some facts to start:

  • Total hits: 9.534.062
  • 404 errors: 343.606 (3.6%)
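If you want to compute the same kind of figures on your own logs, here is a minimal Python sketch. It assumes the Apache “combined” log format, where the status code is the three-digit field right after the quoted request line. The function names are mine, and this is not the tool I used to produce the numbers above.

```python
import re
from collections import Counter

# In the combined format, the status code follows the closing quote
# of the request line: ... "GET /path HTTP/1.1" 404 1173 ...
STATUS_RE = re.compile(r'" (\d{3}) ')


def status_counts(lines):
    """Count hits per HTTP status code; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def share(counts, status):
    """Percentage of hits that returned the given status code."""
    total = sum(counts.values())
    return 100.0 * counts[status] / total if total else 0.0


if __name__ == "__main__":
    # Two made-up log lines, just to show the call sequence.
    sample = [
        '12.34.56.78 - - [17/Oct/2011:18:52:22 +0200] "GET / HTTP/1.1" 200 512 "-" "-"',
        '12.34.56.78 - - [17/Oct/2011:18:52:23 +0200] "GET /web.rar HTTP/1.1" 404 209 "-" "-"',
    ]
    counts = status_counts(sample)
    print(counts.most_common())
    print(round(share(counts, "404"), 1))
```

With a real file, pass the open file object directly: `status_counts(open("access.log"))` (the file name is only an example).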

As you can see on the graph below, the 404 error code comes in the fifth position after the classic 200 and 3xx codes.

HTTP Responses

As I’m trying to keep the blog clean, this huge amount of “not found” errors looked strange to me. I decided to generate more statistics. What can we deduce? For a while now, the big winner has been the TimThumb vulnerability discovered in August 2011. The exploit was released on the 3rd of August and the first attempt hit me on the 4th! Still today, I receive plenty of probes (see this month):

Timthumb Requests

The TimThumb scans are coming from three main sources, as seen on the Google map below (the live map is available here).

Timthumb Google Map

Another trend: more and more .rar archive files are requested, especially this month. Why? I’ve absolutely no idea! If you have ideas, feel free to post your comments!

.rar File Requests

The top-10 of requested .rar files is:

  • /mirserver.rar
  • /web.rar
  • /www.rar
  • /mirserver1.rar
  • /wwwroot.rar
  • /youxi.rar
  • /mh.rar
  • /manhua.rar
  • /mirserver2.rar
  • /mirserver3.rar

Some of these requests look like they come from scanners looking for website backups. But I did not see the same amount of requests for .tar.gz or .zip files (except for “www.zip”)! I also saw requests for files named after numbers: 5555.rar, 8888.rar, 444.rar, etc. According to Google, those files are massively infected with malware, but why look for them on my server?

Finally, scanners are looking for .asp (Microsoft Active Server Pages) pages, especially during the last two months:

.asp File Requests

The top-10 of requested .asp pages is:

  • /save.asp
  • /plug/save.asp
  • /gmsave.asp
  • /diy.asp
  • /shell.asp
  • /dama.asp
  • /upfile_flash.asp
  • /FCKeditor/editor/filemanager/connectors/asp/connector.asp
  • /xiaoma.asp
  • /up_BookPicPro.asp

And what about common tools or web interfaces? The top-10 is:

  • /setup.php
  • /scripts/setup.php
  • /admin
  • /login.php
  • /phpmyadmin/
  • /myadmin/
  • /mysql/
  • /db/
  • /administrator/
  • /db/

As you can see, there is plenty of useful information in your Apache (or any other webserver) log files! Keep an eye on your 404 errors to discover new trends! A temporary peak of 404 errors could mean that your server is under attack…
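To keep that eye open without reading logs all day, a small script can do the counting. The Python sketch below is only one possible approach, under assumptions of mine: Apache “combined” format, and a “peak” defined as a day with more than three times the median daily number of 404 errors. Both the threshold and the function names are arbitrary choices, tune them for your own traffic.

```python
import re
from collections import Counter
from statistics import median

# Captures the day, the requested path and the status code of a combined log line.
LINE_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\] "\S+ (\S+)[^"]*" (\d{3}) '
)


def not_found_stats(lines):
    """Return (404 hits per day, 404 hits per requested path)."""
    per_day, per_path = Counter(), Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group(3) == "404":
            per_day[match.group(1)] += 1
            per_path[match.group(2)] += 1
    return per_day, per_path


def peaks(per_day, factor=3.0):
    """Days whose 404 count exceeds factor x the median day, busiest first."""
    if not per_day:
        return []
    baseline = median(per_day.values())
    flagged = [(day, n) for day, n in per_day.items() if n > factor * baseline]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Typical usage would be `per_day, per_path = not_found_stats(open("access.log"))`, then `per_path.most_common(10)` for a top-10 like the ones above and `peaks(per_day)` for the suspicious days. The file name is only an example.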

Use the Ports, Luke!

Luke Skywalker

Last week, I went to London to attend the RSA Conference Europe (my wrap up is here). One of the sessions I followed was presented by Eric Vyncke about “forensics in a post IPv4 exhaustion”. You must live on another planet if you’re not aware of the coming IPv4 exhaustion. Today, the big challenge for Internet Service Providers is to handle IPv6 deployments in parallel with the implementation of new ways to cope with the lack of IPv4 addresses. One of these techniques is called “Carrier Grade NAT”. As Eric mentioned in his talk, ISPs will start (some already have, mainly mobile operators) to share IPv4 addresses between customers. This will have a huge impact on forensic investigations!

Why? When you are looking for evidence (example: to track a security incident), what is your first reflex? Have a look at your logs! Good! Let’s have a look at a regular Apache access log entry:

 

12.34.56.78 - - [17/Oct/2011:18:52:22 +0200] \
"GET /favicon.ico HTTP/1.1" 200 1173 "-" \
"Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20110921 Ubuntu/10.04 (lucid) Firefox/3.6.23"

The IP address ‘12.34.56.78’ is the machine which made an HTTP request to your server. A simple whois lookup will give you the ISP or company which “owns” this IP address, usually with a contact to report abuse. Example:

NetRange:       12.0.0.0 - 12.255.255.255
CIDR:           12.0.0.0/8
OriginAS:       
NetName:        ATT
NetHandle:      NET-12-0-0-0-1
Parent:         
NetType:        Direct Allocation
Comment:        For abuse issues contact abuse@att.net
RegDate:        1983-08-23
Updated:        2010-11-18
Ref:            http://whois.arin.net/rest/net/NET-12-0-0-0-1

OrgName:        AT&T Services, Inc.
OrgId:          ATTW-Z
Address:        200 S. Laurel AVE.
City:           MIDDLETOWN
StateProv:      NJ
PostalCode:     07748
Country:        US
RegDate:        2009-12-18
Updated:        2010-08-30
Comment:        Contact AT&T Abuse ( abuse@att.net ) for policy abuse issues.
Comment:        All policy abuse issues sent to other POCs will be disregarded.

Now, if this ISP uses Carrier Grade NAT, it cannot point to the right customer immediately! One thousand (or more) customers are potential hackers. To uniquely identify the bad guy, the following critical information is mandatory:

  • The ISP must keep a track of all the IP flows generated by its customers INCLUDING THE SOURCE PORT!
  • You have to provide the IP address INCLUDING THE SOURCE PORT!

Based on those points, correlation is possible to find the attacker. From the ISP perspective, this could have a huge impact on the tools it uses to store the flows. From your perspective, if you have access to the logs of the firewall in front of your application, just have a look at them. If your only source of information is your application log, it’s time to upgrade it!
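To make the correlation concrete, here is a toy model in Python. Everything in it is an assumption for illustration: I pretend that the ISP allocates a block of source ports on a shared public address to each customer for a given period, and the `NatMapping` record, the port ranges and the customer names are invented. Real Carrier Grade NAT logs will look different.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class NatMapping(NamedTuple):
    """Hypothetical ISP record: who held which port block, and when."""
    public_ip: str
    port_start: int
    port_end: int
    valid_from: datetime
    valid_to: datetime
    customer: str


def candidates(mappings, ip, when):
    """Without the source port: every customer sharing the address at that time."""
    return sorted({m.customer for m in mappings
                   if m.public_ip == ip and m.valid_from <= when <= m.valid_to})


def find_customer(mappings, ip, port, when) -> Optional[str]:
    """With the source port: at most one customer matches."""
    for m in mappings:
        if (m.public_ip == ip
                and m.port_start <= port <= m.port_end
                and m.valid_from <= when <= m.valid_to):
            return m.customer
    return None


if __name__ == "__main__":
    start, end = datetime(2011, 10, 17, 18, 0), datetime(2011, 10, 17, 19, 0)
    isp_log = [
        NatMapping("12.34.56.78", 1024, 2047, start, end, "customer-A"),
        NatMapping("12.34.56.78", 2048, 3071, start, end, "customer-B"),
    ]
    seen_at = datetime(2011, 10, 17, 18, 52, 22)
    print(candidates(isp_log, "12.34.56.78", seen_at))           # IP only
    print(find_customer(isp_log, "12.34.56.78", 2500, seen_at))  # IP + port
```

The first call returns everybody behind the shared address, the second one a single customer. Note that accurate timestamps (and time zones!) matter as much as the port.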

Start logging the source port NOW! Here is an example (source: Eric Vyncke) with Apache logs. Replace the default format:

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

With this one:

LogFormat "%h:%{remote}p %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

Of course, if you change the format of your log, you have to replicate this change to your log management tools! My Apache servers are monitored by OSSEC, so my new Apache decoder is now:

<decoder name="web-accesslog">
  <type>web-log</type>
  <prematch>^\d+.\d+.\d+.\d+:\d+ </prematch>
  <regex>^(\d+.\d+.\d+.\d+):(\d+) \S+ \S+ [\S+ \S\d+] </regex>
  <regex>"\w+ (\S+) HTTP\S+ (\d+) </regex>
  <order>srcip, srcport, url, id</order>
</decoder>
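If your log management tool is a home-made script rather than OSSEC, the same adaptation is needed there. Below is a rough Python equivalent of the decoder, extracting the same four fields (srcip, srcport, url, id). It assumes that every line already uses the new “address:port” format. The regular expression is mine and uses Python syntax, which differs from the OSSEC one.

```python
import re

# client field, two ignored fields, [timestamp], "METHOD url ...", status
LINE_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<url>\S+)[^"]*" (?P<status>\d{3}) '
)


def parse_line(line):
    """Return srcip, srcport, url and id (status code), or None if no match."""
    match = LINE_RE.match(line)
    if not match:
        return None
    # Split on the LAST colon so that IPv6 addresses keep their own colons.
    srcip, sep, srcport = match.group("client").rpartition(":")
    if not sep or not srcport.isdigit():
        return None  # old format, no source port logged
    return {
        "srcip": srcip,
        "srcport": int(srcport),
        "url": match.group("url"),
        "id": match.group("status"),
    }


if __name__ == "__main__":
    # The port 51234 is invented for the example.
    entry = ('12.34.56.78:51234 - - [17/Oct/2011:18:52:22 +0200] '
             '"GET /favicon.ico HTTP/1.1" 200 1173 "-" "Mozilla/5.0"')
    print(parse_line(entry))
```

Lines written before the format change have no port and are rejected instead of being half-parsed, which makes the switch-over date easy to spot in your data.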

This is an excellent example which proves that log management is complex and must be deployed with the help of strong procedures. I quickly checked a few classic open source applications: almost none of them is able to log the source port! To summarize:

  • Start to log the source port now!
  • Change your application log format.
  • Adapt your log management solution
  • Train your developers to write correct information into log files
  • And (last but not least), implement IPv6!

 

 

From Logs to Hell!

Vade-Retro

I hesitated a while before choosing the right image to illustrate this article. I read again a press release about a new log management product which claims to provide “out-of-the-box security and compliance for business of all sizes”. Dear v€ndor, are you living in a care bears world or are you possessed?

Yes, your solution may have nice reports ready to be produced just by clicking a “Generate now” button. You allow searching across millions of events in seconds? Why not… But the power and reliability of a log management solution directly depend on how you feed it with events! A lack of events will produce poor reports… then poor ROI!

Before deploying your top-notch log management solution, there are so many issues that could occur:

  • Unreachable devices – They could be located on remote sites with limited bandwidth. Some firewalls might prevent monitoring protocols (which are often and hopefully blocked). Are they using private IP addresses with NAT rules?
  • Supported format – Your devices might generate events in an unsupported format.
  • Performance impacts on the network flows
  • (De)commissioning of (old) devices
  • Overlapping in IP subnets
  • Procedures / follow-up

Still today, most log management solutions are deployed urgently to meet a compliance need. For me, v€ndor$ claiming to provide “out-of-the-box” log management services could be accused of “false advertising”! Implementing a log management solution is not a road without pitfalls. It could quickly drive you to hell…