Throughout the day a SOC team uses dozens of tools to complete, in a few minutes, tasks that would normally take much longer. Tools improve productivity and efficiency. But with the sheer number of tasks to be performed in a timely fashion, it is sometimes difficult to figure out which tool to use and how it fits into the continuous cycle of monitoring and incident response.

**Tools as differentiators**

A tool that can automate a manual task can save hundreds of hours a year, allowing us to focus on non-trivial work. A tool that can monitor and alert frees defenders from staring at a screen waiting for something to happen and lets us instead look at ways to refine our sensors and detection methodologies. Tools are the backbone of cyber security operations.

One thing I often get asked is what tool would I recommend for solving a specific problem regarding monitoring and incident response. A great example would be: “What tool should I use to collect network traffic?” I find this type of question a good opening toward finding the best tool. It shows that a problem area has been identified as either a gap, where a tool can fill a need, or as an inefficiency or ineffective use of an existing tool. From this, our conversation can then explore the larger issues the organization faces, and sometimes, it turns out that they don’t need a tool for network traffic but a better tool for PCAP carving or tap management.

However, what I find in many cases is that the “big picture” thinking of cyber operations isn’t addressed in the decision-making process for acquiring tools. I can usually tell this is the case when someone asks my opinion about a specific tool. Such a conversation soon reveals that they haven’t thought through what other processes could be affected by acquiring and using the tool. What seemed like an obvious choice at first becomes, after our exchange, a not-so-clear-cut decision.

I’ve spent time reviewing the general tasks performed on a regular basis in active threat-based operations, believing that if I could provide an outline to anyone asking a question about tools, it might allow them to quickly identify which task or tasks were associated with their problem and how any changes to those tasks might alter others. This would allow them to make a more informed decision about the tools they were looking to implement.

To help facilitate such an impact assessment, but more importantly to provide a “big picture” view of incident response tasks for which tools could be a differentiator, I created a ring diagram that generalizes the tasks and groups them by the monitoring and response (M&R) life cycle. See Figure 1: Task Analysis for Cyber Threats (TACT) diagram.

Figure 1: Task Analysis for Cyber Threats (TACT) diagram

**Differentiate the tasks to find the tools**

The TACT diagram has four sections that represent monitoring and response activities. Each group of tasks is usually performed during its given phase of M&R (i.e., Pre-incident, Incident Response, Digital Forensics, Post-Mortem); however, the tasks are loosely ordered and can be performed out of order, in serial, in parallel, or in conjunction with other tasks, including tasks from other phases. The order is entirely dependent on the evolving situations we, as cyber defenders, find ourselves in.

Each task in the diagram is a point whose simplicity or complexity of execution is indeterminate: sometimes there are no tools to assist in performing the task, and other times one or more tools are necessary to complete the work. The TACT diagram can help you address the full gamut of tasks your cyber operations undertake, so that you can determine what tools you currently have, find gaps in coverage, and identify efficiency problems as you interrelate tasks around the ring.

**Assessing the tools**

When it comes to finding the right tool for a task, approach it as if you have a job opening and the tool is applying to fill the position. Just as with employment, a tool opening comes about because:

  • A tool has left the job (e.g., no longer supported or patched) or was let go for performance reasons.
  • You’ve identified a gap in tool coverage or output and need to find one or more tools to fill that gap.

Before you start “interviewing” tools, determine your needs by listing the tools you currently leverage for each task, making note of any inefficiencies. In my experience, inefficiencies usually arise when tools overlap in functionality and tasks. For example, if you have several tools that help with building an incident timeline, how does your SOC team determine which one to use? Valuable time can be lost in this type of decision-making, or worse, your analysts may have to spend time populating each tool with incident data, slowing down the overall resolution of the incident.

If you’ve identified any tool inefficiencies, try to resolve them by:

  • Trimming your toolset to a single tool for that specific task if it makes sense;
  • Allowing analysts to use a tool they are most comfortable with to complete a task if it makes them more efficient and productive;
  • Developing glue code so analysts only have to interact with one tool, which can automatically push data to other tools and/or tasks (see the sketch after this list);
  • Identifying tool priority so that the most important tools are used first to produce results, then, when the situation is less pressing, using the secondary tools on the same data set.
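
To make the glue-code bullet above more concrete, here is a minimal sketch of the idea in Python. Everything in it is hypothetical (the timeline tool, its URL, and its event format are placeholders, not a real product’s API); the point is only that an analyst touches one tool and a small script forwards the results to the next one.

```python
# Hypothetical glue code: forward events produced by one tool to a second
# tool's intake so analysts never have to re-enter the data by hand.
import json
import urllib.request

TIMELINE_URL = "http://timeline.example.local/api/events"  # placeholder endpoint


def push_timeline_events(events, url=TIMELINE_URL):
    """POST a list of event dicts to the (hypothetical) timeline tool."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"events": events}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # In practice these events would be parsed from the first tool's output
    # file or API rather than hard-coded here.
    push_timeline_events([{"time": "2015-01-01T00:00:00Z", "note": "first alert"}])
```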

Next, consider the complexity of a task. For these, it may be necessary to have several tools. To determine if that may be the case, break down the task into sub-tasks to expose the underlying needs. For example, the task “Collection of Raw Data” may require tools for collecting border traffic, network node logs, suspect emails, rather than a single tool. If this is the case, then it becomes critical that you then list interdependencies that might require glue code or that have other data requirements among them.

This methodical review of the tasks in your SOC’s M&R should produce a solid list of needs with which to begin the tool “interviews”: requirements, interfaces, expected outcomes and performance, etc. Armed with your tasks and needs, reviewing any externally developed tool, whether it’s COTS or freeware, will help you determine if the right tool is available or needs to be built in-house. Also, just as you would when interviewing someone for a job, contact other organizations for references about their experiences using the tool, which will give you a better idea of whether it has performed as expected. And be sure the analysts who will be using the tool are a part of its acquisition or creation. Many tools fail because the people who are supposed to use them can’t or won’t.

**An example: the “answer” is tool replacement**

Let’s take a look at an example using the “Isolate and Decode Network Traffic” task. Several years ago some fellow CND-ers and I concluded that our solution (i.e., one-off scripts) for decoding network traffic had become too unwieldy. Code was being duplicated, re-implemented, and scattered across host systems. We identified this task as an inefficient cog in our overall forensics processes. For a solution, we first determined our big-picture requirements: (1) allow us to develop a decoder in a centralized location to which everyone on the SOC team had access; (2) make our process of developing a decoder quick and simple, with commonly duplicated code abstracted away; (3) not force us to use a different development language, because all of our existing decoders had been written in Python; (4) let us create modular, “plug and play” decoders; and (5) enable us to develop shareable modules. Armed with these requirements, we “interviewed” a number of closed and open source tools but found none that fit well with our requirements. We concluded the optimal solution had to be developed in-house: the result was that our developers and analysts created what became ChopShop.
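
For a sense of what requirement (4) can look like in practice, here is a toy sketch of a “plug and play” decoder layout in Python. This is not ChopShop’s actual module API, just an illustration of the pattern: shared plumbing lives in one base class, each decoder implements only its protocol logic, and new decoders can be dropped in without duplicating code.

```python
# Hypothetical "plug and play" decoder pattern (not ChopShop's real interface).
class DecoderBase:
    """Common plumbing (output handling, bookkeeping) lives here exactly once."""
    name = "base"

    def handle_payload(self, data: bytes) -> None:
        raise NotImplementedError


class ExampleBeaconDecoder(DecoderBase):
    """A made-up protocol decoder: only the protocol logic is implemented here."""
    name = "example_beacon"

    def handle_payload(self, data: bytes) -> None:
        if data.startswith(b"BEACON"):
            print(f"[{self.name}] beacon observed ({len(data)} bytes)")


def run_decoders(decoders, payloads):
    """Framework-side driver: feed every payload to every registered decoder."""
    for payload in payloads:
        for decoder in decoders:
            decoder.handle_payload(payload)


if __name__ == "__main__":
    run_decoders([ExampleBeaconDecoder()], [b"BEACON\x01\x02", b"unrelated noise"])
```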

**Conclusion**

Investing time to assess your existing workflows, procedures, and tools, using the TACT diagram as a beginning outline, might help you identify gaps and inefficiencies in those workflows and procedures, as well as ways to streamline your responsiveness to an incident.

As I discuss task analysis with other groups, I may refine this diagram to reflect changes in the cyber landscape. If you have any input, I’d love to hear your feedback!

Recently I’ve been asked several times about how CRITs handles authentication. The questions were less about the choice between basic, LDAP, and remote authentication and more about invalid login attempts, logging, and the weird countdown timer users see when they fail to authenticate properly.

There are a few things to understand before the bigger picture of CRITs authentication makes sense. We have three authentication sources:

  • Web Interface
  • Command-line
  • API

The web interface is fairly self-explanatory. For the command line, we developed a Django management command called runscript, which requires you to authenticate in order to run any scripts that leverage CRITs code. The API uses API keys, allowing users to generate a unique key for each of their API needs and making key revocation less painful.
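
As a rough illustration of the per-purpose API key idea, here is a hedged sketch of a script hitting the CRITs API with its own key using Python’s requests library. The endpoint path and the username/api_key parameters follow the pattern CRITs’ API has used, but check your own instance’s documentation; the host, user, and key are placeholders.

```python
# Sketch only: verify the endpoint and parameter names against your CRITs instance.
import requests

CRITS_URL = "https://crits.example.org"  # placeholder host


def list_samples(username, api_key):
    """Fetch samples using a key generated just for this script."""
    response = requests.get(
        f"{CRITS_URL}/api/v1/samples/",
        params={"username": username, "api_key": api_key, "format": "json"},
        verify=True,  # keep TLS certificate verification on
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    # If this script is ever compromised, revoke only its key; other scripts
    # and the user's web login are unaffected.
    print(list_samples("analyst1", "0123456789abcdef"))
```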

For web and command-line authentication, the CRITs admin has the choice of forcing TOTP. This will require that the user also provide their pin and token along with their username and password. Users can opt into using TOTP even if the CRITs admin hasn’t forced people into using it. The CRITs admin can set the number of invalid login attempts before a user’s account is disabled (locked out). We log every authentication attempt to a user’s profile so they can see the last 50 attempts (successful or not) and what was involved (web auth, runscript, some basic environment info, etc.).
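
For readers unfamiliar with TOTP, the snippet below shows the pin + token idea using the pyotp library. This is only an illustration of what the user supplies at login, not CRITs’ actual implementation; the static PIN here is a hard-coded placeholder, whereas a real system would store it per user.

```python
# Illustration of TOTP second-factor checking with pyotp (not CRITs' code).
import pyotp

secret = pyotp.random_base32()  # shared secret provisioned to the user's token app
totp = pyotp.TOTP(secret)


def verify_second_factor(pin, token, expected_pin="1234"):
    """The user must present both their static PIN and the current TOTP token."""
    return pin == expected_pin and totp.verify(token)


print(verify_second_factor("1234", totp.now()))  # True: current token accepted
print(verify_second_factor("1234", "000000"))    # almost certainly False
```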

When discussing the pros and cons of this authentication setup, two things came up:

  1. Brute-force lockout of accounts.
  2. Window of attack for TOTP.

When thinking about these issues, we decided to focus on increasing the time it would take an attacker to lock an account out via brute force and on limiting the window of attack for TOTP.

To solve this we came up with the idea of a 10 second “blackout” period after a failed login attempt. During this period any authentication attempt, using valid credentials or not, is dropped on the floor: we log the attempt, but we don’t try to authenticate it. Once the 10 second window is up, the next authentication attempt is processed normally. After a successful login you can immediately log in again with no waiting period. This was important so that people running batch jobs with runscript don’t have to wait 10 seconds before their process can continue.
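
A minimal sketch of that blackout logic in Python, assuming an in-memory record of the last failure per user (CRITs itself tracks attempts against the user’s profile in the database; this is only to show the flow):

```python
import time

BLACKOUT_SECONDS = 10     # the blackout window described above
last_failure = {}         # hypothetical in-memory store: username -> timestamp


def log_attempt(username, result):
    print(f"auth attempt for {username}: {result}")


def attempt_login(username, check_credentials):
    """Sketch of the blackout flow (not CRITs' actual code)."""
    now = time.time()
    if now - last_failure.get(username, 0) < BLACKOUT_SECONDS:
        log_attempt(username, "dropped during blackout")
        return False                  # dropped on the floor: credentials never checked
    if check_credentials():
        log_attempt(username, "success")
        return True                   # success: no blackout, immediate re-login allowed
    last_failure[username] = now      # failure: start a fresh 10 second blackout
    log_attempt(username, "failed")
    return False
```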

The 10 second “blackout” period is a minor inconvenience to the end-user. From a brute-force lockout perspective, it increases the amount of time an attacker needs to spend per user: roughly (10s * Invalid Login Attempts), where the CRITs admin sets how many invalid login attempts are acceptable before an account is disabled. Balancing this is up to the admin, since increasing the time a brute-force lockout takes also increases the number of authentication attempts an attacker gets before the account is no longer viable.

From a TOTP perspective, the window of attack becomes severely limited. If an admin sets Invalid Login Attempts to 3, a user’s account will be locked out after 20 seconds of invalid authentication attempts, no matter how many attempts are made during that window. Since we drop all but 3 of those on the floor, even if an attacker tries to brute-force the pin + token, only 3 attempts are actually processed (at 0, 10, and 20 seconds). Instead of 20 seconds of iterating over potential pin + token combinations, the attacker only gets 3 chances to get it right. We felt this sufficiently limited an attacker’s ability to leverage the window of attack.

To reflect this in the interface, we added the countdown timer and disabled the login button so it is clear to the end-user what is going on. This doesn’t prevent people from crafting requests and submitting them manually, so the backend code doesn’t assume the timer stops anyone from submitting attempts more than once every 10 seconds.

There are many ways this can be enhanced. For example, instead of keeping the 10 second blackout period static, you could have every attempt within the period reset it. That way, anyone failing more than once in the blackout period stays blacked out, and none of their authentication attempts are ever used. Another option would be extending the blackout window beyond 10 seconds, whether by permanently increasing it to another value (like 15 seconds) or by having every failed attempt compound on the original 10 seconds.
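
The first variant only requires that an attempt made during an active blackout push the window forward instead of merely being ignored. A hypothetical in-memory version of that check:

```python
import time

BLACKOUT_SECONDS = 10
blackout_until = {}  # hypothetical per-user state: username -> blackout end time


def in_blackout(username, now=None):
    """Return True and extend the window if the user is currently blacked out."""
    now = time.time() if now is None else now
    if now < blackout_until.get(username, 0):
        # An attempt during the blackout resets it, so a hammering attacker
        # never gets another credential check.
        blackout_until[username] = now + BLACKOUT_SECONDS
        return True
    return False
```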

Today, my employer MITRE has announced the open sourcing of my project CRITs: Collaborative Research Into Threats.

The project has been in development for the last 3.5 years and available for the last 2.5 under a limited-release license. I am amazingly proud of how far the project has come. With hundreds of organizations across the government, private, and public sectors already using the platform, we’ve seen a drastic improvement in how organizations are able to research threats, share threat data, and implement protective measures across their networks. We decided to open source CRITs and give back to the security community because it is important to have freely available tools in an industry where every organization, big or small, needs to do all it can to protect itself against threats. Being able to communicate and share threats between organizations large and small, government and public, opens up the industry to a more collaborative effort towards intelligence-based active threat defense. I wanted to give some background about how CRITs came to be. I’ve been there since the beginning as the lead developer, and I continue in that role in my personal time while also taking on the role of Project Manager.

Towards the end of 2010 I was asked to tackle a code-rewrite of a small project at work. The project was a malware database written in Python using Django and MySQL. It was growing long in the tooth and was in dire need of a more scalable design. Around this same time I was becoming frustrated with what I will call the Data Rediscovery Problem.

Most of us while we work tend to put data where it is most convenient at the time. If we’re working on a malicious binary we’ll drop it onto a host with the tools necessary to handle it. If we’re writing quick scripts we’ll develop them on whichever box winds up being the first terminal we find available to us. This is all to save time and effort to get the job done quickly and easily. Even worse, what do we name those scripts? Usually things like foo.py, foo2.py, etc. What happens in a few weeks when we find ourselves in need of that data or those scripts again? We have to remember where we put them. A lot of times we’ll eventually rediscover them. Other times we’ll forget and wind up re-generating the data or re-writing the script. Even worse is when someone else is duplicating the work we did because our stuff wasn’t readily available. This is a huge waste of time and resources. It also makes it frustratingly difficult for new hires and teammates to find data they need to do their job.

While I was rewriting the malware database, I thought it would be neat if it could be extended to solve the Data Rediscovery issue. I wanted to provide a way for people to quickly upload content and make it as efficient as possible to save time and be a reference point for all of their data. That would solve half of the rediscovery issue, which is a good start.

The first thing I did was extend past storing Samples (malicious binaries) and allow for the storage of PCAPs. Network traffic is such a huge help when trying to dig towards “ground truth”. I always had problems finding the network traffic I needed so adding PCAP support was an easy decision. I was approached by an analyst and asked if I could extend the project to support Emails. Seemed like a reasonable request so I added Email support. Tracking Domains and IPs seemed like the next logical step given their presence in binaries, PCAPs, and emails, so those were added.

Then I was asked about supporting IoCs (Indicators of Compromise). This was sort of a game-changer. IoCs are amazingly important for an organization looking to defend its network. They need to be handled carefully. Even though they might be simple in concept, the response to them can sometimes be extremely complex. We spent some time thinking about what handling IoCs in the project meant but eventually settled on a design and got it implemented.

At this time it became obvious we needed a name for this project. It started out being a critical tool for our Malware Analysts but was now being used heavily by our Intel Analysts as well as our Incident Response team. They all began collaborating in a way that they were never able to before. Sharing data was as simple as uploading it to the same place everyone else was. This made analysis and response a much more efficient and effective process. It moved the groups to a more intelligence-based active threat defense. With all of us researching these threats, I quickly settled on the name Collaborative Research Into Threats.

Over the last three years, the project has grown to support Campaigns, Certificates, Events, Raw Data, and Targets (with more in the works!). Through word of mouth, hundreds of organizations asked to get their hands on CRITs so they could use it to improve their security posture. Many developers have jumped on board and contributed hundreds of features for managing and working with threat data. One of the biggest additions was the Services Framework, which allowed people to develop extensions to the project and enhance functionality without the need to modify core code. To date, 25 services have been contributed back to the project, giving intel and malware analysts ways to work with threat data without having to download it, to integrate CRITs with third-party and homegrown software, and to visualize and discover threat data in ways that were previously unavailable. This solved the second half of the Data Rediscovery Problem. Now someone can write a tool once, create a service out of it, and make it available to all of their users, and to the entire community if they wish. The ability to work with threat data without always needing to download it to another machine has proven time and time again to be invaluable.

Another major enhancement to the project was the inclusion of support for Structured Data Exchange Formats (CybOX, STIX, and TAXII). Being able to collaborate with groups internally is a huge win. But being able to collaborate with other trusted organizations in real time changes the security industry completely. Where organizations used to keep this information to themselves, they are now able to use these exchange formats to communicate threat data and position other organizations to protect their network well in advance of the threat making its way to them. By incorporating these formats into CRITs we’ve given all of the users of the platform the capability to share threat data quickly when time matters the most.

Like any other tool in the industry, CRITs isn’t the be-all and end-all solution to defending your network. It is simply an enabler. It gives analysts a better way of storing, enriching, and discovering the threat data necessary to protect their organization. It doesn’t replace analysts. If anything, it speaks to the necessity for all organizations to have smart, capable analysts who can leverage a platform like CRITs to put good data in and get good data out. For too long our industry has worked in isolation to solve the same problems. It’s time we all work together and change our mindsets to a more open, intelligence-based active threat defense.

People are hard at work today patching OpenSSL due to the Heartbleed bug (CVE-2014-0160).

There’s tons of information pouring out as package repositories are rapidly updating to the latest OpenSSL 1.0.1g release that came out yesterday. If you want to see if your server is vulnerable, you can run this:

openssl version -a

If you are anywhere in the 1.0.1 to 1.0.1f (inclusive) range or have a compile time earlier than yesterday, you should look to upgrade. For Ubuntu servers, you can find information on how to upgrade here, or here if you are running Lucid (10.04). There’s also a useful python script that will allow you to test your sites for being vulnerable. Do not use that script for anything other than testing your own sites! I’m sure we all have enough to deal with today :)
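
If you want to script the version check across a fleet, here is a rough Python sketch that shells out to openssl and flags builds in the affected range. It is only a heuristic: distributions often backport the fix without bumping the version letter, so treat the “built on” date (and a real test) as the final word.

```python
# Heuristic check only: a patched distro build may still report 1.0.1f.
import re
import subprocess


def openssl_possibly_vulnerable():
    """Return True if `openssl version -a` reports a 1.0.1 build up to 1.0.1f."""
    output = subprocess.check_output(["openssl", "version", "-a"]).decode()
    match = re.search(r"OpenSSL 1\.0\.1([a-z]?)", output)
    if not match:
        return False                      # not a 1.0.1 build at all
    letter = match.group(1)
    return letter == "" or letter <= "f"  # 1.0.1 through 1.0.1f shipped Heartbleed


if __name__ == "__main__":
    print("possibly vulnerable" if openssl_possibly_vulnerable() else "looks patched")
```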

About five or so years ago, Debian decided to mess with their ZSH package. One of the things that changed is the cursor location when browsing your command history. On every other OS, when you browse your command history, whether in vi mode (viins or vicmd) or in emacs mode, the cursor always goes to the end of the command. Debian decided to change it so that it goes to the beginning of the line. WTF!

This isn’t a huge change, but when you are going back and forth between systems all day and the same action has different results, it can get pretty infuriating. There was really no need to change this whatsoever. Unfortunately, the stance they have is:

tag 383737 + wontfix
quit

Unfortunately this boils down to a binary division in a fundamental belief,
and either way people are unhappy.

Essentially, they decided to make the change to be non-compliant with the rest of the ZSH community, and then call it a binary division as an excuse for their change. Nice.

The issue manifested because they decided to put values in the global /etc/zsh/zshrc that cause this behavior. There are horrible hacks you can do in your personal .zshrc to sort of fix the functionality, or, if you have sudo on the box, you can change the behavior for everyone by altering the global rc file. However, there is a much easier way, shown to me by zi. I hadn’t seen this solution anywhere I searched, so I thought I’d write it up for anyone who wants a clean fix:

% echo 'unsetopt global_rcs' >> ~/.zprofile

Fixed! You’ll have to invoke a new shell for this to take effect, of course. What this does is tell ZSH to ignore the global rc file before loading your own. The only drawback is that if there is anything in the global rc file you actually wanted, you’ll have to copy it into your own. Is it a pain that we have to do this at all just because Debian wants to be different? Sure. But it’s a few extra seconds tacked onto copying your own .zshrc file to the machine, and I’m more than happy to spend them to get ZSH behaving like it should!