It was a pretty ordinary day. I think I was doing a review of our firewall ruleset, a decidedly monotonous but necessary task. Then in came an alert that McAfee had deleted a file on one of our workstations. That doesn’t happen often, but it’s also not out of the ordinary. A minute later there was another alert. Then another. By this time, my attention was piqued and my heart rate started to increase. Did we have an outbreak?
The CISO, who also gets these alerts, nearly crashed into me as we raced to each other’s office. Something was definitely up and we had to act fast. The pressure was on. An incident was happening and the first step was to contain the threat. We had two choices:
- It could be a bad DAT update. We could disable AV across the company until we had a chance to identify the problem, roll back the DAT and turn AV back on. If we didn’t disable AV, then we might have hundreds of blue-screening computers to deal with.
- It could be a worm spreading throughout the company. If we disabled AV, that could be catastrophic. It could allow the malware to take complete hold of the company.
So, what was the first thing I did? Nothing. That’s right, nothing. I took a few deep breaths. I centered myself. I prepared to act, but did not act.
When dealing with an incident, it’s imperative that you keep a cool head and keep your wits about you. It’s vital that you block out all distractions, including your superiors if they can’t help you in that moment, and focus on containing the incident. Minutes matter. Seconds matter. This is no time for disruptions.
Fortunately, the CISO and I work well together as a team, and he has the technical chops to be useful in situations like this. So we analyzed the symptoms:
- The files being deleted were not always the same. Many of them were trusted executables that, as far as we could tell, had not been modified.
- The files being deleted weren’t newly dropped onto the system.
- The files being deleted weren’t always binaries.
- The detection component was always Artemis, which has a higher risk of false positives.
- Web traffic did not correspond with the pattern we were seeing.
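The symptoms above amount to a rough triage checklist. As an illustration only, here is a sketch of how such a heuristic might be encoded; the alert fields, values, and thresholds are hypothetical and not McAfee's actual alert schema:

```python
from dataclasses import dataclass

@dataclass
class AvAlert:
    """Hypothetical shape of an AV deletion alert; field names are illustrative."""
    path: str
    detection_component: str   # e.g. "Artemis" (heuristic/cloud-lookup component)
    file_is_new: bool          # recently dropped onto the system?
    file_modified: bool        # differs from a known-good copy?
    is_executable: bool

def looks_like_false_positive(alerts: list[AvAlert]) -> bool:
    """Rough heuristic mirroring the symptoms we weighed: unmodified,
    long-present, heterogeneous files flagged only by the heuristic
    (Artemis) component point toward a bad detection, not a worm."""
    if not alerts:
        return False
    all_artemis = all(a.detection_component == "Artemis" for a in alerts)
    none_new = not any(a.file_is_new for a in alerts)
    none_modified = not any(a.file_modified for a in alerts)
    mixed_types = len({a.is_executable for a in alerts}) > 1  # not always binaries
    return all_artemis and none_new and none_modified and mixed_types

# Two alerts matching the pattern we actually saw (paths are made up):
alerts = [
    AvAlert("C:\\tools\\putty.exe", "Artemis", False, False, True),
    AvAlert("C:\\docs\\report.xlsm", "Artemis", False, False, False),
]
print(looks_like_false_positive(alerts))  # -> True
```

No heuristic like this replaces human judgment, of course; in the real incident the call was made by two people weighing the evidence together.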
Given that we blocked executables at the proxy, and weighing the other symptoms, we made the call that it must be a problem with Artemis and temporarily disabled that component to contain the immediate threat. We then followed with the remaining incident response steps: eradication, recovery and lessons learned.
It turns out we were right and we made the right call. But we could have just as easily made the wrong call. In the heat of the moment, you sometimes have to make the best decision you can with the facts at hand and then deal with the consequences of that decision.
A mature incident response program will allow for bad calls as well as good ones. It will have checks and balances. It will stand behind the people making the call if they are otherwise good people, even if it turns out to be the wrong call. It will be team-focused and not individual-focused. It will learn and grow and it will evolve. The point is to have a program in place and do something. The rest will follow.